pandrs 0.3.0

A high-performance DataFrame library for Rust, providing pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing capabilities
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
# JIT Compilation in pandrs

This document provides a comprehensive guide to using Just-In-Time (JIT) compilation features in the pandrs library, which offer Numba-like functionality for accelerating data operations.

## Introduction

Just-In-Time (JIT) compilation is a technique that compiles code at runtime, just before execution, rather than ahead of time. This allows for both the flexibility of interpreted code and the performance of compiled code. In the Python ecosystem, Numba provides JIT compilation for numerical Python code, significantly accelerating operations on NumPy arrays and pandas DataFrames.

pandrs implements a similar JIT compilation system for Rust, allowing users to:

1. Accelerate performance-critical operations, especially custom aggregation functions
2. Create reusable JIT-compiled functions
3. Apply these functions to GroupBy operations
4. Achieve performance benefits similar to writing optimized Rust code, with the flexibility of runtime-defined functions

## Getting Started

### Enabling JIT Compilation

JIT compilation is an optional feature in pandrs. To enable it, add the `jit` feature flag when building:

```bash
cargo build --features jit
```

Or when running an example:

```bash
cargo run --example jit_aggregation_example --features jit
```

### Basic Usage

Here's a simple example of using JIT compilation with GroupBy operations:

```rust
use pandrs::optimized::dataframe::OptimizedDataFrame;
use pandrs::optimized::jit::{jit, GroupByJitExt};
use pandrs::core::error::Result;

fn main() -> Result<()> {
    // Create a dataframe
    let mut df = OptimizedDataFrame::new();
    // ... add data ...
    
    // Group by a column
    let grouped = df.group_by(&["category"])?;
    
    // Use built-in JIT-compiled operations
    let result1 = grouped.sum_jit("value", "sum_value")?;
    let result2 = grouped.mean_jit("value", "mean_value")?;
    
    // Create a custom JIT function
    let weighted_mean = jit("weighted_mean", |values: Vec<f64>| -> f64 {
        if values.is_empty() {
            return 0.0;
        }
        
        let mut weighted_sum = 0.0;
        let mut weight_sum = 0.0;
        
        for (i, val) in values.iter().enumerate() {
            let weight = (i + 1) as f64;
            weighted_sum += val * weight;
            weight_sum += weight;
        }
        
        weighted_sum / weight_sum
    });
    
    // Apply the custom function
    let result3 = grouped.aggregate_jit("value", weighted_mean, "weighted_mean")?;
    
    Ok(())
}
```

## Core Concepts

### JIT Functions

A JIT function in pandrs is a function that:

1. Takes a vector of values (supports `Vec<f64>`, `Vec<f32>`, `Vec<i64>`, `Vec<i32>`)
2. Returns a single value (of the corresponding type)
3. Can be compiled at runtime for optimal performance
4. Falls back to a native Rust implementation when JIT is disabled

JIT functions are created using type-specific constructors for type safety:

```rust
// For f64 values
let f64_function = jit_f64("function_name", |values: Vec<f64>| -> f64 {
    // Your implementation here
});

// For f32 values
let f32_function = jit_f32("function_name", |values: Vec<f32>| -> f32 {
    // Your implementation here
});

// For i64 values
let i64_function = jit_i64("function_name", |values: Vec<i64>| -> i64 {
    // Your implementation here
});

// For i32 values
let i32_function = jit_i32("function_name", |values: Vec<i32>| -> i32 {
    // Your implementation here
});
```

For backward compatibility, the simplified `jit` function is also available for f64 values:

```rust
let my_function = jit("function_name", |values: Vec<f64>| -> f64 {
    // Your implementation here
});
```

### Built-in JIT Operations

pandrs provides several built-in JIT operations in the `array_ops` module:

- `sum()` - Calculate the sum of values
- `mean()` - Calculate the arithmetic mean
- `std(ddof)` - Calculate standard deviation with degrees of freedom
- `var(ddof)` - Calculate variance with degrees of freedom
- `min()` - Find the minimum value
- `max()` - Find the maximum value
- `median()` - Calculate the median value
- `quantile(q)` - Calculate a specific quantile
- `count()` - Count the number of values
- `count_non_nan()` - Count non-NaN values
- `prod()` - Calculate the product of all values
- `first()` - Get the first value
- `last()` - Get the last value
- `trimmed_mean(trim_fraction)` - Calculate mean after removing outliers
- `skew()` - Calculate skewness (3rd moment)
- `kurt()` - Calculate kurtosis (4th moment)
- `abs_diff()` - Calculate absolute difference (max - min)
- `iqr()` - Calculate interquartile range
- `weighted_avg()` - Calculate weighted average

Example usage:

```rust
use pandrs::optimized::jit::array_ops;

let sum_fn = array_ops::sum();
let mean_fn = array_ops::mean();
let std_fn = array_ops::std(1); // ddof=1 for sample standard deviation
```

### GroupBy JIT Extensions

The `GroupByJitExt` trait extends GroupBy with JIT capabilities:

```rust
use pandrs::optimized::jit::GroupByJitExt;

// Group by a column
let grouped = df.group_by(&["category"])?;

// Use JIT-compiled operations
let result1 = grouped.sum_jit("value", "sum_value")?;
let result2 = grouped.mean_jit("value", "mean_value")?;
let result3 = grouped.std_jit("value", "std_value")?;

// Use parallel execution for large groups
let parallel_result = grouped.parallel_sum_jit("value", "parallel_sum", None)?;
```

Key methods include:

- `aggregate_jit(column, jit_fn, alias)` - Apply a JIT function to a column
- `sum_jit(column, alias)` - Sum a column using JIT
- `mean_jit(column, alias)` - Calculate mean using JIT
- `std_jit(column, alias)` - Calculate standard deviation using JIT
- `var_jit(column, alias)` - Calculate variance using JIT
- `min_jit(column, alias)` - Find minimum using JIT
- `max_jit(column, alias)` - Find maximum using JIT
- `median_jit(column, alias)` - Calculate median using JIT
- `aggregate_multi_jit(aggregations)` - Apply multiple JIT functions at once
- `parallel_sum_jit(column, alias, config)` - Sum a column using parallel JIT
- `parallel_mean_jit(column, alias, config)` - Calculate mean using parallel JIT
- `parallel_std_jit(column, alias, config)` - Calculate standard deviation using parallel JIT
- `parallel_min_jit(column, alias, config)` - Find minimum using parallel JIT
- `parallel_max_jit(column, alias, config)` - Find maximum using parallel JIT

## Advanced Usage

### Multiple JIT Aggregations

You can apply multiple JIT functions in a single operation:

```rust
let result = grouped.aggregate_multi_jit(vec![
    ("value".to_string(), array_ops::sum(), "sum_value".to_string()),
    ("value".to_string(), array_ops::mean(), "mean_value".to_string()),
    ("value".to_string(), array_ops::std(1), "std_value".to_string()),
]);
```

### Custom Functions

Creating custom JIT functions allows you to implement specialized aggregations:

```rust
// Calculate trimmed mean (removing outliers)
let trimmed_mean = jit("trimmed_mean", |values: Vec<f64>| -> f64 {
    if values.len() <= 2 {
        return values.iter().sum::<f64>() / values.len() as f64;
    }
    
    let mut sorted = values;
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
    
    // Remove the lowest and highest values
    let trimmed = &sorted[1..sorted.len() - 1];
    trimmed.iter().sum::<f64>() / trimmed.len() as f64
});

// Apply it
let result = grouped.aggregate_jit("value", trimmed_mean, "trimmed_mean")?;
```

### Benchmarking

The `benchmark` module provides tools for measuring and comparing the performance of JIT-compiled functions:

```rust
use pandrs::optimized::jit::benchmark;

// Create a JIT function
let sum_fn = jit("sum", |values: Vec<f64>| -> f64 {
    values.iter().sum()
});

// Benchmark against different input sizes
let result = benchmark::benchmark_suite(
    vec![("sum".to_string(), sum_fn)],
    vec![100, 1000, 10000],
    100 // iterations
);

// Print results
for r in result {
    println!("{}", r);
}
```

## Implementation Details

### JIT Compilation Process

1. When a JIT function is created, it stores a native Rust implementation
2. When the `jit` feature is enabled, it attempts to compile the function using Cranelift
3. On execution, it uses the JIT-compiled version if available, falling back to the native implementation if not
4. Performance statistics are collected for analysis

### Feature Flags

The JIT system is modular and controlled via feature flags:

- `jit` - Enables JIT compilation (requires Cranelift dependencies)

To use JIT compilation, you must enable this feature:

```bash
cargo build --features jit
```

### Performance Considerations

- JIT compilation has overhead, so it's most beneficial for functions that:
  - Are called repeatedly with different inputs
  - Perform complex operations
  - Process large datasets
- Small, simple operations may not see significant benefits
- The first execution includes compilation time
- Subsequent executions benefit from cached compiled code

## Best Practices

1. **Use built-in operations when possible**: The built-in JIT operations are already optimized.

2. **Reuse JIT functions**: Create JIT functions once and reuse them to avoid recompilation.

3. **Benchmark before optimizing**: Use the benchmarking tools to identify bottlenecks.

4. **Consider input size**: JIT compilation benefits increase with larger inputs.

5. **Provide meaningful names**: Name your JIT functions for better debugging and benchmarking.

6. **Fall back gracefully**: Always ensure your code works even when JIT is disabled.

## Examples

See the examples directory for complete examples:

- `jit_aggregation_example.rs` - Demonstrates basic JIT usage with GroupBy
- `jit_function_example.rs` - Shows creating and using custom JIT functions
- `jit_benchmark_example.rs` - Demonstrates benchmarking JIT performance
- `jit_types_example.rs` - Shows using different numeric types with JIT functions
- `jit_simd_example.rs` - Demonstrates SIMD vectorization for JIT functions
- `jit_parallel_example.rs` - Shows parallel execution for improved performance
- `jit_parallel_groupby_example.rs` - Demonstrates parallel JIT operations with GroupBy

## Type System

The JIT compilation system in pandrs supports multiple numeric types through a flexible type system:

1. **Type-Specific JIT Functions:**
   - `jit_f64`: For 64-bit floating point operations
   - `jit_f32`: For 32-bit floating point operations
   - `jit_i64`: For 64-bit integer operations
   - `jit_i32`: For 32-bit integer operations

2. **Type Traits:**
   - `JitType`: Marker trait for types that can be used in JIT functions
   - `JitNumeric`: Trait for numeric operations in JIT functions

3. **Type-Erased Values:**
   - `TypedVector`: Type-safe vector for JIT operations
   - `NumericValue`: Type-erased numeric value for interoperability

4. **Generic JIT Functions:**
   - `GenericJitFunction`: Type-parameterized JIT function
   - Automatic type conversion between compatible types

### Example of Type-Parameterized JIT Functions

```rust
// Create JIT functions for different types
let f64_sum = jit_f64("f64_sum", |values: Vec<f64>| -> f64 {
    values.iter().sum()
});

let i32_sum = jit_i32("i32_sum", |values: Vec<i32>| -> i32 {
    values.iter().sum()
});

// Use with appropriate types
let f64_result = f64_sum.execute(vec![1.5, 2.5, 3.5]);
let i32_result = i32_sum.execute(vec![1, 2, 3]);
```

## SIMD Vectorization

The JIT system supports SIMD (Single Instruction, Multiple Data) vectorization for improved performance on modern CPUs. This allows operations to process multiple data points in parallel using CPU vector instructions.

### SIMD JIT Functions

SIMD-accelerated JIT functions can be created using the following helper functions:

```rust
// Create SIMD-accelerated sum functions
let simd_sum_f32 = simd_sum_f32(); // For f32 arrays
let simd_sum_f64 = simd_sum_f64(); // For f64 arrays

// Create SIMD-accelerated mean functions
let simd_mean_f32 = simd_mean_f32(); // For f32 arrays
let simd_mean_f64 = simd_mean_f64(); // For f64 arrays
```

### Auto-Vectorization

You can automatically vectorize custom functions using the `auto_vectorize` helper:

```rust
// Define a custom function
let squared_sum = |values: Vec<f64>| -> f64 {
    values.iter().map(|x| x * x).sum()
};

// Create an auto-vectorized version
let vectorized_fn = auto_vectorize("squared_sum", squared_sum);

// Use the vectorized function
let result = vectorized_fn.execute(data);
```

### Performance Benefits

SIMD vectorization can provide significant performance improvements, especially for large arrays and compute-intensive operations. The exact speedup depends on:

- The operation being performed
- The size of the input data
- The CPU architecture and SIMD instruction set available
- The memory access patterns

Typical speedups range from 2x to 4x for basic operations like sum and mean.

## Parallel Execution

The JIT system supports parallel execution for improved performance on multi-core systems. This allows operations to process data in parallel using multiple CPU cores through Rayon.

### Parallel JIT Functions

Parallel JIT functions can be created using the following helper functions:

```rust
// Default configuration
let parallel_sum = parallel_sum_f64(None);
let parallel_mean = parallel_mean_f64(None);
let parallel_std = parallel_std_f64(None);
let parallel_min = parallel_min_f64(None);
let parallel_max = parallel_max_f64(None);

// Custom configuration
let config = ParallelConfig::new()
    .with_min_chunk_size(10000)   // Set minimum chunk size
    .with_max_threads(4)          // Set maximum number of threads
    .with_thread_local(true);     // Use thread-local storage

let parallel_sum_custom = parallel_sum_f64(Some(config));
```

### Custom Parallel Functions

You can create custom parallel functions using the map-reduce pattern:

```rust
// Define the sequential function
let squared_sum = |values: Vec<f64>| -> f64 {
    values.iter().map(|x| x * x).sum()
};

// Define map function (processes each chunk)
let map_fn = |chunk: &[f64]| -> f64 {
    chunk.iter().map(|x| x * x).sum()
};

// Define reduce function (combines results)
let reduce_fn = |results: Vec<f64>| -> f64 {
    results.iter().sum()
};

// Create the parallel function
let parallel_squared_sum = parallel_custom(
    "parallel_squared_sum",
    squared_sum,
    map_fn,
    reduce_fn,
    None  // Default configuration
);
```

### Performance Benefits

Parallel execution can provide significant performance improvements, especially for large datasets and compute-intensive operations. The benefits scale with:

- The number of CPU cores available
- The size of the input data
- The complexity of the operation
- The balance between computation and memory access

Typical speedups range from 2x to 8x on modern multi-core systems, depending on the operation and data size.

### Tuning Parallel Performance

The `ParallelConfig` object allows fine-tuning of parallel execution:

- `min_chunk_size`: Minimum size for each parallel chunk (smaller chunks = more parallelism but higher overhead)
- `max_threads`: Maximum number of threads to use (can be used to limit CPU usage)
- `use_thread_local`: Whether to use thread-local storage for intermediate results

```rust
// Example of tuning parallel performance
let config = ParallelConfig::new()
    .with_min_chunk_size(100000)  // Larger chunks for fewer thread switches
    .with_max_threads(6);         // Limit to 6 threads even if more cores available

let tuned_parallel_sum = parallel_sum_f64(Some(config));
```

## Limitations and Future Work

Current limitations:

- Limited to numeric types (f64, f32, i64, i32)
- Limited error handling for JIT compilation failures
- Basic SIMD and parallel support

Planned improvements:

- Support for more data types (bool, string, etc.)
- Advanced SIMD optimizations
- Improved error handling and debugging
- Integration with GPU acceleration
- Memory optimization for large datasets
- Integration with actual JIT compilers beyond Cranelift

## Comparison with Python's Numba

| Feature | pandrs JIT | Python Numba |
|---------|------------|--------------|
| Input Types | Vec<f64>, Vec<f32>, Vec<i64>, Vec<i32> | numpy.ndarray (any dtype) |
| Output Types | f64, f32, i64, i32 | Any Python type |
| Compilation Backend | Cranelift | LLVM |
| Parallel Execution | Supported | Supported |
| SIMD Vectorization | Supported | Supported |
| GPU Acceleration | Planned | Supported |
| Ease of Use | Simple decorator-like API | Python decorator |
| Fallback | Native Rust implementation | Python interpreter |
| Performance | Very fast | Very fast |