evlib 0.8.6 - Docs.rs

# Zero-Copy Architecture: Technical Deep Dive

## Overview

evlib achieves exceptional performance through a "zero-copy" architecture that eliminates intermediate data structure copies and leverages Apache Arrow's columnar memory format. This document explains the technical implementation and performance benefits.

## What is "Zero-Copy" in evlib Context

The term "zero-copy" in evlib refers to **eliminating intermediate data structure copies**, not true zero-copy from disk to final format. We achieve this through direct construction of the final data format in a single pass.

### Before: Multi-Copy Architecture

```rust
// OLD APPROACH - Multiple copies and conversions
let events: Vec<Event> = load_from_file()?;           // Copy 1: File → Event structs
let arrays = events_to_numpy_arrays(&events)?;       // Copy 2: Event → NumPy arrays
let dict = numpy_to_python_dict(arrays)?;            // Copy 3: NumPy → Python dict
let dataframe = polars_from_dict(dict)?;             // Copy 4: Dict → Polars DataFrame
```

**Problems:**
- 4 separate memory allocations
- Intermediate Python objects
- Type conversions at each step
- Peak memory = 4x final data size

### After: Direct Construction Architecture

```rust
// NEW APPROACH - Single iteration, direct construction
fn build_polars_dataframe(events: &[Event], format: EventFormat) -> Result<DataFrame, PolarsError> {
    let len = events.len();

    // Pre-allocate builders with exact capacity
    let mut x_builder = PrimitiveChunkedBuilder::<Int16Type>::new("x", len);
    let mut y_builder = PrimitiveChunkedBuilder::<Int16Type>::new("y", len);
    let mut timestamp_builder = PrimitiveChunkedBuilder::<Int64Type>::new("timestamp", len);
    let mut polarity_builder = PrimitiveChunkedBuilder::<Int8Type>::new("polarity", len);

    // SINGLE ITERATION - Direct population, no intermediate structures
    for event in events {
        x_builder.append_value(event.x as i16);
        y_builder.append_value(event.y as i16);
        timestamp_builder.append_value(convert_timestamp(event.t));
        polarity_builder.append_value(convert_polarity(event.polarity, &format));
    }

    // Build final DataFrame directly from builders
    DataFrame::new(vec![
        x_builder.finish().into_series(),
        y_builder.finish().into_series(),
        timestamp_builder.finish().into_series().cast(&DataType::Duration(TimeUnit::Microseconds))?,
        polarity_builder.finish().into_series(),
    ])
}
```

## Apache Arrow: The Foundation Technology

### Why Arrow Matters

Polars uses Apache Arrow as its foundational columnar memory format, which enables our performance optimizations:

```rust
// Under the hood, Polars Series are Arrow Arrays
pub struct Series {
    inner: Arc<dyn Array>,  // This is an Arrow Array!
}

// Our builders create Arrow arrays directly
let mut builder = PrimitiveChunkedBuilder::<Int16Type>::new("x", len);
// This becomes an Arrow PrimitiveArray<Int16Type>
```

### Columnar Memory Layout

```
Traditional Row Format (what we avoided):
[x1,y1,t1,p1][x2,y2,t2,p2][x3,y3,t3,p3]...

Arrow Columnar Format (what we build directly):
X Column: [x1,x2,x3,x4,x5,...]  <- Contiguous memory
Y Column: [y1,y2,y3,y4,y5,...]  <- Contiguous memory
T Column: [t1,t2,t3,t4,t5,...]  <- Contiguous memory
P Column: [p1,p2,p3,p4,p5,...]  <- Contiguous memory
```

### Arrow Memory Efficiency

```
Arrow Array Structure:
┌─────────────┬──────────────┬─────────────┐
│   Metadata  │  Null Bitmap │    Data     │
│   (bytes)   │   (bits)     │   (typed)   │
└─────────────┴──────────────┴─────────────┘

For 1M Int16 values:
- Metadata: ~100 bytes
- Null bitmap: 125KB (1 bit per value)
- Data: 2MB (2 bytes × 1M values)
- Total: ~2.125MB = ~2.2 bytes per value overhead
```

## Key Technologies and Optimizations

### 1. Polars Series Builders

```rust
// Direct memory management without intermediate allocations
let mut builder = PrimitiveChunkedBuilder::<Int16Type>::new("x", capacity);
for value in data {
    builder.append_value(value);  // Direct write to pre-allocated buffer
}
let series = builder.finish();  // Zero-copy conversion to Series
```

**Technology**: Polars `ChunkedBuilder` API allows direct construction of columnar data structures.

### 2. Memory Pre-allocation

```rust
// We know exact size upfront - no reallocations
let len = events.len();  // Known from file parsing
let mut builder = PrimitiveChunkedBuilder::<Int16Type>::new("x", len);  // Pre-allocate exact size
```

**Technology**: Rust's memory management + knowing exact event count allows single allocation.

### 3. Optimal Data Types

```rust
// Memory-efficient types chosen specifically
Int16Type  // x, y coordinates (was Int64 - 4x smaller)
Int8Type   // polarity (was Int64 - 8x smaller)
Int64Type  // timestamp (appropriate size)
```

**Technology**: Polars typed builders allow choosing optimal memory layout.

### 4. Single-Pass Processing

```rust
// ONE iteration over data, populate ALL columns simultaneously
for event in events {
    x_builder.append_value(event.x as i16);      // Direct write
    y_builder.append_value(event.y as i16);      // Direct write
    timestamp_builder.append_value(event.t);     // Direct write
    polarity_builder.append_value(event.p);      // Direct write
}
```

**Technology**: Columnar processing - build all columns in parallel, single iteration.

## Performance Impact

### Memory Efficiency Breakdown

```rust
// Old approach (all Int64)
struct EventOld {
    x: i64,        // 8 bytes
    y: i64,        // 8 bytes
    t: i64,        // 8 bytes
    p: i64,        // 8 bytes
}               // Total: 32 bytes per event

// New approach (optimized types)
struct EventNew {
    x: i16,        // 2 bytes
    y: i16,        // 2 bytes
    t: i64,        // 8 bytes (timestamp needs precision)
    p: i8,         // 1 byte
}               // Total: 13 bytes per event (60% reduction)
```

### Memory Layout Optimization

```
OLD: Event → NumPy → Dict → Polars
     [32B]   [32B]   [64B]  [32B] = 160 bytes/event peak

NEW: Event → Polars (direct)
     [32B]   [13B] = 45 bytes/event peak (3.5x improvement)
```

### CPU Cache Efficiency

```
Arrow Columnar (Cache-Friendly):
When filtering by polarity, only touch polarity column:
[p1][p2][p3][p4]... <- Sequential access, stays in CPU cache

Row Format (Cache-Unfriendly):
[x1,y1,t1,p1][x2,y2,t2,p2]... <- Skip x,y,t to get p, cache misses
```

## Arrow Ecosystem Compatibility

### Zero-Copy Between Arrow Systems

```python
# Your Polars DataFrame can zero-copy to other systems through Arrow format
import polars as pl
import pandas as pd

df = evlib.load_events("data/slider_depth/events.txt").collect()

# Efficient conversions through Arrow format (requires pyarrow)
try:
    arrow_table = df.to_arrow()         # Zero-copy Polars → PyArrow
    pandas_df = arrow_table.to_pandas() # Zero-copy PyArrow → Pandas
    print("Arrow conversion successful")
except ImportError:
    print("PyArrow not installed, converting to numpy arrays instead")
    # Convert to numpy arrays as an alternative
    x_array = df['x'].to_numpy()
    y_array = df['y'].to_numpy()
    t_array = df['timestamp'].dt.total_seconds().to_numpy()
    p_array = df['polarity'].to_numpy()
    print(f"Converted to numpy arrays: {len(x_array)} events")
```

### SIMD Vectorization

```rust
// Arrow enables SIMD operations on contiguous data
let polarity_mask = polarity_array.eq_scalar(1);  // Vectorized comparison
let filtered = x_array.filter(&polarity_mask);    // Vectorized filtering
```

## Performance Results

### Achieved Metrics

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Memory per event** | ~200+ bytes | 35.8 bytes | **5.6x reduction** |
| **Loading speed** | ~600k events/s | 2.18M events/s | **3.6x faster** |
| **Filter speed** | ~50M events/s | 463M events/s | **9.3x faster** |

### Why We Achieve 35 bytes/event

```
Arrow overhead per column:
- Array metadata: ~100 bytes
- Null bitmap: len/8 bytes
- Data buffer: len × sizeof(type)

For 1M events with 4 columns:
- Metadata: 4 × 100 = 400 bytes ≈ 0 bytes/event
- Null bitmaps: 4 × 125KB = 500KB ≈ 0.5 bytes/event
- Data: 2+2+8+1 = 13 bytes/event
- Arrow overhead: ~0.5 bytes/event
- Total: ~13.5 bytes/event for pure data

Our measured 35 bytes/event includes:
- Arrow data: ~13.5 bytes
- Rust Vec overhead: ~8 bytes
- Python object overhead: ~10 bytes
- Memory fragmentation: ~3.5 bytes
```

## Implementation Details

### Complete Arrow Pipeline

```rust
// 1. Parse file into Event structs (unavoidable copy from disk)
let events: Vec<Event> = parse_file()?;

// 2. Build Arrow arrays directly via Polars builders
let mut x_builder = PrimitiveChunkedBuilder::<Int16Type>::new("x", len);
// Under the hood: creates Arrow PrimitiveArrayBuilder<Int16Type>

// 3. Single iteration populates Arrow buffers
for event in events {
    x_builder.append_value(event.x as i16);  // Direct write to Arrow buffer
}

// 4. Finish creates Arrow Array wrapped in Polars Series
let x_series = x_builder.finish();  // Arrow Array + Polars metadata

// 5. DataFrame is collection of Arrow Arrays
let df = DataFrame::new(vec![x_series, y_series, t_series, p_series])?;
```

### PyO3 Integration

```rust
// Return DataFrame directly to Python, no dict conversion
#[pyfunction]
pub fn load_events_py(file_path: &str) -> PyResult<PyObject> {
    let events = load_events(file_path)?;
    let df = build_polars_dataframe(&events, format)?;  // Direct DataFrame

    // Convert to Python LazyFrame directly
    let py_dict = df.lazy().to_python_dict()?;  // Single conversion step
    Ok(py_dict)
}
```

## Conclusion

The "zero-copy" architecture in evlib leverages Apache Arrow's columnar memory format to:

1. **Eliminate intermediate copies** through direct construction
2. **Optimize memory layout** with appropriate data types
3. **Enable vectorized operations** through contiguous memory
4. **Provide ecosystem compatibility** with Arrow-based tools
5. **Achieve exceptional performance** with minimal memory overhead

This architecture provides the foundation for evlib's industry-leading performance while maintaining full API compatibility and ease of use.

## Further Reading

- [Apache Arrow Documentation](https://arrow.apache.org/docs/)
- [Polars Architecture](https://pola-rs.github.io/polars-book/user-guide/concepts/data-types/)
- [evlib Performance Benchmarks](../examples/benchmarks.md)
- [Memory Optimization Guide](../getting-started/performance.md)