evlib 0.8.2 - Docs.rs

# DuckDB Integration with evlib

This document demonstrates how to use evlib's zero-copy Arrow integration with DuckDB for high-performance event data analysis.

## Overview

evlib provides seamless integration with the Apache Arrow ecosystem through zero-copy data transfer. This enables efficient interoperability with analytics engines like DuckDB, allowing you to perform complex SQL queries on event camera data without expensive data copies.

## Basic Usage

### Loading Events to Arrow Format

```python
import evlib
import polars as pl

# Load events as Polars DataFrame (Arrow integration coming soon)
events_df = evlib.load_events("data/slider_depth/events.txt")
# Convert to Arrow for DuckDB integration
events = events_df.collect().to_arrow()
```

### DuckDB Integration

```python
import evlib
import duckdb

# Load events and convert to Arrow format for DuckDB
events_df = evlib.load_events("data/slider_depth/events.txt")

# Register with DuckDB
con = duckdb.connect()
con.register("events", events_df.collect().to_arrow())

# Basic queries
print("=== Basic Event Statistics ===")

# Total event count
total_events = con.execute("SELECT COUNT(*) as total_events FROM events").fetchone()[0]
print(f"Total events: {total_events:,}")

# Event distribution by polarity
polarity_dist = con.execute("""
    SELECT polarity, COUNT(*) as count,
           ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
    FROM events
    GROUP BY polarity
    ORDER BY polarity
""").fetchall()

print(f"\\nPolarity distribution:")
for pol, count, pct in polarity_dist:
    print(f"  Polarity {pol}: {count:,} events ({pct}%)")

# Temporal statistics
temporal_stats = con.execute("""
    SELECT
        MIN(EXTRACT(microseconds FROM t)) as min_timestamp,
        MAX(EXTRACT(microseconds FROM t)) as max_timestamp,
        AVG(EXTRACT(microseconds FROM t)) as avg_timestamp,
        STDDEV(EXTRACT(microseconds FROM t)) as std_timestamp
    FROM events
""").fetchone()

print(f"\\nTemporal statistics:")
print(f"  Time range: {temporal_stats[0]:,.0f} to {temporal_stats[1]:,.0f} microseconds")
print(f"  Average time: {temporal_stats[2]:,.0f} microseconds")
print(f"  Standard deviation: {temporal_stats[3]:,.0f} microseconds")

# Spatial statistics
spatial_stats = con.execute("""
    SELECT
        MIN(x) as min_x, MAX(x) as max_x,
        MIN(y) as min_y, MAX(y) as max_y,
        COUNT(DISTINCT x) as unique_x_coords,
        COUNT(DISTINCT y) as unique_y_coords
    FROM events
""").fetchone()

print(f"\\nSpatial statistics:")
print(f"  X range: {spatial_stats[0]} to {spatial_stats[1]} ({spatial_stats[4]} unique values)")
print(f"  Y range: {spatial_stats[2]} to {spatial_stats[3]} ({spatial_stats[5]} unique values)")
```

## Advanced Analytics

### Time-based Analysis

```python
# Events per second
import duckdb
con = duckdb.connect()
events_df = evlib.load_events("data/slider_depth/events.txt")
con.register("events", events_df.collect().to_arrow())

events_per_second = con.execute("""
    SELECT
        EXTRACT(microseconds FROM t) // 1000000 as second,  -- Convert microseconds to seconds
        COUNT(*) as events_count
    FROM events
    GROUP BY EXTRACT(microseconds FROM t) // 1000000
    ORDER BY second
    LIMIT 10
""").fetchall()

print("Events per second (first 10 seconds):")
for second, count in events_per_second:
    print(f"  Second {second}: {count:,} events")
```

### Spatial Density Analysis

```python
# Spatial density heatmap data
import duckdb
con = duckdb.connect()
events_df = evlib.load_events("data/slider_depth/events.txt")
con.register("events", events_df.collect().to_arrow())

density_map = con.execute("""
    SELECT
        x // 10 * 10 as x_bin,  -- 10x10 pixel bins
        y // 10 * 10 as y_bin,
        COUNT(*) as event_density
    FROM events
    GROUP BY x_bin, y_bin
    HAVING event_density > 100  -- Filter low-density areas
    ORDER BY event_density DESC
    LIMIT 20
""").fetchall()

print("\\nTop 20 highest density regions (10x10 pixel bins):")
for x_bin, y_bin, density in density_map:
    print(f"  Region ({x_bin},{y_bin}): {density:,} events")
```

### Performance Comparison

The zero-copy Arrow integration provides significant performance benefits:

- **Memory efficiency**: No data copying between evlib and DuckDB
- **Query performance**: DuckDB's columnar engine optimized for Arrow data
- **Ecosystem compatibility**: Works seamlessly with Polars, Pandas, and other Arrow-compatible tools

### Alternative Ecosystems

The same Arrow data can be used with other analytics engines:

```python
# With Polars (already loaded as DataFrame)
import polars as pl
events_df = evlib.load_events("data/slider_depth/events.txt")
result = events_df.group_by("polarity").agg(pl.len())

# With Pandas (includes copy overhead)
import pandas as pd
df = events_df.collect().to_pandas()
result = df.groupby("polarity").size()
```

## Supported File Formats

evlib's Arrow integration supports all major event camera formats:

- **HDF5**: Prophesee, iniVation datasets
- **EVT2/EVT3**: Prophesee RAW format
- **AEDAT**: iniVation format (v1.0, v2.0, v3.1, v4.0)
- **AER**: Address-Event Representation
- **Text**: Custom CSV-like formats

## Performance Notes

- Arrow integration is enabled with the `arrow` feature flag
- Zero-copy transfer requires compatible data layouts
- For maximum performance, use file formats that align with Arrow's columnar structure
- Large datasets benefit most from the zero-copy approach

## Dependencies

To use DuckDB integration:

```bash
pip install evlib[polars] duckdb
```

Or for development:

```bash
maturin develop --features polars
pip install duckdb
```