# Performance: oxits (Rust) vs pyts (Python)
## Methodology
Both libraries are benchmarked with identical methodology:
- **Runs**: 51 timed iterations after 5 warmup runs
- **Statistic**: P25 (25th percentile) — more robust than median for workloads with upward-skewed noise from thread scheduling
- **Data**: Seeded random normal data with matched dimensions
- **Build**: Rust release profile with `lto = "fat"`, `codegen-units = 1`, `target-cpu=native`
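The P25 selection over the 51 timed runs can be sketched as a nearest-rank percentile on the sorted samples. This is a minimal illustration; the exact rank-selection rule used by the benchmark harness is an assumption here:

```rust
/// Nearest-rank percentile: sort the samples and pick the entry
/// closest to the p-th fractional position.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((p / 100.0) * (samples.len() - 1) as f64).round() as usize;
    samples[idx]
}

fn main() {
    // 51 synthetic timings standing in for the measured iterations.
    let mut times: Vec<f64> = (1..=51).map(|i| i as f64).collect();
    println!("P25 = {}", percentile(&mut times, 25.0));
}
```

Taking a low percentile rather than the mean discards the upward-skewed tail that thread scheduling adds to individual runs.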
### System
| Component | Details |
|---|---|
| **CPU** | 13th Gen Intel Core i9-13900H (14 cores / 20 threads, up to 5.4 GHz) |
| **OS** | Manjaro Linux 6.18.8 |
| **Rust** | 1.93.0 (2026-01-19) |
| **Python** | 3.14.2 |
| **pyts** | 0.13.0 |
### Running the benchmarks
```bash
# Rust
cargo run --release --example benchmark --features decomposition
# Python
cd test_harness
pip install -r requirements.txt
python benchmark_pyts.py
# Comparison report
python compare_benchmarks.py
```
## Results
### Preprocessing
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| StandardScaler | 100 x 500 | 0.392 | 0.032 | **12.3x** |
| MinMaxScaler | 100 x 500 | 0.258 | 0.037 | **7.0x** |
| KBinsDiscretizer | 100 x 500 | 1.653 | 0.155 | **10.7x** |
### Approximation
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| PAA (output_size=50) | 100 x 500 | 0.022 | 0.024 | 0.9x |
| SAX (n_bins=4) | 100 x 500 | 0.748 | 0.095 | **7.9x** |
| DFT (n_coefs=20) | 100 x 500 | 0.410 | 0.056 | **7.4x** |
| SFA (fit+transform) | 100 x 500 | 0.828 | 0.330 | **2.5x** |
### Metrics (DTW)
| Algorithm | Series length | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| DTW classic | n=100 | 0.093 | 0.049 | **1.9x** |
| DTW classic | n=500 | 1.932 | 1.361 | **1.4x** |
| DTW classic | n=1000 | 9.831 | 5.528 | **1.8x** |
| DTW Sakoe-Chiba | n=500 | 0.532 | 0.339 | **1.6x** |
| DTW fast | n=500 | 6.201 | 0.181 | **34.4x** |
DTW classic has a hard performance floor due to the sequential O(nm) dynamic programming recurrence. The large DTW fast speedup comes from the multiscale approximation implementation.
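The recurrence behind that floor can be sketched as a two-row dynamic program with a squared-Euclidean local cost. This is a minimal illustration of the classic algorithm, not necessarily how oxits lays out its implementation:

```rust
/// Classic DTW distance: each cell D(i, j) depends on D(i-1, j),
/// D(i, j-1), and D(i-1, j-1), so rows must be filled sequentially.
/// Only two rows are kept, giving O(n*m) time and O(m) space.
fn dtw(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    let mut prev = vec![f64::INFINITY; m + 1];
    let mut curr = vec![f64::INFINITY; m + 1];
    prev[0] = 0.0;
    for i in 1..=n {
        curr[0] = f64::INFINITY;
        for j in 1..=m {
            let cost = (a[i - 1] - b[j - 1]).powi(2);
            curr[j] = cost + prev[j].min(prev[j - 1]).min(curr[j - 1]);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[m].sqrt()
}

fn main() {
    // Sequences that align perfectly have zero DTW distance.
    let a = [0.0, 1.0, 2.0];
    let b = [0.0, 1.0, 1.0, 2.0];
    println!("{}", dtw(&a, &b)); // → 0
}
```

The three-way `min` is exactly the dependency chain that defeats parallelization: cell (i, j) cannot start until (i, j-1) finishes.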
### Bag of Words
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| BagOfWords | 50 x 200 | 16.73 | 8.857 | **1.9x** |
### Image Transforms
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| GASF | 50 x 100 | 0.742 | 0.243 | **3.1x** |
| GADF | 50 x 100 | 0.735 | 0.254 | **2.9x** |
| MTF | 50 x 100 | 1.412 | 0.234 | **6.0x** |
| RecurrencePlot | 50 x 100 | 0.815 | 0.222 | **3.7x** |
### Decomposition
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| SSA (window=10) | 20 x 200 | 3.335 | 0.316 | **10.5x** |
### Transformation
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| BOSS (fit+transform) | 50 x 300 | 26.64 | 9.052 | **2.9x** |
| ROCKET (500 kernels) | 50 x 300 | 58.89 | 5.664 | **10.4x** |
| ShapeletTransform | 50 x 300 | 37,240 | 284 | **131.2x** |
| BagOfPatterns | 50 x 300 | 29.90 | 12.53 | **2.4x** |
### Classification
| Algorithm | Shape (train/test x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| KNN (k=3, Euclidean) | 50/20 x 200 | 1.651 | 0.098 | **16.8x** |
| BOSSVS | 50/20 x 200 | 26.19 | 7.968 | **3.3x** |
| TimeSeriesForest | 50/20 x 200 | 44.18 | 9.127 | **4.8x** |
SAXVSM, TSBF, and LearningShapelets are not available in pyts and have no comparison baseline.
## Summary
| Metric | Value |
|---|---|
| **Geometric mean speedup** | **5.0x** |
| Arithmetic mean speedup | 11.6x |
| Median speedup | 3.7x |
| Min speedup | 0.9x (PAA) |
| Max speedup | 131.2x (ShapeletTransform) |
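The headline figure is a geometric mean, the appropriate average for ratios: one outlier like 131.2x would dominate an arithmetic mean (as the 11.6x row shows) but contributes only its logarithm here. A sketch, using an illustrative subset of the speedups rather than the full set:

```rust
/// Geometric mean computed as exp(mean(ln(x))), which is
/// numerically safer than multiplying the ratios directly.
fn geo_mean(xs: &[f64]) -> f64 {
    (xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64).exp()
}

fn main() {
    // A few of the measured speedups, for illustration only.
    let speedups = [12.3, 0.9, 34.4, 131.2];
    println!("geometric mean = {:.1}x", geo_mean(&speedups));
}
```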
25 of the 28 benchmarked algorithms have pyts baselines; oxits is faster on all 25 or within measurement noise (PAA). The largest gains come from compute-intensive algorithms (ShapeletTransform, DTW fast, KNN, ROCKET, SSA), where Rust's compiled code, SIMD autovectorization, and Rayon parallelism compound.
### Where oxits is closest to pyts
- **PAA** (0.9x): At 22 microseconds, PAA is dominated by measurement overhead. Both implementations are near-instant.
- **DTW classic** (1.4-1.9x): The sequential DP recurrence prevents parallelization and limits SIMD gains.
- **BagOfWords** (1.9x): Most time is spent in the SFA pipeline and string hashing.
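PAA's near-parity is unsurprising given how little work the transform does: it reduces to one mean per output segment, so both implementations finish before per-call overhead matters. A minimal sketch, assuming the series length divides evenly by the output size:

```rust
/// Piecewise Aggregate Approximation: replace each contiguous
/// segment of the series with its mean. Assumes x.len() % k == 0.
fn paa(x: &[f64], k: usize) -> Vec<f64> {
    let seg = x.len() / k;
    x.chunks(seg)
        .map(|c| c.iter().sum::<f64>() / c.len() as f64)
        .collect()
}

fn main() {
    println!("{:?}", paa(&[1.0, 3.0, 2.0, 4.0, 10.0, 20.0], 3)); // → [2.0, 3.0, 15.0]
}
```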
### Where oxits pulls ahead most
- **ShapeletTransform** (131x): Brute-force O(n^2 * m * k) search benefits enormously from compiled code + Rayon.
- **DTW fast** (34x): The multiscale approximation uses coarsening + band projection; the Rust implementation avoids Python's recursive overhead.
- **KNN** (17x): Distance matrix computation is embarrassingly parallel and benefits from inlined DTW.
- **StandardScaler** (12x): Rayon parallelism over 100 samples + SIMD-friendly memory layout.
- **SSA** (11x): In-place Jacobi eigendecomposition (zero allocations per iteration), FFT plan reuse, and Rayon.
- **ROCKET** (10x): Random kernel generation and convolution is pure compute with no Python overhead.
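The "embarrassingly parallel" claim for KNN comes down to the distance matrix: every test/train pair is independent, so rows can be computed concurrently with no coordination. oxits uses Rayon for this; the sketch below uses only `std::thread::scope` to stay dependency-free, and is an illustration rather than the library's actual code:

```rust
/// Euclidean distance between two series of equal length.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Pairwise test-vs-train distance matrix, one scoped thread per
/// test sample. Each row is independent: no shared mutable state.
fn distance_matrix(test: &[Vec<f64>], train: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(test.len());
    std::thread::scope(|s| {
        let handles: Vec<_> = test
            .iter()
            .map(|t| s.spawn(move || train.iter().map(|tr| euclidean(t, tr)).collect()))
            .collect();
        for h in handles {
            rows.push(h.join().unwrap());
        }
    });
    rows
}

fn main() {
    let train = vec![vec![0.0, 0.0], vec![3.0, 4.0]];
    let test = vec![vec![0.0, 0.0]];
    println!("{:?}", distance_matrix(&test, &train)); // → [[0.0, 5.0]]
}
```

Spawning one thread per sample is wasteful at scale; a work-stealing pool such as Rayon's amortizes that cost, which is part of why the measured gap is so large.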