# Performance: oxits (Rust) vs pyts (Python)
## Methodology
Both libraries are benchmarked with identical methodology:
- **Runs**: 51 timed iterations after 5 warmup runs
- **Statistic**: P25 (25th percentile) — more robust than median for workloads with upward-skewed noise from thread scheduling
- **Data**: Seeded random normal data with matched dimensions
- **Build**: Rust release profile with `lto = "fat"`, `codegen-units = 1`, `target-cpu=native`
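The P25 selection over the 51 timed runs can be sketched as a nearest-rank percentile on the sorted samples. This is a minimal illustration; the exact rank-selection rule used by the benchmark harness is an assumption here:

```rust
/// Nearest-rank percentile: sort the samples and pick the entry
/// closest to the p-th fractional position.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((p / 100.0) * (samples.len() - 1) as f64).round() as usize;
    samples[idx]
}

fn main() {
    // 51 synthetic timings standing in for the measured iterations.
    let mut times: Vec<f64> = (1..=51).map(|i| i as f64).collect();
    println!("P25 = {}", percentile(&mut times, 25.0));
}
```

Taking a low percentile rather than the mean discards the upward-skewed tail that thread scheduling adds to individual runs.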
### System
| Component | Details |
|---|---|
| **CPU** | 13th Gen Intel Core i9-13900H (14 cores / 20 threads, up to 5.4 GHz) |
| **OS** | Manjaro Linux 6.18.8 |
| **Rust** | 1.93.0 (2026-01-19) |
| **Python** | 3.14.2 |
| **pyts** | 0.13.0 |
### Running the benchmarks
```bash
# Rust
cargo run --release --example benchmark --features decomposition
# Python
cd test_harness
pip install -r requirements.txt
python benchmark_pyts.py
# Comparison report
python compare_benchmarks.py
```
## Results
### Preprocessing
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| StandardScaler | 100 x 500 | 0.392 | 0.032 | **12.3x** |
| MinMaxScaler | 100 x 500 | 0.258 | 0.037 | **7.0x** |
| KBinsDiscretizer | 100 x 500 | 1.653 | 0.155 | **10.7x** |
### Approximation
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| PAA (output_size=50) | 100 x 500 | 0.022 | 0.024 | 0.9x |
| SAX (n_bins=4) | 100 x 500 | 0.748 | 0.095 | **7.9x** |
| DFT (n_coefs=20) | 100 x 500 | 0.410 | 0.056 | **7.4x** |
| SFA (fit+transform) | 100 x 500 | 0.828 | 0.330 | **2.5x** |
### Metrics (DTW)
| Algorithm | Series length | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| DTW classic | n=100 | 0.093 | 0.049 | **1.9x** |
| DTW classic | n=500 | 1.932 | 1.361 | **1.4x** |
| DTW classic | n=1000 | 9.831 | 5.528 | **1.8x** |
| DTW Sakoe-Chiba | n=500 | 0.532 | 0.339 | **1.6x** |
| DTW fast | n=500 | 6.201 | 0.181 | **34.4x** |
DTW classic has a hard performance floor due to the sequential O(nm) dynamic programming recurrence. The large DTW fast speedup comes from the multiscale approximation implementation.
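The recurrence behind that floor can be sketched as a two-row dynamic program with a squared-Euclidean local cost. This is a minimal illustration of the classic algorithm, not necessarily how oxits lays out its implementation:

```rust
/// Classic DTW distance: each cell D(i, j) depends on D(i-1, j),
/// D(i, j-1), and D(i-1, j-1), so rows must be filled sequentially.
/// Only two rows are kept, giving O(n*m) time and O(m) space.
fn dtw(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    let mut prev = vec![f64::INFINITY; m + 1];
    let mut curr = vec![f64::INFINITY; m + 1];
    prev[0] = 0.0;
    for i in 1..=n {
        curr[0] = f64::INFINITY;
        for j in 1..=m {
            let cost = (a[i - 1] - b[j - 1]).powi(2);
            curr[j] = cost + prev[j].min(prev[j - 1]).min(curr[j - 1]);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[m].sqrt()
}

fn main() {
    // Sequences that align perfectly have zero DTW distance.
    let a = [0.0, 1.0, 2.0];
    let b = [0.0, 1.0, 1.0, 2.0];
    println!("{}", dtw(&a, &b)); // → 0
}
```

The three-way `min` is exactly the dependency chain that defeats parallelization: cell (i, j) cannot start until (i, j-1) finishes.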
### Bag of Words
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| BagOfWords | 50 x 200 | 16.73 | 8.857 | **1.9x** |
### Image Transforms
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| GASF | 50 x 100 | 0.742 | 0.243 | **3.1x** |
| GADF | 50 x 100 | 0.735 | 0.254 | **2.9x** |
| MTF | 50 x 100 | 1.412 | 0.234 | **6.0x** |
| RecurrencePlot | 50 x 100 | 0.815 | 0.222 | **3.7x** |
### Decomposition
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| SSA (window=10) | 20 x 200 | 3.335 | 0.316 | **10.5x** |
### Transformation
| Algorithm | Shape (samples x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| BOSS (fit+transform) | 50 x 300 | 26.64 | 9.052 | **2.9x** |
| ROCKET (500 kernels) | 50 x 300 | 58.89 | 5.664 | **10.4x** |
| ShapeletTransform | 50 x 300 | 37,240 | 284 | **131.2x** |
| BagOfPatterns | 50 x 300 | 29.90 | 12.53 | **2.4x** |
### Classification
| Algorithm | Shape (train/test x length) | pyts (ms) | oxits (ms) | Speedup |
|---|---|---|---|---|
| KNN (k=3, Euclidean) | 50/20 x 200 | 1.651 | 0.098 | **16.8x** |
| BOSSVS | 50/20 x 200 | 26.19 | 7.968 | **3.3x** |
| TimeSeriesForest | 50/20 x 200 | 44.18 | 9.127 | **4.8x** |
SAXVSM, TSBF, and LearningShapelets are not available in pyts and have no comparison baseline.
## Summary
| Metric | Value |
|---|---|
| **Geometric mean speedup** | **5.0x** |
| Arithmetic mean speedup | 11.6x |
| Median speedup | 3.7x |
| Min speedup | 0.9x (PAA) |
| Max speedup | 131.2x (ShapeletTransform) |
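The headline figure is a geometric mean, the appropriate average for ratios: one outlier like 131.2x would dominate an arithmetic mean (as the 11.6x row shows) but contributes only its logarithm here. A sketch, using an illustrative subset of the speedups rather than the full set:

```rust
/// Geometric mean computed as exp(mean(ln(x))), which is
/// numerically safer than multiplying the ratios directly.
fn geo_mean(xs: &[f64]) -> f64 {
    (xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64).exp()
}

fn main() {
    // A few of the measured speedups, for illustration only.
    let speedups = [12.3, 0.9, 34.4, 131.2];
    println!("geometric mean = {:.1}x", geo_mean(&speedups));
}
```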
25 of the 28 benchmarked algorithms have pyts baselines; oxits is faster on all 25 or within measurement noise (PAA). The largest gains come from compute-intensive algorithms (ShapeletTransform, DTW fast, KNN, ROCKET, SSA), where Rust's compiled code, SIMD autovectorization, and Rayon parallelism compound.
### Where oxits is closest to pyts
- **PAA** (0.9x): At 22 microseconds, PAA is dominated by measurement overhead. Both implementations are near-instant.
- **DTW classic** (1.4-1.9x): The sequential DP recurrence prevents parallelization and limits SIMD gains.
- **BagOfWords** (1.9x): Most time is spent in the SFA pipeline and string hashing.
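PAA's near-parity is unsurprising given how little work the transform does: it reduces to one mean per output segment, so both implementations finish before per-call overhead matters. A minimal sketch, assuming the series length divides evenly by the output size:

```rust
/// Piecewise Aggregate Approximation: replace each contiguous
/// segment of the series with its mean. Assumes x.len() % k == 0.
fn paa(x: &[f64], k: usize) -> Vec<f64> {
    let seg = x.len() / k;
    x.chunks(seg)
        .map(|c| c.iter().sum::<f64>() / c.len() as f64)
        .collect()
}

fn main() {
    println!("{:?}", paa(&[1.0, 3.0, 2.0, 4.0, 10.0, 20.0], 3)); // → [2.0, 3.0, 15.0]
}
```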
### Where oxits pulls ahead most
- **ShapeletTransform** (131x): Brute-force O(n^2 * m * k) search benefits enormously from compiled code + Rayon.
- **DTW fast** (34x): The multiscale approximation uses coarsening + band projection; the Rust implementation avoids Python's recursive overhead.
- **KNN** (17x): Distance matrix computation is embarrassingly parallel and benefits from inlined DTW.
- **StandardScaler** (12x): Rayon parallelism over 100 samples + SIMD-friendly memory layout.
- **SSA** (11x): In-place Jacobi eigendecomposition (zero allocations per iteration), FFT plan reuse, and Rayon.
- **ROCKET** (10x): Random kernel generation and convolution is pure compute with no Python overhead.
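The "embarrassingly parallel" claim for KNN comes down to the distance matrix: every test/train pair is independent, so rows can be computed concurrently with no coordination. oxits uses Rayon for this; the sketch below uses only `std::thread::scope` to stay dependency-free, and is an illustration rather than the library's actual code:

```rust
/// Euclidean distance between two series of equal length.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Pairwise test-vs-train distance matrix, one scoped thread per
/// test sample. Each row is independent: no shared mutable state.
fn distance_matrix(test: &[Vec<f64>], train: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(test.len());
    std::thread::scope(|s| {
        let handles: Vec<_> = test
            .iter()
            .map(|t| s.spawn(move || train.iter().map(|tr| euclidean(t, tr)).collect()))
            .collect();
        for h in handles {
            rows.push(h.join().unwrap());
        }
    });
    rows
}

fn main() {
    let train = vec![vec![0.0, 0.0], vec![3.0, 4.0]];
    let test = vec![vec![0.0, 0.0]];
    println!("{:?}", distance_matrix(&test, &train)); // → [[0.0, 5.0]]
}
```

Spawning one thread per sample is wasteful at scale; a work-stealing pool such as Rayon's amortizes that cost, which is part of why the measured gap is so large.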