# oxits

[![CI](https://github.com/sipemu/oxits-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/sipemu/oxits-rs/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/oxits.svg)](https://crates.io/crates/oxits)
[![Documentation](https://docs.rs/oxits/badge.svg)](https://docs.rs/oxits)
[![Coverage](https://img.shields.io/badge/coverage-96.9%25-brightgreen)](https://github.com/sipemu/oxits-rs/actions/workflows/ci.yml)
[![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)

A high-performance time series classification and transformation library for Rust, validated against [pyts](https://github.com/johannfaouzi/pyts).

## Features

### Preprocessing
- **StandardScaler** — zero-mean, unit-variance normalization
- **MinMaxScaler** — scale to arbitrary range
- **MaxAbsScaler** — scale by maximum absolute value
- **RobustScaler** — median/IQR-based scaling
- **KBinsDiscretizer** — binning with normal, uniform, and quantile strategies
- **PowerTransform** — Box-Cox and Yeo-Johnson transforms
- **QuantileTransform** — uniform or normal output distribution
- **Imputer** — fill NaN values (nearest, previous, next, linear)
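
As a rough illustration of what the scalers compute, here is a standalone z-normalization sketch (the operation behind `StandardScaler`) in plain Rust. The function name and signature are illustrative only, not the crate's API:

```rust
/// Z-normalize a series: subtract the mean, divide by the
/// population standard deviation. Guards against constant
/// series by flooring the standard deviation at machine epsilon.
fn z_normalize(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(f64::EPSILON);
    x.iter().map(|v| (v - mean) / std).collect()
}
```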

### Approximation
- **PAA** — Piecewise Aggregate Approximation
- **SAX** — Symbolic Aggregate Approximation
- **DFT** — Discrete Fourier Transform coefficients
- **SFA** — Symbolic Fourier Approximation (DFT → MCB discretization, ANOVA feature selection)
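
The core idea of PAA is simple enough to sketch standalone: split the series into equal-width windows and keep one mean per window. This is an illustrative sketch (it assumes the length divides evenly), not the crate's implementation:

```rust
/// Piecewise Aggregate Approximation: reduce a series to
/// `segments` values, each the mean of one equal-width window.
/// Assumes the series length is divisible by `segments`.
fn paa(x: &[f64], segments: usize) -> Vec<f64> {
    let win = x.len() / segments;
    x.chunks(win)
        .map(|c| c.iter().sum::<f64>() / c.len() as f64)
        .collect()
}
```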

### Metrics
- **DTW** — Dynamic Time Warping (classic, Sakoe-Chiba band, Itakura parallelogram, multiscale, fast)
- **Lower bounds** — LB_Kim, LB_Keogh, LB_Improved, LB_Yi
- **BOSS metric** — Euclidean-style distance between BOSS word histograms
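
A minimal standalone sketch of the classic DTW dynamic program, for orientation only (the crate's `dtw_classic` may differ in details; taking the square root of the accumulated squared distances at the end is one common convention, also used by pyts):

```rust
/// Classic DTW: O(n*m) dynamic program over squared point
/// distances, returning the square root of the optimal
/// accumulated warping cost.
fn dtw_classic(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    for i in 1..=n {
        for j in 1..=m {
            let d = (a[i - 1] - b[j - 1]).powi(2);
            // Best predecessor: insertion, deletion, or match.
            let best = cost[i - 1][j].min(cost[i][j - 1]).min(cost[i - 1][j - 1]);
            cost[i][j] = d + best;
        }
    }
    cost[n][m].sqrt()
}
```

The banded variants (Sakoe-Chiba, Itakura) restrict which `(i, j)` cells the inner loop visits, which is where most of the speedup comes from.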

### Image Transforms
- **GASF** — Gramian Angular Summation Field
- **GADF** — Gramian Angular Difference Field
- **MTF** — Markov Transition Field
- **RecurrencePlot** — recurrence plot with time-delay embedding
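
The GASF construction can be sketched standalone: min-max rescale the series to [-1, 1], interpret each value as the cosine of an angle, and form `G[i][j] = cos(phi_i + phi_j)`. This is an illustrative sketch (it assumes a non-constant series), not the crate's `Gaf` implementation:

```rust
/// GASF sketch: rescale to [-1, 1], treat each value as cos(phi),
/// and use the identity
///   cos(phi_i + phi_j) = x_i*x_j - sqrt(1 - x_i^2)*sqrt(1 - x_j^2).
fn gasf(x: &[f64]) -> Vec<Vec<f64>> {
    let (min, max) = x
        .iter()
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &v| {
            (lo.min(v), hi.max(v))
        });
    // Assumes max > min (non-constant series).
    let scaled: Vec<f64> = x.iter().map(|&v| 2.0 * (v - min) / (max - min) - 1.0).collect();
    scaled
        .iter()
        .map(|&xi| {
            let si = (1.0 - xi * xi).max(0.0).sqrt(); // sin(phi_i)
            scaled
                .iter()
                .map(|&xj| xi * xj - si * (1.0 - xj * xj).max(0.0).sqrt())
                .collect()
        })
        .collect()
}
```

GADF is the same construction with `sin(phi_i - phi_j)` in place of `cos(phi_i + phi_j)`.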

### Decomposition
- **SSA** — Singular Spectrum Analysis with automatic trend/seasonal/residual grouping

### Transformation
- **BOSS** — Bag of SFA Symbols
- **ROCKET** — Random Convolutional Kernel Transform
- **BagOfPatterns** — sliding-window SAX bag of words with TF-IDF
- **ShapeletTransform** — shapelet-based feature extraction
- **WEASEL** — Word ExtrAction for time SEries cLassification

### Classification
- **KNN** — k-nearest neighbors with pluggable distance metrics
- **BOSSVS** — BOSS in Vector Space (TF-IDF cosine similarity)
- **SAXVSM** — SAX-VSM classifier
- **TimeSeriesForest** — interval-based random forest
- **TSBF** — Time Series Bag of Features
- **LearningShapelets** — gradient-descent shapelet learning
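
The "pluggable distance metric" idea behind the KNN classifier can be sketched with a 1-nearest-neighbour function that takes any distance function as a parameter. The name `knn1` and its signature are hypothetical, for illustration only:

```rust
/// 1-NN classification sketch with a pluggable distance function:
/// return the label of the training series closest to the query.
fn knn1<'a>(
    train: &'a [(Vec<f64>, &'a str)],
    query: &[f64],
    dist: impl Fn(&[f64], &[f64]) -> f64,
) -> &'a str {
    train
        .iter()
        .min_by(|a, b| {
            dist(&a.0, query)
                .partial_cmp(&dist(&b.0, query))
                .expect("distances must be comparable")
        })
        .map(|(_, label)| *label)
        .expect("training set must be non-empty")
}
```

Swapping in a DTW function instead of Euclidean distance turns this into the classic 1-NN-DTW baseline.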

### Multivariate
- **JointRecurrencePlot** — joint recurrence plots for multivariate time series
- **Multivariate wrapper** — apply univariate transforms/classifiers per channel

### Datasets
- **UCR Archive** — fetch datasets from the UCR Time Series Archive
- **Synthetic generators** — Cylinder-Bell-Funnel (CBF) dataset
- **Built-in** — GunPoint and Coffee synthetic datasets

### Infrastructure
- **Parallel computation** — feature-gated Rayon parallelism across all modules
- **Core traits** — `Transformer`, `FittableTransformer`, `Classifier`, `DistanceMetric`
- **SIMD autovectorization** — AVX2 on x86-64 via `target-cpu=native`

## Performance

See [PERFORMANCE.md](PERFORMANCE.md) for detailed benchmark tables against pyts across all algorithms.

Benchmarked on an Intel i9-13900H (25th percentile of 51 runs, after 5 warmup runs):

| Algorithm | Speedup | | Algorithm | Speedup |
|-----------|--------:|-|-----------|--------:|
| StandardScaler | **12.3x** | | GASF | **3.1x** |
| MinMaxScaler | **7.0x** | | MTF | **6.0x** |
| KBinsDiscretizer | **10.7x** | | RecurrencePlot | **3.7x** |
| SAX | **7.9x** | | SSA | **10.5x** |
| DFT | **7.4x** | | BOSS | **2.9x** |
| DTW fast | **34.4x** | | ROCKET | **10.4x** |
| KNN | **16.8x** | | ShapeletTransform | **131.2x** |
| BOSSVS | **3.3x** | | TimeSeriesForest | **4.8x** |
| **Geometric mean** | **5.0x** | | **Median** | **3.7x** |

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
oxits = "0.1"
```

## Quick Start

### Stateless Transform (StandardScaler)

```rust
use oxits::preprocessing::scaler::{StandardScaler, StandardScalerConfig};
use oxits::Transformer;

let config = StandardScalerConfig::new();
let x = vec![vec![1.0, 2.0, 3.0, 4.0, 5.0]];
let scaled = StandardScaler::transform(&config, &x);
```

### Stateful Transform (SFA)

```rust
use oxits::approximation::sfa::{Sfa, SfaConfig};
use oxits::FittableTransformer;

let config = SfaConfig { n_coefs: Some(4), n_bins: 4, ..SfaConfig::new() };
let x = vec![
    vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    vec![7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
];
let fitted = Sfa::fit(&config, &x, None);
let result = Sfa::transform(&fitted, &x);
```

### Classification (BOSSVS)

```rust
use oxits::classification::bossvs::{Bossvs, BossvsConfig};

let config = BossvsConfig::new(4);
let x_train = vec![
    vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    vec![7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
];
let y_train = vec!["A".to_string(), "B".to_string()];

let fitted = Bossvs::fit(&config, &x_train, &y_train);
let predictions = Bossvs::predict(&fitted, &x_train);
```

### Distance Metrics (DTW)

```rust
use oxits::metrics::dtw::dtw_classic;

let a = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let b = vec![1.0, 2.5, 3.5, 4.0, 5.0];
let distance = dtw_classic(&a, &b);
```

### Image Transform (GASF)

```rust
use oxits::image::gaf::{Gaf, GafConfig};
use oxits::{GafMethod, Transformer};

let config = GafConfig { method: GafMethod::Summation, image_size: None };
let x = vec![vec![0.0, 1.0, 2.0, 3.0, 4.0]];
let images = Gaf::transform(&config, &x);
// images[0] is a 5x5 Gramian Angular Summation Field
```

## Cargo Features

| Feature | Default | Description |
|---------|---------|-------------|
| `parallel` | yes | Parallel computation via Rayon |
| `decomposition` | no | SSA with nalgebra SVD |
| `datasets` | no | UCR Archive fetching via ureq |
| `validation` | no | Golden data tests via serde |

```bash
# Default (parallel)
cargo build --release

# All features
cargo build --release --all-features

# No parallelism
cargo build --release --no-default-features
```

## Building

```bash
cargo build --release
cargo test --all-features
```

For best performance, ensure `.cargo/config.toml` targets your CPU:

```toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "target-cpu=native"]
```

## Validation

All modules are validated against pyts via golden integration tests. Each test loads reference data generated by pyts and compares outputs element-wise within an epsilon of 1e-6:

```bash
cargo test --all-features         # run all tests (229 total)
cargo test --test golden_image    # image module golden tests
cargo test --test golden_metrics  # DTW golden tests
```

To regenerate golden data:

```bash
cd test_harness
pip install pyts numpy scikit-learn
python generate_golden_data.py
```

## Dependencies

- [realfft](https://crates.io/crates/realfft) — FFT for DFT, SFA, and SSA periodograms
- [rayon](https://crates.io/crates/rayon) — parallel computation (optional)
- [rand](https://crates.io/crates/rand) / [rand_distr](https://crates.io/crates/rand_distr) — random kernels for ROCKET
- [nalgebra](https://crates.io/crates/nalgebra) — SVD for SSA decomposition (optional)
- [ureq](https://crates.io/crates/ureq) — HTTP client for UCR Archive (optional)

## License

MIT License — see [LICENSE](LICENSE).