
Touchstone

Touchstone is a Rust library for evaluating streaming anomaly detectors on labeled time-series benchmark datasets. Point it at a directory of CSVs, register one or more detectors, call run(), and get back a Polars DataFrame with one row per (dataset, detector) pair.

Touchstone is built in the spirit of TimeEval [2], a Python benchmarking toolkit for time-series anomaly detection algorithms. If you are looking for datasets, the TimeEval evaluation paper [1] provides a large collection, available on the TimeEval Datasets page already formatted for direct use with Touchstone.

Quickstart

Add to Cargo.toml:

[dependencies]
touchstone-rs = "0.1"

Implementing the Detector Trait

Your algorithm must implement a single trait:

pub trait Detector: Send {
    fn update(&mut self, point: &[f32]) -> f32;
}
  • point is a slice of f32 features for the current time step. The length matches the number of feature columns in the dataset.
  • Return an anomaly score as f32. Higher values mean more anomalous.
  • Return f32::NAN during warmup or whenever a score is not yet meaningful. NaN points are excluded from metric computation.
  • Scores are minmax-normalized to [0, 1] before any metric is computed, so the absolute scale of your scores does not matter.
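As a hedged illustration of these conventions, here is a toy detector that scores each point by its distance from a running mean and emits NaN during warmup. The trait is restated locally so the sketch compiles on its own; in real code you would implement `touchstone_rs::Detector` instead.

```rust
// Restated locally for a self-contained sketch; use touchstone_rs::Detector in real code.
pub trait Detector: Send {
    fn update(&mut self, point: &[f32]) -> f32;
}

/// Scores each point by its absolute distance from the running mean of the
/// first feature; returns NaN until `warmup` points have been seen.
struct MeanDistance {
    count: usize,
    mean: f32,
    warmup: usize,
}

impl MeanDistance {
    fn new(warmup: usize) -> Self {
        MeanDistance { count: 0, mean: 0.0, warmup }
    }
}

impl Detector for MeanDistance {
    fn update(&mut self, point: &[f32]) -> f32 {
        let x = point[0];
        self.count += 1;
        // incremental mean update
        self.mean += (x - self.mean) / self.count as f32;
        if self.count <= self.warmup {
            f32::NAN // warmup: these points are excluded from metrics
        } else {
            // absolute scale is irrelevant: scores are minmax-normalized later
            (x - self.mean).abs()
        }
    }
}
```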

Running an Evaluation

use std::path::Path;
use touchstone_rs::{Detector, Touchstone};

struct MyDetector { n_dims: usize }

impl MyDetector {
    fn new(n_dims: usize) -> Self {
        MyDetector { n_dims }
    }
}

impl Detector for MyDetector {
    fn update(&mut self, point: &[f32]) -> f32 {
        // compute and return anomaly score
        0.5
    }
}

fn main() {
    let mut experiment = Touchstone::new(Path::new("data"));

    // The factory closure receives `n_dimensions` at runtime — use it to size
    // internal buffers to match the dataset's feature count.
    experiment.add_detector("MyDetector-v1", |n_dims| MyDetector::new(n_dims));

    // Multiple detectors can be registered and are evaluated in a single pass:
    // experiment.add_detector("MyDetector-window10", |n| MyDetector::new(n));
    // experiment.add_detector("MyDetector-window50", |n| MyDetector::new(n));

    let df = experiment.run().unwrap();
    println!("{df}");
}

Output DataFrame

run() returns a DataFrame with this schema:

column             type    description
dataset            String  dataset filename (without extension)
detector           String  name passed to add_detector
roc_auc            f64     ROC-AUC
pr_auc             f64     Precision-Recall AUC
average_precision  f64     Average Precision
precision          f64     Precision at the 90th-percentile threshold
recall             f64     Recall at the 90th-percentile threshold
f1                 f64     F1 at the 90th-percentile threshold
range_precision    f64     Range-based Precision (Tatbul et al., NeurIPS 2018)
range_recall       f64     Range-based Recall
range_f_score      f64     Range-based F-score
range_auc          f64     Range-based AUC
range_pr_vus       f64     PR-VUS (Paparrizos et al., PVLDB 2022)
range_roc_vus      f64     ROC-VUS
time_sec           f64     wall-clock seconds for this detector on this dataset

If a dataset fails to load or a detector produces only NaN scores, the metric columns for that row contain NaN.
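To make the fixed-threshold rows concrete, here is a sketch of how precision, recall, and F1 at a 90th-percentile threshold can be computed. This is an illustration of the idea, not Touchstone's internal code; it assumes scores are already normalized and NaN-free.

```rust
/// Precision, recall, and F1 after thresholding `scores` at their 90th
/// percentile: points scoring at or above the threshold are predicted
/// anomalous. Illustrative only.
fn f1_at_p90(labels: &[u8], scores: &[f32]) -> (f64, f64, f64) {
    // nearest-rank 90th-percentile threshold via sorting
    let mut sorted: Vec<f32> = scores.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((sorted.len() as f64) * 0.9).ceil() as usize - 1;
    let thresh = sorted[idx.min(sorted.len() - 1)];

    let (mut tp, mut fp, mut fneg) = (0u32, 0u32, 0u32);
    for (&y, &s) in labels.iter().zip(scores) {
        let pred = s >= thresh;
        match (pred, y) {
            (true, 1) => tp += 1,
            (true, _) => fp += 1,
            (false, 1) => fneg += 1,
            _ => {}
        }
    }
    let precision = tp as f64 / (tp + fp).max(1) as f64;
    let recall = tp as f64 / (tp + fneg).max(1) as f64;
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    (precision, recall, f1)
}
```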

Custom Metrics

If the default metric set does not suit your needs, you can replace it entirely: registering any metric via add_metric() before calling run() swaps out the defaults.

use std::path::Path;
use touchstone_rs::{Detector, Touchstone};
use touchstone_rs::metrics::{RocAuc, F1Score, SigmaThreshold};

# struct MyDetector { n_dims: usize }
# impl MyDetector { fn new(n_dims: usize) -> Self { MyDetector { n_dims } } }
# impl Detector for MyDetector { fn update(&mut self, _: &[f32]) -> f32 { 0.5 } }

let mut experiment = Touchstone::new(Path::new("data"));
experiment.add_detector("MyDetector", |n| MyDetector::new(n));
experiment.add_metric(RocAuc);
experiment.add_metric(F1Score::new(SigmaThreshold(3.0)));

Implement Metric for fully custom scoring:

use touchstone_rs::metrics::Metric;

struct MyMetric;

impl Metric for MyMetric {
    fn name(&self) -> &str { "my_metric" }
    fn score(&self, labels: &[u8], scores: &[f32]) -> f64 {
        // labels: 0 = normal, 1 = anomaly
        // scores: minmax-normalized to [0, 1], NaN already removed
        todo!()
    }
}
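As a toy example, the following metric measures the gap between the mean score on anomalous points and the mean score on normal points. The trait is restated locally so the sketch compiles standalone; in real code you would implement `touchstone_rs::metrics::Metric`.

```rust
// Restated locally for a self-contained sketch; use
// touchstone_rs::metrics::Metric in real code.
trait Metric {
    fn name(&self) -> &str;
    fn score(&self, labels: &[u8], scores: &[f32]) -> f64;
}

/// Mean score on anomalous points minus mean score on normal points:
/// positive values mean the detector separates the classes in the right
/// direction, 0 means no separation.
struct ScoreGap;

impl Metric for ScoreGap {
    fn name(&self) -> &str { "score_gap" }

    fn score(&self, labels: &[u8], scores: &[f32]) -> f64 {
        let (mut anom_sum, mut anom_n) = (0.0f64, 0u32);
        let (mut norm_sum, mut norm_n) = (0.0f64, 0u32);
        for (&y, &s) in labels.iter().zip(scores) {
            if y == 1 {
                anom_sum += s as f64;
                anom_n += 1;
            } else {
                norm_sum += s as f64;
                norm_n += 1;
            }
        }
        // .max(1) guards against empty classes (division by zero)
        anom_sum / anom_n.max(1) as f64 - norm_sum / norm_n.max(1) as f64
    }
}
```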

Dataset Format

Datasets are CSV files; column names are ignored, only the column positions matter:

timestamp, feature_1, ..., feature_N, label
2016-04-20 10:35:12, 1.2, 3.4, 0
2016-04-20 10:35:13, 5.6, 7.8, 1
  • First column: timestamp (parsed but ignored)
  • Middle columns: features, cast to f32 and passed as point to update()
  • Last column: binary anomaly label, 0 (normal) or 1 (anomaly)

Touchstone passes every row to update() in order, simulating a streaming environment. Each detector gets a fresh instance per dataset.
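Conceptually, each row splits into a feature slice and a label. A hedged std-only sketch of that split (not Touchstone's actual loader, which should use a proper CSV parser):

```rust
/// Splits one CSV row into (features, label): the first column (timestamp)
/// is dropped, the last is parsed as the binary label, and everything in
/// between becomes the f32 feature vector. Illustrative only.
fn parse_row(line: &str) -> Option<(Vec<f32>, u8)> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() < 3 {
        return None; // need at least timestamp, one feature, and label
    }
    let label: u8 = fields.last()?.parse().ok()?;
    let features: Vec<f32> = fields[1..fields.len() - 1]
        .iter()
        .map(|f| f.parse().ok())
        .collect::<Option<Vec<f32>>>()?;
    Some((features, label))
}
```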

Running the Built-in Example

cargo run --example normal_distribution_detector

This runs a rolling z-score detector (window = 20) against all datasets in data/ and prints the results.
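For reference, a rolling z-score detector along these lines might look like the following. This is a hedged sketch under the stated window-of-20 assumption, not necessarily identical to the shipped example.

```rust
use std::collections::VecDeque;

/// Rolling z-score over the last `window` values of the first feature:
/// score = |x - mean| / std of the window. NaN until the window fills.
struct RollingZScore {
    window: usize,
    buf: VecDeque<f32>,
}

impl RollingZScore {
    fn new(window: usize) -> Self {
        RollingZScore { window, buf: VecDeque::with_capacity(window) }
    }

    fn update(&mut self, point: &[f32]) -> f32 {
        let x = point[0];
        if self.buf.len() < self.window {
            self.buf.push_back(x);
            return f32::NAN; // warmup: window not yet full
        }
        let n = self.buf.len() as f32;
        let mean = self.buf.iter().sum::<f32>() / n;
        let var = self.buf.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
        let std = var.sqrt().max(1e-6); // guard against zero variance
        self.buf.pop_front();
        self.buf.push_back(x);
        (x - mean).abs() / std
    }
}
```

In the shipped example this would be constructed with `RollingZScore::new(20)`.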

References

If you use Touchstone or the TimeEval dataset collection in your work, please cite:

[1] Dataset collection and evaluation methodology

@article{SchmidlEtAl2022Anomaly,
  title = {Anomaly Detection in Time Series: A Comprehensive Evaluation},
  author = {Schmidl, Sebastian and Wenig, Phillip and Papenbrock, Thorsten},
  date = {2022},
  journaltitle = {Proceedings of the VLDB Endowment (PVLDB)},
  volume = {15},
  number = {9},
  pages = {1779--1797},
  doi = {10.14778/3538598.3538602}
}

[2] TimeEval benchmarking toolkit

@article{WenigEtAl2022TimeEval,
  title = {TimeEval: A Benchmarking Toolkit for Time Series Anomaly Detection Algorithms},
  author = {Wenig, Phillip and Schmidl, Sebastian and Papenbrock, Thorsten},
  date = {2022},
  journaltitle = {Proceedings of the VLDB Endowment (PVLDB)},
  volume = {15},
  number = {12},
  pages = {3678--3681},
  doi = {10.14778/3554821.3554873}
}

[3] Touchstone

@software{Touchstone,
  title = {Touchstone: A Rust Library for Benchmarking Streaming Anomaly Detectors},
  author = {Wenig, Phillip},
  date = {2026},
  url = {https://github.com/wenig/touchstone}
}