libspot-rs

A pure Rust implementation of the SPOT (Streaming Peaks Over Threshold) algorithm for real-time anomaly detection in time series data.

Features

Pure Rust: No external C dependencies required
Real-time: Designed for streaming data processing
Configurable: Flexible parameters for different use cases
Efficient: Optimized for performance with minimal memory footprint
Well-documented: Comprehensive API documentation and examples

Installation

cargo add libspot-rs

Quick Start

use libspot_rs::{SpotDetector, SpotConfig, SpotStatus};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create detector with default configuration
let config = SpotConfig::default();
let mut detector = SpotDetector::new(config)?;

// Fit with training data (a sine wave oscillating between 3.0 and 7.0)
let training_data: Vec<f64> = (0..1000)
    .map(|i| 5.0 + (i as f64 * 0.01).sin() * 2.0)
    .collect();
detector.fit(&training_data)?;

// Detect anomalies in real-time
let test_value = 50.0; // This should be an anomaly
match detector.step(test_value)? {
    SpotStatus::Normal => println!("Normal data point"),
    SpotStatus::Excess => println!("Excess: in the tail, but not anomalous"),
    SpotStatus::Anomaly => println!("Anomaly detected! 🚨"),
}
# Ok(())
# }

Configuration

The detector is configured through the fields of SpotConfig:

use libspot_rs::SpotConfig;

let config = SpotConfig {
    q: 0.0001,              // Anomaly probability threshold
    low_tail: false,        // Monitor upper tail (set true for lower tail)
    discard_anomalies: true, // Exclude anomalies from model updates
    level: 0.998,           // Quantile level that defines the tail
    max_excess: 200,        // Maximum number of excess values to store
};
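To make the role of level concrete, the sketch below computes an empirical quantile over a batch of training data and counts how many points fall above it. It is a std-only illustration under stated assumptions: empirical_quantile and count_excesses are hypothetical helpers, not part of libspot-rs, and the crate may estimate its excess threshold differently.

```rust
/// Returns the empirical `level`-quantile of `data` (nearest rank on a sorted copy).
/// Hypothetical helper for illustration; not part of libspot-rs.
fn empirical_quantile(data: &[f64], level: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&level) {
        return None;
    }
    let mut sorted = data.to_vec();
    // total_cmp defines a total order on f64, so the sort cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank index, clamped to the last element.
    let idx = ((level * sorted.len() as f64) as usize).min(sorted.len() - 1);
    Some(sorted[idx])
}

/// Counts the points strictly above `threshold` (upper-tail excesses).
fn count_excesses(data: &[f64], threshold: f64) -> usize {
    data.iter().filter(|&&x| x > threshold).count()
}
```

A higher level pushes the threshold further into the tail, so fewer training points count as excesses and the tail model is fitted on less data.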

Advanced Usage

Custom Configuration

use libspot_rs::{SpotDetector, SpotConfig};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
// More sensitive detector: raising q and lowering level flags more points
let sensitive_config = SpotConfig {
    q: 0.01,               // Higher anomaly probability = lower anomaly threshold
    level: 0.95,           // Lower level = larger tail region
    ..SpotConfig::default()
};

let mut detector = SpotDetector::new(sensitive_config)?;
# Ok(())
# }

Monitoring Multiple Metrics

use libspot_rs::{SpotDetector, SpotConfig};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let cpu_history = vec![1.0, 2.0, 3.0];
# let memory_history = vec![1.0, 2.0, 3.0];
# let network_history = vec![1.0, 2.0, 3.0];
# fn get_cpu_usage() -> f64 { 1.0 }
# fn get_memory_usage() -> f64 { 1.0 }
# fn get_network_usage() -> f64 { 1.0 }
// Create separate detectors for different metrics
let mut cpu_detector = SpotDetector::new(SpotConfig::default())?;
let mut memory_detector = SpotDetector::new(SpotConfig::default())?;
let mut network_detector = SpotDetector::new(SpotConfig::default())?;

// Train each detector with historical data
cpu_detector.fit(&cpu_history)?;
memory_detector.fit(&memory_history)?;
network_detector.fit(&network_history)?;

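// Monitor in real-time (bounded loop so the doctest terminates)
for _ in 0..3 {
    let cpu_status = cpu_detector.step(get_cpu_usage())?;
    let memory_status = memory_detector.step(get_memory_usage())?;
    let network_status = network_detector.step(get_network_usage())?;

    // Handle anomalies, for example by alerting on the CPU metric
    if matches!(cpu_status, libspot_rs::SpotStatus::Anomaly) {
        eprintln!("CPU usage anomaly");
    }
    let _ = (memory_status, network_status); // handle these the same way
}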
# Ok(())
# }

Accessing Detector State

# use libspot_rs::{SpotDetector, SpotConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let mut detector = SpotDetector::new(SpotConfig::default())?;
# let training_data = vec![1.0, 2.0, 3.0];
# detector.fit(&training_data)?;
// Get detector statistics
println!("Total samples: {}", detector.n());
println!("Excess count: {}", detector.nt());
println!("Anomaly threshold: {}", detector.anomaly_threshold());
println!("Excess threshold: {}", detector.excess_threshold());

// Get tail distribution parameters
let (gamma, sigma) = detector.tail_parameters();
println!("Tail shape: {}, scale: {}", gamma, sigma);

// Get peaks statistics
println!("Peaks mean: {}", detector.peaks_mean());
println!("Peaks variance: {}", detector.peaks_variance());
# Ok(())
# }

Algorithm Overview

The SPOT algorithm is designed for online anomaly detection in time series data using:

Extreme Value Theory (EVT): Models the tail of the data distribution
Generalized Pareto Distribution (GPD): Fits the distribution of excesses
Dynamic Thresholding: Adapts to changing data patterns
Streaming Processing: Processes one data point at a time

Key concepts:

Excess: Values beyond the excess threshold, the quantile of the data set by level
Tail: The extreme region of the data distribution beyond that threshold, modeled with a GPD
Anomaly: Values whose estimated probability under the tail model is below q
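The relationship between these concepts can be sketched with the closed-form quantile of the GPD tail as given in the original SPOT paper. This is an illustration, not the crate's code: anomaly_threshold, classify, and the local Status enum are hypothetical names, the upper tail is assumed, and nt is assumed to be greater than zero.

```rust
/// Anomaly threshold z_q for an upper tail, following the SPOT formulation:
///   z_q = t + (sigma / gamma) * ((q * n / nt)^(-gamma) - 1)
/// with the limit z_q = t - sigma * ln(q * n / nt) as gamma -> 0.
/// Hypothetical helper; assumes nt > 0.
fn anomaly_threshold(t: f64, gamma: f64, sigma: f64, q: f64, n: u64, nt: u64) -> f64 {
    // Target probability q rescaled by the fraction of samples in the tail.
    let r = q * n as f64 / nt as f64;
    if gamma.abs() < 1e-12 {
        // Exponential tail (gamma close to zero).
        t - sigma * r.ln()
    } else {
        t + (sigma / gamma) * (r.powf(-gamma) - 1.0)
    }
}

/// Local stand-in for the three outcomes described above.
#[derive(Debug, PartialEq)]
enum Status {
    Normal,
    Excess,
    Anomaly,
}

/// Classifies a value against the two thresholds (upper tail assumed).
fn classify(x: f64, excess_threshold: f64, anomaly_threshold: f64) -> Status {
    if x > anomaly_threshold {
        Status::Anomaly
    } else if x > excess_threshold {
        Status::Excess
    } else {
        Status::Normal
    }
}
```

With t = 10, sigma = 2, q = 0.001, n = 1000 and nt = 10, an exponential tail (gamma = 0) puts the anomaly threshold near 14.6, while gamma = 0.5 moves it to about 18.6: a heavier tail makes large values less surprising.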

Key Components

[SpotDetector]: Main SPOT detector implementation
[SpotConfig]: Configuration parameters for the detector
[SpotStatus]: Status returned by the detector for each data point
[Ubend]: Circular buffer for storing data
[Peaks]: Statistics computation over peaks data
[Tail]: Generalized Pareto Distribution tail modeling
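For readers unfamiliar with circular buffers, here is a minimal std-only sketch of the idea behind a component like Ubend: a fixed-capacity store in which each new value overwrites the oldest one once the buffer is full. RingBuffer and its methods are hypothetical and do not reflect the actual Ubend API.

```rust
/// Fixed-capacity circular buffer: once full, each push overwrites the oldest value.
/// Hypothetical type for illustration; not the crate's Ubend.
struct RingBuffer {
    data: Vec<f64>,
    capacity: usize,
    cursor: usize, // index of the next slot to write
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { data: Vec::with_capacity(capacity), capacity, cursor: 0 }
    }

    /// Pushes a value and returns the overwritten one if the buffer was full.
    fn push(&mut self, x: f64) -> Option<f64> {
        let evicted = if self.data.len() < self.capacity {
            self.data.push(x);
            None
        } else {
            Some(std::mem::replace(&mut self.data[self.cursor], x))
        };
        self.cursor = (self.cursor + 1) % self.capacity;
        evicted
    }

    fn len(&self) -> usize {
        self.data.len()
    }

    /// Iterates over the stored values from oldest to newest.
    fn iter(&self) -> impl Iterator<Item = f64> + '_ {
        // Before the buffer fills, the oldest value sits at index 0;
        // afterwards it sits at the cursor.
        let start = if self.data.len() < self.capacity { 0 } else { self.cursor };
        (0..self.data.len()).map(move |i| self.data[(start + i) % self.data.len()])
    }
}
```

Returning the evicted value from push lets a caller keep derived statistics in sync without rescanning the buffer.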

Performance

libspot-rs is optimized for real-time processing:

Memory: O(max_excess) space complexity
Time: O(1) amortized time per data point
Throughput: Can process millions of data points per second
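One way to keep per-point work constant is to maintain running sums instead of rescanning the stored excesses. The sketch below shows that technique for a mean and variance; RunningStats is a hypothetical type and makes no claim about how the crate computes its peaks statistics. Note that the sum-of-squares form is simple but less numerically stable than Welford's method when values are large.

```rust
/// Running sum and sum of squares, updated in O(1) per insertion or eviction.
/// Hypothetical type for illustration; not part of libspot-rs.
#[derive(Default)]
struct RunningStats {
    count: usize,
    sum: f64,
    sum_sq: f64,
}

impl RunningStats {
    /// Accounts for a newly stored value.
    fn add(&mut self, x: f64) {
        self.count += 1;
        self.sum += x;
        self.sum_sq += x * x;
    }

    /// Accounts for a value that was evicted from storage.
    fn remove(&mut self, x: f64) {
        debug_assert!(self.count > 0, "nothing to remove");
        self.count -= 1;
        self.sum -= x;
        self.sum_sq -= x * x;
    }

    fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }

    /// Population variance, E[x^2] - E[x]^2, clamped at zero against rounding.
    fn variance(&self) -> Option<f64> {
        self.mean()
            .map(|m| (self.sum_sq / self.count as f64 - m * m).max(0.0))
    }
}
```

Paired with a bounded buffer that reports evicted values, add and remove keep the statistics current with memory proportional to the buffer size.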

Comparison with C Implementation

Feature	libspot-rs (Pure Rust)	libspot (C + FFI)
Dependencies	None	C library, bindgen
Memory Safety	✅ Guaranteed	⚠️ Manual management
Performance	✅ Excellent	✅ Excellent
Cross-platform	✅ Easy	⚠️ Build complexity
WebAssembly	✅ Full support	❌ Limited

Examples

See the examples/ directory for more comprehensive usage examples:

basic.rs: Basic usage with synthetic data

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is licensed under the GNU Lesser General Public License v3.0 - see the LICENSE file for details.

libspot-rs 0.1.0