Crate libspot_rs

§libspot-rs

A pure Rust implementation of the SPOT (Streaming Peaks Over Threshold) algorithm for real-time anomaly detection in time series data.

§Features

  • Pure Rust: No external C dependencies required
  • Real-time: Designed for streaming data processing
  • Configurable: Flexible parameters for different use cases
  • Efficient: Optimized for performance with minimal memory footprint
  • Well-documented: Comprehensive API documentation and examples

§Installation

cargo add libspot-rs

§Quick Start

use libspot_rs::{SpotDetector, SpotConfig, SpotStatus};

// Create detector with default configuration
let config = SpotConfig::default();
let mut detector = SpotDetector::new(config)?;

// Fit with training data (a sine wave oscillating around 5.0 with amplitude 2.0)
let training_data: Vec<f64> = (0..1000)
    .map(|i| 5.0 + (i as f64 * 0.01).sin() * 2.0)
    .collect();
detector.fit(&training_data)?;

// Detect anomalies in real time
let test_value = 50.0; // This should be an anomaly
match detector.step(test_value)? {
    SpotStatus::Normal => println!("Normal data point"),
    SpotStatus::Excess => println!("In the tail distribution"),
    SpotStatus::Anomaly => println!("Anomaly detected! 🚨"),
}

§Configuration

The SPOT algorithm can be configured with various parameters:

use libspot_rs::SpotConfig;

let config = SpotConfig {
    q: 0.0001,              // Anomaly probability threshold
    low_tail: false,        // Monitor upper tail (set true for lower tail)
    discard_anomalies: true, // Exclude anomalies from model updates
    level: 0.998,           // Quantile level that defines the tail
    max_excess: 200,        // Maximum number of excess values to store
};
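
To make the role of level more concrete, the sketch below computes an empirical quantile of a training batch and the excesses above it. It is only an illustration under stated assumptions: empirical_quantile and excesses_over are hypothetical helpers that are not part of libspot-rs, and the crate may estimate its excess threshold differently.

```rust
/// Nearest-rank empirical quantile of `data` at `level` (0.0 to 1.0).
/// Hypothetical helper for illustration; not part of libspot-rs.
fn empirical_quantile(data: &[f64], level: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&level) {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank index, clamped to the valid range.
    let rank = (level * sorted.len() as f64).ceil() as usize;
    let idx = rank.saturating_sub(1).min(sorted.len() - 1);
    Some(sorted[idx])
}

/// Values strictly above `threshold`, expressed as distances from it.
/// These distances are what a peaks-over-threshold model is fitted to.
fn excesses_over(data: &[f64], threshold: f64) -> Vec<f64> {
    data.iter()
        .filter(|&&x| x > threshold)
        .map(|&x| x - threshold)
        .collect()
}
```

With level close to 1.0 only a small fraction of the training data ends up in the tail, which is why a lower level gives the model more excesses to work with.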

§Advanced Usage

§Custom Configuration

use libspot_rs::{SpotDetector, SpotConfig};

// More sensitive detector (lower anomaly threshold)
let sensitive_config = SpotConfig {
    q: 0.01,               // Higher probability = more sensitive
    level: 0.95,           // Lower level = larger tail region
    ..SpotConfig::default()
};

let mut detector = SpotDetector::new(sensitive_config)?;

§Monitoring Multiple Metrics

use libspot_rs::{SpotDetector, SpotConfig};

// Create separate detectors for different metrics
let mut cpu_detector = SpotDetector::new(SpotConfig::default())?;
let mut memory_detector = SpotDetector::new(SpotConfig::default())?;
let mut network_detector = SpotDetector::new(SpotConfig::default())?;

// Train each detector with historical data
cpu_detector.fit(&cpu_history)?;
memory_detector.fit(&memory_history)?;
network_detector.fit(&network_history)?;

// Monitor in real time. The *_history slices and get_*_usage() functions
// are placeholders for your own data sources.
for _ in 0..3 { // Three iterations keep the example short
    let cpu_status = cpu_detector.step(get_cpu_usage())?;
    let memory_status = memory_detector.step(get_memory_usage())?;
    let network_status = network_detector.step(get_network_usage())?;

    // Handle anomalies...
}

§Accessing Detector State

// Get detector statistics
println!("Total samples: {}", detector.n());
println!("Excess count: {}", detector.nt());
println!("Anomaly threshold: {}", detector.anomaly_threshold());
println!("Excess threshold: {}", detector.excess_threshold());

// Get tail distribution parameters
let (gamma, sigma) = detector.tail_parameters();
println!("Tail shape: {}, scale: {}", gamma, sigma);

// Get peaks statistics
println!("Peaks mean: {}", detector.peaks_mean());
println!("Peaks variance: {}", detector.peaks_variance());

§Algorithm Overview

The SPOT algorithm is designed for online anomaly detection in time series data using:

  1. Extreme Value Theory (EVT): Models the tail of the data distribution
  2. Generalized Pareto Distribution (GPD): Fits the distribution of excesses
  3. Dynamic Thresholding: Adapts to changing data patterns
  4. Streaming Processing: Processes one data point at a time

Key concepts:

  • Excess: Values above a high quantile threshold
  • Tail: The extreme region of the data distribution
  • Anomaly: Values whose estimated tail probability falls below the configured q
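
The relationship between these concepts can be sketched with the threshold formula from the original SPOT paper (Siffer et al., 2017). The function below is a minimal illustration, not the crate's code: the name anomaly_threshold and its parameter list are assumptions, with t as the excess threshold, gamma and sigma as the fitted GPD shape and scale, n as the number of observations, and nt as the number of excesses.

```rust
/// Anomaly threshold z_q as given in the SPOT paper:
///   z_q = t + (sigma / gamma) * ((q * n / nt)^(-gamma) - 1)   for gamma != 0
///   z_q = t - sigma * ln(q * n / nt)                          as gamma -> 0
/// Hypothetical free function for illustration; not the libspot-rs API.
fn anomaly_threshold(t: f64, gamma: f64, sigma: f64, q: f64, n: u64, nt: u64) -> f64 {
    // Ratio between the target probability q and the observed tail frequency nt / n.
    let r = q * n as f64 / nt as f64;
    if gamma.abs() < 1e-12 {
        // Limit case: the GPD reduces to an exponential distribution.
        t - sigma * r.ln()
    } else {
        t + (sigma / gamma) * (r.powf(-gamma) - 1.0)
    }
}
```

A smaller q pushes the threshold further into the tail, so fewer points are flagged; when q equals the observed tail frequency nt / n, the anomaly threshold coincides with the excess threshold t.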

§Key Components

  • SpotDetector: Main SPOT detector implementation
  • SpotConfig: Configuration parameters for the detector
  • SpotStatus: Status returned by the detector for each data point
  • Ubend: Circular buffer for storing data
  • Peaks: Statistics computation over peaks data
  • Tail: Generalized Pareto Distribution tail modeling
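
As an illustration of the circular-buffer idea behind Ubend, here is a minimal fixed-capacity buffer that overwrites its oldest value once full. The type RingBuffer and its methods are assumptions made for this sketch; the crate's actual Ubend may expose a different interface.

```rust
/// Fixed-capacity circular buffer: once full, each push overwrites the
/// oldest value. Illustrative sketch only; not the crate's Ubend type.
struct RingBuffer {
    data: Vec<f64>,
    capacity: usize,
    cursor: usize, // index of the next slot to write
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { data: Vec::with_capacity(capacity), capacity, cursor: 0 }
    }

    /// Inserts a value and returns the evicted one, if the buffer was full.
    fn push(&mut self, x: f64) -> Option<f64> {
        let evicted = if self.data.len() < self.capacity {
            self.data.push(x);
            None
        } else {
            Some(std::mem::replace(&mut self.data[self.cursor], x))
        };
        self.cursor = (self.cursor + 1) % self.capacity;
        evicted
    }

    fn len(&self) -> usize {
        self.data.len()
    }

    /// Iterates over the stored values from oldest to newest.
    fn iter(&self) -> impl Iterator<Item = f64> + '_ {
        let len = self.data.len();
        // While filling, the oldest value is at index 0; once full, it is
        // at the cursor (the slot that will be overwritten next).
        let start = if len < self.capacity { 0 } else { self.cursor };
        (0..len).map(move |i| self.data[(start + i) % len])
    }
}
```

Returning the evicted value from push lets a caller keep derived statistics in sync without rescanning the buffer.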

§Performance

libspot-rs is optimized for real-time processing:

  • Memory: O(max_excess) space complexity
  • Time: O(1) amortized time per data point
  • Throughput: Can process millions of data points per second
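
One way to keep per-point cost constant is to maintain running sums instead of rescanning stored values. The sketch below shows that idea for the mean and variance of a bounded set of peaks; RunningStats is a hypothetical type written for this example, and it is an assumption that the crate's Peaks structure works along similar lines.

```rust
/// Running count, sum, and sum of squares, each updated in O(1).
/// Hypothetical sketch; not the crate's Peaks implementation.
#[derive(Default)]
struct RunningStats {
    n: usize,
    sum: f64,
    sum_sq: f64,
}

impl RunningStats {
    /// Accounts for a newly stored value.
    fn add(&mut self, x: f64) {
        self.n += 1;
        self.sum += x;
        self.sum_sq += x * x;
    }

    /// Accounts for a value evicted from the underlying buffer.
    fn remove(&mut self, x: f64) {
        if self.n == 0 {
            return;
        }
        self.n -= 1;
        self.sum -= x;
        self.sum_sq -= x * x;
    }

    fn mean(&self) -> Option<f64> {
        if self.n == 0 { None } else { Some(self.sum / self.n as f64) }
    }

    /// Population variance, computed as E[x^2] - mean^2.
    fn variance(&self) -> Option<f64> {
        let mean = self.mean()?;
        Some(self.sum_sq / self.n as f64 - mean * mean)
    }
}
```

The sum-of-squares form is simple but can lose precision when values are large relative to their spread; a production implementation might prefer a more numerically stable update.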

§Comparison with C Implementation

Feature | libspot-rs (Pure Rust) | libspot (C + FFI)
Dependencies | None | C library, bindgen
Memory Safety | ✅ Guaranteed | ⚠️ Manual management
Performance | ✅ Excellent | ✅ Excellent
Cross-platform | ✅ Easy | ⚠️ Build complexity
WebAssembly | ✅ Full support | ❌ Limited

§Examples

See the examples/ directory for more comprehensive usage examples:

  • basic.rs: Basic usage with synthetic data

§Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

§License

This project is licensed under the GNU Lesser General Public License v3.0 - see the LICENSE file for details.

Re-exports§

pub use f64 as SpotFloat;

Structs§

Peaks
Structure that computes stats about the peaks
SpotConfig
Configuration parameters for SPOT detector
SpotDetector
Main SPOT detector for streaming anomaly detection
Tail
Structure that holds the parameters of the fitted GPD tail
Ubend
Circular buffer implementation that matches the C Ubend structure

Enums§

SpotError
Error codes that match the C implementation
SpotStatus
Status codes returned by SPOT operations that match the C implementation exactly

Functions§

version
Get the version of the pure Rust libspot implementation

Type Aliases§

SpotResult
Result type for SPOT operations