§libspot-rs
A pure Rust implementation of the SPOT (Streaming Peaks Over Threshold) algorithm for real-time anomaly detection in time series data.
§Features
- Pure Rust: No external C dependencies required
- Real-time: Designed for streaming data processing
- Configurable: Flexible parameters for different use cases
- Efficient: Optimized for performance with minimal memory footprint
- Well-documented: Comprehensive API documentation and examples
§Installation
cargo add libspot-rs
§Quick Start
use libspot_rs::{SpotDetector, SpotConfig, SpotStatus};
// Create detector with default configuration
let config = SpotConfig::default();
let mut detector = SpotDetector::new(config)?;
// Fit with training data (a deterministic sine wave oscillating between 3.0 and 7.0 around 5.0)
let training_data: Vec<f64> = (0..1000)
.map(|i| 5.0 + (i as f64 * 0.01).sin() * 2.0)
.collect();
detector.fit(&training_data)?;
// Detect anomalies in real-time
let test_value = 50.0; // This should be an anomaly
match detector.step(test_value)? {
SpotStatus::Normal => println!("Normal data point"),
SpotStatus::Excess => println!("In the tail distribution"),
SpotStatus::Anomaly => println!("Anomaly detected! 🚨"),
}
§Configuration
The SPOT algorithm can be configured with various parameters:
use libspot_rs::SpotConfig;
let config = SpotConfig {
q: 0.0001, // Anomaly probability threshold
low_tail: false, // Monitor upper tail (set true for lower tail)
discard_anomalies: true, // Exclude anomalies from model updates
level: 0.998, // Quantile level that defines the tail
max_excess: 200, // Maximum number of excess values to store
};
§Advanced Usage
§Custom Configuration
use libspot_rs::{SpotDetector, SpotConfig};
// More sensitive detector (lower anomaly threshold)
let sensitive_config = SpotConfig {
q: 0.01, // Larger q lowers the anomaly threshold, so more points are flagged
level: 0.95, // Lower level = larger tail region
..SpotConfig::default()
};
let mut detector = SpotDetector::new(sensitive_config)?;
§Monitoring Multiple Metrics
use libspot_rs::{SpotDetector, SpotConfig, SpotStatus};
// Create separate detectors for different metrics
let mut cpu_detector = SpotDetector::new(SpotConfig::default())?;
let mut memory_detector = SpotDetector::new(SpotConfig::default())?;
let mut network_detector = SpotDetector::new(SpotConfig::default())?;
// Train each detector with historical data
// (cpu_history, memory_history, network_history and the get_*_usage()
// functions below are placeholders for your own data sources)
cpu_detector.fit(&cpu_history)?;
memory_detector.fit(&memory_history)?;
network_detector.fit(&network_history)?;
// Monitor in real time: feed one new sample per metric on each iteration
loop {
let cpu_status = cpu_detector.step(get_cpu_usage())?;
let memory_status = memory_detector.step(get_memory_usage())?;
let network_status = network_detector.step(get_network_usage())?;
// Handle anomalies
if matches!(cpu_status, SpotStatus::Anomaly) { println!("CPU anomaly"); }
if matches!(memory_status, SpotStatus::Anomaly) { println!("Memory anomaly"); }
if matches!(network_status, SpotStatus::Anomaly) { println!("Network anomaly"); }
}
§Accessing Detector State
// Get detector statistics
println!("Total samples: {}", detector.n());
println!("Excess count: {}", detector.nt());
println!("Anomaly threshold: {}", detector.anomaly_threshold());
println!("Excess threshold: {}", detector.excess_threshold());
// Get tail distribution parameters
let (gamma, sigma) = detector.tail_parameters();
println!("Tail shape: {}, scale: {}", gamma, sigma);
// Get peaks statistics
println!("Peaks mean: {}", detector.peaks_mean());
println!("Peaks variance: {}", detector.peaks_variance());
§Algorithm Overview
The SPOT algorithm is designed for online anomaly detection in time series data using:
- Extreme Value Theory (EVT): Models the tail of the data distribution
- Generalized Pareto Distribution (GPD): Fits the distribution of excesses
- Dynamic Thresholding: Re-fits the tail model as new excesses arrive, so the anomaly threshold adapts to the stream
- Streaming Processing: Processes one data point at a time
Key concepts:
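- Excess: A value beyond the excess threshold, i.e. the quantile of the training data at the configured level; its distance from that threshold is stored as a peak
- Tail: The extreme region of the data distribution
- Anomaly: A value beyond the anomaly threshold, i.e. one whose estimated probability under the fitted tail model is below q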
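The concepts above can be tied together in a small, std-only sketch. It is not the crate's internal code: `fit_thresholds`, `classify`, `Thresholds` and `Status` are hypothetical names, it handles only the upper tail, it takes the excess threshold t as an empirical `level` quantile of the training data, and it fits the GPD to the peaks with the method-of-moments estimator (the crate may use a different estimator). The anomaly threshold then follows the SPOT quantile formula z_q = t + (sigma / gamma) * ((q * n / N_t)^(-gamma) - 1), with the limit t - sigma * ln(q * n / N_t) when gamma is close to 0.

```rust
/// Possible classifications of a value (hypothetical, mirrors SpotStatus).
#[derive(Debug, PartialEq)]
enum Status {
    Normal,
    Excess,
    Anomaly,
}

/// Thresholds of an upper-tail SPOT model (illustrative sketch only).
#[derive(Debug)]
struct Thresholds {
    excess: f64,  // t: empirical `level` quantile of the training data
    anomaly: f64, // z_q: value exceeded with estimated probability q
}

/// Fits t, a GPD tail and z_q from training data (upper tail only).
/// Assumes the training data holds at least two distinct values above t.
fn fit_thresholds(data: &[f64], level: f64, q: f64) -> Thresholds {
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();

    // Empirical `level` quantile used as the excess threshold t.
    let idx = ((level * n as f64) as usize).min(n - 1);
    let t = sorted[idx];

    // Peaks: the amounts by which training values exceed t.
    let peaks: Vec<f64> = sorted.iter().filter(|&&x| x > t).map(|&x| x - t).collect();
    let nt = peaks.len() as f64;
    let mean = peaks.iter().sum::<f64>() / nt;
    let var = peaks.iter().map(|p| (p - mean).powi(2)).sum::<f64>() / nt;

    // Method-of-moments estimates of the GPD shape (gamma) and scale (sigma).
    let r = mean * mean / var;
    let gamma = 0.5 * (1.0 - r);
    let sigma = 0.5 * mean * (1.0 + r);

    // SPOT quantile formula: z_q = t + (sigma / gamma) * ((q * n / nt)^(-gamma) - 1),
    // with the gamma -> 0 limit z_q = t - sigma * ln(q * n / nt).
    let ratio = q * n as f64 / nt;
    let anomaly = if gamma.abs() < 1e-12 {
        t - sigma * ratio.ln()
    } else {
        t + (sigma / gamma) * (ratio.powf(-gamma) - 1.0)
    };
    Thresholds { excess: t, anomaly }
}

/// Classifies a new value against the fitted thresholds.
fn classify(x: f64, th: &Thresholds) -> Status {
    if x > th.anomaly {
        Status::Anomaly
    } else if x > th.excess {
        Status::Excess
    } else {
        Status::Normal
    }
}

fn main() {
    // Same synthetic training signal as in the Quick Start example.
    let training: Vec<f64> = (0..1000)
        .map(|i| 5.0 + (i as f64 * 0.01).sin() * 2.0)
        .collect();
    let th = fit_thresholds(&training, 0.98, 0.001);
    println!("{:?}", th);
    println!("50.0 -> {:?}", classify(50.0, &th));
}
```

With only 1,000 training points, the sketch uses level = 0.98 so that roughly 20 peaks feed the moment estimates; at level = 0.998 only a couple of values would lie above t, which is too few for a stable fit.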
§Key Components
- SpotDetector: Main SPOT detector implementation
- SpotConfig: Configuration parameters for the detector
- SpotStatus: Status returned by the detector for each data point
- Ubend: Circular buffer for storing data
- Peaks: Statistics computation over peaks data
- Tail: Generalized Pareto Distribution tail modeling
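Ubend and Peaks are what keep memory bounded by max_excess. The following is a minimal sketch of that idea, assuming nothing about the crate's actual API: `RingBuffer` is a hypothetical stand-in, not the crate's Ubend type. It overwrites its oldest value once full and maintains running sums so that the mean and variance of the stored values, the kind of statistics Peaks reports, are available in O(1) after each push.

```rust
/// Hypothetical fixed-capacity circular buffer (not the crate's Ubend type).
struct RingBuffer {
    data: Vec<f64>,
    capacity: usize,
    cursor: usize, // slot the next push overwrites once the buffer is full
    sum: f64,      // running sum of stored values
    sum_sq: f64,   // running sum of squared stored values
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        RingBuffer { data: Vec::with_capacity(capacity), capacity, cursor: 0, sum: 0.0, sum_sq: 0.0 }
    }

    /// Appends a value; once full, overwrites the oldest value and returns it.
    fn push(&mut self, x: f64) -> Option<f64> {
        let evicted = if self.data.len() < self.capacity {
            self.data.push(x);
            None
        } else {
            let old = std::mem::replace(&mut self.data[self.cursor], x);
            self.cursor = (self.cursor + 1) % self.capacity;
            Some(old)
        };
        // Update running sums in O(1): add the new value, drop the evicted one.
        self.sum += x;
        self.sum_sq += x * x;
        if let Some(old) = evicted {
            self.sum -= old;
            self.sum_sq -= old * old;
        }
        evicted
    }

    fn len(&self) -> usize {
        self.data.len()
    }

    /// Mean of the stored values (NaN when empty).
    fn mean(&self) -> f64 {
        self.sum / self.data.len() as f64
    }

    /// Population variance of the stored values, clamped at 0 against rounding.
    fn variance(&self) -> f64 {
        let m = self.mean();
        (self.sum_sq / self.data.len() as f64 - m * m).max(0.0)
    }
}

fn main() {
    let mut buf = RingBuffer::new(3);
    for x in [1.0, 2.0, 3.0, 4.0] {
        buf.push(x); // the fourth push evicts 1.0
    }
    println!("len={} mean={} variance={}", buf.len(), buf.mean(), buf.variance());
}
```

Running sums trade a little numerical stability for constant-time updates; recomputing over the stored values instead would cost O(max_excess) per update but avoids cancellation error.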
§Performance
libspot-rs is optimized for real-time processing:
- Memory: O(max_excess) space complexity
- Time: O(1) amortized time per data point
- Throughput: Can process millions of data points per second
§Comparison with C Implementation
| Feature | libspot-rs (Pure Rust) | libspot (C + FFI) |
|---|---|---|
| Dependencies | None | C library, bindgen |
| Memory Safety | ✅ Guaranteed | ⚠️ Manual management |
| Performance | ✅ Excellent | ✅ Excellent |
| Cross-platform | ✅ Easy | ⚠️ Build complexity |
| WebAssembly | ✅ Full support | ❌ Limited |
§Examples
See the examples/ directory for more comprehensive usage examples:
- basic.rs: Basic usage with synthetic data
§Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
§License
This project is licensed under the GNU Lesser General Public License v3.0 - see the LICENSE file for details.
§Re-exports
- pub use f64 as SpotFloat;
§Structs
- Peaks: Computes statistics over the peaks
- SpotConfig: Configuration parameters for the SPOT detector
- SpotDetector: Main SPOT detector for streaming anomaly detection
- Tail: Generalized Pareto Distribution (GPD) tail model and its parameters
- Ubend: Circular buffer implementation that matches the C Ubend structure
§Enums
- SpotError: Error codes that match the C implementation
- SpotStatus: Status codes returned by SPOT operations, matching the C implementation exactly
§Functions
- version: Get the version of the pure Rust libspot implementation
§Type Aliases
- SpotResult: Result type for SPOT operations