libspot-rs

A pure Rust implementation of the SPOT (Streaming Peaks Over Threshold) algorithm for real-time anomaly detection in time series data.
Features
- Pure Rust: No external C dependencies required
- Real-time: Designed for streaming data processing
- Configurable: Flexible parameters for different use cases
- Efficient: Optimized for performance with minimal memory footprint
- Well-documented: Comprehensive API documentation and examples
Installation
cargo add libspot-rs
Quick Start
use libspot_rs::{SpotDetector, SpotConfig, SpotStatus};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = SpotConfig::default();
let mut detector = SpotDetector::new(config)?;
// Train on a smooth synthetic signal oscillating around 5.0.
let training_data: Vec<f64> = (0..1000)
    .map(|i| 5.0 + (i as f64 * 0.01).sin() * 2.0)
    .collect();
detector.fit(&training_data)?;
// Feed a new observation and act on the returned status.
let test_value = 50.0;
match detector.step(test_value)? {
    SpotStatus::Normal => println!("Normal data point"),
    SpotStatus::Excess => println!("In the tail distribution"),
    SpotStatus::Anomaly => println!("Anomaly detected! 🚨"),
}
# Ok(())
# }
Configuration
The SPOT algorithm can be configured with various parameters:
use libspot_rs::SpotConfig;
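let config = SpotConfig {
    q: 0.0001,               // anomaly probability threshold
    low_tail: false,         // monitor the upper tail
    discard_anomalies: true, // exclude anomalies from model updates
    level: 0.998,            // quantile level for the excess threshold
    max_excess: 200,         // maximum number of excesses to keep
};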
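To make the role of level more concrete, the sketch below computes a high empirical quantile of a sample by sorting it, then collects the excesses above that value. It is a simplified illustration rather than the crate's code: the helper names empirical_quantile and excesses are hypothetical, and the crate may estimate the quantile with a different (for example streaming) method.

```rust
/// Sort-based empirical quantile using the nearest-rank rule.
/// Hypothetical helper for illustration; not part of libspot-rs.
/// Returns `None` for empty input or a level outside [0, 1).
fn empirical_quantile(data: &[f64], level: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..1.0).contains(&level) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: smallest k (at least 1) such that k / n >= level.
    let rank = ((level * sorted.len() as f64).ceil() as usize).max(1);
    Some(sorted[rank - 1])
}

/// Distances above `threshold` for every value that exceeds it
/// (upper tail only). Hypothetical helper for illustration.
fn excesses(data: &[f64], threshold: f64) -> Vec<f64> {
    data.iter()
        .filter(|&&x| x > threshold)
        .map(|&x| x - threshold)
        .collect()
}
```

With level set close to 1, only a small fraction of the training values end up as excesses, which is why a reasonably long training sample is useful before streaming.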
Advanced Usage
Custom Configuration
use libspot_rs::{SpotDetector, SpotConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
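let sensitive_config = SpotConfig {
    q: 0.01,     // higher anomaly probability, so more points are flagged
    level: 0.95, // lower quantile, so more points count as excesses
    ..SpotConfig::default()
};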
let mut detector = SpotDetector::new(sensitive_config)?;
# Ok(())
# }
Monitoring Multiple Metrics
use libspot_rs::{SpotDetector, SpotConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let cpu_history = vec![1.0, 2.0, 3.0];
# let memory_history = vec![1.0, 2.0, 3.0];
# let network_history = vec![1.0, 2.0, 3.0];
# fn get_cpu_usage() -> f64 { 1.0 }
# fn get_memory_usage() -> f64 { 1.0 }
# fn get_network_usage() -> f64 { 1.0 }
let mut cpu_detector = SpotDetector::new(SpotConfig::default())?;
let mut memory_detector = SpotDetector::new(SpotConfig::default())?;
let mut network_detector = SpotDetector::new(SpotConfig::default())?;
cpu_detector.fit(&cpu_history)?;
memory_detector.fit(&memory_history)?;
network_detector.fit(&network_history)?;
for _ in 0..3 {
    let cpu_status = cpu_detector.step(get_cpu_usage())?;
    let memory_status = memory_detector.step(get_memory_usage())?;
    let network_status = network_detector.step(get_network_usage())?;
    // Handle each status here, e.g. alert on an anomaly.
    // This example stops after one iteration.
    break;
}
# Ok(())
# }
Accessing Detector State
# use libspot_rs::{SpotDetector, SpotConfig};
# fn main() -> Result<(), Box<dyn std::error::Error>> {
# let mut detector = SpotDetector::new(SpotConfig::default())?;
# let training_data = vec![1.0, 2.0, 3.0];
# detector.fit(&training_data)?;
println!("Total samples: {}", detector.n());
println!("Excess count: {}", detector.nt());
println!("Anomaly threshold: {}", detector.anomaly_threshold());
println!("Excess threshold: {}", detector.excess_threshold());
let (gamma, sigma) = detector.tail_parameters();
println!("Tail shape: {}, scale: {}", gamma, sigma);
println!("Peaks mean: {}", detector.peaks_mean());
println!("Peaks variance: {}", detector.peaks_variance());
# Ok(())
# }
Algorithm Overview
The SPOT algorithm is designed for online anomaly detection in time series data using:
- Extreme Value Theory (EVT): Models the tail of the data distribution
- Generalized Pareto Distribution (GPD): Fits the distribution of excesses
- Dynamic Thresholding: Adapts to changing data patterns
- Streaming Processing: Processes one data point at a time
Key concepts:
- Excess: A value above the excess threshold, a high quantile of the data set by level
- Tail: The extreme region of the data distribution
- Anomaly: A value beyond the anomaly threshold, i.e. one whose estimated tail probability is below q
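The relationship between the excess threshold, the fitted tail, and the anomaly threshold can be sketched with the quantile formula from the original SPOT paper. The function and enum names below are hypothetical, and the code only handles the upper tail. It is an illustration under those assumptions, not this crate's implementation.

```rust
/// Anomaly threshold z_q derived from a fitted GPD tail (SPOT paper formula):
///   z_q = t + (sigma / gamma) * ((q * n / nt)^(-gamma) - 1)   if gamma != 0
///   z_q = t - sigma * ln(q * n / nt)                          if gamma == 0
/// `t` is the excess threshold, `n` the number of samples seen and
/// `nt` the number of excesses. Hypothetical helper for illustration.
fn anomaly_threshold(t: f64, gamma: f64, sigma: f64, q: f64, n: u64, nt: u64) -> Option<f64> {
    if n == 0 || nt == 0 || q <= 0.0 {
        return None; // the formula is undefined without samples or excesses
    }
    let r = q * (n as f64) / (nt as f64);
    let z = if gamma.abs() < 1e-12 {
        // Limit of the general formula as gamma tends to zero.
        t - sigma * r.ln()
    } else {
        t + (sigma / gamma) * (r.powf(-gamma) - 1.0)
    };
    Some(z)
}

/// Hypothetical stand-in for a per-point status (upper tail only).
#[derive(Debug, PartialEq)]
enum ExampleStatus {
    Normal,
    Excess,
    Anomaly,
}

/// Compares a value against the two thresholds.
fn classify(x: f64, excess_threshold: f64, anomaly_threshold: f64) -> ExampleStatus {
    if x > anomaly_threshold {
        ExampleStatus::Anomaly
    } else if x > excess_threshold {
        ExampleStatus::Excess
    } else {
        ExampleStatus::Normal
    }
}
```

A smaller q pushes z_q further into the tail, so fewer points are flagged. A heavier tail (larger gamma) has the same effect for a fixed q.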
Key Components
- [SpotDetector]: Main SPOT detector implementation
- [SpotConfig]: Configuration parameters for the detector
- [SpotStatus]: Status returned by the detector for each data point
- [Ubend]: Circular buffer for storing data
- [Peaks]: Statistics computation over peaks data
- [Tail]: Generalized Pareto Distribution tail modeling
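As a rough picture of how a circular buffer and running peak statistics can work together in bounded memory, here is a minimal sketch. RingBuffer and PeakStats are hypothetical stand-ins written for this example; they do not reproduce the actual Ubend or Peaks API.

```rust
/// Fixed-capacity circular buffer (hypothetical stand-in, not `Ubend`).
struct RingBuffer {
    data: Vec<f64>,
    capacity: usize,
    next: usize, // index the next push writes to
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { data: Vec::with_capacity(capacity), capacity, next: 0 }
    }

    /// Stores `x`. Once full, overwrites the oldest value and returns it.
    fn push(&mut self, x: f64) -> Option<f64> {
        let evicted = if self.data.len() < self.capacity {
            self.data.push(x);
            None
        } else {
            Some(std::mem::replace(&mut self.data[self.next], x))
        };
        self.next = (self.next + 1) % self.capacity;
        evicted
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

/// Running sums over the buffered values (hypothetical stand-in, not `Peaks`).
struct PeakStats {
    buffer: RingBuffer,
    sum: f64,
    sum_sq: f64,
}

impl PeakStats {
    fn new(max_excess: usize) -> Self {
        Self { buffer: RingBuffer::new(max_excess), sum: 0.0, sum_sq: 0.0 }
    }

    /// O(1) update: remove the evicted value from the sums, then add `x`.
    fn push(&mut self, x: f64) {
        if let Some(old) = self.buffer.push(x) {
            self.sum -= old;
            self.sum_sq -= old * old;
        }
        self.sum += x;
        self.sum_sq += x * x;
    }

    fn mean(&self) -> Option<f64> {
        match self.buffer.len() {
            0 => None,
            n => Some(self.sum / n as f64),
        }
    }

    /// Population variance, clamped at zero against rounding error.
    fn variance(&self) -> Option<f64> {
        let mean = self.mean()?;
        let n = self.buffer.len() as f64;
        Some((self.sum_sq / n - mean * mean).max(0.0))
    }
}
```

Because the buffer never grows past its capacity and each update touches only the evicted and the new value, memory stays proportional to the capacity and the statistics update in constant time.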
Performance
libspot-rs is optimized for real-time processing:
- Memory: O(max_excess) space complexity
- Time: O(1) amortized time per data point
- Throughput: Can process millions of data points per second
Comparison with C Implementation
| Feature | libspot-rs (Pure Rust) | libspot (C + FFI) |
|---|---|---|
| Dependencies | None | C library, bindgen |
| Memory Safety | ✅ Guaranteed | ⚠️ Manual management |
| Performance | ✅ Excellent | ✅ Excellent |
| Cross-platform | ✅ Easy | ⚠️ Build complexity |
| WebAssembly | ✅ Full support | ❌ Limited |
Examples
See the examples/ directory for more comprehensive usage examples:
- basic.rs: Basic usage with synthetic data
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
License
This project is licensed under the GNU Lesser General Public License v3.0 - see the LICENSE file for details.