# Anomaly Grid

```
█████╗ ███╗ ██╗ ██████╗ ███╗ ███╗ █████╗ ██╗ ██╗ ██╗
██╔══██╗████╗ ██║██╔═══██╗████╗ ████║██╔══██╗██║ ╚██╗ ██╔╝
███████║██╔██╗ ██║██║ ██║██╔████╔██║███████║██║ ╚████╔╝
██╔══██║██║╚██╗██║██║ ██║██║╚██╔╝██║██╔══██║██║ ╚██╔╝
██║ ██║██║ ╚████║╚██████╔╝██║ ╚═╝ ██║██║ ██║███████╗██║
╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚═╝
```

[ANOMALY-GRID v0.2.0] - SEQUENCE ANOMALY DETECTION ENGINE
A Rust library implementing variable-order Markov chains for sequence anomaly detection in finite alphabets.
## What This Library Does
- Variable-Order Markov Models: Builds contexts of length 1 to max_order from training sequences
- Hierarchical Context Selection: Uses longest available context, falls back to shorter contexts
- Laplace Smoothing: Adds α=1.0 to all transition counts for probability estimation
- Shannon Entropy Calculation: Computes H(X) = -∑ P(x) log₂ P(x) for each context
- Sliding Window Detection: Analyzes test sequences using overlapping windows
- Parallel Batch Processing: Processes multiple sequences using Rayon
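The training step above (building contexts of length 1 to max_order) can be sketched independently of the crate's API. The types and function name below are illustrative, not the library's actual internals:

```rust
use std::collections::HashMap;

/// Transition counts keyed by context (the preceding states).
/// Illustrative sketch only; the crate's internal structures may differ.
fn build_context_counts(
    sequence: &[char],
    max_order: usize,
) -> HashMap<Vec<char>, HashMap<char, usize>> {
    let mut counts: HashMap<Vec<char>, HashMap<char, usize>> = HashMap::new();
    for i in 0..sequence.len() {
        // Record every context length 1..=max_order ending just before position i.
        for order in 1..=max_order.min(i) {
            let context = sequence[i - order..i].to_vec();
            let next = sequence[i];
            *counts.entry(context).or_default().entry(next).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let seq: Vec<char> = "ababac".chars().collect();
    let counts = build_context_counts(&seq, 2);
    // Context ['a'] is followed by 'b' twice and 'c' once.
    println!("{:?}", counts.get(&vec!['a']));
}
```

Storing all orders side by side is what lets detection fall back from long contexts to short ones.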
## Known Limitations So Far

### Memory Complexity: O(|alphabet|^max_order)

This grows exponentially:
| Alphabet Size | Max Order | Theoretical Contexts |
|---|---|---|
| 10 | 3 | ~1,100 |
| 20 | 3 | ~8,400 |
| 20 | 4 | ~168,000 |
| 50 | 3 | ~127,500 |
| 100 | 3 | ~1,010,000 |
Memory usage depends on actual sequence patterns and cannot be predicted without profiling your specific data.
### Algorithm Limitations

- Standard Markov Model: Basic implementation with no theoretical optimality guarantees (still evaluating whether stronger guarantees are worth the effort)
- Greedy Context Selection: Uses the longest available context for now (not optimal)
- Fixed Smoothing: α = 1.0 Laplace smoothing (not adaptive for now)
- No Compression: Does not yet implement Context Tree Weighting or similar algorithms
### Practical Constraints

- Memory Bound: Will fail on large state spaces
- Sequential Processing: Not designed for real-time, high-volume streams (no plans to change this on my own; it would be too much work)
## Installation

```toml
[dependencies]
anomaly-grid = "0.2.0"
```
## Basic Usage

```rust
use anomaly_grid::*;
```
## API Reference

### `AnomalyDetector`

All constructors return `Result` because parameters are validated at construction time.

### `AnomalyScore`
## Mathematical Implementation

### Probability Estimation

```
P(next_state | context) = (count + α) / (total_count + α × vocab_size)
```

where α = 1.0 (Laplace smoothing).
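The formula translates directly to code. This is a minimal sketch of the estimate (the function name is illustrative, not the crate's API):

```rust
/// Laplace-smoothed conditional probability, as in the formula above.
/// α = 1.0 matches the library's fixed smoothing constant.
fn laplace_prob(count: u64, total_count: u64, vocab_size: u64) -> f64 {
    const ALPHA: f64 = 1.0;
    (count as f64 + ALPHA) / (total_count as f64 + ALPHA * vocab_size as f64)
}

fn main() {
    // A symbol seen 3 times after a context seen 10 times, alphabet of 4:
    // (3 + 1) / (10 + 1 × 4) = 4/14
    println!("{:.4}", laplace_prob(3, 10, 4));
    // An unseen symbol still gets non-zero mass: (0 + 1) / (10 + 4) = 1/14
    println!("{:.4}", laplace_prob(0, 10, 4));
}
```

The additive α guarantees no transition ever has zero probability, which keeps log-likelihoods finite during detection.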
### Shannon Entropy

```
H(X) = -∑ P(x) log₂ P(x)
```
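A direct implementation over a context's transition distribution, with the usual convention that 0·log 0 = 0 (sketch; not the crate's API):

```rust
/// Shannon entropy in bits of a probability distribution.
fn shannon_entropy(probs: &[f64]) -> f64 {
    probs
        .iter()
        .filter(|&&p| p > 0.0) // 0 log 0 is taken as 0
        .map(|&p| -p * p.log2())
        .sum()
}

fn main() {
    // A uniform distribution over 4 outcomes has entropy log2(4) = 2 bits.
    println!("{}", shannon_entropy(&[0.25; 4]));
    // A deterministic context (one certain successor) has entropy 0.
    println!("{}", shannon_entropy(&[1.0]));
}
```

High-entropy contexts are inherently unpredictable, so a surprising symbol there is weaker evidence of an anomaly than in a low-entropy context.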
### Context Selection

1. Try the context of length max_order.
2. If no transitions are found, try length max_order − 1.
3. Continue down to a context of length 1.
4. If there is still no match, use a uniform probability.
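The fallback loop can be sketched as follows, reusing a context → transition-count map like the one built during training (types and names are illustrative, not the crate's internals):

```rust
use std::collections::HashMap;

type Counts = HashMap<Vec<char>, HashMap<char, usize>>;

/// P(next | longest known suffix of `history`), with Laplace smoothing
/// (α = 1.0) and a uniform fallback. Sketch only; not the crate's API.
fn backoff_prob(counts: &Counts, history: &[char], next: char, max_order: usize, vocab: usize) -> f64 {
    let alpha = 1.0;
    // Longest context first, shrinking by one symbol each step.
    for order in (1..=max_order.min(history.len())).rev() {
        let context = &history[history.len() - order..];
        if let Some(transitions) = counts.get(context) {
            let total: usize = transitions.values().sum();
            let count = *transitions.get(&next).unwrap_or(&0);
            return (count as f64 + alpha) / (total as f64 + alpha * vocab as f64);
        }
    }
    1.0 / vocab as f64 // no context matched at any order: uniform probability
}

fn main() {
    let mut counts: Counts = HashMap::new();
    counts.entry(vec!['a']).or_default().insert('b', 3);
    // The order-2 context ['x', 'a'] is unknown, so we back off to ['a']:
    // (3 + 1) / (3 + 1 × 2) = 0.8 with a 2-symbol alphabet.
    println!("{}", backoff_prob(&counts, &['x', 'a'], 'b', 2, 2));
}
```

Note this is the greedy strategy the limitations section mentions: the first matching context wins outright, with no blending across orders.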
### Anomaly Scoring

```
anomaly_strength = tanh((log_likelihood_component × 0.7 + info_score × 0.3) / 10.0)
```
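As a sketch, the combination is a weighted sum squashed by tanh, which maps non-negative raw scores into [0, 1) and saturates extreme values (function name illustrative):

```rust
/// Combined anomaly strength per the formula above: 70% likelihood
/// surprise, 30% information score, squashed by tanh so the result
/// stays in [0, 1) for non-negative inputs.
fn anomaly_strength(log_likelihood_component: f64, info_score: f64) -> f64 {
    ((log_likelihood_component * 0.7 + info_score * 0.3) / 10.0).tanh()
}

fn main() {
    // Unsurprising input scores 0; larger surprises score higher,
    // but tanh keeps the scale bounded.
    println!("{:.3}", anomaly_strength(0.0, 0.0));
    println!("{:.3}", anomaly_strength(10.0, 5.0));
    println!("{:.3}", anomaly_strength(100.0, 5.0));
}
```

The divisor 10.0 sets how quickly the score saturates; very large surprises all score close to 1.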
## Complexity Analysis

### Time and Memory Complexity (Theoretical)
- Training: O(n × max_order × |alphabet|)
- Detection: O(m × max_order) where m = test sequence length
- Memory: O(|alphabet|^max_order)
Note: Actual performance depends heavily on your data characteristics, sequence patterns, and system resources. Profile your specific use case.
## Configuration

### Choosing max_order
- Order 2: Lower memory usage, captures immediate dependencies
- Order 3: Higher memory usage, captures longer patterns
- Order 4+: Exponentially higher memory usage
Recommendation: Start with order 2, measure memory usage, then increase if needed.
### Choosing threshold
- Lower values (0.001): More sensitive, more detections
- Higher values (0.1): Less sensitive, fewer detections
Recommendation: Experiment with your data to find appropriate values.
### Memory Estimation
Note: Actual context count depends on sequence patterns and may be much lower.
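A quick upper bound on the context count (the basis of the table above) is the geometric sum Σₖ |alphabet|^k for k = 1..=max_order; observed counts are usually far lower because only contexts that actually appear are stored. A sketch (not a crate function):

```rust
/// Theoretical upper bound on stored contexts: the sum of
/// |alphabet|^k for k = 1..=max_order. Actual usage depends on
/// which patterns occur in the training data.
fn max_contexts(alphabet_size: u64, max_order: u32) -> u64 {
    (1..=max_order).map(|k| alphabet_size.pow(k)).sum()
}

fn main() {
    // Matches the limitations table: 20 symbols at order 3 → 8,420 (~8,400).
    println!("{}", max_contexts(20, 3));
    println!("{}", max_contexts(100, 3)); // ~1,010,000
}
```

Multiplying the bound by a rough per-context size gives a worst-case memory estimate before you commit to a max_order.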
## Performance Monitoring

The library includes performance monitoring. Argument values and output formatting below are illustrative; check the crate docs for exact signatures and metric fields:

```rust
let mut detector = AnomalyDetector::new(3)?; // max_order of 3 is illustrative
detector.train(&training_sequence)?;

// Get performance metrics
let metrics = detector.performance_metrics();
println!("{:?}", metrics);

// Detection with monitoring
let anomalies = detector.detect_anomalies_with_monitoring(&test_sequence, 0.01)?;
let updated_metrics = detector.performance_metrics();
println!("{} anomalies, metrics: {:?}", anomalies.len(), updated_metrics);
```
## Suitable Use Cases

### Potentially Good Fit
- System logs with limited event types
- Network protocols with small command sets
- User workflows with simple action sequences
- IoT sensors with categorical states
- Educational purposes and algorithm learning
### Likely Poor Fit

- Natural language processing (large vocabularies)
- High-resolution sensor data (continuous values)
- Real-time processing (high-volume streams)
- Large state spaces (many unique states)
Recommendation: Test with your specific data to determine suitability.
## Error Handling

All functions return `AnomalyGridResult<T>`, an alias for `Result<T, AnomalyGridError>`.

Common errors:

- `InvalidMaxOrder`: max_order = 0
- `SequenceTooShort`: sequence length < min_sequence_length
- `InvalidThreshold`: threshold not in [0, 1]
- `EmptyContextTree`: detection attempted before training
- `MemoryLimitExceeded`: too many contexts created
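Callers can match on these variants. The enum below is a hand-written sketch of the variants listed above so the pattern is concrete; the crate's real definitions may carry payloads and implement `std::error::Error`:

```rust
// Sketch of the documented error variants; NOT the crate's actual enum.
#[derive(Debug, PartialEq)]
enum AnomalyGridError {
    InvalidMaxOrder,
    SequenceTooShort,
    InvalidThreshold,
    EmptyContextTree,
    MemoryLimitExceeded,
}

type AnomalyGridResult<T> = Result<T, AnomalyGridError>;

/// Illustrative validation mirroring the first and third errors above.
fn validate(max_order: usize, threshold: f64) -> AnomalyGridResult<()> {
    if max_order == 0 {
        return Err(AnomalyGridError::InvalidMaxOrder);
    }
    if !(0.0..=1.0).contains(&threshold) {
        return Err(AnomalyGridError::InvalidThreshold);
    }
    Ok(())
}

fn main() {
    match validate(0, 0.01) {
        Ok(()) => println!("parameters ok"),
        Err(e) => println!("rejected: {:?}", e),
    }
}
```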
## Dependencies

```toml
[dependencies]
rayon = "1.10.0" # Parallel processing only
```

Minimal dependencies: only Rayon, used for optional parallel batch processing.
## Testing

```sh
# Run all tests (includes the comprehensive validation suite)
cargo test

# Run examples (pick a name from the examples/ directory)
cargo run --example <example_name>
```
## Version History

### v0.2.0

- Major refactoring: encapsulated the main functions and added a much larger test suite.
- Documentation reduction: keeping everything documented in prose outside the code became chaotic, so the docs were trimmed; examples and comments now live in the code.
- API consistency: all constructors return `Result<T, AnomalyGridError>`.
## Benchmarking Your Use Case
To evaluate if this library suits your needs:
- Start small: Test with a subset of your data
- Monitor memory: Use the performance metrics
- Profile thoroughly: Test with realistic data volumes
- Compare alternatives: Benchmark against other solutions
## When This Library May Not Be Suitable
- Production systems requiring guaranteed performance
- Applications with strict memory constraints
- Real-time systems with strict latency requirements
- Systems requiring theoretical optimality guarantees
- Applications needing adaptive or online learning
## License
MIT License - see LICENSE file.