# Anomaly Grid

```text
█████╗ ███╗ ██╗ ██████╗ ███╗ ███╗ █████╗ ██╗ ██╗ ██╗
██╔══██╗████╗ ██║██╔═══██╗████╗ ████║██╔══██╗██║ ╚██╗ ██╔╝
███████║██╔██╗ ██║██║ ██║██╔████╔██║███████║██║ ╚████╔╝
██╔══██║██║╚██╗██║██║ ██║██║╚██╔╝██║██╔══██║██║ ╚██╔╝
██║ ██║██║ ╚████║╚██████╔╝██║ ╚═╝ ██║██║ ██║███████╗██║
╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚═╝
```

**[ANOMALY-GRID v0.2.2]** - SEQUENCE ANOMALY DETECTION ENGINE
A Rust library implementing variable-order Markov chains for sequence anomaly detection in finite alphabets.
## What This Library Does
- Variable-Order Markov Models: Builds contexts of length 1 to max_order from training sequences
- Hierarchical Context Selection: Uses longest available context, falls back to shorter contexts
- Configurable Laplace Smoothing: Adjustable α parameter for probability estimation (default: 1.0)
- Information-Theoretic Scoring: Shannon entropy and KL divergence with lazy computation
- Memory-Optimized Storage: String interning, trie-based contexts, and small collections
- Sliding Window Detection: Analyzes test sequences using overlapping windows
- Parallel Batch Processing: Processes multiple sequences using Rayon
## Known Limitations So Far
Memory Complexity: O(|alphabet|^max_order) theoretical, optimized in practice
Measured results with substantial datasets:
| Scenario | Alphabet Size | Max Order | Elements Processed | Theoretical Max | Actual Contexts | Memory Usage | Efficiency |
|---|---|---|---|---|---|---|---|
| IoT Monitoring | 20 | 3 | 1.6M | 8,420 | 830 | 17.4 MB | 9.9% |
| Security Analysis | 50 | 3 | 1.8M | 127,550 | 19,867 | 352.7 MB | 15.6% |
| Enterprise Monitor | 100 | 2 | 2.4M | 10,100 | 2,892 | 89.9 MB | 28.6% |
| Pattern Detection | 30 | 4 | 2.2M | 837,930 | 8,028 | 135.5 MB | 1.0% |
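The "Efficiency" column is simply the ratio of actual to theoretical contexts. A minimal standalone check against the table's rows:

```rust
/// Efficiency = actual contexts / theoretical maximum, as a percentage.
fn efficiency_pct(actual: u64, theoretical: u64) -> f64 {
    100.0 * actual as f64 / theoretical as f64
}

fn main() {
    // IoT Monitoring row: 830 of 8,420 possible contexts materialized (~9.9%).
    assert!((efficiency_pct(830, 8_420) - 9.9).abs() < 0.1);
    // Security Analysis row: 19,867 of 127,550 (~15.6%).
    assert!((efficiency_pct(19_867, 127_550) - 15.6).abs() < 0.1);
}
```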
Memory optimizations reduce actual usage:
- String interning reduces duplicate string storage
- Trie-based storage shares common context prefixes
- On-demand computation avoids storing precomputed values
- Small collections use memory-efficient data structures
### Algorithm Limitations
- Standard Markov Model: Basic implementation, no theoretical optimality guarantees
- Greedy Context Selection: Uses longest available context (hierarchical fallback)
- Configurable Smoothing: Laplace smoothing with configurable α parameter (default: 1.0)
- Prefix Compression: Uses trie-based storage for memory efficiency
### Practical Constraints
- Memory Bound: Will fail on large state spaces
- Sequential Processing: Not designed for real-time, high-volume streams (no plans to change this; it is out of scope for a solo-maintained project)
## Installation

Add the crate to your `Cargo.toml`:

```toml
[dependencies]
anomaly-grid = "0.2.2"
```
## Basic Usage

```rust
use anomaly_grid::*;
```
## API Reference

### AnomalyDetector

Constructors return `Result` due to validation.

### AnomalyScore
## Mathematical Implementation

### Probability Estimation

```text
P(next_state | context) = (count + α) / (total_count + α × vocab_size)
```

where α is configurable (default: 1.0, Laplace smoothing)
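A minimal standalone sketch of this estimator (illustrative only, not the library's internal code):

```rust
/// Laplace-smoothed conditional probability:
/// P(next | context) = (count + alpha) / (total + alpha * vocab_size)
fn smoothed_prob(count: u64, total: u64, vocab_size: u64, alpha: f64) -> f64 {
    (count as f64 + alpha) / (total as f64 + alpha * vocab_size as f64)
}

fn main() {
    // With the default alpha = 1.0: (3 + 1) / (10 + 1 * 5) = 4/15
    let p = smoothed_prob(3, 10, 5, 1.0);
    assert!((p - 4.0 / 15.0).abs() < 1e-12);

    // An unseen transition still receives non-zero mass: 1/15
    let unseen = smoothed_prob(0, 10, 5, 1.0);
    assert!((unseen - 1.0 / 15.0).abs() < 1e-12);
}
```

Smoothing is what keeps never-observed transitions from collapsing to probability zero, which would make their log-likelihood undefined.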
### Shannon Entropy

```text
H(X) = -∑ P(x) log₂ P(x)
```
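A direct standalone implementation of the formula (zero-probability terms contribute nothing, by convention):

```rust
/// Shannon entropy in bits: H(X) = -sum_x p(x) * log2 p(x)
fn shannon_entropy(probs: &[f64]) -> f64 {
    probs
        .iter()
        .filter(|&&p| p > 0.0) // 0 * log2(0) is taken as 0
        .map(|&p| -p * p.log2())
        .sum()
}

fn main() {
    // Uniform over 4 symbols -> exactly 2 bits.
    assert!((shannon_entropy(&[0.25; 4]) - 2.0).abs() < 1e-12);
    // A deterministic distribution carries zero entropy.
    assert!(shannon_entropy(&[1.0]).abs() < 1e-12);
}
```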
### Context Selection

1. Try the context of length max_order
2. If no transitions are found, try length max_order - 1
3. Continue shortening until a context of length 1
4. If there is still no match, use a uniform probability
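The fallback above can be sketched as a suffix search over a context table. This is a simplified standalone model (a `HashMap` standing in for the library's trie), not the actual implementation:

```rust
use std::collections::HashMap;

/// Hierarchical fallback: try the longest suffix of `history` (up to
/// `max_order`) that exists in the model, shortening until a match is found.
fn select_context<'a>(
    model: &HashMap<Vec<&'a str>, u64>,
    history: &[&'a str],
    max_order: usize,
) -> Option<Vec<&'a str>> {
    let longest = history.len().min(max_order);
    for len in (1..=longest).rev() {
        let ctx = history[history.len() - len..].to_vec();
        if model.contains_key(&ctx) {
            return Some(ctx);
        }
    }
    None // caller falls back to a uniform distribution
}

fn main() {
    let mut model = HashMap::new();
    model.insert(vec!["a", "b"], 5); // order-2 context seen in training
    model.insert(vec!["b"], 9);      // order-1 context

    // The order-3 context ["x","a","b"] is unknown -> falls back to ["a","b"].
    assert_eq!(
        select_context(&model, &["x", "a", "b"], 3),
        Some(vec!["a", "b"])
    );
    // Nothing matches at any length -> None (uniform probability).
    assert_eq!(select_context(&model, &["q"], 3), None);
}
```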
### Anomaly Scoring

```text
anomaly_strength = tanh((log_likelihood_component × likelihood_weight + info_score × information_weight) / normalization_factor)
```

where weights are configurable (default: likelihood_weight = 0.7, information_weight = 0.3, normalization_factor = 10.0)
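With the default weights, the formula reduces to a one-liner; `tanh` squashes the combined score into (-1, 1). A standalone sketch:

```rust
/// anomaly_strength = tanh((ll * w_l + info * w_i) / norm),
/// using the documented default weights.
fn anomaly_strength(log_likelihood: f64, info_score: f64) -> f64 {
    let (w_l, w_i, norm) = (0.7, 0.3, 10.0); // documented defaults
    ((log_likelihood * w_l + info_score * w_i) / norm).tanh()
}

fn main() {
    let s = anomaly_strength(-5.0, 2.0);
    // (-5 * 0.7 + 2 * 0.3) / 10 = -0.29, then tanh(-0.29)
    assert!((s - (-0.29f64).tanh()).abs() < 1e-12);
    // tanh guarantees a bounded score regardless of input magnitude.
    assert!(s > -1.0 && s < 1.0);
}
```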
## Complexity Analysis

### Theoretical Bounds
- Training: O(n × max_order × |alphabet|)
- Detection: O(m × max_order) where m = test sequence length
- Memory: O(|alphabet|^max_order)
Note: Actual performance depends heavily on your data characteristics, sequence patterns, and system resources. Profile your specific use case.
## Configuration

### Choosing `max_order`
- Order 2: Lower memory usage, captures immediate dependencies
- Order 3: Higher memory usage, captures longer patterns
- Order 4+: Exponentially higher memory usage
Recommendation: Start with order 2, measure memory usage, then increase if needed.
### Choosing `threshold`
- Lower values (0.001): More sensitive, more detections
- Higher values (0.1): Less sensitive, fewer detections
Recommendation: Experiment with your data to find appropriate values.
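The sensitivity trade-off is easy to see on a fixed set of anomaly scores (hypothetical values, for illustration only):

```rust
/// Count the scores that exceed a detection threshold.
fn detections(scores: &[f64], threshold: f64) -> usize {
    scores.iter().filter(|&&s| s > threshold).count()
}

fn main() {
    // Hypothetical per-window anomaly scores.
    let scores = [0.0005, 0.004, 0.02, 0.15, 0.6];

    // A sensitive threshold flags most windows...
    assert_eq!(detections(&scores, 0.001), 4);
    // ...while a conservative one flags only the strongest.
    assert_eq!(detections(&scores, 0.1), 2);
}
```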
### Memory Estimation
Note: Actual context count depends on sequence patterns and may be much lower.
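The theoretical ceiling is the sum of all possible contexts of each length, Σ |alphabet|^k for k = 1..max_order. A standalone helper, checked against the "Theoretical Max" column of the table above:

```rust
/// Theoretical maximum context count: sum of |alphabet|^k for k = 1..=max_order.
fn theoretical_max_contexts(alphabet: u64, max_order: u32) -> u64 {
    (1..=max_order).map(|k| alphabet.pow(k)).sum()
}

fn main() {
    // Matches the measurement table: 20 + 400 + 8,000 = 8,420, etc.
    assert_eq!(theoretical_max_contexts(20, 3), 8_420);
    assert_eq!(theoretical_max_contexts(50, 3), 127_550);
    assert_eq!(theoretical_max_contexts(30, 4), 837_930);
}
```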
## Performance Monitoring
The library includes comprehensive performance monitoring:
```rust
use anomaly_grid::*;

// Constructor and training arguments elided; see the API reference above.
let mut detector = AnomalyDetector::new(/* ... */)?;
detector.train(/* training sequences */)?;

// Get performance metrics after training
let metrics = detector.performance_metrics;
println!("{:?}", metrics);

// Detection with monitoring (updates metrics)
let anomalies = detector.detect_anomalies_with_monitoring(/* test sequence */)?;
let updated_metrics = detector.performance_metrics;
println!("{:?}", updated_metrics);

// Apply memory optimizations
use anomaly_grid::OptimizationConfig;
let opt_config = OptimizationConfig::default();
detector.optimize(opt_config)?;
```
## Suitable Use Cases

### Potentially Good Fit
- System logs with limited event types
- Network protocols with small command sets
- User workflows with simple action sequences
- IoT sensors with categorical states
- Educational purposes and algorithm learning
### Likely Poor Fit

- Natural language processing (large vocabularies)
- High-resolution sensor data (continuous values)
- Real-time processing (high-volume streams)
- Large state spaces (many unique states)
Recommendation: Test with your specific data to determine suitability.
## Error Handling

All functions return `AnomalyGridResult<T>`, which is `Result<T, AnomalyGridError>`.

Common errors:

- `InvalidMaxOrder`: max_order = 0
- `SequenceTooShort`: sequence length < min_sequence_length
- `InvalidThreshold`: threshold not in [0, 1]
- `EmptyContextTree`: detection attempted before training
- `MemoryLimitExceeded`: too many contexts created
## Dependencies

```toml
[dependencies]
rayon = "1.10.0"    # Parallel batch processing
smallvec = "1.13.0" # Memory-efficient small collections
```
Minimal dependencies for core functionality and memory optimization.
## Testing

```sh
# Run all tests (includes the comprehensive validation suite)
cargo test

# Run examples
cargo run --example <name>
```
## Version History

### v0.2.2
- String interning for duplicate elimination
- Trie-based context storage with prefix sharing
- On-demand computation with lazy evaluation and caching
- Small collections optimization using SmallVec
- Memory pooling infrastructure
- Enhanced Configuration: Configurable smoothing, weights, and memory limits
- Expanded Test Suite: 72 tests with mathematical validation
### v0.2.0

- Major refactoring: encapsulation of the main functions and a larger test suite
- Documentation reduction: maintaining everything as prose outside the code became chaotic, so the docs were trimmed for simplicity; sufficient examples and comments remain in the code itself
- API consistency: all constructors return `Result<T, AnomalyGridError>`
## Benchmarking Your Use Case
To evaluate if this library suits your needs:
- Start small: Test with a subset of your data
- Monitor memory: Use the performance metrics
- Profile thoroughly: Test with realistic data volumes
- Compare alternatives: Benchmark against other solutions
## When This Library May Not Be Suitable
- Production systems requiring guaranteed performance
- Applications with strict memory constraints
- Real-time systems with strict latency requirements
- Systems requiring theoretical optimality guarantees
- Applications needing adaptive or online learning
## License
MIT License - see LICENSE file.