anomaly-grid
 █████╗ ███╗   ██╗ ██████╗ ███╗   ███╗ █████╗ ██╗     ██╗   ██╗
██╔══██╗████╗  ██║██╔═══██╗████╗ ████║██╔══██╗██║     ╚██╗ ██╔╝
███████║██╔██╗ ██║██║   ██║██╔████╔██║███████║██║      ╚████╔╝
██╔══██║██║╚██╗██║██║   ██║██║╚██╔╝██║██╔══██║██║       ╚██╔╝
██║  ██║██║ ╚████║╚██████╔╝██║ ╚═╝ ██║██║  ██║███████╗   ██║
╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝ ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝   ╚═╝
[ANOMALY-GRID v0.1.5] - SEQUENCE ANOMALY DETECTION ENGINE
Sequential pattern analysis through variable-order Markov chains with spectral decomposition and quantum state modeling. Built for detecting deviations in finite-alphabet sequences.
⚠️ Development Status
This library is in active development and reflects my ongoing study of advanced anomaly detection methodologies. While the core algorithms are mathematically sound and extensively tested, some areas still require optimization and refinement. I acknowledge that complex mathematical implementations can present edge cases and unexpected behaviors. If you encounter any issues, inconsistencies, or have suggestions for improvement, please don't hesitate to reach out. Your feedback is invaluable for enhancing the library's robustness and reliability.
Known areas for improvement:
- Spectral analysis convergence in edge cases
- Memory optimization for large state spaces
- Performance tuning for specific use cases
- Documentation clarity and completeness
Contact: Please file issues on the repository or reach out directly for technical discussions, bug reports, or collaboration opportunities. I am committed to continuous improvement and appreciate your patience as we (hopefully) refine this research implementation together.
🚀 Quick Start
use anomaly_grid::*;
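Below is a minimal end-to-end sketch. Only AdvancedTransitionModel::new and build_context_tree are taken from elsewhere in this README; the detection call is an assumption, so check the generated API docs for the actual entry point.

// A minimal sketch, assuming the API surface described in this README.
use anomaly_grid::*;

fn main() {
    // Train on a normal sequence of finite-alphabet tokens.
    let normal: Vec<String> = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]
        .iter().map(|s| s.to_string()).collect();

    let mut model = AdvancedTransitionModel::new(3); // context order 3
    model.build_context_tree(&normal).unwrap();

    // Hypothetical detection call -- see the API docs for the real entry point.
    // let anomalies = model.detect_anomalies(&test_sequence, threshold);
}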
🔬 Core Technology Stack
Mathematical Foundation
- Variable-Order Markov Models: Context Tree Weighting with adaptive order selection
- Spectral Analysis: Eigenvalue decomposition of transition matrices with robust convergence
- Information Theory: Shannon entropy, KL divergence, and surprise quantification
- Quantum Modeling: Superposition states with entropy-based phase encoding -- highly speculative and naive implementations that will be removed in later versions and experimented with elsewhere
- Topological Features: Simplified persistent homology and clustering analysis
Multi-Dimensional Scoring
Each anomaly receives 5 independent scores:
- Likelihood Score: prob / sqrt(support) - lower = more anomalous
- Information Score: (surprise + entropy) / length - higher = more anomalous
- Spectral Score: |observed - stationary| - deviation from equilibrium
- Quantum Coherence: 1 - trace/n_states - superposition measurement (the same caveat applies as for the naive quantum implementation noted above)
- Topological Signature: [components, cycles, clustering] - structural complexity
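To make the first two formulas concrete, here is a small standalone sketch; these helpers mirror the definitions above and are not the library's API.

// Illustrative only: these mirror the formulas above, not the crate's API.
fn likelihood_score(prob: f64, support: f64) -> f64 {
    prob / support.sqrt() // lower = more anomalous
}

fn information_score(surprise: f64, entropy: f64, length: usize) -> f64 {
    (surprise + entropy) / length as f64 // higher = more anomalous
}

fn main() {
    // A rare transition (p = 0.01) seen over little support is highly anomalous.
    println!("likelihood: {:.4}", likelihood_score(0.01, 4.0));
    println!("information: {:.4}", information_score(6.64, 1.5, 5));
}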
🎯 Proven Use Cases
Network Security
// Port scan detection
let normal_traffic = vec!["SYN", "SYN_ACK", "ACK", "HTTP_GET", "FIN"]; // illustrative tokens
let attack_pattern = vec!["SYN", "SYN", "SYN", "SYN", "SYN"];
User Behavior Analysis
// Privilege escalation detection
let normal_session = vec!["login", "read_email", "browse", "logout"]; // illustrative tokens
let suspicious_session = vec!["login", "sudo", "chmod_777", "access_admin"];
Financial Fraud
// Velocity attack detection
let normal_transactions = vec!["deposit", "small_purchase", "small_purchase", "withdrawal"]; // illustrative tokens
let fraud_pattern = vec!["large_purchase", "large_purchase", "large_purchase", "large_purchase"];
System Monitoring
// Service crash detection
let normal_logs = vec!["START", "RUNNING", "HEALTH_OK", "RUNNING"]; // illustrative tokens
let anomalous_logs = vec!["START", "RUNNING", "ERROR", "CRASH", "RESTART"];
Bioinformatics
// DNA mutation detection
let normal_gene = vec!["A", "T", "G", "C", "A", "T", "G", "C"]; // illustrative bases
let mutation = vec!["A", "T", "G", "G", "A", "T", "G", "C"];
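All five scenarios follow the same recipe: train on the normal sequence, then score the suspect one. A sketch using the network example (build_context_tree appears later in this README; the detection call is an assumption):

// Sketch: train on normal data, then score a suspect sequence.
let normal: Vec<String> = normal_traffic.iter().map(|s| s.to_string()).collect();
let suspect: Vec<String> = attack_pattern.iter().map(|s| s.to_string()).collect();

let mut model = AdvancedTransitionModel::new(3);
model.build_context_tree(&normal).unwrap();
// let anomalies = model.detect_anomalies(&suspect, 0.001); // hypothetical call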
⚡ Performance Characteristics
Computational Complexity
Training: O(n × k × order) where n=sequence_length, k=alphabet_size
Detection: O(m × k × log(k)) where m=test_length
Memory: O(k^order) exponential in context depth
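Plugging representative numbers into these bounds gives a feel for the scale; the snippet below is plain arithmetic over the stated complexities, not a measurement.

// Back-of-the-envelope estimates from the complexity bounds above.
fn main() {
    let (n, k, order, m) = (10_000usize, 20usize, 3u32, 1_000usize);
    let training_ops = n * k * order as usize;                     // 600,000
    let detection_ops = m * k * (k as f64).log2().ceil() as usize; // 100,000
    let contexts = k.pow(order);                                   // 20^3 = 8,000
    println!("train ~{training_ops}, detect ~{detection_ops}, contexts ~{contexts}");
}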
Parallel Processing
// Batch analysis across multiple sequences
let sequences = vec![seq_a, seq_b, seq_c]; // illustrative: a Vec of token sequences
let results = batch_process_sequences(&sequences, 3); // arguments assumed; see API docs
// Processes all sequences in parallel using Rayon
🛠️ Installation & Dependencies
[dependencies]
anomaly-grid = "0.1.5"

# Or add manually:
nalgebra = "0.33.2" # Linear algebra operations
ndarray = "0.16.1"  # N-dimensional arrays
rayon = "1.10.0"    # Parallel processing
📊 Advanced Usage
Model Configuration
// Recommended parameters for different scenarios
// (the context orders shown are illustrative, not prescriptions)
let network_detector = AdvancedTransitionModel::new(3);   // Network protocols
let user_detector = AdvancedTransitionModel::new(4);      // User sessions
let financial_detector = AdvancedTransitionModel::new(2); // Transactions
let bio_detector = AdvancedTransitionModel::new(5);       // DNA sequences
Training Requirements
// Minimum data requirements for stable analysis
let min_sequence_length = 20 * max_order; // Statistical significance
let min_examples_per_symbol = 5; // Reliable probability estimates
let recommended_alphabet_size = 10..=50; // Memory vs. expressiveness trade-off
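These heuristics are easy to check before training; a minimal pre-flight sketch (the helper is hypothetical, not part of the crate):

// Hypothetical pre-flight check derived from the heuristics above.
fn has_enough_data(seq_len: usize, alphabet_size: usize, max_order: usize) -> bool {
    let min_len = 20 * max_order;           // statistical significance
    let min_per_symbol = 5 * alphabet_size; // ~5 examples per symbol on average
    seq_len >= min_len && seq_len >= min_per_symbol
}

fn main() {
    assert!(has_enough_data(500, 20, 3)); // 500 >= 60 and 500 >= 100
    assert!(!has_enough_data(40, 20, 3)); // too short on both counts
}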
Result Interpretation
// Field names below are assumptions based on the scoring section; see API docs.
for anomaly in anomalies {
    println!("sequence: {:?}", anomaly.sequence);
    println!("likelihood:  {:.6}", anomaly.likelihood_score);  // lower = more anomalous
    println!("information: {:.6}", anomaly.information_score); // higher = more anomalous
    println!("spectral:    {:.6}", anomaly.spectral_score);
}
🧪 Testing & Validation
Comprehensive Test Suite
# Run all tests with detailed output
cargo test -- --nocapture

# Individual test categories (filter by test name; names illustrative)
cargo test spectral
cargo test entropy
Mathematical Validation
The library automatically validates:
- Probability Conservation: All context probabilities sum to 1.0
- Entropy Bounds: 0 ≤ entropy ≤ log₂(alphabet_size)
- Spectral Stability: Eigenvalue convergence within tolerance
- Numerical Precision: No NaN/infinity propagation
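As an illustration of the second invariant, the entropy bound is cheap to verify by hand; this standalone check mirrors the invariant but is not the library's validation code:

// Standalone check of the entropy bound: 0 <= H(p) <= log2(alphabet_size).
fn shannon_entropy(p: &[f64]) -> f64 {
    p.iter().filter(|&&x| x > 0.0).map(|&x| -x * x.log2()).sum()
}

fn main() {
    let p = vec![0.5, 0.25, 0.125, 0.125]; // must sum to 1.0
    let h = shannon_entropy(&p);
    let bound = (p.len() as f64).log2();
    assert!(h >= 0.0 && h <= bound); // H = 1.75, bound = 2.0
    println!("entropy = {h:.3}, bound = {bound:.3}");
}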
🚨 Known Limitations
Memory Scaling
// Memory usage grows exponentially with context order
let contexts_10_3 = 10_usize.pow(3); // 1,000 contexts
let contexts_50_3 = 50_usize.pow(3); // 125,000 contexts
let contexts_10_5 = 10_usize.pow(5); // 100,000 contexts

// Recommended limits (illustrative; tune for your memory budget):
assert!(alphabet_size <= 50);
assert!(max_order <= 5);
assert!(alphabet_size.pow(max_order) <= 1_000_000);
Spectral Analysis Constraints
- Matrix Conditioning: Large/sparse matrices may have unstable eigenvalues
- Convergence Issues: Disconnected graphs may not reach stationary distribution
- Computational Cost: O(n³) eigenvalue decomposition for n states
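The convergence concern is easy to picture with power iteration: a well-connected chain settles to its stationary distribution quickly, while a disconnected one never mixes. A standalone sketch (not the crate's spectral solver):

// Power iteration toward the stationary distribution of a 2-state chain.
// This mirrors the convergence concern above; it is not the crate's code.
fn main() {
    let p = [[0.9, 0.1], [0.5, 0.5]]; // row-stochastic transition matrix
    let mut dist = [1.0, 0.0];
    for _ in 0..100 {
        let next = [
            dist[0] * p[0][0] + dist[1] * p[1][0],
            dist[0] * p[0][1] + dist[1] * p[1][1],
        ];
        let converged = (next[0] - dist[0]).abs() < 1e-12;
        dist = next;
        if converged { break; }
    }
    println!("stationary ~ {dist:?}"); // ~ [0.8333, 0.1667]
}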
Quantum Features Disclaimer - Speculative implementations
- Simplified Implementation: Not full quantum computation; this area is highly speculative for many reasons, so don't crucify me xD
- Phase Encoding: Based on classical entropy values only
- Coherence Measure: Approximation of true quantum coherence
In later versions these features will be removed entirely, since I will continue experimenting with them in a separate project; until then, we can enjoy some naive implementations :)
🔧 Configuration Tuning
Sensitivity vs. False Positives
let threshold = match use_case {
    // Illustrative values; the UseCase enum is hypothetical -- tune per deployment
    UseCase::NetworkSecurity => 0.01,   // more sensitive, more false positives
    UseCase::UserBehavior    => 0.001,  // balanced
    UseCase::FinancialFraud  => 0.0001, // fewer false positives
    _ => 0.001,
};
Memory Optimization
// For large alphabets, consider preprocessing:
Example: I set up the helper function along these lines:
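A minimal sketch of such a helper, assuming String tokens and a caller-supplied class map (the names and mappings are illustrative):

use std::collections::HashMap;

// Collapse fine-grained tokens into coarser classes.
// Tokens without a mapping pass through unchanged.
fn reduce_alphabet(sequence: &[String], classes: &HashMap<String, String>) -> Vec<String> {
    sequence
        .iter()
        .map(|t| classes.get(t).cloned().unwrap_or_else(|| t.clone()))
        .collect()
}

fn main() {
    let mut classes = HashMap::new();
    for verb in ["HTTP_GET", "HTTP_POST", "HTTP_PUT"] {
        classes.insert(verb.to_string(), "HTTP_REQUEST".to_string());
    }
    let raw: Vec<String> = ["HTTP_GET", "HTTP_POST", "DNS_QUERY"]
        .iter().map(|s| s.to_string()).collect();
    let processed_sequence_data = reduce_alphabet(&raw, &classes);
    println!("{processed_sequence_data:?}"); // ["HTTP_REQUEST", "HTTP_REQUEST", "DNS_QUERY"]
}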
Skip ahead if the code already made this clear. I am the worst programmer in the world, so I would not be surprised if it did not; for normies like me, here is the memory-optimization explanation:
The key benefit of reduce_alphabet for memory optimization comes when this processed_sequence_data is then used to build data structures that depend on the uniqueness of the elements, such as a HashMap for contexts in a Markov model.

So if you were to build a HashMap<Vec<String>, usize> to count occurrences of different patterns: without reduce_alphabet, "HTTP_GET", "HTTP_POST", and "HTTP_PUT" would be distinct keys. With reduce_alphabet, they all become "HTTP_REQUEST", reducing the number of unique keys and with that the memory consumed by the HashMap and its associated data.
Example (conceptual, assumes you use our AdvancedTransitionModel or similar):

let mut model = AdvancedTransitionModel::new(3);
model.build_context_tree(&processed_sequence_data).unwrap();
// This step uses less memory than it would with raw_sequence_data.
The 'AdvancedTransitionModel' (from this lib) internally builds
a context tree using a HashMap to store ContextNodes. Each ContextNode
also contains HashMaps for counts and probabilities.
By reducing the alphabet, you directly decrease the number of unique 'states' that appear in these HashMaps, leading to:
1. Fewer entries in the top-level 'contexts' HashMap.
2. Fewer entries in the 'counts' and 'probabilities' HashMaps within each 'ContextNode'.
This reduces the overall memory footprint of the model, especially for high-order Markov models and long sequences with many distinct original states.
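A quick way to see the effect is to count unique context keys before and after reduction. This standalone snippet uses order-2 contexts as a proxy for model memory (not the crate's internals):

use std::collections::HashMap;

// Count unique order-2 contexts, as a proxy for model memory.
fn count_contexts(seq: &[String]) -> usize {
    let mut contexts: HashMap<Vec<String>, usize> = HashMap::new();
    for window in seq.windows(2) {
        *contexts.entry(window.to_vec()).or_insert(0) += 1;
    }
    contexts.len()
}

fn main() {
    let raw: Vec<String> = ["HTTP_GET", "DNS_QUERY", "HTTP_POST", "DNS_QUERY", "HTTP_PUT"]
        .iter().map(|s| s.to_string()).collect();
    let processed: Vec<String> = raw.iter()
        .map(|t| if t.starts_with("HTTP_") { "HTTP_REQUEST".to_string() } else { t.clone() })
        .collect();
    println!("raw contexts: {}", count_contexts(&raw));             // 4 unique pairs
    println!("processed contexts: {}", count_contexts(&processed)); // 2 unique pairs
}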
Performance Optimization
// Use batch processing for multiple sequences
use rayon::prelude::*;

let results: Vec<_> = sequences
    .par_iter()                       // Parallel processing via Rayon
    .map(|seq| analyze_sequence(seq)) // hypothetical per-sequence analysis fn
    .collect();
📚 Documentation
- User Manual: Comprehensive developer guide with examples
- API Documentation: Generated from source code
📈 Roadmap
Version 0.2.0 (Planned)
- Streaming anomaly detection for real-time systems
- Advanced topological analysis with true persistent homology
- GPU acceleration for large-scale datasets
- Integration with popular ML frameworks (PyTorch, TensorFlow)
Version 0.3.0 (Future)
- Distributed processing across multiple machines
- Advanced quantum algorithms for state analysis
- Automated hyperparameter optimization
- Web-based visualization dashboard
🤝 Contributing
# Development setup
git clone <repository-url>
cd anomaly-grid
cargo build && cargo test

# Run comprehensive benchmarks
cargo bench
📄 License
Licensed under the MIT License. See LICENSE for details.