# Karma
A sophisticated and efficient Hidden Markov Model (HMM) implementation in Rust using the Baum-Welch algorithm for training and the Forward algorithm for evaluation.
## Features
- Modern Rust: Built with Rust 2021 edition, idiomatic code, and zero unsafe code
- Type-Safe: Comprehensive error handling with custom error types
- Efficient: Optimized matrix operations with minimal allocations and workspace reuse
- Well-Tested: Extensive unit tests, integration tests, and property-based tests
- Builder Pattern: Ergonomic API with builder pattern for flexible construction
- Serialization: Optional serde support for model persistence
- Documentation: Comprehensive API documentation with examples
- Benchmarked: Performance benchmarks included for tracking optimizations
## What are Hidden Markov Models?
Hidden Markov Models (HMMs) are statistical models used to represent systems that transition between hidden states while producing observable outputs. They're widely used in:
- Speech Recognition: Modeling phonemes and words
- Natural Language Processing: Part-of-speech tagging, named entity recognition
- Bioinformatics: DNA sequence analysis, protein structure prediction
- Time Series Analysis: Financial modeling, anomaly detection
- Pattern Recognition: Gesture recognition, activity recognition
An HMM consists of:
- Hidden States: The underlying states the system can be in (not directly observable)
- Observations: The visible outputs produced by the system
- Probabilities:
  - Initial state probabilities (π): Probability of starting in each state
  - Transition probabilities (A): Probability of transitioning from one state to another
  - Emission probabilities (B): Probability of observing a symbol from a given state
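As a toy illustration (all numbers below are invented for this README, not taken from the library), a two-state weather model with two observable activities reduces to three small tables:

```rust
// Hypothetical two-state weather HMM: states {Sunny, Rainy},
// observations {Walk, Shop}. All numbers are illustrative.
fn main() {
    // π: probability of starting in Sunny / Rainy
    let pi = [0.6, 0.4];
    // A: transition probabilities, rows = from-state, columns = to-state
    let a = [[0.7, 0.3], [0.4, 0.6]];
    // B: emission probabilities, rows = state, columns = observation
    let b = [[0.9, 0.1], [0.2, 0.8]];
    println!("π = {pi:?}\nA = {a:?}\nB = {b:?}");
}
```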
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
karma = "0.2"
```
For serialization support:

```toml
[dependencies]
karma = { version = "0.2", features = ["serde"] }
```
## Quick Start
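A minimal end-to-end sketch (the dimensions and training data here are illustrative, and `karma::HmmError` is assumed to be the error type named in the API section below):

```rust
use karma::{HiddenMarkovModel, HmmError};

fn main() -> Result<(), HmmError> {
    // Two hidden states over a two-symbol alphabet (illustrative sizes)
    let mut hmm = HiddenMarkovModel::new(2, 2)?;

    // Train on an alternating sequence with the default learning rate
    hmm.train(&[0, 1, 0, 1, 0, 1, 0, 1], None)?;

    // Score a new sequence against the trained model
    let prob = hmm.evaluate(&[0, 1, 0, 1])?;
    println!("P(sequence | model) = {prob}");
    Ok(())
}
```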
## Usage Examples

### Basic Training and Evaluation

```rust
use karma::HiddenMarkovModel;

// Create a model with 20 states and 10 observations
let mut hmm = HiddenMarkovModel::new(20, 10)?;

// Optionally randomize initial probabilities to break symmetry
hmm.randomize_initial_probabilities();

// Train on multiple sequences (example data)
let training_data = vec![
    vec![0, 1, 2, 3, 4],
    vec![5, 6, 7, 8, 9],
];
for sequence in &training_data {
    hmm.train(sequence, None)?;
}

// Evaluate a test sequence
let prob = hmm.evaluate(&[0, 1, 2, 3, 4])?;
println!("P(sequence | model) = {prob}");
```
### Using the Builder Pattern

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::builder()
    .states(20)
    .observations(10)
    .randomize() // Randomize initial probabilities
    .build()?;

// Train and evaluate as normal
hmm.train(&[0, 1, 2, 3], None)?;
```
### Pattern Classification

Train an HMM to distinguish between different types of sequences:

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(4, 2)?;

// Train on alternating patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 0, 1, 0, 1, 0, 1], None)?;
}

// Train on sequential patterns
for _ in 0..20 {
    hmm.train(&[0, 0, 0, 0, 1, 1, 1, 1], None)?;
}

// Classify new sequences
let prob_alternating = hmm.evaluate(&[0, 1, 0, 1, 0, 1])?;
let prob_sequential = hmm.evaluate(&[0, 0, 0, 1, 1, 1])?;

// A higher probability indicates a better match to the training data
```
### Model Persistence (with serde feature)

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(4, 2)?;
hmm.train(&[0, 1, 0, 1, 0, 1], None)?;

// Serialize to JSON
let json = serde_json::to_string(&hmm)?;

// Save to file
std::fs::write("model.json", &json)?;

// Load from file
let loaded_json = std::fs::read_to_string("model.json")?;
let loaded_hmm: HiddenMarkovModel = serde_json::from_str(&loaded_json)?;
```
## API Overview

### `HiddenMarkovModel`

#### Construction

- `new(n_states, n_observations)` - Create with uniform initial probabilities
- `builder()` - Get a builder for custom configuration
#### Training

- `train(&observations, learning_rate)` - Train on an observation sequence
  - `observations`: slice of observation indices in `[0, n_observations)`
  - `learning_rate`: optional `f64` in `(0.0, 1.0]`, defaults to `0.05`
  - Returns `Result<(), HmmError>`
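For example, to override the default learning rate (the sequence data here is illustrative):

```rust
// Use a larger step of 0.1 instead of the 0.05 default
hmm.train(&[0, 1, 2, 0, 1, 2], Some(0.1))?;
```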
#### Evaluation

- `evaluate(&observations)` - Compute P(observations | model)
  - Returns `Result<f64, HmmError>`
#### Utilities

- `randomize_initial_probabilities()` - Break symmetry in initial state probabilities
- `n_states()` - Get the number of hidden states
- `n_observations()` - Get the number of possible observations
- `initial_probabilities()` - Get the initial state probability vector π
- `transition_probabilities()` - Get the state transition matrix A
- `emission_probabilities()` - Get the emission probability matrix B
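For example, given a trained `hmm`, the learned parameters can be inspected via these getters:

```rust
println!("{} states over {} symbols", hmm.n_states(), hmm.n_observations());
let pi = hmm.initial_probabilities();   // π
let a = hmm.transition_probabilities(); // A
let b = hmm.emission_probabilities();   // B
```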
## Error Handling

All operations return `Result<T, HmmError>` with detailed error messages:

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(4, 2)?;
match hmm.train(&[0, 99], None) {
    Ok(()) => println!("training succeeded"),
    Err(e) => eprintln!("training failed: {e}"), // e.g. out-of-range observation
}
```
## Algorithm Details

### Baum-Welch Algorithm (Training)
The Baum-Welch algorithm is an Expectation-Maximization (EM) algorithm for estimating HMM parameters:
- Forward Pass (α): Compute forward probabilities α_t(i) = P(o_1...o_t, q_t=i | λ)
- Backward Pass (β): Compute backward probabilities β_t(i) = P(o_{t+1}...o_T | q_t=i, λ)
- E-Step: Compute state occupation probabilities (γ) and state transition probabilities (ξ)
- M-Step: Update model parameters (π, A, B) using computed probabilities
The learning rate controls how much the parameters are updated in each iteration (like a step size in gradient descent).
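The crate's exact update rule isn't reproduced here, but a step size of this kind is commonly applied as a blend between the current parameters and the EM re-estimates, as in this illustrative sketch:

```rust
// Illustrative only: blend old parameters toward the EM re-estimates.
// new = (1 - lr) * old + lr * reestimated
fn blend(old: &[f64], reestimated: &[f64], lr: f64) -> Vec<f64> {
    old.iter()
        .zip(reestimated)
        .map(|(o, r)| (1.0 - lr) * o + lr * r)
        .collect()
}
```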
### Forward Algorithm (Evaluation)
Efficiently computes P(observations | model) using dynamic programming:
- Initialization: α_1(i) = π_i × b_i(o_1)
- Recursion: α_t(j) = [Σ_i α_{t-1}(i) × a_{ij}] × b_j(o_t)
- Termination: P(O|λ) = Σ_i α_T(i)
Time complexity: O(N² × T) where N is the number of states and T is the sequence length.
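As a reference, here is a self-contained sketch of the forward algorithm over plain `Vec` tables (independent of this crate's internal types, which are not shown here):

```rust
/// Forward algorithm: computes P(obs | model) for an N-state HMM.
/// `pi` is π, `a` is the N×N transition matrix, `b` is the N×M emission matrix.
fn forward(pi: &[f64], a: &[Vec<f64>], b: &[Vec<f64>], obs: &[usize]) -> f64 {
    let n = pi.len();
    // Initialization: α_1(i) = π_i × b_i(o_1)
    let mut alpha: Vec<f64> = (0..n).map(|i| pi[i] * b[i][obs[0]]).collect();
    let mut next = vec![0.0; n];
    // Recursion: α_t(j) = [Σ_i α_{t-1}(i) × a_ij] × b_j(o_t)
    for &o in &obs[1..] {
        for j in 0..n {
            next[j] = (0..n).map(|i| alpha[i] * a[i][j]).sum::<f64>() * b[j][o];
        }
        // Double-buffering: swap instead of copying (cf. Performance below)
        std::mem::swap(&mut alpha, &mut next);
    }
    // Termination: P(O | λ) = Σ_i α_T(i)
    alpha.iter().sum()
}
```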
## Performance
The implementation is optimized for performance:
- Workspace Reuse: Pre-allocated workspace that grows as needed
- Cache-Friendly: Contiguous memory layout for better cache performance
- Minimal Allocations: Reuses buffers across training iterations
- Double-Buffering: Alternates between buffers in forward algorithm to avoid copies
Run benchmarks with:

```bash
cargo bench
```

Example benchmark results (will vary by hardware):

```text
model_creation/new/20      time: [125 ns ... 135 ns]
training/20states_10obs    time: [45 μs ... 50 μs]
evaluation/100             time: [8 μs ... 9 μs]
```
## Examples

Run the included examples:

```bash
# Basic usage example
cargo run --example basic_usage

# Pattern classification example
cargo run --example pattern_classification

# Advanced: DNA sequence analysis (bioinformatics)
cargo run --example dna_sequence_analysis
```
### Featured Example: DNA Sequence Analysis 🧬

The `dna_sequence_analysis` example demonstrates a sophisticated real-world application: identifying CpG islands in DNA sequences using HMMs. This example shows:
- Real bioinformatics application - CpG island detection for gene annotation
- Automatic pattern learning - No manual feature engineering required
- Clear discrimination - Achieves 17+ orders of magnitude separation between sequence types
- Comprehensive analysis - From data generation to classification to biological insights
- Educational value - Explains genomic concepts and dinucleotide patterns
Perfect for understanding how HMMs are used in computational biology!
## Testing

The project includes comprehensive tests:

```bash
# Run unit and integration tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run property-based tests
cargo test --test property_tests

# Run with all features
cargo test --all-features
```
## Comparison to Original Implementation
This is a complete rewrite of the original 2017 implementation with significant improvements:
### Modernizations
| Aspect | Old (2017) | New (2025) |
|---|---|---|
| Rust Edition | 2015 | 2021 |
| rand version | 0.3.15 (2016) | 0.8 (2024) |
| Error Handling | Panics | Result types with custom errors |
| API Design | Direct struct access | Builder pattern + getters |
| Type Safety | Mixed i64/usize | Consistent usize for indices |
| Memory | Box<Vec> | Direct Vec (more efficient) |
| Documentation | Minimal | Comprehensive with examples |
| Tests | 1 basic test | 25+ tests + property-based |
| Benchmarks | None | Full criterion benchmark suite |
| Dependencies | rand only | +thiserror, +serde (optional) |
### Code Quality Improvements
- Idiomatic Rust: Follows Rust API guidelines and best practices
- Type Safety: Proper use of usize for indexing, no unnecessary conversions
- Error Handling: Custom error types with descriptive messages
- Documentation: Full rustdoc documentation with examples
- Testing: Unit tests, integration tests, and property-based tests with proptest
- Performance: Benchmarked and optimized hot paths
### API Changes

Old API:

```rust
extern crate karma;

let mut hmm = karma::HiddenMarkovModel::new(20, 10);
hmm.randomize();
hmm.train(&observations); // Takes &Vec, no error handling
let prob = hmm.evaluate(&observations); // Returns f64 directly
```
New API:

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(20, 10)?;
hmm.randomize_initial_probabilities();
hmm.train(&observations, None)?; // Takes &[usize], returns Result
let prob = hmm.evaluate(&observations)?; // Returns Result<f64>
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
### Development Setup

```bash
# Clone the repository
git clone <repository-url>
cd karma

# Run tests
cargo test

# Run benchmarks
cargo bench

# Build documentation
cargo doc --open
```
## License

MIT License - see the LICENSE file for details.

Copyright (c) 2017 Kenan Sulayman
Copyright (c) 2017 The PsychonautWiki Contributors
Copyright (c) 2025 Modern Rust Implementation
## References
- Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition"
- Baum, L. E.; Petrie, T.; Soules, G.; Weiss, N. (1970). "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains"
- Original JavaScript reference implementation
## Acknowledgments
Original implementation by Kenan Sulayman and PsychonautWiki contributors. This modern rewrite maintains the core algorithms while bringing the codebase up to modern Rust standards.
Made with Rust 🦀