# Karma
[crates.io](https://crates.io/crates/karma) · [docs.rs](https://docs.rs/karma) · [License: MIT](LICENSE)
A sophisticated and efficient Hidden Markov Model (HMM) implementation in Rust using the Baum-Welch algorithm for training and the Forward algorithm for evaluation.
## Features
- **Modern Rust**: Built with Rust 2021 edition, idiomatic code, and zero unsafe code
- **Type-Safe**: Comprehensive error handling with custom error types
- **Efficient**: Optimized matrix operations with minimal allocations and workspace reuse
- **Well-Tested**: Extensive unit tests, integration tests, and property-based tests
- **Builder Pattern**: Ergonomic API with builder pattern for flexible construction
- **Serialization**: Optional serde support for model persistence
- **Documentation**: Comprehensive API documentation with examples
- **Benchmarked**: Performance benchmarks included for tracking optimizations
## What are Hidden Markov Models?
Hidden Markov Models (HMMs) are statistical models used to represent systems that transition between hidden states while producing observable outputs. They're widely used in:
- **Speech Recognition**: Modeling phonemes and words
- **Natural Language Processing**: Part-of-speech tagging, named entity recognition
- **Bioinformatics**: DNA sequence analysis, protein structure prediction
- **Time Series Analysis**: Financial modeling, anomaly detection
- **Pattern Recognition**: Gesture recognition, activity recognition
An HMM consists of:
- **Hidden States**: The underlying states the system can be in (not directly observable)
- **Observations**: The visible outputs produced by the system
- **Probabilities**:
  - Initial state probabilities (π): Probability of starting in each state
  - Transition probabilities (A): Probability of transitioning from one state to another
  - Emission probabilities (B): Probability of observing a symbol from a given state
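Concretely, for a model with N states and M observation symbols, these are a length-N vector π, an N×N matrix A, and an N×M matrix B. As a minimal sketch of that layout (hypothetical field names, not karma's internal representation):

```rust
/// Illustrative HMM parameter layout (hypothetical; karma's actual
/// internals may differ). Matrices are stored row-major in flat Vecs.
struct HmmParams {
    n_states: usize,       // N
    n_observations: usize, // M
    initial: Vec<f64>,     // π: initial[i] = P(state_0 = i); length N, sums to 1
    transition: Vec<f64>,  // A: transition[i * N + j] = P(state j at t+1 | state i at t); rows sum to 1
    emission: Vec<f64>,    // B: emission[i * M + k] = P(symbol k | state i); rows sum to 1
}
```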
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
karma = "0.2"
```
For serialization support:
```toml
[dependencies]
karma = { version = "0.2", features = ["serde"] }
```
## Quick Start
```rust
use karma::HiddenMarkovModel;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an HMM with 3 hidden states and 4 possible observations
    let mut hmm = HiddenMarkovModel::new(3, 4)?;

    // Train on observation sequences
    hmm.train(&[0, 1, 2, 3, 1, 0], None)?;
    hmm.train(&[1, 2, 1, 0], Some(0.1))?; // Custom learning rate

    // Evaluate probability of a new sequence
    let probability = hmm.evaluate(&[0, 1, 2])?;
    println!("P(sequence|model) = {}", probability);

    Ok(())
}
```
## Usage Examples
### Basic Training and Evaluation
```rust
use karma::HiddenMarkovModel;
// Create model with 20 states and 10 observations
let mut hmm = HiddenMarkovModel::new(20, 10)?;
// Optionally randomize initial probabilities to break symmetry
hmm.randomize_initial_probabilities();
// Train on multiple sequences
let training_data = vec![
    vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    vec![0, 1, 4, 5, 7, 9, 0, 4, 7, 8, 2],
];

for sequence in training_data {
    hmm.train(&sequence, Some(0.05))?; // 0.05 learning rate
}
// Evaluate test sequences
let prob = hmm.evaluate(&[5, 6, 7, 8])?;
println!("Probability: {}", prob);
```
### Using the Builder Pattern
```rust
use karma::HiddenMarkovModel;
let mut hmm = HiddenMarkovModel::builder()
    .states(10)
    .observations(5)
    .randomize(true) // Randomize initial probabilities
    .build()?;
// Train and evaluate as normal
hmm.train(&[0, 1, 2, 3, 4], None)?;
```
### Pattern Classification
Train an HMM to distinguish between different types of sequences:
```rust
use karma::HiddenMarkovModel;
let mut hmm = HiddenMarkovModel::new(15, 4)?;
// Train on alternating patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 0, 1, 0, 1], Some(0.1))?;
}

// Train on sequential patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 2, 3], Some(0.1))?;
}
// Classify new sequences
let prob_alternating = hmm.evaluate(&[0, 1, 0, 1])?;
let prob_sequential = hmm.evaluate(&[0, 1, 2, 3])?;
// Higher probability indicates better match to training data
```
### Model Persistence (with serde feature)
```rust
use karma::HiddenMarkovModel;
let mut hmm = HiddenMarkovModel::new(5, 3)?;
hmm.train(&[0, 1, 2, 0, 1, 2], Some(0.1))?;
// Serialize to JSON
let json = serde_json::to_string(&hmm)?;
// Save to file
std::fs::write("model.json", json)?;
// Load from file
let loaded_json = std::fs::read_to_string("model.json")?;
let loaded_hmm: HiddenMarkovModel = serde_json::from_str(&loaded_json)?;
```
## API Overview
### `HiddenMarkovModel`
#### Construction
- `new(n_states, n_observations)` - Create with uniform initial probabilities
- `builder()` - Get a builder for custom configuration
#### Training
- `train(&observations, learning_rate)` - Train on an observation sequence
  - `observations`: Slice of observation indices in `[0, n_observations)`
  - `learning_rate`: Optional `f64` in `(0.0, 1.0]`, defaults to `0.05`
  - Returns `Result<(), HmmError>`
#### Evaluation
- `evaluate(&observations)` - Compute the probability of an observation sequence under the model, P(O|λ), via the forward algorithm
  - Returns `Result<f64, HmmError>`
#### Utilities
- `randomize_initial_probabilities()` - Break symmetry in initial state probabilities
- `n_states()` - Get number of hidden states
- `n_observations()` - Get number of possible observations
- `initial_probabilities()` - Get initial state probability vector π
- `transition_probabilities()` - Get state transition matrix A
- `emission_probabilities()` - Get emission probability matrix B
### Error Handling
All operations return `Result<T, HmmError>` with detailed error messages:
```rust
use karma::{HiddenMarkovModel, HmmError};
let mut hmm = HiddenMarkovModel::new(3, 4)?;
match hmm.train(&[0, 1, 5], None) {
    Ok(_) => println!("Training successful"),
    Err(HmmError::InvalidObservation(obs, max)) => {
        println!("Invalid observation {} (max: {})", obs, max);
    }
    Err(e) => println!("Error: {}", e),
}
```
## Algorithm Details
### Baum-Welch Algorithm (Training)
The Baum-Welch algorithm is an Expectation-Maximization (EM) algorithm for estimating HMM parameters:
1. **Forward Pass**: Compute forward probabilities (α) for the observation sequence
2. **Backward Pass**: Compute backward probabilities (β)
3. **E-Step**: Compute state occupation probabilities (γ) and state transition probabilities (ξ)
4. **M-Step**: Update the model parameters (π, A, B) using the computed probabilities
The learning rate controls how much the parameters are updated in each iteration (like a step size in gradient descent).
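As a sketch of what such an update can look like (an assumption about the update rule, not necessarily karma's exact code), the re-estimated parameters are blended with the current ones:

```rust
/// Hypothetical blended EM update: move `current` toward the Baum-Welch
/// re-estimate by `learning_rate`. With learning_rate = 1.0 this is the
/// classic full EM update; smaller values take smaller, more stable steps.
/// Illustrative only.
fn blend(current: &mut [f64], reestimate: &[f64], learning_rate: f64) {
    for (p, &q) in current.iter_mut().zip(reestimate) {
        *p = (1.0 - learning_rate) * *p + learning_rate * q;
    }
}
```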
### Forward Algorithm (Evaluation)
1. **Initialization**: α_0(i) = π_i × b_i(o_0)
2. **Recursion**: α_t(j) = [Σ_i α_{t-1}(i) × a_{ij}] × b_j(o_t)
3. **Termination**: P(O|λ) = Σ_i α_{T-1}(i)
Time complexity: O(N² × T) where N is the number of states and T is the sequence length.
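For reference, here is a minimal, self-contained sketch of this recursion, independent of karma's internals (`a` and `b` are row-major flat matrices):

```rust
/// Forward algorithm sketch matching the recurrence above.
/// `pi`: length-n initial distribution; `a`: n×n transitions (row-major);
/// `b`: n×m emissions (row-major); `obs`: observation indices in [0, m).
fn forward(pi: &[f64], a: &[f64], b: &[f64], n: usize, m: usize, obs: &[usize]) -> f64 {
    assert!(!obs.is_empty());
    // Initialization: α_0(i) = π_i × b_i(o_0)
    let mut alpha: Vec<f64> = (0..n).map(|i| pi[i] * b[i * m + obs[0]]).collect();
    let mut next = vec![0.0; n];
    // Recursion: α_t(j) = [Σ_i α_{t-1}(i) × a_ij] × b_j(o_t)
    for &o in &obs[1..] {
        for j in 0..n {
            let sum: f64 = (0..n).map(|i| alpha[i] * a[i * n + j]).sum();
            next[j] = sum * b[j * m + o];
        }
        // Swap buffers instead of reallocating each step
        std::mem::swap(&mut alpha, &mut next);
    }
    // Termination: P(O|λ) = Σ_i α_{T-1}(i)
    alpha.iter().sum()
}
```

The buffer swap at the end of each step is the same double-buffering trick described under Performance below. Note that this naive version can underflow for long sequences; scaling or log-space arithmetic is the usual remedy.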
## Performance
The implementation is optimized for performance:
- **Workspace Reuse**: Pre-allocated workspace that grows as needed
- **Cache-Friendly**: Contiguous memory layout for better cache performance
- **Minimal Allocations**: Reuses buffers across training iterations
- **Double-Buffering**: Alternates between buffers in forward algorithm to avoid copies
Run benchmarks with:
```bash
cargo bench
```
Example benchmark results (will vary by hardware):
```
model_creation/new/20 time: [125 ns ... 135 ns]
training/20states_10obs time: [45 μs ... 50 μs]
evaluation/100 time: [8 μs ... 9 μs]
```
## Examples
Run the included examples:
```bash
# Basic usage example
cargo run --example basic_usage
# Pattern classification example
cargo run --example classification
# Advanced: DNA sequence analysis (bioinformatics)
cargo run --example dna_sequence_analysis
```
### Featured Example: DNA Sequence Analysis 🧬
The `dna_sequence_analysis` example demonstrates a sophisticated real-world application: identifying CpG islands in DNA sequences using HMMs. This example shows:
- **Real bioinformatics application** - CpG island detection for gene annotation
- **Automatic pattern learning** - No manual feature engineering required
- **Clear discrimination** - Achieves 17+ orders of magnitude separation between sequence types
- **Comprehensive analysis** - From data generation to classification to biological insights
- **Educational value** - Explains genomic concepts and dinucleotide patterns
Perfect for understanding how HMMs are used in computational biology!
## Testing
The project includes comprehensive tests:
```bash
# Run unit and integration tests
cargo test
# Run tests with output
cargo test -- --nocapture
# Run property-based tests
cargo test prop_
# Run with all features
cargo test --all-features
```
## Comparison to Original Implementation
This is a complete rewrite of the original 2017 implementation with significant improvements:
### Modernizations
| Aspect | Original (2017) | This rewrite |
|---|---|---|
| Rust Edition | 2015 | 2021 |
| `rand` version | 0.3.15 (2016) | 0.8 (2020) |
| Error Handling | Panics | `Result` types with custom errors |
| API Design | Direct struct access | Builder pattern + getters |
| Type Safety | Mixed `i64`/`usize` | Consistent `usize` for indices |
| Memory | `Box<Vec>` | Direct `Vec` (more efficient) |
| Documentation | Minimal | Comprehensive with examples |
| Tests | 1 basic test | 25+ tests + property-based |
| Benchmarks | None | Full criterion benchmark suite |
| Dependencies | `rand` only | + `thiserror`, + `serde` (optional) |
### Code Quality Improvements
- **Idiomatic Rust**: Follows Rust API guidelines and best practices
- **Type Safety**: Proper use of usize for indexing, no unnecessary conversions
- **Error Handling**: Custom error types with descriptive messages
- **Documentation**: Full rustdoc documentation with examples
- **Testing**: Unit tests, integration tests, and property-based tests with proptest
- **Performance**: Benchmarked and optimized hot paths
### API Changes
Old API:
```rust
extern crate karma;
let mut hmm = karma::Karma::new(20, 10);
hmm.randomize();
hmm.train(&vec![0, 1, 2], None); // Takes &Vec, no error handling
let prob = hmm.evaluate(&vec![0, 1]); // Returns f64 directly
```
New API:
```rust
use karma::HiddenMarkovModel;
let mut hmm = HiddenMarkovModel::new(20, 10)?;
hmm.randomize_initial_probabilities();
hmm.train(&[0, 1, 2], None)?; // Takes &[usize], returns Result
let prob = hmm.evaluate(&[0, 1])?; // Returns Result<f64>
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/psychonautwiki/karma
cd karma
# Run tests
cargo test
# Run benchmarks
cargo bench
# Build documentation
cargo doc --open
```
## License
MIT License - see LICENSE file for details.
Copyright (c) 2017 Kenan Sulayman
Copyright (c) 2017 The PsychonautWiki Contributors
Copyright (c) 2025 Modern Rust Implementation
## References
- Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition"
- Baum, L. E.; Petrie, T.; Soules, G.; Weiss, N. (1970). "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains"
- [Original JavaScript reference implementation](https://github.com/psychonautwiki/karma/blob/master/poc/reference.js)
## Acknowledgments
Original implementation by Kenan Sulayman and PsychonautWiki contributors. This modern rewrite maintains the core algorithms while bringing the codebase up to modern Rust standards.
---
Made with Rust 🦀