Karma

Crates.io Documentation License: MIT

A sophisticated and efficient Hidden Markov Model (HMM) implementation in Rust using the Baum-Welch algorithm for training and the Forward algorithm for evaluation.

Features

  • Modern Rust: Built with Rust 2021 edition, idiomatic code, and zero unsafe code
  • Type-Safe: Comprehensive error handling with custom error types
  • Efficient: Optimized matrix operations with minimal allocations and workspace reuse
  • Well-Tested: Extensive unit tests, integration tests, and property-based tests
  • Builder Pattern: Ergonomic API with builder pattern for flexible construction
  • Serialization: Optional serde support for model persistence
  • Documentation: Comprehensive API documentation with examples
  • Benchmarked: Performance benchmarks included for tracking optimizations

What are Hidden Markov Models?

Hidden Markov Models (HMMs) are statistical models used to represent systems that transition between hidden states while producing observable outputs. They're widely used in:

  • Speech Recognition: Modeling phonemes and words
  • Natural Language Processing: Part-of-speech tagging, named entity recognition
  • Bioinformatics: DNA sequence analysis, protein structure prediction
  • Time Series Analysis: Financial modeling, anomaly detection
  • Pattern Recognition: Gesture recognition, activity recognition

An HMM consists of three components (a toy example follows this list):

  • Hidden States: The underlying states the system can be in (not directly observable)
  • Observations: The visible outputs produced by the system
  • Probabilities:
    • Initial state probabilities (π): Probability of starting in each state
    • Transition probabilities (A): Probability of transitioning from one state to another
    • Emission probabilities (B): Probability of observing a symbol from a given state

Installation

Add this to your Cargo.toml:

[dependencies]
karma = "0.2"

For serialization support:

[dependencies]
karma = { version = "0.2", features = ["serde"] }

Quick Start

use karma::HiddenMarkovModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an HMM with 3 hidden states and 4 possible observations
    let mut hmm = HiddenMarkovModel::new(3, 4)?;

    // Train on observation sequences
    hmm.train(&[0, 1, 2, 3, 1, 0], None)?;
    hmm.train(&[1, 2, 1, 0], Some(0.1))?; // Custom learning rate

    // Evaluate probability of a new sequence
    let probability = hmm.evaluate(&[0, 1, 2])?;
    println!("P(sequence|model) = {}", probability);

    Ok(())
}

Usage Examples

Basic Training and Evaluation

use karma::HiddenMarkovModel;

// Create model with 20 states and 10 observations
let mut hmm = HiddenMarkovModel::new(20, 10)?;

// Optionally randomize initial probabilities to break symmetry
hmm.randomize_initial_probabilities();

// Train on multiple sequences
let training_data = vec![
    vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    vec![0, 1, 4, 5, 7, 9, 0, 4, 7, 8, 2],
];

for sequence in training_data {
    hmm.train(&sequence, Some(0.05))?; // 0.05 learning rate
}

// Evaluate test sequences
let prob = hmm.evaluate(&[5, 6, 7, 8])?;
println!("Probability: {}", prob);

Using the Builder Pattern

use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::builder()
    .states(10)
    .observations(5)
    .randomize(true) // Randomize initial probabilities
    .build()?;

// Train and evaluate as normal
hmm.train(&[0, 1, 2, 3, 4], None)?;

Pattern Classification

Train an HMM to distinguish between different types of sequences:

use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(15, 4)?;

// Train on alternating patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 0, 1, 0, 1], Some(0.1))?;
}

// Train on sequential patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 2, 3], Some(0.1))?;
}

// Classify new sequences
let prob_alternating = hmm.evaluate(&[0, 1, 0, 1])?;
let prob_sequential = hmm.evaluate(&[0, 1, 2, 3])?;

// Higher probability indicates better match to training data

Model Persistence (with serde feature)

use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(5, 3)?;
hmm.train(&[0, 1, 2, 0, 1, 2], Some(0.1))?;

// Serialize to JSON (requires the crate's `serde` feature and a
// serde_json dependency in your Cargo.toml)
let json = serde_json::to_string(&hmm)?;

// Save to file
std::fs::write("model.json", json)?;

// Load from file
let loaded_json = std::fs::read_to_string("model.json")?;
let loaded_hmm: HiddenMarkovModel = serde_json::from_str(&loaded_json)?;

API Overview

HiddenMarkovModel

Construction

  • new(n_states, n_observations) - Create with uniform initial probabilities
  • builder() - Get a builder for custom configuration

Training

  • train(&observations, learning_rate) - Train on an observation sequence
    • observations: Slice of observation indices [0, n_observations)
    • learning_rate: Optional f64 in (0.0, 1.0], defaults to 0.05
    • Returns Result<(), HmmError>

Evaluation

  • evaluate(&observations) - Compute P(observations | model)
    • Returns Result<f64, HmmError>

Utilities

  • randomize_initial_probabilities() - Break symmetry in initial state probabilities
  • n_states() - Get number of hidden states
  • n_observations() - Get number of possible observations
  • initial_probabilities() - Get initial state probability vector π
  • transition_probabilities() - Get state transition matrix A
  • emission_probabilities() - Get emission probability matrix B
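
The getters make it easy to inspect a model after training. A minimal sketch (the method names come from the list above, but the exact return types are simplified here; check the API docs for the real signatures):

use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(3, 4)?;
hmm.train(&[0, 1, 2, 3], None)?;

println!("{} states, {} observations", hmm.n_states(), hmm.n_observations());

// Inspect the learned parameters π, A, and B (bindings shown for illustration).
let pi = hmm.initial_probabilities();
let a = hmm.transition_probabilities();
let b = hmm.emission_probabilities();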

Error Handling

All operations return Result<T, HmmError> with detailed error messages:

use karma::{HiddenMarkovModel, HmmError};

let mut hmm = HiddenMarkovModel::new(3, 4)?;

match hmm.train(&[0, 1, 5], None) {
    Ok(_) => println!("Training successful"),
    Err(HmmError::InvalidObservation(obs, max)) => {
        println!("Invalid observation {} (max: {})", obs, max);
    }
    Err(e) => println!("Error: {}", e),
}

Algorithm Details

Baum-Welch Algorithm (Training)

The Baum-Welch algorithm is an Expectation-Maximization (EM) algorithm for estimating HMM parameters:

  1. Forward Pass (α): Compute forward probabilities α_t(i) = P(o_1...o_t, q_t=i | λ)
  2. Backward Pass (β): Compute backward probabilities β_t(i) = P(o_{t+1}...o_T | q_t=i, λ)
  3. E-Step: Compute state occupation probabilities (γ) and state transition probabilities (ξ)
  4. M-Step: Update model parameters (π, A, B) using computed probabilities

The learning rate controls how much the parameters are updated in each iteration (like a step size in gradient descent).
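
For reference, the E-step quantities are the standard ones:

  γ_t(i) = α_t(i) × β_t(i) / P(O|λ)
  ξ_t(i,j) = α_t(i) × a_{ij} × b_j(o_{t+1}) × β_{t+1}(j) / P(O|λ)

One natural reading of a learning rate η, consistent with the description above (the crate's exact update rule may differ), is a convex blend of the old parameters with the EM re-estimates, e.g. a_{ij} ← (1 − η) × a_{ij} + η × â_{ij}, and likewise for π and B.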

Forward Algorithm (Evaluation)

Efficiently computes P(observations | model) using dynamic programming:

  1. Initialization: α_0(i) = π_i × b_i(o_0)
  2. Recursion (for t = 1 … T−1): α_t(j) = [Σ_i α_{t-1}(i) × a_{ij}] × b_j(o_t)
  3. Termination: P(O|λ) = Σ_i α_{T-1}(i)

Time complexity: O(N² × T) where N is the number of states and T is the sequence length.
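
For illustration, the recursion can be written as a small standalone function. This is a generic sketch of the textbook algorithm, not the crate's internal code; it assumes row-stochastic parameter matrices and a non-empty observation sequence.

/// Computes P(obs | model) via the forward algorithm.
/// `a[i][j]` = P(state i -> state j), `b[i][k]` = P(emit symbol k | state i).
fn forward_probability(pi: &[f64], a: &[Vec<f64>], b: &[Vec<f64>], obs: &[usize]) -> f64 {
    let n = pi.len();
    // Double-buffering: `prev` holds α_{t-1}, `curr` receives α_t.
    let mut prev: Vec<f64> = (0..n).map(|i| pi[i] * b[i][obs[0]]).collect();
    let mut curr = vec![0.0; n];
    for &o in &obs[1..] {
        for j in 0..n {
            let sum: f64 = (0..n).map(|i| prev[i] * a[i][j]).sum();
            curr[j] = sum * b[j][o];
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // Termination: P(O|λ) = Σ_i α_{T-1}(i)
    prev.iter().sum()
}

Note that for long sequences the probabilities underflow quickly; production implementations use scaling or log-space arithmetic, which this sketch omits.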

Performance

The implementation is optimized for performance:

  • Workspace Reuse: Pre-allocated workspace that grows as needed (sketched after this list)
  • Cache-Friendly: Contiguous memory layout for better cache performance
  • Minimal Allocations: Reuses buffers across training iterations
  • Double-Buffering: Alternates between buffers in forward algorithm to avoid copies
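
A generic sketch of the workspace-reuse pattern (illustrative only, not the crate's actual internals):

// Buffers live on a long-lived struct and are resized in place, so repeated
// calls allocate only when a longer sequence than any seen before arrives.
struct Workspace {
    alpha: Vec<f64>,
    beta: Vec<f64>,
}

impl Workspace {
    fn prepare(&mut self, len: usize) {
        // `resize` only reallocates when the buffer must grow.
        self.alpha.resize(len, 0.0);
        self.beta.resize(len, 0.0);
        // Clear stale values left over from previous iterations.
        self.alpha.fill(0.0);
        self.beta.fill(0.0);
    }
}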

Run benchmarks with:

cargo bench

Example benchmark results (will vary by hardware):

model_creation/new/20    time: [125 ns ... 135 ns]
training/20states_10obs  time: [45 μs ... 50 μs]
evaluation/100           time: [8 μs ... 9 μs]

Examples

Run the included examples:

# Basic usage example
cargo run --example basic_usage

# Pattern classification example
cargo run --example classification

# Advanced: DNA sequence analysis (bioinformatics)
cargo run --example dna_sequence_analysis

Featured Example: DNA Sequence Analysis 🧬

The dna_sequence_analysis example demonstrates a sophisticated real-world application: identifying CpG islands in DNA sequences using HMMs. This example shows:

  • Real bioinformatics application - CpG island detection for gene annotation
  • Automatic pattern learning - No manual feature engineering required
  • Clear discrimination - Achieves 17+ orders of magnitude separation between sequence types
  • Comprehensive analysis - From data generation to classification to biological insights
  • Educational value - Explains genomic concepts and dinucleotide patterns

Perfect for understanding how HMMs are used in computational biology!

Testing

The project includes comprehensive tests:

# Run unit and integration tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run property-based tests
cargo test prop_

# Run with all features
cargo test --all-features

Comparison to Original Implementation

This is a complete rewrite of the original 2017 implementation with significant improvements:

Modernizations

Aspect            Old (2017)              New (2025)
Rust Edition      2015                    2021
rand version      0.3.15 (2016)           0.8 (2020)
Error Handling    Panics                  Result types with custom errors
API Design        Direct struct access    Builder pattern + getters
Type Safety       Mixed i64/usize         Consistent usize for indices
Memory            Box<Vec>                Direct Vec (more efficient)
Documentation     Minimal                 Comprehensive with examples
Tests             1 basic test            25+ tests + property-based
Benchmarks        None                    Full criterion benchmark suite
Dependencies      rand only               +thiserror, +serde (optional)

Code Quality Improvements

  • Idiomatic Rust: Follows Rust API guidelines and best practices
  • Type Safety: Proper use of usize for indexing, no unnecessary conversions
  • Error Handling: Custom error types with descriptive messages
  • Documentation: Full rustdoc documentation with examples
  • Testing: Unit tests, integration tests, and property-based tests with proptest
  • Performance: Benchmarked and optimized hot paths

API Changes

Old API:

extern crate karma;

let mut hmm = karma::Karma::new(20, 10);
hmm.randomize();
hmm.train(&vec![0, 1, 2], None); // Takes &Vec, no error handling
let prob = hmm.evaluate(&vec![0, 1]); // Returns f64 directly

New API:

use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(20, 10)?;
hmm.randomize_initial_probabilities();
hmm.train(&[0, 1, 2], None)?; // Takes &[usize], returns Result
let prob = hmm.evaluate(&[0, 1])?; // Returns Result<f64>

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

# Clone the repository
git clone https://github.com/psychonautwiki/karma
cd karma

# Run tests
cargo test

# Run benchmarks
cargo bench

# Build documentation
cargo doc --open

License

MIT License - see LICENSE file for details.

Copyright (c) 2017 Kenan Sulayman
Copyright (c) 2017 The PsychonautWiki Contributors
Copyright (c) 2025 Modern Rust Implementation

References

  • Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition"
  • Baum, L. E.; Petrie, T.; Soules, G.; Weiss, N. (1970). "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains"
  • Original JavaScript reference implementation

Acknowledgments

Original implementation by Kenan Sulayman and PsychonautWiki contributors. This modern rewrite maintains the core algorithms while bringing the codebase up to modern Rust standards.


Made with Rust 🦀