# Karma

[![Crates.io](https://img.shields.io/crates/v/karma.svg)](https://crates.io/crates/karma)
[![Documentation](https://docs.rs/karma/badge.svg)](https://docs.rs/karma)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

A sophisticated and efficient Hidden Markov Model (HMM) implementation in Rust using the Baum-Welch algorithm for training and the Forward algorithm for evaluation.

## Features

- **Modern Rust**: Built with Rust 2021 edition, idiomatic code, and zero unsafe code
- **Type-Safe**: Comprehensive error handling with custom error types
- **Efficient**: Optimized matrix operations with minimal allocations and workspace reuse
- **Well-Tested**: Extensive unit tests, integration tests, and property-based tests
- **Builder Pattern**: Ergonomic API with builder pattern for flexible construction
- **Serialization**: Optional serde support for model persistence
- **Documentation**: Comprehensive API documentation with examples
- **Benchmarked**: Performance benchmarks included for tracking optimizations

## What are Hidden Markov Models?

Hidden Markov Models (HMMs) are statistical models used to represent systems that transition between hidden states while producing observable outputs. They're widely used in:

- **Speech Recognition**: Modeling phonemes and words
- **Natural Language Processing**: Part-of-speech tagging, named entity recognition
- **Bioinformatics**: DNA sequence analysis, protein structure prediction
- **Time Series Analysis**: Financial modeling, anomaly detection
- **Pattern Recognition**: Gesture recognition, activity recognition

An HMM consists of:
- **Hidden States**: The underlying states the system can be in (not directly observable)
- **Observations**: The visible outputs produced by the system
- **Probabilities**:
  - Initial state probabilities (π): Probability of starting in each state
  - Transition probabilities (A): Probability of transitioning from one state to another
  - Emission probabilities (B): Probability of observing a symbol from a given state
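
For intuition, here is a toy two-state weather model written with plain Rust arrays (illustrative only, not this crate's API):

```rust
// Toy two-state weather HMM (state 0 = Rainy, state 1 = Sunny) with two
// observations (0 = umbrella, 1 = no umbrella), written with plain arrays.
let pi = [0.6, 0.4];            // π: probability of starting in each state
let a = [
    [0.7, 0.3],                 // A: a[i][j] = P(next state j | state i)
    [0.4, 0.6],
];
let b = [
    [0.9, 0.1],                 // B: b[i][k] = P(observation k | state i)
    [0.2, 0.8],
];

// π and every row of A and B is a probability distribution, so each sums to 1.
assert!((pi.iter().sum::<f64>() - 1.0).abs() < 1e-12);
for row in a.iter().chain(b.iter()) {
    assert!((row.iter().sum::<f64>() - 1.0).abs() < 1e-12);
}
```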

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
karma = "0.2"
```

For serialization support:

```toml
[dependencies]
karma = { version = "0.2", features = ["serde"] }
```

## Quick Start

```rust
use karma::HiddenMarkovModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an HMM with 3 hidden states and 4 possible observations
    let mut hmm = HiddenMarkovModel::new(3, 4)?;

    // Train on observation sequences
    hmm.train(&[0, 1, 2, 3, 1, 0], None)?;
    hmm.train(&[1, 2, 1, 0], Some(0.1))?; // Custom learning rate

    // Evaluate probability of a new sequence
    let probability = hmm.evaluate(&[0, 1, 2])?;
    println!("P(sequence|model) = {}", probability);

    Ok(())
}
```

## Usage Examples

### Basic Training and Evaluation

```rust
use karma::HiddenMarkovModel;

// Create model with 20 states and 10 observations
let mut hmm = HiddenMarkovModel::new(20, 10)?;

// Optionally randomize initial probabilities to break symmetry
hmm.randomize_initial_probabilities();

// Train on multiple sequences
let training_data = vec![
    vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    vec![0, 1, 4, 5, 7, 9, 0, 4, 7, 8, 2],
];

for sequence in training_data {
    hmm.train(&sequence, Some(0.05))?; // 0.05 learning rate
}

// Evaluate test sequences
let prob = hmm.evaluate(&[5, 6, 7, 8])?;
println!("Probability: {}", prob);
```

### Using the Builder Pattern

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::builder()
    .states(10)
    .observations(5)
    .randomize(true) // Randomize initial probabilities
    .build()?;

// Train and evaluate as normal
hmm.train(&[0, 1, 2, 3, 4], None)?;
```

### Pattern Classification

Train an HMM to distinguish between different types of sequences:

```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(15, 4)?;

// Train on alternating patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 0, 1, 0, 1], Some(0.1))?;
}

// Train on sequential patterns
for _ in 0..20 {
    hmm.train(&[0, 1, 2, 3], Some(0.1))?;
}

// Classify new sequences
let prob_alternating = hmm.evaluate(&[0, 1, 0, 1])?;
let prob_sequential = hmm.evaluate(&[0, 1, 2, 3])?;

// Higher probability indicates better match to training data
```

### Model Persistence (with serde feature)

```rust
use karma::HiddenMarkovModel;
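// Note: this example also requires `serde_json` in your dependencies.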

let mut hmm = HiddenMarkovModel::new(5, 3)?;
hmm.train(&[0, 1, 2, 0, 1, 2], Some(0.1))?;

// Serialize to JSON
let json = serde_json::to_string(&hmm)?;

// Save to file
std::fs::write("model.json", json)?;

// Load from file
let loaded_json = std::fs::read_to_string("model.json")?;
let loaded_hmm: HiddenMarkovModel = serde_json::from_str(&loaded_json)?;
```

## API Overview

### `HiddenMarkovModel`

#### Construction

- `new(n_states, n_observations)` - Create with uniform initial probabilities
- `builder()` - Get a builder for custom configuration

#### Training

- `train(&observations, learning_rate)` - Train on an observation sequence
  - `observations`: Slice of observation indices `[0, n_observations)`
  - `learning_rate`: Optional f64 in `(0.0, 1.0]`, defaults to `0.05`
  - Returns `Result<(), HmmError>`

#### Evaluation

- `evaluate(&observations)` - Compute P(observations | model)
  - Returns `Result<f64, HmmError>`

#### Utilities

- `randomize_initial_probabilities()` - Break symmetry in initial state probabilities
- `n_states()` - Get number of hidden states
- `n_observations()` - Get number of possible observations
- `initial_probabilities()` - Get initial state probability vector π
- `transition_probabilities()` - Get state transition matrix A
- `emission_probabilities()` - Get emission probability matrix B
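
For example, a quick sanity check on a freshly constructed model (a sketch assuming the getters return slice-like `&[f64]` views; see the API docs for the exact return types):

```rust
use karma::HiddenMarkovModel;

let hmm = HiddenMarkovModel::new(3, 4)?;
println!("{} states, {} observations", hmm.n_states(), hmm.n_observations());

// π is a probability distribution, so it should sum to 1.
let pi_sum: f64 = hmm.initial_probabilities().iter().sum();
assert!((pi_sum - 1.0).abs() < 1e-9);
```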

### Error Handling

All operations return `Result<T, HmmError>` with detailed error messages:

```rust
use karma::{HiddenMarkovModel, HmmError};

let mut hmm = HiddenMarkovModel::new(3, 4)?;

match hmm.train(&[0, 1, 5], None) {
    Ok(_) => println!("Training successful"),
    Err(HmmError::InvalidObservation(obs, max)) => {
        println!("Invalid observation {} (max: {})", obs, max);
    }
    Err(e) => println!("Error: {}", e),
}
```

## Algorithm Details

### Baum-Welch Algorithm (Training)

The Baum-Welch algorithm is an Expectation-Maximization (EM) algorithm for estimating HMM parameters:

1. **Forward Pass (α)**: Compute forward probabilities α_t(i) = P(o_1...o_t, q_t=i | λ)
2. **Backward Pass (β)**: Compute backward probabilities β_t(i) = P(o_{t+1}...o_T | q_t=i, λ)
3. **E-Step**: Compute state occupation probabilities (γ) and state transition probabilities (ξ)
4. **M-Step**: Update model parameters (π, A, B) using computed probabilities
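
The classical re-estimation formulas (Rabiner, 1989) for a single sequence are:

- π_i ← γ_1(i)
- a_{ij} ← [Σ_{t=1}^{T−1} ξ_t(i, j)] / [Σ_{t=1}^{T−1} γ_t(i)]
- b_j(k) ← [Σ_{t: o_t=k} γ_t(j)] / [Σ_{t=1}^{T} γ_t(j)]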

The learning rate controls how much the parameters are updated in each iteration (like a step size in gradient descent).
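A common scheme for applying such a rate is a convex blend of the old and re-estimated parameters, θ ← (1 − η)·θ + η·θ̂ with learning rate η (the exact update used by this crate may differ; see the source).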

### Forward Algorithm (Evaluation)

Efficiently computes P(observations | model) using dynamic programming:

1. **Initialization**: α_1(i) = π_i × b_i(o_1)
2. **Recursion** (for t = 2, …, T): α_t(j) = [Σ_i α_{t-1}(i) × a_{ij}] × b_j(o_t)
3. **Termination**: P(O|λ) = Σ_i α_T(i)

Time complexity: O(N² × T) where N is the number of states and T is the sequence length.
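
As a minimal standalone sketch of this recursion (illustrative only, independent of this crate's internals; matrices are row-major `Vec<Vec<f64>>` here):

```rust
/// Forward-algorithm sketch: computes P(O | λ) in O(N² × T).
/// a[i][j] = P(next state j | state i), b[i][k] = P(observation k | state i).
fn forward(pi: &[f64], a: &[Vec<f64>], b: &[Vec<f64>], obs: &[usize]) -> f64 {
    let n = pi.len();
    // Initialization: α_1(i) = π_i × b_i(o_1)
    let mut alpha: Vec<f64> = (0..n).map(|i| pi[i] * b[i][obs[0]]).collect();
    // Recursion: α_t(j) = [Σ_i α_{t-1}(i) × a_{ij}] × b_j(o_t)
    for &o in &obs[1..] {
        alpha = (0..n)
            .map(|j| (0..n).map(|i| alpha[i] * a[i][j]).sum::<f64>() * b[j][o])
            .collect();
    }
    // Termination: P(O|λ) = Σ_i α_T(i)
    alpha.iter().sum()
}
```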

## Performance

The implementation is optimized for performance:

- **Workspace Reuse**: Pre-allocated workspace that grows as needed
- **Cache-Friendly**: Contiguous memory layout for better cache performance
- **Minimal Allocations**: Reuses buffers across training iterations
- **Double-Buffering**: Alternates between two buffers in the forward algorithm to avoid copies (see the sketch below)
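
The double-buffering idea, sketched in isolation (illustrative only; the actual update lives inside the crate's workspace):

```rust
// Alternate between two pre-allocated buffers instead of allocating a new
// Vec<f64> at every time step; the swap is an O(1) pointer exchange.
let n_states = 4;
let mut curr = vec![1.0 / n_states as f64; n_states]; // α_t
let mut next = vec![0.0_f64; n_states];               // α_{t+1}
for _t in 0..100 {
    for j in 0..n_states {
        // Stand-in for the real forward update, which would use A and B here.
        next[j] = curr.iter().sum::<f64>() / n_states as f64;
    }
    std::mem::swap(&mut curr, &mut next);
}
```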

Run benchmarks with:

```bash
cargo bench
```

Example benchmark results (will vary by hardware):
```
model_creation/new/20    time: [125 ns ... 135 ns]
training/20states_10obs  time: [45 μs ... 50 μs]
evaluation/100           time: [8 μs ... 9 μs]
```

## Examples

Run the included examples:

```bash
# Basic usage example
cargo run --example basic_usage

# Pattern classification example
cargo run --example classification

# Advanced: DNA sequence analysis (bioinformatics)
cargo run --example dna_sequence_analysis
```

### Featured Example: DNA Sequence Analysis 🧬

The `dna_sequence_analysis` example demonstrates a sophisticated real-world application: identifying CpG islands in DNA sequences using HMMs. This example shows:

- **Real bioinformatics application** - CpG island detection for gene annotation
- **Automatic pattern learning** - No manual feature engineering required
- **Clear discrimination** - Achieves 17+ orders of magnitude separation between sequence types
- **Comprehensive analysis** - From data generation to classification to biological insights
- **Educational value** - Explains genomic concepts and dinucleotide patterns

Perfect for understanding how HMMs are used in computational biology!

## Testing

The project includes comprehensive tests:

```bash
# Run unit and integration tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run property-based tests
cargo test prop_

# Run with all features
cargo test --all-features
```

## Comparison to Original Implementation

This is a complete rewrite of the original 2017 implementation with significant improvements:

### Modernizations

| Aspect | Old (2017) | New (2025) |
|--------|-----------|-----------|
| Rust Edition | 2015 | 2021 |
| rand version | 0.3.15 (2016) | 0.8 (2020) |
| Error Handling | Panics | Result types with custom errors |
| API Design | Direct struct access | Builder pattern + getters |
| Type Safety | Mixed i64/usize | Consistent usize for indices |
| Memory | `Box<Vec>` | Direct `Vec` (more efficient) |
| Documentation | Minimal | Comprehensive with examples |
| Tests | 1 basic test | 25+ tests + property-based |
| Benchmarks | None | Full criterion benchmark suite |
| Dependencies | rand only | +thiserror, +serde (optional) |

### Code Quality Improvements

- **Idiomatic Rust**: Follows Rust API guidelines and best practices
- **Type Safety**: Proper use of usize for indexing, no unnecessary conversions
- **Error Handling**: Custom error types with descriptive messages
- **Documentation**: Full rustdoc documentation with examples
- **Testing**: Unit tests, integration tests, and property-based tests with proptest
- **Performance**: Benchmarked and optimized hot paths

### API Changes

Old API:
```rust
extern crate karma;

let mut hmm = karma::Karma::new(20, 10);
hmm.randomize();
hmm.train(&vec![0, 1, 2], None); // Takes &Vec, no error handling
let prob = hmm.evaluate(&vec![0, 1]); // Returns f64 directly
```

New API:
```rust
use karma::HiddenMarkovModel;

let mut hmm = HiddenMarkovModel::new(20, 10)?;
hmm.randomize_initial_probabilities();
hmm.train(&[0, 1, 2], None)?; // Takes &[usize], returns Result
let prob = hmm.evaluate(&[0, 1])?; // Returns Result<f64>
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

```bash
# Clone the repository
git clone https://github.com/psychonautwiki/karma
cd karma

# Run tests
cargo test

# Run benchmarks
cargo bench

# Build documentation
cargo doc --open
```

## License

MIT License - see LICENSE file for details.

Copyright (c) 2017 Kenan Sulayman
Copyright (c) 2017 The PsychonautWiki Contributors
Copyright (c) 2025 Modern Rust Implementation

## References

- Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition". *Proceedings of the IEEE*, 77(2), 257–286.
- Baum, L. E.; Petrie, T.; Soules, G.; Weiss, N. (1970). "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains". *The Annals of Mathematical Statistics*, 41(1), 164–171.
- [Original JavaScript reference implementation](https://github.com/psychonautwiki/karma/blob/master/poc/reference.js)

## Acknowledgments

Original implementation by Kenan Sulayman and PsychonautWiki contributors. This modern rewrite maintains the core algorithms while bringing the codebase up to modern Rust standards.

---

Made with Rust 🦀