lling-llang 0.1.0

# RRWM Algorithm

The Rational Randomized Weighted-Majority (RRWM) algorithm is an online learning method for ensemble prediction using WFST-based path experts. It provides principled combination of multiple models with guaranteed regret bounds.

## Concepts

### Online Learning Problem

In online learning, we face a sequence of rounds:

```
Round 1: Receive input → Make prediction → Observe loss
Round 2: Receive input → Make prediction → Observe loss
...
Round T: Receive input → Make prediction → Observe loss
```

The goal is to minimize **regret**: the difference between our cumulative loss and the loss of the best single expert in hindsight.

### Path Experts

A **path expert** is a WFST that maps inputs to outputs. Given an input, each expert proposes a prediction (an accepting path). Different experts may propose different paths with different weights.

```
Expert 1: "teh" → "the" (weight 0.9)
Expert 2: "teh" → "tea" (weight 0.7)
Expert 3: "teh" → "ten" (weight 0.8)
```

RRWM combines these experts, learning which ones to trust over time.

### Algorithm Overview

RRWM maintains a **cumulative weight automaton** W_t that tracks performance:

```
RRWM(T rounds):
    W₀ ← one-state automaton (all paths weight 1)

    for t = 1 to T:
        1. Receive input and loss transducer V_t
        2. Compose: W_t ← W_{t-1} ◦ V_t
        3. Push weights to make W_t stochastic
        4. Sample prediction from W_t

    return W_T
```

The key insight is that WFST composition efficiently combines all paths across all experts.

### Regret Bound

RRWM achieves the theoretical regret bound:

```
E[R_T] ≤ 2M√(T log N)
```

Where:
- **R_T**: Total regret after T rounds
- **M**: Maximum loss per round
- **T**: Number of rounds
- **N**: Number of path experts

This bound is **sublinear** in T, meaning average regret decreases over time.

### Connection to Power Semiring

RRWM operates over the η-power semiring to enable:
- Smooth weight updates (vs hard thresholding)
- Proper probability distributions after weight pushing
- Rational loss functions (hence "Rational" in RRWM)

## Core API

### Configuration

```rust
use lling_llang::algorithms::{RrwmConfig, Rrwm};

let config = RrwmConfig::default()
    .eta(1.0)               // η parameter for power semiring
    .learning_rate(1.0)     // Learning rate multiplier
    .max_rounds(100_000)    // Maximum rounds before reset
    .with_statistics()      // Enable statistics tracking
    .seed(42);              // Random seed for reproducibility

let rrwm = Rrwm::<char>::new(config);
```

### Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `eta` | 1.0 | Power semiring η (smaller = more exploration) |
| `learning_rate` | 1.0 | Multiplier for weight updates |
| `max_rounds` | 100,000 | Reset trigger (prevents unbounded automaton growth) |
| `track_statistics` | false | Enable detailed statistics |
| `seed` | None | Random seed for sampling |

### Main Methods

| Method | Description |
|--------|-------------|
| `observe(loss_transducer)` | Update weights with observed loss |
| `predict()` | Sample a prediction from current distribution |
| `regret_bound(max_loss, num_experts)` | Compute theoretical regret bound |
| `reset()` | Reset to initial state |
| `round()` | Current round number |
| `cumulative_weights()` | Access the cumulative automaton |
| `statistics()` | Get learning statistics (if enabled) |

### Error Handling

```rust
use lling_llang::algorithms::RrwmError;

match rrwm.observe(&loss_transducer) {
    Ok(loss) => println!("Observed loss: {}", loss),
    Err(RrwmError::MaxRoundsExceeded) => println!("Need to reset"),
    Err(RrwmError::PushFailed(msg)) => println!("Push failed: {}", msg),
    Err(RrwmError::SampleFailed(e)) => println!("Sampling failed: {:?}", e),
    Err(RrwmError::EmptyComposition) => println!("No valid paths"),
    Err(RrwmError::ConfigError(msg)) => println!("Config error: {}", msg),
}
```

### Statistics

```rust
let rrwm = Rrwm::<char>::new(RrwmConfig::default().with_statistics());

// After some observations...
if let Some(stats) = rrwm.statistics() {
    println!("Rounds: {}", stats.rounds);
    println!("Total loss: {}", stats.total_loss);
    println!("Average loss: {}", stats.average_loss);
    println!("Cumulative states: {}", stats.cumulative_states);
    println!("Loss history: {:?}", stats.loss_history);
}
```

## Examples

### Basic Online Learning Loop

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};
use lling_llang::semiring::PowerWeight;
use lling_llang::wfst::{VectorWfst, MutableWfst};

// Create RRWM learner
let config = RrwmConfig::default()
    .eta(1.0)
    .with_statistics()
    .seed(42);
let mut rrwm = Rrwm::<char>::new(config);

// Simulate online learning
for round in 0..100 {
    // 1. Make prediction
    match rrwm.predict() {
        Ok(prediction) => {
            println!("Round {}: predicted {:?}", round, prediction.output_string());
        }
        Err(e) => {
            println!("Prediction failed: {:?}", e);
        }
    }

    // 2. Construct loss transducer based on actual outcome
    let loss_transducer = create_loss_transducer(/* actual outcome */);

    // 3. Observe loss and update
    match rrwm.observe(loss_transducer) {
        Ok(loss) => {
            println!("Round {}: loss = {:.4}", round, loss);
        }
        Err(e) => {
            println!("Observation failed: {:?}", e);
            break;
        }
    }
}

// Check final statistics
if let Some(stats) = rrwm.statistics() {
    println!("Final average loss: {:.4}", stats.average_loss);
}

fn create_loss_transducer() -> VectorWfst<char, PowerWeight> {
    let eta = 1.0;
    let mut wfst = VectorWfst::new();
    let s0 = wfst.add_state();
    let s1 = wfst.add_state();
    wfst.set_start(s0);
    wfst.set_final(s1, PowerWeight::one_with_eta(eta));
    wfst.add_arc(s0, Some('a'), Some('a'), s1,
        PowerWeight::from_probability(0.9, eta));
    wfst
}
```

### Setting Up Path Experts

```rust
use lling_llang::algorithms::{Rrwm, RrwmBuilder, RrwmConfig};
use lling_llang::semiring::PowerWeight;
use lling_llang::wfst::{VectorWfst, MutableWfst};

fn create_expert(label: char, weight: f64, eta: f64) -> VectorWfst<char, PowerWeight> {
    let mut wfst = VectorWfst::new();
    let s0 = wfst.add_state();
    let s1 = wfst.add_state();
    wfst.set_start(s0);
    wfst.set_final(s1, PowerWeight::one_with_eta(eta));
    wfst.add_arc(s0, Some('?'), Some(label), s1,
        PowerWeight::from_probability(weight, eta));
    wfst
}

// Create multiple path experts
let eta = 2.0;
let expert_a = create_expert('a', 0.8, eta);  // Predicts 'a' with weight 0.8
let expert_b = create_expert('b', 0.6, eta);  // Predicts 'b' with weight 0.6
let expert_c = create_expert('c', 0.9, eta);  // Predicts 'c' with weight 0.9

// Build RRWM with initial experts
let rrwm = RrwmBuilder::<char>::new()
    .eta(eta)
    .add_expert(expert_a)
    .add_expert(expert_b)
    .add_expert(expert_c)
    .build();

println!("RRWM initialized with η = {}", rrwm.eta());
```

### Computing Regret Bounds

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

let mut rrwm = Rrwm::<char>::new(RrwmConfig::default());

// Simulate 100 rounds
rrwm.round = 100;  // (normally incremented by observe())

let max_loss = 1.0;    // Maximum loss per round
let num_experts = 10;  // Number of path experts

let bound = rrwm.regret_bound(max_loss, num_experts);
println!("Regret bound after {} rounds: {:.2}", rrwm.round(), bound);

// The bound is: 2 * M * sqrt(T * ln(N))
// = 2 * 1.0 * sqrt(100 * ln(10)) ≈ 30.3
```

### Controlling Exploration vs Exploitation

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

// More exploration (smaller η)
let exploratory = Rrwm::<char>::new(
    RrwmConfig::default().eta(0.5)
);
println!("Exploratory η: {}", exploratory.eta());

// More exploitation (larger η)
let exploitative = Rrwm::<char>::new(
    RrwmConfig::default().eta(3.0)
);
println!("Exploitative η: {}", exploitative.eta());

// The η parameter affects:
// - How weights are combined (soft vs hard)
// - The sampling distribution (uniform-like vs greedy)
```

### Resetting the Learner

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

let mut rrwm = Rrwm::<char>::new(RrwmConfig::default().with_statistics());

// Run for a while...
// (rrwm.round increments with each observe())

// Reset to initial state
rrwm.reset();

assert_eq!(rrwm.round(), 0);
assert_eq!(rrwm.cumulative_weights().num_states(), 1);
if let Some(stats) = rrwm.statistics() {
    assert_eq!(stats.rounds, 0);
    assert_eq!(stats.total_loss, 0.0);
}
println!("RRWM reset to initial state");
```

### Accessing Cumulative Weights

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};
use lling_llang::wfst::Wfst;

let rrwm = Rrwm::<char>::new(RrwmConfig::default());

// Access the cumulative weight automaton
let cumulative = rrwm.cumulative_weights();

println!("States: {}", cumulative.num_states());
println!("Start state: {}", cumulative.start());

// The cumulative automaton encodes learned expert weights
// and can be used for inspection or debugging
```

## Algorithm Details

### Weight Update Mechanism

At each round, weights are updated via WFST composition:

```
W_t = WeightPush(W_{t-1} ◦ V_t)
```

Where:
- W_{t-1} is the cumulative automaton from previous round
- V_t is the loss transducer for round t
- ◦ denotes WFST composition
- WeightPush makes the result stochastic for sampling

### Loss Transducer Construction

The loss transducer V_t encodes the loss for each possible prediction:

```
Input: actual outcome y
Output: V_t such that V_t(ŷ) = loss(ŷ, y)

For correct prediction: low loss (high weight)
For incorrect prediction: high loss (low weight)
```

### Sampling Predictions

After weight pushing, the cumulative automaton is stochastic (outgoing weights sum to 1). Predictions are sampled proportional to weights:

```
P(path π) ∝ W_t(π)
```

Better-performing paths get higher probability over time.

## When to Use RRWM

**Choose RRWM when:**

| Scenario | Why RRWM? |
|----------|-----------|
| Combining multiple WFST models | Principled ensemble learning |
| Online adaptation | Updates incrementally, no batch retraining |
| Theoretical guarantees needed | Provable regret bounds |
| Non-additive losses | Handles edit distance, tropical losses |
| Streaming data | Constant memory per update |

**Consider alternatives when:**

| Scenario | Alternative |
|----------|-------------|
| Single best model known | Use that model directly |
| Batch training available | Train ensemble offline |
| Very few rounds | Limited learning opportunity |
| Memory constrained | Cumulative automaton grows |

## Relationship to Other Algorithms

```
                    ┌─────────────────────┐
                    │      RRWM           │
                    │ (Online Learning)   │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
    ┌─────────────────┐ ┌─────────────┐ ┌─────────────────┐
    │ Power Semiring  │ │Path Sampling│ │ Weight Pushing  │
    │ (η-soft weights)│ │(predictions)│ │(make stochastic)│
    └─────────────────┘ └─────────────┘ └─────────────────┘
```

## Complexity Analysis

| Operation | Complexity |
|-----------|------------|
| observe() | O(|W_{t-1}| × |V_t|) composition |
| predict() | O(path length) sampling |
| regret_bound() | O(1) |
| reset() | O(1) |
| Space | O(|W_t|) cumulative automaton |

## References

- Cortes, C., Kuznetsov, V., Mohri, M., & Warmuth, M. K. (2015). "On-Line Learning Algorithms for Path Experts with Non-Additive Losses". COLT 2015, PMLR 40:424–447. (Figure 6 defines RRWM)

## Related Documentation

- [Power Semiring](../architecture/power-semiring.md) - The η-power semiring used by RRWM
- [Path Sampling](path-sampling.md) - How predictions are sampled
- [Weight Pushing](weight-pushing.md) - Making automata stochastic
- [Composition](composition.md) - How loss transducers are combined