lling-llang 0.1.0

WFST framework for text normalization and grammar correction
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
# RRWM Algorithm

The Rational Randomized Weighted-Majority (RRWM) algorithm is an online learning method for ensemble prediction using WFST-based path experts. It provides principled combination of multiple models with guaranteed regret bounds.

## Concepts

### Online Learning Problem

In online learning, we face a sequence of rounds:

```
Round 1: Receive input → Make prediction → Observe loss
Round 2: Receive input → Make prediction → Observe loss
...
Round T: Receive input → Make prediction → Observe loss
```

The goal is to minimize **regret**: the difference between our cumulative loss and the loss of the best single expert in hindsight.

### Path Experts

A **path expert** is a WFST that maps inputs to outputs. Given an input, each expert proposes a prediction (an accepting path). Different experts may propose different paths with different weights.

```
Expert 1: "teh" → "the" (weight 0.9)
Expert 2: "teh" → "tea" (weight 0.7)
Expert 3: "teh" → "ten" (weight 0.8)
```

RRWM combines these experts, learning which ones to trust over time.

### Algorithm Overview

RRWM maintains a **cumulative weight automaton** W_t that tracks performance:

```
RRWM(T rounds):
    W₀ ← one-state automaton (all paths weight 1)

    for t = 1 to T:
        1. Receive input and loss transducer V_t
        2. Compose: W_t ← W_{t-1} ◦ V_t
        3. Push weights to make W_t stochastic
        4. Sample prediction from W_t

    return W_T
```

The key insight is that WFST composition efficiently combines all paths across all experts.

### Regret Bound

RRWM achieves the theoretical regret bound:

```
E[R_T] ≤ 2M√(T log N)
```

Where:
- **R_T**: Total regret after T rounds
- **M**: Maximum loss per round
- **T**: Number of rounds
- **N**: Number of path experts

This bound is **sublinear** in T, meaning average regret decreases over time.

### Connection to Power Semiring

RRWM operates over the η-power semiring to enable:
- Smooth weight updates (vs hard thresholding)
- Proper probability distributions after weight pushing
- Rational loss functions (hence "Rational" in RRWM)

## Core API

### Configuration

```rust
use lling_llang::algorithms::{RrwmConfig, Rrwm};

let config = RrwmConfig::default()
    .eta(1.0)               // η parameter for power semiring
    .learning_rate(1.0)     // Learning rate multiplier
    .max_rounds(100_000)    // Maximum rounds before reset
    .with_statistics()      // Enable statistics tracking
    .seed(42);              // Random seed for reproducibility

let rrwm = Rrwm::<char>::new(config);
```

### Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `eta` | 1.0 | Power semiring η (smaller = more exploration) |
| `learning_rate` | 1.0 | Multiplier for weight updates |
| `max_rounds` | 100,000 | Reset trigger (prevents unbounded automaton growth) |
| `track_statistics` | false | Enable detailed statistics |
| `seed` | None | Random seed for sampling |

### Main Methods

| Method | Description |
|--------|-------------|
| `observe(loss_transducer)` | Update weights with observed loss |
| `predict()` | Sample a prediction from current distribution |
| `regret_bound(max_loss, num_experts)` | Compute theoretical regret bound |
| `reset()` | Reset to initial state |
| `round()` | Current round number |
| `cumulative_weights()` | Access the cumulative automaton |
| `statistics()` | Get learning statistics (if enabled) |

### Error Handling

```rust
use lling_llang::algorithms::RrwmError;

match rrwm.observe(&loss_transducer) {
    Ok(loss) => println!("Observed loss: {}", loss),
    Err(RrwmError::MaxRoundsExceeded) => println!("Need to reset"),
    Err(RrwmError::PushFailed(msg)) => println!("Push failed: {}", msg),
    Err(RrwmError::SampleFailed(e)) => println!("Sampling failed: {:?}", e),
    Err(RrwmError::EmptyComposition) => println!("No valid paths"),
    Err(RrwmError::ConfigError(msg)) => println!("Config error: {}", msg),
}
```

### Statistics

```rust
let rrwm = Rrwm::<char>::new(RrwmConfig::default().with_statistics());

// After some observations...
if let Some(stats) = rrwm.statistics() {
    println!("Rounds: {}", stats.rounds);
    println!("Total loss: {}", stats.total_loss);
    println!("Average loss: {}", stats.average_loss);
    println!("Cumulative states: {}", stats.cumulative_states);
    println!("Loss history: {:?}", stats.loss_history);
}
```

## Examples

### Basic Online Learning Loop

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};
use lling_llang::semiring::PowerWeight;
use lling_llang::wfst::{VectorWfst, MutableWfst};

// Create RRWM learner
let config = RrwmConfig::default()
    .eta(1.0)
    .with_statistics()
    .seed(42);
let mut rrwm = Rrwm::<char>::new(config);

// Simulate online learning
for round in 0..100 {
    // 1. Make prediction
    match rrwm.predict() {
        Ok(prediction) => {
            println!("Round {}: predicted {:?}", round, prediction.output_string());
        }
        Err(e) => {
            println!("Prediction failed: {:?}", e);
        }
    }

    // 2. Construct loss transducer based on actual outcome
    let loss_transducer = create_loss_transducer(/* actual outcome */);

    // 3. Observe loss and update
    match rrwm.observe(loss_transducer) {
        Ok(loss) => {
            println!("Round {}: loss = {:.4}", round, loss);
        }
        Err(e) => {
            println!("Observation failed: {:?}", e);
            break;
        }
    }
}

// Check final statistics
if let Some(stats) = rrwm.statistics() {
    println!("Final average loss: {:.4}", stats.average_loss);
}

fn create_loss_transducer() -> VectorWfst<char, PowerWeight> {
    let eta = 1.0;
    let mut wfst = VectorWfst::new();
    let s0 = wfst.add_state();
    let s1 = wfst.add_state();
    wfst.set_start(s0);
    wfst.set_final(s1, PowerWeight::one_with_eta(eta));
    wfst.add_arc(s0, Some('a'), Some('a'), s1,
        PowerWeight::from_probability(0.9, eta));
    wfst
}
```

### Setting Up Path Experts

```rust
use lling_llang::algorithms::{Rrwm, RrwmBuilder, RrwmConfig};
use lling_llang::semiring::PowerWeight;
use lling_llang::wfst::{VectorWfst, MutableWfst};

fn create_expert(label: char, weight: f64, eta: f64) -> VectorWfst<char, PowerWeight> {
    let mut wfst = VectorWfst::new();
    let s0 = wfst.add_state();
    let s1 = wfst.add_state();
    wfst.set_start(s0);
    wfst.set_final(s1, PowerWeight::one_with_eta(eta));
    wfst.add_arc(s0, Some('?'), Some(label), s1,
        PowerWeight::from_probability(weight, eta));
    wfst
}

// Create multiple path experts
let eta = 2.0;
let expert_a = create_expert('a', 0.8, eta);  // Predicts 'a' with weight 0.8
let expert_b = create_expert('b', 0.6, eta);  // Predicts 'b' with weight 0.6
let expert_c = create_expert('c', 0.9, eta);  // Predicts 'c' with weight 0.9

// Build RRWM with initial experts
let rrwm = RrwmBuilder::<char>::new()
    .eta(eta)
    .add_expert(expert_a)
    .add_expert(expert_b)
    .add_expert(expert_c)
    .build();

println!("RRWM initialized with η = {}", rrwm.eta());
```

### Computing Regret Bounds

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

let mut rrwm = Rrwm::<char>::new(RrwmConfig::default());

// Simulate 100 rounds
rrwm.round = 100;  // (normally incremented by observe())

let max_loss = 1.0;    // Maximum loss per round
let num_experts = 10;  // Number of path experts

let bound = rrwm.regret_bound(max_loss, num_experts);
println!("Regret bound after {} rounds: {:.2}", rrwm.round(), bound);

// The bound is: 2 * M * sqrt(T * ln(N))
// = 2 * 1.0 * sqrt(100 * ln(10)) ≈ 30.3
```

### Controlling Exploration vs Exploitation

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

// More exploration (smaller η)
let exploratory = Rrwm::<char>::new(
    RrwmConfig::default().eta(0.5)
);
println!("Exploratory η: {}", exploratory.eta());

// More exploitation (larger η)
let exploitative = Rrwm::<char>::new(
    RrwmConfig::default().eta(3.0)
);
println!("Exploitative η: {}", exploitative.eta());

// The η parameter affects:
// - How weights are combined (soft vs hard)
// - The sampling distribution (uniform-like vs greedy)
```

### Resetting the Learner

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};

let mut rrwm = Rrwm::<char>::new(RrwmConfig::default().with_statistics());

// Run for a while...
// (rrwm.round increments with each observe())

// Reset to initial state
rrwm.reset();

assert_eq!(rrwm.round(), 0);
assert_eq!(rrwm.cumulative_weights().num_states(), 1);
if let Some(stats) = rrwm.statistics() {
    assert_eq!(stats.rounds, 0);
    assert_eq!(stats.total_loss, 0.0);
}
println!("RRWM reset to initial state");
```

### Accessing Cumulative Weights

```rust
use lling_llang::algorithms::{Rrwm, RrwmConfig};
use lling_llang::wfst::Wfst;

let rrwm = Rrwm::<char>::new(RrwmConfig::default());

// Access the cumulative weight automaton
let cumulative = rrwm.cumulative_weights();

println!("States: {}", cumulative.num_states());
println!("Start state: {}", cumulative.start());

// The cumulative automaton encodes learned expert weights
// and can be used for inspection or debugging
```

## Algorithm Details

### Weight Update Mechanism

At each round, weights are updated via WFST composition:

```
W_t = WeightPush(W_{t-1} ◦ V_t)
```

Where:
- W_{t-1} is the cumulative automaton from previous round
- V_t is the loss transducer for round t
- ◦ denotes WFST composition
- WeightPush makes the result stochastic for sampling

### Loss Transducer Construction

The loss transducer V_t encodes the loss for each possible prediction:

```
Input: actual outcome y
Output: V_t such that V_t(ŷ) = loss(ŷ, y)

For correct prediction: low loss (high weight)
For incorrect prediction: high loss (low weight)
```

### Sampling Predictions

After weight pushing, the cumulative automaton is stochastic (outgoing weights sum to 1). Predictions are sampled proportional to weights:

```
P(path π) ∝ W_t(π)
```

Better-performing paths get higher probability over time.

## When to Use RRWM

**Choose RRWM when:**

| Scenario | Why RRWM? |
|----------|-----------|
| Combining multiple WFST models | Principled ensemble learning |
| Online adaptation | Updates incrementally, no batch retraining |
| Theoretical guarantees needed | Provable regret bounds |
| Non-additive losses | Handles edit distance, tropical losses |
| Streaming data | Constant memory per update |

**Consider alternatives when:**

| Scenario | Alternative |
|----------|-------------|
| Single best model known | Use that model directly |
| Batch training available | Train ensemble offline |
| Very few rounds | Limited learning opportunity |
| Memory constrained | Cumulative automaton grows |

## Relationship to Other Algorithms

```
                    ┌─────────────────────┐
                    │      RRWM           │
                    │ (Online Learning)   │
                    └──────────┬──────────┘
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
    ┌─────────────────┐ ┌─────────────┐ ┌─────────────────┐
    │ Power Semiring  │ │Path Sampling│ │ Weight Pushing  │
    │ (η-soft weights)│ │(predictions)│ │(make stochastic)│
    └─────────────────┘ └─────────────┘ └─────────────────┘
```

## Complexity Analysis

| Operation | Complexity |
|-----------|------------|
| observe() | O(|W_{t-1}| × |V_t|) composition |
| predict() | O(path length) sampling |
| regret_bound() | O(1) |
| reset() | O(1) |
| Space | O(|W_t|) cumulative automaton |

## References

- Cortes, C., Kuznetsov, V., Mohri, M., & Warmuth, M. K. (2015). "On-Line Learning Algorithms for Path Experts with Non-Additive Losses". COLT 2015, PMLR 40:424–447. (Figure 6 defines RRWM)

## Related Documentation

- [Power Semiring]../architecture/power-semiring.md - The η-power semiring used by RRWM
- [Path Sampling]path-sampling.md - How predictions are sampled
- [Weight Pushing]weight-pushing.md - Making automata stochastic
- [Composition]composition.md - How loss transducers are combined