lling-llang 0.1.0

WFST framework for text normalization and grammar correction
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
# lling-llang

[![License](https://img.shields.io/badge/license-Apache--2.0%20OR%20MIT-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-2021-orange.svg)](https://www.rust-lang.org)
[![Verified](https://img.shields.io/badge/proofs-Coq%20%2B%20TLA%E2%81%BA-crimson.svg)](proofs/)
[![Status](https://img.shields.io/badge/status-0.1.0%20early%20development-yellow.svg)](#project-status)

> A pure-Rust, **semiring-generic** Weighted Finite-State Transducer (WFST) toolkit — spanning
> classical weighted-automata algorithms, automatic speech recognition, differentiable decoding,
> and constrained generation — built on a **machine-checked core** (Coq/Rocq proofs + TLA⁺ models).

`lling-llang` treats a problem — spelling/grammar correction, speech decoding, text normalization,
constrained LLM output — as a search for the **best path through a weighted graph of
hypotheses**. A single algebraic abstraction, the **semiring**, lets the *same* algorithm compute
the shortest path, the most probable string, a reachability set, or an expected gradient, just by
swapping the weight type. That is the idea the whole library is organized around.

---

## Highlights

- 🧮 **One algebra, many objectives.** ~15 semiring weight types (Tropical, Log, Probability,
  Boolean, Expectation, Product, Lexicographic, Power, …) drive shortest-path, probabilistic, and
  multi-objective search through identical code.
- 🔤 **Classical WFST algorithms** à la Mohri: composition, `ε`-removal, determinization,
  minimization, weight pushing, shortest-distance, synchronization — generic over the semiring.
- 🌳 **A transducer zoo:** lattices, multitape transducers, weighted pushdown automata (PDA), tree
  transducers, and subsequential transducers.
- 🎙️ **End-to-end speech recognition:** the `H ∘ C ∘ L ∘ G` cascade, CTC topologies, neural
  transducers (RNN-T), acoustic-model fusion, LF-MMI and weakly-supervised training.
- 🔬 **Differentiable & GPU-ready:** GTN-style autograd through WFST operations, and
  GPU-shaped data structures (CSR, lock-free token packing, k-vector reduction).
- ✍️ **Text & code tooling:** text normalization (TN/ITN), error models, grammar-constrained LLM
  decoding, and source-code syntax repair / API migration.
-**Formally verified foundations:** Coq/Rocq proofs of the semiring laws, WFST semantics, and
  algorithm specifications (no `admit`/`Axiom`), plus TLA⁺ models of the trickier protocols.

---

## Notation & glossary

These symbols and terms are used throughout; they are defined here once, before first use.

| Symbol / term                | Meaning                                                                                                                                                                       |
|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **WFST**                     | *Weighted Finite-State Transducer* — a finite automaton whose transitions carry an input label, an output label, and a weight.                                                |
| **WFSA**                     | *Weighted Finite-State Acceptor* — a WFST with `input = output` (no translation, just scoring).                                                                                 |
| **Semiring** `(K, ⊕, ⊗, 0̄, 1̄)` | A set `K` with two operations: `` combines *alternative* paths, `` combines *sequential* steps; `` is the ``-identity (“no path”), `` the ``-identity (“empty path”). |
| `ε` (epsilon)              | The empty label — a transition that consumes/emits nothing.                                                                                                                   |
| **DAG**                      | *Directed Acyclic Graph*.                                                                                                                                                     |
| **Lattice**                  | A weighted DAG whose start→end paths enumerate hypotheses (e.g. corrections of a sentence).                                                                                   |
| ``                        | *Composition* — chaining transducers so the output of one feeds the input of the next.                                                                                        |
| `H`, `C`, `L`, `G`               | The ASR cascade stages: **H**MM, **C**ontext-dependency, **L**exicon, **G**rammar/LM.                                                                                         |
| **Viterbi / N-best / beam**  | Path-extraction strategies: single best, top-*k*, and approximate top paths.                                                                                                  |
| **CTC**                      | *Connectionist Temporal Classification* — alignment-free sequence labeling.                                                                                                   |
| **RNN-T**                    | *Recurrent Neural-network Transducer* — streaming encoder–predictor–joiner model.                                                                                             |
| **LF-MMI**                   | *Lattice-Free Maximum Mutual Information* — a sequence-discriminative training objective.                                                                                     |
| **PDA**                      | *Pushdown Automaton* — a finite automaton with a stack (recognizes context-free languages).                                                                                   |
| **TN / ITN**                 | *Text Normalization* (“$5” → “five dollars”) and its **I**nverse (“five dollars” → “$5”).                                                                                     |

---

## Architecture at a glance

The library is layered: every tier is generic over the semiring tier at the bottom, and the
formal-verification suite underwrites the core.

![lling-llang module architecture](docs/diagrams/architecture-map.svg)

| Tier                    | Modules                                                                                | Role                                                                       |
|-------------------------|----------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| **Foundation**          | `semiring` · `wfst` · `lattice` · `backend`                                            | The weight algebra and the automata/graph data structures.                 |
| **Algorithms**          | `algorithms` · `path` · `composition` · `optimization`                                 | Generic WFST operations and path extraction.                               |
| **Transducer families** | `cfg` · `multitape` · `pushdown` · `tree_transducers` · `subsequential`                | Beyond string-to-string: grammars, *k*-tape, stack, and tree transduction. |
| **Correction & NLP**    | `layers` · `error_models` · `text_processing` · `multilingual` · `llm` · `programming` | Application layers for correction, normalization, and generation.          |
| **Deep learning**       | `differentiable` · `gpu` · `simd`                                                      | Gradients through WFSTs and acceleration-friendly layouts.                 |
| **Speech**              | `asr` · `acoustic` · `ctc` · `transducer` · `training`                                 | The speech-recognition stack.                                              |
| **Verification**        | `proofs/` (Coq/Rocq + TLA⁺)                                                            | Machine-checked semantics and invariants.                                  |

---

## Core concepts

### Semirings — one algebra, many objectives

A **semiring** `(K, ⊕, ⊗, 0̄, 1̄)` equips a weight set `K` with two monoids that distribute. The trick
is that an algorithm written in terms of `⊕` and `⊗` computes a *different quantity* depending on which
semiring you instantiate it with:

| Semiring | `` (combine alternatives) | `` (combine in sequence) | `` | `` | Computes… |
|---|---|---|---|---|---|
| **Tropical** | `min` | `+` | `+∞` | `0` | shortest path (e.g. edit distance) |
| **Log** | `⊕ₗₒg` | `+` | `+∞` | `0` | total probability mass (in −log space) |
| **Probability** | `+` | `×` | `0` | `1` | probabilities directly |
| **Boolean** | `` (OR) | `` (AND) | `false` | `true` | reachability (is there *any* path?) |
| **Expectation** | `+` | `product-rule` | `(0, 0)` | `(1, 0)` | expected values / gradients |
| **Product** | `(⊕₁, ⊕₂)` | `(⊗₁, ⊗₂)` | `(0̄₁, 0̄₂)` | `(1̄₁, 1̄₂)` | two objectives at once |
| **Lexicographic** | `lex-min` | `(⊗₁, ⊗₂)` | `(0̄, 0̄)` | `(1̄, 1̄)` | tie-broken priorities |
| **String** | `longest common prefix` | `concatenation` | `` | `ε` | label accumulation |

where the **log-add** operator is  `x ⊕ₗₒg y = −ln(e⁻ˣ + e⁻ʸ)`.  Additional weight types ship in
the same module — `Count`, `Power` (η-power, for online learning), `Gödel`, `SignedTropical`
(rewards as negative costs), and set/edit-valued weights — see
[`docs/architecture/semirings.md`](docs/architecture/semirings.md).

```rust
use lling_llang::semiring::{Semiring, TropicalWeight};

let a = TropicalWeight::new(2.0);
let b = TropicalWeight::new(3.0);

let alternatives = a.plus(&b);   // min(2, 3) = 2   ← pick the cheaper branch
let in_sequence  = a.times(&b);  //     2 + 3 = 5   ← pay both costs
```

> **Why semirings?** Goodman's *Semiring Parsing* [[5]]#references showed that recognition,
> derivation forests, Viterbi, *n*-best, and inside/outside probabilities are all the *same*
> deductive computation over different semirings. Mohri's *Weighted Automata Algorithms*
> [[3]]#references extends this to the full suite of transducer operations. `lling-llang` is a
> direct, statically-typed embodiment of that insight.

### Lattices — weighted DAGs of hypotheses

A **lattice** is a weighted DAG whose nodes are positions in the input and whose arcs are scored
token alternatives. Each start→end path is one complete hypothesis. Lattices compactly represent
*exponentially many* alternatives through shared structure.

![Correction lattice (WFSA) — green marks the best (Viterbi) path](docs/diagrams/lattice.svg)

<details><summary>Plain-text view</summary>

```text
  (0) ──► (1) ──► (2) ──► (3)        (start = 0, final = 3)

  0 → 1 :  the / 0.5   |  teh / 1.0
  1 → 2 :  quick / 0.5 |  quik / 1.0
  2 → 3 :  fox / 0.0
```
</details>

```text
Total cost of a path  =  ⊗ of its arc weights  (Tropical ⊗ = +):
  the  quick fox  = 0.5 + 0.5 + 0.0 = 1.0   ← best  (Viterbi ⊕ = min)
  the  quik  fox  = 0.5 + 1.0 + 0.0 = 1.5
  teh  quick fox  = 1.0 + 0.5 + 0.0 = 1.5
  teh  quik  fox  = 1.0 + 1.0 + 0.0 = 2.0   (the uncorrected input)
```

Weights are **costs**: lower is better, and a dictionary/language model makes a plausible
correction cheaper than leaving an unlikely token in place.

### WFSTs & rational operations

The `wfst` module provides the general transducer with its **rational operations** — *union*
(`A ∪ B`), *concatenation* (`A · B`), and *Kleene closure* (`A*`) — and **unary operations** —
*invert*, *project* (keep input or output tape), and *reverse*. Larger systems are assembled by
lazily **composing** transducers (`A ∘ B`), evaluated on the fly to avoid materializing the full
product. See [[1]](#references), [[3]](#references) and
[`docs/architecture/wfst-operations.md`](docs/architecture/wfst-operations.md).

### Correction layers

A **correction layer** transforms a lattice — filtering paths or re-weighting arcs — and layers
compose into a pipeline. This is how spelling, grammar, and language-model knowledge stack up.

![Correction pipeline](docs/diagrams/correction-pipeline.svg)

---

## Quick start

Add the crate (path or version) to your `Cargo.toml`:

```toml
[dependencies]
lling-llang = "0.1"
```

### A worked example

Build the lattice from the diagram above, then extract the best correction. *(This is the real API
— `add_correction`, `EdgeMetadata::{original, correction}`, `viterbi` returning `ViterbiResult`.)*

```rust
use lling_llang::lattice::{LatticeBuilder, EdgeMetadata};
use lling_llang::backend::HashMapBackend;
use lling_llang::semiring::TropicalWeight;
use lling_llang::path::viterbi;

let mut builder = LatticeBuilder::new(HashMapBackend::new());

// Position 0 → 1: the original token "teh" costs more than its correction "the".
builder.add_correction(0, 1, "teh", TropicalWeight::new(1.0), EdgeMetadata::original());
builder.add_correction(0, 1, "the", TropicalWeight::new(0.5), EdgeMetadata::correction(1));

// Position 1 → 2: likewise for "quik" vs. "quick".
builder.add_correction(1, 2, "quik",  TropicalWeight::new(1.0), EdgeMetadata::original());
builder.add_correction(1, 2, "quick", TropicalWeight::new(0.5), EdgeMetadata::correction(1));

// Position 2 → 3: "fox" is already correct (zero cost).
builder.add_correction(2, 3, "fox", TropicalWeight::new(0.0), EdgeMetadata::original());

let mut lattice = builder.build(3);          // 3 = final position
let result = viterbi(&mut lattice);

assert!(result.success);
assert_eq!(result.path.to_words(&lattice), vec!["the", "quick", "fox"]);
assert_eq!(result.path.weight.value(), 1.0); // 0.5 + 0.5 + 0.0
```

For alternatives, swap `viterbi` for `nbest(&mut lattice, k)` (top-*k*) or
`beam_search(&mut lattice, width)` (approximate, for large lattices).

### The one algorithm behind it

`viterbi` is a special case of the **generalized single-source shortest-distance** algorithm
(Mohri [[3]](#references)). Presented in literate style — the same loop computes *any* of the
objectives in the semiring table, because `⊕` and `⊗` are abstract:

We want, for every node `q`, the value `d[q]` = the `⊕`-combination over **all** start→`q` paths of
the `⊗`-product of their arc weights. Initialize the source to the empty-path weight `1̄` and everything
else to “no path yet”, `0̄`:

```text
d[start] ← 1̄
d[q]     ← 0̄          for every q ≠ start
```

Because a lattice is acyclic, visiting nodes in **topological order** guarantees that when we reach
`q`, `d[q]` is already final. We then *relax* each outgoing arc, pushing `q`'s value forward:

```text
for q in topological_order(G):
    for each arc  q ─[w]→ r:
        d[r] ← d[r] ⊕ (d[q] ⊗ w)          # combine this path in
```

The result `d[end]` is the answer; choosing the semiring chooses the question:

```text
Tropical (⊕ = min, ⊗ = +)   →  cost of the cheapest path        (Viterbi)
Log      (⊕ = log-add)       →  total probability mass of all paths (forward score)
Boolean  (⊕ = ∨, ⊗ = ∧)      →  is the end reachable at all?
```

To recover the *path* (not just its score), Viterbi additionally stores a back-pointer at each
relaxation and walks them back from `end`. Complexity is `O(∣V∣ + ∣E∣)` for acyclic lattices;
general (cyclic) inputs use a queue discipline and semiring-specific closure
(see [`docs/algorithms/shortest-distance.md`](docs/algorithms/shortest-distance.md)).

---

## What's inside — a capability tour

Each area links to its in-depth guide under [`docs/`](docs/).

### Classical WFST algorithms
Generic over the semiring, following Mohri [[3]](#references):
shortest-distance, **weight pushing**, **`ε`-removal**, **determinization**, **minimization**,
`connect` (trimming), and **synchronization** of label delay.
→ [`docs/algorithms/`](docs/algorithms/)

### Transducer families
- **`multitape`***k*-tape transducers with projection & synchronization.
- **`pushdown`** — weighted PDAs for context-free structure.
- **`tree_transducers`** — ranked-alphabet tree-to-tree transduction.
- **`subsequential`** — deterministic transducers with piecewise decomposition (Mohri [[2]]#references).
- **`cfg`** — an **Earley parser** (Earley [[4]]#references) adapted to run over a *lattice*
  rather than a single string, intersecting a grammar with the hypothesis space.
[`docs/architecture/`]docs/architecture/, [`docs/algorithms/parsing.md`]docs/algorithms/parsing.md

### Automatic speech recognition
The canonical recognition network is the composed, optimized cascade
`N = π(min(det(H ∘ C ∘ L ∘ G)))` (Mohri, Pereira & Riley [[1]](#references)):

![ASR cascade](docs/diagrams/asr-cascade.svg)

The stack also includes context-dependency builders (triphone/tetraphone), n-gram LMs with
back-off, subword (BPE) lexicons, dysfluency handling, pronunciation variants, and multi-pass
lattice rescoring. Acoustic integration (`acoustic`) fuses neural emission posteriors with the
graph. End-to-end front-ends include **CTC** (`ctc`) — the objective of Graves et al.
[[6]](#references), the WFST-based decoding of Miao et al. [[8]](#references), and the compact
topologies of Laptev et al. [[9]](#references) — and **neural transducers** (`transducer`, RNN-T,
Graves [[7]](#references)).
→ [`docs/advanced/asr-pipeline.md`](docs/advanced/asr-pipeline.md), [`docs/acoustic/overview.md`](docs/acoustic/overview.md)

### Training objectives
- **LF-MMI** sequence-discriminative training (Povey et al. [[11]]#references; Tian et al. [[14]]#references).
- **Weak supervision** from noisy transcripts via bypass arcs (Gao et al. [[15]]#references).
- **Pruned** composition for memory-bounded training.
[`docs/training/weak-supervision.md`]docs/training/weak-supervision.md

### Differentiable & GPU-ready
- **`differentiable`** — automatic differentiation *through* WFST operations (forward-score and
  Viterbi), WFST convolutional layers, token-graph CTC variants, lexicon marginalization, and
  top-down (k2-style) arc-posterior gradients, following the GTN framework of Hannun et al.
  [[10]]#references.
- **`gpu`** — CPU-side data structures shaped for massively-parallel decoding: **CSR** adjacency,
  lock-free **uint64 token packing**, **k-vector** atomic reduction, and mark-and-compact soft
  pruning, following Braun et al. [[12]]#references. *(These are GPU-ready layouts; CUDA/`wgpu`
  kernels are a documented extension point, not yet shipped.)*
[`docs/advanced/differentiable.md`]docs/advanced/differentiable.md, [`docs/advanced/gpu-acceleration.md`]docs/advanced/gpu-acceleration.md

### Text & code
- **`text_processing`** — TN/ITN over semiotic classes (cardinals, money, dates, …; Zhang et al. [[16]]#references).
- **`error_models`** — edit-distance, Damerau-Levenshtein, confusion-matrix (OCR/QWERTY), and homophone transducers.
- **`llm`** — grammar-constrained decoding (token masking from a CFG/PDA/FSM) for LLM output.
- **`programming`** — syntax-error repair and automated API-version migration.

### Online learning
**RRWM** (Randomized Weighted-Majority over path experts) and the **`η`-power semiring** for online
ensemble learning, after Cortes, Kuznetsov, Mohri & Warmuth [[13]](#references).
→ [`docs/algorithms/rrwm.md`](docs/algorithms/rrwm.md), [`docs/architecture/power-semiring.md`](docs/architecture/power-semiring.md)

---

## Formal verification

The semantics that matter are **machine-checked**. Every Coq/Rocq file builds with *no* `admit`,
`Axiom`, or `sorry`; the TLA⁺ specs are model-checked with TLC and include deliberately-broken
mutants to prove the checks have teeth.

![Formal verification](docs/diagrams/formal-verification.svg)

| Layer                     | What is proven                                                                                                                                                        |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Coq — foundations**     | Semiring laws; tropical & log weights; quantization, interval, and roundoff contracts; finite semiring matrix closure.                                                |
| **Coq — WFST semantics**  | WFST/state/transition definitions; accepting-path connectivity; adjacency-matrix semantics; the weighted language `L(A)` via duplicate-free path enumerations.          |
| **Coq — algorithm specs** | Partial-correctness specifications and Bellman-update lemmas for Viterbi, shortest-distance, determinization, and minimization.                                       |
| **TLA⁺ models**           | `RRWM` (bounded online-learning accounting), `LazyComposition` (cache/memory bounds), `CascadeOrder` (`H → C → L → G` ordering) — 9 TLC configs + 3 expected-failure mutants. |

Run everything from the repository root:

```bash
make verify-proofs            # all Rocq proofs + all TLC configurations + mutant checks
```

For the memory-intensive Rocq build, a resource-limited invocation is recommended:

```bash
systemd-run --user --scope -p MemoryMax=126G -p CPUQuota=1800% \
  -p IOWeight=30 -p TasksMax=200 make -C proofs/coq -j1
```

See [`proofs/README.md`](proofs/README.md) and [`proofs/doc/proof-status.md`](proofs/doc/proof-status.md).

---

## Installation & feature flags

The default build is a **standalone WFST framework with no external dependencies**. Optional
features pull in integrations and extra layers:

| Feature                                                | Enables                                                    |
|--------------------------------------------------------|------------------------------------------------------------|
| `levenshtein`                                          | Fuzzy lexical correction via `liblevenshtein`.             |
| `pcfg`                                                 | Probabilistic CFG support.                                 |
| `phonetic-rescore`                                     | Phonetic lattice rescoring (Zompist-style rules).          |
| `code-correction` / `latex-syntax` / `mathml-semantic` | Domain-specific correction layers.                         |
| `pos-tagging` / `lm-rerank`                            | POS-tagging and language-model reranking layers.           |
| `f1r3fly`                                              | Full F1R3FLY.io stack (PathMap, MORK, MeTTaIL, MeTTaTron). |
| `pathmap-backend` / `sexpr`                            | PathMap-backed storage; S-expression path format.          |
| `serde` / `bincode-ser`                                | Serialization.                                             |
| `test-utils`                                           | Property-test strategies & fixtures for downstream crates. |

```toml
[dependencies]
lling-llang = { version = "0.1", features = ["levenshtein", "serde"] }
```

See [`Cargo.toml`](Cargo.toml) for the complete, authoritative list.

---

## Documentation

Comprehensive guides live in [`docs/`](docs/) (start at [`docs/README.md`](docs/README.md)):

| Section                            | Contents                                                                                                                             |
|------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| [Architecture]docs/architecture/ | Semirings, lattices, WFST traits & operations, backends, layers.                                                                     |
| [Algorithms]docs/algorithms/     | Path extraction, shortest-distance, pushing, `ε`-removal, determinization, minimization, composition, parsing, sampling, RRWM.         |
| [Advanced]docs/advanced/         | CTC topologies, subsequential transducers, differentiable ops, top-down autograd, ASR pipeline, beam optimization, GPU acceleration. |
| [ASR & Acoustic]docs/asr/        | Cascade construction, subword lexicons, acoustic-model integration.                                                                  |
| [Training]docs/training/         | Weak supervision and discriminative objectives.                                                                                      |
| [Integration]docs/integration/   | F1R3FLY.io ecosystem, liblevenshtein, external speech/NLP & text-correction pipelines.                                               |
| [API Reference]docs/api/         | Per-trait reference for semirings, WFSTs, lattices, backends, paths, layers.                                                         |

---

## Benchmarking

A [Criterion](https://github.com/bheisler/criterion.rs) harness in
[`benches/core_benchmarks.rs`](benches/core_benchmarks.rs) measures semiring operations, lattice
algorithms, path extraction (Viterbi / N-best / beam), Earley parsing on lattices, CTC topologies,
and differentiable scoring:

```bash
cargo bench
```

---

## Project status

**Version 0.1.0 — early development.** The core (`semiring`, `wfst`, `lattice`, `algorithms`,
`path`, `cfg`, `composition`, `layers`) is stable and exercised by property tests and the formal
proofs. Several higher tiers are best understood as **library scaffolding** that you wire into your
own models rather than turnkey systems — notably `gpu` (GPU-ready data structures, no kernels yet),
`acoustic`, and `transducer` (integration surfaces for neural front-ends). Breaking changes may
occur before 1.0.

---

## References

*Every citation below has been checked; DOIs and arXiv identifiers resolve to the stated work.*

**Weighted automata & parsing foundations**
- **[1]** Mohri, M., Pereira, F., & Riley, M. (2002). *Weighted Finite-State Transducers in Speech Recognition.* Computer Speech & Language 16(1):69–88. [doi:10.1006/csla.2001.0184]https://doi.org/10.1006/csla.2001.0184
- **[2]** Mohri, M. (1997). *Finite-State Transducers in Language and Speech Processing.* Computational Linguistics 23(2):269–311. [ACL J97-2003]https://aclanthology.org/J97-2003/
- **[3]** Mohri, M. (2009). *Weighted Automata Algorithms.* In *Handbook of Weighted Automata*, pp. 213–254. Springer. [doi:10.1007/978-3-642-01492-5_6]https://doi.org/10.1007/978-3-642-01492-5_6
- **[4]** Earley, J. (1970). *An Efficient Context-Free Parsing Algorithm.* Communications of the ACM 13(2):94–102. [doi:10.1145/362007.362035]https://doi.org/10.1145/362007.362035
- **[5]** Goodman, J. (1999). *Semiring Parsing.* Computational Linguistics 25(4):573–605. [ACL J99-4004]https://aclanthology.org/J99-4004/

**Speech & sequence models**
- **[6]** Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). *Connectionist Temporal Classification.* ICML '06:369–376. [doi:10.1145/1143844.1143891]https://doi.org/10.1145/1143844.1143891
- **[7]** Graves, A. (2012). *Sequence Transduction with Recurrent Neural Networks.* [arXiv:1211.3711]https://arxiv.org/abs/1211.3711
- **[8]** Miao, Y., Gowayyed, M., & Metze, F. (2015). *EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding.* ASRU 2015. [doi:10.1109/ASRU.2015.7404790]https://doi.org/10.1109/ASRU.2015.7404790
- **[9]** Laptev, A., Majumdar, S., & Ginsburg, B. (2022). *CTC Variations Through New WFST Topologies.* Interspeech 2022:1041–1045. [doi:10.21437/Interspeech.2022-10854]https://doi.org/10.21437/Interspeech.2022-10854
- **[11]** Povey, D., Peddinti, V., Galvez, D., et al. (2016). *Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI.* Interspeech 2016:2751–2755. [doi:10.21437/Interspeech.2016-595]https://doi.org/10.21437/Interspeech.2016-595
- **[14]** Tian, J., Yu, J., Weng, C., Zou, Y., & Yu, D. (2022). *Integrating Lattice-Free MMI into End-to-End Speech Recognition.* [arXiv:2203.15614]https://arxiv.org/abs/2203.15614
- **[15]** Gao, D., Liao, C., Liu, C., et al. (2025). *WST: Weakly Supervised Transducer for Automatic Speech Recognition.* [arXiv:2511.04035]https://arxiv.org/abs/2511.04035

**Differentiable & GPU decoding**
- **[10]** Hannun, A., Pratap, V., Kahn, J., & Hsu, W.-N. (2020). *Differentiable Weighted Finite-State Transducers.* ICML 2020 (PMLR 119). [arXiv:2010.01003]https://arxiv.org/abs/2010.01003
- **[12]** Braun, H., Luitjens, J., Leary, R., Kaldewey, T., & Galvez, D. (2020). *GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition.* ICASSP 2020:7874–7878. [doi:10.1109/ICASSP40776.2020.9054099]https://doi.org/10.1109/ICASSP40776.2020.9054099

**Online learning & text normalization**
- **[13]** Cortes, C., Kuznetsov, V., Mohri, M., & Warmuth, M. K. (2015). *On-Line Learning Algorithms for Path Experts with Non-Additive Losses.* COLT 2015, PMLR 40:424–447. [proceedings.mlr.press/v40/Cortes15]https://proceedings.mlr.press/v40/Cortes15.html
- **[16]** Zhang, Y., Bakhturina, E., Gorman, K., & Ginsburg, B. (2021). *NeMo Inverse Text Normalization: From Development to Production.* [arXiv:2104.05055]https://arxiv.org/abs/2104.05055

---

## License

Licensed under the **Apache License 2.0** — see [`LICENSE`](LICENSE). The crate manifest declares
`MIT OR Apache-2.0`; if you require the MIT option, note that the MIT license text is not yet
included in the repository.

## Contributing

Contributions are welcome. Please read the relevant [`docs/`](docs/) design notes before opening a
PR, keep new algorithms generic over the `Semiring` trait, and — for changes touching verified
semantics — run `make verify-proofs` so the proofs stay green.