# espeak-ng-rs
[![crates.io](https://img.shields.io/crates/v/espeak-ng.svg)](https://crates.io/crates/espeak-ng)
[![docs.rs](https://img.shields.io/docsrs/espeak-ng)](https://docs.rs/espeak-ng)
[![license: GPL-3.0](https://img.shields.io/badge/license-GPL--3.0-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.html)
A pure-Rust port of [eSpeak NG](https://github.com/espeak-ng/espeak-ng) text-to-speech,
built with a test-first, bottom-up approach.
The C library is used as an oracle: the Rust implementation must produce
**bit-identical output** for every input.
---
## Status
| Module | Status | Notes |
|---|---|---|
| `encoding` | ✅ Complete | All codepages (UTF-8, ISO-8859-*, KOI8-R, ISCII, …) |
| `phoneme` | ✅ Complete | Phoneme table loader, IPA rendering, instruction scanner |
| `dictionary` | ✅ Complete | Hash lookup, rule engine, suffix stripping, `SetWordStress` |
| `translate` | ✅ Complete | Full text → IPA pipeline, multi-language, numbers, punctuation |
| `synthesize` | ✅ Complete | Full harmonic synthesis reading espeak-ng binary phoneme data |
**319 tests passing** — 27/27 IPA oracle comparisons + 10 synthesis integration tests.
**Two synthesis paths are available:**
| Method | Pipeline | Notes |
|---|---|---|
| `Synthesizer::synthesize(ipa_str)` | Hand-coded IPA → 3-formant cascade | Works anywhere, generic voice |
| `Synthesizer::synthesize_codes(codes, phdata)` | Phoneme bytecodes → real espeak-ng frame data → harmonic synth | Requires espeak-ng data files, authentic espeak-ng character |
---
## Quick start
```bash
# Run all tests (unit + integration + oracle)
cargo test
# Run oracle comparison tests with verbose output
cargo test --test oracle_comparison -- --nocapture
# Run benchmarks (requires espeak-ng binary on PATH for C baseline)
./benches/bench.sh
```
---
## Usage
```rust
// Text → IPA phonemes
let ipa = espeak_ng::text_to_ipa("en", "hello world")?;
assert_eq!(ipa, "həlˈəʊ wˈɜːld");
// More examples
espeak_ng::text_to_ipa("en", "42")?; // "fˈɔːti tˈuː"
espeak_ng::text_to_ipa("en", "walked")?; // "wˈɔːkt"
espeak_ng::text_to_ipa("en", "happily")?; // "hˈapɪli"
espeak_ng::text_to_ipa("de", "schön")?; // "ʃˈøːn"
espeak_ng::text_to_ipa("fr", "bonjour")?; // "bɔ̃ʒˈuːɹ"
// Text → raw PCM (22050 Hz, mono, 16-bit) — not yet implemented
let (samples, rate) = espeak_ng::text_to_pcm("en", "hello world")?;
```
---
## What is implemented
### `encoding/`
- Full UTF-8 encode/decode (`utf8_decode_one`, `encode_one`)
- All eSpeak NG codepage tables: ISO-8859-1 through -16, KOI8-R, ISCII
- `Encoding::from_name()` lookup matching C's `encoding.c`
### `phoneme/`
- Binary phoneme table loader (`ph_data` files, `phonindex`)
- Per-language table selection (`select_table_by_name`)
- Phoneme attribute access: type, flags, mnemonic, program address
- IPA string extraction via bytecode scanner (`phoneme_ipa_string`):
- Handles `i_IPA_NAME` instructions
- Correctly handles synthesis-only phonemes (first instruction ≥ `i_FMT`)
- Language-specific scanning depth to avoid bleed-through
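To make the scanner's shape concrete, here is a self-contained sketch of an IPA-string extraction pass over a phoneme "program". The opcode values (`I_IPA_NAME`, `I_FMT`) and the string packing are hypothetical placeholders chosen for illustration, not espeak-ng's real bytecode encoding:

```rust
// Illustrative only: opcode values and word layout are invented for this sketch.
const I_IPA_NAME: u16 = 0x401; // hypothetical opcode carrying an inline IPA string
const I_FMT: u16 = 0x600;      // hypothetical first synthesis-only opcode

/// Scan a phoneme program for an inline IPA string.
/// Returns None for synthesis-only phonemes (first instruction >= I_FMT).
fn phoneme_ipa_string(program: &[u16]) -> Option<String> {
    if program.first().map_or(true, |&op| op >= I_FMT) {
        return None; // synthesis-only phoneme: no textual IPA available
    }
    let mut iter = program.iter();
    while let Some(&op) = iter.next() {
        if op == I_IPA_NAME {
            // Next word is the byte length; subsequent words pack UTF-8 byte pairs.
            let len = *iter.next()? as usize;
            let mut bytes = Vec::with_capacity(len);
            while bytes.len() < len {
                let w = *iter.next()?;
                bytes.push((w >> 8) as u8);
                if bytes.len() < len {
                    bytes.push((w & 0xff) as u8);
                }
            }
            return String::from_utf8(bytes).ok();
        }
    }
    None
}
```

The synthesis-only guard mirrors the rule described above: if the very first instruction already belongs to the formant-data range, the scanner bails out instead of misreading frame data as text.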
### `dictionary/`
- Binary `en_dict`-format reader (`Dictionary::from_bytes`)
- `TransposeAlphabet` decompression for Latin-script entries
- Hash-based word lookup (`hash_word`, `lookup`)
- Full rule engine (`TranslateRules` / `MatchRule`):
- Pre/post context matching (letter groups, syllable counts, stress, …)
- `RULE_ENDING` suffix detection with `end_type` and separated `end_phonemes`
- `RULE_NO_SUFFIX`, `RULE_DOUBLE`, `RULE_LETTERGP`, `RULE_DOLLAR`, …
- Score-based rule selection, condition bitmask, spell-word flag
- `SetWordStress` — full port of the C function:
- Vowel stress array construction (`GetVowelStress`)
- All stress placement strategies (trochaic, iambic, left-to-right, …)
- `$strend` / `$strend2` end-stress promotion
- Clause-level final-stress demotion
- Suffix stripping (SUFX_I): re-translates stem with `FLAG_SUFFIX_REMOVED`,
combining stem phonemes + suffix phonemes correctly
- Word-final devoicing for German / Dutch / Afrikaans / Slovak / Slovenian / Albanian
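As an illustration of the bucket-chain scheme behind `hash_word` / `lookup`, here is a minimal sketch. The hash function and bucket count are stand-ins, not the real algorithm from the compiled dictionary format:

```rust
// Illustrative only: NOT espeak-ng's actual hash_word() — just the shape of a
// bucket-based dictionary (hash -> chain of entries scanned linearly).
const N_BUCKETS: u32 = 1024; // hypothetical bucket count

fn hash_word(word: &str) -> u32 {
    // Simple multiplicative hash over the bytes (stand-in for the real function).
    word.bytes()
        .fold(0u32, |h, b| h.wrapping_mul(31).wrapping_add(b as u32))
        % N_BUCKETS
}

struct Dictionary {
    buckets: Vec<Vec<(String, String)>>, // (word, phoneme string) chains
}

impl Dictionary {
    fn new() -> Self {
        Dictionary { buckets: vec![Vec::new(); N_BUCKETS as usize] }
    }
    fn insert(&mut self, word: &str, phonemes: &str) {
        self.buckets[hash_word(word) as usize].push((word.into(), phonemes.into()));
    }
    fn lookup(&self, word: &str) -> Option<&str> {
        self.buckets[hash_word(word) as usize]
            .iter()
            .find(|(w, _)| w == word)
            .map(|(_, p)| p.as_str())
    }
}
```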
### `translate/`
- `text_to_ipa(lang, text) → String` public API
- Tokeniser: words, parsed number tokens, punctuation, clause boundaries
- `word_to_phonemes`: dictionary lookup → suffix stripping → translation rules
- Typed number grammar:
- `NumberGrammar` models ordinal parsing, tens ordering, hundreds, and thousands behavior per language
- `NumberToken` distinguishes cardinals, decimals, and ordinals before phoneme rendering
- `Pronunciation` builder handles `END_WORD` (||) separators without manual byte trimming in each branch
- Number-to-phonemes:
- Integers 0–999 999 999 999 via grouped scale dict entries (`_0`–`_19`, `_NX`, `_0C`, `_0M1`, `_0M2`, `_0M3`)
- `NUM_1900` year format (1900 → "nineteen hundred")
- Decimal numbers: integer + point + individual digits
- Per-language number grammar for conjunctions, units-before-tens ordering, and omitted `one` prefixes
- Ordinal numbers via `_#<suffix>` dict entries, language ordinal indicators, ordinal-dot languages, and scale-aware ordinal composition
- IPA rendering (`phonemes_to_ipa_full`):
- Primary (ˈ) / secondary (ˌ) stress marks before vowels
- `END_WORD` → word-boundary space
- Language-specific overrides (English schwa, French 'r' → ʁ, …)
- Context-sensitive phonemes: `d#` → 't'/'d', `z#` → 's'/'z' based on voicing
- French liaison phoneme suppression at word-final position
- German word-final devoicing (Auslautverhärtung)
- Multi-clause stress promotion (mirrors `phonemelist.c`)
- Language routing: en, fr, de, es, and many more via data files
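The grouped-scale decomposition behind number-to-phonemes can be sketched as follows. The dict-entry names referenced above (`_0C`, `_0M1`, …) are real; the scale words here are simplified English labels, not the crate's phoneme output:

```rust
// Sketch of splitting an integer into 3-digit groups, largest scale first,
// mirroring the _0M1/_0M2/_0M3 tiers. Covers the documented range up to
// 999 999 999 999 (scales.len() bounds the group count).
fn spell_integer(mut n: u64) -> Vec<(u64, &'static str)> {
    let scales = ["", "thousand", "million", "billion"];
    let mut groups = Vec::new();
    let mut i = 0;
    while n > 0 || groups.is_empty() {
        let g = n % 1000;
        // Zero groups are skipped ("1 000 003" has no thousands word),
        // except for the number zero itself.
        if g != 0 || (n == 0 && groups.is_empty()) {
            groups.push((g, scales[i]));
        }
        n /= 1000;
        i += 1;
    }
    groups.reverse();
    groups
}
```

Each `(group, scale)` pair would then be rendered through the per-language number grammar (conjunctions, units-before-tens ordering, omitted `one`).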
---
## Oracle test coverage
| Test | Input | Expected IPA |
|---|---|---|
| `en_hello` | "hello" | hɛlˈəʊ |
| `en_hello_world` | "hello world" | hɛlˈəʊ wˈɜːld |
| `en_silent_e` | "cake" | kˈeɪk |
| `en_gh_digraph` | "night" | nˈaɪt |
| `en_silent_consonants` | "pneumonia" | njuːmˈəʊniə |
| `en_suffixes` | "walked", "happily", … | wˈɔːkt, hˈapɪli, … |
| `en_numbers_cardinal` | "0" … "1000000" | zˈiəɹəʊ … wˈɒn mˈɪliən |
| `en_numbers_with_decimal` | "3.14", "0.5" | θɹˈiː pɔɪnt wˈɒn fˈɔː, … |
| `en_sentence_period` | "Hello. Goodbye." | hɛlˈəʊ ɡʊdbˈaɪ |
| `en_comma` | "yes, no, maybe" | jˈɛs nˈəʊ mˈeɪbi |
| `de_guten_tag` | "guten Tag" | ɡˈuːtən tˈaːk |
| `de_umlauts` | "über", "schön", "müde" | ˈyːbɜ, ʃˈøːn, mˈyːdə |
| `de_ch_digraph` | "Bach", "ich" | bˈax, ˈɪç |
| `es_hola` | "hola" | ˈola |
| `es_ll_digraph` | "llamar" | ʎamˈaɾ |
| `fr_bonjour` | "bonjour" | bɔ̃ʒˈuːɹ |
| `fr_nasal_vowels` | "bon" | bˈɔ̃ |
| `fr_liaison` | "les amis" | le-z amˈi |
---
## Testing approach
Tests are written before the implementation (TDD).
```
tests/
encoding_integration.rs 22 golden-value tests for all encodings
oracle_comparison.rs 27 tests comparing Rust ↔ C oracle output
common/mod.rs shared helpers (espeak_available, try_espeak_ipa, …)
```
Oracle tests use an `assert_matches_oracle!` macro with three outcomes:
| Condition | Result |
|---|---|
| `espeak-ng` not on PATH | Skip with `[SKIP]` notice |
| Rust returns `NotImplemented` | Print C oracle value as a target, pass |
| Rust returns a real string | Must exactly match C oracle output |
This means all comparison tests can be written now, run in any environment,
and automatically start enforcing correctness as each module is implemented.
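The three-outcome decision can be sketched as plain Rust. The macro name and the `NotImplemented` error come from the source; the enum and function below are illustrative, not the macro's actual expansion:

```rust
// Sketch of the decision implemented by assert_matches_oracle!.
#[derive(Debug, PartialEq)]
enum Outcome { Skip, TargetPrinted, Checked }

fn check(oracle: Option<&str>, rust: Result<String, &str>) -> Outcome {
    match (oracle, rust) {
        // espeak-ng not on PATH: nothing to compare against
        (None, _) => Outcome::Skip,
        // Not yet implemented: record the C value as a target and pass
        (Some(c), Err("NotImplemented")) => {
            println!("[TARGET] {}", c);
            Outcome::TargetPrinted
        }
        // Implemented: must match the oracle exactly
        (Some(c), Ok(r)) => {
            assert_eq!(r, c, "Rust output diverged from C oracle");
            Outcome::Checked
        }
        (Some(_), Err(e)) => panic!("unexpected error: {e}"),
    }
}
```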
---
## Data directory
The crate reads compiled eSpeak NG data files at runtime. The data resolution order is:
1. `ESPEAK_DATA_PATH` environment variable
2. `espeak-ng-data/` next to the running executable
3. `espeak-ng-data/` in the current working directory
4. `/usr/share/espeak-ng-data` (system installation)
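The resolution order above can be sketched with nothing but the standard library. The function name is illustrative (the crate's internal name may differ); the four locations are the ones listed:

```rust
use std::env;
use std::path::PathBuf;

// Sketch of the documented data-directory search order.
fn candidate_data_dirs() -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    // 1. Explicit override via environment variable
    if let Ok(p) = env::var("ESPEAK_DATA_PATH") {
        dirs.push(PathBuf::from(p));
    }
    // 2. espeak-ng-data/ next to the running executable
    if let Ok(exe) = env::current_exe() {
        if let Some(parent) = exe.parent() {
            dirs.push(parent.join("espeak-ng-data"));
        }
    }
    // 3. espeak-ng-data/ in the current working directory
    dirs.push(PathBuf::from("espeak-ng-data"));
    // 4. System installation
    dirs.push(PathBuf::from("/usr/share/espeak-ng-data"));
    dirs
}
```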
A complete copy of the compiled data directory (from eSpeak NG 1.52.0 + additional
language files from 1.52.0.1) is bundled at `espeak-ng-data/` in this repository.
This makes the crate fully self-contained without requiring a system eSpeak NG
installation.
The bundle contains:
- **114 compiled dictionaries** (`*_dict` files), one per language
- **145 language definition files** (`lang/`) — includes ps, rup, crh, mn not in 1.52.0
- **200 voice definition files** (`voices/`) — includes asia/ps, ps voices
- **Binary phoneme data** (`phondata`, `phonindex`, `phontab`, `intonations`)
For selective embedding, the repository also contains per-language dictionary
crates under `data-crates/espeak-ng-data-dict-<lang>` in addition to the
aggregate `espeak-ng-data-dicts` crate.
```bash
# Use bundled data explicitly
ESPEAK_DATA_PATH=/path/to/espeak-ng-rs/espeak-ng-data cargo test
```
---
## Features
| Feature | Description |
|---|---|
| `c-oracle` | Links `libespeak-ng` via FFI; enables the `oracle` module for comparison tests and benchmarks. Requires `libespeak-ng` to be installed (`pkg-config: espeak-ng`). |
| `bundled-data` | Embeds the full eSpeak NG dataset via the aggregate data crates and enables `install_bundled_data()`. |
| `bundled-data-<lang>` | Embeds phoneme data plus a single language dictionary crate and enables selective installers such as `install_bundled_language()`. |
| `bundled-espeak` | Downloads eSpeak NG 1.52.0 from GitHub, builds it with CMake, and bakes the binary/data paths into the benchmarks. Requires `cmake`, a C compiler, `curl`/`wget`, `tar`. |
```bash
# FFI oracle
cargo test --features c-oracle
# Full embedded data
cargo test --features bundled-data
# Selective embedded data
cargo test --features bundled-data-en,bundled-data-uk
# Selective bundled-data demo
cargo run --example bundled_data_selective_demo --features bundled-data-en,bundled-data-uk
# Bundled build (no system install needed)
cargo bench --features bundled-espeak
cargo bench --features bundled-espeak,c-oracle # both
```
Selective bundled-data helpers exposed by the main crate:
- `espeak_ng::bundled_languages()`
- `espeak_ng::has_bundled_language("uk")`
- `espeak_ng::install_bundled_language(&data_dir, "uk")`
- `espeak_ng::install_bundled_languages(&data_dir, &["en", "uk"])`
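A minimal sketch of how the query helpers behave, assuming a feature-dependent language list (the two-entry list below is a placeholder; the real list is determined by which `bundled-data-<lang>` features are enabled):

```rust
// Illustrative stand-in for the selective-bundle query helpers.
const BUNDLED: &[&str] = &["en", "uk"]; // hypothetical: depends on enabled features

fn bundled_languages() -> &'static [&'static str] {
    BUNDLED
}

fn has_bundled_language(lang: &str) -> bool {
    BUNDLED.contains(&lang)
}
```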
---
## Publishing checklist
Before publishing anything, verify that the full test suite passes in every configuration:
```bash
# 1) Baseline test suite
cargo test
# 2) Oracle + bundled-espeak path
cargo test --features "c-oracle,bundled-espeak"
# 3) Optional selective bundled-data checks
cargo test --test bundled_data_selective --features bundled-data-en,bundled-data-de
# 4) Preview publish order/commands
python3 scripts/publish_all_crates.py
# 5) Dry-run publish checks (local changes allowed)
python3 scripts/publish_all_crates.py --execute --dry-run --allow-dirty
# 6) Actual publish (when ready)
python3 scripts/publish_all_crates.py --execute
```
`scripts/publish_all_crates.py --execute` enforces these preflight checks
before any crate is published and aborts on first failure.
The same gates are enforced in CI on push/PR by
`.github/workflows/ci.yml`.
Use [PUBLISHING.md](PUBLISHING.md) for full publication details.
---
## Benchmarks

| Benchmark | Rust | C (subprocess) | Speedup |
|---|---|---|---|
| First-phoneme latency | **~606 ns** | ~5.5 ms | **~9 000×** |
| Synthesizer throughput | **380× real-time** | — | — |
| Resonator DSP (per sample) | **3.2 ns** | — | — |
| Encoding name lookup | **3.0 ns** | — | — |
The Rust speedup over C subprocess comes entirely from eliminating process-spawn and
shared-library initialisation overhead — the in-process dictionary lookup + rule engine
returns the first phoneme in under a microsecond.
See [BENCHMARK.md](BENCHMARK.md) for the full Criterion HTML report.
```bash
./benches/bench.sh # run + snapshot + generate BENCHMARK.md
./benches/bench.sh --no-run # regenerate BENCHMARK.md from last run
./benches/bench.sh --filter resonator # one group only
```
Benchmark groups:
| Group | Measures |
|---|---|
| `encoding/utf8_decode` | UTF-8 decode throughput across scripts and input sizes |
| `encoding/name_lookup` | `Encoding::from_name()` lookup latency |
| `synthesize/resonator` | Single resonator DSP filter tick (`Resonator::tick()`) |
| `text_to_ipa/rust` | Full Rust pipeline: text → IPA |
| `text_to_ipa/c_cli` | C subprocess baseline (process spawn included) |
| `latency/first_phoneme` | First-phoneme latency: Rust vs C subprocess |
| `text_to_ipa/ffi_vs_rust` | Rust vs C FFI baseline (`--features c-oracle`) |
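For context on the per-sample resonator number, here is a minimal two-pole resonator of the kind benchmarked above (a standard Klatt-style section, `y[n] = a·x[n] + b·y[n-1] + c·y[n-2]`). This is a sketch of the technique, not the crate's exact `Resonator` code:

```rust
// Minimal Klatt-style two-pole resonator. Coefficients follow the standard
// formulation from center frequency and bandwidth at a given sample rate.
struct Resonator { a: f64, b: f64, c: f64, y1: f64, y2: f64 }

impl Resonator {
    fn new(freq: f64, bandwidth: f64, sample_rate: f64) -> Self {
        use std::f64::consts::PI;
        let r = (-PI * bandwidth / sample_rate).exp();
        let c = -(r * r);
        let b = 2.0 * r * (2.0 * PI * freq / sample_rate).cos();
        let a = 1.0 - b - c; // normalises DC gain to 1
        Resonator { a, b, c, y1: 0.0, y2: 0.0 }
    }
    fn tick(&mut self, x: f64) -> f64 {
        let y = self.a * x + self.b * self.y1 + self.c * self.y2;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}
```

The per-tick cost is two multiplies, one multiply-add, and two stores, which is why the measured figure sits in the low-nanosecond range.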
---
## Project layout
```
espeak-ng-rs/
├── src/
│ ├── lib.rs public API + module declarations
│ ├── error.rs EspeakError enum, Result alias
│ ├── encoding/
│ │ ├── mod.rs Encoding enum, TextDecoder, utf8_decode_one/encode_one
│ │ └── codepages.rs ISO-8859-*, KOI8-R, ISCII lookup tables
│ ├── phoneme/
│ │ ├── mod.rs PhonemeType, PhonemeFlags, PhonemeTable
│ │ ├── load.rs Binary phoneme table loader
│ │ ├── table.rs Table selection, mnemonic access
│ │ └── feature.rs Phoneme feature extraction
│ ├── dictionary/
│ │ ├── mod.rs Constants, flag definitions
│ │ ├── file.rs Dictionary binary parser, group index
│ │ ├── lookup.rs Hash-based word lookup
│ │ ├── rules.rs MatchRule + TranslateRules engine
│ │ ├── stress.rs SetWordStress, GetVowelStress
│ │ ├── phonemes.rs Phoneme encoding helpers
│ │ └── transpose.rs TransposeAlphabet decompression
│ ├── translate/
│ │ ├── mod.rs Translator, text_to_ipa, word_to_phonemes,
│ │ │ tokeniser, number-to-phonemes, IPA renderer
│ │ └── ipa_table.rs Kirschenbaum → IPA lookup, mnemonic overrides
│ ├── synthesize/
│ │ ├── mod.rs Synthesizer API, synthesize_codes() high-quality path
│ │ ├── engine.rs IPA → cascade formant synthesizer (generic path)
│ │ ├── targets.rs IPA → FormantTarget table (60 phonemes)
│ │ ├── phondata.rs Binary SPECT_SEQ / frame_t parser from phondata
│ │ ├── bytecode.rs Phoneme bytecode scanner (finds i_FMT address)
│ │ ├── wavegen.rs Harmonic synthesizer (PeaksToHarmspect + wavegen loop)
│ │ └── sintab_data.rs 2048-entry sine lookup table (from sintab.h)
│ └── oracle/mod.rs FFI to libespeak-ng (feature = c-oracle)
├── tests/
│ ├── common/mod.rs
│ ├── encoding_integration.rs
│ ├── dictionary_integration.rs
│ └── oracle_comparison.rs
├── benches/
│ ├── vs_c.rs Criterion benchmark suite
│ ├── bench.sh Run benchmarks + generate BENCHMARK.md
│ └── results/ Criterion JSON + SVG snapshots (committed)
├── build.rs pkg-config link (c-oracle) + CMake build (bundled-espeak)
├── Cargo.toml
├── BENCHMARK.md
└── README.md
```
---
## Known limitations
- **Number translation** uses typed per-language grammar plus grouped scale composition through billions, but it still does not cover every `numbers.c` feature and format.
- **Ordinal numbers** are supported via `_#<suffix>` dict entries (English "1st", Spanish "1º", …), language ordinal indicators (Dutch "1e"), and ordinal-dot languages (German "3."); forms outside these patterns are not yet covered.
- **Prefix stripping** not yet implemented (very rare in English).
- **`phonSWITCH`** (mid-word language switching) not yet handled.
---
## Licence
GPL-3.0-or-later — same as eSpeak NG.
---
## Authors
- [Eugene Hauptmann](https://github.com/eugenehp)
---
## Copyright
Copyright © 2026 Eugene Hauptmann
This project is a from-scratch, pure-Rust reimplementation that does not copy
C source from eSpeak NG, but it is licensed under the same terms:
[GPL-3.0-or-later](https://www.gnu.org/licenses/gpl-3.0.html).
Source: <https://github.com/eugenehp/espeak-ng-rs>