kittentts-rs
A Rust port of KittenTTS — an ultra-lightweight, CPU-only text-to-speech engine based on ONNX models.
Features
- ONNX Runtime inference: uses `ort` (ORT 2.0 bindings) for fast CPU inference
- Full text preprocessing: numbers, currencies, abbreviations, ordinals, units, etc. → spoken words
- Pure-Rust phonemisation: IPA output via the `espeak-ng` crate (no C library, no system dependencies)
- 114 bundled languages: English and 113 other languages ship as embedded data (no runtime downloads)
- Same ONNX models: works with all KittenTTS HuggingFace checkpoints
- Automatic chunking: long texts are split into sentence chunks of at most 400 characters, then concatenated
- Cross-platform: macOS, Linux, Windows, iOS and Android, all from pure Rust
- Zero native build dependencies: no `cmake`, no `pkg-config`, no `brew install`, no `apt install`
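The chunking step can be pictured with the sketch below. It is not the crate's `chunk_text()`: the sentence rule (break after `.`, `!` or `?`), the greedy packing and the hard split of over-long sentences are assumptions chosen for illustration; only the 400-character limit comes from this README.

```rust
/// Chunk limit taken from the feature description above.
const MAX_CHUNK_CHARS: usize = 400;

/// Hypothetical sketch (not the crate's implementation): split `text` into
/// sentences, then greedily pack them into chunks of at most `max_chars`
/// characters. A sentence longer than the limit is hard-split by characters.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");

    // 1. Split into sentences, keeping the terminating punctuation.
    let mut sentences: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let s = current.trim();
            if !s.is_empty() {
                sentences.push(s.to_string());
            }
            current.clear();
        }
    }
    let tail = current.trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }

    // 2. Greedily pack sentences, joining them with a single space.
    let mut chunks = Vec::new();
    let mut chunk = String::new();
    for sentence in sentences {
        let chars: Vec<char> = sentence.chars().collect();
        for piece in chars.chunks(max_chars) {
            let piece: String = piece.iter().collect();
            let sep = if chunk.is_empty() { 0 } else { 1 };
            // Flush the current chunk if this piece would overflow it.
            if !chunk.is_empty()
                && chunk.chars().count() + sep + piece.chars().count() > max_chars
            {
                chunks.push(std::mem::take(&mut chunk));
            }
            if !chunk.is_empty() {
                chunk.push(' ');
            }
            chunk.push_str(&piece);
        }
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}
```

Counting `char`s rather than bytes keeps the limit meaningful for non-ASCII text; each chunk would then be phonemised and synthesised independently before the audio is concatenated.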
Prerequisites
None! The `espeak` feature uses the pure-Rust `espeak-ng` crate
with bundled data for all 114 supported languages, so no system library installation is required on any platform.
The `espeak` feature is opt-in. Without it, every API that accepts raw IPA
input still works; only text-to-IPA conversion (and therefore the high-level
`generate` / `generate_to_file` functions) is unavailable.
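To illustrate how an opt-in feature like this is usually wired up in Rust, here is a minimal sketch using `#[cfg(feature = "espeak")]`. The function names (`synthesize_from_ipa`, `text_to_ipa`, `synthesize`, `has_text_frontend`) are hypothetical placeholders, not this crate's API, and the bodies are stubs.

```rust
// Hypothetical sketch: an opt-in `espeak` feature gates the text-to-IPA path,
// while the IPA-input path is always compiled. Names are placeholders.

/// Always available: synthesis from a pre-computed IPA string (stubbed).
fn synthesize_from_ipa(ipa: &str) -> Result<Vec<f32>, String> {
    if ipa.is_empty() {
        return Err("empty IPA input".to_string());
    }
    // Placeholder: a real implementation would tokenize and run the model.
    Ok(vec![0.0; ipa.chars().count()])
}

/// Compiled only with `--features espeak` (stub phonemiser).
#[cfg(feature = "espeak")]
fn text_to_ipa(text: &str) -> Result<String, String> {
    Ok(text.to_string())
}

/// High-level text entry point: absent unless the feature is enabled.
#[cfg(feature = "espeak")]
fn synthesize(text: &str) -> Result<Vec<f32>, String> {
    synthesize_from_ipa(&text_to_ipa(text)?)
}

/// Reports whether the text front end was compiled in.
fn has_text_frontend() -> bool {
    cfg!(feature = "espeak")
}
```

Calling a gated function without the feature is a compile-time error rather than a runtime failure, which matches the split described above: IPA-input APIs always work, text-input APIs only exist when the feature is on.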
Installation
# Cargo.toml
# Without espeak (IPA input only)
[]
= "0.3.0"
# With espeak (full text input — pure Rust, no system deps)
[]
= { = "0.3.0", = ["espeak"] }
Or add it with cargo:
From GitHub
[]
= { = "https://github.com/eugenehp/kittentts-rs", = "v0.3.0" }
= { = "https://github.com/eugenehp/kittentts-rs", = "main" }
Quick Start
use download;
use Path;
Run the bundled example:
Available Models
| Model | Params | Size |
|---|---|---|
| `KittenML/kitten-tts-mini-0.8` | 80M | 80 MB |
| `KittenML/kitten-tts-micro-0.8` | 40M | 41 MB |
| `KittenML/kitten-tts-nano-0.8-fp32` | 15M | 56 MB |
| `KittenML/kitten-tts-nano-0.8-int8` | 15M | 25 MB |
Available Voices (v0.8)
Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
Bundled Languages (114)
The espeak feature bundles phoneme data for all 114 espeak-ng languages:
af, am, an, ar, as, az, ba, be, bg, bn, bpy, bs, ca, chr, cmn, cs, cv, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gn, grc, gu, hak, haw, he, hi, hr, ht, hu, hy, ia, id, io, is, it, ja, jbo, ka, kk, kl, kn, ko, kok, ku, ky, la, lb, lfn, lt, lv, mi, mk, ml, mr, ms, mt, mto, my, nci, ne, nl, no, nog, om, or, pa, pap, piqd, pl, pt, py, qdb, qu, quc, qya, ro, ru, sd, shn, si, sjn, sk, sl, smj, sq, sr, sv, sw, ta, te, th, ti, tk, tn, tr, tt, ug, uk, ur, uz, vi, yue
Note: 4 languages (`bs`, `io`, `lfn`, `pap`) have missing phoneme tables in `espeak-ng` 0.1.0. 17 languages with non-Latin scripts may return empty IPA for some inputs (upstream limitation).
API
// Load from HuggingFace Hub
let tts = load_from_hub?;
// Load from local files
let tts = load?;
// Generate audio → Vec<f32> at 24 kHz
let audio: = tts.generate?;
// Generate and save to WAV
tts.generate_to_file?;
// Generate from pre-computed IPA (no espeak feature needed)
let audio = tts.generate_from_ipa?;
// Available voices
println!;
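`generate_to_file` writes a WAV file. As a rough picture of what that involves, the sketch below encodes 24 kHz mono `f32` samples as 16-bit PCM WAV using only the standard library. The function name and the choice of 16-bit PCM are assumptions; the crate may write a different sample format.

```rust
use std::io::{self, Write};

/// Output sample rate stated in the API notes above.
const SAMPLE_RATE: u32 = 24_000;

/// Hypothetical sketch: write mono f32 samples (nominally in [-1.0, 1.0])
/// as a 16-bit PCM WAV stream. Not the crate's implementation.
fn write_wav_pcm16<W: Write>(out: &mut W, samples: &[f32], sample_rate: u32) -> io::Result<()> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate = sample_rate * block_align as u32;
    let data_len = (samples.len() * block_align as usize) as u32;

    // RIFF header; chunk size = 4 ("WAVE") + 24 (fmt chunk) + 8 + data_len.
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?;
    out.write_all(b"WAVE")?;

    // "fmt " chunk describing uncompressed PCM (format tag 1).
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?;
    out.write_all(&1u16.to_le_bytes())?;
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;

    // "data" chunk: clamp each sample, scale to i16, write little-endian.
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```

Writing into any `Write` (a `Vec<u8>`, a `File`, a `BufWriter`) keeps the encoder testable without touching the filesystem.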
Cross-Platform Build
Since phonemisation is now pure Rust, cross-compilation is straightforward:
# iOS
# Android
# Linux aarch64
# Windows (from any host)
No `ESPEAK_LIB_DIR`, no sysroot, no cross-compiled C library needed.
iOS
Android
With cargo cross
Architecture
Input text
↓ TextPreprocessor (preprocess.rs)
• numbers / currency / percentages / ordinals → words
• contractions, units, scientific notation, fractions, …
↓ chunk_text() (model.rs)
• split into ≤ 400-char sentence chunks
↓ espeak-ng (pure Rust) (phonemize.rs)
• text → IPA phoneme string (en, with stress)
• requires `espeak` feature
↓ ipa_to_ids() (tokenize.rs)
• IPA chars → integer token IDs (fixed vocab, same as Python)
• prepend/append pad token 0
↓ ONNX Runtime inference (model.rs)
• inputs: input_ids [1, T], style [1, D], speed [1]
• output: audio waveform [samples]
↓ tail-trim (–2 000 samples) + chunk concatenation
↓ Vec<f32> @ 24 kHz or WAV file
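Two of the steps above, `ipa_to_ids()` and the tail-trim plus concatenation, are simple enough to sketch. The toy vocabulary, the `i64` ID type and the rule of silently skipping unknown IPA characters are assumptions; the crate uses its own fixed vocabulary matching the Python package. Only the pad token `0` and the 2 000-sample trim come from the diagram.

```rust
use std::collections::HashMap;

/// Pad token prepended and appended to every sequence (from the diagram).
const PAD_ID: i64 = 0;
/// Samples dropped from the end of each chunk's waveform (from the diagram).
const TAIL_TRIM: usize = 2_000;

/// Hypothetical sketch: map IPA characters to IDs via a lookup table,
/// skipping characters missing from the table, and wrap with pad tokens.
fn ipa_to_ids(ipa: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    let mut ids = Vec::with_capacity(ipa.chars().count() + 2);
    ids.push(PAD_ID);
    ids.extend(ipa.chars().filter_map(|c| vocab.get(&c).copied()));
    ids.push(PAD_ID);
    ids
}

/// Hypothetical sketch: trim TAIL_TRIM samples from each chunk (or drop the
/// whole chunk if it is shorter) and concatenate the results in order.
fn trim_and_concat(chunks: &[Vec<f32>]) -> Vec<f32> {
    let mut out = Vec::new();
    for wave in chunks {
        let keep = wave.len().saturating_sub(TAIL_TRIM);
        out.extend_from_slice(&wave[..keep]);
    }
    out
}
```

The resulting ID vector has length `T` and would be reshaped to the `[1, T]` `input_ids` tensor before inference.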
Crate Structure
| File | Role |
|---|---|
| `src/lib.rs` | Public API & re-exports |
| `src/preprocess.rs` | Text preprocessing pipeline |
| `src/phonemize.rs` | Pure-Rust espeak-ng phonemisation (bundled data, no C FFI) |
| `src/tokenize.rs` | IPA character → token ID |
| `src/npz.rs` | Hand-written NPY/NPZ loader |
| `src/model.rs` | ONNX inference, chunking, WAV output |
| `src/download.rs` | HuggingFace Hub model download |
| `src/ffi.rs` | C FFI layer for iOS/Android |
| `build.rs` | Build script (minimal; no native library linking needed) |
| `tests/integration_tests.rs` | Integration & e2e test suite (40 tests, model-file based) |
| `ios/build_rust_ios.sh` | Full iOS XCFramework build (device + simulator) |
| `android/build_rust_android.sh` | Full Android arm64 build (JNI bridge) |
| `examples/basic.rs` | CLI example |
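`src/npz.rs` is described as a hand-written NPY/NPZ loader. For readers unfamiliar with the format, here is a minimal sketch of reading one `.npy` array of little-endian `f32` values (an NPZ file is a zip archive of such arrays). It is a hypothetical helper, not the crate's code: it handles only `'<f4'` data in C order and parses the header dictionary with simple string matching.

```rust
/// Hypothetical sketch: parse a `.npy` buffer holding little-endian f32
/// data in C order. Returns (shape, values). Other dtypes are rejected.
fn parse_npy_f32(bytes: &[u8]) -> Result<(Vec<usize>, Vec<f32>), String> {
    const MAGIC: &[u8] = b"\x93NUMPY";
    if bytes.len() < 10 || &bytes[..6] != MAGIC {
        return Err("not an NPY file".into());
    }
    // Format v1 stores the header length as u16; v2/v3 use u32 (little-endian).
    let (header_len, header_start) = match bytes[6] {
        1 => (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10),
        2 | 3 => {
            if bytes.len() < 12 {
                return Err("truncated header".into());
            }
            (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
        }
        v => return Err(format!("unsupported NPY version {}", v)),
    };
    let data_start = header_start + header_len;
    let header = bytes
        .get(header_start..data_start)
        .and_then(|h| std::str::from_utf8(h).ok())
        .ok_or("bad header")?;

    if !header.contains("'descr': '<f4'") {
        return Err("only little-endian f32 ('<f4') is supported".into());
    }
    if header.contains("'fortran_order': True") {
        return Err("Fortran-order arrays are not supported".into());
    }

    // Take the text between "'shape': (" and ")" and parse each dimension.
    let shape_src = header
        .split("'shape': (")
        .nth(1)
        .and_then(|rest| rest.split(')').next())
        .ok_or("missing shape")?;
    let shape = shape_src
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<usize>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<usize>, String>>()?;

    // An empty shape `()` is a scalar: the product of no dimensions is 1.
    let count: usize = shape.iter().product();
    let n_bytes = count.checked_mul(4).ok_or("shape too large")?;
    let data = &bytes[data_start..];
    if data.len() < n_bytes {
        return Err("data section too short".into());
    }
    let values: Vec<f32> = data[..n_bytes]
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    Ok((shape, values))
}
```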
Running Tests
# All unit tests (no espeak — 20 tests)
# All unit + phonemisation tests (29 tests, including 114-language coverage)
# Integration and e2e tests using bundled model files (32 tests)
# Integration + espeak + full inference e2e tests (40 tests)
# Full test suite (72 tests)
# Point at a custom model directory
KITTENTTS_MODEL_DIR=/path/to/models
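The test suite can be pointed at a different model directory through `KITTENTTS_MODEL_DIR`. If you want the same override in your own harness, a sketch of resolving it follows. The fallback path `models` and the helper names are hypothetical; the README does not state the crate's default location.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical fallback used when the variable is unset or empty.
const DEFAULT_MODEL_DIR: &str = "models";

/// Pure helper, so the logic can be tested without mutating the environment.
fn model_dir_from(var: Option<OsString>) -> PathBuf {
    match var {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_MODEL_DIR),
    }
}

/// Resolve the model directory from `KITTENTTS_MODEL_DIR`.
fn model_dir() -> PathBuf {
    model_dir_from(env::var_os("KITTENTTS_MODEL_DIR"))
}
```

Reading the variable with `var_os` avoids failing on non-UTF-8 paths.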
Test counts at a glance
| Suite | `--features espeak` | Tests |
|---|---|---|
| Unit tests (`src/**`) | no | 20 |
| Unit tests (`src/**`) | yes | 29 |
| Integration tests | no | 32 |
| Integration tests | yes | 40 |
| Doc-tests | n/a | 3 |
| Total | yes | 72 |
Migration from C libespeak-ng
This crate previously used C FFI bindings to libespeak-ng with a 1200-line
`build.rs` for native library detection and cross-compilation. It now uses the
pure-Rust `espeak-ng` crate instead:
- No system library required: `brew install espeak-ng` / `apt install libespeak-ng-dev` is no longer needed
- No C compiler needed: no `cmake`, no `gcc`, no native build scripts
- No `unsafe` code in phonemisation: the entire FFI layer was removed
- `build.rs` reduced from 1200 lines to 8: no `pkg-config`, no platform path walk, no Windows auto-build
- Cross-compilation just works: no `ESPEAK_LIB_DIR`, no `ESPEAK_SYSROOT`, no NDK toolchain setup
Citation
Changelog
See CHANGELOG.md for a full history of releases and changes.
License
This project is licensed under the Apache License 2.0.

