kittentts-rs
A Rust port of KittenTTS — an ultra-lightweight, CPU-only text-to-speech engine based on ONNX models.
Features
- ONNX Runtime inference: uses `ort` (ORT 2.0 bindings) for fast CPU inference
- Full text preprocessing: numbers, currencies, abbreviations, ordinals, units, etc. → spoken words
- espeak-ng phonemisation: IPA output via the `libespeak-ng` C library (FFI, not subprocess)
- Same ONNX models: works with all KittenTTS HuggingFace checkpoints
- Automatic chunking: long texts are split into ≤ 400-char sentence chunks, then concatenated
- Cross-platform: macOS, Linux, Windows (MSVC + MinGW), iOS, Android
- Cross-compilation: `cargo cross` supported out-of-the-box for Linux GNU/musl, Android, and more
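The automatic chunking idea can be pictured roughly as follows. This is a minimal sketch, not the crate's actual `chunk_text()`: the name `chunk_sentences`, the sentence-boundary rule (split after `.`, `!`, `?`), and the hard-split fallback for over-long sentences are all assumptions made for illustration.

```rust
/// Hypothetical helper: split `text` into chunks of at most `max_chars`
/// characters, breaking at sentence boundaries where possible.
fn chunk_sentences(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");

    // 1. Split into sentences, keeping the terminating punctuation.
    let mut sentences: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let s = current.trim();
            if !s.is_empty() {
                sentences.push(s.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }

    // 2. Greedily pack whole sentences into chunks of <= max_chars.
    let mut chunks: Vec<String> = Vec::new();
    let mut chunk = String::new();
    let mut chunk_len = 0usize; // length in chars, not bytes
    for sentence in sentences {
        let chars: Vec<char> = sentence.chars().collect();

        // A single sentence longer than the limit is split at fixed widths.
        if chars.len() > max_chars {
            if !chunk.is_empty() {
                chunks.push(std::mem::take(&mut chunk));
                chunk_len = 0;
            }
            for piece in chars.chunks(max_chars) {
                chunks.push(piece.iter().collect());
            }
            continue;
        }

        // Start a new chunk if appending (plus one space) would overflow.
        let needed = if chunk.is_empty() {
            chars.len()
        } else {
            chunk_len + 1 + chars.len()
        };
        if needed > max_chars {
            chunks.push(std::mem::take(&mut chunk));
            chunk_len = 0;
        }
        if !chunk.is_empty() {
            chunk.push(' ');
            chunk_len += 1;
        }
        chunk.push_str(&sentence);
        chunk_len += chars.len();
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}
```

Calling `chunk_sentences(text, 400)` mirrors the limit quoted above; each chunk would then be synthesised separately and the audio concatenated.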
Prerequisites
The espeak Cargo feature requires libespeak-ng to be installed (the
shared or static library, not just the command-line tool):
# macOS
# Debian / Ubuntu (installs libespeak-ng.so + headers)
# Alpine Linux (installs libespeak-ng.a for static linking)
# Fedora / RHEL
# Arch Linux
# Windows — zero-config auto-build (requires git + cmake in PATH):
# cargo build --features espeak
# ↑ clones + compiles espeak-ng from GitHub automatically on first build.
#
# Or pre-install manually (choose one):
# Option A: official installer → https://github.com/espeak-ng/espeak-ng/releases
# Option B: vcpkg
# Option C: MSYS2/MinGW64
The espeak feature is opt-in. Without it every API that accepts raw IPA
input still works; only text-to-IPA conversion (and therefore the high-level
generate / generate_to_file functions) is unavailable.
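One common way to express an opt-in feature like this in Rust is conditional compilation. The sketch below is illustrative only: `TtsError`, `text_to_ipa`, and `accepts_ipa` are hypothetical names, and returning an error when the feature is off (rather than omitting the function entirely) is an assumption, not the crate's confirmed behaviour.

```rust
/// Hypothetical error type used only in this sketch.
#[derive(Debug, PartialEq)]
enum TtsError {
    EspeakUnavailable,
}

/// Text -> IPA conversion, compiled only with `--features espeak`.
#[cfg(feature = "espeak")]
fn text_to_ipa(text: &str) -> Result<String, TtsError> {
    // A real implementation would call into libespeak-ng over FFI here.
    let _ = text;
    unimplemented!()
}

/// Fallback when the feature is disabled: no native library is linked,
/// so text input cannot be phonemised.
#[cfg(not(feature = "espeak"))]
fn text_to_ipa(_text: &str) -> Result<String, TtsError> {
    Err(TtsError::EspeakUnavailable)
}

/// Raw IPA input needs no native library, so it is always accepted.
fn accepts_ipa(ipa: &str) -> bool {
    !ipa.trim().is_empty()
}
```

With this layout, code paths that start from IPA keep working in both configurations, while anything that starts from plain text depends on the feature flag.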
Installation
# Cargo.toml
# Without espeak-ng (IPA input only — no native library needed)
[dependencies]
kittentts = "0.2.5"
# With espeak-ng (full text input)
[dependencies]
kittentts = { version = "0.2.5", features = ["espeak"] }
Or add it with cargo:
From GitHub
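[dependencies]
# Pin to a release tag
kittentts = { git = "https://github.com/eugenehp/kittentts-rs", tag = "v0.2.5" }
# Or track the main branch instead
kittentts = { git = "https://github.com/eugenehp/kittentts-rs", branch = "main" }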
Quick Start
use kittentts::download;
use std::path::Path;
Run the bundled example:
Available Models
| Model | Params | Size |
|---|---|---|
| KittenML/kitten-tts-mini-0.8 | 80M | 80 MB |
| KittenML/kitten-tts-micro-0.8 | 40M | 41 MB |
| KittenML/kitten-tts-nano-0.8-fp32 | 15M | 56 MB |
| KittenML/kitten-tts-nano-0.8-int8 | 15M | 25 MB |
Available Voices (v0.8)
Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
API
// Load from HuggingFace Hub (arguments elided)
let tts = load_from_hub(…)?;
// Load from local files (arguments elided)
let tts = load(…)?;
// Generate audio → Vec<f32> at 24 kHz
let audio: Vec<f32> = tts.generate(…)?;
// Generate and save to WAV
tts.generate_to_file(…)?;
// Available voices
println!(…);
Build Configuration
Environment Variables
| Variable | Description |
|---|---|
| `ESPEAK_LIB_DIR` | Directory containing `libespeak-ng.a` or `espeak-ng.lib`. Takes priority over all auto-detection. Required for iOS/Android. |
| `ESPEAK_SYSROOT` | Root of a cross-compilation sysroot. All Unix candidate lib paths are prefixed with this value. |
| `ESPEAK_BUILD_SCRIPT` | Path to a script that builds libespeak-ng from source. Invoked automatically when `ESPEAK_LIB_DIR` is set but the archive is missing. |
| `ESPEAK_TAG` | espeak-ng release tag used by the build scripts (default: `1.52.0`). |
| `VCPKG_ROOT` | vcpkg installation root; enables vcpkg-installed espeak-ng on Windows. |
| `MSYS2_PATH` | MSYS2 installation root on Windows (default: `C:\msys64`). |
| `ANDROID_NDK_HOME` | Android NDK root for Android cross-compilation (also `ANDROID_NDK_ROOT` / `NDK_HOME`). |
| `PKG_CONFIG_ALLOW_CROSS` | Set to `1` to allow pkg-config to run during cross-compilation. |
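The precedence described in the table can be sketched as below. This is not the project's `build.rs`: the function names, the `UNIX_CANDIDATES` list, and the lookup order among the three NDK variables are assumptions for illustration. The environment is passed in as a closure so the logic can be exercised without touching real process variables.

```rust
use std::path::PathBuf;

/// Hypothetical default Unix library directories. The real candidate list
/// used by the build script is not shown in this README.
const UNIX_CANDIDATES: &[&str] = &["/usr/lib", "/usr/local/lib"];

/// Directories to search for libespeak-ng, in priority order.
/// `env` stands in for `std::env::var(..).ok()`.
fn espeak_search_dirs(env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    // ESPEAK_LIB_DIR takes priority over all auto-detection.
    if let Some(dir) = env("ESPEAK_LIB_DIR") {
        return vec![PathBuf::from(dir)];
    }
    // ESPEAK_SYSROOT, if set, is prefixed onto every Unix candidate path.
    let sysroot = env("ESPEAK_SYSROOT").unwrap_or_default();
    let prefix = sysroot.trim_end_matches('/');
    UNIX_CANDIDATES
        .iter()
        .map(|candidate| PathBuf::from(format!("{}{}", prefix, candidate)))
        .collect()
}

/// Android NDK root. The order HOME, ROOT, NDK_HOME is an assumption.
fn android_ndk_root(env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    ["ANDROID_NDK_HOME", "ANDROID_NDK_ROOT", "NDK_HOME"]
        .iter()
        .find_map(|&name| env(name))
        .map(PathBuf::from)
}
```

In a build script the closure would typically be `|k| std::env::var(k).ok()`.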
Auto-build Script
Point ESPEAK_BUILD_SCRIPT at one of the provided scripts to have the build
system compile libespeak-ng automatically when the archive is missing:
# macOS / Linux — any target
ESPEAK_LIB_DIR=/espeak-static/lib \
ESPEAK_BUILD_SCRIPT=/scripts/build-espeak-static.sh \
# Windows
Cross-Compilation
Linux → Linux aarch64 (Debian/Ubuntu multiarch — simplest)
Any host → Linux x86_64 or Windows x64 with cargo-zigbuild
cargo-zigbuild uses the Zig
compiler as a drop-in cross-linker — no Docker, no SDK, no sysroot required
for the default (no-espeak) feature set.
# Linux x86_64 — ORT static lib downloaded automatically
# Windows x64 — ORT import library created automatically by the test script,
# or manually: download the ORT Windows ZIP, run llvm-dlltool on the DLL,
# then point ORT_LIB_LOCATION at the directory with libonnxruntime.dll.a.
ORT_LIB_LOCATION=/path/to/ort \
ORT_PREFER_DYNAMIC_LINK=1 \
The Windows build produces a libkittentts.a containing genuine x86-64 COFF
objects that link against onnxruntime.dll at runtime.
Use the dedicated test script to automate the full cross-compilation setup (ORT download, import lib creation, optional espeak cross-compile, optional Wine test run):
# Basic Windows cross-build from Linux/macOS (no espeak):
# With espeak (needs x86_64-w64-mingw32-gcc):
# Full test — build + run under Wine:
Any host → any target with cargo cross
# cross reads Cross.toml and installs libespeak-ng-dev inside the container
Cross.toml in the repository root configures pre-build commands for every
supported GNU and musl target. No manual setup beyond installing cross is
required.
Custom sysroot
ESPEAK_SYSROOT=/path/to/target-sysroot \
Build espeak-ng from source for a specific target
# Linux → aarch64 (requires: apt install gcc-aarch64-linux-gnu)
ESPEAK_LIB_DIR=/espeak-static/aarch64/lib \
ESPEAK_TARGET=aarch64-unknown-linux-gnu \
# Linux → Android arm64 (requires: ANDROID_NDK_HOME set)
ESPEAK_LIB_DIR=/espeak-static/android/lib \
ESPEAK_TARGET=aarch64-linux-android \
ANDROID_NDK_HOME=/path/to/ndk \
# Windows (MSYS2 or MSVC)
Native Windows build and test
Use the PowerShell test script on a Windows machine or a GitHub Actions
windows-latest runner:
# Basic build + test (no espeak):
powershell -ExecutionPolicy Bypass -File scripts\test-windows-native.ps1
# Full test including espeak (auto-builds espeak-ng — needs git + cmake):
powershell -ExecutionPolicy Bypass -File scripts\test-windows-native.ps1 -Espeak
# Build only, skip running tests:
powershell -ExecutionPolicy Bypass -File scripts\test-windows-native.ps1 -Espeak -BuildOnly
The -Espeak flag triggers the zero-config auto-build in build.rs, which
clones and compiles espeak-ng from source on the first build. Subsequent
builds skip that step because a stamp file marks it as already done.
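A stamp file is a small marker written after an expensive step succeeds, so later runs can skip it. The sketch below shows the general pattern only. `build_once` and the stamp contents are hypothetical and not taken from the project's `build.rs`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `build` only when `stamp` does not exist yet, then writes the stamp.
/// Returns Ok(true) if the build ran and Ok(false) if it was skipped.
fn build_once<F>(stamp: &Path, build: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    if stamp.exists() {
        // A previous run finished successfully; nothing to do.
        return Ok(false);
    }
    // The stamp is written only after the build succeeds, so a failed
    // build is retried on the next invocation.
    build()?;
    if let Some(parent) = stamp.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(stamp, b"ok")?;
    Ok(true)
}
```

Deleting the stamp file (or the directory that holds it) forces the expensive step to run again.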
Distribution note: when shipping a Windows binary built with the
`espeak` feature, copy the `espeak-ng-data/` directory next to the `.exe`. The script prints the exact path at the end of a successful run.
iOS
The ios/build_rust_ios.sh script builds espeak-ng for both device (arm64) and
Simulator (arm64-sim), compiles kittentts-rs for each slice, and packages
everything into ios/KittenTTS.xcframework.
Android
The android/build_rust_android.sh script builds espeak-ng as a shared library,
compiles the Rust static lib and JNI bridge, and copies all .so files into
android/KittenTTSApp/app/src/main/jniLibs/arm64-v8a/.
Architecture
Input text
↓ TextPreprocessor (preprocess.rs)
• numbers / currency / percentages / ordinals → words
• contractions, units, scientific notation, fractions, …
↓ chunk_text() (model.rs)
• split into ≤ 400-char sentence chunks
↓ libespeak-ng FFI (phonemize.rs)
• text → IPA phoneme string (en-us, with stress)
• requires `espeak` feature + libespeak-ng linked at build time
↓ ipa_to_ids() (tokenize.rs)
• IPA chars → integer token IDs (fixed vocab, same as Python)
• prepend/append pad token 0
↓ ONNX Runtime inference (model.rs)
• inputs: input_ids [1, T], style [1, D], speed [1]
• output: audio waveform [samples]
↓ tail-trim (–5 000 samples) + chunk concatenation
↓ Vec<f32> @ 24 kHz or WAV file
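Two of the pipeline stages above are simple enough to sketch directly: wrapping the token IDs in pad token 0, and trimming 5 000 samples from the tail of each chunk before concatenation. The code below is an illustration under assumptions, not the crate's implementation. The vocabulary is a caller-supplied stand-in (the real fixed vocab is not reproduced here), unknown characters are skipped, and chunks shorter than the trim length are dropped entirely.

```rust
use std::collections::HashMap;

const PAD_ID: i64 = 0;
const TAIL_TRIM: usize = 5_000;

/// Map IPA characters to token IDs and surround them with pad tokens.
/// `vocab` is a hypothetical stand-in for the fixed vocabulary.
fn ipa_to_ids_sketch(ipa: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    let mut ids = Vec::with_capacity(ipa.chars().count() + 2);
    ids.push(PAD_ID); // leading pad
    // Assumption: characters missing from the vocabulary are skipped.
    ids.extend(ipa.chars().filter_map(|c| vocab.get(&c).copied()));
    ids.push(PAD_ID); // trailing pad
    ids
}

/// Drop the last TAIL_TRIM samples of each chunk, then concatenate.
fn trim_and_concat(chunks: &[Vec<f32>]) -> Vec<f32> {
    let mut out = Vec::new();
    for chunk in chunks {
        // saturating_sub keeps this safe for chunks shorter than the trim.
        let keep = chunk.len().saturating_sub(TAIL_TRIM);
        out.extend_from_slice(&chunk[..keep]);
    }
    out
}
```

At 24 kHz, 5 000 samples correspond to roughly 0.2 seconds of audio removed from the end of each chunk.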
Crate Structure
| File | Role |
|---|---|
| src/lib.rs | Public API & re-exports |
| src/preprocess.rs | Text preprocessing pipeline |
| src/phonemize.rs | libespeak-ng FFI bindings and initialisation |
| src/tokenize.rs | IPA character → token ID |
| src/npz.rs | Hand-written NPY/NPZ loader |
| src/model.rs | ONNX inference, chunking, WAV output |
| src/download.rs | HuggingFace Hub model download |
| src/ffi.rs | C FFI layer for iOS/Android |
| build.rs | Native library detection and linking (Windows + cross-compilation aware) |
| tests/integration_tests.rs | Integration & e2e test suite (40 tests, model-file based) |
| scripts/build-espeak-static.sh | Build libespeak-ng.a from source (macOS/Linux/Android cross-compile) |
| scripts/build-espeak-static.ps1 | Build espeak-ng.lib/libespeak-ng.a from source (Windows MSVC/MinGW) |
| scripts/test-windows-cross.sh | Cross-compile + test for Windows from Linux/macOS (downloads ORT, optional Wine run) |
| scripts/test-windows-native.ps1 | Build + test natively on Windows (auto-builds espeak-ng, MSVC/MinGW aware) |
| Cross.toml | cargo cross configuration for Linux GNU/musl targets |
| .cargo/config.toml | Cross-compilation linker settings (cargo-zigbuild, Windows, Linux x86_64) |
| ios/build_rust_ios.sh | Full iOS XCFramework build (device + simulator) |
| android/build_rust_android.sh | Full Android arm64 build (JNI + espeak-ng) |
| examples/basic.rs | CLI example |
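For readers curious what the WAV output step in src/model.rs involves, the sketch below encodes mono `f32` samples at 24 kHz into a standard 44-byte-header WAV buffer. It is a generic illustration, not the crate's code: the function name `encode_wav_pcm16` and the choice of 16-bit PCM (rather than, say, 32-bit float) are assumptions.

```rust
const SAMPLE_RATE: u32 = 24_000;

/// Encode mono f32 samples (expected range -1.0..=1.0) as a 16-bit PCM
/// WAV file held in memory.
fn encode_wav_pcm16(samples: &[f32]) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample
    let mut out = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF container header
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");

    // "fmt " chunk: PCM, mono, 24 kHz, 16 bits per sample
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channel count
    out.extend_from_slice(&SAMPLE_RATE.to_le_bytes());
    out.extend_from_slice(&(SAMPLE_RATE * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    // "data" chunk: clamp each sample, scale to i16, write little-endian
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The resulting bytes can be written to disk with `std::fs::write("out.wav", encode_wav_pcm16(&audio))`.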
Running Tests
# All unit tests (no native library required — 20 tests)
# All unit + espeak-ng phonemisation tests (requires libespeak-ng — 28 tests)
# Integration and e2e tests using bundled model files (32 tests)
# Integration + espeak + full inference e2e tests (40 tests)
# Point at a custom model directory
KITTENTTS_MODEL_DIR=/path/to/models
# Check that all code compiles for the host target
Test counts at a glance
| Suite | `--features espeak` | Tests |
|---|---|---|
| Unit tests (`src/**`) | no | 20 |
| Unit tests (`src/**`) | yes | 28 |
| Integration tests | no | 32 |
| Integration tests | yes | 40 |
| Doc-tests | — | 4 |
Citation
Changelog
See CHANGELOG.md for a full history of releases and changes.
License
This project is licensed under the Apache License 2.0.

