kittentts 0.4.0

Rust port of KittenTTS — lightweight ONNX-based text-to-speech
Documentation

kittentts-rs

A Rust port of KittenTTS — an ultra-lightweight, CPU-only text-to-speech engine based on ONNX models.

Screenshots

iOS Android
iOS Android

Features

  • ONNX Runtime inference — uses ort (ORT 2.0 bindings) for fast CPU inference
  • Full text preprocessing — numbers, currencies, abbreviations, ordinals, units, etc. → spoken words
  • Pure-Rust phonemisation — IPA output via the espeak-ng crate (no C library, no system dependencies)
  • 114 bundled languages — English and 113 other languages ship as embedded data (no runtime downloads)
  • Same ONNX models — works with all KittenTTS HuggingFace checkpoints
  • Automatic chunking — long texts split into ≤ 400-char sentence chunks, then concatenated
  • Cross-platform — macOS, Linux, Windows, iOS, Android — all from pure Rust
  • Zero native dependencies — no cmake, no pkg-config, no brew install, no apt install

Prerequisites

None! The espeak feature uses the pure-Rust espeak-ng crate with bundled data for all 114 supported languages. No system library installation is required on any platform.

The espeak feature is opt-in. Without it every API that accepts raw IPA input still works; only text-to-IPA conversion (and therefore the high-level generate / generate_to_file functions) is unavailable.

Installation

# Cargo.toml

# Without espeak (IPA input only)
[dependencies]
kittentts = "0.3.0"

# With espeak (full text input — pure Rust, no system deps)
[dependencies]
kittentts = { version = "0.3.0", features = ["espeak"] }

Or add it with cargo:

cargo add kittentts
cargo add kittentts --features espeak

From GitHub

[dependencies]
kittentts = { git = "https://github.com/eugenehp/kittentts-rs", tag = "v0.3.0" }
kittentts = { git = "https://github.com/eugenehp/kittentts-rs", branch = "main" }

Quick Start

use kittentts::download;
use std::path::Path;

fn main() -> anyhow::Result<()> {
    // Downloads model from HuggingFace (cached after first run)
    let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8")?;

    println!("Available voices: {:?}", tts.available_voices);

    // Generate and save as WAV (24 kHz, float32, mono)
    tts.generate_to_file(
        "Hello from Rust! This high quality TTS model works without a GPU.",
        Path::new("output.wav"),
        "Jasper",
        1.0,   // speed (1.0 = normal)
        true,  // clean_text: run number/abbreviation expansion
    )?;

    Ok(())
}

Run the bundled example:

cargo run --example basic --features espeak
cargo run --example basic --features espeak -- --voice Luna --text "Hello world" --output hello.wav

Available Models

Model Params Size
KittenML/kitten-tts-mini-0.8 80M 80 MB
KittenML/kitten-tts-micro-0.8 40M 41 MB
KittenML/kitten-tts-nano-0.8-fp32 15M 56 MB
KittenML/kitten-tts-nano-0.8-int8 15M 25 MB

Available Voices (v0.8)

Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Bundled Languages (114)

The espeak feature bundles phoneme data for all 114 espeak-ng languages:

af, am, an, ar, as, az, ba, be, bg, bn, bpy, bs, ca, chr, cmn, cs, cv, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gn, grc, gu, hak, haw, he, hi, hr, ht, hu, hy, ia, id, io, is, it, ja, jbo, ka, kk, kl, kn, ko, kok, ku, ky, la, lb, lfn, lt, lv, mi, mk, ml, mr, ms, mt, mto, my, nci, ne, nl, no, nog, om, or, pa, pap, piqd, pl, pt, py, qdb, qu, quc, qya, ro, ru, sd, shn, si, sjn, sk, sl, smj, sq, sr, sv, sw, ta, te, th, ti, tk, tn, tr, tt, ug, uk, ur, uz, vi, yue

Note: 4 languages (bs, io, lfn, pap) have missing phoneme tables in espeak-ng 0.1.0. 17 languages with non-Latin scripts may return empty IPA for some inputs (upstream limitation).

API

// Load from HuggingFace Hub
let tts = kittentts::download::load_from_hub("KittenML/kitten-tts-mini-0.8")?;

// Load from local files
let tts = kittentts::model::KittenTtsOnnx::load(
    Path::new("model.onnx"),
    Path::new("voices.npz"),
    Default::default(), // speed_priors
    Default::default(), // voice_aliases
)?;

// Generate audio → Vec<f32> at 24 kHz
let audio: Vec<f32> = tts.generate("Hello!", "Jasper", 1.0, true)?;

// Generate and save to WAV
tts.generate_to_file("Hello!", Path::new("out.wav"), "Jasper", 1.0, true)?;

// Generate from pre-computed IPA (no espeak feature needed)
let audio = tts.generate_from_ipa("həloʊ", "Jasper", 1.0, 5)?;

// Available voices
println!("{:?}", tts.available_voices);

Cross-Platform Build

Since phonemisation is now pure Rust, cross-compilation is straightforward:

# iOS
rustup target add aarch64-apple-ios
cargo build --target aarch64-apple-ios --features espeak

# Android
rustup target add aarch64-linux-android
cargo build --target aarch64-linux-android --features espeak

# Linux aarch64
rustup target add aarch64-unknown-linux-gnu
cargo build --target aarch64-unknown-linux-gnu --features espeak

# Windows (from any host)
rustup target add x86_64-pc-windows-gnu
cargo build --target x86_64-pc-windows-gnu --features espeak

No ESPEAK_LIB_DIR, no sysroot, no cross-compiled C library needed.

iOS

bash ios/build_rust_ios.sh

Android

export ANDROID_NDK_HOME=/path/to/ndk
bash android/build_rust_android.sh

With cargo cross

cargo install cross --git https://github.com/cross-rs/cross
cross build --target aarch64-unknown-linux-gnu --features espeak
cross build --target x86_64-unknown-linux-musl --features espeak

Architecture

Input text
    ↓  TextPreprocessor  (preprocess.rs)
       • numbers / currency / percentages / ordinals → words
       • contractions, units, scientific notation, fractions, …
    ↓  chunk_text()  (model.rs)
       • split into ≤ 400-char sentence chunks
    ↓  espeak-ng (pure Rust)  (phonemize.rs)
       • text → IPA phoneme string (en, with stress)
       • requires `espeak` feature
    ↓  ipa_to_ids()  (tokenize.rs)
       • IPA chars → integer token IDs  (fixed vocab, same as Python)
       • prepend/append pad token 0
    ↓  ONNX Runtime inference  (model.rs)
       • inputs:  input_ids [1, T], style [1, D], speed [1]
       • output:  audio waveform [samples]
    ↓  tail-trim (–2 000 samples) + chunk concatenation
    ↓  Vec<f32> @ 24 kHz  or  WAV file

Crate Structure

File Role
src/lib.rs Public API & re-exports
src/preprocess.rs Text preprocessing pipeline
src/phonemize.rs Pure-Rust espeak-ng phonemisation (bundled data, no C FFI)
src/tokenize.rs IPA character → token ID
src/npz.rs Hand-written NPY/NPZ loader
src/model.rs ONNX inference, chunking, WAV output
src/download.rs HuggingFace Hub model download
src/ffi.rs C FFI layer for iOS/Android
build.rs Build script (minimal — no native library linking needed)
tests/integration_tests.rs Integration & e2e test suite (40 tests, model-file based)
ios/build_rust_ios.sh Full iOS XCFramework build (device + simulator)
android/build_rust_android.sh Full Android arm64 build (JNI bridge)
examples/basic.rs CLI example

Running Tests

# All unit tests (no espeak — 20 tests)
cargo test

# All unit + phonemisation tests (29 tests, including 114-language coverage)
cargo test --features espeak

# Integration and e2e tests using bundled model files (32 tests)
cargo test --test integration_tests

# Integration + espeak + full inference e2e tests (40 tests)
cargo test --test integration_tests --features espeak

# Full test suite (72 tests)
cargo test --features espeak

# Point at a custom model directory
KITTENTTS_MODEL_DIR=/path/to/models cargo test --test integration_tests

Test counts at a glance

Suite --features espeak Tests
Unit tests (src/**) no 20
Unit tests (src/**) yes 29
Integration tests no 32
Integration tests yes 40
Doc-tests 3
Total yes 72

Migration from C libespeak-ng

This crate previously used C FFI bindings to libespeak-ng with a 1200-line build.rs for native library detection and cross-compilation. It now uses the pure-Rust espeak-ng crate instead:

  • No system library requiredbrew install espeak-ng / apt install libespeak-ng-dev no longer needed
  • No C compiler needed — no cmake, no gcc, no build scripts
  • No unsafe code in phonemisation — the entire FFI layer was removed
  • build.rs reduced from 1200 lines to 8 — no pkg-config, no platform path walk, no Windows auto-build
  • Cross-compilation just works — no ESPEAK_LIB_DIR, no ESPEAK_SYSROOT, no NDK toolchain setup

Citation

@software{kittentts_rs_2026,
  author    = {Eugene Hauptmann},
  title     = {kittentts-rs: A Rust Port of KittenTTS},
  year      = {2026},
  url       = {https://github.com/eugenehp/kittentts-rs},
  note      = {Ultra-lightweight, CPU-only text-to-speech engine based on ONNX models}
}
@software{kittentts_2024,
  author    = {KittenML},
  title     = {KittenTTS},
  year      = {2024},
  url       = {https://github.com/eugenehp/KittenTTS}
}

Changelog

See CHANGELOG.md for a full history of releases and changes.

License

This project is licensed under the Apache License 2.0.