kittentts 0.2.2

Rust port of KittenTTS — lightweight ONNX-based text-to-speech

kittentts-rs

A Rust port of KittenTTS — an ultra-lightweight, CPU-only text-to-speech engine based on ONNX models.

Screenshots

[Screenshots: iOS, Android]

Features

  • ONNX Runtime inference — uses ort (ORT 2.0 bindings) for fast CPU inference
  • Full text preprocessing — numbers, currencies, abbreviations, ordinals, units, etc. → spoken words
  • espeak-ng phonemisation — identical IPA output to the Python library
  • Same ONNX models — works with all KittenTTS HuggingFace checkpoints
  • Automatic chunking — long texts split into ≤400-char sentence chunks, then concatenated
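To give a feel for the preprocessing step, here is a deliberately small sketch of number expansion in plain Rust. It is not the crate's TextPreprocessor: the helpers number_to_words, below_thousand, expand_numbers and flush_digits are hypothetical, and only non-negative integers below one million are handled (no currencies, ordinals, decimals or units).

```rust
// Hypothetical sketch of one preprocessing step: spelling out integers.
const ONES: [&str; 20] = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
];
const TENS: [&str; 10] = [
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
];

/// Words for 1..=999; returns an empty string for 0.
fn below_thousand(n: u32) -> String {
    let mut parts: Vec<String> = Vec::new();
    if n >= 100 {
        parts.push(format!("{} hundred", ONES[(n / 100) as usize]));
    }
    let rest = n % 100;
    if rest >= 20 {
        let tens = TENS[(rest / 10) as usize];
        match rest % 10 {
            0 => parts.push(tens.to_string()),
            d => parts.push(format!("{}-{}", tens, ONES[d as usize])),
        }
    } else if rest > 0 {
        parts.push(ONES[rest as usize].to_string());
    }
    parts.join(" ")
}

/// Spell out an integer below one million; None if it is out of range.
fn number_to_words(n: u32) -> Option<String> {
    if n >= 1_000_000 {
        return None; // out of scope for this sketch
    }
    if n == 0 {
        return Some(ONES[0].to_string());
    }
    let mut words = Vec::new();
    if n >= 1000 {
        words.push(format!("{} thousand", below_thousand(n / 1000)));
    }
    if n % 1000 > 0 {
        words.push(below_thousand(n % 1000));
    }
    Some(words.join(" "))
}

/// Append the words for the pending digit run (or the raw digits if the
/// number is out of range for this sketch), then clear the buffer.
fn flush_digits(digits: &mut String, out: &mut String) {
    if digits.is_empty() {
        return;
    }
    match digits.parse::<u32>().ok().and_then(number_to_words) {
        Some(words) => out.push_str(&words),
        None => out.push_str(digits.as_str()),
    }
    digits.clear();
}

/// Replace each run of ASCII digits in `text` with its spelled-out form.
fn expand_numbers(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut digits = String::new();
    for ch in text.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
            continue;
        }
        flush_digits(&mut digits, &mut out);
        out.push(ch);
    }
    flush_digits(&mut digits, &mut out);
    out
}
```

For example, expand_numbers("I have 3 cats and 42 dogs") returns "I have three cats and forty-two dogs", while a number too large for this sketch is left as digits.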

Prerequisites

espeak-ng must be on $PATH for phonemisation:

# Alpine Linux
apk add espeak-ng

# Debian / Ubuntu
apt install espeak-ng

# macOS
brew install espeak-ng
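Since phonemisation runs espeak-ng as a subprocess, a missing binary is a likely first-run failure. A minimal std-only pre-flight check could scan the $PATH directories for it. find_on_path and espeak_ng_on_path are hypothetical helpers, not part of the kittentts API, and the sketch only checks that a file with that name exists (it does not test the executable bit or try the .exe name used on Windows).

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first `dir/name` that exists as a file, searching the
/// directories listed in `path_var` (same syntax as the PATH variable).
/// Hypothetical helper: no executable-bit check, no ".exe" suffix handling.
fn find_on_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Check the current process's PATH for a file named `espeak-ng`.
fn espeak_ng_on_path() -> bool {
    env::var_os("PATH")
        .map_or(false, |path| find_on_path("espeak-ng", &path).is_some())
}
```

Running espeak-ng --version through std::process::Command would be an equally valid check; scanning $PATH simply avoids spawning a process.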

Installation

Via crates.io

cargo add kittentts

Or add it manually to your Cargo.toml:

[dependencies]
kittentts = "0.2"

Via GitHub

To use the latest unreleased code directly from the repository:

# Clone the repository next to your project
git clone https://github.com/eugenehp/kittentts-rs

Then reference it as a local path dependency in your Cargo.toml:

[dependencies]
kittentts = { path = "../kittentts-rs" }

Or reference it as a git dependency without cloning manually:

[dependencies]
kittentts = { git = "https://github.com/eugenehp/kittentts-rs" }

# Or pin to a specific branch or tag (keep only one kittentts entry):
# kittentts = { git = "https://github.com/eugenehp/kittentts-rs", branch = "main" }
# kittentts = { git = "https://github.com/eugenehp/kittentts-rs", tag = "v0.2.0" }

Quick Start

Add to Cargo.toml (the example below also uses anyhow for error handling):

[dependencies]
kittentts = "0.2"
anyhow = "1"

Then, in src/main.rs:

use kittentts::download;
use std::path::Path;

fn main() -> anyhow::Result<()> {
    // Downloads model from HuggingFace (cached after first run)
    let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8")?;

    println!("Available voices: {:?}", tts.available_voices);

    // Generate and save as WAV (24 kHz, float32, mono)
    tts.generate_to_file(
        "Hello from Rust! This high quality TTS model works without a GPU.",
        Path::new("output.wav"),
        "Jasper",
        1.0,   // speed (1.0 = normal)
        true,  // clean_text: run number/abbreviation expansion
    )?;

    Ok(())
}
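The file written above is a standard RIFF/WAVE file with 32-bit IEEE-float samples. If you would rather write audio yourself, a minimal std-only writer for that layout might look like the following. write_wav_f32 is a hypothetical helper and not necessarily how the crate writes files; the hound crate is a common off-the-shelf choice.

```rust
use std::io::{self, Write};

/// Write mono 32-bit float samples as a WAV stream (format tag 3 = IEEE float).
/// Hypothetical helper with a minimal 44-byte header.
fn write_wav_f32<W: Write>(mut w: W, samples: &[f32], sample_rate: u32) -> io::Result<()> {
    const CHANNELS: u16 = 1;
    const BITS_PER_SAMPLE: u16 = 32;
    let block_align: u16 = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per frame
    let byte_rate: u32 = sample_rate * block_align as u32;
    let data_len = (samples.len() * 4) as u32;

    // RIFF header; the size field excludes the 8-byte "RIFF" + size prefix.
    w.write_all(b"RIFF")?;
    w.write_all(&(36 + data_len).to_le_bytes())?;
    w.write_all(b"WAVE")?;

    // fmt chunk with a 16-byte body.
    w.write_all(b"fmt ")?;
    w.write_all(&16u32.to_le_bytes())?;
    w.write_all(&3u16.to_le_bytes())?; // WAVE_FORMAT_IEEE_FLOAT
    w.write_all(&CHANNELS.to_le_bytes())?;
    w.write_all(&sample_rate.to_le_bytes())?;
    w.write_all(&byte_rate.to_le_bytes())?;
    w.write_all(&block_align.to_le_bytes())?;
    w.write_all(&BITS_PER_SAMPLE.to_le_bytes())?;

    // data chunk: little-endian f32 samples.
    w.write_all(b"data")?;
    w.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        w.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

Calling write_wav_f32(std::fs::File::create("out.wav")?, &audio, 24_000) would write 24 kHz mono audio. Strictly, the WAV specification calls for an extended fmt chunk and a fact chunk with non-PCM data such as float samples; most readers accept the minimal header, but some strict ones may not.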

Run the bundled example (from a clone of the repository):

cargo run --example basic
cargo run --example basic -- --voice Luna --text "Hello world" --output hello.wav

Available Models

Model                               Params   Size
KittenML/kitten-tts-mini-0.8        80M      80 MB
KittenML/kitten-tts-micro-0.8       40M      41 MB
KittenML/kitten-tts-nano-0.8-fp32   15M      56 MB
KittenML/kitten-tts-nano-0.8-int8   15M      25 MB

Available Voices (v0.8)

Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

API

The snippets below use ?, so they belong inside a function that returns a Result (as in the Quick Start).

use std::collections::HashMap;
use std::path::Path;

// Option 1: load from the HuggingFace Hub
let tts = kittentts::download::load_from_hub("KittenML/kitten-tts-mini-0.8")?;

// Option 2: load from local files (if you already have the ONNX model + voices.npz)
let tts = kittentts::model::KittenTtsOnnx::load(
    Path::new("model.onnx"),
    Path::new("voices.npz"),
    HashMap::new(), // speed_priors
    HashMap::new(), // voice_aliases
)?;

// Generate audio → Vec<f32> at 24 kHz (args: text, voice, speed, clean_text)
let audio: Vec<f32> = tts.generate("Hello!", "Jasper", 1.0, true)?;

// Generate and save to WAV
tts.generate_to_file("Hello!", Path::new("out.wav"), "Jasper", 1.0, true)?;

// Available voices
println!("{:?}", tts.available_voices);
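KittenTtsOnnx::load takes two maps, speed_priors and voice_aliases, whose exact semantics are defined by the crate. The sketch below shows one plausible reading, assuming an alias maps a user-facing voice name to an internal voice ID and a speed prior is a per-voice multiplier on the requested speed. resolve_voice and both assumptions are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical: resolve a voice name to (voice ID, effective speed).
/// Assumes aliases map display names to internal IDs and that a speed
/// prior is a per-voice multiplier applied to the requested speed.
fn resolve_voice(
    voice: &str,
    speed: f32,
    aliases: &HashMap<String, String>,
    speed_priors: &HashMap<String, f32>,
    available: &[&str],
) -> Result<(String, f32), String> {
    // Follow the alias if one exists, otherwise use the name as given.
    let id = aliases.get(voice).map(String::as_str).unwrap_or(voice);
    if !available.contains(&id) {
        return Err(format!("unknown voice '{}'", voice));
    }
    // Voices without a prior keep the requested speed unchanged.
    let prior = speed_priors.get(id).copied().unwrap_or(1.0);
    Ok((id.to_string(), speed * prior))
}
```

Under these assumptions, passing empty maps (as in the local load call) means voice names are used exactly as given and the requested speed passes through unchanged.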

Architecture

Input text
    ↓  TextPreprocessor  (preprocess.rs)
       • numbers / currency / percentages / ordinals → words
       • contractions, units, scientific notation, fractions, …
    ↓  chunk_text()  (model.rs)
       • split into ≤400-char sentence chunks
    ↓  espeak-ng subprocess  (phonemize.rs)
       • text → IPA phoneme string (en-us, with stress)
    ↓  ipa_to_ids()  (tokenize.rs)
       • IPA chars → integer token IDs  (fixed vocab, same as Python)
       • prepend/append pad token 0
    ↓  tract-onnx inference  (model.rs)
       • inputs:  input_ids [1, T], style [1, D], speed [1]
       • output:  audio waveform [samples]
    ↓  tail-trim (drop the last 5 000 samples) + chunk concatenation
    ↓  Vec<f32> @ 24 kHz  or  WAV file
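Three of the steps above are simple enough to sketch in plain Rust. The code is illustrative only: chunk_text and ipa_to_ids borrow their names from the diagram but their signatures here are assumed, trim_and_concat is hypothetical, the sentence-splitting strategy is one plausible choice, and the real vocabulary is the fixed KittenTTS symbol table rather than the toy map a caller would pass in. It also assumes the tail trim is applied to each chunk before concatenation.

```rust
use std::collections::HashMap;

/// Split text into sentence chunks of at most `max_chars` characters.
/// A sentence ends at a run of '.', '!' or '?'. Over-long sentences are
/// split at whitespace; a single word longer than `max_chars` stays whole.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(ch) = chars.next() {
        current.push(ch);
        let is_end = matches!(ch, '.' | '!' | '?');
        let more_punct = chars.peek().map_or(false, |c| matches!(*c, '.' | '!' | '?'));
        if is_end && !more_punct {
            sentences.push(std::mem::take(&mut current));
        }
    }
    sentences.push(current); // trailing text without final punctuation

    let mut chunks = Vec::new();
    for sentence in sentences.iter().map(|s| s.trim()).filter(|s| !s.is_empty()) {
        if sentence.chars().count() <= max_chars {
            chunks.push(sentence.to_string());
            continue;
        }
        // Greedily pack words into pieces that fit within max_chars.
        let mut piece = String::new();
        for word in sentence.split_whitespace() {
            let needed = piece.chars().count() + usize::from(!piece.is_empty()) + word.chars().count();
            if !piece.is_empty() && needed > max_chars {
                chunks.push(std::mem::take(&mut piece));
            }
            if !piece.is_empty() {
                piece.push(' ');
            }
            piece.push_str(word);
        }
        if !piece.is_empty() {
            chunks.push(piece);
        }
    }
    chunks
}

/// Map IPA characters to token IDs, wrapping the sequence in pad token 0.
/// Characters missing from `vocab` are skipped (an assumption of this sketch).
fn ipa_to_ids(ipa: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    let mut ids = vec![0];
    ids.extend(ipa.chars().filter_map(|c| vocab.get(&c).copied()));
    ids.push(0);
    ids
}

/// Drop the last `trim` samples of each chunk's audio, then concatenate.
fn trim_and_concat(chunks: &[Vec<f32>], trim: usize) -> Vec<f32> {
    let mut out = Vec::new();
    for audio in chunks {
        let keep = audio.len().saturating_sub(trim);
        out.extend_from_slice(&audio[..keep]);
    }
    out
}
```

A naive split on '.' would cut decimals such as 3.5; in this pipeline that is less of a concern if, as the diagram's ordering suggests, preprocessing has already spelled numbers out before chunking runs.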

Crate Structure

File                Role
src/lib.rs          Public API & re-exports
src/preprocess.rs   Text preprocessing pipeline (mirrors preprocess.py)
src/phonemize.rs    espeak-ng subprocess wrapper
src/tokenize.rs     IPA character → token ID (mirrors TextCleaner)
src/npz.rs          Hand-written NPY/NPZ loader (no ndarray-npy needed)
src/model.rs        ONNX inference via tract, chunking, WAV output
src/download.rs     HuggingFace Hub model download + config.json parsing
examples/basic.rs   CLI example
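npz.rs avoids an ndarray-npy dependency by parsing the format by hand. An .npz written by numpy.savez is a ZIP archive whose members are .npy files named after their keys; reading the ZIP container itself (for example with the zip crate) is not shown. The sketch below covers only the .npy part for little-endian float32 arrays, following the NPY layout: the magic string \x93NUMPY, a major/minor version byte pair, a little-endian header length, then a Python-dict-literal header with descr, fortran_order and shape. parse_npy_f32 is hypothetical and deliberately minimal.

```rust
/// Parse an in-memory .npy file holding a C-order, little-endian float32 array.
/// Returns (shape, data). Hypothetical helper; header parsing is naive string
/// matching that suits headers written by NumPy, not a general dict parser.
fn parse_npy_f32(bytes: &[u8]) -> Result<(Vec<usize>, Vec<f32>), String> {
    if bytes.len() < 10 || &bytes[..6] != b"\x93NUMPY" {
        return Err("not an NPY file".to_string());
    }
    // Version 1.x stores the header length as a u16; 2.x and 3.x use a u32.
    let (header_len, header_start) = match bytes[6] {
        1 => (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10),
        2 | 3 if bytes.len() >= 12 => (
            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
            12,
        ),
        v => return Err(format!("unsupported or truncated NPY version {}", v)),
    };
    let data_start = header_start + header_len;
    let header = bytes
        .get(header_start..data_start)
        .and_then(|h| std::str::from_utf8(h).ok())
        .ok_or("truncated or non-UTF-8 header")?;

    if !header.contains("'descr': '<f4'") || header.contains("'fortran_order': True") {
        return Err("only C-order '<f4' arrays are supported".to_string());
    }

    // The shape is a tuple such as (3,) or (2, 2); () means a scalar.
    let shape_src = header
        .split("'shape': (")
        .nth(1)
        .and_then(|rest| rest.split(')').next())
        .ok_or("missing shape")?;
    let shape = shape_src
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<usize>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<usize>, String>>()?;

    // Element count is the product of the dimensions (1 for a scalar).
    let count: usize = shape.iter().product();
    let payload = bytes
        .get(data_start..data_start + count * 4)
        .ok_or("data shorter than the shape implies")?;
    let data = payload
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    Ok((shape, data))
}
```

The data comes back flat in row-major order alongside its shape, so a full loader would call a parser like this once per archive member and keep the results keyed by member name.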

Running Tests

cargo test                    # unit tests (espeak-ng optional for most)
cargo test -- --include-ignored  # includes the espeak-ng phonemise test

Citation

If you use kittentts-rs in your research or project, please cite:

@software{kittentts_rs_2026,
  author    = {Eugene Hauptmann},
  title     = {kittentts-rs: A Rust Port of KittenTTS},
  year      = {2026},
  url       = {https://github.com/eugenehp/kittentts-rs},
  note      = {Ultra-lightweight, CPU-only text-to-speech engine based on ONNX models}
}

If you also use the underlying KittenTTS models, please additionally cite the original library:

@software{kittentts_2024,
  author    = {KittenML},
  title     = {KittenTTS},
  year      = {2024},
  url       = {https://github.com/KittenML/KittenTTS}
}

Changelog

See CHANGELOG.md for a full history of releases and changes.

License

This project is licensed under the Apache License 2.0.