§kittentts
Rust port of KittenTTS — an ultra-lightweight ONNX-based text-to-speech engine.
§Quick start
use kittentts::{KittenTTS, download};

// Download the model from HuggingFace (cached after first run)
let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8").unwrap();

// Generate audio samples (Vec<f32>, 24 kHz mono)
let audio = tts.generate("Hello from Rust!", "Jasper", 1.0, true).unwrap();

// Or write directly to a WAV file
tts.generate_to_file(
    "Hello from Rust!",
    std::path::Path::new("output.wav"),
    "Jasper",
    1.0,
    true,
).unwrap();

§Mobile (iOS / Android)
Phonemisation uses the pure-Rust espeak-ng crate with bundled data.
The same generate API works on every platform
with no extra setup — no C library, no system dependencies.
You can also skip phonemisation entirely and pass pre-computed IPA:
use kittentts::{KittenTTS, download};

let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8").unwrap();
let audio = tts.generate_from_ipa("həloʊ fɹʌm ɹʌst", "Jasper", 1.0, 20).unwrap();

§Build requirements
| Platform | Requirement |
|---|---|
| All platforms | None — the espeak-ng crate is pure Rust with bundled data |
§Pipeline (matches Python implementation)
- Text preprocessing — numbers, currencies, abbreviations → spoken words.
- Chunking — long texts split into ≤ 400-char sentence chunks.
- Phonemisation — pure-Rust espeak-ng converts text to IPA phonemes.
- Tokenisation — IPA characters mapped to integer token IDs.
- ONNX inference — model takes (input_ids, style, speed), outputs audio.
- Tail trim — last 5 000 samples removed (silence artifact).
- Concat — per-chunk audio concatenated into a single waveform.
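The chunking, tail-trim, and concat stages listed above can be sketched in plain Rust with no model involved. This is an illustration under stated assumptions, not the crate's code: the function names `split_into_chunks`, `trim_tail`, and `concat_chunks` are hypothetical, sentences are assumed to end at `.`, `!`, or `?`, sentences are assumed to be packed greedily into chunks, and audio shorter than the trim length is assumed to become empty. Only the two constants come from the pipeline description.

```rust
/// Maximum characters per chunk, as stated in the pipeline description.
const MAX_CHUNK_CHARS: usize = 400;
/// Samples removed from the end of each chunk's audio, as stated in the pipeline description.
const TAIL_TRIM_SAMPLES: usize = 5_000;

/// Split `text` into sentences and pack them greedily into chunks of at most
/// `max_chars` characters. Hypothetical helper: the real splitting rules may differ.
fn split_into_chunks(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");

    // Step 1: naive sentence split on '.', '!' and '?' (an assumption).
    let mut sentences: Vec<String> = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let sentence = current.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }

    // Step 2: pack sentences into chunks, counting characters rather than bytes.
    let mut chunks: Vec<String> = Vec::new();
    let mut chunk = String::new();
    let mut chunk_len = 0usize;
    for sentence in sentences {
        let chars: Vec<char> = sentence.chars().collect();
        // A sentence longer than the limit is hard-split into pieces.
        for piece in chars.chunks(max_chars) {
            // Flush the current chunk if adding a space plus this piece would overflow it.
            if !chunk.is_empty() && chunk_len + 1 + piece.len() > max_chars {
                chunks.push(std::mem::take(&mut chunk));
                chunk_len = 0;
            }
            if !chunk.is_empty() {
                chunk.push(' ');
                chunk_len += 1;
            }
            chunk.extend(piece.iter());
            chunk_len += piece.len();
        }
    }
    if !chunk.is_empty() {
        chunks.push(chunk);
    }
    chunks
}

/// Drop the last `n` samples; audio shorter than `n` becomes empty (an assumption).
fn trim_tail(mut audio: Vec<f32>, n: usize) -> Vec<f32> {
    let keep = audio.len().saturating_sub(n);
    audio.truncate(keep);
    audio
}

/// Join per-chunk audio into a single waveform, preserving chunk order.
fn concat_chunks(chunks: Vec<Vec<f32>>) -> Vec<f32> {
    chunks.into_iter().flatten().collect()
}
```

In a full pipeline each string from `split_into_chunks(text, MAX_CHUNK_CHARS)` would be phonemised, tokenised, and run through the model, and each resulting buffer would pass through `trim_tail(audio, TAIL_TRIM_SAMPLES)` before `concat_chunks` joins them.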
Re-exports§
pub use model::KittenTtsOnnx as KittenTTS;
pub use model::SAMPLE_RATE;
pub use encoding::AudioEncoder;
pub use encoding::AudioFormat;
pub use encoding::EncoderFactory;
Modules§
- download: HuggingFace Hub model downloader — mirrors get_model.py.
- encoding: Audio encoding — convert raw f32 samples to various audio formats.
- ffi: C FFI — bridges KittenTtsOnnx to iOS / Android callers.
- model: ONNX model runner — mirrors Python’s KittenTTS_1_Onnx.
- npz: Minimal NPZ / NPY loader.
- phonemize: Phonemisation using the pure-Rust espeak-ng crate.
- preprocess: Text preprocessing pipeline — mirrors kittentts/preprocess.py.
- tokenize: Character-level tokeniser — mirrors Python’s TextCleaner.
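
To make the job of the encoding module concrete, here is a minimal sketch of turning raw f32 samples into a 16-bit PCM mono WAV byte buffer using only the standard library. It does not use or reproduce the crate's `AudioEncoder` API, whose signatures are not shown on this page; `f32_to_i16` and `encode_wav_pcm16` are hypothetical names, and 16-bit PCM is an assumed output format chosen for illustration.

```rust
/// Convert one sample in [-1.0, 1.0] to signed 16-bit PCM.
/// Out-of-range values are clamped rather than wrapped.
fn f32_to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16
}

/// Encode mono f32 samples as a canonical 44-byte-header RIFF/WAVE file (PCM, 16-bit).
/// Hypothetical helper for illustration only.
fn encode_wav_pcm16(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8; // bytes per sample frame
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32; // two bytes per sample

    let mut out = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF container header.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");

    // "fmt " chunk describing the sample layout.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk body
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());

    // "data" chunk holding the little-endian samples.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &sample in samples {
        out.extend_from_slice(&f32_to_i16(sample).to_le_bytes());
    }
    out
}
```

With the 24 kHz mono output described in the quick start, a call would look like `encode_wav_pcm16(&audio, 24_000)`, and the returned bytes could be written out with `std::fs::write`.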