Crate kittentts

§kittentts

Rust port of KittenTTS — an ultra-lightweight ONNX-based text-to-speech engine.

§Quick start

use kittentts::{KittenTTS, download};

// Download the model from HuggingFace (cached after first run)
let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8").unwrap();

// Generate audio samples (Vec<f32>, 24 kHz mono)
let audio = tts.generate("Hello from Rust!", "Jasper", 1.0, true).unwrap();

// Or write directly to a WAV file
tts.generate_to_file(
    "Hello from Rust!",
    std::path::Path::new("output.wav"),
    "Jasper",
    1.0,
    true,
).unwrap();
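Since generate returns raw f32 samples at 24 kHz mono, a caller often needs the clip length or 16-bit PCM for playback. The helpers below are a minimal sketch using only the standard library. They are not part of the kittentts API: the names duration_secs and to_pcm16 are hypothetical, the local constant stands in for the crate's re-exported SAMPLE_RATE, and the samples are assumed to lie in the range -1.0 to 1.0.

```rust
/// Sample rate stated in the quick start (24 kHz mono). The crate re-exports
/// its own `SAMPLE_RATE`; this local constant is a stand-in for illustration.
const SAMPLE_RATE_HZ: u32 = 24_000;

/// Length of a mono sample buffer in seconds.
fn duration_secs(samples: &[f32]) -> f32 {
    samples.len() as f32 / SAMPLE_RATE_HZ as f32
}

/// Convert f32 samples to signed 16-bit PCM.
/// Assumption: samples are normalised to [-1.0, 1.0]; anything outside is clamped.
fn to_pcm16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16)
        .collect()
}
```

With the audio buffer from the quick start, duration_secs(&audio) would give the clip length and to_pcm16(&audio) a buffer suitable for a 16-bit audio sink.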

§Mobile (iOS / Android)

Phonemisation uses the pure-Rust espeak-ng crate with bundled data. The same generate API works on every platform with no extra setup — no C library, no system dependencies.

You can also skip phonemisation entirely and pass pre-computed IPA:

use kittentts::{KittenTTS, download};

let tts = download::load_from_hub("KittenML/kitten-tts-mini-0.8").unwrap();
let audio = tts.generate_from_ipa("həloʊ fɹʌm ɹʌst", "Jasper", 1.0, 20).unwrap();

§Build requirements

Platform | Requirement
All platforms | None (the espeak-ng crate is pure Rust with bundled data)

§Pipeline (matches Python implementation)

  1. Text preprocessing — numbers, currencies, abbreviations → spoken words.
  2. Chunking — long texts split into ≤ 400-char sentence chunks.
  3. Phonemisation — pure-Rust espeak-ng converts text to IPA phonemes.
  4. Tokenisation — IPA characters mapped to integer token IDs.
  5. ONNX inference — model takes (input_ids, style, speed), outputs audio.
  6. Tail trim — last 5 000 samples removed (silence artifact).
  7. Concat — per-chunk audio concatenated into a single waveform.
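The chunking, tail-trim, and concat steps above can be sketched with the standard library alone. This is an illustration of the described behaviour, not the crate's actual code: the function names are hypothetical, the sentence splitting rule (break on '.', '!' or '?', or when the length limit is reached) is an assumption, and so is the choice to return an empty buffer when a chunk has fewer samples than the trim length.

```rust
/// Number of trailing samples dropped per chunk, as stated in step 6.
const TAIL_TRIM: usize = 5_000;

/// Split `text` into sentence-like chunks of at most `max_chars` characters.
/// Assumption: a chunk ends at '.', '!' or '?', or when the limit is reached.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut count = 0usize; // characters (not bytes) in `current`

    for ch in text.chars() {
        current.push(ch);
        count += 1;
        let sentence_end = matches!(ch, '.' | '!' | '?');
        if sentence_end || count >= max_chars {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                chunks.push(trimmed.to_string());
            }
            current.clear();
            count = 0;
        }
    }
    // Flush whatever follows the last terminator.
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        chunks.push(trimmed.to_string());
    }
    chunks
}

/// Drop the last `tail` samples of one chunk's audio.
/// Assumption: buffers shorter than `tail` become empty rather than erroring.
fn trim_tail(mut audio: Vec<f32>, tail: usize) -> Vec<f32> {
    let keep = audio.len().saturating_sub(tail);
    audio.truncate(keep);
    audio
}

/// Join per-chunk audio into a single waveform, preserving chunk order.
fn concat_chunks(chunks: Vec<Vec<f32>>) -> Vec<f32> {
    chunks.into_iter().flatten().collect()
}
```

In a full pipeline each text chunk would be phonemised, tokenised, and run through the model before trim_tail and concat_chunks are applied to the resulting audio.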

Re-exports§

pub use model::KittenTtsOnnx as KittenTTS;
pub use model::SAMPLE_RATE;
pub use encoding::AudioEncoder;
pub use encoding::AudioFormat;
pub use encoding::EncoderFactory;

Modules§

download
HuggingFace Hub model downloader — mirrors get_model.py.
encoding
Audio encoding — convert raw f32 samples to various audio formats.
ffi
C FFI — bridges KittenTtsOnnx to iOS / Android callers.
model
ONNX model runner — mirrors Python’s KittenTTS_1_Onnx.
npz
Minimal NPZ / NPY loader.
phonemize
Phonemisation using the pure-Rust espeak-ng crate.
preprocess
Text preprocessing pipeline — mirrors kittentts/preprocess.py.
tokenize
Character-level tokeniser — mirrors Python’s TextCleaner.
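To make the idea of a character-level tokeniser concrete, here is a minimal sketch of mapping IPA characters to integer IDs. It does not reproduce the crate's tokenize module or Python's TextCleaner: the symbol table passed to build_vocab is hypothetical, the i64 ID type is an assumption, and silently skipping unknown characters is an assumed policy.

```rust
use std::collections::HashMap;

/// Build a char -> id table from a symbol string; ids follow character order.
/// The symbol set is supplied by the caller and is purely illustrative here.
fn build_vocab(symbols: &str) -> HashMap<char, i64> {
    symbols
        .chars()
        .enumerate()
        .map(|(i, c)| (c, i as i64))
        .collect()
}

/// Map each character of `text` to its id.
/// Assumption: characters missing from the table are skipped.
fn tokenize(text: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    text.chars()
        .filter_map(|c| vocab.get(&c).copied())
        .collect()
}
```

For example, with the toy table built from "_ helo", tokenize("hello", &vocab) yields one ID per character, and a character outside the table such as 'x' contributes nothing.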