§neutts
Rust port of NeuTTS — an on-device voice-cloning TTS system built on a GGUF LLM backbone and the NeuCodec neural audio codec (pure-Rust CPU inference, no ONNX Runtime).
§Architecture
text ──► espeak-ng ──► IPA ──► GGUF backbone ──► speech tokens ──► NeuCodec decoder ──► audio
                + ref codes ──►
- GGUF backbone (llama-cpp-4): a small causal LM that generates speech token IDs.
- NeuCodec decoder: pure-Rust FSQ+Vocos+ISTFT decoder; 24 kHz output.
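To make the FSQ stage of the decoder concrete, here is a minimal sketch of how a finite-scalar-quantisation index can be unpacked into per-dimension code values. It follows the general FSQ scheme (mixed-radix digits, shifted and scaled by the half-width), not necessarily this crate's code. The function name `fsq_index_to_codes` and the layout of eight dimensions with four levels each are assumptions made for illustration; the source does not state them.

```rust
/// Decode one FSQ token index into per-dimension code values.
///
/// `levels[i]` is the number of quantisation levels in dimension `i`.
/// The first dimension is treated as the least significant digit.
/// Hypothetical helper: name and layout are assumptions, not the crate's API.
fn fsq_index_to_codes(mut index: u32, levels: &[u32]) -> Vec<f32> {
    levels
        .iter()
        .map(|&l| {
            let digit = index % l; // mixed-radix digit for this dimension
            index /= l;
            let half = (l / 2) as f32; // half-width used to centre and scale
            (digit as f32 - half) / half
        })
        .collect()
}
```

With the assumed levels `[4; 8]` there are 4^8 = 65536 distinct indices, and every code value falls in {-1.0, -0.5, 0.0, 0.5}.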
§One-time setup
pip install torch huggingface_hub safetensors
python scripts/convert_weights.py # download + extract decoder weights
cargo build # codec weights loaded at runtime
§Quick start
use neutts::{NeuTTS, download};
use std::path::Path;
let tts = download::load_from_hub("neuphonic/neutts-nano-q4-gguf").unwrap();
let ref_codes = tts.load_ref_codes(Path::new("samples/jo.npy")).unwrap();
let audio = tts.infer("Hello from Rust!", &ref_codes, "Reference transcript.").unwrap();
tts.write_wav(&audio, Path::new("output.wav")).unwrap();
§Features
| Feature | Default | Effect |
|---|---|---|
| backbone | ✓ | GGUF backbone via llama-cpp-4 (requires cmake + C++) |
| espeak | | Raw-text input via pure-Rust espeak-ng (114 bundled languages, no system deps) |
| wgpu | | GPU-accelerated codec via Burn wgpu; falls back to Burn NdArray then ndarray |
| metal | | macOS Metal GPU for the backbone (passed to llama-cpp-4) |
| cuda | | NVIDIA CUDA for the backbone (passed to llama-cpp-4) |
| fast | ✓ | RoPE: degree-7/6 Horner polynomial, no transcendental calls (~1e-4 error) |
| precise | | RoPE: stdlib f32::sin_cos(), correctly rounded; mutually exclusive with fast |
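The difference between the fast and precise RoPE paths can be illustrated with a small sketch. It assumes plain Taylor coefficients for the degree-7 sine and degree-6 cosine polynomials and omits range reduction, so it is only meaningful for small angles (roughly |x| ≤ π/4). The crate's actual coefficients, range handling, and function names are not given in the source; `sin_poly`, `cos_poly`, and `rope_rotate` are hypothetical.

```rust
/// Degree-7 odd polynomial for sin(x), evaluated with Horner's scheme in x².
/// Taylor coefficients are an assumption; intended for |x| <= π/4.
fn sin_poly(x: f32) -> f32 {
    let x2 = x * x;
    x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))))
}

/// Degree-6 even polynomial for cos(x), same scheme and same range.
fn cos_poly(x: f32) -> f32 {
    let x2 = x * x;
    1.0 + x2 * (-0.5 + x2 * (1.0 / 24.0 + x2 * (-1.0 / 720.0)))
}

/// Rotate one feature pair by `theta`, the core operation of RoPE.
/// `precise = true` uses the standard library; otherwise the polynomials.
fn rope_rotate(pair: (f32, f32), theta: f32, precise: bool) -> (f32, f32) {
    let (s, c) = if precise {
        theta.sin_cos()
    } else {
        (sin_poly(theta), cos_poly(theta))
    };
    (pair.0 * c - pair.1 * s, pair.0 * s + pair.1 * c)
}
```

The polynomial path needs only multiplications and additions, which is the point of the fast feature; the cost is a small approximation error that grows with the angle unless the input is first reduced to a narrow interval.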
Re-exports§
pub use model::NeuTTS;
pub use cache::RefCodeCache;
pub use cache::CacheOutcome;
pub use codec::NeuCodecEncoder;
pub use codec::NeuCodecDecoder;
pub use codec::SAMPLE_RATE;
pub use codec::ENCODER_SAMPLE_RATE;
pub use codec::SAMPLES_PER_TOKEN;
pub use codec::ENCODER_SAMPLES_PER_TOKEN;
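Constants such as SAMPLE_RATE and SAMPLES_PER_TOKEN are typically used to convert between speech-token counts and audio length. The sketch below uses local stand-ins rather than the crate's items: 24 000 Hz comes from the 24 kHz output stated above, while 480 samples per token (50 tokens per second) and the integer types are assumptions, as are the helper names.

```rust
// Local stand-ins for the re-exported constants; values and types are
// assumptions except for the 24 kHz output rate stated in the docs.
const SAMPLE_RATE: u32 = 24_000;
const SAMPLES_PER_TOKEN: usize = 480; // assumed: 50 speech tokens per second

/// Number of PCM samples the decoder would emit for `n_tokens` speech tokens.
fn tokens_to_samples(n_tokens: usize) -> usize {
    n_tokens * SAMPLES_PER_TOKEN
}

/// Duration in seconds of `n_samples` samples at the decoder sample rate.
fn samples_to_seconds(n_samples: usize) -> f32 {
    n_samples as f32 / SAMPLE_RATE as f32
}
```

Under these assumed values, 50 tokens correspond to 24 000 samples, that is, one second of audio.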
Modules§
- backbone
- GGUF backbone — runs the NeuTTS LLM that generates speech token IDs.
- cache
- Reference-code cache — avoids re-encoding the same WAV file twice.
- codec
- NeuCodec decoder — pure-Rust CPU inference from safetensors weights.
- download
- HuggingFace Hub model downloader.
- ffi
- C FFI: bridges NeuTTS to iOS / Android callers.
- model
- NeuTTS model — ties the GGUF backbone and NeuCodec Burn decoder together.
- npy
- Minimal NPY / NPZ reader: supports float32 and int32 dtypes.
- phonemize
- Phonemisation using the pure-Rust espeak-ng crate.
- preprocess
- Text preprocessing pipeline.
- tokens
- Speech token helpers.
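As an illustration of what a minimal NPY reader involves, the sketch below parses a version-1.x NPY buffer containing little-endian int32 data. It is not the npy module's API: `read_npy_i32` and `write_npy_i32` are hypothetical names, the shape field is not validated against the data length, and float32 and NPZ archives are left out.

```rust
const MAGIC: &[u8] = b"\x93NUMPY";

/// Read the flat values of a version-1.x NPY buffer with dtype '<i4'.
/// Hypothetical helper; shape and fortran_order are not interpreted.
fn read_npy_i32(bytes: &[u8]) -> Result<Vec<i32>, String> {
    if bytes.len() < 10 || &bytes[..6] != MAGIC {
        return Err("not an NPY file".into());
    }
    if bytes[6] != 1 {
        return Err(format!("unsupported NPY major version {}", bytes[6]));
    }
    // Version 1.x stores the header length as a little-endian u16.
    let header_len = u16::from_le_bytes([bytes[8], bytes[9]]) as usize;
    let data_start = 10 + header_len;
    if bytes.len() < data_start {
        return Err("truncated header".into());
    }
    let header = std::str::from_utf8(&bytes[10..data_start]).map_err(|e| e.to_string())?;
    if !header.contains("'<i4'") {
        return Err("only little-endian int32 ('<i4') is handled here".into());
    }
    let data = &bytes[data_start..];
    if data.len() % 4 != 0 {
        return Err("data length is not a multiple of 4".into());
    }
    Ok(data
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Build a 1-D int32 NPY buffer, padding the header so data starts on a
/// 64-byte boundary. Used here only to exercise the reader.
fn write_npy_i32(values: &[i32]) -> Vec<u8> {
    let mut header = format!(
        "{{'descr': '<i4', 'fortran_order': False, 'shape': ({},), }}",
        values.len()
    );
    while (10 + header.len() + 1) % 64 != 0 {
        header.push(' ');
    }
    header.push('\n');
    let mut buf = MAGIC.to_vec();
    buf.extend_from_slice(&[1, 0]); // format version 1.0
    buf.extend_from_slice(&(header.len() as u16).to_le_bytes());
    buf.extend_from_slice(header.as_bytes());
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}
```

Round-tripping a short array through `write_npy_i32` and `read_npy_i32` returns the original values, and a buffer without the magic prefix is rejected with an error.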