Crate neutts


§neutts

Rust port of NeuTTS — an on-device voice-cloning TTS system built on a GGUF LLM backbone and the NeuCodec neural audio codec (pure-Rust CPU inference, no ONNX Runtime).

§Architecture

 text  ──► espeak-ng ──► IPA ──► GGUF backbone ──► speech tokens ──► NeuCodec decoder ──► audio
                              +  ref codes ──►

GGUF backbone (llama-cpp-4) — a small causal LM that generates speech token IDs.
NeuCodec decoder — pure-Rust FSQ+Vocos+ISTFT decoder; 24 kHz output.
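Because the decoder emits fixed-rate 24 kHz audio, the length of a generated clip follows directly from the number of speech tokens. The sketch below illustrates that arithmetic. The 24 kHz rate comes from the description above; the samples-per-token value of 480 is an assumption made for illustration only, so read the real value from the crate's re-exported `SAMPLES_PER_TOKEN` constant. The helper names are hypothetical and not part of the crate.

```rust
/// Decoder output sample rate in Hz (24 kHz, as stated in the architecture notes).
const SAMPLE_RATE: u32 = 24_000;

/// ASSUMED number of PCM samples produced per speech token.
/// Hypothetical value for illustration; use `neutts::SAMPLES_PER_TOKEN` in real code.
const SAMPLES_PER_TOKEN: usize = 480;

/// Number of PCM samples the decoder would emit for `n_tokens` speech tokens.
fn samples_for_tokens(n_tokens: usize) -> usize {
    n_tokens * SAMPLES_PER_TOKEN
}

/// Audio duration in seconds for `n_tokens` speech tokens.
fn duration_secs(n_tokens: usize) -> f64 {
    samples_for_tokens(n_tokens) as f64 / SAMPLE_RATE as f64
}
```

Under the assumed constant, 50 tokens correspond to one second of audio. The same relation can be inverted to estimate a token budget for a target clip length.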

§One-time setup

pip install torch huggingface_hub safetensors
python scripts/convert_weights.py     # download + extract decoder weights
cargo build                           # codec weights loaded at runtime

§Quick start

use neutts::{NeuTTS, download};
use std::path::Path;

let tts = download::load_from_hub("neuphonic/neutts-nano-q4-gguf").unwrap();
let ref_codes = tts.load_ref_codes(Path::new("samples/jo.npy")).unwrap();
let audio = tts.infer("Hello from Rust!", &ref_codes, "Reference transcript.").unwrap();
tts.write_wav(&audio, Path::new("output.wav")).unwrap();

§Features

Feature	Default	Effect
`backbone`	✓	GGUF backbone via llama-cpp-4 (requires cmake + C++)
`espeak`		Raw-text input via pure-Rust espeak-ng (114 bundled languages, no system deps)
`wgpu`		GPU-accelerated codec via Burn wgpu; falls back to Burn NdArray then ndarray
`metal`		macOS Metal GPU for the backbone (passed to llama-cpp-4)
`cuda`		NVIDIA CUDA for the backbone (passed to llama-cpp-4)
`fast`	✓	RoPE: degree-7/6 Horner polynomial, no transcendental calls (~1e-4 error)
`precise`		RoPE: stdlib `f32::sin_cos()` (platform math-library accuracy); mutually exclusive with `fast`
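To make the `fast` / `precise` trade-off concrete, here is a minimal sketch of a polynomial sine/cosine evaluated with Horner's rule and used to rotate one RoPE feature pair. It is not the crate's implementation: the coefficients are plain Taylor coefficients chosen for readability, the function names are hypothetical, and the range reduction is an assumption. With Taylor coefficients the worst-case error is roughly 1e-3 near ±π/2, which is larger than the ~1e-4 quoted in the table, so the crate presumably uses better-tuned coefficients.

```rust
use std::f32::consts::{FRAC_PI_2, PI, TAU};

/// Degree-7 odd Taylor polynomial for sin(x), Horner form in x².
/// Intended for |x| <= π/2.
fn sin_poly(x: f32) -> f32 {
    let x2 = x * x;
    // x * (1 - x²/6 + x⁴/120 - x⁶/5040)
    x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))))
}

/// Degree-6 even Taylor polynomial for cos(x), Horner form in x².
/// Intended for |x| <= π/2.
fn cos_poly(x: f32) -> f32 {
    let x2 = x * x;
    // 1 - x²/2 + x⁴/24 - x⁶/720
    1.0 + x2 * (-0.5 + x2 * (1.0 / 24.0 + x2 * (-1.0 / 720.0)))
}

/// Approximate (sin θ, cos θ) using only multiplies, adds, and a rounding step.
fn fast_sin_cos(theta: f32) -> (f32, f32) {
    // Reduce to [-π, π] by subtracting the nearest multiple of 2π.
    let mut x = theta - TAU * (theta / TAU).round();
    let mut cos_sign = 1.0_f32;
    // Fold into [-π/2, π/2]: sin(±π - x) = sin x, cos(±π - x) = -cos x.
    if x > FRAC_PI_2 {
        x = PI - x;
        cos_sign = -1.0;
    } else if x < -FRAC_PI_2 {
        x = -PI - x;
        cos_sign = -1.0;
    }
    (sin_poly(x), cos_sign * cos_poly(x))
}

/// Rotate one (even, odd) feature pair by `theta`, as RoPE does per position.
fn rope_rotate(pair: (f32, f32), theta: f32) -> (f32, f32) {
    let (s, c) = fast_sin_cos(theta);
    (pair.0 * c - pair.1 * s, pair.0 * s + pair.1 * c)
}
```

Swapping `fast_sin_cos` for `theta.sin_cos()` inside `rope_rotate` gives the behaviour the `precise` feature describes, at the cost of a math-library call per angle.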

Re-exports§

pub use model::NeuTTS;
pub use cache::RefCodeCache;
pub use cache::CacheOutcome;
pub use codec::NeuCodecEncoder;
pub use codec::NeuCodecDecoder;
pub use codec::SAMPLE_RATE;
pub use codec::ENCODER_SAMPLE_RATE;
pub use codec::SAMPLES_PER_TOKEN;
pub use codec::ENCODER_SAMPLES_PER_TOKEN;

Modules§

backbone: GGUF backbone — runs the NeuTTS LLM that generates speech token IDs.
cache: Reference-code cache — avoids re-encoding the same WAV file twice.
codec: NeuCodec decoder — pure-Rust CPU inference from safetensors weights.
download: HuggingFace Hub model downloader.
ffi: C FFI bridging `NeuTTS` to iOS / Android callers.
model: NeuTTS model — ties the GGUF backbone and NeuCodec Burn decoder together.
npy: Minimal NPY / NPZ reader — supports float32 and int32 dtypes.
phonemize: Phonemisation using the pure-Rust espeak-ng crate.
preprocess: Text preprocessing pipeline.
tokens: Speech token helpers.
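For readers unfamiliar with the NPY container that the `npy` module reads, the following is a minimal sketch of parsing a version-1 NPY buffer of little-endian int32 values using only the standard library. It is an illustration of the file format, not the crate's reader: `parse_npy_i32` is a hypothetical name, the sketch ignores the shape tuple and returns flat data, and it does not handle float32, NPZ archives, or format versions 2 and 3.

```rust
/// Parse a version-1 NPY byte buffer holding little-endian int32 values.
/// Returns the flat data; the shape tuple is ignored in this sketch.
fn parse_npy_i32(bytes: &[u8]) -> Result<Vec<i32>, String> {
    // Every NPY file starts with the 6-byte magic string, then major/minor version.
    const MAGIC: &[u8] = b"\x93NUMPY";
    if bytes.len() < 10 || &bytes[..6] != MAGIC {
        return Err("not an NPY file".into());
    }
    if bytes[6] != 1 {
        return Err(format!("unsupported NPY major version {}", bytes[6]));
    }
    // Version 1 stores the header length as a little-endian u16.
    let header_len = u16::from_le_bytes([bytes[8], bytes[9]]) as usize;
    let data_start = 10 + header_len;
    if bytes.len() < data_start {
        return Err("truncated header".into());
    }
    // The header is an ASCII Python dict literal, e.g.
    // {'descr': '<i4', 'fortran_order': False, 'shape': (3,), }
    let header = std::str::from_utf8(&bytes[10..data_start]).map_err(|e| e.to_string())?;
    if !header.contains("'descr': '<i4'") {
        return Err("expected dtype '<i4' (little-endian int32)".into());
    }
    // Everything after the header is the raw array data.
    let data = &bytes[data_start..];
    if data.len() % 4 != 0 {
        return Err("data length is not a multiple of 4".into());
    }
    Ok(data
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

A caller would typically read the file with `std::fs::read` and pass the resulting bytes to the parser. A fuller reader would also parse the `shape` entry and check that the element count matches it.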
