Crate neutts

§neutts

Rust port of NeuTTS — an on-device voice-cloning TTS system built on a GGUF LLM backbone and the NeuCodec neural audio codec (pure-Rust CPU inference, no ONNX Runtime).

§Architecture

 text  ──► espeak-ng ──► IPA ──► GGUF backbone ──► speech tokens ──► NeuCodec decoder ──► audio
                              +  ref codes ──►
  1. GGUF backbone (llama-cpp-4) — a small causal LM that generates speech token IDs.
  2. NeuCodec decoder — pure-Rust FSQ+Vocos+ISTFT decoder; 24 kHz output.

§One-time setup

pip install torch huggingface_hub safetensors
python scripts/convert_weights.py     # download + extract decoder weights
cargo build                           # codec weights loaded at runtime

§Quick start

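use neutts::{NeuTTS, download};
use std::path::Path;

// Download the model from the HuggingFace Hub and load it.
let tts = download::load_from_hub("neuphonic/neutts-nano-q4-gguf").unwrap();
// Load the reference codes of the voice to clone from an NPY file.
let ref_codes = tts.load_ref_codes(Path::new("samples/jo.npy")).unwrap();
// Generate audio for the text, conditioned on the reference codes and transcript.
let audio = tts.infer("Hello from Rust!", &ref_codes, "Reference transcript.").unwrap();
// Write the generated audio to a WAV file.
tts.write_wav(&audio, Path::new("output.wav")).unwrap();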
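
The reference codes in the quick start come from a `.npy` file. As a rough illustration of what such a file contains, the sketch below parses a version 1.x NPY buffer holding little-endian int32 values using only the standard library. It is not the crate's `npy` module: the function name `read_npy_i32` is hypothetical, the assumption that reference codes are stored as int32 is not confirmed by this page, and the sketch ignores the `shape` field and simply returns the flat data.

```rust
/// Hypothetical helper: parse a version-1.x NPY buffer of little-endian int32.
/// Layout: 6-byte magic, 2 version bytes, u16 LE header length, ASCII header
/// dict, then the raw array data.
fn read_npy_i32(bytes: &[u8]) -> Result<Vec<i32>, String> {
    if bytes.len() < 10 || !bytes.starts_with(b"\x93NUMPY") {
        return Err("not an NPY file".into());
    }
    if bytes[6] != 1 {
        // Versions 2 and 3 use a 4-byte header length; not handled here.
        return Err("only NPY version 1.x is handled in this sketch".into());
    }
    let header_len = u16::from_le_bytes([bytes[8], bytes[9]]) as usize;
    let data_start = 10 + header_len;
    let header = bytes.get(10..data_start).ok_or("truncated header")?;
    let header = std::str::from_utf8(header).map_err(|e| e.to_string())?;
    if !header.contains("'<i4'") {
        return Err("expected little-endian int32 ('<i4') data".into());
    }
    let data = &bytes[data_start..];
    if data.len() % 4 != 0 {
        return Err("data length is not a multiple of 4 bytes".into());
    }
    // Decode each 4-byte chunk as one little-endian i32.
    Ok(data
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

A float32 reader would follow the same pattern, checking for `'<f4'` and decoding with `f32::from_le_bytes`.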

§Features

| Feature  | Default | Effect |
|----------|---------|--------|
| backbone |         | GGUF backbone via llama-cpp-4 (requires cmake + C++) |
| espeak   |         | Raw-text input via pure-Rust espeak-ng (114 bundled languages, no system deps) |
| wgpu     |         | GPU-accelerated codec via Burn wgpu; falls back to Burn NdArray then ndarray |
| metal    |         | macOS Metal GPU for the backbone (passed to llama-cpp-4) |
| cuda     |         | NVIDIA CUDA for the backbone (passed to llama-cpp-4) |
| fast     |         | RoPE: degree-7/6 Horner polynomial, no transcendental calls (~1e-4 error) |
| precise  |         | RoPE: stdlib f32::sin_cos(), correctly rounded; mutually exclusive w/ fast |
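
To make the difference between `fast` and `precise` concrete, here is a minimal sketch of a polynomial sine/cosine evaluated with Horner's rule, as an alternative to calling `f32::sin_cos()`. It is an illustration, not the crate's code: the function names are hypothetical, the coefficients are plain Taylor terms (degree 7 for sine, degree 6 for cosine) rather than whatever coefficients the crate uses, and the range reduction to [-π/4, π/4] is an assumption of this sketch.

```rust
use std::f32::consts::FRAC_PI_2;

/// Degree-7 odd polynomial for sin(r), Horner form in r^2 (Taylor coefficients).
fn sin_poly(r: f32) -> f32 {
    let r2 = r * r;
    r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 + r2 * (-1.0 / 5040.0))))
}

/// Degree-6 even polynomial for cos(r), Horner form in r^2 (Taylor coefficients).
fn cos_poly(r: f32) -> f32 {
    let r2 = r * r;
    1.0 + r2 * (-0.5 + r2 * (1.0 / 24.0 + r2 * (-1.0 / 720.0)))
}

/// Approximate (sin x, cos x) without calling a transcendental function.
fn fast_sin_cos(x: f32) -> (f32, f32) {
    // Write x = r + k * (pi/2) with r in roughly [-pi/4, pi/4].
    let k = (x / FRAC_PI_2).round();
    let r = x - k * FRAC_PI_2;
    let (s, c) = (sin_poly(r), cos_poly(r));
    // Undo the quarter-turn shift using the angle-addition identities.
    match (k as i64).rem_euclid(4) {
        0 => (s, c),
        1 => (c, -s),
        2 => (-s, -c),
        _ => (-c, s),
    }
}

/// Rotate one (x0, x1) feature pair by `theta`, the basic RoPE operation.
fn rotate_pair(x0: f32, x1: f32, theta: f32) -> (f32, f32) {
    let (s, c) = fast_sin_cos(theta);
    (x0 * c - x1 * s, x0 * s + x1 * c)
}
```

The trade-off is the usual one: the polynomial path costs a handful of multiply-adds per angle, while the standard-library path favours accuracy over speed.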

§Re-exports

pub use model::NeuTTS;
pub use cache::RefCodeCache;
pub use cache::CacheOutcome;
pub use codec::NeuCodecEncoder;
pub use codec::NeuCodecDecoder;
pub use codec::SAMPLE_RATE;
pub use codec::ENCODER_SAMPLE_RATE;
pub use codec::SAMPLES_PER_TOKEN;
pub use codec::ENCODER_SAMPLES_PER_TOKEN;
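
The sample-rate and samples-per-token constants relate token counts to audio length. The sketch below shows that arithmetic with the values passed in as parameters, because this page states only the 24 kHz output rate; the figure of 480 samples per token in the usage note is an assumed example value, not taken from the crate, and `duration_secs` is a hypothetical helper.

```rust
/// Hypothetical helper: seconds of audio represented by `n_tokens` speech tokens.
/// Each token expands to `samples_per_token` PCM samples at `sample_rate` Hz.
fn duration_secs(n_tokens: usize, samples_per_token: usize, sample_rate: u32) -> f64 {
    (n_tokens * samples_per_token) as f64 / sample_rate as f64
}
```

Under those assumptions, `duration_secs(50, 480, 24_000)` evaluates to one second of audio. In real code the arguments would come from `SAMPLES_PER_TOKEN` and `SAMPLE_RATE`, whose types and values should be checked in the `codec` module.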

§Modules

backbone
GGUF backbone — runs the NeuTTS LLM that generates speech token IDs.
cache
Reference-code cache — avoids re-encoding the same WAV file twice.
codec
NeuCodec decoder — pure-Rust CPU inference from safetensors weights.
download
HuggingFace Hub model downloader.
ffi
C FFI bridging NeuTTS to iOS / Android callers.
model
NeuTTS model — ties the GGUF backbone and NeuCodec Burn decoder together.
npy
Minimal NPY / NPZ reader — supports float32 and int32 dtypes.
phonemize
Phonemisation using the pure-Rust espeak-ng crate.
preprocess
Text preprocessing pipeline.
tokens
Speech token helpers.
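
The idea behind the `cache` module, not encoding the same WAV file twice, can be sketched as a map keyed by a hash of the file contents. Everything below is hypothetical and only illustrates the pattern: `CodeCache`, `Outcome`, and `get_or_encode` are invented names rather than the crate's `RefCodeCache` or `CacheOutcome` API, the cache here lives in memory only, and the encoder is passed in as a closure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical outcome type: whether the codes came from the cache.
#[derive(Debug, PartialEq)]
enum Outcome {
    Hit,
    Miss,
}

/// Hypothetical in-memory cache keyed by a hash of the WAV bytes.
struct CodeCache {
    entries: HashMap<u64, Vec<i32>>,
}

impl CodeCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return cached codes for `wav_bytes`, or run `encode` once and store the result.
    fn get_or_encode<F>(&mut self, wav_bytes: &[u8], encode: F) -> (Vec<i32>, Outcome)
    where
        F: FnOnce(&[u8]) -> Vec<i32>,
    {
        // A 64-bit hash can collide; a real cache would likely use a stronger digest.
        let mut hasher = DefaultHasher::new();
        wav_bytes.hash(&mut hasher);
        let key = hasher.finish();

        if let Some(codes) = self.entries.get(&key) {
            return (codes.clone(), Outcome::Hit);
        }
        let codes = encode(wav_bytes);
        self.entries.insert(key, codes.clone());
        (codes, Outcome::Miss)
    }
}
```

Keying on content rather than on the file path means a renamed or copied reference file still hits the cache, at the cost of reading the file before every lookup.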