wavekat-tts 0.0.1

Unified text-to-speech for voice pipelines with multiple backend support

Provides a common abstraction over TTS engines, both local models and cloud APIs, behind shared Rust traits. It follows the same pattern as wavekat-vad and wavekat-turn.

All backends produce AudioFrame<'static> from wavekat-core, so audio is exchanged as one shared type across the WaveKat ecosystem.
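
The trait-based abstraction described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the crate's actual API: `Frame`, `Request`, `Backend`, `SilenceBackend`, and `speak` are hypothetical stand-ins for whatever wavekat-core and wavekat-tts really define, and their fields and signatures are assumptions.

```rust
use std::error::Error;

// Hypothetical stand-ins: the real `AudioFrame`, `SynthesizeRequest`, and
// `TtsBackend` live in wavekat-core / wavekat-tts and may look different.
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    samples: Vec<f32>,
    sample_rate: u32,
}

struct Request {
    text: String,
}

// One trait that every backend implements.
trait Backend {
    fn synthesize(&self, request: &Request) -> Result<Frame, Box<dyn Error>>;
}

// A toy backend that emits 10 ms of silence per input character.
struct SilenceBackend {
    sample_rate: u32,
}

impl Backend for SilenceBackend {
    fn synthesize(&self, request: &Request) -> Result<Frame, Box<dyn Error>> {
        let samples_per_char = (self.sample_rate / 100) as usize;
        let len = request.text.chars().count() * samples_per_char;
        Ok(Frame {
            samples: vec![0.0; len],
            sample_rate: self.sample_rate,
        })
    }
}

// Pipeline code depends only on the trait, so backends are interchangeable.
fn speak(backend: &dyn Backend, text: &str) -> Result<Frame, Box<dyn Error>> {
    backend.synthesize(&Request { text: text.to_string() })
}

fn main() -> Result<(), Box<dyn Error>> {
    let frame = speak(&SilenceBackend { sample_rate: 24_000 }, "hello")?;
    println!("{} samples at {} Hz", frame.samples.len(), frame.sample_rate);
    Ok(())
}
```

Because `speak` takes `&dyn Backend`, swapping a local model for a cloud API would only change the value passed in, not the calling code.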

Architecture

wavekat-vad   →  "is someone speaking?"
wavekat-turn  →  "are they done speaking?"
wavekat-tts   →  "synthesize the response"
     │                   │                     │
     └───────────────────┴─────────────────────┘
                         │
               AudioFrame (wavekat-core)

Feature flags

Backends

Feature     Backend            Multilingual   Requires
qwen3-tts   Qwen3-TTS (ONNX)   10 languages   ONNX model download
cosyvoice   CosyVoice (ONNX)   Yes            ONNX model download

Execution providers

Composable with any backend feature. Selects the inference hardware at build time.

Feature    Provider          Platform
cuda       NVIDIA CUDA       Linux / Windows
tensorrt   NVIDIA TensorRT   Linux / Windows
coreml     Apple CoreML      macOS / iOS
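
To illustrate what "selected at build time" means, here is a hedged sketch of how a crate that declares these features could branch on them with Rust's `cfg!` macro. The feature names come from the table above; the function `selected_provider`, the priority order, and the "cpu" fallback are assumptions made for illustration, not the crate's confirmed behavior.

```rust
// Sketch only: `cfg!(feature = "...")` is evaluated at compile time against
// the features of the crate being compiled. The priority order and the
// "cpu" fallback below are assumptions, not documented behavior.
fn selected_provider() -> &'static str {
    if cfg!(feature = "tensorrt") {
        "tensorrt"
    } else if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "coreml") {
        "coreml"
    } else {
        "cpu"
    }
}

fn main() {
    println!("execution provider: {}", selected_provider());
}
```

Since the branch is resolved by the compiler, switching providers means rebuilding with a different feature set rather than changing a runtime option.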

Quick start

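Add the crate to Cargo.toml with a backend feature enabled:

[dependencies]
wavekat-tts = { version = "0.0.1", features = ["qwen3-tts"] }

Then synthesize speech from text:

use wavekat_tts::{TtsBackend, SynthesizeRequest};
use wavekat_tts::backends::qwen3_tts::Qwen3Tts;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the ONNX model from disk.
    let tts = Qwen3Tts::new("path/to/model.onnx")?;
    // Input text is Chinese for "I think this plan".
    let request = SynthesizeRequest::new("我觉得这个方案");
    let audio = tts.synthesize(&request)?;
    // audio: AudioFrame<'static> at 24kHz
    Ok(())
}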
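
Since the output is described as 24 kHz audio, the playback length of a synthesized buffer follows directly from its sample count. The helper below is a small sketch of that arithmetic; `duration_secs` is a hypothetical name, and it assumes a single-channel (mono) buffer whose sample count you have already obtained, since the accessor methods on `AudioFrame` are not shown here.

```rust
// Duration in seconds of a mono buffer: samples divided by samples-per-second.
// Assumes one channel; for interleaved multi-channel audio, divide the total
// sample count by the channel count first.
fn duration_secs(num_samples: usize, sample_rate_hz: u32) -> f64 {
    num_samples as f64 / sample_rate_hz as f64
}

fn main() {
    // 36,000 samples at 24 kHz.
    let secs = duration_secs(36_000, 24_000);
    println!("{secs} s");
}
```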