§Candle Qwen3-TTS
A Rust implementation of the Qwen3-TTS text-to-speech model for the Candle ML framework.
This crate provides:
- High-level model API (model::Model)
- Speaker encoder (ECAPA-TDNN based)
- Audio tokenizer (12Hz)
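A low frame rate keeps generated sequences short: the number of frames grows linearly with audio duration, and each frame carries one code per codebook. The sketch below shows that arithmetic, assuming the "12Hz" label refers to the tokenizer's frame rate. Both helper functions are hypothetical and not part of this crate; the exact frame rate and codebook count should be read from the model's configuration rather than hard-coded.

```rust
// Hypothetical helpers for sizing token sequences; not crate API.

/// Number of tokenizer frames needed to cover `seconds` of audio at
/// `frame_rate_hz`, rounding up so the whole clip is covered.
fn frames_for_duration(seconds: f64, frame_rate_hz: f64) -> usize {
    (seconds * frame_rate_hz).ceil() as usize
}

/// Total number of discrete codes for `frames` frames when every frame
/// holds one code per codebook.
fn total_codes(frames: usize, num_codebooks: usize) -> usize {
    frames * num_codebooks
}
```

For example, 10 seconds at 12 frames per second gives 120 frames; with the 32 codebooks mentioned in the Architecture Overview, that is 3,840 codes in total, of which only 120 are predicted by the main talker.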
§Architecture Overview
Qwen3-TTS uses a hierarchical generation approach:
- Text is processed by a talker model that uses multimodal rotary position embeddings (RoPE)
- The first codebook (semantic) is predicted by the main talker
- Remaining codebooks (acoustic) are predicted by a sub-talker (code predictor)
- The codes from all 32 codebooks are decoded back to audio by the audio tokenizer
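The per-frame control flow described above can be sketched as two nested loops: the main talker emits the semantic code for a new frame, then the sub-talker fills in the remaining acoustic codes one at a time. This is a minimal sketch under stated assumptions. `talker_predict_semantic`, `sub_talker_predict` and `CODEBOOK_SIZE` are hypothetical stand-ins with toy deterministic logic; the real models operate on Candle tensors, condition on hidden states, and sample from logits.

```rust
// Number of codebooks per frame, as stated in the overview above.
const NUM_CODEBOOKS: usize = 32;
// Hypothetical codebook size, used only to keep the toy codes in range.
const CODEBOOK_SIZE: u32 = 1024;

// Hypothetical stand-in for the main talker: predicts the semantic
// (first) code of the next frame from all previously generated frames.
fn talker_predict_semantic(history: &[Vec<u32>]) -> u32 {
    history.len() as u32 % CODEBOOK_SIZE
}

// Hypothetical stand-in for the sub-talker (code predictor): predicts
// acoustic code `k` from the codes already chosen for this frame.
fn sub_talker_predict(frame_so_far: &[u32], k: usize) -> u32 {
    (frame_so_far.iter().sum::<u32>() + k as u32) % CODEBOOK_SIZE
}

// Generates `num_frames` frames, each holding NUM_CODEBOOKS codes.
fn generate_frames(num_frames: usize) -> Vec<Vec<u32>> {
    let mut frames: Vec<Vec<u32>> = Vec::with_capacity(num_frames);
    for _ in 0..num_frames {
        let mut frame = Vec::with_capacity(NUM_CODEBOOKS);
        // Step 1: the main talker predicts codebook 0 (semantic).
        frame.push(talker_predict_semantic(&frames));
        // Step 2: the sub-talker predicts codebooks 1..NUM_CODEBOOKS
        // (acoustic), each conditioned on the codes before it.
        for k in 1..NUM_CODEBOOKS {
            let code = sub_talker_predict(&frame, k);
            frame.push(code);
        }
        frames.push(frame);
    }
    frames
}
```

The split means the expensive main talker runs once per frame, while the smaller code predictor handles the inner loop over acoustic codebooks; the resulting frames are what the audio tokenizer decodes to a waveform.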
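§Example
```rust
use candle_core::Device;
use qwen_tts::model::loader::{LoaderConfig, ModelLoader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the weights and configuration from a local model directory.
    let loader = ModelLoader::from_local_dir("/path/to/model")?;
    let model = loader.load_tts_model(&Device::Cpu, &LoaderConfig::default())?;

    // Synthesize speech with a named built-in voice.
    let _result = model.generate_custom_voice_from_text(
        "Hello, world!", // text to synthesize
        "vivian",        // speaker (voice) name
        "english",       // language
        None,
        None,
    )?;
    Ok(())
}
```
Modules§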
- audio - Audio processing components.
- config
- io
- model - High-level TTS model wrapper.
- nn - Model implementations for Qwen3-TTS.
- synthesis
- text - Text processing components.
Macros§
- increment_counter - Macro for incrementing a call count. Compiles to a no-op when the timing feature is disabled.
- record_elapsed - Macro for recording elapsed time. Compiles to a no-op when the timing feature is disabled.
- timed - Macro for timing a code block. Compiles to a no-op when the timing feature is disabled.
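The "compiles to a no-op" behavior is typically achieved by defining each macro twice behind complementary `#[cfg(...)]` attributes, so builds without the feature carry no instrumentation at all. The sketch below illustrates that pattern for `timed` and `increment_counter`, assuming a Cargo feature named `timing`. The macro arguments (a label and a block for `timed`, an atomic counter for `increment_counter`) are hypothetical and may not match this crate's actual macros.

```rust
// Hypothetical sketch of feature-gated instrumentation macros; the
// crate's real macros may take different arguments.

// With the `timing` feature: run the block, print how long it took
// to stderr, and return the block's value.
#[cfg(feature = "timing")]
macro_rules! timed {
    ($label:expr, $body:block) => {{
        let start = std::time::Instant::now();
        let out = $body;
        eprintln!("{}: {:?}", $label, start.elapsed());
        out
    }};
}

// Without the feature: expand to the block itself, with no timing code.
#[cfg(not(feature = "timing"))]
macro_rules! timed {
    ($label:expr, $body:block) => {
        $body
    };
}

// With the feature: bump an atomic counter (e.g. a `static AtomicU64`).
#[cfg(feature = "timing")]
macro_rules! increment_counter {
    ($counter:expr) => {{
        $counter.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
    }};
}

// Without the feature: expand to the unit value, doing nothing.
#[cfg(not(feature = "timing"))]
macro_rules! increment_counter {
    ($counter:expr) => {
        ()
    };
}
```

A call such as `let codes = timed!("decode", { decode_frames() });` (where `decode_frames` is a placeholder) still evaluates the block and returns its value in both configurations; only the measurement disappears when the feature is off.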