Crate qwen_tts

§Candle Qwen3-TTS

A Rust implementation of the Qwen3-TTS text-to-speech model for the Candle ML framework.

This crate provides:

High-level model API (model::Model)
Speaker encoder (ECAPA-TDNN based)
Audio tokenizer (12Hz)

§Architecture Overview

Qwen3-TTS uses a hierarchical generation approach:

1. Text is processed through a talker model with multimodal RoPE
2. The first codebook (semantic) is predicted by the main talker
3. Remaining codebooks (acoustic) are predicted by a sub-talker (code predictor)
4. All 32 codes are decoded to audio via the tokenizer
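
The data flow of this hierarchy can be sketched with toy stand-ins. This is only an illustration, not the crate's implementation: the traits `Talker` and `CodePredictor`, the function `generate_frames`, and both toy types are hypothetical names that do not exist in qwen_tts. The sketch also assumes that "32 codes" means one semantic code plus 31 acoustic codes per audio frame.

```rust
/// Assumed number of codes per audio frame (1 semantic + 31 acoustic).
const NUM_CODEBOOKS: usize = 32;

/// Hypothetical stand-in for the main talker: yields the semantic
/// (first-codebook) code for each step, or None at end of speech.
trait Talker {
    fn next_semantic_code(&mut self, step: usize) -> Option<u32>;
}

/// Hypothetical stand-in for the sub-talker (code predictor): given the
/// semantic code, yields the remaining acoustic codes for that frame.
trait CodePredictor {
    fn acoustic_codes(&mut self, semantic: u32) -> Vec<u32>;
}

/// Runs the two-stage loop and returns one full frame of codes per step.
fn generate_frames<T: Talker, P: CodePredictor>(
    talker: &mut T,
    predictor: &mut P,
    max_steps: usize,
) -> Vec<[u32; NUM_CODEBOOKS]> {
    let mut frames = Vec::new();
    for step in 0..max_steps {
        // Stage 1: the main talker predicts the semantic code.
        let semantic = match talker.next_semantic_code(step) {
            Some(code) => code,
            None => break, // the talker signalled end of speech
        };
        // Stage 2: the sub-talker fills in the acoustic codebooks.
        let acoustic = predictor.acoustic_codes(semantic);
        assert_eq!(acoustic.len(), NUM_CODEBOOKS - 1);

        let mut frame = [0u32; NUM_CODEBOOKS];
        frame[0] = semantic;
        frame[1..].copy_from_slice(&acoustic);
        frames.push(frame);
    }
    frames
}

/// Toy talker that emits a fixed number of frames, then stops.
struct ToyTalker {
    remaining: usize,
}

impl Talker for ToyTalker {
    fn next_semantic_code(&mut self, step: usize) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(100 + step as u32)
    }
}

/// Toy predictor that derives acoustic codes from the semantic code.
struct ToyPredictor;

impl CodePredictor for ToyPredictor {
    fn acoustic_codes(&mut self, semantic: u32) -> Vec<u32> {
        (1..NUM_CODEBOOKS as u32).map(|i| semantic + i).collect()
    }
}
```

In the real model the completed frames would then be handed to the audio tokenizer's decoder to produce a waveform; that step is omitted here because its interface is not described on this page.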

§Example

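use qwen_tts::model::loader::{ModelLoader, LoaderConfig};
use candle_core::Device;

// Wrapped in main so that `?` has a Result to propagate into.
// The boxed error type is an assumption, not taken from the crate.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point the loader at a local model directory, then build the model on the CPU.
    let loader = ModelLoader::from_local_dir("/path/to/model")?;
    let model = loader.load_tts_model(&Device::Cpu, &LoaderConfig::default())?;
    // Synthesize the text; the two trailing optional arguments are left unset.
    let result = model.generate_custom_voice_from_text(
        "Hello, world!",
        "vivian",
        "english",
        None,
        None,
    )?;
    Ok(())
}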

Modules§

audio: Audio processing components.
config
io
model: High-level TTS model wrapper.
nn: Model implementations for Qwen3-TTS.
synthesis
text: Text processing components.

Macros§

increment_counter: Increments a call counter. Compiles to a no-op when the timing feature is disabled.
record_elapsed: Records elapsed time. Compiles to a no-op when the timing feature is disabled.
timed: Times a code block. Compiles to a no-op when the timing feature is disabled.
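
The "no-op when the feature is disabled" behaviour is commonly achieved by defining the same macro twice under opposite cfg conditions. The following is a minimal sketch of that pattern, not the crate's actual macro: the name `timed_sketch`, its `(label, block)` argument shape, and the use of `eprintln!` for reporting are all assumptions made for illustration.

```rust
// Variant used when the `timing` feature is enabled: measure and report.
#[cfg(feature = "timing")]
macro_rules! timed_sketch {
    ($label:expr, $body:block) => {{
        let start = std::time::Instant::now();
        let out = $body;
        eprintln!("{}: {:?}", $label, start.elapsed());
        out
    }};
}

// Variant used when the feature is disabled: expand to the body alone,
// so no timing code is compiled and the label is never evaluated.
#[cfg(not(feature = "timing"))]
macro_rules! timed_sketch {
    ($label:expr, $body:block) => {
        $body
    };
}

/// Example call site; the result is the same with or without the feature.
fn sum_to(n: u64) -> u64 {
    timed_sketch!("sum_to", { (1..=n).sum::<u64>() })
}
```

Because both variants accept the same arguments and evaluate to the body's value, call sites do not need their own cfg attributes.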