Crate rlx_voxtral_tts

Voxtral-4B-TTS on RLX — Ministral LM + acoustic flow matching + codec decode.

Native Rust port of vLLM-Omni VoxtralTTSAudioGeneration (no Python at inference).
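
As a rough illustration of the "acoustic flow matching" stage named above, the sketch below integrates a velocity field from noise toward a sample with fixed-step Euler. It is a minimal sketch under stated assumptions, not this crate's code: the function `euler_flow_sample`, the closure signature, the Euler solver and the step count are all hypothetical, and the toy velocity field stands in for the acoustic transformer.

```rust
/// Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler.
/// `velocity` is a stand-in for the acoustic velocity network
/// (hypothetical signature, not the crate's API).
fn euler_flow_sample<F>(noise: &[f32], steps: usize, velocity: F) -> Vec<f32>
where
    F: Fn(&[f32], f32) -> Vec<f32>,
{
    assert!(steps > 0, "need at least one integration step");
    let dt = 1.0 / steps as f32;
    let mut x = noise.to_vec();
    for i in 0..steps {
        let t = i as f32 * dt;
        let v = velocity(&x, t);
        // Euler update: x <- x + dt * v(x, t)
        for (xi, vi) in x.iter_mut().zip(v.iter()) {
            *xi += dt * *vi;
        }
    }
    x
}

fn main() {
    let noise = [0.5_f32, -1.0, 2.0];
    let target = [1.0_f32, 0.0, -1.0];
    // Toy field: constant velocity pointing from the noise to the target,
    // i.e. the straight-line path used in rectified-flow style training.
    let out = euler_flow_sample(&noise, 8, |_x: &[f32], _t: f32| {
        target
            .iter()
            .zip(noise.iter())
            .map(|(a, b)| a - b)
            .collect()
    });
    println!("{:?}", out);
}
```

With a constant straight-line field the integration lands on the target regardless of step count; a learned velocity field generally needs enough steps to keep the discretization error small.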

Re-exports

pub use backbone::CompiledMinistralLm;
pub use backbone::MinistralLm;
pub use backbone::NativeTtsEngine;
pub use bench::VoxtralTtsBenchReport;
pub use codec::CodecDecoder;
pub use config::HF_MODEL_ID;
pub use config::VoxtralTtsConfig;
pub use generation::GenerationConfig;
pub use load::VoxtralTtsWeightStore;
pub use lora::load_lora_bank;
pub use options::VoxtralTtsOptions;
pub use options::VoxtralTtsRunnerBuilder;
pub use prompt_tokens::load_prompt_tokens;
pub use runner::VoxtralTtsRunner;
pub use runner::parse_codes_file;
pub use runner::write_wav_mono;
pub use tokens::PRESET_VOICES;
pub use voice::VoiceEmbedding;
pub use voice_clone::VoiceCloneSupport;
pub use voice_clone::clone_from_reference_audio;
pub use voice_clone::encode_reference_wav;
pub use voice_clone::encode_reference_wav_to_file;
pub use voice_clone::max_reference_seconds;
pub use voice_clone::voice_clone_support;

Modules

acoustic
Flow-matching acoustic transformer (vLLM-Omni FlowMatchingAudioTransformer).
acoustic_compiled
Compiled acoustic velocity stack (3-token FM sequence, bidirectional attention).
acoustic_engine
Acoustic head backend — eager CPU reference or RLX-compiled stack.
acoustic_flow
Compiled acoustic velocity stack (3-token bidirectional transformer, no attention RoPE).
backbone
bench
Stage timing for native TTS (LM prefill / decode, acoustic, codec).
cli
codec
config
Voxtral-4B-TTS config (params.json / HF layout).
decode_shard_layer
Decode layer for wgpu LM shards — global checkpoint keys, local past_k_* inputs.
generation
Generation options (parity with vLLM-Omni sampling).
lm_flow
Compiled Ministral graphs (inputs_embeds prefill/decode, no LM head).
load
Mmap-backed weight access for consolidated.safetensors.
lora
LoRA adapters on Ministral attention + FFN projections (inference merge + eager apply).
math
Small ndarray helpers for eager CPU inference.
options
Runner options — device and eager fallbacks.
prompt_tokens
Load prompt token ids exported by the Docker tools image.
rng
Reproducible Gaussian noise for flow-matching (seeded PCG).
runner
End-to-end TTS runner — native Rust only.
speech_tokenizer
Native Tekken speech prompt tokenization (replaces Docker mistral_common).
tokens
Audio + text special tokens (vLLM-Omni AudioSpecialTokens).
voice
Preset voice embeddings (voice_embedding/*.pt or converted .f32).
voice_clone
Reference-audio voice cloning via trained codec encoder weights.
voice_pt
Convert HuggingFace voice_embedding/*.pt (bf16 zip) to native .f32.
weights
Map Voxtral-4B-TTS checkpoint keys → Llama builder keys.
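
The voice_pt entry above mentions converting bf16 voice embeddings to a native .f32 form. The numeric part of such a conversion can be sketched as below. This is an illustration only: the helper names are hypothetical, unpacking the .pt zip archive is out of scope, and the assumption that the .f32 output is raw little-endian f32 values is not confirmed by this page.

```rust
/// Widen one bfloat16 bit pattern to f32.
/// bf16 is the upper 16 bits of an IEEE-754 f32, so the conversion is a shift.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian bf16 byte buffer into f32 values.
/// Returns None if the buffer length is not a multiple of two.
fn bf16_le_bytes_to_f32(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
            .collect(),
    )
}

/// Serialize f32 values as raw little-endian bytes
/// (assumed layout for a ".f32" file; not confirmed by the crate docs).
fn f32_to_le_bytes(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    // bf16 bit patterns 0x3F80, 0xC000, 0x3F00 stored little-endian.
    let raw = [0x80_u8, 0x3F, 0x00, 0xC0, 0x00, 0x3F];
    let floats = bf16_le_bytes_to_f32(&raw).expect("even-length buffer");
    let bytes = f32_to_le_bytes(&floats);
    println!("{:?} -> {} bytes", floats, bytes.len());
}
```

Because every bf16 value is exactly representable as an f32, this widening is lossless; only the reverse direction would need rounding.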