rlx-voxtral 0.2.5

Mistral Voxtral speech LM for RLX (Whisper encoder + Llama decoder)

Links
MIT-RLX/rlx-models
Versions
- 0.2.5 (2026-06-10)
- 0.2.4 (2026-06-09)

Mistral Voxtral: a Whisper-style audio encoder, a 4× projector, and a Llama text decoder.

Weights: HuggingFace safetensors (mistralai/Voxtral-Mini-3B-2507) with audio_tower.*, multi_modal_projector.*, and language_model.* tensors.
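
The three tensor-name prefixes listed above suggest a simple way to route checkpoint entries to model components. The following is a minimal sketch of that idea, not the crate's loader: the `Component` enum and `classify_tensor` function are hypothetical names introduced here for illustration, and only the three prefixes come from the description above.

```rust
/// Hypothetical grouping of checkpoint tensors by model component.
/// This enum is an illustration and is not part of rlx-voxtral's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Component {
    AudioTower,
    Projector,
    LanguageModel,
}

/// Map a tensor name to its component using the prefixes named in the
/// crate description. Returns `None` for names outside those prefixes.
pub fn classify_tensor(name: &str) -> Option<Component> {
    if name.starts_with("audio_tower.") {
        Some(Component::AudioTower)
    } else if name.starts_with("multi_modal_projector.") {
        Some(Component::Projector)
    } else if name.starts_with("language_model.") {
        Some(Component::LanguageModel)
    } else {
        None
    }
}
```

A loader could use such a function to reject a checkpoint early when any tensor falls outside the three expected groups.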

Audio and text embeddings are fused additively at [audio_token_id] placeholders before the Llama trunk runs (see [embed::fuse_inputs_embeds]).
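
The additive fusion described above can be sketched on plain vectors. This is an illustration under stated assumptions, not the signature or implementation of `embed::fuse_inputs_embeds`: the function name `fuse_additive`, the row-per-token `Vec<f32>` layout, and the rule that the i-th placeholder receives the i-th audio row are all assumptions made for this example.

```rust
/// Hypothetical helper (not the crate's `embed::fuse_inputs_embeds`).
/// Adds one audio embedding row into the text embedding row at each
/// position whose token id equals `audio_token_id`, in order.
pub fn fuse_additive(
    token_ids: &[u32],
    text_embeds: &mut [Vec<f32>],
    audio_embeds: &[Vec<f32>],
    audio_token_id: u32,
) -> Result<(), String> {
    // One text embedding row per token.
    if token_ids.len() != text_embeds.len() {
        return Err(format!(
            "{} token ids but {} text embedding rows",
            token_ids.len(),
            text_embeds.len()
        ));
    }
    // Assumption: exactly one audio row per placeholder token.
    let placeholders = token_ids.iter().filter(|&&t| t == audio_token_id).count();
    if placeholders != audio_embeds.len() {
        return Err(format!(
            "{} placeholders but {} audio embedding rows",
            placeholders,
            audio_embeds.len()
        ));
    }
    // All rows must share one hidden size; checked before any mutation.
    let hidden = text_embeds.first().map_or(0, |row| row.len());
    if text_embeds
        .iter()
        .chain(audio_embeds.iter())
        .any(|row| row.len() != hidden)
    {
        return Err("embedding rows have mismatched hidden sizes".to_string());
    }
    // Walk the sequence and consume audio rows in order at placeholders.
    let mut audio_rows = audio_embeds.iter();
    for (tok, row) in token_ids.iter().zip(text_embeds.iter_mut()) {
        if *tok != audio_token_id {
            continue;
        }
        let audio = audio_rows.next().expect("placeholder count checked above");
        for (dst, src) in row.iter_mut().zip(audio) {
            *dst += *src;
        }
    }
    Ok(())
}
```

Rows at non-placeholder positions are left untouched, so the text prompt around the audio span reaches the decoder unchanged. All shape checks run before the first addition, which means a failed call leaves `text_embeds` unmodified.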