rlx-voxtral 0.2.4

Mistral Voxtral speech LM for RLX (Whisper encoder + Llama decoder)

Mistral Voxtral: a Whisper-style audio encoder, a 4× projector, and a Llama text decoder.
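A minimal sketch of the projector step follows. It assumes that "4×" means every 4 consecutive encoder frames are concatenated into one vector, which cuts the audio sequence length by 4, before that vector is mapped into the decoder's embedding width. `stack_frames` and `linear` are hypothetical helpers, not part of this crate, and the single dense layer stands in for whatever projection the crate actually applies.

```rust
/// Concatenate every `factor` consecutive encoder frames into one vector.
/// Assumption: the "4×" projector groups 4 frames, so the output has
/// `frames.len() / factor` rows, each `factor` times wider.
/// Trailing frames that do not fill a whole group are dropped in this sketch.
/// Panics if `factor == 0` (a requirement of `chunks_exact`).
fn stack_frames(frames: &[Vec<f32>], factor: usize) -> Vec<Vec<f32>> {
    frames
        .chunks_exact(factor)
        .map(|group| group.concat())
        .collect()
}

/// Dense layer y = W x + b, with `w` stored row-major as [out][in].
/// Hypothetical stand-in for the real projector, which may be deeper.
fn linear(x: &[f32], w: &[Vec<f32>], b: &[f32]) -> Vec<f32> {
    w.iter()
        .zip(b)
        .map(|(row, bias)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f32>() + bias)
        .collect()
}
```

With 8 toy frames of width 2, `stack_frames(&frames, 4)` yields 2 vectors of width 8, and `linear` then maps each of them to the chosen output width. The sizes here are illustrative and are not Voxtral's real dimensions.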

Weights: Hugging Face safetensors checkpoint (mistralai/Voxtral-Mini-3B-2507) whose tensors are grouped under the audio_tower.*, multi_modal_projector.*, and language_model.* prefixes.
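The three tensor-name prefixes are enough to route each checkpoint entry to its sub-model. The sketch below is illustrative only: `Component` and `route_tensor` are hypothetical names, not this crate's API, and the sketch assumes the tensor names have already been read from the safetensors header.

```rust
/// Which sub-model a checkpoint tensor belongs to (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Component {
    AudioTower,
    Projector,
    LanguageModel,
}

/// Route a tensor name by its top-level prefix. Returns the component plus
/// the remaining key with the prefix stripped, or `None` for unknown prefixes.
fn route_tensor(name: &str) -> Option<(Component, &str)> {
    const PREFIXES: [(&str, Component); 3] = [
        ("audio_tower.", Component::AudioTower),
        ("multi_modal_projector.", Component::Projector),
        ("language_model.", Component::LanguageModel),
    ];
    PREFIXES
        .iter()
        .find_map(|(prefix, comp)| name.strip_prefix(*prefix).map(|rest| (*comp, rest)))
}
```

For example, `route_tensor("audio_tower.layers.0.fc1.weight")` returns `Some((Component::AudioTower, "layers.0.fc1.weight"))`. The tensor names in this example are made up for illustration, and any name outside the three prefixes yields `None`.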

Audio and text embeddings are fused additively at [audio_token_id] placeholders before the Llama trunk runs (see [embed::fuse_inputs_embeds]).
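Additive fusion at placeholder positions can be sketched as follows. This is not the signature of `embed::fuse_inputs_embeds`. `fuse_additive` is a hypothetical stand-in that assumes one audio embedding row per placeholder token, consumed in sequence order, and it uses plain `Vec<f32>` rows instead of the crate's tensor type.

```rust
/// Hypothetical sketch of additive audio/text fusion.
/// Positions whose token id equals `audio_token_id` get the next audio row
/// added element-wise to their text embedding; all other positions pass through.
fn fuse_additive(
    token_ids: &[u32],
    text_embeds: &[Vec<f32>],
    audio_embeds: &[Vec<f32>],
    audio_token_id: u32,
) -> Result<Vec<Vec<f32>>, String> {
    if token_ids.len() != text_embeds.len() {
        return Err("token_ids and text_embeds lengths differ".to_string());
    }
    // Every placeholder must have exactly one audio row to consume.
    let placeholders = token_ids.iter().filter(|&&id| id == audio_token_id).count();
    if placeholders != audio_embeds.len() {
        return Err(format!(
            "{placeholders} placeholders but {} audio embeddings",
            audio_embeds.len()
        ));
    }
    let mut audio = audio_embeds.iter();
    let mut fused: Vec<Vec<f32>> = Vec::with_capacity(text_embeds.len());
    for (&id, text) in token_ids.iter().zip(text_embeds) {
        if id == audio_token_id {
            // Counts were checked above, so a matching audio row always exists.
            let a = audio.next().expect("placeholder count checked");
            if a.len() != text.len() {
                return Err("audio and text embedding widths differ".to_string());
            }
            fused.push(text.iter().zip(a).map(|(t, x)| t + x).collect());
        } else {
            fused.push(text.clone());
        }
    }
    Ok(fused)
}
```

Checking the placeholder count up front turns a prompt and audio mismatch into an error instead of silently leaving placeholders unfused.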