rlx-models-core
Shared config, weight loading, compile profiles, and packed GGUF prefill helpers for RLX model crates (published on crates.io as rlx-models-core; import as rlx_core).
Version 0.2.1 adds packed GGUF support:
| API | Role |
|---|---|
| packed_gguf_compile_guard | Metal RLX_DISABLE_MPSGRAPH, MLX RLX_MLX_MODE=lazy during compile |
| compile_options_for_packed_gguf_prefill_with_profile | Fusion off on wgpu/CUDA/ROCm for FusedResidualRmsNorm gaps |
| packed_gguf_execution_device | Native CPU/Metal/MLX packed; wgpu/CUDA/ROCm → CPU prefill |
| run_packed_prefill | Active-extent packed prefill execute (actual_seq inside bucket) |
| EmbeddedSafetensors | Parse HF safetensors from include_bytes! / memory; tensor_f32(name) |
| tensor_view_to_f32 | Decode F32/F16/BF16 safetensor views to Vec<f32> |
Used by rlx-llama32, rlx-qwen3, rlx-gemma, rlx-minicpm5, and rlx-vad (embedded Silero weights).
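To illustrate what decoding F32/F16/BF16 views to Vec<f32> involves, here is a minimal standalone sketch using only the standard library. It is not the crate's tensor_view_to_f32; the Dtype enum and the decode_to_f32 and f16_bits_to_f32 names are hypothetical and exist only for this example. It assumes little-endian element storage, which is what the safetensors format uses.

```rust
/// Element types covered by this sketch (hypothetical enum, not the crate's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dtype {
    F32,
    F16,
    BF16,
}

/// Widen IEEE 754 half-precision bits to f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) as u32) << 31;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign,
        // Subnormal: the value is frac * 2^-24, which f32 represents exactly.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign != 0 { -v } else { v };
        }
        // Infinity and NaN keep an all-ones exponent.
        (0x1f, 0) => sign | 0x7f80_0000,
        (0x1f, _) => sign | 0x7fc0_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127 and widen the mantissa.
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Decode a little-endian byte buffer of the given dtype into f32 values.
pub fn decode_to_f32(dtype: Dtype, bytes: &[u8]) -> Result<Vec<f32>, String> {
    let width = match dtype {
        Dtype::F32 => 4,
        Dtype::F16 | Dtype::BF16 => 2,
    };
    if bytes.len() % width != 0 {
        return Err(format!(
            "byte length {} is not a multiple of {}",
            bytes.len(),
            width
        ));
    }
    let out = bytes
        .chunks_exact(width)
        .map(|c| match dtype {
            Dtype::F32 => f32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            Dtype::F16 => f16_bits_to_f32(u16::from_le_bytes([c[0], c[1]])),
            // BF16 is the upper 16 bits of an f32, so a shift is enough.
            Dtype::BF16 => f32::from_bits((u16::from_le_bytes([c[0], c[1]]) as u32) << 16),
        })
        .collect();
    Ok(out)
}
```

BF16 needs only a shift because it shares the f32 exponent width, while F16 needs its exponent rebiased and its subnormals handled separately.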
Embedded safetensors
For small models shipped inside the binary:
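use rlx_core::EmbeddedSafetensors;
// The path and tensor name below are placeholders; substitute your own.
const WEIGHTS: &[u8] = include_bytes!("path/to/model.safetensors");
let st = EmbeddedSafetensors::parse(WEIGHTS)?;
let w = st.tensor_f32("tensor.name")?;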
Disk-backed sharded checkpoints still use SafetensorsCheckpoint (mmap + index.json).
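For background on what either loader has to read, a safetensors buffer starts with an 8-byte little-endian header length, followed by a UTF-8 JSON header, followed by the raw tensor bytes. The sketch below splits a buffer along those boundaries. It is an illustration of the container layout only, not how EmbeddedSafetensors or SafetensorsCheckpoint are implemented, and split_safetensors is a hypothetical name.

```rust
/// Split a safetensors buffer into its JSON header text and the raw tensor data.
/// Hypothetical helper for illustration; it does not parse the JSON itself.
pub fn split_safetensors(buf: &[u8]) -> Result<(&str, &[u8]), String> {
    if buf.len() < 8 {
        return Err("buffer is shorter than the 8-byte length prefix".to_string());
    }
    let (prefix, rest) = buf.split_at(8);

    // The first 8 bytes hold the header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(prefix);
    let header_len = u64::from_le_bytes(len_bytes);

    // Compare in u64 first so the cast to usize below cannot truncate.
    if header_len > rest.len() as u64 {
        return Err(format!(
            "header length {} exceeds the remaining {} bytes",
            header_len,
            rest.len()
        ));
    }
    let (header, data) = rest.split_at(header_len as usize);

    // The header must be valid UTF-8 JSON; here we only check the encoding.
    let header = std::str::from_utf8(header).map_err(|e| e.to_string())?;
    Ok((header, data))
}
```

Each tensor entry in the JSON header carries a dtype, a shape, and data_offsets, which are byte offsets relative to the start of the data region returned here.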