rlx-models-core

Shared config, weight loading, compile profiles, and packed GGUF prefill helpers for RLX model crates (published on crates.io as `rlx-models-core`; import as `rlx_core`).

Version 0.2.1 adds packed GGUF support:

| API | Role |
| --- | --- |
| `packed_gguf_compile_guard` | Metal `RLX_DISABLE_MPSGRAPH`, MLX `RLX_MLX_MODE=lazy` during compile |
| `compile_options_for_packed_gguf_prefill_with_profile` | Fusion off on wgpu/CUDA/ROCm for `FusedResidualRmsNorm` gaps |
| `packed_gguf_execution_device` | Native CPU/Metal/MLX packed; wgpu/CUDA/ROCm → CPU prefill |
| `run_packed_prefill` | Active-extent packed prefill execute (`actual_seq` inside bucket) |
| `EmbeddedSafetensors` | Parse HF safetensors from `include_bytes!` / memory; `tensor_f32(name)` |
| `tensor_view_to_f32` | Decode F32/F16/BF16 safetensor views to `Vec<f32>` |
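
The device fallback described for `packed_gguf_execution_device` can be pictured as a plain mapping from the requested backend to the one that runs packed prefill. The sketch below is only an illustration of that table row: the `Backend` enum and the `packed_prefill_backend` function are hypothetical names, not the crate's API, and the real function may take other inputs into account.

```rust
/// Backends named in the table (hypothetical enum; the crate's real device type may differ).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Cpu,
    Metal,
    Mlx,
    Wgpu,
    Cuda,
    Rocm,
}

/// Choose where packed GGUF prefill runs, following the table row:
/// CPU, Metal and MLX run packed natively; wgpu, CUDA and ROCm fall back to CPU.
fn packed_prefill_backend(requested: Backend) -> Backend {
    match requested {
        Backend::Cpu | Backend::Metal | Backend::Mlx => requested,
        Backend::Wgpu | Backend::Cuda | Backend::Rocm => Backend::Cpu,
    }
}
```

Under this reading, a caller that asks for CUDA would get CPU prefill, while a Metal request stays on Metal.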

Used by rlx-llama32, rlx-qwen3, rlx-gemma, rlx-minicpm5, and rlx-vad (embedded Silero weights).

Embedded safetensors

For small models shipped inside the binary:

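```rust
use rlx_core::embedded_safetensors::EmbeddedSafetensors;

// Weights are embedded in the binary at compile time.
const WEIGHTS: &[u8] = include_bytes!("../weights/model.safetensors");

// Parse the in-memory buffer, then read one tensor by name as f32 values.
let st = EmbeddedSafetensors::parse(WEIGHTS)?;
let w = st.tensor_f32("layer.weight")?;
```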
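
`tensor_f32` returns `f32` values, and the API table lists F32, F16 and BF16 as the storage types that can be decoded. As a rough illustration of what such a decode involves, here is a standalone sketch using only the standard library. It is not the crate's code: `Dtype`, `f16_to_f32` and `decode_to_f32` are hypothetical names, and it assumes the little-endian byte order that safetensors files use.

```rust
/// Storage dtypes named in the API table (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug)]
enum Dtype {
    F32,
    F16,
    BF16,
}

/// Convert one IEEE 754 half-precision value, given as raw bits, to f32.
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let out = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: the value is frac * 2^-24.
        (0, _) => {
            let v = frac as f32 * 2f32.powi(-24);
            return if sign == 1 { -v } else { v };
        }
        // Infinity.
        (0x1f, 0) => (sign << 31) | 0x7f80_0000,
        // NaN: keep it quiet and carry the payload bits over.
        (0x1f, _) => (sign << 31) | 0x7fc0_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127 and widen the mantissa.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

/// Decode a little-endian byte slice of the given dtype into f32 values.
fn decode_to_f32(dtype: Dtype, bytes: &[u8]) -> Result<Vec<f32>, String> {
    let width = match dtype {
        Dtype::F32 => 4,
        Dtype::F16 | Dtype::BF16 => 2,
    };
    if bytes.len() % width != 0 {
        return Err(format!(
            "byte length {} is not a multiple of {}",
            bytes.len(),
            width
        ));
    }
    Ok(bytes
        .chunks_exact(width)
        .map(|c| match dtype {
            Dtype::F32 => f32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            Dtype::F16 => f16_to_f32(u16::from_le_bytes([c[0], c[1]])),
            // BF16 is the upper 16 bits of an f32, so a shift restores it.
            Dtype::BF16 => f32::from_bits((u16::from_le_bytes([c[0], c[1]]) as u32) << 16),
        })
        .collect())
}
```

BF16 widening is exact and cheap because it shares the f32 exponent layout; F16 needs the exponent rebias and a separate path for subnormals.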

Disk-backed sharded checkpoints still use `SafetensorsCheckpoint` (mmap + `index.json`).
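
Whether the bytes come from `include_bytes!` or from a memory-mapped shard, a safetensors buffer has the same layout: an 8-byte little-endian header length, a JSON header of that length, then the raw tensor data. The sketch below shows only that first split, using the standard library. `split_safetensors` is a hypothetical helper, not part of `rlx_core`, and it deliberately stops short of parsing the JSON.

```rust
use std::convert::TryFrom;

/// Split a safetensors buffer into its JSON header text and the raw tensor bytes.
/// Layout: u64 little-endian header length N, N bytes of UTF-8 JSON, then tensor data.
fn split_safetensors(buf: &[u8]) -> Result<(&str, &[u8]), String> {
    if buf.len() < 8 {
        return Err("buffer is shorter than the 8-byte length prefix".to_string());
    }
    let (prefix, rest) = buf.split_at(8);

    // Read the header length from the first 8 bytes.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(prefix);
    let header_len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| "header length does not fit in usize".to_string())?;

    if header_len > rest.len() {
        return Err(format!(
            "header claims {} bytes but only {} remain",
            header_len,
            rest.len()
        ));
    }

    // Everything after the header is tensor data, addressed by the
    // `data_offsets` entries found in the JSON.
    let (header, data) = rest.split_at(header_len);
    let header = std::str::from_utf8(header)
        .map_err(|e| format!("header is not valid UTF-8: {}", e))?;
    Ok((header, data))
}
```

A real loader would go on to parse the header JSON for each tensor's `dtype`, `shape` and `data_offsets`, and validate the offsets against the data length before slicing.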

rlx-models-core 0.2.4
