rlx-models-core
Shared config, weight loading, compile profiles, and packed GGUF prefill helpers for RLX model crates (published on crates.io as `rlx-models-core`; import as `rlx_core`).
Version 0.2.1 adds packed GGUF support:
| API | Role |
|---|---|
| `packed_gguf_compile_guard` | During compile, applies `RLX_DISABLE_MPSGRAPH` on Metal and `RLX_MLX_MODE=eager` on MLX |
| `compile_options_for_packed_gguf_prefill_with_profile` | Turns fusion off on wgpu/CUDA/ROCm to work around `FusedResidualRmsNorm` gaps |
| `packed_gguf_execution_device` | Routes MLX/wgpu/CUDA packed prefill to CPU when needed |
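A compile guard of this kind is typically a scoped (RAII) override: it changes backend settings when created and restores them when dropped. The sketch below illustrates that pattern only. `CompileEnvGuard` and `Backend` are hypothetical names, not the `rlx_core` API. It also assumes that the guard works through process environment variables and that `RLX_DISABLE_MPSGRAPH` is set to `"1"`. The README names the variables but does not give that value.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical backend enum for this sketch; rlx_core's real device type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Metal,
    Mlx,
    Cuda,
}

/// Hypothetical RAII guard: overrides backend env vars while alive and
/// restores the previous values (or removes the vars) when dropped.
pub struct CompileEnvGuard {
    saved: Vec<(&'static str, Option<OsString>)>,
}

#[allow(unused_unsafe)] // set_var/remove_var are `unsafe` only in the 2024 edition
impl CompileEnvGuard {
    pub fn new(backend: Backend) -> Self {
        // Assumed overrides: "1" for RLX_DISABLE_MPSGRAPH is a guess.
        let overrides: &[(&'static str, &'static str)] = match backend {
            Backend::Metal => &[("RLX_DISABLE_MPSGRAPH", "1")],
            Backend::Mlx => &[("RLX_MLX_MODE", "eager")],
            _ => &[],
        };
        let mut saved = Vec::with_capacity(overrides.len());
        for &(key, value) in overrides {
            // Remember the prior value (None if unset) so Drop can restore it.
            saved.push((key, env::var_os(key)));
            // Not thread-safe: no other thread may touch the environment meanwhile.
            unsafe { env::set_var(key, value) };
        }
        CompileEnvGuard { saved }
    }
}

#[allow(unused_unsafe)]
impl Drop for CompileEnvGuard {
    fn drop(&mut self) {
        // Undo overrides in reverse order of application.
        for (key, old) in self.saved.drain(..).rev() {
            match old {
                Some(value) => unsafe { env::set_var(key, value) },
                None => unsafe { env::remove_var(key) },
            }
        }
    }
}
```

Bind the guard to a named variable such as `let _guard = CompileEnvGuard::new(backend);` for the duration of the compile step. `let _ = ...` would drop it, and undo the overrides, immediately.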
`rlx-models-core` is used by `rlx-llama32`, `rlx-qwen3`, `rlx-gemma`, and `rlx-minicpm5`.
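A model crate that uses these helpers must decide, for each backend, whether fusion stays on for packed GGUF prefill and which device runs it. Here is a minimal sketch of that decision as a pure policy. `Backend`, `PrefillCompileOptions`, `packed_prefill_options`, and `packed_prefill_device` are hypothetical names, not the `rlx_core` API. The README does not say how "when needed" is determined for CPU routing. The sketch therefore assumes the caller passes a `backend_handles_packed` flag.

```rust
/// Hypothetical backend enum for this sketch; rlx_core's real device type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Metal,
    Mlx,
    Wgpu,
    Cuda,
    Rocm,
}

/// Hypothetical stand-in for the compile options a model crate would pass on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PrefillCompileOptions {
    pub fusion: bool,
}

/// Fusion off on wgpu/CUDA/ROCm (FusedResidualRmsNorm gaps); left on
/// elsewhere in this sketch, since the README only lists where it is disabled.
pub fn packed_prefill_options(backend: Backend) -> PrefillCompileOptions {
    let fusion = !matches!(backend, Backend::Wgpu | Backend::Cuda | Backend::Rocm);
    PrefillCompileOptions { fusion }
}

/// Route MLX/wgpu/CUDA packed prefill to CPU when the backend cannot run it.
/// `backend_handles_packed` is an assumed input, not part of the real API.
pub fn packed_prefill_device(backend: Backend, backend_handles_packed: bool) -> Backend {
    match backend {
        Backend::Mlx | Backend::Wgpu | Backend::Cuda if !backend_handles_packed => Backend::Cpu,
        other => other,
    }
}
```

Keeping the policy as plain functions of the backend makes it easy to unit-test without touching a GPU. The real helpers may consult richer state, such as the compile profile that `compile_options_for_packed_gguf_prefill_with_profile` accepts.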