rlx-models-core 0.2.1

Shared config, weight loading, and compile helpers for RLX model crates

rlx-models-core

Shared config, weight loading, compile profiles, and packed GGUF prefill helpers for RLX model crates (published on crates.io as rlx-models-core; import as rlx_core).

Version 0.2.1 adds packed GGUF support:

| API | Role |
| --- | --- |
| packed_gguf_compile_guard | During compile, sets RLX_DISABLE_MPSGRAPH on Metal and RLX_MLX_MODE=eager on MLX |
| compile_options_for_packed_gguf_prefill_with_profile | Turns fusion off on wgpu/CUDA/ROCm to work around FusedResidualRmsNorm gaps |
| packed_gguf_execution_device | Routes MLX/wgpu/CUDA packed prefill to CPU when needed |
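
To illustrate the compile-guard idea from the table, here is a minimal sketch of a scoped environment-variable guard. It is not the crate's implementation: `Backend`, `EnvGuard`, and `compile_guard_sketch` are hypothetical names, the real `packed_gguf_compile_guard` signature is not shown in this README, and the value `"1"` for RLX_DISABLE_MPSGRAPH is an assumption. Only the variable names and the `eager` value come from the table above.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical backend selector; the real crate's device type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Metal,
    Mlx,
}

/// Hypothetical guard: remembers previous values and restores them on drop.
pub struct EnvGuard {
    saved: Vec<(&'static str, Option<OsString>)>,
}

#[allow(unused_unsafe)]
impl EnvGuard {
    /// Sets each `(key, value)` pair, saving whatever was there before.
    /// The process environment is global, so this is only sound while no
    /// other thread reads or writes the environment.
    pub fn set(vars: &[(&'static str, &str)]) -> Self {
        let mut saved = Vec::with_capacity(vars.len());
        for &(key, value) in vars {
            saved.push((key, env::var_os(key)));
            unsafe { env::set_var(key, value) };
        }
        EnvGuard { saved }
    }
}

#[allow(unused_unsafe)]
impl Drop for EnvGuard {
    fn drop(&mut self) {
        // Restore in reverse order so repeated keys unwind correctly.
        for (key, previous) in self.saved.drain(..).rev() {
            match previous {
                Some(value) => unsafe { env::set_var(key, value) },
                None => unsafe { env::remove_var(key) },
            }
        }
    }
}

/// Sketch of the behavior described in the table (names are hypothetical).
pub fn compile_guard_sketch(backend: Backend) -> EnvGuard {
    match backend {
        // Assumption: "1" is the value that disables MPSGraph.
        Backend::Metal => EnvGuard::set(&[("RLX_DISABLE_MPSGRAPH", "1")]),
        // The table gives this value explicitly.
        Backend::Mlx => EnvGuard::set(&[("RLX_MLX_MODE", "eager")]),
        // No overrides on other backends in this sketch.
        Backend::Cpu => EnvGuard::set(&[]),
    }
}
```

Holding the returned guard in a local such as `let _guard = compile_guard_sketch(Backend::Mlx);` keeps the override active for the enclosing scope, and dropping it restores the previous environment. Tying the override to a scope this way means an early return or panic during compile cannot leave the variables set.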

Used by rlx-llama32, rlx-qwen3, rlx-gemma, and rlx-minicpm5.

See also