rlx-llama32
LLaMA 3.2–shaped causal LMs in RLX (runner, CLI, GGUF packed prefill).
Version 0.2.1 — Metal KV-decode compile guard (RLX_DISABLE_MPSGRAPH on decode); packed GGUF helpers re-exported via rlx_core::flow_bridge (used by rlx-minicpm5, rlx-qwen3, rlx-gemma).
CLI
Packed GGUF
When building a packed prefill graph (Op::DequantMatMul), use the shared helpers from rlx_core:
- compile_options_for_packed_gguf_prefill(device): Llama 3.2 prefill profile
- packed_gguf_compile_guard(device, || compile…): Metal / MLX env overrides
- packed_gguf_execution_device(device): CPU fallback for MLX/wgpu/CUDA until upstream GPU parity
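
The fallback and guard pattern behind these helpers can be sketched as follows. This is a minimal, self-contained illustration and not the rlx_core implementation: the `Device` enum, the `*_sketch` functions, and the `EnvOverride` type are hypothetical stand-ins, and the choice of `RLX_DISABLE_MPSGRAPH` as the variable the packed guard sets is an assumption borrowed from the decode guard mentioned in the version note. The real signatures and overrides may differ.

```rust
use std::env;
use std::ffi::OsString;

// Hypothetical stand-in for the RLX device type (the real one lives in rlx_core).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Metal,
    Mlx,
    Wgpu,
    Cuda,
}

// Sketch of the CPU-fallback rule: MLX, wgpu and CUDA execute on the CPU,
// every other device is returned unchanged.
pub fn execution_device_sketch(device: Device) -> Device {
    match device {
        Device::Mlx | Device::Wgpu | Device::Cuda => Device::Cpu,
        other => other,
    }
}

// Sets an environment variable and restores the previous state on drop,
// so the override is undone even if the compile closure panics.
struct EnvOverride {
    key: &'static str,
    previous: Option<OsString>,
}

impl EnvOverride {
    fn set(key: &'static str, value: &str) -> Self {
        let previous = env::var_os(key);
        // `unsafe` is required on the 2024 edition: mutating the process
        // environment is not thread-safe.
        unsafe { env::set_var(key, value) };
        Self { key, previous }
    }
}

impl Drop for EnvOverride {
    fn drop(&mut self) {
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(self.key, value) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}

// Sketch of a compile guard: on Metal, run `compile` with the override set
// (assumed variable name); on other devices, run it unchanged.
pub fn compile_guard_sketch<T>(device: Device, compile: impl FnOnce() -> T) -> T {
    let _guard = match device {
        Device::Metal => Some(EnvOverride::set("RLX_DISABLE_MPSGRAPH", "1")),
        _ => None,
    };
    compile()
}
```

In this sketch a caller would first map the requested device through `execution_device_sketch`, then wrap the graph compile in `compile_guard_sketch` so the override is visible only while compilation runs.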
See README.md gotchas and crates/rlx-minicpm5/README.md.