Cache dequantized GGUF weight bytes for static params.
Without this cache, Qwen3.5 decode with --packed re-dequantized every
K-quant weight on every matmul (hundreds of times per token). Cache keys
are (k, n, scheme, bytes_hash), so they stay stable for identical GGUF
bytes regardless of arena offset (multiple compiled graphs reuse offsets).
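
The keying scheme described above can be illustrated with a minimal sketch. This is not the module's actual implementation: `DequantKey`, `DequantCache`, `hash_bytes`, `get_or_dequant`, the `u32` scheme id, and the use of `DefaultHasher` are all assumptions made for illustration. The point it shows is that the key is derived from the weight bytes themselves, not from their address or arena offset.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical cache key: shape, quantization scheme id, and a hash of the
/// raw quantized bytes. No pointer or arena offset is involved.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DequantKey {
    k: usize,
    n: usize,
    scheme: u32,
    bytes_hash: u64,
}

/// Hash the byte content (assumed hasher; the real module may use another).
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Hypothetical cache holding shared, already-dequantized f32 weights.
#[derive(Default)]
struct DequantCache {
    map: HashMap<DequantKey, Arc<Vec<f32>>>,
    misses: usize, // counts how often `dequant` actually ran
}

impl DequantCache {
    /// Return the cached dense weights, running `dequant` only on a miss.
    fn get_or_dequant<F>(
        &mut self,
        k: usize,
        n: usize,
        scheme: u32,
        w_bytes: &[u8],
        dequant: F,
    ) -> Arc<Vec<f32>>
    where
        F: FnOnce(&[u8]) -> Vec<f32>,
    {
        let key = DequantKey { k, n, scheme, bytes_hash: hash_bytes(w_bytes) };
        if let Some(hit) = self.map.get(&key) {
            return Arc::clone(hit);
        }
        self.misses += 1;
        let dense = Arc::new(dequant(w_bytes));
        self.map.insert(key, Arc::clone(&dense));
        dense
    }

    /// Drop every cached entry (e.g. between model loads in tests).
    fn clear(&mut self) {
        self.map.clear();
    }
}
```

In this sketch, two buffers with identical contents at different addresses map to the same entry, so the dequantization closure runs once. A 64-bit content hash can collide in principle; a stricter variant might also compare the byte length or the bytes themselves on a hit.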
Functions
- clear_dequant_cache - Drop cached dequantized weights (e.g. between model loads in tests).
- gguf_weight_f32 - Return dense [k×n] weights (GGUF row-major [n,k] layout) for w_bytes.