
Module dequant_cache


Cache dequantized GGUF weight bytes for static params.

Motivation: Qwen3.5 decode with --packed was re-dequantizing every K-quant weight on every matmul (hundreds of times per token). Cache keys are (k, n, scheme, bytes_hash). Because the key hashes the weight bytes themselves rather than recording where they live, it is stable for identical GGUF bytes regardless of arena offset (multiple compiled graphs reuse offsets).
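The keying scheme described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the names `DequantKey`, `DequantCache`, `get_or_dequant`, and `hash_bytes` are hypothetical, `scheme` is assumed to be a small integer tag, the hash function is the standard library's `DefaultHasher` chosen only for convenience, and the cache is an explicit struct because the source does not show how the real module stores its state.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical key mirroring the (k, n, scheme, bytes_hash) tuple.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DequantKey {
    k: usize,
    n: usize,
    scheme: u32,     // assumed: integer tag for the quantization scheme
    bytes_hash: u64, // hash of the packed weight bytes, not of their address
}

/// Hash the byte contents, so two buffers with identical bytes at
/// different offsets produce the same value.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache: maps a key to shared dequantized f32 weights.
#[derive(Default)]
struct DequantCache {
    map: HashMap<DequantKey, Arc<Vec<f32>>>,
}

impl DequantCache {
    /// Return the cached weights for `w_bytes`, calling `dequant` only on a miss.
    fn get_or_dequant<F>(
        &mut self,
        k: usize,
        n: usize,
        scheme: u32,
        w_bytes: &[u8],
        dequant: F,
    ) -> Arc<Vec<f32>>
    where
        F: FnOnce(&[u8]) -> Vec<f32>,
    {
        let key = DequantKey { k, n, scheme, bytes_hash: hash_bytes(w_bytes) };
        let entry = self
            .map
            .entry(key)
            .or_insert_with(|| Arc::new(dequant(w_bytes)));
        Arc::clone(&*entry)
    }

    /// Drop every cached entry.
    fn clear(&mut self) {
        self.map.clear();
    }

    /// Number of cached entries.
    fn len(&self) -> usize {
        self.map.len()
    }
}
```

In this sketch, two calls with byte-identical buffers share one `Arc<Vec<f32>>` even when the buffers live at different addresses, while changing `k`, `n`, or `scheme` produces a separate entry.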

Functions

clear_dequant_cache
Drop cached dequantized weights (e.g. between model loads in tests).
gguf_weight_f32
Return dense [k×n] weights (GGUF row-major [n,k] layout) for w_bytes.
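
The layout note for gguf_weight_f32 can be illustrated with a small indexing sketch. It assumes the returned buffer keeps the GGUF row-major [n,k] order, so each of the n output rows holds k contiguous values. The source does not confirm whether the function transposes, and the helper names `nk_index` and `matvec_nk` are hypothetical.

```rust
/// Flat index of logical element W[i][j] (i in 0..k, j in 0..n)
/// when the buffer is stored row-major as [n, k] (assumed layout).
fn nk_index(i: usize, j: usize, k: usize) -> usize {
    j * k + i
}

/// Compute y = x · W for x of length k, where W is logically k×n
/// but stored row-major as [n, k]: output j is a dot product of x
/// with the j-th contiguous row of k values.
fn matvec_nk(x: &[f32], w: &[f32], k: usize, n: usize) -> Vec<f32> {
    assert_eq!(x.len(), k);
    assert_eq!(w.len(), k * n);
    (0..n)
        .map(|j| {
            let row = &w[j * k..(j + 1) * k];
            row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>()
        })
        .collect()
}
```

Under this assumption each output element reads one contiguous run of k floats, which is the access pattern a [n,k] row-major layout favors.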