CPU backend using Accelerate (macOS) / portable fallback (Linux). Context = () — all ops execute immediately, no batching needed.
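As a rough illustration of what `Context = ()` can mean, the sketch below assumes a backend trait with an associated `Context` type. The trait `Backend`, its methods, and the `EagerCpu` type are hypothetical names invented for this example and are not taken from the crate. A batching backend would record work into its context, while an eager CPU backend has nothing to record and uses the unit type.

```rust
/// Hypothetical backend trait (an assumption, not the crate's real API).
pub trait Backend {
    /// Per-submission state, e.g. a command buffer on a batching backend.
    type Context;

    fn begin(&self) -> Self::Context;
    fn add(&self, ctx: &mut Self::Context, a: &[f32], b: &[f32]) -> Vec<f32>;
    fn finish(&self, ctx: Self::Context);
}

/// Hypothetical eager CPU backend: every op runs as soon as it is called.
pub struct EagerCpu;

impl Backend for EagerCpu {
    // Nothing to batch, so the context carries no data.
    type Context = ();

    fn begin(&self) -> Self::Context {}

    fn add(&self, _ctx: &mut Self::Context, a: &[f32], b: &[f32]) -> Vec<f32> {
        assert_eq!(a.len(), b.len(), "operands must have equal length");
        // The result is computed immediately and returned to the caller.
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }

    fn finish(&self, _ctx: Self::Context) {
        // No pending work to flush.
    }
}
```

With this shape, callers can be written once against the trait, and the unit context costs nothing at runtime on the CPU path.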
Structs
- CpuBackend
- CpuGptqStore - CPU-side GPTQ store: dequantized f32 weights in row-major [n, k] layout. Trades memory for simplicity: repack once at load, then run normal GEMM.
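The "dequantize once at load, then run normal GEMM" trade-off can be sketched as follows. This is a minimal illustration under stated assumptions: the `DenseStore` type, both function names, and the packing scheme (two 4-bit values per byte, low nibble first, one scale and zero point per group along k) are hypothetical simplifications, not the real GPTQ checkpoint layout or the crate's code.

```rust
/// Hypothetical dense store: f32 weights in row-major [n, k] layout.
pub struct DenseStore {
    pub n: usize,
    pub k: usize,
    pub weights: Vec<f32>, // length n * k
}

/// Dequantize a simplified 4-bit format once, at load time.
/// Assumed layout (NOT real GPTQ): two nibbles per byte, low nibble first,
/// one scale and one zero point per `group` consecutive elements along k.
pub fn dequantize_once(
    packed: &[u8],
    scales: &[f32],
    zeros: &[u8],
    n: usize,
    k: usize,
    group: usize,
) -> DenseStore {
    assert!(group > 0 && k % group == 0, "k must be a multiple of group");
    assert_eq!(packed.len() * 2, n * k, "two weights per packed byte");
    let groups_per_row = k / group;
    assert_eq!(scales.len(), n * groups_per_row);
    assert_eq!(zeros.len(), n * groups_per_row);

    let mut weights = Vec::with_capacity(n * k);
    for row in 0..n {
        for col in 0..k {
            let idx = row * k + col;
            let byte = packed[idx / 2];
            let q = if idx % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            let g = row * groups_per_row + col / group;
            weights.push((q as f32 - zeros[g] as f32) * scales[g]);
        }
    }
    DenseStore { n, k, weights }
}

/// Plain GEMM on the dense copy: y[m, n] = x[m, k] * W^T, W stored as [n, k].
pub fn gemm(x: &[f32], m: usize, w: &DenseStore) -> Vec<f32> {
    assert_eq!(x.len(), m * w.k);
    let mut y = vec![0.0f32; m * w.n];
    for i in 0..m {
        let x_row = &x[i * w.k..(i + 1) * w.k];
        for j in 0..w.n {
            let w_row = &w.weights[j * w.k..(j + 1) * w.k];
            y[i * w.n + j] = x_row.iter().zip(w_row).map(|(a, b)| a * b).sum();
        }
    }
    y
}
```

The dense copy uses 4 bytes per weight instead of half a byte in this toy format, which is the memory cost the description refers to; in exchange, the GEMM path needs no quantization-aware code.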
Enums
- CpuQuantStore - CPU-side container for any GGUF k-quant flavour. Each variant holds the dense fp32 weights after eager dequantization. The CPU isn't the benchmark target, so this backend doesn't pay the complexity of on-the-fly dequantization; the variant tag exists so gemm_quant can route consistently.
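One way to picture a tagged enum whose variants all carry the same dense payload is sketched below. The variant names, the `Dense` struct, the `QuantStore` name, and the `gemm_quant` signature are assumptions made for illustration; the real enum may list different flavours and the real function may take different arguments.

```rust
/// Hypothetical dense payload shared by every variant.
pub struct Dense {
    pub n: usize,
    pub k: usize,
    pub weights: Vec<f32>, // row-major [n, k], already dequantized to f32
}

/// Hypothetical stand-in for the quant store enum. The tag records which
/// k-quant flavour the weights came from; the payload is always dense f32.
pub enum QuantStore {
    Q4K(Dense),
    Q5K(Dense),
    Q6K(Dense),
}

impl QuantStore {
    /// Every variant resolves to the same dense representation.
    fn dense(&self) -> &Dense {
        match self {
            QuantStore::Q4K(d) | QuantStore::Q5K(d) | QuantStore::Q6K(d) => d,
        }
    }
}

/// Assumed signature: y[m, n] = x[m, k] * W^T. On this sketch's CPU path
/// the variant tag does not change the arithmetic, only the entry point.
pub fn gemm_quant(x: &[f32], m: usize, store: &QuantStore) -> Vec<f32> {
    let w = store.dense();
    assert_eq!(x.len(), m * w.k);
    assert_eq!(w.weights.len(), w.n * w.k);
    let mut y = vec![0.0f32; m * w.n];
    for i in 0..m {
        let x_row = &x[i * w.k..(i + 1) * w.k];
        for j in 0..w.n {
            let w_row = &w.weights[j * w.k..(j + 1) * w.k];
            y[i * w.n + j] = x_row.iter().zip(w_row).map(|(a, b)| a * b).sum();
        }
    }
    y
}
```

Keeping the tag even though the payload is uniform lets callers use the same dispatch shape on every backend, while a backend that dequantizes on the fly would branch on the variant instead.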