Linear<CpuBackend> impl for GGUF k-quant weights.
Phase 3e/3: replaces the old BackendQuantGguf::gemm_quant impl on
CpuBackend. The kernel call (Q4_K dequant + Self::gemm) lives
inside CpuGgufLinear::forward instead of the trait method body.
Structs
- CpuGgufLinear - CPU GGUF Linear: holds a CpuQuantStore (currently Q4_K-dequantised weights) plus shape, dispatches via CpuBackend::gemm.
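The layout described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the field names (`dequantised`, `in_features`, `out_features`), the `forward` and `gemm` signatures, the `[out_features, in_features]` row-major weight layout, and the simplified `Linear` trait are all assumptions, and the Q4_K dequantisation step is omitted (the store is assumed to already hold `f32` weights).

```rust
// Hypothetical stand-ins: the real CpuBackend, CpuQuantStore and Linear
// trait in the crate may have different fields and signatures.
struct CpuBackend;

impl CpuBackend {
    /// Naive row-major GEMM: out[i, j] = sum_p x[i, p] * w[j, p].
    /// Weights are assumed to be laid out as [n, k] (out_features x in_features).
    fn gemm(x: &[f32], w: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += x[i * k + p] * w[j * k + p];
                }
                out[i * n + j] = acc;
            }
        }
        out
    }
}

/// Assumed store: weights already dequantised from Q4_K to f32.
struct CpuQuantStore {
    dequantised: Vec<f32>,
}

/// Simplified, assumed form of the backend-parameterised Linear trait.
trait Linear<B> {
    fn forward(&self, x: &[f32], batch: usize) -> Vec<f32>;
}

struct CpuGgufLinear {
    store: CpuQuantStore,
    in_features: usize,
    out_features: usize,
}

impl Linear<CpuBackend> for CpuGgufLinear {
    /// The kernel call lives here rather than in a backend trait method.
    fn forward(&self, x: &[f32], batch: usize) -> Vec<f32> {
        assert_eq!(x.len(), batch * self.in_features);
        CpuBackend::gemm(
            x,
            &self.store.dequantised,
            batch,
            self.in_features,
            self.out_features,
        )
    }
}
```

In this sketch the layer owns both the weights and the shape, so a caller only passes the input activations and batch size; the backend contributes nothing beyond the plain `gemm` kernel.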