
Module cpu_gguf


Linear<CpuBackend> implementation for GGUF k-quant weights.

Phase 3e/3: replaces the old BackendQuantGguf::gemm_quant impl on CpuBackend. The kernel call (Q4_K dequantisation followed by CpuBackend::gemm) now lives in CpuGgufLinear::forward rather than in the body of a trait method.
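For reference, the Q4_K dequantisation step can be sketched as follows. This is a minimal sketch that assumes ggml's block_q4_K super-block layout (256 weights per block: f16 scale d, f16 scale dmin, 12 bytes of packed 6-bit sub-block scales and mins, 128 bytes of 4-bit quants); the names BlockQ4K, f16_to_f32, scale_min_k4 and dequantize_q4k are hypothetical and are not this crate's API. Each weight is reconstructed as d * scale * q - dmin * min for its 32-weight sub-block.

```rust
/// Weights per Q4_K super-block (QK_K in ggml).
const QK_K: usize = 256;

/// Assumed layout of one Q4_K super-block, following ggml's block_q4_K.
struct BlockQ4K {
    d: u16,           // f16 bits: multiplies the 6-bit sub-block scales
    dmin: u16,        // f16 bits: multiplies the 6-bit sub-block mins
    scales: [u8; 12], // 8 sub-blocks x (6-bit scale, 6-bit min), bit-packed
    qs: [u8; 128],    // 256 x 4-bit quants, two per byte
}

/// Convert IEEE 754 half-precision bits to f32 (handles subnormals, inf and NaN).
fn f16_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = if exp == 0 {
        if frac == 0 {
            sign << 31 // signed zero
        } else {
            // Subnormal half: value = frac * 2^-24.
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
    } else if exp == 0x1f {
        (sign << 31) | (0xff << 23) | (frac << 13) // inf / NaN
    } else {
        // Rebias exponent from 15 (f16) to 127 (f32).
        (sign << 31) | ((exp + 112) << 23) | (frac << 13)
    };
    f32::from_bits(bits)
}

/// Unpack the 6-bit (scale, min) pair for sub-block j (0..8), as in ggml.
fn scale_min_k4(j: usize, q: &[u8; 12]) -> (u8, u8) {
    if j < 4 {
        (q[j] & 63, q[j + 4] & 63)
    } else {
        (
            (q[j + 4] & 0x0f) | ((q[j - 4] >> 6) << 4),
            (q[j + 4] >> 4) | ((q[j] >> 6) << 4),
        )
    }
}

/// Dequantise one super-block into 256 f32 weights.
fn dequantize_q4k(block: &BlockQ4K) -> [f32; QK_K] {
    let d = f16_to_f32(block.d);
    let dmin = f16_to_f32(block.dmin);
    let mut out = [0.0f32; QK_K];
    let mut y = 0;
    // Each 64-weight chunk shares 32 bytes of qs: low nibbles hold the first
    // 32 weights, high nibbles the next 32, each with its own scale and min.
    for chunk in 0..QK_K / 64 {
        let q = &block.qs[chunk * 32..chunk * 32 + 32];
        let (sc1, m1) = scale_min_k4(2 * chunk, &block.scales);
        let (sc2, m2) = scale_min_k4(2 * chunk + 1, &block.scales);
        let (d1, min1) = (d * sc1 as f32, dmin * m1 as f32);
        let (d2, min2) = (d * sc2 as f32, dmin * m2 as f32);
        for &b in q {
            out[y] = d1 * (b & 0x0f) as f32 - min1;
            y += 1;
        }
        for &b in q {
            out[y] = d2 * (b >> 4) as f32 - min2;
            y += 1;
        }
    }
    out
}
```

Dequantising the whole weight matrix once up front and then calling a plain f32 GEMM trades memory for simplicity; a fused kernel would instead dequantise tiles on the fly inside the GEMM.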

Structs

CpuGgufLinear
CPU GGUF Linear layer: holds a CpuQuantStore (currently weights dequantised from Q4_K) together with the weight shape, and dispatches the matrix multiply via CpuBackend::gemm.
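The dispatch pattern described above can be sketched roughly as follows. The types here are simplified, hypothetical stand-ins that reuse the documented names: the Linear trait's signature, the CpuQuantStore field, the shape fields and the gemm signature are all assumptions, not this crate's actual definitions. The sketch assumes weights stored row-major as out_features x in_features, so forward computes y = x * W^T with a naive GEMM.

```rust
/// Hypothetical stand-in for the CPU backend; only a naive GEMM is sketched.
struct CpuBackend;

impl CpuBackend {
    /// C = A * B^T, with A: m x k and B: n x k, both row-major. Returns C: m x n.
    fn gemm(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        assert_eq!(a.len(), m * k);
        assert_eq!(b.len(), n * k);
        let mut c = vec![0.0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += a[i * k + p] * b[j * k + p];
                }
                c[i * n + j] = acc;
            }
        }
        c
    }
}

/// Hypothetical Linear trait, generic over the backend that executes it.
trait Linear<B> {
    fn forward(&self, backend: &B, x: &[f32], rows: usize) -> Vec<f32>;
}

/// Hypothetical store: weights already dequantised from Q4_K into f32.
struct CpuQuantStore {
    dequantised: Vec<f32>, // out_features x in_features, row-major
}

/// Hypothetical layer: quantised-weight store plus shape.
struct CpuGgufLinear {
    store: CpuQuantStore,
    out_features: usize,
    in_features: usize,
}

impl Linear<CpuBackend> for CpuGgufLinear {
    fn forward(&self, backend: &CpuBackend, x: &[f32], rows: usize) -> Vec<f32> {
        // (rows x in_features) * (out_features x in_features)^T -> rows x out_features
        backend.gemm(x, &self.store.dequantised, rows, self.in_features, self.out_features)
    }
}
```

For example, with W = [[1, 2, 3], [4, 5, 6]] and a single input row [1, 0, 1], forward returns [4, 10]. Keeping the kernel call in forward means the backend only needs a generic gemm, and quantisation-specific logic stays with the layer type.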