Module cpu_dequant

Linear<CpuBackend> impl for GPTQ weights, dequantized at load time.

Phase 3e/2: replaces the old BackendQuantMarlin::gemm_gptq impl on CpuBackend. The kernel call (Self::gemm on the dequantized weights) now lives inside CpuGptqLinear::forward rather than in the trait method body.
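As a rough illustration of what "dequantized at load time" can mean, the sketch below expands 4-bit codes into fp32 using the usual GPTQ relation w = scale * (q - zero). Everything about the layout is an assumption made for this example and not the checkpoint format this module actually reads: codes are packed eight per u32 (lowest nibble first) in [out_features, in_features / 8] order, with one scale and one zero point per group of `group_size` input columns. The function name `dequantize_4bit` is hypothetical.

```rust
/// Hypothetical helper: expand packed 4-bit codes into a row-major
/// fp32 matrix of shape [out_features, in_features].
/// Assumed layout (illustration only):
///   packed: [out_features, in_features / 8], 8 codes per u32, low nibble first
///   scales: [out_features, in_features / group_size]
///   zeros:  [out_features, in_features / group_size]
pub fn dequantize_4bit(
    packed: &[u32],
    scales: &[f32],
    zeros: &[u8],
    out_features: usize,
    in_features: usize,
    group_size: usize,
) -> Vec<f32> {
    assert!(group_size > 0 && in_features % 8 == 0 && in_features % group_size == 0);
    let words_per_row = in_features / 8;
    let groups_per_row = in_features / group_size;
    assert_eq!(packed.len(), out_features * words_per_row);
    assert_eq!(scales.len(), out_features * groups_per_row);
    assert_eq!(zeros.len(), out_features * groups_per_row);

    let mut w = vec![0.0f32; out_features * in_features];
    for o in 0..out_features {
        for i in 0..in_features {
            // Pull the 4-bit code for column i out of its u32 word.
            let word = packed[o * words_per_row + i / 8];
            let q = ((word >> (4 * (i % 8))) & 0xF) as f32;
            // Each group of columns shares one scale and one zero point.
            let g = o * groups_per_row + i / group_size;
            w[o * in_features + i] = scales[g] * (q - zeros[g] as f32);
        }
    }
    w
}
```

Doing this once at load time trades memory (fp32 instead of 4-bit storage) for a forward pass that is an ordinary dense GEMM.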

Structs

CpuGptqLinear
CPU GPTQ linear layer: holds dequantized fp32 weights of shape [out_features, in_features] (row-major) and an optional bias of shape [out_features], and dispatches via CpuBackend::gemm.
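The shape conventions above can be sketched as follows. This is a minimal stand-in and not the real CpuGptqLinear: the struct name `DequantLinear`, its fields, and the `forward` signature are assumptions, and a naive loop takes the place of CpuBackend::gemm, whose signature is not shown here. It computes y = x · Wᵀ + b for a row-major input of shape [batch, in_features].

```rust
/// Hypothetical stand-in for a linear layer over dequantized weights.
pub struct DequantLinear {
    weight: Vec<f32>,       // [out_features, in_features], row-major
    bias: Option<Vec<f32>>, // [out_features], if present
    in_features: usize,
    out_features: usize,
}

impl DequantLinear {
    pub fn new(
        weight: Vec<f32>,
        bias: Option<Vec<f32>>,
        in_features: usize,
        out_features: usize,
    ) -> Self {
        assert_eq!(weight.len(), in_features * out_features);
        if let Some(b) = &bias {
            assert_eq!(b.len(), out_features);
        }
        Self { weight, bias, in_features, out_features }
    }

    /// x is [batch, in_features] row-major; returns [batch, out_features].
    /// The inner loops stand in for the backend GEMM call.
    pub fn forward(&self, x: &[f32], batch: usize) -> Vec<f32> {
        assert_eq!(x.len(), batch * self.in_features);
        let mut y = vec![0.0f32; batch * self.out_features];
        for b in 0..batch {
            let x_row = &x[b * self.in_features..(b + 1) * self.in_features];
            for o in 0..self.out_features {
                // Row o of W is contiguous because W is row-major.
                let w_row = &self.weight[o * self.in_features..(o + 1) * self.in_features];
                let mut acc: f32 = x_row.iter().zip(w_row).map(|(a, w)| a * w).sum();
                if let Some(bias) = &self.bias {
                    acc += bias[o];
                }
                y[b * self.out_features + o] = acc;
            }
        }
        y
    }
}
```

With the [out_features, in_features] layout, each output element is a dot product of two contiguous slices, which is why no transpose is needed before the multiply.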