Linear<CpuBackend> impl for GPTQ weights, dequantized at load time.
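Dequantizing at load time means the packed integer codes are expanded into plain fp32 weights once, before any forward call. The sketch below shows that step under a layout that is assumed, not taken from this crate: 4-bit codes packed eight per `u32` along `in_features` (lowest nibble first, in the style of common GPTQ checkpoints), with per-group `scales` and unpacked `zeros`. The function name `dequantize_gptq_4bit` is hypothetical. It computes `w = (q - zero) * scale` and writes the result as `[out_features, in_features]` row-major.

```rust
/// Hypothetical 4-bit GPTQ dequantization (layout assumed for illustration):
/// - `qweight`: [in_features / 8, out_features] u32 words, eight 4-bit codes
///   per word packed along in_features, lowest nibble first.
/// - `scales`, `zeros`: [in_features / group_size, out_features]; zeros are
///   stored unpacked here, whereas real checkpoints usually pack them too.
/// Returns fp32 weights laid out [out_features, in_features], row-major.
fn dequantize_gptq_4bit(
    qweight: &[u32],
    scales: &[f32],
    zeros: &[u8],
    in_features: usize,
    out_features: usize,
    group_size: usize,
) -> Vec<f32> {
    assert_eq!(in_features % 8, 0, "in_features must be a multiple of 8");
    assert_eq!(in_features % group_size, 0, "group_size must divide in_features");
    assert_eq!(qweight.len(), in_features / 8 * out_features);
    let n_groups = in_features / group_size;
    assert_eq!(scales.len(), n_groups * out_features);
    assert_eq!(zeros.len(), n_groups * out_features);

    let mut w = vec![0.0f32; out_features * in_features];
    for i in 0..in_features {
        let word_row = i / 8; // which packed row holds input column i
        let shift = (i % 8) * 4; // nibble position inside the u32
        let g = i / group_size; // quantization group of input column i
        for o in 0..out_features {
            let word = qweight[word_row * out_features + o];
            let q = ((word >> shift) & 0xF) as f32;
            let scale = scales[g * out_features + o];
            let zero = zeros[g * out_features + o] as f32;
            // Transpose while dequantizing: store at (o, i) in row-major order.
            w[o * in_features + i] = (q - zero) * scale;
        }
    }
    w
}
```

Paying this cost once at load trades memory (fp32 is 8x larger than 4-bit codes) for a forward pass that can reuse an ordinary fp32 GEMM.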
Phase 3e/2: replaces the old BackendQuantMarlin::gemm_gptq impl on
CpuBackend. The kernel call (Self::gemm on dequantized weights)
now lives inside CpuGptqLinear::forward rather than in the trait
method body.
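A minimal sketch of that arrangement, assuming simplified signatures: the trait `Linear<B>` is reduced to a single `forward(&self, x, batch)` method, and `CpuBackend::gemm` is a naive stand-in that multiplies by a `[n, k]` row-major weight. Neither signature is taken from the crate. The point it illustrates is that the backend only supplies a plain fp32 GEMM, while the GPTQ-specific layer decides when to call it.

```rust
/// Stand-in for the CPU backend; the real `CpuBackend::gemm` signature may differ.
struct CpuBackend;

impl CpuBackend {
    /// C[m, n] = A[m, k] * B^T, with B stored as [n, k] row-major.
    fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut c = vec![0.0f32; m * n];
        for r in 0..m {
            for col in 0..n {
                let mut acc = 0.0f32;
                for t in 0..k {
                    acc += a[r * k + t] * b[col * k + t];
                }
                c[r * n + col] = acc;
            }
        }
        c
    }
}

/// Hypothetical, simplified shape of the `Linear<B>` trait.
trait Linear<B> {
    fn forward(&self, x: &[f32], batch: usize) -> Vec<f32>;
}

struct CpuGptqLinear {
    weight: Vec<f32>, // dequantized at load time, [out_features, in_features]
    bias: Option<Vec<f32>>,
    in_features: usize,
    out_features: usize,
}

impl Linear<CpuBackend> for CpuGptqLinear {
    fn forward(&self, x: &[f32], batch: usize) -> Vec<f32> {
        assert_eq!(x.len(), batch * self.in_features);
        // The kernel call lives here rather than in a backend trait method.
        let mut y = CpuBackend::gemm(x, &self.weight, batch, self.in_features, self.out_features);
        if let Some(b) = &self.bias {
            // Broadcast the bias across every row of the [batch, out_features] output.
            for row in y.chunks_mut(self.out_features) {
                for (v, bi) in row.iter_mut().zip(b) {
                    *v += *bi;
                }
            }
        }
        y
    }
}
```

Keeping the call in `forward` means the backend no longer needs a quantization-specific method such as the old `gemm_gptq`.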
Structs
- CpuGptqLinear - CPU GPTQ Linear: holds dequantized fp32 weights
[out_features, in_features] row-major and an optional bias [out_features], and dispatches via CpuBackend::gemm.
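The storage contract above can be made concrete with a shape-checked constructor. This is a hypothetical sketch: `new`, `weight_at`, `bias_at`, and the `String` error type are illustrative names, not this crate's API. It shows that a row-major `[out_features, in_features]` buffer places element `(o, i)` at `o * in_features + i`, and that the optional bias must have exactly `out_features` entries.

```rust
/// Hypothetical storage-side view of CpuGptqLinear.
struct CpuGptqLinear {
    weight: Vec<f32>, // [out_features, in_features], row-major
    bias: Option<Vec<f32>>, // [out_features] when present
    in_features: usize,
    out_features: usize,
}

impl CpuGptqLinear {
    /// Validates buffer lengths against the declared shape.
    fn new(
        weight: Vec<f32>,
        bias: Option<Vec<f32>>,
        out_features: usize,
        in_features: usize,
    ) -> Result<Self, String> {
        if weight.len() != out_features * in_features {
            return Err(format!(
                "weight has {} elements, expected {} x {}",
                weight.len(),
                out_features,
                in_features
            ));
        }
        if let Some(b) = &bias {
            if b.len() != out_features {
                return Err(format!("bias has {} elements, expected {}", b.len(), out_features));
            }
        }
        Ok(Self { weight, bias, in_features, out_features })
    }

    /// Row-major indexing: element (o, i) lives at o * in_features + i.
    fn weight_at(&self, o: usize, i: usize) -> f32 {
        assert!(o < self.out_features && i < self.in_features);
        self.weight[o * self.in_features + i]
    }

    /// A missing bias behaves like a zero bias.
    fn bias_at(&self, o: usize) -> f32 {
        self.bias.as_ref().map_or(0.0, |b| b[o])
    }
}
```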