Module quant_linear


QuantLinear<B> — thin wrapper that delegates to the boxed Linear<B> returned by B::load_quant / B::load_quant_fused.

Phase 3e/3: backend-specific kernel dispatch (Metal Q4_K/Q6_K mul_mm, CPU dequant + gemm) lives inside the boxed Linear’s forward() body, not in a Backend trait method. The historical QuantLinear<B> constructors (from_gguf_bytes, from_gguf_fused) are kept so that existing call sites do not have to change; they simply route through the new factory.
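The delegation described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Backend` and `Linear` trait shapes, the `(bytes, rows, cols)` parameters, the `Cpu` backend, and the `CpuDequantLinear` type are hypothetical stand-ins, and the "dequantization" is a toy byte-to-float cast rather than real Q4_K/Q6_K decoding. Only the names `QuantLinear`, `load_quant`, `from_gguf_bytes`, and `forward` come from this page.

```rust
// Hypothetical trait shapes; the crate's real signatures are not shown here.
trait Backend: Sized + 'static {
    // Factory: each backend returns its own boxed Linear implementation.
    fn load_quant(bytes: &[u8], rows: usize, cols: usize) -> Box<dyn Linear<Self>>;
}

trait Linear<B: Backend> {
    // Backend-specific kernel dispatch lives in this method body.
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

// Illustrative CPU backend: "dequantize" once at load time, then run a plain matvec.
struct Cpu;

struct CpuDequantLinear {
    weights: Vec<f32>, // row-major, rows * cols
    rows: usize,
    cols: usize,
}

impl Linear<Cpu> for CpuDequantLinear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        (0..self.rows)
            .map(|r| {
                let row = &self.weights[r * self.cols..(r + 1) * self.cols];
                row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>()
            })
            .collect()
    }
}

impl Backend for Cpu {
    fn load_quant(bytes: &[u8], rows: usize, cols: usize) -> Box<dyn Linear<Self>> {
        // Toy stand-in for k-quant decoding: reinterpret each byte as an i8 weight.
        let weights = bytes.iter().map(|&b| b as i8 as f32).collect();
        Box::new(CpuDequantLinear { weights, rows, cols })
    }
}

// The thin wrapper: it holds the boxed Linear and forwards every call to it.
struct QuantLinear<B: Backend> {
    inner: Box<dyn Linear<B>>,
}

impl<B: Backend> QuantLinear<B> {
    // The historical constructor keeps its name and routes through the factory.
    fn from_gguf_bytes(bytes: &[u8], rows: usize, cols: usize) -> Self {
        Self { inner: B::load_quant(bytes, rows, cols) }
    }

    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.inner.forward(x)
    }
}
```

In this sketch, a caller would write `QuantLinear::<Cpu>::from_gguf_bytes(&bytes, rows, cols)` and call `forward` on the result, never touching the boxed implementation directly. Supporting another backend then means implementing `Backend::load_quant` for it, with no change to the wrapper.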

Structs

QuantLinear
Linear projection backed by a GGUF k-quant weight.