QuantLinear<B> — thin wrapper that delegates to the boxed
Linear<B> returned by B::load_quant / B::load_quant_fused.
Phase 3e/3: backend-specific kernel dispatch (Metal Q4_K/Q6_K
mul_mm, CPU dequant + gemm) lives inside the boxed Linear’s
forward() body, not in a Backend trait method. The historical
QuantLinear<B> constructors (from_gguf_bytes, from_gguf_fused)
are kept so that existing call sites do not have to change; they
simply route through the new factory.
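
The delegation described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the trait shapes, the `(bytes, rows, cols)` parameters, the `Cpu` backend, and the `CpuDequantLinear` type are all hypothetical stand-ins, and the "dequantization" is a toy one-byte-per-weight decode rather than real GGUF k-quant decoding. Only the names `QuantLinear`, `Linear`, `load_quant`, `from_gguf_bytes`, and `forward` come from the description.

```rust
// Hypothetical stand-in for the crate's Linear<B> trait; the real signature is not shown here.
trait Linear<B: Backend> {
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

// Hypothetical stand-in for the Backend trait. Note that it only exposes a
// factory; it has no kernel-dispatch method of its own.
trait Backend: Sized + 'static {
    // Assumed parameter list: raw weight bytes plus the matrix dimensions.
    fn load_quant(bytes: &[u8], rows: usize, cols: usize) -> Box<dyn Linear<Self>>;
}

// Illustrative CPU backend (assumption, not the crate's type).
struct Cpu;

// Backend-specific layer: the kernel choice lives in its forward() body.
struct CpuDequantLinear {
    weights: Vec<f32>, // row-major, rows x cols
    rows: usize,
    cols: usize,
}

impl Linear<Cpu> for CpuDequantLinear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        // Plain matrix-vector product over the already-decoded weights.
        (0..self.rows)
            .map(|r| {
                self.weights[r * self.cols..(r + 1) * self.cols]
                    .iter()
                    .zip(x)
                    .map(|(w, v)| w * v)
                    .sum::<f32>()
            })
            .collect()
    }
}

impl Backend for Cpu {
    fn load_quant(bytes: &[u8], rows: usize, cols: usize) -> Box<dyn Linear<Self>> {
        // Toy decode: each byte is read as a signed 8-bit weight.
        // Real k-quant formats (Q4_K, Q6_K) use block scales and packed nibbles.
        let weights: Vec<f32> = bytes.iter().map(|&b| b as i8 as f32).collect();
        Box::new(CpuDequantLinear { weights, rows, cols })
    }
}

// Thin wrapper: holds the boxed layer and forwards every call to it.
struct QuantLinear<B: Backend> {
    inner: Box<dyn Linear<B>>,
}

impl<B: Backend> QuantLinear<B> {
    // Historical constructor name kept; it simply routes through the factory.
    fn from_gguf_bytes(bytes: &[u8], rows: usize, cols: usize) -> Self {
        Self { inner: B::load_quant(bytes, rows, cols) }
    }

    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.inner.forward(x)
    }
}
```

Under these assumptions, a caller would write `QuantLinear::<Cpu>::from_gguf_bytes(&bytes, rows, cols)` and then call `forward` exactly as before; swapping in a different backend changes only which boxed layer the factory returns.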
Structs
- QuantLinear - Linear projection backed by a GGUF k-quant weight.