Module linear

GgufLinear: a GGUF-sourced linear projection that integrates with ferrum’s Linear trait.

Phase 1B uses an eager-dequant-at-load strategy: when constructed from a candle QTensor, the quantized payload is decoded to fp32 once on CPU, then handed to DenseLinear so the runtime path goes through the standard B::gemm kernel. This is the simplest correct path that works uniformly across CPU / Metal / CUDA without per-backend bridging code.
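The eager-dequant-at-load idea can be sketched without any backend at all. The block below is only an illustration under stated assumptions: `QBlock`, `ToyQTensor`, and `ToyDenseLinear` are hypothetical stand-ins, not candle's `QTensor` or ferrum's `DenseLinear`, and the block layout (32 int8 values sharing one fp32 scale) is a simplified scheme chosen for readability rather than the real Q4_K_M format. It shows the quantized payload being decoded to fp32 exactly once at construction, after which the forward pass only ever sees dense weights.

```rust
/// Hypothetical quantized block: 32 int8 values sharing one f32 scale.
/// This is a simplified layout for illustration, not a real GGUF block format.
pub struct QBlock {
    pub scale: f32,
    pub values: [i8; 32],
}

/// Hypothetical stand-in for a quantized weight matrix of shape
/// [out_features, in_features], stored row-major as a flat list of blocks.
pub struct ToyQTensor {
    pub out_features: usize,
    pub in_features: usize,
    pub blocks: Vec<QBlock>,
}

/// Hypothetical dense fp32 linear layer holding a row-major weight matrix.
pub struct ToyDenseLinear {
    pub out_features: usize,
    pub in_features: usize,
    pub weight: Vec<f32>,
}

impl ToyDenseLinear {
    /// Decode every block to fp32 once, at load time.
    pub fn from_qtensor(q: &ToyQTensor) -> Self {
        assert_eq!(q.in_features % 32, 0, "rows must be whole blocks");
        assert_eq!(q.blocks.len() * 32, q.out_features * q.in_features);
        let mut weight = Vec::with_capacity(q.out_features * q.in_features);
        for block in &q.blocks {
            for &v in &block.values {
                // Dequantize: real value = scale * quantized integer.
                weight.push(block.scale * f32::from(v));
            }
        }
        Self {
            out_features: q.out_features,
            in_features: q.in_features,
            weight,
        }
    }

    /// Compute y = W x for one input vector using only the dense weights.
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.in_features);
        self.weight
            .chunks(self.in_features)
            .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>())
            .collect()
    }
}
```

After `from_qtensor` returns, the quantized blocks are no longer needed by the runtime path, which is why this approach requires no per-backend bridging: the matmul is an ordinary dense one.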

Trade-off: we lose GGUF’s memory advantage (Q4_K_M at about 4.5 bits/weight becomes fp32 at 32 bits/weight in RAM, roughly a 7x increase), and we don’t get the performance of a fused dequant-matmul kernel. Phase 1D will replace this with a real quantization-aware Linear that holds the QTensor and dispatches to Metal / CUDA Q4_K_M kernels.
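To put numbers on the memory cost, the arithmetic can be written out as a small helper. This is a back-of-the-envelope sketch: `weight_bytes` and `expansion_ratio` are hypothetical names, the 4.5 bits/weight figure is taken from the text above as given, and the calculation ignores any non-weight overhead such as metadata or activations.

```rust
/// Approximate storage in bytes for `num_weights` weights at a given
/// average number of bits per weight (hypothetical helper).
pub fn weight_bytes(num_weights: u64, bits_per_weight: f64) -> f64 {
    num_weights as f64 * bits_per_weight / 8.0
}

/// How many times larger the dense representation is than the quantized one.
pub fn expansion_ratio(quant_bits_per_weight: f64, dense_bits_per_weight: f64) -> f64 {
    dense_bits_per_weight / quant_bits_per_weight
}
```

For a hypothetical model with 7 billion weights, `weight_bytes(7_000_000_000, 4.5)` gives about 3.9 GB, while `weight_bytes(7_000_000_000, 32.0)` gives 28 GB, matching the `expansion_ratio(4.5, 32.0)` of roughly 7.1.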

Why a dedicated GgufLinear type instead of just returning DenseLinear? So Phase 1D can swap the internals (eager dequant → lazy QMatMul) without churning the public API of any WeightLoader that already returns Box<dyn Linear>.
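The API-stability argument can be illustrated with a wrapper whose internals are private. Everything in this sketch is an assumption made for illustration: the `Linear` trait here is a one-method toy rather than ferrum's actual trait, and `ToyGgufLinear`, `Inner`, and `toy_linear_boxed` are hypothetical names. The point is only that callers holding a `Box<dyn Linear>` never see which representation sits behind it.

```rust
/// Toy stand-in for a linear-layer trait (not ferrum's real `Linear`).
pub trait Linear {
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

/// Private representation. A later phase could add a second variant
/// (for example one that keeps the quantized payload) without touching
/// any code that only sees `Box<dyn Linear>`.
enum Inner {
    Dense { in_features: usize, weight: Vec<f32> },
}

/// Hypothetical wrapper type with a stable public surface.
pub struct ToyGgufLinear {
    inner: Inner,
}

impl ToyGgufLinear {
    /// Build from already-dequantized, row-major fp32 weights.
    pub fn from_dense(in_features: usize, weight: Vec<f32>) -> Self {
        assert!(in_features > 0, "in_features must be non-zero");
        assert_eq!(weight.len() % in_features, 0, "weight must be whole rows");
        Self {
            inner: Inner::Dense { in_features, weight },
        }
    }
}

impl Linear for ToyGgufLinear {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        match &self.inner {
            Inner::Dense { in_features, weight } => {
                assert_eq!(x.len(), *in_features);
                // One dot product per output row.
                weight
                    .chunks(*in_features)
                    .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>())
                    .collect()
            }
        }
    }
}

/// Hypothetical convenience constructor returning a trait object, so a
/// loader can hand back a uniform `Box<dyn Linear>`.
pub fn toy_linear_boxed(in_features: usize, weight: Vec<f32>) -> Box<dyn Linear> {
    Box::new(ToyGgufLinear::from_dense(in_features, weight))
}
```

Because `Inner` is private, replacing the `Dense` variant with a lazy quantized one would change only this module; a loader calling `toy_linear_boxed` and its downstream users would compile unchanged.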

Structs§

GgufLinear: Linear projection backed by a GGUF-sourced quantized tensor.

Functions§

linear_from_qtensor: Convenience: build a boxed Linear from a QTensor. Useful for WeightLoader impls that want a uniform Box<dyn Linear> output.
