GgufLoader<B>: implements WeightLoader<B> against a GGUF file.
Bridges the model layer, which addresses weights by ferrum's HuggingFace-style
names, to the on-disk GGUF format, which uses llama.cpp's blk.{i}.attn_q.weight
naming scheme. The loader has three responsibilities:
- Name translation: delegates to gguf::names::ferrum_to_gguf.
- Tensor materialisation: uses Phase 1A's GgufFile::read_tensor, then dequantises on CPU into B::Buffer for load_tensor, or wraps the QTensor in GgufLinear<B> for load_linear.
- Fusion: reproduces the qkv_proj/gate_up_proj shims the model expects: the split q/k/v tensors are concatenated row-wise into a single fused weight before the Linear is built.
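As an illustration of the name-translation step, the sketch below maps Llama-family HuggingFace tensor names to llama.cpp's GGUF names. It assumes ferrum's names follow the common model.layers.{i}.self_attn.q_proj.weight layout; hf_to_gguf_name is a hypothetical stand-in for gguf::names::ferrum_to_gguf, whose real signature and coverage may differ.

```rust
/// Hypothetical stand-in for gguf::names::ferrum_to_gguf, assuming
/// Llama-style HuggingFace names. Returns None for names it does not know.
fn hf_to_gguf_name(name: &str) -> Option<String> {
    // Global (non-per-layer) tensors map one-to-one.
    let global = match name {
        "model.embed_tokens.weight" => Some("token_embd.weight"),
        "model.norm.weight" => Some("output_norm.weight"),
        "lm_head.weight" => Some("output.weight"),
        _ => None,
    };
    if let Some(g) = global {
        return Some(g.to_string());
    }

    // Per-layer tensors: "model.layers.{i}.<suffix>" -> "blk.{i}.<gguf suffix>".
    let rest = name.strip_prefix("model.layers.")?;
    let (idx, suffix) = rest.split_once('.')?;
    let layer: usize = idx.parse().ok()?;
    let gguf_suffix = match suffix {
        "self_attn.q_proj.weight" => "attn_q.weight",
        "self_attn.k_proj.weight" => "attn_k.weight",
        "self_attn.v_proj.weight" => "attn_v.weight",
        "self_attn.o_proj.weight" => "attn_output.weight",
        "mlp.gate_proj.weight" => "ffn_gate.weight",
        "mlp.up_proj.weight" => "ffn_up.weight",
        "mlp.down_proj.weight" => "ffn_down.weight",
        "input_layernorm.weight" => "attn_norm.weight",
        "post_attention_layernorm.weight" => "ffn_norm.weight",
        _ => return None,
    };
    Some(format!("blk.{layer}.{gguf_suffix}"))
}
```

Returning None for unrecognised names lets a caller report a missing-weight error rather than silently loading the wrong tensor.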
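The fusion step can be sketched as a plain row-wise concatenation. Assuming each dequantised weight is a row-major fp32 matrix shaped [out_features, in_features] (the usual Linear convention, assumed here rather than taken from ferrum), q, k and v share a column count, so stacking their rows reduces to appending their buffers. The Matrix type and concat_rows function are hypothetical.

```rust
/// A row-major fp32 matrix with data.len() == rows * cols.
/// (Hypothetical helper type, not part of ferrum's API.)
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// Stack matrices vertically (row-wise), as when fusing q/k/v projections
/// into one qkv_proj weight. All parts must share the same column count.
fn concat_rows(parts: &[&Matrix]) -> Result<Matrix, String> {
    let cols = parts.first().ok_or("no parts to fuse")?.cols;
    let mut rows = 0;
    let mut data = Vec::new();
    for (i, p) in parts.iter().enumerate() {
        if p.cols != cols {
            return Err(format!("part {i} has {} cols, expected {cols}", p.cols));
        }
        if p.data.len() != p.rows * p.cols {
            return Err(format!("part {i} has an inconsistent shape"));
        }
        // In row-major layout, appending each buffer stacks its rows.
        data.extend_from_slice(&p.data);
        rows += p.rows;
    }
    Ok(Matrix { rows, cols, data })
}
```

Row counts may differ between parts (for example, k and v have fewer rows than q under grouped-query attention); only the column counts must match.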
All paths go through eager dequant-to-fp32 (Phase 1B’s strategy).
Phase 1D will add a quant-aware shortcut so Q4_K_M weights can stay
quantised in backend memory; the public WeightLoader<B> API stays
the same.
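To make "eager dequant-to-fp32" concrete, here is a minimal sketch for Q8_0, the simplest of ggml's block formats (not Q4_K_M, whose k-quant super-block layout is considerably more involved). It assumes llama.cpp's Q8_0 layout, where each block holds an f16 scale followed by 32 signed 8-bit quants and each value is scale * quant, and a little-endian file. The function names are hypothetical, not ferrum's.

```rust
const QK8_0: usize = 32; // values per Q8_0 block
const BLOCK_BYTES: usize = 2 + QK8_0; // f16 scale + 32 int8 quants

/// Convert IEEE 754 half-precision bits to f32.
fn f16_to_f32(bits: u16) -> f32 {
    let sign = (bits >> 15) & 1;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let mag = if exp == 0 {
        // Zero or subnormal: frac * 2^-24.
        frac as f32 * (1.0 / 16_777_216.0)
    } else if exp == 31 {
        if frac == 0 { f32::INFINITY } else { f32::NAN }
    } else {
        // Re-bias the exponent (15 -> 127) and widen the mantissa (10 -> 23 bits).
        f32::from_bits(((exp + 112) << 23) | (frac << 13))
    };
    if sign == 1 { -mag } else { mag }
}

/// Eagerly dequantise a raw Q8_0 byte buffer into fp32 values.
fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % BLOCK_BYTES != 0 {
        return Err(format!("{} bytes is not a multiple of {}", raw.len(), BLOCK_BYTES));
    }
    let mut out = Vec::with_capacity(raw.len() / BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(BLOCK_BYTES) {
        // The scale is a little-endian f16 at the start of each block.
        let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        for &q in &block[2..] {
            out.push(d * (q as i8) as f32);
        }
    }
    Ok(out)
}
```

The fp32 output is four times the size of the int8 quants, which is the memory cost a quant-aware path would avoid.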
Structs
- GgufLoader: Backend-generic weight loader for GGUF files.