Module loader

GgufLoader<B>: implements WeightLoader<B> against a GGUF file.
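To show what "backend-generic" means here, the following is a minimal sketch. The Backend and WeightLoader traits, the from_f32 method, the Cpu backend and the InMemoryLoader type are hypothetical simplifications, not ferrum's actual API. The sketch also covers only a load_tensor-style path and omits load_linear.

```rust
use std::collections::HashMap;

/// Hypothetical backend abstraction: each backend chooses its own buffer type.
trait Backend {
    type Buffer;
    fn from_f32(data: Vec<f32>) -> Self::Buffer;
}

/// Hypothetical loader trait: the model requests weights by name and
/// receives a buffer native to whichever backend `B` it is generic over.
trait WeightLoader<B: Backend> {
    fn load_tensor(&self, name: &str) -> Result<B::Buffer, String>;
}

/// Toy CPU backend whose buffer is a plain Vec<f32>.
struct Cpu;

impl Backend for Cpu {
    type Buffer = Vec<f32>;
    fn from_f32(data: Vec<f32>) -> Vec<f32> {
        data
    }
}

/// Toy stand-in for a file-backed loader: tensors are already fp32 in memory.
struct InMemoryLoader {
    tensors: HashMap<String, Vec<f32>>,
}

/// One impl serves every backend: look up the fp32 data, then hand it to
/// the backend to build its native buffer.
impl<B: Backend> WeightLoader<B> for InMemoryLoader {
    fn load_tensor(&self, name: &str) -> Result<B::Buffer, String> {
        self.tensors
            .get(name)
            .cloned()
            .map(B::from_f32)
            .ok_or_else(|| format!("missing tensor: {name}"))
    }
}
```

Because the impl is generic over B, the same loader can serve any backend. Only the final conversion into B::Buffer differs.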

Bridges the model layer (which addresses weights by ferrum’s HuggingFace-style names) to the on-disk GGUF format (llama.cpp’s blk.{i}.attn_q.weight shorthand). It has three responsibilities:

  1. Name translation: delegates to gguf::names::ferrum_to_gguf.
  2. Tensor materialisation: reads the tensor with Phase 1A’s GgufFile::read_tensor, then either dequantises it on the CPU into a B::Buffer (for load_tensor) or wraps the QTensor in a GgufLinear<B> (for load_linear).
  3. Fusion: reproduces the qkv_proj / gate_up_proj shims the model expects. For example, the split q/k/v tensors are concatenated row-wise into a single fused weight before the Linear is built.
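The name-translation and fusion steps can be sketched as follows. This sketch assumes that ferrum’s names follow the HuggingFace Llama layout (model.layers.{i}.self_attn.q_proj.weight and so on) and that tensors are row-major f32 buffers. The mapping table is a hypothetical subset, not the real contents of gguf::names::ferrum_to_gguf. The concat_rows helper is likewise illustrative.

```rust
/// Hypothetical, simplified stand-in for gguf::names::ferrum_to_gguf.
/// Assumes HuggingFace Llama-style names; covers only common patterns.
fn ferrum_to_gguf(name: &str) -> Option<String> {
    // Global (non-layer) tensors.
    match name {
        "model.embed_tokens.weight" => return Some("token_embd.weight".into()),
        "model.norm.weight" => return Some("output_norm.weight".into()),
        "lm_head.weight" => return Some("output.weight".into()),
        _ => {}
    }
    // Per-layer names look like "model.layers.{i}.{suffix}".
    let rest = name.strip_prefix("model.layers.")?;
    let (layer, suffix) = rest.split_once('.')?;
    let layer: usize = layer.parse().ok()?;
    let gguf_suffix = match suffix {
        "self_attn.q_proj.weight" => "attn_q.weight",
        "self_attn.k_proj.weight" => "attn_k.weight",
        "self_attn.v_proj.weight" => "attn_v.weight",
        "self_attn.o_proj.weight" => "attn_output.weight",
        "mlp.gate_proj.weight" => "ffn_gate.weight",
        "mlp.up_proj.weight" => "ffn_up.weight",
        "mlp.down_proj.weight" => "ffn_down.weight",
        "input_layernorm.weight" => "attn_norm.weight",
        "post_attention_layernorm.weight" => "ffn_norm.weight",
        _ => return None,
    };
    Some(format!("blk.{layer}.{gguf_suffix}"))
}

/// Row-wise concatenation of row-major [rows_i, cols] matrices that share
/// `cols` (e.g. q, k, v into one fused qkv weight). In row-major layout this
/// is just appending the buffers in order. Returns (data, total_rows).
fn concat_rows(parts: &[(&[f32], usize)], cols: usize) -> Result<(Vec<f32>, usize), String> {
    let mut out = Vec::new();
    let mut total_rows = 0;
    for (i, &(data, rows)) in parts.iter().enumerate() {
        // Reject parts whose length does not match their declared shape.
        if data.len() != rows * cols {
            return Err(format!("part {i}: expected {} elements, got {}", rows * cols, data.len()));
        }
        out.extend_from_slice(data);
        total_rows += rows;
    }
    Ok((out, total_rows))
}
```

For example, ferrum_to_gguf("model.layers.3.self_attn.q_proj.weight") yields Some("blk.3.attn_q.weight"). Concatenating a 2-row q with 1-row k and v matrices of the same width produces a 4-row fused weight.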

All paths go through eager dequantisation to fp32 (Phase 1B’s strategy). Phase 1D will add a quant-aware shortcut so that Q4_K_M weights can stay quantised in backend memory; the public WeightLoader<B> API will stay the same.
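As an illustration of eager dequant-to-fp32, the sketch below decodes GGUF’s Q8_0 block format. Q8_0 was chosen because it is the simplest case. Each 34-byte block holds a little-endian f16 scale followed by 32 signed 8-bit values, and each weight is scale * value. The K-quant formats behind Q4_K_M use a more involved super-block layout that is not shown here. dequantize_q8_0 and f16_to_f32 are illustrative helpers, not ferrum’s code.

```rust
/// Number of weights per Q8_0 block, and bytes per block (f16 scale + 32 i8).
const QK8_0: usize = 32;
const BLOCK_BYTES: usize = 2 + QK8_0;

/// Convert IEEE 754 half-precision bits to f32 (handles zero, subnormals, inf, NaN).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = (bits >> 15) & 1;
    let exp = ((bits >> 10) & 0x1f) as i32;
    let mant = (bits & 0x3ff) as f32;
    let magnitude = if exp == 0 {
        // Zero or subnormal: mant * 2^-24.
        mant * 2f32.powi(-24)
    } else if exp == 0x1f {
        // All-ones exponent: infinity if mantissa is zero, otherwise NaN.
        if mant == 0.0 { f32::INFINITY } else { f32::NAN }
    } else {
        // Normal: (1 + mant / 1024) * 2^(exp - 15).
        (1.0 + mant / 1024.0) * 2f32.powi(exp - 15)
    };
    if sign == 1 { -magnitude } else { magnitude }
}

/// Eagerly dequantise a raw Q8_0 byte buffer into fp32 values.
fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % BLOCK_BYTES != 0 {
        return Err(format!("length {} is not a multiple of {BLOCK_BYTES}", raw.len()));
    }
    let mut out = Vec::with_capacity(raw.len() / BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(BLOCK_BYTES) {
        // First two bytes: per-block scale as little-endian f16.
        let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Remaining 32 bytes: signed 8-bit quants, each scaled by d.
        for &q in &block[2..] {
            out.push(d * (q as i8) as f32);
        }
    }
    Ok(out)
}
```

The result is a plain Vec<f32> that a backend could copy into its own buffer. A quant-aware path would instead keep the raw blocks and dequantise on the device.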

Structs

GgufLoader
Backend-generic weight loader for GGUF files.