Module loader

GgufLoader<B>: implements WeightLoader<B> against a GGUF file.

Bridges the model layer (which addresses weights by ferrum’s HuggingFace-style names) and the on-disk GGUF format (which uses llama.cpp’s blk.{i}.attn_q.weight shorthand). It has three responsibilities:

Name translation: delegates to gguf::names::ferrum_to_gguf.
Tensor materialisation: calls Phase 1A’s GgufFile::read_tensor, then either dequantises on the CPU into a B::Buffer (for load_tensor) or wraps the QTensor in a GgufLinear<B> (for load_linear).
Fusion: reproduces the qkv_proj / gate_up_proj shims the model expects. The split q/k/v tensors are concatenated row-wise into a single fused weight before the Linear is built.
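
The name-translation and fusion steps above can be sketched in a few lines. This is an illustration only, not the crate's code: ferrum_to_gguf_sketch is a hypothetical stand-in for gguf::names::ferrum_to_gguf that covers just the three attention projections, it assumes ferrum's names follow the Llama-style HuggingFace layout (model.layers.{i}.self_attn.q_proj.weight), and concat_rows is a hypothetical helper operating on plain row-major f32 slices rather than on B::Buffer.

```rust
/// Hypothetical stand-in for `gguf::names::ferrum_to_gguf`.
/// Assumes Llama-style HuggingFace names; handles only q/k/v projections.
fn ferrum_to_gguf_sketch(name: &str) -> Option<String> {
    // Expect "model.layers.{i}.<tail>".
    let rest = name.strip_prefix("model.layers.")?;
    let (layer, tail) = rest.split_once('.')?;
    let layer: usize = layer.parse().ok()?;
    let short = match tail {
        "self_attn.q_proj.weight" => "attn_q",
        "self_attn.k_proj.weight" => "attn_k",
        "self_attn.v_proj.weight" => "attn_v",
        _ => return None,
    };
    Some(format!("blk.{layer}.{short}.weight"))
}

/// Concatenates row-major matrices that share `cols` along the row axis.
/// Each part is `(data, rows)`; returns the fused data and total row count,
/// or `None` if a part's length does not match `rows * cols`.
fn concat_rows(parts: &[(&[f32], usize)], cols: usize) -> Option<(Vec<f32>, usize)> {
    let mut fused = Vec::new();
    let mut total_rows = 0;
    for &(data, rows) in parts {
        if data.len() != rows * cols {
            return None;
        }
        // Row-major storage means row-wise concatenation is a plain append.
        fused.extend_from_slice(data);
        total_rows += rows;
    }
    Some((fused, total_rows))
}
```

Because the matrices are row-major, stacking q, k and v along the row axis is a straight append, which also works when k and v have fewer rows than q.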

All paths go through eager dequant-to-fp32 (Phase 1B’s strategy). Phase 1D will add a quant-aware shortcut so Q4_K_M weights can stay quantised in backend memory; the public WeightLoader<B> API stays the same.
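
As a rough picture of what eager dequant-to-fp32 means, the sketch below expands a simplified 8-bit block format into f32 values. It is not the crate's dequantiser and it is not Q4_K_M: BlockQ8 and dequant_to_f32 are hypothetical names, the layout is modelled loosely on GGUF's Q8_0 (one scale shared by 32 signed quants), and the scale is held as f32 rather than f16 to keep the example dependency-free.

```rust
/// Simplified quantised block: one scale shared by 32 signed 8-bit values.
/// Assumption: loosely modelled on Q8_0, with an f32 scale instead of f16.
struct BlockQ8 {
    scale: f32,
    quants: [i8; 32],
}

/// Eagerly expands every block to f32, so the result no longer depends on
/// the quantised representation.
fn dequant_to_f32(blocks: &[BlockQ8]) -> Vec<f32> {
    let mut out = Vec::with_capacity(blocks.len() * 32);
    for block in blocks {
        // Each element is reconstructed as scale * quant.
        out.extend(block.quants.iter().map(|&q| block.scale * q as f32));
    }
    out
}
```

The cost of this strategy is memory: each 8-bit quant becomes a 32-bit float, which is the overhead a quant-aware path would avoid.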

Structs

GgufLoader: Backend-generic weight loader for GGUF files.
