GGUF (GGML Universal Format) reader.
Phase 1A scope: parse the file header, expose metadata + tensor descriptors,
and load individual tensors as candle QTensor (which already handles
dequant for every K-quant variant on CPU / Metal / CUDA).
Phase 1A added the GgufFile reader. Phase 1B added GgufLinear<B>.
Phase 1C (this commit) adds GgufLoader<B>, which implements WeightLoader<B>
against ferrum’s HuggingFace-style tensor names by translating them to GGUF’s
blk.{i}.attn_q.weight shorthand and handling qkv_proj / gate_up_proj
fusion on the fly.
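The name translation described above can be pictured with a small sketch. It is not the crate's ferrum_to_gguf, qkv_split_parts, or gate_up_split_parts; the function names hf_to_gguf_sketch and fused_parts_sketch are hypothetical, and the HuggingFace-side names (model.layers.{i}.self_attn.q_proj.weight and friends, plus the fused qkv_proj / gate_up_proj spellings) are assumed to follow the common Llama-style layout, which this page does not confirm for ferrum.

```rust
/// Hypothetical helper: map one HuggingFace-style tensor name to the
/// GGUF shorthand. Returns None for names this sketch does not know.
fn hf_to_gguf_sketch(name: &str) -> Option<String> {
    // Tensors that live outside the per-layer blocks.
    match name {
        "model.embed_tokens.weight" => return Some("token_embd.weight".to_string()),
        "model.norm.weight" => return Some("output_norm.weight".to_string()),
        "lm_head.weight" => return Some("output.weight".to_string()),
        _ => {}
    }

    // Per-layer tensors: "model.layers.{i}.<suffix>" -> "blk.{i}.<short>".
    let rest = name.strip_prefix("model.layers.")?;
    let (idx, suffix) = rest.split_once('.')?;
    let layer: usize = idx.parse().ok()?;
    let short = match suffix {
        "self_attn.q_proj.weight" => "attn_q.weight",
        "self_attn.k_proj.weight" => "attn_k.weight",
        "self_attn.v_proj.weight" => "attn_v.weight",
        "self_attn.o_proj.weight" => "attn_output.weight",
        "mlp.gate_proj.weight" => "ffn_gate.weight",
        "mlp.up_proj.weight" => "ffn_up.weight",
        "mlp.down_proj.weight" => "ffn_down.weight",
        "input_layernorm.weight" => "attn_norm.weight",
        "post_attention_layernorm.weight" => "ffn_norm.weight",
        _ => return None,
    };
    Some(format!("blk.{layer}.{short}"))
}

/// Hypothetical helper: for a fused projection name, list the separate
/// GGUF tensors that would have to be loaded and concatenated (in order).
fn fused_parts_sketch(name: &str) -> Option<Vec<String>> {
    let rest = name.strip_prefix("model.layers.")?;
    let (idx, suffix) = rest.split_once('.')?;
    let layer: usize = idx.parse().ok()?;
    let parts = match suffix {
        "self_attn.qkv_proj.weight" => &["attn_q", "attn_k", "attn_v"][..],
        "mlp.gate_up_proj.weight" => &["ffn_gate", "ffn_up"][..],
        _ => return None,
    };
    Some(parts.iter().map(|p| format!("blk.{layer}.{p}.weight")).collect())
}
```

Under these assumptions, a request for a fused qkv_proj weight resolves to three GGUF tensors whose rows are concatenated in q, k, v order; how the real loader performs that concatenation is not shown on this page.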
§Why wrap candle instead of writing a parser from scratch
candle_core::quantized::gguf_file::Content already implements the full
GGUF v1/v2/v3 spec, including all current GGML K-quant variants and
Metal/CUDA/CPU dequant kernels. Re-implementing that for ferrum would be
3-5 weeks of work duplicating well-tested code. Instead this module
provides a small adapter that:
- Adds an mmap-backed open(path) constructor (candle’s API takes a generic Read + Seek and pushes file handling to the caller).
- Provides typed metadata accessors keyed by string (metadata_string, metadata_u32, …) so callers don’t pattern-match on Value everywhere.
- Documents the GGUF metadata key conventions ferrum relies on (general.architecture, <arch>.block_count, …) in one place.
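To illustrate why typed accessors help, here is a minimal, std-only sketch. The Value enum below is a local stand-in with only three variants, not candle's actual type, and MetadataSketch is a hypothetical container. The accessor names metadata_string and metadata_u32 come from the list above, but their signatures and error type here are assumptions.

```rust
use std::collections::HashMap;

/// Local stand-in for a GGUF metadata value (assumption: the real enum
/// has many more variants).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U32(u32),
    String(String),
    Bool(bool),
}

/// Hypothetical container for the key/value section of a GGUF header.
struct MetadataSketch {
    entries: HashMap<String, Value>,
}

impl MetadataSketch {
    /// Look up `key` and require it to be a string.
    fn metadata_string(&self, key: &str) -> Result<&str, String> {
        match self.entries.get(key) {
            Some(Value::String(s)) => Ok(s.as_str()),
            Some(other) => Err(format!("key {key:?} is not a string: {other:?}")),
            None => Err(format!("missing metadata key {key:?}")),
        }
    }

    /// Look up `key` and require it to be a u32.
    fn metadata_u32(&self, key: &str) -> Result<u32, String> {
        match self.entries.get(key) {
            Some(Value::U32(v)) => Ok(*v),
            Some(other) => Err(format!("key {key:?} is not a u32: {other:?}")),
            None => Err(format!("missing metadata key {key:?}")),
        }
    }

    /// Resolve `<arch>.block_count` by first reading `general.architecture`.
    fn block_count(&self) -> Result<u32, String> {
        let arch = self.metadata_string("general.architecture")?;
        self.metadata_u32(&format!("{arch}.block_count"))
    }
}
```

With accessors shaped like this, a caller asks for block_count() and gets either a number or a descriptive error, rather than matching on the value enum at every call site.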
Re-exports§
pub use file::GgufFile;
pub use linear::linear_from_qtensor;
pub use linear::GgufLinear;
pub use loader::GgufLoader;
pub use names::ferrum_to_gguf;
pub use names::gate_up_split_parts;
pub use names::qkv_split_parts;
Modules§
- file
  GgufFile: mmap-backed reader for a single GGUF file.
- linear
  GgufLinear<B>: a GGUF-sourced linear projection that integrates with ferrum’s Linear<B> trait.
- loader
  GgufLoader<B>: implements WeightLoader<B> against a GGUF file.
- names
  GGUF ↔ ferrum tensor-name translation.