Module gguf

GGUF (GGML Universal Format) reader.

Phase 1A scope: parse the file header, expose metadata and tensor descriptors, and load individual tensors as candle QTensor values (QTensor already handles dequantization for every K-quant variant on CPU, Metal, and CUDA).
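To make "the file header" concrete, here is a minimal std-only sketch of the fixed-size fields at the start of a little-endian GGUF v2/v3 file: the 4-byte magic `GGUF`, a `u32` version, then `u64` tensor and metadata key/value counts. This module does not parse these bytes itself (it delegates to candle); `GgufHeader` and `parse_header` are hypothetical names used only for illustration, and version 1, whose layout differs, is rejected for brevity.

```rust
/// Fixed-size fields at the start of a GGUF v2/v3 file.
/// Hypothetical illustration type, not part of this module's API.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read a little-endian u32 at `off`. The caller guarantees bounds.
fn read_u32_le(bytes: &[u8], off: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[off..off + 4]);
    u32::from_le_bytes(buf)
}

/// Read a little-endian u64 at `off`. The caller guarantees bounds.
fn read_u64_le(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Parse the 24-byte header prefix of a little-endian GGUF v2/v3 file.
pub fn parse_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err(format!("need 24 header bytes, got {}", bytes.len()));
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic: expected \"GGUF\"".to_string());
    }
    let version = read_u32_le(bytes, 4);
    if version != 2 && version != 3 {
        return Err(format!("unsupported GGUF version {}", version));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

The metadata key/value pairs and tensor descriptors follow these 24 bytes; decoding them is the part that candle already covers.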

Phase 1A added the GgufFile reader. Phase 1B added GgufLinear<B>. Phase 1C (this commit) adds GgufLoader<B>, which implements WeightLoader<B> against ferrum’s HuggingFace-style tensor names: it translates each name to GGUF’s blk.{i}.attn_q.weight shorthand and handles qkv_proj / gate_up_proj fusion on the fly.
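The name translation can be sketched as follows. This is an illustration only, not the actual `ferrum_to_gguf`, `qkv_split_parts`, or `gate_up_split_parts` implementations, whose signatures are not shown here. It assumes Llama-style HuggingFace names on the ferrum side and the tensor names used by llama.cpp's converters on the GGUF side, and it assumes "fusion" means a fused name is resolved to the separate GGUF tensors that make it up. `hf_to_gguf_name` and `fused_parts` are hypothetical names.

```rust
/// Translate a Llama-style HuggingFace tensor name to GGUF shorthand.
/// Returns None for names this sketch does not know about.
pub fn hf_to_gguf_name(name: &str) -> Option<String> {
    // Tensors that live outside the per-layer blocks.
    match name {
        "model.embed_tokens.weight" => return Some("token_embd.weight".to_string()),
        "model.norm.weight" => return Some("output_norm.weight".to_string()),
        "lm_head.weight" => return Some("output.weight".to_string()),
        _ => {}
    }
    // Per-layer tensors: model.layers.{i}.{module}.{param}
    let rest = name.strip_prefix("model.layers.")?;
    let (layer, tail) = rest.split_once('.')?;
    let layer: usize = layer.parse().ok()?;
    let (module, param) = tail.rsplit_once('.')?;
    let short = match module {
        "self_attn.q_proj" => "attn_q",
        "self_attn.k_proj" => "attn_k",
        "self_attn.v_proj" => "attn_v",
        "self_attn.o_proj" => "attn_output",
        "mlp.gate_proj" => "ffn_gate",
        "mlp.up_proj" => "ffn_up",
        "mlp.down_proj" => "ffn_down",
        "input_layernorm" => "attn_norm",
        "post_attention_layernorm" => "ffn_norm",
        _ => return None,
    };
    Some(format!("blk.{}.{}.{}", layer, short, param))
}

/// For a fused name (qkv_proj or gate_up_proj), list the separate GGUF
/// tensors that would be loaded and concatenated, in order.
/// Assumption: this is how the fusion is resolved; the source does not say.
pub fn fused_parts(name: &str) -> Option<Vec<String>> {
    let rest = name.strip_prefix("model.layers.")?;
    let (layer, tail) = rest.split_once('.')?;
    let layer: usize = layer.parse().ok()?;
    let (module, param) = tail.rsplit_once('.')?;
    let parts: &[&str] = match module {
        "self_attn.qkv_proj" => &["attn_q", "attn_k", "attn_v"],
        "mlp.gate_up_proj" => &["ffn_gate", "ffn_up"],
        _ => return None,
    };
    Some(
        parts
            .iter()
            .map(|p| format!("blk.{}.{}.{}", layer, p, param))
            .collect(),
    )
}
```

Returning `Option` rather than panicking lets a loader fall back to another lookup strategy, or report the unknown name, when a model uses a tensor this table does not cover.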

§Why wrap candle instead of writing a parser from scratch

candle_core::quantized::gguf_file::Content already implements the full GGUF v1/v2/v3 spec, including all current GGML K-quant variants and Metal/CUDA/CPU dequant kernels. Re-implementing that for ferrum would mean 3-5 weeks of work duplicating well-tested code. Instead, this module provides a small adapter that:

  1. Adds an mmap-backed open(path) constructor (candle’s API takes a generic Read + Seek and pushes file handling to the caller).
  2. Provides typed metadata accessors keyed by string (metadata_string, metadata_u32, …) so callers don’t pattern-match on Value everywhere.
  3. Documents the GGUF metadata key conventions ferrum relies on (general.architecture, <arch>.block_count, …) in one place.
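Points 2 and 3 can be illustrated with a small std-only sketch. The `Value` enum below is a simplified stand-in for candle's metadata value type, and `Metadata` is a hypothetical container; only the accessor names `metadata_string` and `metadata_u32` and the key conventions come from the text above, and their signatures here are assumptions.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a GGUF metadata value (assumption: the real
/// enum has many more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U32(u32),
    String(String),
    Bool(bool),
}

/// Hypothetical metadata container keyed by string.
pub struct Metadata {
    entries: HashMap<String, Value>,
}

impl Metadata {
    pub fn new(entries: HashMap<String, Value>) -> Self {
        Metadata { entries }
    }

    /// Typed accessor: the key must exist and hold a string.
    pub fn metadata_string(&self, key: &str) -> Result<&str, String> {
        match self.entries.get(key) {
            Some(Value::String(s)) => Ok(s.as_str()),
            Some(other) => Err(format!("{:?} is {:?}, expected a string", key, other)),
            None => Err(format!("missing metadata key {:?}", key)),
        }
    }

    /// Typed accessor: the key must exist and hold a u32.
    pub fn metadata_u32(&self, key: &str) -> Result<u32, String> {
        match self.entries.get(key) {
            Some(Value::U32(v)) => Ok(*v),
            Some(other) => Err(format!("{:?} is {:?}, expected a u32", key, other)),
            None => Err(format!("missing metadata key {:?}", key)),
        }
    }

    /// Key convention: read general.architecture first, then use it as
    /// the prefix for architecture-specific keys such as <arch>.block_count.
    pub fn block_count(&self) -> Result<u32, String> {
        let arch = self.metadata_string("general.architecture")?;
        self.metadata_u32(&format!("{}.block_count", arch))
    }
}
```

With accessors like these, call sites read as `meta.block_count()?` and get a descriptive error for a missing or mistyped key, instead of repeating a `match` on `Value` at every lookup.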

Re-exports§

pub use file::GgufFile;
pub use linear::linear_from_qtensor;
pub use linear::GgufLinear;
pub use loader::GgufLoader;
pub use names::ferrum_to_gguf;
pub use names::gate_up_split_parts;
pub use names::qkv_split_parts;

Modules§

file
GgufFile: mmap-backed reader for a single GGUF file.
linear
GgufLinear<B>: a GGUF-sourced linear projection that integrates with ferrum’s Linear<B> trait.
loader
GgufLoader<B>: implements WeightLoader<B> against a GGUF file.
names
GGUF ↔ ferrum tensor-name translation.

Structs§

QTensor
TensorInfo

Enums§

GgmlDType
Value
ValueType