Module gguf

GGUF (GGML Universal Format) reader.

Phase 1A scope: parse the file header, expose metadata + tensor descriptors, and load individual tensors as candle QTensor (which already handles dequant for every K-quant variant on CPU / Metal / CUDA).

Phase 1A added the GgufFile reader and Phase 1B added GgufLinear. Phase 1C (this commit) adds GgufLoader, which implements WeightLoader against ferrum’s HuggingFace-style tensor names: it translates each name to GGUF’s blk.{i}.attn_q.weight shorthand and handles qkv_proj / gate_up_proj fusion on the fly.
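
The name translation described above can be sketched roughly as follows. This is an illustration only, not the code in the names module: hf_to_gguf_name and fused_parts are hypothetical helpers, and the mapping table assumes a Llama-style layout using the tensor names llama.cpp conventionally writes (token_embd, attn_q, ffn_gate, …). The real ferrum_to_gguf, qkv_split_parts and gate_up_split_parts may have different signatures and cover more architectures.

```rust
/// Hypothetical sketch: translate a HuggingFace-style tensor name to GGUF
/// shorthand. Returns `None` for names this (Llama-style, assumed) table
/// does not cover.
fn hf_to_gguf_name(name: &str) -> Option<String> {
    // Tensors that live outside the per-layer blocks.
    match name {
        "model.embed_tokens.weight" => return Some("token_embd.weight".to_string()),
        "model.norm.weight" => return Some("output_norm.weight".to_string()),
        "lm_head.weight" => return Some("output.weight".to_string()),
        _ => {}
    }

    // Per-layer tensors: "model.layers.{i}.<tail>" -> "blk.{i}.<short>".
    let rest = name.strip_prefix("model.layers.")?;
    let (index, tail) = rest.split_once('.')?;
    let layer: usize = index.parse().ok()?;
    let short = match tail {
        "self_attn.q_proj.weight" => "attn_q.weight",
        "self_attn.k_proj.weight" => "attn_k.weight",
        "self_attn.v_proj.weight" => "attn_v.weight",
        "self_attn.o_proj.weight" => "attn_output.weight",
        "mlp.gate_proj.weight" => "ffn_gate.weight",
        "mlp.up_proj.weight" => "ffn_up.weight",
        "mlp.down_proj.weight" => "ffn_down.weight",
        "input_layernorm.weight" => "attn_norm.weight",
        "post_attention_layernorm.weight" => "ffn_norm.weight",
        _ => return None,
    };
    Some(format!("blk.{layer}.{short}"))
}

/// Hypothetical sketch: the GGUF tensors to load, in order, when the caller
/// asks for a fused projection that GGUF stores as separate tensors.
fn fused_parts(layer: usize, fused: &str) -> Option<Vec<String>> {
    let parts: &[&str] = match fused {
        "qkv_proj" => &["attn_q", "attn_k", "attn_v"],
        "gate_up_proj" => &["ffn_gate", "ffn_up"],
        _ => return None,
    };
    Some(parts.iter().map(|p| format!("blk.{layer}.{p}.weight")).collect())
}
```

Under these assumptions, hf_to_gguf_name("model.layers.3.self_attn.q_proj.weight") yields blk.3.attn_q.weight, and a request for layer 0's gate_up_proj resolves to the ffn_gate and ffn_up tensors, which a loader would then combine along the output dimension.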

§Why wrap candle instead of writing a parser from scratch

candle_core::quantized::gguf_file::Content already implements the full GGUF v1/v2/v3 spec, including all current GGML K-quant variants and Metal/CUDA/CPU dequant kernels. Re-implementing that for ferrum would take 3-5 weeks and duplicate well-tested code. Instead, this module provides a small adapter that:

- Adds an mmap-backed open(path) constructor (candle’s API takes a generic Read + Seek and pushes file handling to the caller).
- Provides typed metadata accessors keyed by string (metadata_string, metadata_u32, …) so callers don’t pattern-match on Value everywhere.
- Documents the GGUF metadata key conventions ferrum relies on (general.architecture, <arch>.block_count, …) in one place.
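
To illustrate the typed-accessor idea from the list above, here is a minimal, self-contained sketch. It does not use candle: MetaValue, MetaError, Metadata and sample_metadata are stand-ins invented for this example, and the accessor signatures are assumptions rather than GgufFile's actual API. The point is only that the match on the value enum happens once, inside the accessor, instead of at every call site.

```rust
use std::collections::HashMap;

/// Stand-in for a GGUF metadata value (assumption: the real `Value` enum
/// re-exported by this module has many more variants).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum MetaValue {
    U32(u32),
    String(String),
    Bool(bool),
}

/// Hypothetical error type distinguishing "no such key" from "wrong type".
#[derive(Debug, PartialEq)]
enum MetaError {
    Missing(String),
    WrongType { key: String, expected: &'static str },
}

/// Hypothetical metadata table keyed by string.
struct Metadata {
    entries: HashMap<String, MetaValue>,
}

impl Metadata {
    fn get(&self, key: &str) -> Result<&MetaValue, MetaError> {
        self.entries
            .get(key)
            .ok_or_else(|| MetaError::Missing(key.to_string()))
    }

    /// Typed accessor: succeeds only if the key exists and holds a string.
    fn metadata_string(&self, key: &str) -> Result<&str, MetaError> {
        match self.get(key)? {
            MetaValue::String(s) => Ok(s.as_str()),
            _ => Err(MetaError::WrongType { key: key.to_string(), expected: "string" }),
        }
    }

    /// Typed accessor: succeeds only if the key exists and holds a u32.
    fn metadata_u32(&self, key: &str) -> Result<u32, MetaError> {
        match self.get(key)? {
            MetaValue::U32(v) => Ok(*v),
            _ => Err(MetaError::WrongType { key: key.to_string(), expected: "u32" }),
        }
    }

    /// Resolves `<arch>.block_count` by first reading `general.architecture`.
    fn block_count(&self) -> Result<u32, MetaError> {
        let arch = self.metadata_string("general.architecture")?;
        self.metadata_u32(&format!("{arch}.block_count"))
    }
}

/// Illustrative data only; the values are made up for the example.
fn sample_metadata() -> Metadata {
    let mut entries = HashMap::new();
    entries.insert(
        "general.architecture".to_string(),
        MetaValue::String("llama".to_string()),
    );
    entries.insert("llama.block_count".to_string(), MetaValue::U32(32));
    Metadata { entries }
}
```

With the sample table, block_count() first reads general.architecture ("llama") and then looks up llama.block_count, while asking for a u32 under a string-valued key reports a WrongType error rather than panicking.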

Re-exports§

pub use file::GgufFile;
pub use linear::linear_from_qtensor;
pub use linear::GgufLinear;
pub use loader::GgufLoader;
pub use names::ferrum_to_gguf;
pub use names::gate_up_split_parts;
pub use names::qkv_split_parts;

Modules§

file: GgufFile: mmap-backed reader for a single GGUF file.
linear: GgufLinear: a GGUF-sourced linear projection that integrates with ferrum’s Linear trait.
loader: GgufLoader: implements WeightLoader against a GGUF file.
names: GGUF ↔ ferrum tensor-name translation.

Structs§

QTensor
TensorInfo

Enums§

GgmlDType
Value
ValueType
