Struct GgufFile

pub struct GgufFile { /* private fields */ }

A parsed GGUF file, ready for lazy tensor loading.

The file is kept open so that tensor data can be read on demand via load_tensor and load_tensor_f32.

Implementations§

impl GgufFile

pub fn open(path: &Path) -> Result<Self>

Open and parse a GGUF v3 file.

This reads the full header (magic, version, tensor count, metadata KV pairs, tensor info entries) but does not read any tensor data. Tensor data is loaded lazily via load_tensor or load_tensor_f32.

§Errors

Returns MlxError::IoError if the file cannot be opened. Returns MlxError::GgufParseError if the file is not valid GGUF v3.
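To make the header read concrete, here is a minimal, standard-library-only sketch of the fixed-size fields at the start of a GGUF v3 file (magic, version, tensor count, metadata KV count, all little-endian). The `GgufHeader` struct and `read_gguf_header` function are hypothetical names used for illustration; they are not part of this crate, and the sketch uses `std::io::Error` rather than `MlxError`. It stops before the variable-length metadata and tensor info entries.

```rust
use std::io::{self, Read};

/// Fixed-size fields at the start of a GGUF v3 file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read and validate the fixed-size GGUF header from any reader.
/// No metadata, tensor info, or tensor data is consumed here.
pub fn read_gguf_header<R: Read>(reader: &mut R) -> io::Result<GgufHeader> {
    // The file starts with the 4-byte ASCII magic "GGUF".
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad GGUF magic"));
    }

    // Version is a little-endian u32; this sketch accepts only v3.
    let mut b4 = [0u8; 4];
    reader.read_exact(&mut b4)?;
    let version = u32::from_le_bytes(b4);
    if version != 3 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unsupported GGUF version",
        ));
    }

    // Tensor count and metadata KV count are little-endian u64 values.
    let mut b8 = [0u8; 8];
    reader.read_exact(&mut b8)?;
    let tensor_count = u64::from_le_bytes(b8);
    reader.read_exact(&mut b8)?;
    let metadata_kv_count = u64::from_le_bytes(b8);

    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```

Because the function is generic over `Read`, it can be exercised against an in-memory byte slice as well as a `std::fs::File`.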

pub fn metadata(&self, key: &str) -> Option<&MetadataValue>

Look up a metadata value by key.

pub fn metadata_string(&self, key: &str) -> Option<&str>

Look up a metadata string value by key.

pub fn metadata_u32(&self, key: &str) -> Option<u32>

Look up a metadata u32 value by key.

pub fn metadata_f32(&self, key: &str) -> Option<f32>

Look up a metadata f32 value by key.
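The four metadata accessors follow a common pattern: one untyped lookup plus typed convenience getters. The sketch below models that pattern with stand-in types; the `MetadataValue` enum shown here has only three variants and the `Metadata` struct is hypothetical, so neither should be read as this crate's actual definitions. It also assumes that a typed getter returns `None` when the key exists but holds a different type, which the documentation above does not state.

```rust
use std::collections::HashMap;

/// Stand-in for the crate's metadata value type (assumed, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum MetadataValue {
    U32(u32),
    F32(f32),
    String(String),
}

/// Hypothetical container mirroring the accessor shapes documented above.
pub struct Metadata {
    entries: HashMap<String, MetadataValue>,
}

impl Metadata {
    pub fn new(entries: HashMap<String, MetadataValue>) -> Self {
        Self { entries }
    }

    /// Untyped lookup: returns the stored value, whatever its type.
    pub fn get(&self, key: &str) -> Option<&MetadataValue> {
        self.entries.get(key)
    }

    /// Typed lookup: `None` if the key is missing or is not a string.
    pub fn get_string(&self, key: &str) -> Option<&str> {
        match self.get(key)? {
            MetadataValue::String(s) => Some(s.as_str()),
            _ => None,
        }
    }

    /// Typed lookup: `None` if the key is missing or is not a u32.
    pub fn get_u32(&self, key: &str) -> Option<u32> {
        match self.get(key)? {
            MetadataValue::U32(v) => Some(*v),
            _ => None,
        }
    }

    /// Typed lookup: `None` if the key is missing or is not an f32.
    pub fn get_f32(&self, key: &str) -> Option<f32> {
        match self.get(key)? {
            MetadataValue::F32(v) => Some(*v),
            _ => None,
        }
    }
}
```

Keys such as `general.architecture` (string) and `llama.block_count` (u32) are typical of GGUF model files and are the kind of values these getters are meant for.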

pub fn tensor_names(&self) -> Vec<&str>

Return the names of all tensors in the file.

pub fn tensor_info(&self, name: &str) -> Option<&TensorInfo>

Look up info for a specific tensor by name.

pub fn tensor_count(&self) -> usize

Number of tensors in the file.

pub fn metadata_count(&self) -> usize

Number of metadata key-value pairs.

pub fn load_tensor(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>

Load a tensor as a raw buffer on the Metal device.

For quantized types (Q4_0, Q8_0, Q4_K, Q6_K) the buffer contains raw GGML blocks with dtype U8 — these are consumed directly by quantized_matmul_ggml kernels.

For F32 and F16 tensors the buffer has the corresponding typed dtype.
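Since quantized tensors are uploaded as raw GGML blocks, the size of the resulting U8 buffer follows from the block layout of each type. The sketch below computes that size from the standard GGML block sizes (for example, a Q4_0 block packs 32 elements into 18 bytes). The `GgmlType` enum and `raw_byte_len` function are hypothetical names for illustration and are not this crate's API.

```rust
/// The tensor types named in the documentation above (hypothetical enum).
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GgmlType {
    F32,
    F16,
    Q4_0,
    Q8_0,
    Q4_K,
    Q6_K,
}

impl GgmlType {
    /// (elements per block, bytes per block) for each type,
    /// following the standard GGML block layouts.
    pub fn block_layout(self) -> (usize, usize) {
        match self {
            GgmlType::F32 => (1, 4),
            GgmlType::F16 => (1, 2),
            GgmlType::Q4_0 => (32, 18),   // f16 scale + 16 bytes of 4-bit quants
            GgmlType::Q8_0 => (32, 34),   // f16 scale + 32 bytes of 8-bit quants
            GgmlType::Q4_K => (256, 144), // 256-element super-block
            GgmlType::Q6_K => (256, 210), // 256-element super-block
        }
    }
}

/// Number of raw bytes needed to store `n_elements` values of type `ty`.
/// Returns `None` if the element count is not a whole number of blocks.
pub fn raw_byte_len(ty: GgmlType, n_elements: usize) -> Option<usize> {
    let (elems_per_block, bytes_per_block) = ty.block_layout();
    if n_elements % elems_per_block != 0 {
        return None;
    }
    Some(n_elements / elems_per_block * bytes_per_block)
}
```

For a 4096-element row, this gives 2304 bytes for Q4_0 versus 16384 bytes for F32, which is why the quantized kernels consume the blocks directly instead of an expanded copy.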

§Errors

Returns an error if the tensor name is not found, or if reading fails.

pub fn load_tensor_f32(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>

Load a tensor, dequantize it to F32 on the CPU, then upload the result to the Metal device.

This is used for norm weights, embedding tables, and other tensors where the inference kernels operate on F32 directly.

§Errors

Returns an error if the tensor name is not found, reading fails, or dequantization encounters malformed data.
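As an illustration of what CPU-side dequantization involves, the sketch below expands Q8_0 blocks to F32 using the standard GGML layout: each 34-byte block holds a little-endian f16 scale followed by 32 signed 8-bit quants, and each output value is `scale * quant`. The function names are hypothetical, the error type is a plain `String` rather than `MlxError`, and this is not necessarily how the crate implements the step.

```rust
pub const Q8_0_BLOCK_ELEMS: usize = 32;
pub const Q8_0_BLOCK_BYTES: usize = 34;

/// Convert IEEE 754 half-precision bits to f32 without external crates.
pub fn f16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x03ff) as f32;
    match exp {
        // Zero and subnormals: frac * 2^-24.
        0 => sign * frac * 2f32.powi(-24),
        // Infinity or NaN.
        0x1f => {
            if frac == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: (1 + frac/1024) * 2^(exp - 15).
        _ => sign * (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Dequantize raw Q8_0 blocks into f32 values.
/// Fails if the input is not a whole number of 34-byte blocks.
pub fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % Q8_0_BLOCK_BYTES != 0 {
        return Err(format!(
            "Q8_0 data length {} is not a multiple of {}",
            raw.len(),
            Q8_0_BLOCK_BYTES
        ));
    }
    let n_blocks = raw.len() / Q8_0_BLOCK_BYTES;
    let mut out = Vec::with_capacity(n_blocks * Q8_0_BLOCK_ELEMS);
    for block in raw.chunks_exact(Q8_0_BLOCK_BYTES) {
        // First two bytes: the per-block scale as little-endian f16.
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Remaining 32 bytes: signed 8-bit quantized values.
        for &q in &block[2..] {
            out.push(scale * (q as i8) as f32);
        }
    }
    Ok(out)
}
```

The length check is one example of the "malformed data" condition mentioned above: a truncated block cannot be decoded, so the sketch reports an error instead of producing a short output.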

pub fn load_tensor_into_pool(&self, name: &str, device: &MlxDevice, pool: &mut MlxBufferPool) -> Result<MlxBuffer>

Load a tensor and register its underlying Metal buffer with pool’s residency set, returning the MlxBuffer to the caller.

This is functionally equivalent to:

let buf = gguf.load_tensor(name, device)?;
pool.register_existing(device, &buf)?;

but exists as a single call so callers don’t need to reach for the underlying MlxBufferPool::register_existing API directly. See that method’s docs for the residency-set ownership contract.

§Why a separate method instead of a `pool` parameter on `load_tensor`

load_tensor has stable callers across the codebase that pass only &MlxDevice; adding pool registration as a separate method leaves the existing signature unchanged, so those callers keep compiling without modification.

§Note on bucket-rounding

The buffer is allocated at exactly info.byte_len via MlxDevice::alloc_buffer (no bucket-rounding) and added to the pool’s residency set only — it is not placed in the recycling free list. This is the path hf2q’s static weight loader uses to gain MTLResidencySet hints without paying the 48% bucket-rounding tax that would have inflated 17 GB of weights to 25 GB.
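To see where a bucket-rounding overhead comes from, the sketch below compares exact allocation sizes with sizes rounded up to a bucket. It assumes a power-of-two bucket policy purely for illustration; the actual bucket scheme of MlxBufferPool is not described here and may differ, and both function names are hypothetical. The overhead any real model sees depends on its tensor sizes.

```rust
/// Round a requested size up to its bucket.
/// ASSUMPTION: buckets are powers of two; the real pool policy may differ.
pub fn bucketed_len(byte_len: u64) -> u64 {
    byte_len.next_power_of_two()
}

/// Percentage of extra memory consumed when every allocation in `sizes`
/// is rounded up to its bucket, relative to exact-size allocation.
pub fn overhead_percent(sizes: &[u64]) -> f64 {
    let exact: u64 = sizes.iter().sum();
    if exact == 0 {
        return 0.0;
    }
    let rounded: u64 = sizes.iter().map(|&n| bucketed_len(n)).sum();
    (rounded - exact) as f64 / exact as f64 * 100.0
}
```

Under this assumed policy, a 3072-byte tensor occupies a 4096-byte bucket while a 4096-byte tensor fits exactly, so sizes that fall just above a bucket boundary dominate the overhead. Allocating at the exact byte length avoids the rounding entirely, which matters for static weights that are never returned to the free list.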