pub struct GgufFile { /* private fields */ }
A parsed GGUF file, ready for lazy tensor loading.
The file is kept open so that tensor data can be read on demand via
load_tensor and
load_tensor_f32.
Implementations
impl GgufFile
pub fn open(path: &Path) -> Result<Self>
Open and parse a GGUF v3 file.
This reads the full header (magic, version, tensor count, metadata KV
pairs, tensor info entries) but does not read any tensor data.
Tensor data is loaded lazily via load_tensor or
load_tensor_f32.
§Errors
Returns MlxError::IoError if the file cannot be opened.
Returns MlxError::GgufParseError if the file is not valid GGUF v3.
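The fixed-size start of the header described above can be sketched with the standard library alone. This is a minimal illustration, not this crate's parser: `HeaderPrefix` and `read_header_prefix` are hypothetical names, and the sketch assumes a little-endian file and stops before the metadata KV pairs and tensor info entries.

```rust
use std::io::{self, Read};

/// Fixed-size fields at the very start of a GGUF file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct HeaderPrefix {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read the magic, version, tensor count and metadata KV count.
/// Assumes little-endian encoding; rejects anything that is not GGUF v3.
pub fn read_header_prefix<R: Read>(r: &mut R) -> io::Result<HeaderPrefix> {
    // The file must start with the four ASCII bytes "GGUF".
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad GGUF magic"));
    }

    // Format version as a 32-bit unsigned integer.
    let mut b4 = [0u8; 4];
    r.read_exact(&mut b4)?;
    let version = u32::from_le_bytes(b4);
    if version != 3 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported GGUF version"));
    }

    // Tensor count, then metadata KV count, both 64-bit unsigned integers.
    let mut b8 = [0u8; 8];
    r.read_exact(&mut b8)?;
    let tensor_count = u64::from_le_bytes(b8);
    r.read_exact(&mut b8)?;
    let metadata_kv_count = u64::from_le_bytes(b8);

    Ok(HeaderPrefix { version, tensor_count, metadata_kv_count })
}
```

Taking any `Read` rather than a `File` keeps the sketch testable against an in-memory byte slice.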
pub fn metadata(&self, key: &str) -> Option<&MetadataValue>
Look up a metadata value by key.
pub fn metadata_string(&self, key: &str) -> Option<&str>
Look up a metadata string value by key.
pub fn metadata_u32(&self, key: &str) -> Option<u32>
Look up a metadata u32 value by key.
pub fn metadata_f32(&self, key: &str) -> Option<f32>
Look up a metadata f32 value by key.
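The relationship between the untyped lookup and the typed accessors can be modelled as follows. Everything here is a hypothetical stand-in: `MetaValue` and `MetaTable` are not the crate's types, the real `MetadataValue` is not shown on this page and probably has more variants, and whether the real accessors coerce between numeric types is not stated. The sketch assumes they do not, so a type mismatch yields `None`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `MetadataValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    U32(u32),
    F32(f32),
    Str(String),
}

/// Hypothetical stand-in for the metadata table held by a parsed file.
pub struct MetaTable {
    entries: HashMap<String, MetaValue>,
}

impl MetaTable {
    pub fn new(entries: HashMap<String, MetaValue>) -> Self {
        Self { entries }
    }

    /// Untyped lookup: `None` only when the key is absent.
    pub fn get(&self, key: &str) -> Option<&MetaValue> {
        self.entries.get(key)
    }

    /// Typed lookup: `None` when the key is absent or holds another type.
    pub fn get_str(&self, key: &str) -> Option<&str> {
        match self.entries.get(key)? {
            MetaValue::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }

    pub fn get_u32(&self, key: &str) -> Option<u32> {
        match self.entries.get(key)? {
            MetaValue::U32(v) => Some(*v),
            _ => None,
        }
    }

    pub fn get_f32(&self, key: &str) -> Option<f32> {
        match self.entries.get(key)? {
            MetaValue::F32(v) => Some(*v),
            _ => None,
        }
    }
}
```

Under this model a caller that gets `None` from a typed accessor can fall back to the untyped lookup to tell a missing key apart from a type mismatch.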
pub fn tensor_names(&self) -> Vec<&str>
Return the names of all tensors in the file.
pub fn tensor_info(&self, name: &str) -> Option<&TensorInfo>
Look up info for a specific tensor by name.
pub fn tensor_count(&self) -> usize
Number of tensors in the file.
pub fn metadata_count(&self) -> usize
Number of metadata key-value pairs.
pub fn load_tensor(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>
Load a tensor as a raw buffer on the Metal device.
For quantized types (Q4_0, Q8_0, Q4_K, Q6_K) the buffer contains raw
GGML blocks with dtype U8 — these are consumed directly by
quantized_matmul_ggml kernels.
For F32 and F16 tensors the buffer has the corresponding typed dtype.
§Errors
Returns an error if the tensor name is not found, or if reading fails.
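The on-demand read behind a lazy load can be sketched as a seek followed by an exact-length read. This is an illustration under stated assumptions, not the crate's implementation: `align_up` and `read_tensor_bytes` are hypothetical helpers, and the upload to the Metal device is left out. In GGUF, each tensor's offset is relative to the start of the tensor-data section, which begins at an aligned position after the header (32 bytes unless the file overrides it).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Round `offset` up to the next multiple of `alignment` (hypothetical helper).
pub fn align_up(offset: u64, alignment: u64) -> u64 {
    (offset + alignment - 1) / alignment * alignment
}

/// Read one tensor's raw bytes on demand (hypothetical helper).
/// `data_start` is the absolute position of the tensor-data section;
/// `rel_offset` and `byte_len` would come from the tensor's info entry.
pub fn read_tensor_bytes<R: Read + Seek>(
    reader: &mut R,
    data_start: u64,
    rel_offset: u64,
    byte_len: usize,
) -> io::Result<Vec<u8>> {
    // Jump straight to the tensor; nothing before it is read.
    reader.seek(SeekFrom::Start(data_start + rel_offset))?;
    let mut buf = vec![0u8; byte_len];
    // `read_exact` fails with `UnexpectedEof` if the file is truncated.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

Because only the requested range is read, opening a multi-gigabyte file costs no more than parsing its header.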
pub fn load_tensor_f32(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>
Load a tensor, dequantize it to F32 on the CPU, then upload the result to the Metal device.
This is used for norm weights, embedding tables, and other tensors where the inference kernels operate on F32 directly.
§Errors
Returns an error if the tensor name is not found, reading fails, or dequantization encounters malformed data.
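As an illustration of the CPU dequantization step, here is a minimal sketch for the Q8_0 block format only. It assumes the GGML Q8_0 layout (a 2-byte little-endian f16 scale followed by 32 signed 8-bit quants, 34 bytes per block). The function names are hypothetical, and the crate's own routine, its error type, and its handling of the other quantized types are not shown on this page.

```rust
/// Elements per Q8_0 block.
pub const QK8_0: usize = 32;
/// Bytes per Q8_0 block: 2-byte f16 scale + 32 one-byte quants.
pub const Q8_0_BLOCK_BYTES: usize = 2 + QK8_0;

/// Convert IEEE 754 half-precision bits to `f32` (hypothetical helper).
pub fn f16_bits_to_f32(bits: u16) -> f32 {
    let b = bits as u32;
    let sign = (b & 0x8000) << 16;
    let exp = (b >> 10) & 0x1f;
    let frac = b & 0x3ff;
    match exp {
        // Zero and subnormals: value = frac * 2^-24.
        0 => {
            let magnitude = frac as f32 / 16_777_216.0;
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity and NaN keep an all-ones exponent.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal numbers: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Dequantize raw Q8_0 blocks to `f32` (hypothetical helper).
/// Returns an error if the input is not a whole number of blocks.
pub fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % Q8_0_BLOCK_BYTES != 0 {
        return Err(format!(
            "Q8_0 data length {} is not a multiple of {}",
            raw.len(),
            Q8_0_BLOCK_BYTES
        ));
    }
    let mut out = Vec::with_capacity(raw.len() / Q8_0_BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(Q8_0_BLOCK_BYTES) {
        // Per-block scale, then each quant reinterpreted as i8 and scaled.
        let d = f16_bits_to_f32(u16::from_le_bytes([block[0], block[1]]));
        out.extend(block[2..].iter().map(|&q| d * (q as i8) as f32));
    }
    Ok(out)
}
```

The length check is one example of the "malformed data" case mentioned in the Errors section: a buffer that does not divide into whole blocks cannot be decoded.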
pub fn load_tensor_into_pool(&self, name: &str, device: &MlxDevice, pool: &mut MlxBufferPool) -> Result<MlxBuffer>
Load a tensor and register its underlying Metal buffer with pool’s
residency set, returning the MlxBuffer to the caller.
This is functionally equivalent to:
    let buf = gguf.load_tensor(name, device)?;
    pool.register_existing(device, &buf)?;
but exists as a single call so callers don’t need to reach for the
underlying MlxBufferPool::register_existing API directly. See
that method’s docs for the residency-set ownership contract.
§Why a separate method instead of a pool parameter on load_tensor
load_tensor has stable callers across the codebase that pass only
&MlxDevice; exposing pool registration through a new method, rather than
a new parameter, leaves the existing signature unchanged, so those
callers keep compiling as they are.
§Note on bucket-rounding
The buffer is allocated at exactly info.byte_len via
MlxDevice::alloc_buffer (no
bucket-rounding) and added to the pool’s residency set only —
it is not placed in the recycling free list. This is the path
hf2q’s static weight loader uses to gain MTLResidencySet hints
without paying the 48% bucket-rounding tax that would have
inflated 17 GB of weights to 25 GB.
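To see how bucket-rounding inflates memory, the sketch below compares exact-size allocation with a bucketed one. The bucket policy is an assumption made for illustration: it rounds each size up to the next power of two, which may differ from what MlxBufferPool actually does, and both function names are hypothetical.

```rust
/// Hypothetical bucket policy: round each allocation up to the next
/// power of two. The pool's real bucket sizes are not documented here.
pub fn bucket_size(byte_len: u64) -> u64 {
    byte_len.next_power_of_two()
}

/// Return (exact total, bucketed total, fractional overhead) for a set
/// of tensor sizes in bytes.
pub fn rounding_overhead(sizes: &[u64]) -> (u64, u64, f64) {
    let exact: u64 = sizes.iter().sum();
    let bucketed: u64 = sizes.iter().map(|&s| bucket_size(s)).sum();
    let overhead = if exact == 0 {
        0.0
    } else {
        (bucketed - exact) as f64 / exact as f64
    };
    (exact, bucketed, overhead)
}
```

Under this policy, a 3 MiB tensor and a 5 MiB tensor occupy 4 MiB and 8 MiB buckets, so 8 MiB of weights consume 12 MiB, a 50% overhead. Static weights are never recycled, so the extra space buys nothing, which is the motivation for exact-size allocation here.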
§Errors
Same as load_tensor, plus any
MlxError::InvalidArgument from
MlxBufferPool::register_existing.