pub struct GgufFile { /* private fields */ }
A parsed GGUF file, ready for lazy tensor loading.
The file is kept open so that tensor data can be read on demand via load_tensor and load_tensor_f32.
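As a rough illustration of the lazy-loading idea, the sketch below keeps a reader open and seeks to a tensor's bytes only when they are requested. LazyTensorReader, data_start and read_tensor_bytes are hypothetical names, not part of this crate. The sketch takes &mut self for simplicity, whereas GgufFile's loaders take &self and so presumably rely on a mechanism (such as positioned reads or interior mutability) that this page does not document.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical lazy reader: keeps the source open and reads a tensor's
/// bytes only on request, by seeking to its offset in the data section.
struct LazyTensorReader<R: Read + Seek> {
    source: R,
    /// Absolute offset where the tensor data section starts
    /// (assumed to be known once the header has been parsed).
    data_start: u64,
}

impl<R: Read + Seek> LazyTensorReader<R> {
    fn new(source: R, data_start: u64) -> Self {
        Self { source, data_start }
    }

    /// Read `len` bytes located `offset` bytes into the data section.
    /// Fails with an I/O error if the range runs past the end of the source.
    fn read_tensor_bytes(&mut self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        self.source.seek(SeekFrom::Start(self.data_start + offset))?;
        let mut buf = vec![0u8; len];
        self.source.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Convenience constructor over an in-memory buffer, handy for experiments.
fn in_memory(bytes: Vec<u8>, data_start: u64) -> LazyTensorReader<Cursor<Vec<u8>>> {
    LazyTensorReader::new(Cursor::new(bytes), data_start)
}
```

For example, with data_start = 4 over the bytes [0xAA, 0xBB, 0xCC, 0xDD, 1, 2, 3, 4, 5, 6], read_tensor_bytes(2, 3) returns [3, 4, 5]; nothing else is read until another tensor is requested.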
Implementations
impl GgufFile
pub fn open(path: &Path) -> Result<Self>
Open and parse a GGUF v3 file.
This reads the full header (magic, version, tensor count, metadata key-value pairs, tensor info entries) but does not read any tensor data.
Tensor data is loaded lazily via load_tensor or load_tensor_f32.
Errors
Returns MlxError::IoError if the file cannot be opened.
Returns MlxError::GgufParseError if the file is not valid GGUF v3.
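The fixed-size prefix that open validates can be sketched as follows, assuming the standard GGUF layout: the four ASCII bytes "GGUF", a little-endian u32 version, then little-endian u64 tensor and metadata key-value counts. parse_header and GgufHeader are illustrative names. The sketch returns String errors rather than MlxError, and it stops before the metadata pairs and tensor info entries that the real parser goes on to read.

```rust
/// GGUF magic: the ASCII bytes "GGUF" at the very start of the file.
const GGUF_MAGIC: &[u8; 4] = b"GGUF";

/// Fixed-size fields at the start of a GGUF v3 file (hypothetical type).
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Parse the 24-byte little-endian header prefix.
/// Illustrative only: a full parser continues with the KV pairs and tensor infos.
fn parse_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("file too short for a GGUF header".to_string());
    }
    if !bytes.starts_with(GGUF_MAGIC) {
        return Err("bad magic: not a GGUF file".to_string());
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    if version != 3 {
        return Err(format!("unsupported GGUF version {version}"));
    }
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```

Checking the magic and version before anything else means a truncated or foreign file is rejected early, which corresponds to the GgufParseError case above.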
pub fn metadata(&self, key: &str) -> Option<&MetadataValue>
Look up a metadata value by key.
pub fn metadata_string(&self, key: &str) -> Option<&str>
Look up a metadata string value by key.
pub fn metadata_u32(&self, key: &str) -> Option<u32>
Look up a metadata u32 value by key.
pub fn metadata_f32(&self, key: &str) -> Option<f32>
Look up a metadata f32 value by key.
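One plausible shape for the typed accessors is a plain lookup followed by a variant match, so that a missing key and a key holding a different type both yield None. MetaValue and Metadata are hypothetical stand-ins for MetadataValue and GgufFile; whether the real metadata_u32 also accepts other integer widths is not stated on this page.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for MetadataValue; the real enum's variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum MetaValue {
    U32(u32),
    F32(f32),
    Str(String),
}

/// Hypothetical metadata store keyed by GGUF metadata key.
struct Metadata {
    kv: HashMap<String, MetaValue>,
}

impl Metadata {
    /// Untyped lookup, analogous to `metadata`.
    fn get(&self, key: &str) -> Option<&MetaValue> {
        self.kv.get(key)
    }

    /// Some only when the key exists and holds a U32.
    fn get_u32(&self, key: &str) -> Option<u32> {
        match self.get(key)? {
            MetaValue::U32(v) => Some(*v),
            _ => None,
        }
    }

    /// Some only when the key exists and holds an F32.
    fn get_f32(&self, key: &str) -> Option<f32> {
        match self.get(key)? {
            MetaValue::F32(v) => Some(*v),
            _ => None,
        }
    }

    /// Borrows the string in place rather than cloning it.
    fn get_str(&self, key: &str) -> Option<&str> {
        match self.get(key)? {
            MetaValue::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}
```

Returning Option<&str> for strings lets callers read values such as general.name without allocating.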
pub fn tensor_names(&self) -> Vec<&str>
Return the names of all tensors in the file.
pub fn tensor_info(&self, name: &str) -> Option<&TensorInfo>
Look up info for a specific tensor by name.
pub fn tensor_count(&self) -> usize
Number of tensors in the file.
pub fn metadata_count(&self) -> usize
Number of metadata key-value pairs.
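A minimal sketch of how the tensor lookups could be backed, assuming tensors are kept in file order alongside a name-to-index map. TensorDesc and TensorIndex are hypothetical; the real TensorInfo fields and the order in which tensor_names returns names are not specified on this page.

```rust
use std::collections::HashMap;

/// Hypothetical tensor descriptor; the real TensorInfo fields may differ.
#[allow(dead_code)]
#[derive(Debug)]
struct TensorDesc {
    name: String,
    dims: Vec<u64>,
    /// Byte offset of the tensor within the data section.
    offset: u64,
}

/// Keeps tensors in file order and indexes them by name for O(1) lookup.
struct TensorIndex {
    tensors: Vec<TensorDesc>,
    by_name: HashMap<String, usize>,
}

impl TensorIndex {
    fn new(tensors: Vec<TensorDesc>) -> Self {
        let by_name = tensors
            .iter()
            .enumerate()
            .map(|(i, t)| (t.name.clone(), i))
            .collect();
        Self { tensors, by_name }
    }

    /// Analogous to `tensor_names`: borrowed names, in file order.
    fn names(&self) -> Vec<&str> {
        self.tensors.iter().map(|t| t.name.as_str()).collect()
    }

    /// Analogous to `tensor_info`.
    fn info(&self, name: &str) -> Option<&TensorDesc> {
        self.by_name.get(name).map(|&i| &self.tensors[i])
    }

    /// Analogous to `tensor_count`.
    fn count(&self) -> usize {
        self.tensors.len()
    }
}
```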
pub fn load_tensor(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>
Load a tensor as a raw buffer on the Metal device.
For quantized types (Q4_0, Q8_0, Q4_K, Q6_K), the buffer contains raw GGML blocks with dtype U8; these are consumed directly by the quantized_matmul_ggml kernels.
For F32 and F16 tensors, the buffer has the corresponding typed dtype.
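Because quantized tensors are uploaded as raw GGML blocks, the buffer length follows from each type's block layout. The sketch below uses the standard GGML block sizes (for example, a Q8_0 block is a 2-byte f16 scale plus 32 signed bytes covering 32 elements). GgmlType and tensor_byte_len are illustrative names, and the sketch assumes, as GGML does, that the first listed dimension is a multiple of the block size.

```rust
/// Subset of GGML tensor types named above (hypothetical enum).
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy)]
enum GgmlType {
    F32,
    F16,
    Q4_0,
    Q8_0,
    Q4_K,
    Q6_K,
}

impl GgmlType {
    /// (elements per block, bytes per block) following GGML's layouts.
    fn block_layout(self) -> (u64, u64) {
        match self {
            GgmlType::F32 => (1, 4),
            GgmlType::F16 => (1, 2),
            GgmlType::Q4_0 => (32, 18),   // f16 scale + 16 bytes of 4-bit quants
            GgmlType::Q8_0 => (32, 34),   // f16 scale + 32 i8 quants
            GgmlType::Q4_K => (256, 144), // 2 f16 (d, dmin) + 12 scale bytes + 128 quant bytes
            GgmlType::Q6_K => (256, 210), // 128 low bits + 64 high bits + 16 scales + f16 d
        }
    }
}

/// Bytes a tensor occupies on disk, and therefore in the U8 buffer.
/// Returns None if rows are not a whole number of blocks.
fn tensor_byte_len(ty: GgmlType, dims: &[u64]) -> Option<u64> {
    let (block_elems, block_bytes) = ty.block_layout();
    // GGML quantizes along the innermost dimension, which GGUF lists first.
    let row_len = dims.first().copied().unwrap_or(1);
    if row_len % block_elems != 0 {
        return None;
    }
    let n_elems: u64 = dims.iter().product();
    Some(n_elems / block_elems * block_bytes)
}
```

For a 4096 × 4096 Q4_0 weight this gives 524,288 blocks of 18 bytes, or 9,437,184 bytes, versus 67,108,864 bytes for the same tensor in F32.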
Errors
Returns an error if the tensor name is not found, or if reading fails.
pub fn load_tensor_f32(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>
Load a tensor, dequantize it to F32 on the CPU, and upload the result to the Metal device.
This is used for norm weights, embedding tables, and other tensors that the inference kernels consume as F32 directly.
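As an example of CPU-side dequantization, here is a sketch for Q8_0, the simplest of the quantized formats listed above: each 34-byte block holds a little-endian f16 scale d followed by 32 signed 8-bit values q, and element i decodes to d × q[i]. The function names are hypothetical, the f16 conversion is hand-written to avoid external crates, and the other formats (Q4_0, Q4_K, Q6_K) use different block layouts that this sketch does not cover. Errors are plain Strings here rather than MlxError.

```rust
/// Convert an IEEE 754 half-precision bit pattern to f32.
fn f16_to_f32(h: u16) -> f32 {
    let sign = (h >> 15) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let mant = (h & 0x3ff) as u32;
    if exp == 0 {
        // Zero or subnormal: value = mant * 2^-24, exactly representable in f32.
        let v = mant as f32 * f32::powi(2.0, -24);
        return if sign == 1 { -v } else { v };
    }
    let bits = if exp == 0x1f {
        (sign << 31) | 0x7f80_0000 | (mant << 13) // infinity or NaN
    } else {
        (sign << 31) | ((exp + 112) << 23) | (mant << 13) // rebias exponent 15 -> 127
    };
    f32::from_bits(bits)
}

/// Elements per Q8_0 block, and bytes per block (f16 scale + 32 i8).
const QK8_0: usize = 32;
const Q8_0_BLOCK_BYTES: usize = 2 + QK8_0;

/// Dequantize raw Q8_0 blocks to f32, rejecting a partial trailing block.
fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % Q8_0_BLOCK_BYTES != 0 {
        return Err(format!("{} bytes is not a whole number of Q8_0 blocks", raw.len()));
    }
    let mut out = Vec::with_capacity(raw.len() / Q8_0_BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(Q8_0_BLOCK_BYTES) {
        let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Reinterpret each quant byte as i8 before scaling.
        out.extend(block[2..].iter().map(|&q| d * (q as i8) as f32));
    }
    Ok(out)
}
```

The length check mirrors the "malformed data" error case below: a buffer that does not divide evenly into blocks is rejected instead of being partially decoded.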
Errors
Returns an error if the tensor name is not found, reading fails, or dequantization encounters malformed data.