Struct GgufFile 

pub struct GgufFile { /* private fields */ }

A parsed GGUF file, ready for lazy tensor loading.

The file is kept open so that tensor data can be read on demand via load_tensor and load_tensor_f32.

Implementations§

impl GgufFile

pub fn open(path: &Path) -> Result<Self>

Open and parse a GGUF v3 file.

This reads the full header (magic, version, tensor count, metadata KV pairs, tensor info entries) but does not read any tensor data. Tensor data is loaded lazily via load_tensor or load_tensor_f32.

§Errors

Returns MlxError::IoError if the file cannot be opened. Returns MlxError::GgufParseError if the file is not valid GGUF v3.
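As a rough illustration of what "reading the header" involves, the sketch below parses only the fixed-size prefix of a GGUF file (magic, version, tensor count, metadata KV count) using the standard library. It is not this crate's implementation: `GgufHeader` and `read_header_prefix` are hypothetical names, errors are reported as `std::io::Error` rather than `MlxError`, and little-endian byte order is assumed.

```rust
use std::io::{self, Read};

// The four ASCII bytes that start every GGUF file.
const GGUF_MAGIC: [u8; 4] = *b"GGUF";

// Hypothetical holder for the fixed-size header prefix (not the crate's type).
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

// Reads magic, version, tensor count and metadata KV count, in that order.
// Metadata pairs and tensor info entries would follow; tensor data is not touched.
fn read_header_prefix(r: &mut impl Read) -> io::Result<GgufHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if magic != GGUF_MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad GGUF magic"));
    }
    let version = read_u32(r)?;
    if version != 3 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported GGUF version {version}"),
        ));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64(r)?,
        metadata_kv_count: read_u64(r)?,
    })
}
```

Any `Read` source works here, for example a `std::fs::File` or an in-memory `std::io::Cursor`.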

pub fn metadata(&self, key: &str) -> Option<&MetadataValue>

Look up a metadata value by key.

pub fn metadata_string(&self, key: &str) -> Option<&str>

Look up a metadata string value by key.

pub fn metadata_u32(&self, key: &str) -> Option<u32>

Look up a metadata u32 value by key.

pub fn metadata_f32(&self, key: &str) -> Option<f32>

Look up a metadata f32 value by key.
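The typed accessors can be pictured as thin wrappers over the untyped lookup. The sketch below is an assumption-heavy illustration, not this crate's code: this page does not show the variants of `MetadataValue`, so the enum here is a hypothetical three-variant stand-in, `Metadata` is a hypothetical container, and returning `None` on a type mismatch is assumed rather than documented.

```rust
use std::collections::HashMap;

// Hypothetical stand-in; the real MetadataValue variants are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
enum MetadataValue {
    U32(u32),
    F32(f32),
    String(String),
}

// Hypothetical container playing the role of the parsed metadata table.
struct Metadata {
    entries: HashMap<String, MetadataValue>,
}

impl Metadata {
    // Untyped lookup: None when the key is absent.
    fn get(&self, key: &str) -> Option<&MetadataValue> {
        self.entries.get(key)
    }

    // Typed lookups: None when the key is absent OR (by assumption)
    // when the stored value has a different type.
    fn get_u32(&self, key: &str) -> Option<u32> {
        match self.get(key)? {
            MetadataValue::U32(v) => Some(*v),
            _ => None,
        }
    }

    fn get_f32(&self, key: &str) -> Option<f32> {
        match self.get(key)? {
            MetadataValue::F32(v) => Some(*v),
            _ => None,
        }
    }

    fn get_str(&self, key: &str) -> Option<&str> {
        match self.get(key)? {
            MetadataValue::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
}
```

Under that assumption, a caller cannot distinguish "key missing" from "wrong type" through a typed accessor alone; the untyped lookup would be needed for that.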

pub fn tensor_names(&self) -> Vec<&str>

Return the names of all tensors in the file.

pub fn tensor_info(&self, name: &str) -> Option<&TensorInfo>

Look up info for a specific tensor by name.

pub fn tensor_count(&self) -> usize

Number of tensors in the file.

pub fn metadata_count(&self) -> usize

Number of metadata key-value pairs.

pub fn load_tensor(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>

Load a tensor as a raw buffer on the Metal device.

For quantized types (Q4_0, Q8_0, Q4_K, Q6_K) the buffer contains the raw GGML blocks with dtype U8, which the quantized_matmul_ggml kernels consume directly.

For F32 and F16 tensors the buffer has the corresponding typed dtype.

§Errors

Returns an error if the tensor name is not found, or if reading fails.
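To give a sense of how large such a raw buffer is, the sketch below computes the byte length of a tensor from its element count, using the standard GGML block layouts for the types listed above. `GgmlType`, `block_layout` and `raw_byte_len` are hypothetical names for illustration; the crate's own type enum and size logic are not shown on this page.

```rust
// Hypothetical enum covering only the types mentioned above.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum GgmlType {
    F32,
    F16,
    Q4_0,
    Q8_0,
    Q4_K,
    Q6_K,
}

// Returns (elements per block, bytes per block) for each type.
fn block_layout(ty: GgmlType) -> (u64, u64) {
    match ty {
        GgmlType::F32 => (1, 4),
        GgmlType::F16 => (1, 2),
        // f16 scale + 32 packed 4-bit values
        GgmlType::Q4_0 => (32, 18),
        // f16 scale + 32 signed 8-bit values
        GgmlType::Q8_0 => (32, 34),
        // 256-element super-blocks
        GgmlType::Q4_K => (256, 144),
        GgmlType::Q6_K => (256, 210),
    }
}

// Byte length of the raw data for `n_elements` values of type `ty`.
// Returns None if the element count is not a whole number of blocks.
fn raw_byte_len(ty: GgmlType, n_elements: u64) -> Option<u64> {
    let (block_elems, block_bytes) = block_layout(ty);
    if n_elements % block_elems != 0 {
        return None;
    }
    (n_elements / block_elems).checked_mul(block_bytes)
}
```

For example, a Q4_0 tensor with 4096 elements occupies 128 blocks of 18 bytes, or 2304 bytes, compared with 16384 bytes for the same tensor in F32.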

pub fn load_tensor_f32(&self, name: &str, device: &MlxDevice) -> Result<MlxBuffer>

Load a tensor, dequantize it to F32 on the CPU, and upload the result to the Metal device.

This is used for norm weights, embedding tables, and other tensors where the inference kernels operate on F32 directly.

§Errors

Returns an error if the tensor name is not found, reading fails, or dequantization encounters malformed data.
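The CPU dequantization step can be sketched for the simplest case, Q8_0, where each 34-byte block holds a little-endian f16 scale followed by 32 signed 8-bit values, and each output is `scale * value`. This is a minimal standalone sketch, not the crate's code: `f16_to_f32` and `dequantize_q8_0` are hypothetical helpers, and the error is a plain `String` rather than `MlxError`.

```rust
const QK8_0: usize = 32; // elements per Q8_0 block
const Q8_0_BLOCK_BYTES: usize = 2 + QK8_0; // f16 scale + 32 x i8

// Converts IEEE 754 half-precision bits to f32 without external crates.
fn f16_to_f32(bits: u16) -> f32 {
    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);
    match exp {
        0 => {
            // Zero or subnormal: magnitude is frac * 2^-24.
            let magnitude = frac as f32 / 16_777_216.0;
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity or NaN.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal number: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

// Expands raw Q8_0 blocks into f32 values.
// Rejects input whose length is not a whole number of blocks.
fn dequantize_q8_0(raw: &[u8]) -> Result<Vec<f32>, String> {
    if raw.len() % Q8_0_BLOCK_BYTES != 0 {
        return Err(format!(
            "Q8_0 data length {} is not a multiple of {}",
            raw.len(),
            Q8_0_BLOCK_BYTES
        ));
    }
    let mut out = Vec::with_capacity(raw.len() / Q8_0_BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(Q8_0_BLOCK_BYTES) {
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        for &q in &block[2..] {
            // Reinterpret the byte as a signed 8-bit quant.
            out.push(scale * f32::from(q as i8));
        }
    }
    Ok(out)
}
```

The length check is one example of the kind of malformed data the error case above might cover; the other quantized formats follow the same pattern with more involved block layouts.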

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.