mlx_native::weight

Struct QuantizedWeight

pub struct QuantizedWeight { /* private fields */ }

A quantized weight tensor loaded into Metal GPU buffers.

Tracks the tensor name, logical shape, original dtype, quantization parameters, and the Metal buffers holding the packed data, scales, and optional biases.

Layout

packed_data: Packed quantized integers (e.g. 4-bit values packed 8 per uint32, or 6-bit values packed 4 per uint32).
scales: Per-group scale factors, stored as f16 values.
biases: Per-group biases, stored as f16 values (present for affine quantization).
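
To make the packing concrete, the following is a minimal sketch of unpacking fixed-width values from 32-bit words. It is not taken from mlx_native: the helper name unpack, the lowest-bits-first ordering, and the assumption that values never straddle a word boundary are all illustrative guesses, and the real bit order should be checked against the kernels that consume the buffer.

```rust
/// Hypothetical helper: unpack `count` values of width `bits` from `words`.
/// Assumes value i sits in word i / values_per_word, at bit offset
/// (i % values_per_word) * bits, counted from the least significant bit.
fn unpack(words: &[u32], bits: u32, values_per_word: usize, count: usize) -> Vec<u32> {
    assert!(bits >= 1 && bits < 32, "bit-width must be in 1..=31");
    assert!(values_per_word >= 1, "need at least one value per word");
    assert!(values_per_word as u32 * bits <= 32, "values must fit in one word");
    let mask = (1u32 << bits) - 1;
    (0..count)
        .map(|i| {
            let word = words[i / values_per_word];
            let shift = (i % values_per_word) as u32 * bits;
            (word >> shift) & mask
        })
        .collect()
}

fn main() {
    // Eight 4-bit values in one word: the lowest nibble holds the first value.
    let words = [0x8765_4321u32];
    println!("{:?}", unpack(&words, 4, 8, 8)); // prints [1, 2, 3, 4, 5, 6, 7, 8]

    // Four 6-bit values in one word, using the low 24 bits only.
    let six = [1u32 | (2 << 6) | (3 << 12) | (4 << 18)];
    println!("{:?}", unpack(&six, 6, 4, 4)); // prints [1, 2, 3, 4]
}
```

Passing values_per_word explicitly, rather than deriving it as 32 / bits, keeps the sketch consistent with the 6-bit case described above, where only four values share a word.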

Implementations

impl QuantizedWeight

pub fn new( tensor_name: String, shape: Vec<usize>, dtype: DType, bits: u8, group_size: usize, scales: MlxBuffer, biases: Option<MlxBuffer>, packed_data: MlxBuffer, ) -> Self

Construct a new QuantizedWeight with all fields specified.

This is the primary constructor used by load_quantized_weights. It does not validate buffer sizes; the caller is responsible for ensuring that the buffers match the declared shape, bits, and group_size.
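
Because new performs no validation, a caller may want to compute the sizes it expects before constructing the value. The sketch below shows one way to do that on plain integers. Everything in it is an assumption rather than documented behavior: the names ExpectedSizes and expected_sizes are hypothetical, rows are assumed to be packed independently along the last dimension, each packed word is taken as 4 bytes, and each f16 scale as 2 bytes (a biases buffer, when present, would presumably match the scales size).

```rust
/// Hypothetical summary of the byte sizes a caller might expect.
#[derive(Debug, PartialEq)]
struct ExpectedSizes {
    packed_bytes: usize,
    scales_bytes: usize,
}

/// Hypothetical helper: expected buffer sizes for a logical `shape`.
/// Returns None for an empty shape or a zero divisor.
fn expected_sizes(
    shape: &[usize],
    group_size: usize,
    values_per_word: usize,
) -> Option<ExpectedSizes> {
    let (&last_dim, leading) = shape.split_last()?;
    if group_size == 0 || values_per_word == 0 {
        return None;
    }
    // Product of all dimensions except the last (1 for a 1-D tensor).
    let rows: usize = leading.iter().product();
    let words_per_row = last_dim.div_ceil(values_per_word);
    let groups_per_row = last_dim.div_ceil(group_size);
    Some(ExpectedSizes {
        packed_bytes: rows * words_per_row * 4, // one u32 per packed word
        scales_bytes: rows * groups_per_row * 2, // one f16 per group
    })
}

fn main() {
    // A 4096 x 4096 weight, 4-bit (8 values per word), group size 64.
    let sizes = expected_sizes(&[4096, 4096], 64, 8);
    println!("{:?}", sizes);
}
```

The results can then be compared against the byte lengths of the buffers being passed to new, using whatever length accessor the buffer type provides.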

pub fn tensor_name(&self) -> &str

Full tensor name path.

pub fn shape(&self) -> &[usize]

Logical tensor shape (dimensions before quantization).

pub fn dtype(&self) -> DType

Original element dtype before quantization.

pub fn bits(&self) -> u8

Quantization bit-width.

pub fn group_size(&self) -> usize

Quantization group size.

pub fn scales(&self) -> &MlxBuffer

Borrow the per-group scales buffer.

pub fn biases(&self) -> Option<&MlxBuffer>

Borrow the per-group biases buffer, if present.
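
Since the biases buffer is optional, code that consumes a weight has to handle both cases. The sketch below illustrates the idea on a single value that has already been read back to the CPU and converted to f32. The function name dequantize is hypothetical, the formula w = q * scale + bias is the common affine convention rather than something this page states, and treating a missing bias as "no offset" is likewise an assumption.

```rust
/// Hypothetical helper: dequantize one quantized value `q`.
/// `bias` is None when the weight carries no biases buffer.
fn dequantize(q: u32, scale: f32, bias: Option<f32>) -> f32 {
    let scaled = q as f32 * scale;
    match bias {
        Some(b) => scaled + b, // affine: scale and offset
        None => scaled,        // assumed: scale only
    }
}

fn main() {
    println!("{}", dequantize(3, 0.5, Some(-1.0))); // prints 0.5
    println!("{}", dequantize(3, 0.5, None)); // prints 1.5
}
```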

pub fn packed_data(&self) -> &MlxBuffer

Borrow the packed quantized data buffer.

pub fn element_count(&self) -> usize

Number of logical elements in the weight tensor (product of shape dims).
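
As an illustration of "product of shape dims", the computation can be sketched on a plain slice. This is a standalone free function written for this example, not the crate's implementation; in particular, returning 1 for an empty shape is simply what an empty product yields here and may not match the crate's behavior.

```rust
/// Illustrative stand-in: number of logical elements for `shape`.
fn element_count(shape: &[usize]) -> usize {
    shape.iter().product()
}

fn main() {
    println!("{}", element_count(&[4096, 4096])); // prints 16777216
    println!("{}", element_count(&[2, 3, 4])); // prints 24
}
```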

pub fn num_groups(&self) -> usize

Number of quantization groups along the last dimension.

Computed as ceil(last_dim / group_size), where last_dim is the final entry of shape.
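
The ceiling division can be done in integer arithmetic, as in the sketch below. It is a standalone free function written for illustration; returning None for an empty shape or a zero group_size is an assumption, since this page does not say how the method treats those inputs.

```rust
/// Illustrative stand-in: groups along the last dimension of `shape`.
fn num_groups(shape: &[usize], group_size: usize) -> Option<usize> {
    let last_dim = *shape.last()?;
    if group_size == 0 {
        return None;
    }
    // Integer ceiling of last_dim / group_size.
    Some(last_dim.div_ceil(group_size))
}

fn main() {
    println!("{:?}", num_groups(&[4096, 4096], 64)); // prints Some(64)
    println!("{:?}", num_groups(&[8, 100], 64)); // prints Some(2)
}
```

The second call shows the rounding: a last dimension of 100 with a group size of 64 yields two groups, the second only partially filled.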

Trait Implementations

impl Debug for QuantizedWeight

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl Freeze for QuantizedWeight

impl RefUnwindSafe for QuantizedWeight

impl Send for QuantizedWeight

impl Sync for QuantizedWeight

impl Unpin for QuantizedWeight

impl UnsafeUnpin for QuantizedWeight

impl UnwindSafe for QuantizedWeight

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.