Struct QuantizedWeight 

pub struct QuantizedWeight { /* private fields */ }

A quantized weight tensor loaded into Metal GPU buffers.

Tracks the tensor name, logical shape, original dtype, quantization parameters, and the Metal buffers holding the packed data, scales, and optional biases.

§Layout

  • packed_data — Packed quantized integers (e.g. 4-bit values packed 8-per-uint32, or 6-bit values packed 4-per-uint32).
  • scales — Per-group scale factors as f16 values.
  • biases — Per-group biases as f16 values (present for affine quant).
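As a rough illustration of the layout above, the sketch below packs eight 4-bit values into one u32 and dequantizes a single value with a per-group scale and bias. The bit order (lowest nibble first), the affine formula w = scale * q + bias, the use of f32 in place of f16, and the function names are all assumptions made for illustration; the crate's actual packing and kernels may differ.

```rust
/// Pack eight 4-bit values into one u32, lowest nibble first (assumed order).
fn pack_4bit(values: [u8; 8]) -> u32 {
    let mut word = 0u32;
    for (i, &v) in values.iter().enumerate() {
        // Keep only the low 4 bits, then shift them into slot i.
        word |= u32::from(v & 0x0F) << (4 * i);
    }
    word
}

/// Extract the 4-bit value stored in slot `index` (0..8) of a packed word.
fn unpack_4bit(word: u32, index: usize) -> u8 {
    ((word >> (4 * index)) & 0x0F) as u8
}

/// Affine dequantization of one value: w = scale * q + bias (assumed formula).
/// f32 stands in for the f16 scale and bias values described above.
fn dequantize(q: u8, scale: f32, bias: f32) -> f32 {
    scale * f32::from(q) + bias
}
```

Under these assumptions, every value in a group shares one scale and one bias, so the group index of an element is its position along the last dimension divided by group_size.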

Implementations§

impl QuantizedWeight

pub fn new(
    tensor_name: String,
    shape: Vec<usize>,
    dtype: DType,
    bits: u8,
    group_size: usize,
    scales: MlxBuffer,
    biases: Option<MlxBuffer>,
    packed_data: MlxBuffer,
) -> Self

Construct a new QuantizedWeight with all fields specified.

This is the primary constructor used by load_quantized_weights. It does not validate buffer sizes — the caller is responsible for ensuring the buffers match the declared shape, bits, and group_size.
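Because the constructor performs no validation, a caller may want to check buffer sizes before calling it. The following is a minimal sketch of such a check, not part of this crate: the helper name expected_buffer_bytes, the values_per_word parameter (8 for 4-bit, 4 for 6-bit, per the Layout section), and the assumption that each row of the last dimension is packed and grouped independently are all hypothetical. It works on plain byte counts because the MlxBuffer API is not shown here.

```rust
/// Expected byte sizes for (packed_data, scales) under the assumptions above.
/// Returns None for an empty shape or a zero group_size / values_per_word.
/// This is a hypothetical helper, not a function provided by the crate.
fn expected_buffer_bytes(
    shape: &[usize],
    group_size: usize,
    values_per_word: usize,
) -> Option<(usize, usize)> {
    let (&last_dim, leading) = shape.split_last()?;
    if group_size == 0 || values_per_word == 0 {
        return None;
    }
    // Rows = product of all dims except the last (1 for a 1-D shape).
    let rows: usize = leading.iter().product();
    // Round up so a partially filled word or group still takes a full slot.
    let words_per_row = (last_dim + values_per_word - 1) / values_per_word;
    let groups_per_row = (last_dim + group_size - 1) / group_size;
    let packed_bytes = rows * words_per_row * 4; // one u32 per packed word
    let scales_bytes = rows * groups_per_row * 2; // one f16 per group
    Some((packed_bytes, scales_bytes))
}
```

If biases are present, the same assumptions would give them the same byte size as the scales buffer. A caller could compare these figures against the actual buffer lengths and refuse to construct the weight on a mismatch.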

pub fn tensor_name(&self) -> &str

Full tensor name path.

pub fn shape(&self) -> &[usize]

Logical tensor shape (dimensions before quantization).

pub fn dtype(&self) -> DType

Original element dtype before quantization.

pub fn bits(&self) -> u8

Quantization bit-width.

pub fn group_size(&self) -> usize

Quantization group size.

pub fn scales(&self) -> &MlxBuffer

Borrow the per-group scales buffer.

pub fn biases(&self) -> Option<&MlxBuffer>

Borrow the per-group biases buffer, if present.

pub fn packed_data(&self) -> &MlxBuffer

Borrow the packed quantized data buffer.

pub fn element_count(&self) -> usize

Number of logical elements in the weight tensor (product of shape dims).

pub fn num_groups(&self) -> usize

Number of quantization groups along the last dimension.

This is ceil(last_dim / group_size).
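The two derived quantities above can be reproduced with plain integer arithmetic. The free functions below are a sketch that mirrors the documented formulas on a bare shape slice; they are not the crate's implementation, and they assume group_size is greater than zero. How the real methods treat an empty shape is not stated here, so the handling below (product of no dims is 1, no last dim means 0 groups) is also an assumption.

```rust
/// Product of all dimensions, as described for element_count.
fn element_count(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// ceil(last_dim / group_size) using integer arithmetic.
/// Assumes group_size > 0; a zero group_size would panic.
fn num_groups(shape: &[usize], group_size: usize) -> usize {
    let last_dim = shape.last().copied().unwrap_or(0);
    // Adding group_size - 1 before dividing rounds the quotient up.
    (last_dim + group_size - 1) / group_size
}
```

For example, a shape of [8, 129] with a group size of 64 gives 3 groups along the last dimension: two full groups and one partial group holding the final element.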

Trait Implementations§

impl Debug for QuantizedWeight

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.