
Function load_quantized_weights 

pub fn load_quantized_weights(
    model_dir: &Path,
    device: &MlxDevice,
) -> Result<Vec<QuantizedWeight>>

Load quantized weights from a directory containing safetensors file(s) and a quantization_config.json.

This is the primary entry point for weight loading. It:

  1. Reads quantization_config.json from the directory to determine per-tensor bit-widths and group sizes.
  2. Discovers all *.safetensors files in the directory.
  3. Memory-maps each file and loads tensors that look like quantized weight components (packed data, scales, biases) into Metal buffers.
  4. Groups the components by base tensor name and constructs QuantizedWeight instances.
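Step 2 can be sketched with std-only Rust. `discover_safetensors` is a hypothetical helper name, not part of this crate's API; the real loader may filter or order files differently:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sketch of step 2: collect every `*.safetensors` file in `model_dir`.
fn discover_safetensors(model_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files: Vec<PathBuf> = fs::read_dir(model_dir)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|p| p.extension().map_or(false, |ext| ext == "safetensors"))
        .collect();
    files.sort(); // deterministic load order across shards
    Ok(files)
}

fn main() -> io::Result<()> {
    // Build a throwaway directory mimicking a sharded model layout.
    let dir = std::env::temp_dir().join("quant_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("model-00001.safetensors"), b"")?;
    fs::write(dir.join("model-00002.safetensors"), b"")?;
    fs::write(dir.join("quantization_config.json"), b"{}")?;
    let found = discover_safetensors(&dir)?;
    println!("{} safetensors file(s)", found.len()); // → 2 safetensors file(s)
    Ok(())
}
```

Sorting the discovered paths keeps load order stable when a model is split across numbered shards.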

§Tensor Naming Convention

Each quantized tensor is stored as up to three safetensors entries that share a base name:

  • <base_name>.weight — packed quantized data
  • <base_name>.scales — per-group scale factors
  • <base_name>.biases — per-group biases (optional; present only for affine quantization)
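The grouping performed in step 4 amounts to splitting each tensor name at its final component suffix. A minimal std-only sketch (`split_component` is a hypothetical helper, not this crate's API):

```rust
use std::collections::BTreeMap;

/// Split a safetensors tensor name into (base_name, component), following
/// the `<base>.weight` / `<base>.scales` / `<base>.biases` convention.
/// Returns None for names that are not quantized weight components.
fn split_component(name: &str) -> Option<(&str, &str)> {
    let (base, suffix) = name.rsplit_once('.')?;
    matches!(suffix, "weight" | "scales" | "biases").then(|| (base, suffix))
}

fn main() {
    let names = [
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.0.mlp.up_proj.scales",
        "model.layers.0.mlp.up_proj.biases",
    ];
    // Group component kinds under their shared base tensor name.
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for n in &names {
        if let Some((base, comp)) = split_component(n) {
            groups.entry(base).or_default().push(comp);
        }
    }
    for (base, comps) in &groups {
        println!("{base}: {comps:?}");
        // → model.layers.0.mlp.up_proj: ["weight", "scales", "biases"]
    }
}
```

Note that `rsplit_once` splits at the last `.`, so dotted module paths in the base name are preserved intact.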

§Arguments

  • model_dir — Path to the directory containing safetensors files and config.
  • device — The Metal device for buffer allocation.
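A typical call might look like the following sketch. The model path and the `MlxDevice::default()` constructor are assumptions for illustration, not verified API:

```rust
use std::path::Path;

fn example() -> Result<()> {
    // Hypothetical device construction; the actual constructor may differ.
    let device = MlxDevice::default();
    let weights = load_quantized_weights(Path::new("models/my-model-4bit"), &device)?;
    println!("loaded {} quantized tensors", weights.len());
    Ok(())
}
```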

§Errors

  • MlxError::IoError if the directory or files cannot be read.
  • MlxError::QuantConfigError if the quantization config is invalid.
  • MlxError::SafetensorsError if a safetensors file is malformed.