```rust
pub fn load_quantized_weights(
    model_dir: &Path,
    device: &MlxDevice,
) -> Result<Vec<QuantizedWeight>>
```
Load quantized weights from a directory containing safetensors file(s) and
a quantization_config.json.
This is the primary entry point for weight loading. It:

- Reads `quantization_config.json` from the directory to determine per-tensor bit-widths and group sizes.
- Discovers all `*.safetensors` files in the directory.
- Memory-maps each file and loads tensors that look like quantized weight components (packed data, scales, biases) into Metal buffers.
- Groups the components by base tensor name and constructs `QuantizedWeight` instances.
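The grouping step above can be sketched roughly as follows. This is a self-contained illustration, not the crate's actual implementation; the helper name `group_by_base` is hypothetical, and real code would additionally distinguish quantized `.weight` tensors from plain unquantized ones (e.g. by checking for accompanying scales).

```rust
use std::collections::HashMap;

// Hypothetical sketch: bucket component tensor keys by base name,
// keeping only the suffixes the loader recognizes.
fn group_by_base(keys: &[&str]) -> HashMap<String, Vec<String>> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for key in keys {
        if let Some((base, component)) = key.rsplit_once('.') {
            if matches!(component, "weight" | "scales" | "biases") {
                groups
                    .entry(base.to_string())
                    .or_default()
                    .push(component.to_string());
            }
        }
    }
    groups
}

fn main() {
    let keys = [
        "model.layers.0.q_proj.weight",
        "model.layers.0.q_proj.scales",
        "model.layers.0.q_proj.biases",
        "model.norm.gamma", // unrecognized suffix, skipped
    ];
    let groups = group_by_base(&keys);
    assert_eq!(groups["model.layers.0.q_proj"].len(), 3);
    assert!(!groups.contains_key("model.norm"));
    println!("grouped {} base name(s)", groups.len());
}
```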
§Tensor Naming Convention
Quantized weights in safetensors use the following naming convention:

- `<base_name>.weight` — packed quantized data
- `<base_name>.scales` — per-group scale factors
- `<base_name>.biases` — per-group biases (optional, for affine quantization)
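A minimal sketch of how this convention implies a completeness check (the helper name `is_complete_group` is an assumption, not part of the crate's API): the packed data and scales are required, while biases may be absent for non-affine quantization.

```rust
// Hypothetical helper: a component group is usable only when both the
// packed data ("weight") and the scales are present; biases are optional.
fn is_complete_group(components: &[&str]) -> bool {
    components.contains(&"weight") && components.contains(&"scales")
}

fn main() {
    assert!(is_complete_group(&["weight", "scales", "biases"]));
    assert!(is_complete_group(&["weight", "scales"])); // biases optional
    assert!(!is_complete_group(&["weight"])); // scales missing
    println!("ok");
}
```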
§Arguments
- `model_dir` — Path to the directory containing safetensors files and config.
- `device` — The Metal device for buffer allocation.
§Errors
- `MlxError::IoError` if the directory or files cannot be read.
- `MlxError::QuantConfigError` if the quantization config is invalid.
- `MlxError::SafetensorsError` if a safetensors file is malformed.