pub struct QuantizationConfig {
    pub bits: u8,
    pub group_size: usize,
    pub per_tensor: HashMap<String, TensorQuantConfig>,
}
Per-tensor quantization configuration from quantization_config.json.
This mirrors the JSON structure produced by hf2q’s --quant auto mode,
where each tensor may have a different bit-width and group size.
Fields§
bits: u8
Default bit-width applied when a tensor has no per-tensor override.
group_size: usize
Default group size applied when a tensor has no per-tensor override.
per_tensor: HashMap<String, TensorQuantConfig>
Per-tensor overrides keyed by tensor name pattern. Each entry maps a tensor name (or glob pattern) to its quant config.
Implementations§
impl QuantizationConfig
pub fn from_file(path: &Path) -> Result<Self>
Load and parse a quantization_config.json file from disk.
§Errors
Returns MlxError::IoError if the file cannot be read, or
MlxError::QuantConfigError if the JSON is malformed.
pub fn from_json(json: &str) -> Result<Self>
Parse a QuantizationConfig from a JSON string.
§Errors
Returns MlxError::QuantConfigError if the JSON is malformed.
pub fn from_model_config_json(json: &str) -> Result<Self>
Parse per-tensor quantization overrides from the "quantization" section
of an MLX model’s config.json.
In this format, the quantization section contains flat keys for tensor
names alongside the default bits and group_size:
    {
      "quantization": {
        "bits": 4,
        "group_size": 64,
        "model.layers.0.mlp.down_proj": {"bits": 8, "group_size": 64}
      }
    }

This parses the entire "quantization" object, extracting bits and
group_size as defaults, and any nested objects as per-tensor overrides.
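The split between defaults and overrides described above can be sketched with the standard library alone. This is an illustration only, not the crate's implementation: the `Value` enum, `number_field`, and `split_quantization` are hypothetical names, and the fallback to the defaults when an override omits a field is an assumption.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical minimal JSON-like value. A real implementation would more
/// likely use a JSON library's value type.
pub enum Value {
    Number(u64),
    Object(Vec<(String, Value)>),
}

/// Find a numeric field by key in a flat list of (key, value) pairs.
fn number_field(fields: &[(String, Value)], key: &str) -> Option<u64> {
    fields.iter().find_map(|(k, v)| match v {
        Value::Number(n) if k.as_str() == key => Some(*n),
        _ => None,
    })
}

/// Split the contents of a "quantization" object into
/// (default bits, default group_size, per-tensor overrides).
/// Returns None if either default is missing or out of range.
pub fn split_quantization(
    section: &[(String, Value)],
) -> Option<(u8, usize, HashMap<String, (u8, usize)>)> {
    // First pass: the two scalar keys are the defaults.
    let bits = u8::try_from(number_field(section, "bits")?).ok()?;
    let group_size = usize::try_from(number_field(section, "group_size")?).ok()?;

    // Second pass: every nested object is treated as a per-tensor override.
    let mut overrides = HashMap::new();
    for (name, value) in section {
        if let Value::Object(fields) = value {
            // Assumption: a field missing from an override inherits the default.
            let b = number_field(fields, "bits")
                .and_then(|n| u8::try_from(n).ok())
                .unwrap_or(bits);
            let g = number_field(fields, "group_size")
                .and_then(|n| usize::try_from(n).ok())
                .unwrap_or(group_size);
            overrides.insert(name.clone(), (b, g));
        }
    }
    Some((bits, group_size, overrides))
}
```

Reading the defaults in a separate first pass keeps the result independent of key order inside the "quantization" object.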
pub fn from_model_config_file(path: &Path) -> Result<Self>
Parse per-tensor overrides from a config.json file on disk.
pub fn config_for_tensor(&self, tensor_name: &str) -> (u8, usize)
Look up the quantization parameters for a specific tensor name.
Matching strategy (in order):
- Exact match in per_tensor.
- Strip .weight/.scales/.biases suffix, then exact match.
- Strip language_model. prefix (with or without suffix), then match.
- Add language_model. prefix (with or without suffix), then match.
If no override matches, returns the default bits and group_size.
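The lookup order above can be sketched as follows. This is a simplified stand-in, not the crate's source: the struct definitions model only the two fields needed, `strip_known_suffix` is a hypothetical helper, and the exact order in which suffix and prefix variants are tried within one step is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's TensorQuantConfig; only the two
/// fields needed for the lookup are modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct TensorQuantConfig {
    pub bits: u8,
    pub group_size: usize,
}

/// Simplified mirror of QuantizationConfig, for illustration only.
pub struct QuantizationConfig {
    pub bits: u8,
    pub group_size: usize,
    pub per_tensor: HashMap<String, TensorQuantConfig>,
}

const SUFFIXES: [&str; 3] = [".weight", ".scales", ".biases"];
const PREFIX: &str = "language_model.";

/// Remove one trailing .weight/.scales/.biases suffix, if present.
fn strip_known_suffix(name: &str) -> &str {
    SUFFIXES
        .iter()
        .find_map(|s| name.strip_suffix(*s))
        .unwrap_or(name)
}

impl QuantizationConfig {
    pub fn config_for_tensor(&self, tensor_name: &str) -> (u8, usize) {
        let base = strip_known_suffix(tensor_name);

        // Steps 1 and 2: the exact name, then the name without its suffix.
        let mut candidates = vec![tensor_name.to_string(), base.to_string()];

        // Step 3: strip the language_model. prefix, with and without suffix.
        for name in [tensor_name, base] {
            if let Some(rest) = name.strip_prefix(PREFIX) {
                candidates.push(rest.to_string());
            }
        }

        // Step 4: add the language_model. prefix, with and without suffix.
        for name in [tensor_name, base] {
            candidates.push(format!("{}{}", PREFIX, name));
        }

        // The first candidate with an override wins; otherwise use defaults.
        candidates
            .iter()
            .find_map(|c| self.per_tensor.get(c))
            .map(|c| (c.bits, c.group_size))
            .unwrap_or((self.bits, self.group_size))
    }
}
```

With a single override stored under `model.layers.0.mlp.down_proj`, this sketch resolves `model.layers.0.mlp.down_proj.weight` and `language_model.model.layers.0.mlp.down_proj.scales` to that override, and any unrelated tensor name to the default pair.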
Trait Implementations§
impl Clone for QuantizationConfig
fn clone(&self) -> QuantizationConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.