pub struct SmoothQuantMigrator {
    pub config: SmoothQuantConfig,
}
Applies per-channel scaling to balance quantization difficulty between activations and weights.
The migrator operates on linear layers:
- Activations X: shape [n_tokens, n_channels]
- Weights W: shape [n_out, n_channels] (transposed: Y = X W^T)
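The reason a per-channel scale can be moved between X and W without changing the layer output can be written out briefly. This is a sketch of the underlying identity, assuming s denotes the per-channel scale vector of length n_channels with all entries nonzero; the hatted symbols are notation introduced here for the smoothed tensors.

```latex
Y = X W^{\top}
  = \bigl(X \,\mathrm{diag}(s)^{-1}\bigr)\,\bigl(W \,\mathrm{diag}(s)\bigr)^{\top}
  = \hat{X}\,\hat{W}^{\top},
\qquad
\hat{X}_{tj} = \frac{X_{tj}}{s_j},
\quad
\hat{W}_{oj} = W_{oj}\, s_j .
```

Dividing activation column j by s_j and multiplying weight column j by s_j therefore leaves Y unchanged, up to floating-point rounding.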
Fields§
config: SmoothQuantConfig
Migration configuration.
Implementations§
impl SmoothQuantMigrator
pub fn compute_migration_scales(
    &self,
    act_max: &[f32],
    weight_max: &[f32],
) -> QuantResult<Vec<f32>>
Compute per-channel migration scales from pre-aggregated statistics.
§Parameters
- act_max: per-channel max absolute value of activations (length n_ch).
- weight_max: per-channel (column) max absolute value of weights (length n_ch).
§Returns
Scale vector s of length n_ch where
s[j] = act_max[j]^alpha / weight_max[j]^(1−alpha).
§Errors
- QuantError::DimensionMismatch: act_max and weight_max differ in length.
- QuantError::EmptyInput: either slice is empty.
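The scale formula and the two documented error cases can be illustrated with a small standalone sketch. It is not the crate's implementation: SketchError and migration_scales_sketch are hypothetical names, alpha is passed explicitly because the fields of SmoothQuantConfig are not shown here, and the order of the two checks is an assumption.

```rust
/// Hypothetical stand-in for the crate's error type; only the two
/// variants named in the documentation are modeled.
#[derive(Debug, PartialEq)]
enum SketchError {
    DimensionMismatch,
    EmptyInput,
}

/// Sketch of s[j] = act_max[j]^alpha / weight_max[j]^(1 - alpha).
/// `alpha` is an explicit argument here (an assumption of this sketch).
fn migration_scales_sketch(
    act_max: &[f32],
    weight_max: &[f32],
    alpha: f32,
) -> Result<Vec<f32>, SketchError> {
    // Assumed check order: emptiness first, then length agreement.
    if act_max.is_empty() || weight_max.is_empty() {
        return Err(SketchError::EmptyInput);
    }
    if act_max.len() != weight_max.len() {
        return Err(SketchError::DimensionMismatch);
    }
    Ok(act_max
        .iter()
        .zip(weight_max)
        .map(|(&a, &w)| a.powf(alpha) / w.powf(1.0 - alpha))
        .collect())
}
```

The sketch does not clamp its inputs, so a zero in weight_max yields an infinite or NaN scale. Whether the real method guards against that is not stated in this documentation.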
pub fn compute_act_stats(
    acts: &[f32],
    n_tokens: usize,
    n_channels: usize,
) -> QuantResult<Vec<f32>>
Compute per-channel max absolute values from an activation tensor.
§Parameters
- acts: row-major activation matrix [n_tokens, n_channels].
- n_tokens: number of tokens (rows).
- n_channels: hidden dimension (columns).
§Errors
- QuantError::DimensionMismatch: slice length ≠ n_tokens × n_channels.
- QuantError::EmptyInput: either dimension is 0.
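A per-channel maximum over a row-major matrix amounts to a column-wise reduction. The following is a minimal sketch of that reduction with the two documented error cases; SketchError and column_abs_max_sketch are hypothetical names, and the check order is an assumption rather than the crate's behavior.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum SketchError {
    DimensionMismatch,
    EmptyInput,
}

/// Column-wise max |x| of a row-major [n_rows, n_cols] matrix.
fn column_abs_max_sketch(
    data: &[f32],
    n_rows: usize,
    n_cols: usize,
) -> Result<Vec<f32>, SketchError> {
    if n_rows == 0 || n_cols == 0 {
        return Err(SketchError::EmptyInput);
    }
    if data.len() != n_rows * n_cols {
        return Err(SketchError::DimensionMismatch);
    }
    let mut max = vec![0.0f32; n_cols];
    // Each chunk of n_cols consecutive values is one row (one token).
    for row in data.chunks_exact(n_cols) {
        for (m, &x) in max.iter_mut().zip(row) {
            *m = (*m).max(x.abs());
        }
    }
    Ok(max)
}
```

For a 3 × 2 input [1, -5, -3, 2, 0.5, 4], channel 0 sees the values 1, -3, 0.5 and channel 1 sees -5, 2, 4, so the result is [3, 5].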
pub fn compute_weight_stats(
    weights: &[f32],
    n_out: usize,
    n_channels: usize,
) -> QuantResult<Vec<f32>>
Compute per-column (input-channel) max absolute values from a weight matrix.
§Parameters
- weights: row-major weight matrix [n_out, n_channels].
- n_out: number of output features (rows).
- n_channels: number of input features (columns).
§Errors
- QuantError::DimensionMismatch: slice length ≠ n_out × n_channels.
- QuantError::EmptyInput: either dimension is 0.
pub fn smooth_activations(
    acts: &mut [f32],
    scales: &[f32],
    n_tokens: usize,
    n_channels: usize,
) -> QuantResult<()>
Divide each activation channel j by scales[j] in-place.
§Errors
- QuantError::DimensionMismatch: inconsistent lengths.
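The in-place division can be sketched as a loop over rows, dividing element j of every row by scales[j]. The names below are hypothetical, and the exact set of length checks is an assumption, since the documentation only says "inconsistent lengths".

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum SketchError {
    DimensionMismatch,
}

/// Divide column j of a row-major [n_tokens, n_channels] matrix by scales[j].
fn smooth_activations_sketch(
    acts: &mut [f32],
    scales: &[f32],
    n_tokens: usize,
    n_channels: usize,
) -> Result<(), SketchError> {
    // Assumed meaning of "inconsistent lengths": either slice disagrees
    // with the stated dimensions.
    if scales.len() != n_channels || acts.len() != n_tokens * n_channels {
        return Err(SketchError::DimensionMismatch);
    }
    if n_channels == 0 {
        return Ok(()); // nothing to scale; also avoids a zero chunk size
    }
    for row in acts.chunks_exact_mut(n_channels) {
        for (x, &s) in row.iter_mut().zip(scales) {
            *x /= s;
        }
    }
    Ok(())
}
```

The weight-side operation has the same loop structure with `*x *= s` in place of the division.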
pub fn smooth_weights(
    weights: &mut [f32],
    scales: &[f32],
    n_out: usize,
    n_channels: usize,
) -> QuantResult<()>
Multiply each weight column j (input channel) by scales[j] in-place.
Weights are assumed to have shape [n_out, n_channels].
§Errors
- QuantError::DimensionMismatch: inconsistent lengths.
pub fn smooth_layer(
    &self,
    acts: &mut [f32],
    weights: &mut [f32],
    n_tokens: usize,
    n_channels: usize,
    n_out: usize,
) -> QuantResult<Vec<f32>>
Smooth a complete linear layer: compute scales, apply to activations and weights.
§Parameters
- acts: mutable activation matrix [n_tokens, n_channels].
- weights: mutable weight matrix [n_out, n_channels].
- n_tokens: token (batch) dimension.
- n_channels: input feature dimension.
- n_out: output feature dimension.
§Returns
The per-channel migration scales used (length n_channels).
§Errors
Propagates all dimension and empty-input errors from sub-operations.
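The full sequence (statistics, scales, then scaling both tensors) can be sketched end to end, together with a helper that recomputes Y = X W^T so the output can be compared before and after smoothing. Everything here is illustrative: the function names are hypothetical, alpha is passed explicitly because the fields of SmoothQuantConfig are not shown, and validation is omitted, so inputs are assumed consistent, non-empty, and free of all-zero channels.

```rust
/// Column-wise max |x| of a row-major matrix with n_cols columns.
fn col_abs_max(m: &[f32], n_cols: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; n_cols];
    for row in m.chunks_exact(n_cols) {
        for (o, &x) in out.iter_mut().zip(row) {
            *o = (*o).max(x.abs());
        }
    }
    out
}

/// Y = X W^T for row-major X [n_tokens, n_channels] and W [n_out, n_channels].
/// The result is row-major [n_tokens, n_out].
fn matmul_xwt(x: &[f32], w: &[f32], n_channels: usize) -> Vec<f32> {
    let mut y = Vec::new();
    for x_row in x.chunks_exact(n_channels) {
        for w_row in w.chunks_exact(n_channels) {
            y.push(x_row.iter().zip(w_row).map(|(a, b)| a * b).sum::<f32>());
        }
    }
    y
}

/// Sketch of the three steps: compute scales, divide activations,
/// multiply weights. Returns the scales used. No input validation.
fn smooth_layer_sketch(
    acts: &mut [f32],
    weights: &mut [f32],
    n_channels: usize,
    alpha: f32,
) -> Vec<f32> {
    let act_max = col_abs_max(acts, n_channels);
    let weight_max = col_abs_max(weights, n_channels);
    let scales: Vec<f32> = act_max
        .iter()
        .zip(&weight_max)
        .map(|(&a, &w)| a.powf(alpha) / w.powf(1.0 - alpha))
        .collect();
    for row in acts.chunks_exact_mut(n_channels) {
        for (x, &s) in row.iter_mut().zip(&scales) {
            *x /= s;
        }
    }
    for row in weights.chunks_exact_mut(n_channels) {
        for (w, &s) in row.iter_mut().zip(&scales) {
            *w *= s;
        }
    }
    scales
}
```

Calling matmul_xwt before and after smooth_layer_sketch should give the same Y up to rounding. With alpha = 0.5, the per-channel maxima of the smoothed activations and smoothed weights both become sqrt(act_max[j] × weight_max[j]), which is the sense in which the difficulty is balanced.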
Trait Implementations§
impl Clone for SmoothQuantMigrator
fn clone(&self) -> SmoothQuantMigrator
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.