
Module sensitivity



Quantization Sensitivity Analysis

Measures how sensitive each layer is to quantization at different bit-widths. More sensitive layers should be assigned higher bit-widths in a mixed-precision quantization scheme.
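One way to turn sensitivity scores into a mixed-precision assignment is sketched below. It relies on assumptions this module does not state: each layer has one error score per candidate bit-width, and a layer receives the narrowest bit-width whose score falls within a fixed tolerance, or the widest candidate if none does. The function `assign_bits` and this tolerance policy are hypothetical, and the scores in `main` are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy (not this module's API): choose the narrowest
/// candidate bit-width whose sensitivity score is within `tolerance`,
/// falling back to the widest candidate when none qualifies.
/// Returns `None` only when `scores` is empty.
fn assign_bits(scores: &BTreeMap<u32, f32>, tolerance: f32) -> Option<u32> {
    scores
        .iter()
        // BTreeMap iterates keys in ascending order, so the first match
        // is the narrowest bit-width that meets the tolerance.
        .find(|(_, mse)| **mse <= tolerance)
        .map(|(&bits, _)| bits)
        // Otherwise use the widest available bit-width.
        .or_else(|| scores.keys().next_back().copied())
}

fn main() {
    // Made-up scores for illustration: bit-width -> MSE.
    let robust: BTreeMap<u32, f32> = [(2, 1e-3), (4, 1e-5), (8, 1e-8)].into_iter().collect();
    let sensitive: BTreeMap<u32, f32> = [(2, 5e-1), (4, 2e-2), (8, 1e-6)].into_iter().collect();
    let tolerance = 1e-4;
    println!("robust layer: {} bits", assign_bits(&robust, tolerance).unwrap());
    println!("sensitive layer: {} bits", assign_bits(&sensitive, tolerance).unwrap());
}
```

With a tolerance of `1e-4`, the robust layer gets 4 bits while the sensitive layer needs 8. A budget-constrained scheme could instead rank layers and spend a fixed total bit budget. That variant is not shown here.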

Sensitivity metric

For each layer and each candidate bit-width, we quantize the layer's weights W with a symmetric MinMax scheme, dequantize them, and compute the mean squared error (MSE) between the original and dequantized weights:

sensitivity(layer, bits) = MSE(W, dequant(quant(W, bits)))
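The following is a minimal sketch of this metric in Rust. It assumes per-tensor symmetric MinMax quantization with scale `max|W| / (2^(bits-1) - 1)` and round-to-nearest. The helpers `quant`, `dequant`, `mse`, and `sensitivity` are hypothetical illustrations, not this module's API.

```rust
/// Symmetric MinMax quantization (assumed: one scale per tensor, signed
/// integer range [-qmax, qmax] with qmax = 2^(bits-1) - 1).
fn quant(w: &[f32], bits: u32) -> (Vec<i32>, f32) {
    assert!((2..=16).contains(&bits), "bits must be in 2..=16");
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    // MinMax: the scale maps the largest magnitude in W onto qmax.
    let max_abs = w.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    // Guard against an all-zero tensor, which would give a zero scale.
    let scale = if max_abs > 0.0 { max_abs / qmax } else { 1.0 };
    let q = w
        .iter()
        .map(|x| (x / scale).round().clamp(-qmax, qmax) as i32)
        .collect();
    (q, scale)
}

/// Map integer codes back to floats using the same scale.
fn dequant(q: &[i32], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Mean squared error between two equal-length slices.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    if a.is_empty() {
        return 0.0;
    }
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>() / a.len() as f32
}

/// sensitivity(layer, bits) = MSE(W, dequant(quant(W, bits)))
fn sensitivity(w: &[f32], bits: u32) -> f32 {
    let (q, scale) = quant(w, bits);
    mse(w, &dequant(&q, scale))
}

fn main() {
    let w = [0.9f32, -0.35, 0.12, -1.0, 0.5];
    for bits in [2u32, 4, 8] {
        println!("{bits}-bit MSE: {:.6}", sensitivity(&w, bits));
    }
}
```

Fewer bits leave fewer quantization levels, so the error generally grows as `bits` shrinks. For the sample weights in `main`, the 2-bit MSE is the largest and the 8-bit MSE the smallest.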

Structs

LayerSensitivity
Sensitivity scores for one layer across multiple bit-widths.
SensitivityAnalyzer
Analyses per-layer quantization sensitivity.
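This page does not show the fields of these structs. The sketch below assumes a hypothetical layout. `LayerSensitivity` holds a layer name and `(bit-width, MSE)` pairs. `SensitivityAnalyzer` collects these pairs and ranks layers from most to least sensitive at a chosen bit-width, the ordering a mixed-precision scheme would use to assign higher bit-widths. All field and method names (`name`, `scores`, `layers`, `at`, `rank_by`) are assumptions.

```rust
/// Hypothetical layout; the real struct's fields may differ.
#[derive(Debug)]
struct LayerSensitivity {
    name: String,
    /// (bit-width, MSE) pairs, one per candidate bit-width.
    scores: Vec<(u32, f32)>,
}

impl LayerSensitivity {
    /// MSE at a given bit-width, if that bit-width was measured.
    fn at(&self, bits: u32) -> Option<f32> {
        self.scores.iter().find(|(b, _)| *b == bits).map(|&(_, mse)| mse)
    }
}

/// Hypothetical analyzer that holds per-layer results.
struct SensitivityAnalyzer {
    layers: Vec<LayerSensitivity>,
}

impl SensitivityAnalyzer {
    /// Layer names ordered from most to least sensitive at `bits`;
    /// layers without a score at `bits` are skipped.
    fn rank_by(&self, bits: u32) -> Vec<&str> {
        let mut scored: Vec<(&str, f32)> = self
            .layers
            .iter()
            .filter_map(|l| l.at(bits).map(|mse| (l.name.as_str(), mse)))
            .collect();
        // Descending by MSE; total_cmp gives a total order over f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.into_iter().map(|(name, _)| name).collect()
    }
}

fn main() {
    // Made-up scores for illustration only.
    let analyzer = SensitivityAnalyzer {
        layers: vec![
            LayerSensitivity { name: "fc1".into(), scores: vec![(4, 1e-5), (8, 1e-8)] },
            LayerSensitivity { name: "attn".into(), scores: vec![(4, 3e-3), (8, 2e-6)] },
        ],
    };
    println!("most to least sensitive at 4 bits: {:?}", analyzer.rank_by(4));
}
```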