§Quantization Sensitivity Analysis
Measures how sensitive each layer is to quantization at different bit-widths. More sensitive layers should be assigned higher bit-widths in a mixed-precision quantization scheme.
§Sensitivity metric
For each layer and each candidate bit-width, we quantize the weights with a MinMax symmetric scheme and compute the mean squared error between the original and dequantized weights:
sensitivity(layer, bits) = MSE(W, dequant(quant(W, bits)))
Structs§
- LayerSensitivity - Sensitivity scores for one layer across multiple bit-widths.
- SensitivityAnalyzer - Analyses per-layer quantization sensitivity.
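
To make the sensitivity metric described above concrete, here is a minimal sketch in Rust. It assumes per-tensor symmetric MinMax quantization with scale = max|W| / (2^(bits-1) - 1) and round-to-nearest; the actual scheme used by this module may differ in those details. The function names `quantize_dequantize` and `sensitivity` are hypothetical and are not part of this crate's API.

```rust
/// Quantize then dequantize `weights` with a symmetric MinMax scheme.
/// Assumption: one scale for the whole tensor, derived from max |w|.
fn quantize_dequantize(weights: &[f32], bits: u32) -> Vec<f32> {
    assert!((2..=32).contains(&bits), "bits must be in 2..=32");
    // Largest signed integer level, e.g. 127 for 8 bits, 7 for 4 bits.
    let qmax = ((1i64 << (bits - 1)) - 1) as f32;
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    if max_abs == 0.0 {
        // An all-zero tensor is represented exactly.
        return weights.to_vec();
    }
    let scale = max_abs / qmax;
    weights
        .iter()
        // Round to the nearest level, clamp to the symmetric range, map back to f32.
        .map(|w| (w / scale).round().clamp(-qmax, qmax) * scale)
        .collect()
}

/// Mean squared error between the original and dequantized weights.
fn sensitivity(weights: &[f32], bits: u32) -> f32 {
    if weights.is_empty() {
        return 0.0;
    }
    let dequantized = quantize_dequantize(weights, bits);
    let sum_sq: f32 = weights
        .iter()
        .zip(&dequantized)
        .map(|(w, d)| (w - d).powi(2))
        .sum();
    sum_sq / weights.len() as f32
}
```

Evaluating `sensitivity` for one layer at several candidate bit-widths gives the kind of per-layer score table that a mixed-precision scheme can rank: layers whose error grows fastest as bits decrease are the ones to keep at higher precision.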