Module sensitivity

§Quantization Sensitivity Analysis

Measures how sensitive each layer is to quantization at different bit-widths. More sensitive layers should be assigned higher bit-widths in a mixed-precision quantization scheme.
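One way to turn per-layer scores into a mixed-precision assignment is sketched below. It assumes that an error score is already available for each candidate bit-width of a layer, and it picks the narrowest bit-width whose score stays within an error budget. The function `pick_bits`, the `(bits, error)` pair layout, and the `max_error` budget are all hypothetical and are not part of this module's documented API.

```rust
/// Hypothetical helper: choose a bit-width for one layer.
///
/// `scores` holds (bits, error) pairs for the layer, in any order.
/// Returns the smallest bit-width whose error is at or below `max_error`.
/// If no candidate fits the budget, falls back to the widest candidate.
/// Returns `None` only when `scores` is empty.
fn pick_bits(scores: &[(u32, f32)], max_error: f32) -> Option<u32> {
    let fitting = scores
        .iter()
        .filter(|(_, err)| *err <= max_error)
        .map(|(bits, _)| *bits)
        .min();
    // No candidate met the budget: use the highest available precision.
    fitting.or_else(|| scores.iter().map(|(bits, _)| *bits).max())
}
```

Under this policy, a layer whose error is already small at 2 bits is assigned 2 bits, while a layer that only meets the budget at 4 bits is assigned 4.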

§Sensitivity metric

For each layer and each candidate bit-width, we quantize the weights with a MinMax symmetric scheme and compute the mean squared error between the original and dequantized weights:

sensitivity(layer, bits) = MSE(W, dequant(quant(W, bits)))
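The metric above can be sketched in a few lines of Rust. This is a minimal illustration and not the module's actual implementation: the helper names `fake_quantize`, `mse`, and `sensitivity` are hypothetical, the weights are assumed to be a flat `f32` slice with a single per-tensor scale, and the integer range is assumed to be the restricted symmetric range `[-(2^(bits-1) - 1), 2^(bits-1) - 1]`.

```rust
/// Hypothetical helper: quantize `weights` to `bits` bits with a symmetric
/// MinMax scheme, then dequantize back to f32.
fn fake_quantize(weights: &[f32], bits: u32) -> Vec<f32> {
    assert!((2..=16).contains(&bits), "bits must be in 2..=16");
    // Largest positive integer level, e.g. 127 for 8 bits (assumed range).
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    // Symmetric MinMax: the range is set by the largest absolute weight.
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    if max_abs == 0.0 {
        // All-zero tensor: quantization is lossless.
        return weights.to_vec();
    }
    let scale = max_abs / qmax;
    weights
        .iter()
        .map(|w| (w / scale).round().clamp(-qmax, qmax) * scale)
        .collect()
}

/// Mean squared error between two equally sized slices.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have the same length");
    if a.is_empty() {
        return 0.0;
    }
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    sum / a.len() as f32
}

/// sensitivity(layer, bits) = MSE(W, dequant(quant(W, bits)))
fn sensitivity(weights: &[f32], bits: u32) -> f32 {
    mse(weights, &fake_quantize(weights, bits))
}
```

For a fixed weight tensor the score usually shrinks as `bits` grows, because the quantization step `scale` becomes smaller.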

Structs§

LayerSensitivity: Sensitivity scores for one layer across multiple bit-widths.
SensitivityAnalyzer: Analyses per-layer quantization sensitivity.
