Model quantization utilities
Provides INT8 quantization for model weights and activations to reduce memory usage and improve inference speed.
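The core idea can be sketched as an affine mapping from `f32` to `i8`: `q = clamp(round(x / scale) + zero_point, -128, 127)`, with dequantization reversing it as `x ≈ (q - zero_point) * scale`. The following is a minimal sketch of that scheme; the function names and signatures here are illustrative, not necessarily this crate's API.

```rust
// Illustrative affine INT8 quantization: q = round(x / scale) + zero_point,
// clamped to the i8 range. Names and signatures are assumptions for the
// sketch, not this crate's actual API.
fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(-128, 127) as i8
}

// Inverse mapping: recover an approximation of the original f32 value.
fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    let scale = 0.05_f32;
    let x = 1.234_f32;
    let q = quantize(x, scale, 0);
    let x_hat = dequantize(q, scale, 0);
    // Round-trip error is bounded by scale / 2 for in-range values.
    println!("q = {q}, x_hat = {x_hat}");
}
```

Note that the round-trip error for any in-range value is at most `scale / 2`, which is why choosing a good scale (and, for layers, per-channel scales) matters.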
Structs§
- DynamicQuantizer - Dynamic quantization: quantize at runtime
- PerChannelQuant - Per-channel quantization for conv/linear layers
- QuantParams - Quantization parameters
- QuantizedTensor - Quantized tensor representation
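Per-channel quantization, as described for PerChannelQuant, assigns each output channel its own scale rather than sharing one scale across the whole tensor, which preserves accuracy when channel magnitudes differ widely. A hedged sketch of computing symmetric per-channel scales (the helper name and row-per-channel layout are assumptions):

```rust
// Sketch of per-channel scale computation: one symmetric scale per output
// channel (here, per row), mapping each row's max |w| to 127. The function
// name and layout are illustrative assumptions, not this crate's API.
fn per_channel_scales(weights: &[Vec<f32>]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let max_abs = row.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
            // Guard against all-zero channels to avoid a zero scale.
            if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
        })
        .collect()
}

fn main() {
    // Two output channels with very different ranges: a single per-tensor
    // scale would waste almost all of channel 0's resolution.
    let w = vec![vec![0.01, -0.02], vec![5.0, -4.0]];
    let scales = per_channel_scales(&w);
    println!("{scales:?}");
}
```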
Functions§
- dequantize - Dequantize i8 to f32
- dequantize_value - Dequantize single value
- quantization_error - Calculate quantization error (MSE)
- quantize_value - Quantize single value
- quantize_weights - Quantize f32 weights to i8
- quantize_with_params - Quantize with given parameters
- sqnr - Calculate signal-to-quantization-noise ratio (SQNR) in dB
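The error-metric functions fit together naturally: quantize a weight tensor, dequantize it, then measure MSE and SQNR (`10 * log10(signal_power / noise_power)`) between the original and the reconstruction. A self-contained sketch of that workflow, where the symmetric-scale choice and all function bodies are assumptions standing in for the listed APIs:

```rust
// Hedged sketch mirroring quantize_weights / quantization_error / sqnr.
// The symmetric max-abs scale and these signatures are assumptions, not
// this crate's actual implementations.
fn quantize_weights(w: &[f32]) -> (Vec<i8>, f32) {
    // Symmetric per-tensor scale: map max |w| to 127.
    let max_abs = w.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = w
        .iter()
        .map(|&x| (x / scale).round().clamp(-128.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

// Mean squared error between original and reconstructed weights.
fn quantization_error(w: &[f32], w_hat: &[f32]) -> f32 {
    w.iter()
        .zip(w_hat)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        / w.len() as f32
}

// Signal-to-quantization-noise ratio in dB: 10 * log10(P_signal / P_noise).
fn sqnr(w: &[f32], w_hat: &[f32]) -> f32 {
    let signal: f32 = w.iter().map(|x| x * x).sum();
    let noise: f32 = w.iter().zip(w_hat).map(|(a, b)| (a - b).powi(2)).sum();
    10.0 * (signal / noise).log10()
}

fn main() {
    let w = vec![-0.9, -0.3, 0.0, 0.45, 0.9];
    let (q, scale) = quantize_weights(&w);
    let w_hat = dequantize(&q, scale);
    println!(
        "mse = {}, sqnr = {} dB",
        quantization_error(&w, &w_hat),
        sqnr(&w, &w_hat)
    );
}
```

Higher SQNR means a more faithful reconstruction; well-scaled INT8 weight quantization typically lands well above 30 dB on smooth weight distributions.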