Module quantize

Model quantization utilities

Provides INT8 quantization for model weights and activations to reduce memory usage and improve inference speed.
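The core idea can be sketched as symmetric per-tensor INT8 quantization: pick a scale that maps the largest-magnitude weight onto the i8 range, then round. The function names below are illustrative, not this module's actual API, and the scale formula `max_abs / 127` is an assumption about the scheme used:

```rust
// Hypothetical sketch of symmetric INT8 quantization; the real
// `quantize_weights` / `dequantize` in this module may differ.
fn quantize_symmetric(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale maps the largest-magnitude weight onto [-127, 127].
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_symmetric(q: &[i8], scale: f32) -> Vec<f32> {
    // Reconstruction is lossy: each value is recovered only up to
    // half a quantization step.
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_symmetric(&w);
    let back = dequantize_symmetric(&q, scale);
    println!("quantized = {q:?}, scale = {scale}, reconstructed = {back:?}");
}
```

Storing one `i8` per weight plus a single `f32` scale is what yields the roughly 4x memory reduction versus `f32` weights.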

Structs§

DynamicQuantizer
Dynamic quantization: computes quantization parameters at runtime
PerChannelQuant
Per-channel quantization for conv/linear layers
QuantParams
Quantization parameters
QuantizedTensor
Quantized tensor representation
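`PerChannelQuant` suggests one scale per output channel rather than one per tensor, which preserves precision when channel magnitudes differ widely. A sketch of how per-channel scales might be computed for a linear layer's weight matrix (one row per output channel); the function name and layout are assumptions, not this module's API:

```rust
// Hypothetical per-channel scale computation: one symmetric scale
// per output channel (row) of a linear layer's weight matrix.
fn per_channel_scales(weights: &[Vec<f32>]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let max_abs = row.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
            if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
        })
        .collect()
}

fn main() {
    // Two output channels with very different magnitudes: a shared
    // per-tensor scale would crush the small-magnitude row to zero,
    // while per-channel scales keep its resolution.
    let w = vec![vec![2.0f32, -1.5], vec![0.01, -0.02]];
    let scales = per_channel_scales(&w);
    println!("per-channel scales = {scales:?}");
}
```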

Functions§

dequantize
Dequantize i8 to f32
dequantize_value
Dequantize single value
quantization_error
Calculate quantization error (MSE)
quantize_value
Quantize single value
quantize_weights
Quantize f32 weights to i8
quantize_with_params
Quantize with given parameters
sqnr
Calculate signal-to-quantization-noise ratio (SQNR) in dB
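The two error metrics listed above have standard definitions: MSE is the mean squared difference between the original and dequantized values, and SQNR is the ratio of signal power to quantization-noise power in decibels. A sketch under those standard definitions (the signatures here are assumptions, not necessarily this module's):

```rust
// Mean squared error between an original tensor and its
// quantize/dequantize round-trip.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    sum / a.len() as f32
}

// Signal-to-quantization-noise ratio in dB:
// 10 * log10(sum(x^2) / sum((x - x_hat)^2)). Higher is better.
fn sqnr_db(original: &[f32], reconstructed: &[f32]) -> f32 {
    let signal: f32 = original.iter().map(|x| x * x).sum();
    let noise: f32 = original
        .iter()
        .zip(reconstructed)
        .map(|(x, y)| (x - y) * (x - y))
        .sum();
    10.0 * (signal / noise).log10()
}

fn main() {
    let orig = [1.0f32, 2.0];
    let deq = [1.1f32, 1.9];
    println!("mse = {}, sqnr = {} dB", mse(&orig, &deq), sqnr_db(&orig, &deq));
}
```

A rule of thumb: well-calibrated INT8 quantization typically keeps SQNR high enough that accuracy loss is small; a sharp drop in SQNR for a layer flags it as a candidate for per-channel or higher-precision treatment.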