
Module quantization


Model quantization for reduced-precision inference

Quantization reduces model size and improves inference speed by converting floating-point weights and activations to lower-precision formats.
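As a concrete illustration of the scheme described above, the following is a minimal sketch of affine (scale/zero-point) int8 quantization. The function names and the derivation of `scale` and `zero_point` are illustrative assumptions, not this module's actual API.

```rust
// Illustrative affine int8 quantization: q = round(x / scale) + zero_point.
// These helpers are a sketch of the common scheme, not this crate's functions.
fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    // Derive scale/zero-point from an assumed observed range [-1.0, 3.0],
    // mapped onto the 256 representable int8 values.
    let (min, max) = (-1.0_f32, 3.0_f32);
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;

    let x = 0.5_f32;
    let q = quantize(x, scale, zero_point);
    let x_hat = dequantize(q, scale, zero_point);
    println!("{x} -> {q} -> {x_hat}");
    // Round-trip error is bounded by one quantization step.
    assert!((x - x_hat).abs() <= scale);
}
```

Dequantization inverts the mapping only approximately: the rounding step loses up to half a quantization step of precision per value.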

Structs

QuantizationConfig
Quantization configuration
QuantizationConfigBuilder
Builder for quantization configuration
QuantizationParams
Quantization parameters
QuantizationResult
Result of model quantization
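Since `QuantizationConfig` comes with a dedicated `QuantizationConfigBuilder`, configuration presumably follows the usual Rust builder pattern. The sketch below shows that pattern with stand-in types; the field names (`per_channel`, `mode`) and the `Mode` variants are assumptions for illustration, not this crate's definitions.

```rust
// Stand-in types illustrating the builder pattern; not this crate's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Static,
    Dynamic,
}

#[derive(Debug, Default)]
struct Config {
    per_channel: bool,
    mode: Option<Mode>,
}

#[derive(Default)]
struct ConfigBuilder {
    cfg: Config,
}

impl ConfigBuilder {
    // Each setter consumes and returns the builder, enabling chaining.
    fn per_channel(mut self, v: bool) -> Self {
        self.cfg.per_channel = v;
        self
    }
    fn mode(mut self, m: Mode) -> Self {
        self.cfg.mode = Some(m);
        self
    }
    fn build(self) -> Config {
        self.cfg
    }
}

fn main() {
    let cfg = ConfigBuilder::default()
        .per_channel(true)
        .mode(Mode::Dynamic)
        .build();
    assert!(cfg.per_channel);
    assert_eq!(cfg.mode, Some(Mode::Dynamic));
    println!("{cfg:?}");
}
```

The builder keeps `Config` construction infallible and readable even as optional settings accumulate, which is why rustdoc pages commonly pair a config struct with a `*Builder` type.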

Enums

QuantizationMode
Quantization mode
QuantizationType
Quantization type

Functions

calibrate_quantization
Calibrates quantization parameters using a dataset
dequantize_tensor
Dequantizes a tensor using the provided parameters
quantize_model
Quantizes an ONNX model
quantize_tensor
Quantizes a tensor using the provided parameters
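The calibration step listed above typically works by scanning a representative dataset for the observed value range and deriving quantization parameters from it. The following is a hedged sketch of min/max calibration under that assumption; `Params`, `calibrate`, and the int8 range arithmetic are illustrative, not the signatures of `calibrate_quantization` or `QuantizationParams`.

```rust
// Illustrative min/max calibration: derive int8 scale/zero-point from the
// observed range of a calibration dataset. Names are assumptions.
#[derive(Debug)]
struct Params {
    scale: f32,
    zero_point: i32,
}

fn calibrate(samples: &[f32]) -> Params {
    let (mut min, mut max) = (f32::INFINITY, f32::NEG_INFINITY);
    for &v in samples {
        min = min.min(v);
        max = max.max(v);
    }
    // Extend the range to include zero so that 0.0 maps exactly onto an
    // integer value, a common requirement (e.g. for zero padding).
    min = min.min(0.0);
    max = max.max(0.0);

    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    Params { scale, zero_point }
}

fn main() {
    let data = [0.1, -0.4, 2.0, 1.3];
    let p = calibrate(&data);
    // Observed range [-0.4, 2.0] gives scale = 2.4 / 255.
    assert!((p.scale - 2.4 / 255.0).abs() < 1e-6);
    println!("{p:?}");
}
```

More sophisticated calibrators (histogram or entropy based) pick a clipping range that trades outlier coverage for resolution, but the scale/zero-point output has the same shape.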