Model quantization for reduced-precision inference
Quantization reduces model size and improves inference speed by converting floating-point weights and activations to lower-precision formats.
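This module does not spell out the mapping here, but the common affine (scale/zero-point) scheme gives the idea. The sketch below is a minimal, self-contained illustration; the `scale` and `zero_point` names stand in for the fields of `QuantizationParams` and are assumptions, not the crate's exact API.

```rust
/// Affine (asymmetric) INT8 quantization of a single `f32` value.
/// `scale` and `zero_point` stand in for the fields of `QuantizationParams`;
/// the exact names and integer width used by the crate are assumptions here.
fn quantize_value(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Inverse mapping; the round trip loses at most half a quantization step.
fn dequantize_value(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    // Example parameters for values in roughly [-1.0, 1.0].
    let (scale, zero_point) = (2.0 / 255.0, 0);
    let x = 0.7_f32;
    let q = quantize_value(x, scale, zero_point);
    println!("x = {x}, q = {q}, reconstructed = {:.4}", dequantize_value(q, scale, zero_point));
}
```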
Structs§
- QuantizationConfig - Quantization configuration
- QuantizationConfigBuilder - Builder for quantization configuration
- QuantizationParams - Quantization parameters
- QuantizationResult - Result of model quantization
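`QuantizationConfigBuilder` follows the usual Rust builder pattern for assembling a `QuantizationConfig` before quantizing a model. The self-contained sketch below only illustrates that pattern: the field names, enum variants, and defaults are placeholders, not the crate's actual API.

```rust
// Generic builder-pattern illustration; all names and defaults are assumptions.
#[derive(Debug, Clone, Copy)]
enum Mode { Static, Dynamic }

#[derive(Debug, Clone, Copy)]
enum Precision { Int8, Uint8 }

#[derive(Debug)]
struct Config { mode: Mode, precision: Precision, per_channel: bool }

#[derive(Default)]
struct ConfigBuilder { mode: Option<Mode>, precision: Option<Precision>, per_channel: bool }

impl ConfigBuilder {
    fn mode(mut self, m: Mode) -> Self { self.mode = Some(m); self }
    fn precision(mut self, p: Precision) -> Self { self.precision = Some(p); self }
    fn per_channel(mut self, on: bool) -> Self { self.per_channel = on; self }
    fn build(self) -> Config {
        Config {
            mode: self.mode.unwrap_or(Mode::Static),
            precision: self.precision.unwrap_or(Precision::Int8),
            per_channel: self.per_channel,
        }
    }
}

fn main() {
    let config = ConfigBuilder::default()
        .mode(Mode::Dynamic)
        .precision(Precision::Int8)
        .per_channel(true)
        .build();
    println!("{config:?}");
}
```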
Enums§
- QuantizationMode - Quantization mode
- QuantizationType - Quantization type
Functions§
- calibrate_quantization - Calibrates quantization parameters using a dataset
- dequantize_tensor - Dequantizes a tensor using the provided parameters
- quantize_model - Quantizes an ONNX model
- quantize_tensor - Quantizes a tensor using the provided parameters
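As a rough picture of how these pieces fit together, the sketch below performs min/max calibration over sample activations and then a per-tensor quantize/dequantize round trip on plain slices. The real `calibrate_quantization`, `quantize_tensor`, and `dequantize_tensor` operate on the crate's own tensor and parameter types, so the signatures here are illustrative assumptions only.

```rust
/// Derive affine INT8 parameters (scale, zero point) from observed values by
/// min/max calibration, the simplest strategy a calibration pass could apply.
/// Names and shapes here are assumptions, not the crate's API.
fn calibrate(samples: &[f32]) -> (f32, i32) {
    let (min, max) = samples
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &v| (lo.min(v), hi.max(v)));
    // Keep zero exactly representable, as most INT8 schemes require.
    let (min, max) = (min.min(0.0), max.max(0.0));
    if max == min {
        return (1.0, 0); // degenerate range: avoid a zero scale
    }
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    (scale, zero_point)
}

/// Per-tensor quantization over a plain slice (stand-in for `quantize_tensor`).
fn quantize(values: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect()
}

/// Inverse mapping (stand-in for `dequantize_tensor`).
fn dequantize(values: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    values
        .iter()
        .map(|&q| (q as i32 - zero_point) as f32 * scale)
        .collect()
}

fn main() {
    let activations = [-0.8_f32, -0.1, 0.0, 0.3, 1.2];
    let (scale, zero_point) = calibrate(&activations);
    let quantized = quantize(&activations, scale, zero_point);
    let restored = dequantize(&quantized, scale, zero_point);
    println!("scale = {scale:.5}, zero_point = {zero_point}");
    println!("quantized = {quantized:?}");
    println!("restored  = {restored:?}");
}
```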