Module quantization

Model quantization support for performance optimization.

This module provides support for quantized ONNX models:

  • INT8 quantization for CPU inference
  • FP16 quantization for GPU inference
  • Dynamic quantization
  • Quantization configuration and validation
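To make the INT8 case concrete, here is a generic sketch of symmetric linear quantization (a single scale factor mapping the largest-magnitude FP32 value to ±127). This illustrates the general technique, not this module's internal algorithm:

```rust
/// Symmetric linear INT8 quantization: maps FP32 values to i8 using one
/// scale factor derived from the maximum absolute value.
fn quantize_int8(values: &[f32]) -> (Vec<i8>, f32) {
    // Scale so that the largest-magnitude value maps to ±127.
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Dequantize back to FP32; the round trip introduces small error,
/// which is the source of the accuracy loss discussed below.
fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.5f32, -1.0, 0.25, 1.0];
    let (q, scale) = quantize_int8(&weights);
    let restored = dequantize_int8(&q, scale);
    println!("quantized: {:?} (scale = {})", q, scale);
    println!("restored:  {:?}", restored);
}
```

Static quantization precomputes such scales from calibration data, while dynamic quantization derives them at inference time from the actual activation ranges.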

Quantized models can significantly reduce:

  • Model size (2-4x smaller)
  • Memory usage (2-4x less)
  • Inference latency (1.5-3x faster)

all with minimal accuracy loss (typically <1%).
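The size and memory ratios follow directly from per-weight storage cost: FP32 uses 4 bytes per parameter, FP16 uses 2, and INT8 uses 1 (plus a small overhead for scale factors). A quick back-of-the-envelope check, using a hypothetical 100M-parameter model:

```rust
fn main() {
    // Bytes per weight: FP32 = 4, FP16 = 2, INT8 = 1.
    let params: u64 = 100_000_000; // hypothetical 100M-parameter model
    let fp32_bytes = params * 4;
    let fp16_bytes = params * 2;
    let int8_bytes = params * 1;
    println!("FP32: {} MB", fp32_bytes / 1_000_000);
    println!("FP16: {} MB ({}x smaller)", fp16_bytes / 1_000_000, fp32_bytes / fp16_bytes);
    println!("INT8: {} MB ({}x smaller)", int8_bytes / 1_000_000, fp32_bytes / int8_bytes);
}
```

This is where the 2-4x range comes from: FP16 halves the footprint, INT8 quarters it.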

Structs

ModelQuantizer
Model quantizer (placeholder for actual quantization logic)
QuantizationBenefits
Estimated benefits of quantization
QuantizationConfig
Configuration for model quantization
QuantizedModelInfo
Information about a quantized model

Enums

QuantizationMethod
Quantization method
QuantizationPrecision
Quantization precision level
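A hypothetical sketch of how the types above might fit together. The field names, enum variants, and `validate` method here are assumptions for illustration (the module's actual API may differ); the validation rule mirrors the pairing stated in the description: INT8 for CPU inference, FP16 for GPU inference.

```rust
/// Hypothetical stand-ins for the listed types; real fields may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantizationPrecision { Int8, Fp16 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantizationMethod { Static, Dynamic }

#[derive(Debug)]
struct QuantizationConfig {
    precision: QuantizationPrecision,
    method: QuantizationMethod,
    use_gpu: bool, // assumed field: target execution provider
}

impl QuantizationConfig {
    /// Validation in the spirit of the module doc: reject mismatched
    /// precision/hardware pairings.
    fn validate(&self) -> Result<(), String> {
        match (self.precision, self.use_gpu) {
            (QuantizationPrecision::Fp16, false) => {
                Err("FP16 quantization targets GPU inference".into())
            }
            (QuantizationPrecision::Int8, true) => {
                Err("INT8 quantization targets CPU inference".into())
            }
            _ => Ok(()),
        }
    }
}

fn main() {
    let cfg = QuantizationConfig {
        precision: QuantizationPrecision::Int8,
        method: QuantizationMethod::Dynamic,
        use_gpu: false,
    };
    println!("{:?} -> {:?}", cfg, cfg.validate());
}
```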