Model quantization support for performance optimization.
This module provides support for quantized ONNX models:
- INT8 quantization for CPU inference
- FP16 quantization for GPU inference
- Dynamic quantization
- Quantization configuration and validation
Quantized models can significantly reduce:
- Model size (2-4x smaller)
- Memory usage (2-4x less)
- Inference latency (1.5-3x faster)
These reductions come with minimal accuracy loss (typically <1%).
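The size and memory figures follow directly from per-weight storage cost. Here is a quick sketch of that arithmetic; the byte widths are standard for these formats, while the parameter count is just an example:

```rust
/// Bytes needed to store one weight at a given bit width. FP32 weights
/// take 4 bytes each; FP16 halves that (2x smaller) and INT8 quarters
/// it (4x smaller), which is where the 2-4x figures above come from.
fn bytes_per_weight(bits: u32) -> f64 {
    bits as f64 / 8.0
}

fn main() {
    let params: u64 = 100_000_000; // example: a 100M-parameter model
    for (name, bits) in [("FP32", 32), ("FP16", 16), ("INT8", 8)] {
        let mb = params as f64 * bytes_per_weight(bits) / (1024.0 * 1024.0);
        println!("{name} weights: ~{mb:.0} MB"); // ~381, ~191, ~95 MB
    }
}
```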
Structs
- ModelQuantizer - Model quantizer (placeholder for actual quantization logic)
- QuantizationBenefits - Estimated benefits of quantization
- QuantizationConfig - Configuration for model quantization
- QuantizedModelInfo - Information about a quantized model
Enums
- QuantizationMethod - Quantization method
- QuantizationPrecision - Quantization precision level