Advanced quantization support for model compression and acceleration.
This module provides comprehensive quantization capabilities including:
- Multiple quantization schemes (INT8, INT4, FP8, binary)
- Quantization-Aware Training (QAT) support
- Post-Training Quantization (PTQ) with calibration
- Per-channel and per-tensor quantization
- Symmetric and asymmetric quantization
- Dynamic and static quantization modes
- Quantization simulation for accuracy validation
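The symmetric and asymmetric modes listed above differ only in how the scale and zero point are derived from calibration statistics. The following minimal sketch computes INT8 parameters both ways using just the standard library; the `AffineParams` type and function names are illustrative assumptions, not this module's `QuantizationParams` API.

```rust
/// Affine quantization parameters: real ≈ scale * (q - zero_point).
/// Illustrative only; not this module's `QuantizationParams` type.
#[derive(Debug)]
struct AffineParams {
    scale: f32,
    zero_point: i32,
}

/// Symmetric INT8: the zero point is fixed at 0 and the range is centered on zero.
fn symmetric_int8(min: f32, max: f32) -> AffineParams {
    let abs_max = min.abs().max(max.abs()).max(f32::EPSILON);
    AffineParams { scale: abs_max / 127.0, zero_point: 0 }
}

/// Asymmetric INT8: the observed [min, max] range is mapped onto [-128, 127].
fn asymmetric_int8(min: f32, max: f32) -> AffineParams {
    // Extend the range to include 0.0 so that exact zeros stay exactly representable.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let zero_point = (-128.0 - min / scale).round() as i32;
    AffineParams { scale, zero_point }
}

fn main() {
    // Min/max as they might come out of PTQ calibration statistics.
    let (min, max) = (-0.8_f32, 2.4_f32);
    println!("symmetric : {:?}", symmetric_int8(min, max));
    println!("asymmetric: {:?}", asymmetric_int8(min, max));
}
```

Symmetric parameters keep zero exactly representable by construction, which is why they are the common choice for weights; asymmetric parameters spend the full 256-level range on the observed activation interval.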
Structs

- CalibrationStats - Statistics collected during calibration.
- FakeQuantize - Fake quantization for QAT (simulates quantization during training).
- NodeId - Node identifier (0-based index into graph.nodes).
- QuantizationConfig - Quantization configuration for a graph or model.
- QuantizationParams - Quantization parameters for a tensor.
- QuantizationSummary - Summary of quantization results.
- Quantizer - Quantizer for converting graphs to quantized representations.
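Following up on the FakeQuantize entry above, the sketch below shows the quantize-dequantize round trip that fake quantization performs during QAT: the forward pass stays in f32 but carries the rounding and clamping error. Function and parameter names are hypothetical and independent of this module's implementation.

```rust
/// Quantize-dequantize round trip used by fake quantization during QAT.
/// Illustrative sketch; not this module's `FakeQuantize` implementation.
fn fake_quantize_int8(x: f32, scale: f32, zero_point: i32) -> f32 {
    // Quantize: scale, round, shift by the zero point, clamp to the INT8 range.
    let q = ((x / scale).round() as i32 + zero_point).clamp(-128, 127);
    // Dequantize immediately so training sees the quantization error in f32.
    (q - zero_point) as f32 * scale
}

fn main() {
    let (scale, zero_point) = (0.05_f32, 0);
    for x in [-1.0_f32, 0.1234, 3.0, 10.0] {
        println!("{:>8.4} -> {:>8.4}", x, fake_quantize_int8(x, scale, zero_point));
    }
}
```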
Enums

- CalibrationStrategy - Calibration strategy for post-training quantization.
- QuantizationError - Quantization-related errors.
- QuantizationGranularity - Quantization granularity.
- QuantizationMode - Quantization mode.
- QuantizationSymmetry - Quantization symmetry mode.
- QuantizationType - Quantization data types.
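For the granularity enum above, the per-tensor versus per-channel distinction can be made concrete with a small sketch that derives symmetric INT8 scales either for a whole weight matrix or per output row (channel). The helper names are illustrative assumptions, not part of this module.

```rust
/// One symmetric INT8 scale for the whole tensor (per-tensor granularity).
fn per_tensor_scale(weights: &[Vec<f32>]) -> f32 {
    let abs_max = weights
        .iter()
        .flatten()
        .fold(0.0_f32, |m, w| m.max(w.abs()));
    abs_max / 127.0
}

/// One symmetric INT8 scale per output channel (per-channel granularity),
/// here treating each row of the weight matrix as a channel.
fn per_channel_scales(weights: &[Vec<f32>]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| row.iter().fold(0.0_f32, |m, w| m.max(w.abs())) / 127.0)
        .collect()
}

fn main() {
    // Two output channels with very different ranges: per-channel scales
    // preserve the small channel's resolution, a single per-tensor scale does not.
    let weights = vec![vec![0.02_f32, -0.01, 0.03], vec![1.5, -2.0, 0.7]];
    println!("per-tensor : {:.5}", per_tensor_scale(&weights));
    println!("per-channel: {:?}", per_channel_scales(&weights));
}
```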