INT8 Quantization with π-Based Calibration
Implements efficient INT8 quantization for CNN inference using π-derived constants to avoid quantization boundary resonance artifacts.
§Why π?
In low-precision quantization, values tend to collapse into repeating buckets when scale factors align with powers of two. Using π-derived constants breaks this symmetry:
- π is irrational (non-repeating, infinite structure)
- Avoids power-of-2 boundary alignment
- Provides deterministic anti-resonance offsets
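A minimal sketch of the idea: nudge a quantization scale by a π-derived factor so it never lands exactly on a power-of-two boundary. The constant `PI_SCALE_EPSILON` and the function `pi_adjusted_scale` are illustrative names, not this crate's actual API.

```rust
// π-derived offset; irrational, so the adjusted scale cannot align
// with a power-of-two grid (illustrative constant, not the crate's).
const PI_SCALE_EPSILON: f32 = std::f32::consts::PI / 1000.0; // ≈ 0.0031416

/// Symmetric scale for the range [-max_abs, max_abs], offset slightly
/// so the result never sits exactly on a power-of-two boundary.
fn pi_adjusted_scale(max_abs: f32) -> f32 {
    let base = max_abs / 127.0;
    base * (1.0 + PI_SCALE_EPSILON)
}

fn main() {
    // Base scale would be exactly 0.01; the π offset moves it off that value.
    let s = pi_adjusted_scale(1.27);
    assert!(s > 0.01 && s < 0.0101);
    println!("adjusted scale = {s}");
}
```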
§Quantization Schemes
- Symmetric: For weights (zero-centered distributions)
- Asymmetric: For activations (ReLU outputs are non-negative)
- Per-channel: Different scale per output channel (higher accuracy)
- Per-tensor: Single scale for entire tensor (faster)
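The symmetric and asymmetric schemes above can be sketched as follows; the function names here are illustrative stand-ins, not the crate's exported API.

```rust
/// Symmetric: zero-point fixed at 0, scale covers [-max_abs, max_abs].
/// Suited to zero-centered weight distributions.
fn quantize_symmetric(x: f32, scale: f32) -> i8 {
    (x / scale).round().clamp(-127.0, 127.0) as i8
}

/// Asymmetric: an explicit zero-point shifts the range, so a
/// non-negative ReLU output range [0, max] can use all 256 INT8 codes.
fn quantize_asymmetric(x: f32, scale: f32, zero_point: i32) -> i8 {
    ((x / scale).round() as i32 + zero_point).clamp(-128, 127) as i8
}

/// Shared dequantization: (q - zero_point) * scale; zero_point = 0
/// recovers the symmetric case.
fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    // Symmetric round-trip for a weight value.
    let scale = 0.02;
    let q = quantize_symmetric(0.5, scale);
    assert_eq!(q, 25);
    assert!((dequantize(q, scale, 0) - 0.5).abs() < scale);

    // Asymmetric: zero_point = -128 maps the range [0, 5.1] onto [-128, 127].
    let (a_scale, zp) = (0.02, -128);
    let q = quantize_asymmetric(2.56, a_scale, zp);
    assert!((dequantize(q, a_scale, zp) - 2.56).abs() < a_scale);
}
```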
§Performance
INT8 inference provides:
- 4x memory reduction vs FP32
- 2-3x speedup on AVX2/AVX-512 (VNNI)
- 2-4x speedup on ARM NEON (SDOT)
Modules§
- pi_constants - π-based scale factors to avoid power-of-2 resonance
Structs§
- PerChannelQuantParams - Per-channel quantization parameters
- QuantParams - Quantization parameters for a tensor or channel
- QuantizedTensor - Quantized INT8 tensor storage
Enums§
- QuantizationType - Quantization type
Functions§
- dequantize_batch - Batch dequantize i8 to f32
- dequantize_batch_avx2 ⚠ - AVX2 batch dequantization
- dequantize_simd - SIMD-dispatched dequantization
- quantize_batch - Batch quantize f32 to i8 using π-calibration
- quantize_batch_avx2 ⚠ - AVX2 batch quantization (8 values at a time)
- quantize_simd - SIMD-dispatched quantization
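A self-contained sketch of what a per-tensor batch quantize/dequantize round trip looks like. These stand-in functions mirror the roles of `quantize_batch` and `dequantize_batch` above, but the crate's real signatures may differ (e.g. they may take a `QuantParams` struct and apply the π-based calibration from `pi_constants`).

```rust
/// Illustrative per-tensor batch quantization (symmetric, zero_point = 0).
fn quantize_batch(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Illustrative batch dequantization back to f32.
fn dequantize_batch(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let data = [0.1f32, -0.4, 0.75, 1.0];
    // Per-tensor symmetric scale from the max absolute value.
    let max_abs = data.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = max_abs / 127.0;

    let q = quantize_batch(&data, scale);
    let back = dequantize_batch(&q, scale);
    for (a, b) in data.iter().zip(&back) {
        // Round-trip error is bounded by half a quantization step.
        assert!((a - b).abs() <= scale * 0.5 + f32::EPSILON);
    }
}
```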