
Module quantize


INT8 Quantization with π-Based Calibration

Implements efficient INT8 quantization for CNN inference using π-derived constants to avoid quantization boundary resonance artifacts.

§Why π?

In low-precision quantization, values tend to collapse into repeating buckets when scale factors align with powers of two. Using π-derived constants breaks this symmetry:

  • π is irrational: its expansion never repeats or terminates, in any base
  • Avoids power-of-2 boundary alignment
  • Provides deterministic anti-resonance offsets
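The idea above can be sketched in a few lines. This is an illustrative, hypothetical helper, not this module's actual `pi_constants` code: the fractional part of π supplies a small irrational nudge that keeps a symmetric scale off exact power-of-two boundaries.

```rust
/// Calibrate a symmetric INT8 scale from a tensor's max magnitude, folding in
/// a tiny π-derived perturbation so bucket edges do not align with 2^k.
/// (Sketch only; the real module's calibration may differ.)
fn pi_calibrated_scale(max_abs: f32) -> f32 {
    let raw = max_abs / 127.0; // plain symmetric scale, prone to 2^k alignment
    // Irrational offset: fract(π) ≈ 0.14159, scaled down to a ~0.014% nudge.
    let nudge = 1.0 + std::f32::consts::PI.fract() / 1024.0;
    raw * nudge
}
```

Because the nudge is deterministic, quantization stays reproducible across runs; only the exact bucket boundaries shift.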

§Quantization Schemes

  • Symmetric: For weights (zero-centered distributions)
  • Asymmetric: For activations (ReLU outputs are non-negative)
  • Per-channel: Different scale per output channel (higher accuracy)
  • Per-tensor: Single scale for entire tensor (faster)
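The symmetric/asymmetric split can be made concrete with a short sketch. These are hypothetical free functions for illustration, not the module's `QuantParams` API: symmetric maps `[-max_abs, max_abs]` onto `[-127, 127]` with a fixed zero point of 0, while asymmetric maps the full `[min, max]` range onto `[-128, 127]` with a shifted zero point.

```rust
/// Symmetric params for zero-centered weights: (scale, zero_point = 0).
fn symmetric_params(weights: &[f32]) -> (f32, i8) {
    let max_abs = weights.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    (max_abs / 127.0, 0)
}

/// Asymmetric params for non-negative activations (e.g. ReLU outputs):
/// the full [min, max] range is mapped onto the 256 INT8 levels.
/// (Sketch only; a degenerate min == max range is not handled here.)
fn asymmetric_params(acts: &[f32]) -> (f32, i8) {
    let min = acts.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = acts.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i8;
    (scale, zero_point)
}
```

For ReLU activations the minimum is 0, so the zero point lands at -128 and every INT8 level represents a non-negative value.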

§Performance

INT8 inference provides:

  • 4x memory reduction vs FP32
  • 2-3x speedup on AVX2/AVX-512 (VNNI)
  • 2-4x speedup on ARM NEON (SDOT)

Modules§

pi_constants
π-based scale factors to avoid power-of-2 resonance

Structs§

PerChannelQuantParams
Per-channel quantization parameters
QuantParams
Quantization parameters for a tensor or channel
QuantizedTensor
Quantized INT8 tensor storage

Enums§

QuantizationType
Quantization type

Functions§

dequantize_batch
Batch dequantize i8 to f32
dequantize_batch_avx2
AVX2 batch dequantization
dequantize_simd
SIMD-dispatched dequantization
quantize_batch
Batch quantize f32 to i8 using π-calibration
quantize_batch_avx2
AVX2 batch quantization (8 values at a time)
quantize_simd
SIMD-dispatched quantization
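The batch functions above pair each quantize with a matching dequantize. A minimal scalar sketch of that roundtrip, using hypothetical signatures rather than this module's actual ones:

```rust
/// Symmetric per-tensor quantize: f32 -> i8 at the given scale.
fn quantize(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Matching dequantize: i8 -> f32 (lossy; error bounded by one step).
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

The SIMD variants compute the same mapping lane-parallel; `quantize_simd` / `dequantize_simd` dispatch to the AVX2 paths when the CPU supports them.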