
Module quantize


INT8 Quantization with π-Based Calibration

Implements efficient INT8 quantization for CNN inference using π-derived constants to avoid quantization boundary resonance artifacts.

§Why π?

In low-precision quantization, values tend to collapse into repeating buckets when scale factors align with powers of two. Using π-derived constants breaks this symmetry:

  • π is irrational: its expansion never repeats or terminates, in any base
  • Avoids power-of-2 boundary alignment
  • Provides deterministic anti-resonance offsets
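The idea above can be sketched in a few lines. This is an illustrative, hypothetical helper, not this module's actual `pi_constants` code: the fractional part of π supplies a small irrational nudge that keeps a symmetric scale off exact power-of-two boundaries.

```rust
/// Calibrate a symmetric INT8 scale from a tensor's max magnitude, folding in
/// a tiny π-derived perturbation so bucket edges do not align with 2^k.
/// (Sketch only; the real module's calibration may differ.)
fn pi_calibrated_scale(max_abs: f32) -> f32 {
    let raw = max_abs / 127.0; // plain symmetric scale, prone to 2^k alignment
    // Irrational offset: fract(π) ≈ 0.14159, scaled down to a ~0.014% nudge.
    let nudge = 1.0 + std::f32::consts::PI.fract() / 1024.0;
    raw * nudge
}
```

Because the nudge is deterministic, quantization stays reproducible across runs; only the exact bucket boundaries shift.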

§Quantization Schemes

  • Symmetric: For weights (zero-centered distributions)
  • Asymmetric: For activations (ReLU outputs are non-negative)
  • Per-channel: Different scale per output channel (higher accuracy)
  • Per-tensor: Single scale for entire tensor (faster)
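The symmetric/asymmetric split can be made concrete with a short sketch. These are hypothetical free functions for illustration, not the module's `QuantParams` API: symmetric maps `[-max_abs, max_abs]` onto `[-127, 127]` with a fixed zero point of 0, while asymmetric maps the full `[min, max]` range onto `[-128, 127]` with a shifted zero point.

```rust
/// Symmetric params for zero-centered weights: (scale, zero_point = 0).
fn symmetric_params(weights: &[f32]) -> (f32, i8) {
    let max_abs = weights.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    (max_abs / 127.0, 0)
}

/// Asymmetric params for non-negative activations (e.g. ReLU outputs):
/// the full [min, max] range is mapped onto the 256 INT8 levels.
/// (Sketch only; a degenerate min == max range is not handled here.)
fn asymmetric_params(acts: &[f32]) -> (f32, i8) {
    let min = acts.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = acts.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i8;
    (scale, zero_point)
}
```

For ReLU activations the minimum is 0, so the zero point lands at -128 and every INT8 level represents a non-negative value.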

§Performance

INT8 inference provides:

  • 4x memory reduction vs FP32
  • 2-3x speedup on AVX2/AVX-512 (VNNI)
  • 2-4x speedup on ARM NEON (SDOT)

Modules§

pi_constants
π-based scale factors to avoid power-of-2 resonance

Structs§

PerChannelQuantParams
Per-channel quantization parameters
QuantParams
Quantization parameters for a tensor or channel
QuantizedTensor
Quantized INT8 tensor storage

Enums§

QuantizationType
Quantization type

Functions§

dequantize_batch
Batch dequantize i8 to f32
dequantize_batch_avx2
AVX2 batch dequantization
dequantize_simd
SIMD-dispatched dequantization
quantize_batch
Batch quantize f32 to i8 using π-calibration
quantize_batch_avx2
AVX2 batch quantization (8 values at a time)
quantize_simd
SIMD-dispatched quantization
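The batch functions above pair each quantize with a matching dequantize. A minimal scalar sketch of that roundtrip, using hypothetical signatures rather than this module's actual ones:

```rust
/// Symmetric per-tensor quantize: f32 -> i8 at the given scale.
fn quantize(values: &[f32], scale: f32) -> Vec<i8> {
    values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Matching dequantize: i8 -> f32 (lossy; error bounded by one step).
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

The SIMD variants compute the same mapping lane-parallel; `quantize_simd` / `dequantize_simd` dispatch to the AVX2 paths when the CPU supports them.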