Module quantize


Scalar Quantization (SQ8): FP32 → INT8 per-dimension.

Each dimension is quantized independently to an 8-bit code in [0, 255], using that dimension's calibrated min/max. Storing one byte per dimension instead of four gives a 4x reduction in vector RAM, with <1% recall loss.
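
The calibration and encoding steps can be sketched roughly as follows. This is a minimal illustration, not this module's implementation: `ExampleCodec`, `calibrate`, and `encode` are hypothetical names, and the rounding mode, the clamping of out-of-range values, and the handling of constant dimensions are all assumptions.

```rust
/// Hypothetical codec for illustration only; not this crate's `Sq8Codec`.
struct ExampleCodec {
    min: Vec<f32>,
    max: Vec<f32>,
}

impl ExampleCodec {
    /// Calibrate per-dimension min/max from a non-empty set of
    /// equal-length FP32 vectors.
    fn calibrate(vectors: &[Vec<f32>]) -> Self {
        let dim = vectors[0].len();
        let mut min = vec![f32::INFINITY; dim];
        let mut max = vec![f32::NEG_INFINITY; dim];
        for v in vectors {
            for (d, &x) in v.iter().enumerate() {
                min[d] = min[d].min(x);
                max[d] = max[d].max(x);
            }
        }
        ExampleCodec { min, max }
    }

    /// Map each FP32 component to a code in [0, 255].
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .enumerate()
            .map(|(d, &x)| {
                let range = self.max[d] - self.min[d];
                if range <= 0.0 {
                    return 0; // assumption: a constant dimension maps to code 0
                }
                // Assumption: values outside the calibrated range are clamped.
                let t = ((x - self.min[d]) / range).clamp(0.0, 1.0);
                (t * 255.0).round() as u8
            })
            .collect()
    }
}
```

With vectors `[0.0, -1.0]` and `[10.0, 1.0]` as calibration data, the first dimension spans [0, 10] and the second spans [-1, 1], so each dimension uses the full 8-bit range regardless of its scale.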

Distance computation is asymmetric: the query stays in FP32 while candidates are stored in INT8. Because the query is never quantized, quantization error enters on the candidate side only, which preserves accuracy.
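
As an illustration of the asymmetric mode, the sketch below compares an FP32 query against an INT8-coded candidate by reconstructing each component on the fly. The function name `asymmetric_l2_sq` is hypothetical, squared L2 is an assumed metric (the module may use a different one), and the decoding formula assumes codes were produced as `round(255 * (x - min) / (max - min))` per dimension.

```rust
/// Squared L2 distance between an FP32 query and an INT8-coded candidate.
/// Hypothetical helper for illustration; `min` and `max` are the
/// per-dimension calibration bounds assumed to have produced `codes`.
fn asymmetric_l2_sq(query: &[f32], codes: &[u8], min: &[f32], max: &[f32]) -> f32 {
    assert_eq!(query.len(), codes.len());
    assert_eq!(query.len(), min.len());
    assert_eq!(query.len(), max.len());

    query
        .iter()
        .zip(codes)
        .zip(min.iter().zip(max))
        .map(|((&q, &c), (&lo, &hi))| {
            // Reconstruct the candidate component from its 8-bit code.
            let approx = lo + (c as f32) * (hi - lo) / 255.0;
            // The query is used at full FP32 precision.
            let diff = q - approx;
            diff * diff
        })
        .sum()
}
```

Only the candidate is approximated here; a symmetric scheme would also round the query to 8 bits and add a second source of error to every distance.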

Storage: D bytes per vector (vs 4D bytes for FP32). For example, at D = 768 a vector takes 768 bytes instead of 3,072.

Structs

Sq8Codec
SQ8 calibration parameters: per-dimension min/max.