Module sq8

Scalar Quantization (SQ8): FP32 → INT8 per-dimension.

Each dimension is quantized independently to an integer in [0, 255], using per-dimension min/max calibration. This is the default production quantization for HNSW traversal: it cuts vector RAM by 4x with less than 1% recall loss.
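A minimal sketch of per-dimension min/max calibration and encoding, as described above. The names `Sq8Params`, `calibrate`, and `encode` are hypothetical and chosen for illustration; the actual `Sq8Codec` fields and methods may differ, and the rounding and clamping choices here are assumptions.

```rust
/// Hypothetical calibration parameters (illustrative, not the real `Sq8Codec`).
pub struct Sq8Params {
    pub min: Vec<f32>,
    pub max: Vec<f32>,
}

/// Compute per-dimension min/max over a non-empty calibration set.
/// Assumes every vector has the same length.
pub fn calibrate(vectors: &[Vec<f32>]) -> Sq8Params {
    let dim = vectors[0].len();
    let mut min = vec![f32::INFINITY; dim];
    let mut max = vec![f32::NEG_INFINITY; dim];
    for v in vectors {
        for (d, &x) in v.iter().enumerate() {
            min[d] = min[d].min(x);
            max[d] = max[d].max(x);
        }
    }
    Sq8Params { min, max }
}

/// Map each FP32 component to a code in [0, 255].
/// Values outside the calibrated range are clamped (an assumption).
pub fn encode(p: &Sq8Params, v: &[f32]) -> Vec<u8> {
    v.iter()
        .enumerate()
        .map(|(d, &x)| {
            let range = p.max[d] - p.min[d];
            if range <= 0.0 {
                // Degenerate dimension: every calibration value was identical.
                return 0u8;
            }
            let t = ((x - p.min[d]) / range).clamp(0.0, 1.0);
            (t * 255.0).round() as u8
        })
        .collect()
}
```

With this scheme the calibrated minimum of a dimension encodes to 0 and the calibrated maximum to 255; anything in between lands on the nearest of the 256 evenly spaced levels.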

Distance computation is asymmetric: the query stays in FP32 while candidates are stored as INT8. This avoids quantizing the query and preserves accuracy, at the cost of dequantizing each candidate dimension during distance computation.
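The asymmetric mode can be sketched as follows. This example assumes squared L2 distance and a linear mapping of codes back onto the calibrated `[min, max]` range; the function name `asymmetric_l2_sq` and its signature are hypothetical, and the module may use a different metric or an optimized (for example SIMD) kernel.

```rust
/// Squared L2 distance between an FP32 query and an SQ8-encoded candidate.
/// Each candidate code is dequantized on the fly; the query is never quantized.
/// All slices are assumed to have the same length.
pub fn asymmetric_l2_sq(query: &[f32], codes: &[u8], min: &[f32], max: &[f32]) -> f32 {
    query
        .iter()
        .zip(codes)
        .zip(min.iter().zip(max))
        .map(|((&q, &c), (&lo, &hi))| {
            // Dequantize: code 0 maps to `lo`, code 255 maps to `hi`.
            let x = lo + (c as f32) * (hi - lo) / 255.0;
            let diff = q - x;
            diff * diff
        })
        .sum()
}
```

Only the candidate side carries quantization error here, which is why this mode tends to be more accurate than comparing two quantized vectors.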

Storage: D bytes per vector (vs 4D bytes for FP32).
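As a rough illustration of the storage figures, the helpers below compare raw byte counts for `n` vectors of dimension `dim`. The function names are hypothetical, and the one-time cost of the calibration parameters assumes they are stored as two FP32 values (min and max) per dimension, which the source does not confirm.

```rust
/// Bytes needed to store `n` vectors of `dim` FP32 components.
pub fn fp32_bytes(dim: usize, n: usize) -> usize {
    n * dim * std::mem::size_of::<f32>()
}

/// Bytes needed to store `n` SQ8 vectors (one byte per dimension),
/// plus an assumed one-time cost of FP32 min and max per dimension.
pub fn sq8_bytes(dim: usize, n: usize) -> usize {
    n * dim + 2 * dim * std::mem::size_of::<f32>()
}
```

For one million 768-dimensional vectors this gives about 3.07 GB in FP32 versus about 0.77 GB in SQ8; the calibration parameters add only a few kilobytes.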

Structs

Sq8Codec
SQ8 calibration parameters: per-dimension min/max.