Scalar Quantization (SQ8): FP32 → INT8 per-dimension.
Each dimension is independently quantized to [0, 255] using per-dimension
min/max calibration. This is the default production quantization for
HNSW traversal: 4x RAM reduction with <1% recall loss.
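A minimal sketch of per-dimension min/max calibration and encoding, to make the mapping concrete. The names `Calibration`, `calibrate`, and `encode` are hypothetical and are not the `Sq8Codec` API. The clamping of out-of-range values and the handling of zero-range dimensions are assumptions as well.

```rust
/// Hypothetical calibration parameters: one min and one max per dimension.
/// Illustrative only; not the crate's `Sq8Codec`.
pub struct Calibration {
    pub min: Vec<f32>,
    pub max: Vec<f32>,
}

/// Scan a non-empty sample of equal-length vectors and record the
/// per-dimension min and max.
pub fn calibrate(vectors: &[Vec<f32>]) -> Calibration {
    let dim = vectors[0].len();
    let mut min = vec![f32::INFINITY; dim];
    let mut max = vec![f32::NEG_INFINITY; dim];
    for v in vectors {
        for (d, &x) in v.iter().enumerate() {
            min[d] = min[d].min(x);
            max[d] = max[d].max(x);
        }
    }
    Calibration { min, max }
}

/// Map each FP32 component to a code in [0, 255].
/// Assumption: values outside the calibrated range are clamped, and a
/// dimension with zero range always encodes to 0.
pub fn encode(cal: &Calibration, v: &[f32]) -> Vec<u8> {
    v.iter()
        .enumerate()
        .map(|(d, &x)| {
            let range = cal.max[d] - cal.min[d];
            if range <= 0.0 {
                0u8
            } else {
                let t = ((x - cal.min[d]) / range).clamp(0.0, 1.0);
                (t * 255.0).round() as u8
            }
        })
        .collect()
}
```

Each dimension gets its own scale, so a dimension with a narrow value range keeps the full 256 levels instead of sharing a global range with wider dimensions.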
Distance computation is asymmetric: the query stays in FP32 while candidates are stored in INT8. This avoids quantizing the query and preserves accuracy, at the cost of dequantizing each candidate dimension during distance computation.
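The asymmetric computation can be sketched as follows. The metric is not stated here, so squared L2 is an assumption, and `asymmetric_l2_sq` is a hypothetical free function rather than a method of this module. It takes the per-dimension `min` and `max` as plain slices.

```rust
/// Squared L2 distance between an FP32 query and a u8-coded candidate.
/// Hypothetical helper; squared L2 is assumed for illustration.
pub fn asymmetric_l2_sq(query: &[f32], code: &[u8], min: &[f32], max: &[f32]) -> f32 {
    assert_eq!(query.len(), code.len());
    assert_eq!(query.len(), min.len());
    assert_eq!(query.len(), max.len());

    let mut sum = 0.0f32;
    for d in 0..query.len() {
        // Dequantize one candidate dimension back to FP32.
        let scale = (max[d] - min[d]) / 255.0;
        let x = min[d] + scale * code[d] as f32;
        // The query component is used as-is, never quantized.
        let diff = query[d] - x;
        sum += diff * diff;
    }
    sum
}
```

Only the candidate carries quantization error in this scheme. A production version would likely precompute the per-dimension scale rather than dividing inside the loop.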
Storage: D bytes per vector (vs 4D bytes for FP32). For example, at D = 768 that is 768 bytes instead of 3,072 bytes per vector.
Structs
- Sq8Codec: SQ8 calibration parameters (per-dimension min/max).