Scalar Quantization (SQ8) and Binary Quantization for memory-efficient vector storage.
This module implements quantization strategies to reduce memory usage.
§Benefits
| Metric | f32 | SQ8 | Binary |
|---|---|---|---|
| RAM/vector (768d) | 3 KB | 770 bytes | 96 bytes |
| Cache efficiency | Baseline | ~4x better | ~32x better |
| Recall loss | 0% | ~0.5-1% | ~5-10% |
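The RAM figures in the table follow from the number of bits stored per dimension. The sketch below reproduces that arithmetic; `payload_bytes` is a hypothetical helper, not part of this module, and it counts only the packed components. Any per-vector metadata, such as the range an SQ8 vector would need for decoding, is left out, which is presumably why the SQ8 column is slightly above 768 bytes.

```rust
/// Approximate payload size in bytes for one vector of `dim` dimensions
/// stored with `bits_per_dim` bits per component.
/// Hypothetical helper for illustration; per-vector metadata is not counted.
pub fn payload_bytes(dim: usize, bits_per_dim: usize) -> usize {
    // Round up so a partially filled trailing byte is still counted.
    (dim * bits_per_dim + 7) / 8
}
```

For a 768-dimensional vector this gives 3072 bytes at 32 bits (f32), 768 bytes at 8 bits (SQ8), and 96 bytes at 1 bit (binary).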
Structs§
- BinaryQuantizedVector - A binary quantized vector using 1 bit per dimension.
- PQCodebook - Per-subspace centroid tables learned with k-means.
- PQVector - Compressed representation of a vector: one centroid id per subspace.
- ProductQuantizer - Product quantizer model and helpers for train/encode/decode.
- QuantizedVector - A quantized vector using 8-bit scalar quantization.
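To make the two simplest encodings concrete, here is a minimal sketch of 8-bit scalar quantization and sign-based binary quantization. It is not the implementation behind `QuantizedVector` or `BinaryQuantizedVector`. The `Sq8` struct, the function names, the per-vector min/max scaling, and the "non-negative maps to 1" rule are all assumptions chosen for illustration.

```rust
/// Hypothetical SQ8 vector: one byte per dimension plus the range used to scale it.
pub struct Sq8 {
    pub codes: Vec<u8>,
    pub min: f32,
    pub max: f32,
}

/// Map each component linearly from [min, max] onto 0..=255.
pub fn sq8_encode(v: &[f32]) -> Sq8 {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // A constant (or empty) vector has no positive range; avoid dividing by zero.
    let scale = if range > 0.0 { 255.0 / range } else { 0.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) * scale).round() as u8)
        .collect();
    Sq8 { codes, min, max }
}

/// Reconstruct approximate f32 values from the codes.
pub fn sq8_decode(q: &Sq8) -> Vec<f32> {
    let step = (q.max - q.min) / 255.0;
    q.codes.iter().map(|&c| q.min + c as f32 * step).collect()
}

/// Binary quantization: keep only the sign, packing 8 dimensions per byte.
/// Bit i of the output is set when component i is non-negative.
pub fn binary_encode(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}
```

With this scheme the SQ8 reconstruction error per component is at most half a quantization step, `(max - min) / 510`, while the binary code keeps only the sign of each component.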
Enums§
- StorageMode - Storage mode for vectors.
Functions§
- cosine_similarity_quantized - Computes approximate cosine similarity between a query (f32) and quantized vector.
- cosine_similarity_quantized_simd - SIMD-optimized cosine similarity between f32 query and SQ8 vector.
- distance_pq - Asymmetric distance computation (ADC): query is f32, candidate is PQ-coded.
- dot_product_quantized - Computes the approximate dot product between a query vector (f32) and a quantized vector.
- dot_product_quantized_simd - SIMD-optimized dot product between f32 query and SQ8 quantized vector.
- euclidean_squared_quantized - Computes the approximate squared Euclidean distance between a query (f32) and quantized vector.
- euclidean_squared_quantized_simd - SIMD-optimized squared Euclidean distance between f32 query and SQ8 vector.
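The functions above are asymmetric: the query stays in f32 and only the stored vector is quantized. The sketch below illustrates that idea for an SQ8 dot product and for PQ-style ADC. The signatures, the min/max parameters, and the nested `Vec` codebook layout are assumptions made for this example and do not reflect the actual signatures of `dot_product_quantized` or `distance_pq`.

```rust
/// Approximate dot product between an f32 query and SQ8 codes, assuming the
/// codes came from a linear map of [min, max] onto 0..=255 (hypothetical layout).
pub fn dot_sq8(query: &[f32], codes: &[u8], min: f32, max: f32) -> f32 {
    assert_eq!(query.len(), codes.len());
    let step = (max - min) / 255.0;
    let mut sum_q = 0.0f32;
    let mut sum_qc = 0.0f32;
    for (&q, &c) in query.iter().zip(codes) {
        sum_q += q;
        sum_qc += q * c as f32;
    }
    // sum(q_i * (min + c_i * step)) = min * sum(q_i) + step * sum(q_i * c_i),
    // so the stored vector never has to be decoded explicitly.
    min * sum_q + step * sum_qc
}

/// ADC sketch for product quantization: squared Euclidean distance between an
/// f32 query and a PQ-coded candidate. `codebooks[s][k]` is centroid `k` of
/// subspace `s`; all subspaces are assumed to have the same dimension.
pub fn adc_squared(query: &[f32], codes: &[u8], codebooks: &[Vec<Vec<f32>>]) -> f32 {
    assert_eq!(codes.len(), codebooks.len());
    let sub_dim = query.len() / codes.len();
    codes
        .iter()
        .enumerate()
        .map(|(s, &k)| {
            let centroid = &codebooks[s][k as usize];
            let q_sub = &query[s * sub_dim..(s + 1) * sub_dim];
            // Squared distance between the query slice and the chosen centroid.
            q_sub
                .iter()
                .zip(centroid)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>()
        })
        .sum()
}
```

A production ADC implementation would typically precompute a table of query-to-centroid distances once per query and then sum table lookups per candidate; the direct form here trades that optimization for readability.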