Scalar Quantization (SQ8) and Binary Quantization for memory-efficient vector storage.
This module implements quantization strategies that reduce memory usage.
§Benefits
| Metric | f32 | SQ8 | Binary |
|---|---|---|---|
| RAM/vector (768d) | 3 KB | 770 bytes | 96 bytes |
| Cache efficiency | Baseline | ~4x better | ~32x better |
| Recall loss | 0% | ~0.5-1% | ~5-10% |
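The RAM figures in the table follow from the per-dimension storage cost of each format. The sketch below reproduces that arithmetic; the function names are hypothetical and not part of this module, and the SQ8 figure counts only the codes (the table's 770 bytes presumably includes a small amount of per-vector metadata, which is an assumption here).

```rust
/// Bytes needed to store `dims` dimensions as f32 (4 bytes each).
fn f32_bytes(dims: usize) -> usize {
    dims * 4
}

/// Bytes needed for the SQ8 codes alone: one u8 per dimension.
/// Per-vector metadata (for example a min/max pair) is NOT counted,
/// because its exact size in this module is not documented here.
fn sq8_code_bytes(dims: usize) -> usize {
    dims
}

/// Bytes needed for binary quantization: one bit per dimension,
/// rounded up to a whole byte.
fn binary_bytes(dims: usize) -> usize {
    (dims + 7) / 8
}
```

For 768 dimensions this gives 3072 bytes (3 KB) for f32, 768 bytes of SQ8 codes, and 96 bytes for binary, which matches the 4x and 32x ratios quoted for cache efficiency.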
Structs§
- BinaryQuantizedVector - A binary quantized vector using 1 bit per dimension.
- PQCodebook - Per-subspace centroid tables learned with k-means.
- PQVector - Compressed representation of a vector: one centroid id per subspace.
- ProductQuantizer - Product quantizer model and helpers for train/encode/decode.
- QuantizedVector - A quantized vector using 8-bit scalar quantization.
- RaBitQCorrection - Scalar correction factors for a RaBitQ-encoded vector.
- RaBitQIndex - RaBitQ index holding the random rotation matrix and dataset centroid.
- RaBitQVector - Binary-quantized vector with scalar correction factors.
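To make the two basic schemes concrete, here is a minimal sketch of 8-bit scalar quantization and 1-bit sign quantization. It is not this module's implementation: `Sq8Sketch`, `binary_encode`, and `hamming` are hypothetical names, and the choices of a per-vector min/max range and a `>= 0` sign threshold are assumptions made for illustration.

```rust
/// Hypothetical SQ8 container: a per-vector value range plus one u8 code
/// per dimension. The real `QuantizedVector` layout may differ.
#[derive(Debug, Clone, PartialEq)]
struct Sq8Sketch {
    min: f32,
    max: f32,
    codes: Vec<u8>,
}

impl Sq8Sketch {
    /// Map each component linearly from [min, max] onto 0..=255.
    fn encode(v: &[f32]) -> Self {
        let min = v.iter().copied().fold(f32::INFINITY, f32::min);
        let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let range = max - min;
        // A constant (or empty) vector has no usable range: every code is 0.
        let scale = if range > 0.0 { 255.0 / range } else { 0.0 };
        let codes = v
            .iter()
            .map(|&x| ((x - min) * scale).round().clamp(0.0, 255.0) as u8)
            .collect();
        Self { min, max, codes }
    }

    /// Reconstruct approximate f32 values from the codes.
    fn decode(&self) -> Vec<f32> {
        let step = (self.max - self.min) / 255.0;
        self.codes.iter().map(|&c| self.min + c as f32 * step).collect()
    }
}

/// Sign-based binary quantization (assumed threshold: component >= 0).
/// Bit `i % 8` of byte `i / 8` holds dimension `i`.
fn binary_encode(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

/// Hamming distance between two equally long bit vectors.
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

With this scheme the SQ8 round-trip error per component is bounded by half a quantization step, while binary codes keep only the sign and are compared with cheap XOR and popcount operations.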
Enums§
- StorageMode - Storage mode for vectors.
Traits§
- QuantizationCodec - Trait for serializing and deserializing quantized vectors to/from bytes.
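As an illustration of what a byte codec for quantized vectors can look like, the sketch below defines a small round-trip trait. The trait name `CodecSketch`, its method names, and the byte layout (little-endian min, little-endian max, then the codes) are all assumptions for this example and are not the actual `QuantizationCodec` signatures.

```rust
/// Hypothetical byte codec; names and signatures are illustrative only.
trait CodecSketch: Sized {
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

/// Hypothetical SQ8 payload used only to demonstrate the round trip.
#[derive(Debug, PartialEq)]
struct Sq8Bytes {
    min: f32,
    max: f32,
    codes: Vec<u8>,
}

impl CodecSketch for Sq8Bytes {
    /// Assumed layout: 4 bytes min (LE), 4 bytes max (LE), then one byte per code.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + self.codes.len());
        out.extend_from_slice(&self.min.to_le_bytes());
        out.extend_from_slice(&self.max.to_le_bytes());
        out.extend_from_slice(&self.codes);
        out
    }

    /// Returns `None` when the buffer is too short to hold the 8-byte header.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 8 {
            return None;
        }
        let min = f32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let max = f32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        Some(Self { min, max, codes: bytes[8..].to_vec() })
    }
}
```

Fixing the endianness explicitly keeps the serialized form portable across platforms, which matters if quantized vectors are persisted to disk or sent over the network.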
Functions§
- cosine_similarity_quantized - Computes approximate cosine similarity between a query (f32) and quantized vector.
- cosine_similarity_quantized_simd - SIMD-optimized cosine similarity between f32 query and SQ8 vector.
- dot_product_quantized - Computes the approximate dot product between a query vector (f32) and a quantized vector.
- dot_product_quantized_simd - Dot product between f32 query and SQ8 quantized vector with 8-wide unrolling.
- euclidean_squared_quantized - Computes the approximate squared Euclidean distance between a query (f32) and quantized vector.
- euclidean_squared_quantized_simd - SIMD-optimized squared Euclidean distance between f32 query and SQ8 vector.
- train_opq - Train a PQ codebook with optional PCA pre-rotation.
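The asymmetric distance functions above compare an unquantized f32 query against SQ8 codes. A minimal sketch of the dot-product case follows, assuming each component is reconstructed as `min + code * step`; under that assumption the dot product splits into `min * sum(q) + step * sum(q * code)`, so the codes never need to be decoded into a temporary buffer. The function names and parameters are hypothetical, and the second variant only imitates 8-wide unrolling with plain arrays rather than explicit SIMD intrinsics.

```rust
/// Scalar reference: dot(query, decoded) without materializing `decoded`.
/// Assumes decoded[i] = min + codes[i] * step (an assumption of this sketch).
fn dot_sq8(query: &[f32], codes: &[u8], min: f32, step: f32) -> f32 {
    assert_eq!(query.len(), codes.len());
    let mut q_sum = 0.0f32;
    let mut qc_sum = 0.0f32;
    for (&q, &c) in query.iter().zip(codes) {
        q_sum += q;
        qc_sum += q * c as f32;
    }
    min * q_sum + step * qc_sum
}

/// Same computation with eight independent accumulator lanes, a shape
/// the compiler may be able to auto-vectorize.
fn dot_sq8_unrolled(query: &[f32], codes: &[u8], min: f32, step: f32) -> f32 {
    assert_eq!(query.len(), codes.len());
    let mut q_acc = [0.0f32; 8];
    let mut qc_acc = [0.0f32; 8];

    let q_chunks = query.chunks_exact(8);
    let c_chunks = codes.chunks_exact(8);
    // Elements that do not fill a whole 8-wide chunk are handled afterwards.
    let q_rem = q_chunks.remainder();
    let c_rem = c_chunks.remainder();

    for (qc, cc) in q_chunks.zip(c_chunks) {
        for lane in 0..8 {
            q_acc[lane] += qc[lane];
            qc_acc[lane] += qc[lane] * cc[lane] as f32;
        }
    }

    let mut q_sum: f32 = q_acc.iter().sum();
    let mut qc_sum: f32 = qc_acc.iter().sum();
    for (&q, &c) in q_rem.iter().zip(c_rem) {
        q_sum += q;
        qc_sum += q * c as f32;
    }
    min * q_sum + step * qc_sum
}
```

Because floating-point addition is not associative, the unrolled variant can differ from the scalar one in the last bits for general inputs; the two agree exactly only when all intermediate sums are representable without rounding.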