Quantized embedding cache with scalar quantization, product quantization, and asymmetric distance computation.
QuantizedEmbeddingCache stores compressed int8 codes for each vector,
supporting two compression schemes:
- Scalar Quantization (SQ) – maps each fp32 scalar to an int8 value using per-dimension or global min/max ranges.
- Product Quantization (PQ) – splits the vector into sub-spaces and quantizes each sub-space to a centroid index, enabling very high compression ratios.
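The scalar scheme can be illustrated with a minimal sketch, assuming a linear mapping from each dimension's observed [min, max] range onto the full int8 range [-128, 127]. `DimRange`, `fit_ranges`, `encode` and `decode` are hypothetical names chosen for this example. `DimRange` only plays a role similar to `ScalarDimParams`, and the crate's actual fields and rounding may differ. A global-range variant would use a single min/max pair for every dimension.

```rust
/// Hypothetical per-dimension range (plays a role similar to `ScalarDimParams`).
#[derive(Debug, Clone, Copy)]
struct DimRange {
    min: f32,
    max: f32,
}

/// Learn per-dimension min/max ranges from a set of training vectors.
fn fit_ranges(data: &[Vec<f32>]) -> Vec<DimRange> {
    let dim = data[0].len();
    let mut ranges = vec![DimRange { min: f32::INFINITY, max: f32::NEG_INFINITY }; dim];
    for v in data {
        for (r, &x) in ranges.iter_mut().zip(v) {
            r.min = r.min.min(x);
            r.max = r.max.max(x);
        }
    }
    ranges
}

/// Map each fp32 value linearly from [min, max] onto the int8 range [-128, 127].
fn encode(v: &[f32], ranges: &[DimRange]) -> Vec<i8> {
    v.iter()
        .zip(ranges)
        .map(|(&x, r)| {
            // Guard against a zero-width range (constant dimension).
            let span = (r.max - r.min).max(f32::EPSILON);
            let t = ((x - r.min) / span).clamp(0.0, 1.0); // normalise to [0, 1]
            (t * 255.0 - 128.0).round() as i8
        })
        .collect()
}

/// Invert the mapping: int8 code back to an approximate fp32 value.
fn decode(codes: &[i8], ranges: &[DimRange]) -> Vec<f32> {
    codes
        .iter()
        .zip(ranges)
        .map(|(&c, r)| {
            let t = (c as f32 + 128.0) / 255.0;
            r.min + t * (r.max - r.min)
        })
        .collect()
}

fn main() {
    let data: Vec<Vec<f32>> = vec![
        vec![0.0, -1.0, 10.0],
        vec![1.0, 1.0, 20.0],
        vec![0.5, 0.0, 15.0],
    ];
    let ranges = fit_ranges(&data);
    let codes = encode(&data[2], &ranges);
    let approx = decode(&codes, &ranges);
    println!("{:?} -> {:?} -> {:?}", data[2], codes, approx);
}
```

Each dimension costs one byte instead of four. The reconstruction error per component is bounded by about half a quantization step, that is (max - min) / 510.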
Distance is computed asymmetrically: the query is kept in fp32 while the database codes are decompressed on the fly, giving better accuracy than comparing compressed codes directly.
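Asymmetric distance is easiest to see with product quantization. The following is a minimal sketch under several assumptions. Each sub-space has its own codebook with at most 256 centroids, so each sub-code fits in a `u8`. Distances are squared L2. `Codebook`, `pq_encode` and `asymmetric_dist` are hypothetical names, not the crate's `PqCodebook` API, and centroid training (typically k-means) is omitted. Only the database vector is quantized. The query's fp32 sub-vectors are compared against the centroids that the stored codes decode to.

```rust
/// Hypothetical codebook for one sub-space: `centroids.len()` centroids of length `sub_dim`.
/// Assumes at most 256 centroids so each code fits in a u8.
struct Codebook {
    sub_dim: usize,
    centroids: Vec<Vec<f32>>,
}

/// Squared Euclidean distance between two equal-length slices.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Codebook {
    /// Index of the centroid closest to `sub`.
    fn nearest(&self, sub: &[f32]) -> u8 {
        let mut best = 0;
        let mut best_d = f32::INFINITY;
        for (i, c) in self.centroids.iter().enumerate() {
            let d = sq_dist(sub, c);
            if d < best_d {
                best_d = d;
                best = i;
            }
        }
        best as u8
    }
}

/// Split `v` into consecutive sub-vectors and store one centroid index per sub-space.
fn pq_encode(v: &[f32], books: &[Codebook]) -> Vec<u8> {
    let mut off = 0;
    books
        .iter()
        .map(|b| {
            let code = b.nearest(&v[off..off + b.sub_dim]);
            off += b.sub_dim;
            code
        })
        .collect()
}

/// Asymmetric distance: the fp32 query is compared with centroids looked up from the codes.
fn asymmetric_dist(query: &[f32], codes: &[u8], books: &[Codebook]) -> f32 {
    let mut off = 0;
    let mut total = 0.0;
    for (b, &c) in books.iter().zip(codes) {
        let centroid = &b.centroids[c as usize]; // on-the-fly decode of one sub-code
        total += sq_dist(&query[off..off + b.sub_dim], centroid);
        off += b.sub_dim;
    }
    total
}

fn main() {
    // Toy setup: two sub-spaces of dimension 2, two centroids each.
    let books = vec![
        Codebook { sub_dim: 2, centroids: vec![vec![0.0, 0.0], vec![1.0, 1.0]] },
        Codebook { sub_dim: 2, centroids: vec![vec![0.0, 1.0], vec![1.0, 0.0]] },
    ];
    let codes = pq_encode(&[0.9, 1.1, 0.1, 0.9], &books); // -> [1, 0]
    let query: [f32; 4] = [1.0, 1.0, 0.0, 1.0];
    println!("codes = {:?}, distance = {}", codes, asymmetric_dist(&query, &codes, &books));
}
```

In practice, asymmetric distance computation often precomputes, for each query, a table of distances from every query sub-vector to every centroid. Scoring a database vector then costs one table lookup per sub-space instead of a full sub-vector comparison.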
Compression ratio and distance accuracy metrics are tracked automatically.
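The following sketch shows one plausible way to compute these two metrics. It assumes that compression ratio means fp32 bytes per vector divided by stored code bytes per vector, and that distance accuracy is measured as mean relative error against exact distances. The `Scheme` enum and both functions are hypothetical and are not the crate's `QuantizationScheme` or `CacheMetrics`, whose definitions may differ. Shared overhead such as per-dimension parameters and codebooks is ignored.

```rust
/// Hypothetical mirror of the two schemes; the real `QuantizationScheme` may differ.
enum Scheme {
    Scalar,
    Product { n_subspaces: usize },
}

/// fp32 bytes per vector divided by stored code bytes per vector.
/// Ignores shared overhead (per-dimension ranges, codebooks).
fn compression_ratio(dim: usize, scheme: &Scheme) -> f64 {
    let raw = 4 * dim; // 4 bytes per f32 component
    let coded = match scheme {
        Scheme::Scalar => dim,                           // 1 byte (int8) per dimension
        Scheme::Product { n_subspaces } => *n_subspaces, // 1 byte (u8 index) per sub-space
    };
    raw as f64 / coded as f64
}

/// Mean relative error of approximate distances against exact ones.
fn mean_relative_error(exact: &[f32], approx: &[f32]) -> f64 {
    let n = exact.len() as f64;
    exact
        .iter()
        .zip(approx)
        .map(|(&e, &a)| ((a - e).abs() / e.abs().max(f32::EPSILON)) as f64)
        .sum::<f64>()
        / n
}

fn main() {
    println!("SQ, d=128: {}x", compression_ratio(128, &Scheme::Scalar)); // 4x
    println!(
        "PQ, d=128, 16 sub-spaces: {}x",
        compression_ratio(128, &Scheme::Product { n_subspaces: 16 })
    ); // 32x
    println!("mean relative error: {}", mean_relative_error(&[1.0, 2.0], &[1.1, 1.8]));
}
```

This illustrates why PQ reaches much higher ratios than SQ. The stored size depends on the number of sub-spaces rather than on the vector dimension.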
Pure Rust Policy
No unsafe code, no C/Fortran FFI, no CUDA runtime calls.
Structs
- CacheMetrics - Compression ratio and distance accuracy metrics.
- PqCodebook - A single PQ codebook: n_centroids centroids, each of dimension sub_dim.
- QuantizedCacheConfig - Configuration for QuantizedEmbeddingCache.
- QuantizedEmbeddingCache - Quantized embedding cache with scalar or product quantization and asymmetric distance computation for compressed similarity search.
- ScalarDimParams - Per-dimension parameters for scalar quantization.
Enums
- QuantizationScheme - Which compression scheme to use in the cache.