Module quantized_cache

Quantized embedding cache with scalar quantization, product quantization, and asymmetric distance computation.

QuantizedEmbeddingCache stores compressed int8 codes for each vector, supporting two compression schemes:

  • Scalar Quantization (SQ) – maps each fp32 scalar to an int8 value using per-dimension or global min/max ranges.
  • Product Quantization (PQ) – splits the vector into sub-spaces and quantizes each sub-space to a centroid index, enabling very high compression ratios.
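
As a rough illustration of the scalar scheme, the sketch below maps an fp32 value to an int8 code using a per-dimension min/max range and maps it back. The names DimRange, quantize, dequantize, and quantize_vector are hypothetical and chosen for this example only; the fields of the crate's ScalarDimParams and its exact rounding rule are not shown on this page, so treat this as one plausible formulation rather than the crate's implementation.

```rust
/// Hypothetical per-dimension range (an assumption for this sketch,
/// not the crate's `ScalarDimParams`).
#[derive(Debug, Clone, Copy)]
struct DimRange {
    min: f32,
    max: f32,
}

/// Map one fp32 value to an int8 code using 256 evenly spaced levels.
fn quantize(x: f32, r: DimRange) -> i8 {
    let span = r.max - r.min;
    if span <= 0.0 {
        return 0; // degenerate range: every value collapses to one code
    }
    // Normalize to [0, 1], clamping values that fall outside the range.
    let t = ((x - r.min) / span).clamp(0.0, 1.0);
    // Scale to 0..=255, then shift into the int8 range -128..=127.
    ((t * 255.0).round() as i32 - 128) as i8
}

/// Approximate inverse of `quantize`.
fn dequantize(code: i8, r: DimRange) -> f32 {
    let t = (code as i32 + 128) as f32 / 255.0;
    r.min + t * (r.max - r.min)
}

/// Quantize a whole vector, one range per dimension.
fn quantize_vector(v: &[f32], ranges: &[DimRange]) -> Vec<i8> {
    v.iter().zip(ranges).map(|(&x, &r)| quantize(x, r)).collect()
}
```

With a range of [-1.0, 1.0], the minimum maps to -128 and the maximum to 127, and a round trip through quantize and dequantize is off by at most half a quantization step (span / 510). A global range is the same idea with a single DimRange shared by every dimension.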

Distance is computed asymmetrically: the query is kept in fp32 while the database codes are decompressed on the fly. This is more accurate than comparing two sets of compressed codes directly, because only the database side carries quantization error.
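
To make the asymmetric computation concrete, here is a minimal sketch for the product-quantized case. It assumes squared L2 distance, a hypothetical Codebook type with one list of centroids per sub-space, and centroid indices stored as u8; none of these details are confirmed by this page (which describes the stored codes as int8), and the function names are invented for the example.

```rust
/// Hypothetical codebook for one sub-space: each centroid has `sub_dim` floats.
struct Codebook {
    centroids: Vec<Vec<f32>>,
}

/// Squared L2 distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode a vector as one nearest-centroid index per sub-space.
/// Assumes at most 256 centroids per codebook so an index fits in a u8.
fn pq_encode(v: &[f32], books: &[Codebook], sub_dim: usize) -> Vec<u8> {
    assert!(sub_dim > 0, "sub_dim must be non-zero");
    v.chunks(sub_dim)
        .zip(books)
        .map(|(sub, book)| {
            let mut best = 0usize;
            let mut best_d = f32::INFINITY;
            for (k, c) in book.centroids.iter().enumerate() {
                let d = l2_sq(sub, c);
                if d < best_d {
                    best = k;
                    best_d = d;
                }
            }
            best as u8
        })
        .collect()
}

/// Asymmetric distance: the query stays in fp32, and each code is
/// resolved to its centroid only at the moment it is compared.
fn pq_asymmetric_l2_sq(
    query: &[f32],
    codes: &[u8],
    books: &[Codebook],
    sub_dim: usize,
) -> f32 {
    assert!(sub_dim > 0, "sub_dim must be non-zero");
    query
        .chunks(sub_dim)
        .zip(codes)
        .zip(books)
        .map(|((q_sub, &code), book)| l2_sq(q_sub, &book.centroids[code as usize]))
        .sum()
}
```

The query is never quantized, so the only approximation error comes from replacing each database sub-vector with its centroid. The same pattern applies to scalar-quantized codes by dequantizing each scalar inside the loop instead of looking up a centroid.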

Compression ratio and distance accuracy metrics are tracked automatically.
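
How the crate defines these metrics is not stated on this page. One common formulation, shown here purely as an assumption, takes the compression ratio as fp32 bytes divided by code bytes per vector (ignoring codebook or range overhead) and measures accuracy as the relative error of the approximate distance against the exact fp32 distance. The function names are hypothetical.

```rust
/// Ratio of fp32 storage to code storage for one vector.
/// Codebook and range-parameter overhead is ignored in this sketch.
fn compression_ratio(dim: usize, code_bytes: usize) -> f32 {
    (dim * std::mem::size_of::<f32>()) as f32 / code_bytes as f32
}

/// Relative error of an approximate distance against the exact one.
/// Falls back to the absolute error when the exact distance is zero.
fn relative_error(exact: f32, approx: f32) -> f32 {
    if exact == 0.0 {
        approx.abs()
    } else {
        (approx - exact).abs() / exact.abs()
    }
}
```

Under this definition a 128-dimensional vector stored as one byte per dimension has a ratio of 4, while the same vector stored as 16 one-byte PQ codes has a ratio of 32.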

Pure Rust Policy

No unsafe code, no C/Fortran FFI, no CUDA runtime calls.

Structs

CacheMetrics
Compression ratio and distance accuracy metrics.
PqCodebook
A single PQ codebook: n_centroids centroids, each of dimension sub_dim.
QuantizedCacheConfig
Configuration for QuantizedEmbeddingCache.
QuantizedEmbeddingCache
Quantized embedding cache with scalar or product quantization and asymmetric distance computation for compressed similarity search.
ScalarDimParams
Per-dimension parameters for scalar quantization.

Enums

QuantizationScheme
Which compression scheme to use in the cache.