
Module embed_compression

Embedding Compression: Quantization and Product Quantization for efficient storage and approximate nearest-neighbor search.

Provides:

  • QuantizedEmbedding: 8-bit scalar quantization via min-max scaling
  • EmbeddingQuantizer: batch quantization at 4-bit or 8-bit precision
  • ProductQuantizer: product quantization with per-subspace k-means
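This page does not show the structs' fields or method signatures, so the following is only a minimal, standalone sketch of min-max scalar quantization at a configurable bit width. It assumes the per-vector minimum and step size are stored alongside the codes. The names `MinMaxCodes`, `quantize`, `dequantize`, and `pack_nibbles` are hypothetical and are not part of `embed_compression`.

```rust
/// Hypothetical container (not the module's API): one code per dimension
/// plus the parameters needed to reconstruct approximate values.
#[derive(Debug, Clone)]
pub struct MinMaxCodes {
    pub codes: Vec<u8>, // each code lies in 0..=(2^bits - 1)
    pub min: f32,       // smallest component of the original vector
    pub scale: f32,     // step size (max - min) / (2^bits - 1); 0.0 for constant vectors
}

/// Quantize `v` to `bits` bits per component (1..=8) using min-max scaling.
pub fn quantize(v: &[f32], bits: u32) -> MinMaxCodes {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let levels = (1u32 << bits) - 1; // highest code value: 255 for 8 bits, 15 for 4 bits
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / levels as f32 } else { 0.0 };
    let codes = v
        .iter()
        .map(|&x| {
            if scale == 0.0 {
                0 // constant vector: every component maps to code 0
            } else {
                // Round to the nearest level and clamp to guard against float error.
                ((x - min) / scale).round().clamp(0.0, levels as f32) as u8
            }
        })
        .collect();
    MinMaxCodes { codes, min, scale }
}

/// Reconstruct an approximation of the original vector.
pub fn dequantize(q: &MinMaxCodes) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}

/// Pack 4-bit codes two per byte (low nibble first) to halve storage.
pub fn pack_nibbles(codes: &[u8]) -> Vec<u8> {
    codes
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F; // pad an odd tail with 0
            lo | (hi << 4)
        })
        .collect()
}
```

Because each component is rounded to the nearest level, the reconstruction error per component is at most half a step (`scale / 2`), up to float rounding. Compared with `f32`, 8-bit codes use 4x less space and packed 4-bit codes use 8x less, plus the two `f32` parameters stored per vector.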

Structs

EmbeddingQuantizer
Batch quantizer supporting 4-bit and 8-bit precision.
ProductQuantizer
Product quantizer: divides the embedding into subspaces and learns a codebook per subspace for efficient approximate nearest-neighbor search.
QuantizedEmbedding
A single embedding quantized to 8-bit precision via min-max scaling.
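To illustrate what a product quantizer with per-subspace codebooks does, here is a minimal sketch for illustration only. `PqSketch`, `train`, `encode`, `distance_table`, and `adc` are hypothetical names. Initializing centroids from the first `k` training vectors and using a fixed iteration count are simplifying assumptions. The asymmetric distance computation (ADC) lookup table is one common way to search PQ codes, not necessarily what `ProductQuantizer` does.

```rust
/// Hypothetical product-quantizer sketch (not the module's API).
pub struct PqSketch {
    m: usize,                      // number of subspaces
    sub_dim: usize,                // dimensions per subspace
    codebooks: Vec<Vec<Vec<f32>>>, // indexed [subspace][centroid][component]
}

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `x` (squared Euclidean distance).
fn nearest(centroids: &[Vec<f32>], x: &[f32]) -> usize {
    let mut best = 0;
    let mut best_d = f32::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = sq_dist(c, x);
        if d < best_d {
            best_d = d;
            best = i;
        }
    }
    best
}

impl PqSketch {
    /// Run `iters` Lloyd (k-means) iterations with `k` centroids per subspace.
    /// Requires dim % m == 0, k <= data.len(), and k <= 256 so codes fit in a u8.
    pub fn train(data: &[Vec<f32>], m: usize, k: usize, iters: usize) -> Self {
        let dim = data[0].len();
        assert!(dim % m == 0 && k <= data.len() && k <= 256);
        let sub_dim = dim / m;
        let mut codebooks = Vec::with_capacity(m);
        for s in 0..m {
            let range = s * sub_dim..(s + 1) * sub_dim;
            let subs: Vec<&[f32]> = data.iter().map(|v| &v[range.clone()]).collect();
            // Assumption: deterministic init from the first k sub-vectors.
            let mut cents: Vec<Vec<f32>> = subs[..k].iter().map(|x| x.to_vec()).collect();
            for _ in 0..iters {
                let mut sums = vec![vec![0.0f32; sub_dim]; k];
                let mut counts = vec![0usize; k];
                for x in &subs {
                    let j = nearest(&cents, x); // assignment step
                    counts[j] += 1;
                    for (acc, v) in sums[j].iter_mut().zip(x.iter()) {
                        *acc += v;
                    }
                }
                for j in 0..k {
                    if counts[j] > 0 {
                        // Update step; empty clusters keep their old centroid.
                        cents[j] = sums[j].iter().map(|v| v / counts[j] as f32).collect();
                    }
                }
            }
            codebooks.push(cents);
        }
        PqSketch { m, sub_dim, codebooks }
    }

    /// Encode a vector as one centroid index (one byte) per subspace.
    pub fn encode(&self, v: &[f32]) -> Vec<u8> {
        (0..self.m)
            .map(|s| nearest(&self.codebooks[s], &v[s * self.sub_dim..(s + 1) * self.sub_dim]) as u8)
            .collect()
    }

    /// Per-subspace table of squared distances from the query to every centroid.
    pub fn distance_table(&self, q: &[f32]) -> Vec<Vec<f32>> {
        (0..self.m)
            .map(|s| {
                let qs = &q[s * self.sub_dim..(s + 1) * self.sub_dim];
                self.codebooks[s].iter().map(|c| sq_dist(c, qs)).collect::<Vec<f32>>()
            })
            .collect()
    }

    /// Approximate squared distance between the query and an encoded vector.
    pub fn adc(table: &[Vec<f32>], code: &[u8]) -> f32 {
        code.iter().enumerate().map(|(s, &c)| table[s][c as usize]).sum()
    }
}
```

With `m` subspaces and at most 256 centroids each, this sketch stores a vector in `m` bytes. A query builds one table of `m * k` subspace distances and then needs only `m` lookups per database code, which is what makes scanning many codes cheap.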