Embedding Compression: Quantization and Product Quantization for efficient storage and approximate nearest-neighbor search.
Provides:
- QuantizedEmbedding - 8-bit scalar quantization via min-max scaling
- EmbeddingQuantizer - batch quantization for 4-bit and 8-bit precision
- ProductQuantizer - product quantization with per-subspace k-means
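As a rough illustration of the min-max scheme named above, the following sketch maps each value linearly onto the 256 levels of a `u8` and back. The type `SketchQuantized` and the functions `quantize_min_max` and `dequantize` are hypothetical names chosen for this example; they are not this crate's API, and the actual `QuantizedEmbedding` may store or round values differently.

```rust
/// Hypothetical illustration type; NOT this crate's `QuantizedEmbedding`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchQuantized {
    /// One 8-bit code per input value.
    pub codes: Vec<u8>,
    /// Smallest input value; code 0 maps back to it.
    pub min: f32,
    /// Width of one quantization step: (max - min) / 255, or 0.0 for
    /// empty or constant input.
    pub scale: f32,
}

/// Quantizes `values` to 8 bits with min-max scaling (assumed scheme).
pub fn quantize_min_max(values: &[f32]) -> SketchQuantized {
    if values.is_empty() {
        return SketchQuantized { codes: Vec::new(), min: 0.0, scale: 0.0 };
    }
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    // A constant (or non-finite) range would cause a division by zero,
    // so fall back to a zero step and encode every value as code 0.
    let scale = if range > 0.0 && range.is_finite() { range / 255.0 } else { 0.0 };
    let codes = values
        .iter()
        .map(|&v| {
            if scale == 0.0 {
                0
            } else {
                ((v - min) / scale).round().clamp(0.0, 255.0) as u8
            }
        })
        .collect();
    SketchQuantized { codes, min, scale }
}

/// Reconstructs approximate values from the 8-bit codes.
pub fn dequantize(q: &SketchQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

With this scheme, each reconstructed value differs from the original by at most about half a step (`scale / 2`), and storage drops from 4 bytes to 1 byte per dimension plus two `f32` parameters per embedding.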
Structs
- EmbeddingQuantizer - Batch quantizer supporting 4-bit and 8-bit precision.
- ProductQuantizer - Product quantizer: divides the embedding into subspaces and learns a codebook per subspace for efficient approximate nearest-neighbor search.
- QuantizedEmbedding - A single embedding quantized to 8-bit precision via min-max scaling.
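To make the product quantization idea concrete, here is a minimal sketch of the encoding step only: the embedding is split into equal subspaces, and each sub-vector is replaced by the index of its nearest centroid in that subspace's codebook. It assumes the codebooks have already been learned (for example by k-means, which is not shown) and that each codebook holds at most 256 centroids. The functions `pq_encode` and `pq_decode` and the nested `Vec` codebook layout are hypothetical and are not taken from `ProductQuantizer`.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encodes `embedding` as one centroid index per subspace.
///
/// `codebooks[s][k]` is centroid `k` of subspace `s` (assumed layout).
/// Returns `None` if the shapes are inconsistent.
pub fn pq_encode(embedding: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Option<Vec<u8>> {
    let m = codebooks.len();
    if m == 0 || embedding.is_empty() || embedding.len() % m != 0 {
        return None;
    }
    let sub_dim = embedding.len() / m;
    let mut codes = Vec::with_capacity(m);
    for (chunk, book) in embedding.chunks(sub_dim).zip(codebooks) {
        // Indices must fit in a u8, and every subspace needs a centroid.
        if book.is_empty() || book.len() > 256 {
            return None;
        }
        let mut best = 0usize;
        let mut best_dist = f32::INFINITY;
        for (k, centroid) in book.iter().enumerate() {
            if centroid.len() != sub_dim {
                return None;
            }
            let d = squared_distance(chunk, centroid);
            if d < best_dist {
                best = k;
                best_dist = d;
            }
        }
        codes.push(best as u8);
    }
    Some(codes)
}

/// Rebuilds an approximate embedding by concatenating the chosen centroids.
pub fn pq_decode(codes: &[u8], codebooks: &[Vec<Vec<f32>>]) -> Option<Vec<f32>> {
    if codes.len() != codebooks.len() {
        return None;
    }
    let mut out = Vec::new();
    for (&code, book) in codes.iter().zip(codebooks) {
        out.extend_from_slice(book.get(code as usize)?);
    }
    Some(out)
}
```

Under these assumptions, an embedding of any dimension is stored as one byte per subspace. An approximate nearest-neighbor search can then compare codes through per-subspace distance tables rather than full vectors.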