Binary Quantization for Vector Embeddings
Compresses fp32 vectors to binary (1 bit per dimension) for ultra-fast approximate nearest neighbor search using Hamming distance.
§Compression Ratio
- fp32: 4 bytes per dimension
- binary: 1 bit per dimension = 32x compression
Example: 1024-dim vector
- fp32: 4096 bytes
- binary: 128 bytes
§Algorithm
Simple sign-based quantization:
- positive values → 1
- negative/zero values → 0
For normalized embeddings (e.g., from sentence transformers), this preserves ~95-97% of retrieval quality.
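The sign-based rule above can be sketched as a standalone helper (a hypothetical function for illustration, not the crate's actual `BinaryVector::from_f32` implementation), packing one bit per dimension into `u64` words:

```rust
// Sign-based quantization sketch: positive values -> 1,
// negative/zero values -> 0, packed little-endian into u64 words.
fn quantize_sign(v: &[f32]) -> Vec<u64> {
    // One u64 word holds 64 dimensions; round up.
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

fn main() {
    // Dimensions 0 and 3 are positive, so bits 0 and 3 are set.
    let packed = quantize_sign(&[0.3, -1.2, 0.0, 0.7]);
    assert_eq!(packed[0], 0b1001);
}
```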
§Usage
// Quantize a vector
let binary = BinaryVector::from_f32(&embedding);
// Compute Hamming distance (number of differing bits)
let distance = binary.hamming_distance(&other);
// For retrieval: lower Hamming distance = more similar
§References
- “Embedding Quantization” - HuggingFace Blog
- Binary embedding with Matryoshka representation learning
Structs§
- BinaryIndex - Index of binary vectors for fast batch search
- BinarySearchResult - Result from binary search with rescoring capability
- BinaryVector - Binary quantized vector stored as packed u64 words
Functions§
- hamming_distance_simd - Compute Hamming distance between two packed binary vectors using SIMD
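The core of a Hamming-distance kernel over packed words is XOR followed by popcount; a minimal scalar sketch (not the crate's SIMD implementation, whose exact intrinsics are not shown here) looks like this. With `-C target-cpu=native`, `count_ones` typically lowers to the hardware POPCNT instruction:

```rust
// Hamming distance between two packed binary vectors:
// XOR finds differing bits, count_ones (popcount) tallies them per word.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    let a = [0b1010u64, u64::MAX];
    let b = [0b0110u64, 0u64];
    // First word differs in bits 2 and 3 (2 bits); second word in all 64.
    assert_eq!(hamming(&a, &b), 66);
}
```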