§Product Quantization (PQ) for Vector Compression
Product Quantization compresses high-dimensional vectors by:
- Dividing each vector into M subspaces (segments)
- Training K centroids per subspace (codebook)
- Representing each segment by its nearest centroid ID (1 byte for K=256)
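The encoding step in the list above can be sketched as follows. This is a minimal illustration, not the `diskann_rs` implementation: the `codebooks[m][k]` layout, the `encode_sketch` and `squared_l2` names, and the choice of squared Euclidean distance are all assumptions made for the example.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `vector` into one centroid ID per subspace.
/// `codebooks[m][k]` is centroid `k` of subspace `m` (hypothetical layout).
fn encode_sketch(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let m = codebooks.len();
    assert!(m > 0 && !vector.is_empty(), "need at least one subspace and one dimension");
    assert!(vector.len() % m == 0, "dimension must divide evenly into M subspaces");
    let sub_dim = vector.len() / m;

    vector
        .chunks_exact(sub_dim) // one segment per subspace
        .zip(codebooks)
        .map(|(segment, centroids)| {
            assert!(centroids.len() <= 256, "centroid IDs must fit in one byte");
            // Linear scan for the nearest centroid in this subspace.
            let mut best_id = 0usize;
            let mut best_dist = f32::INFINITY;
            for (k, centroid) in centroids.iter().enumerate() {
                let d = squared_l2(segment, centroid);
                if d < best_dist {
                    best_dist = d;
                    best_id = k;
                }
            }
            best_id as u8
        })
        .collect()
}
```

With two subspaces of two centroids each, a 4-dimensional vector is reduced to two bytes, one ID per segment.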
§Compression Ratio
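For 128-dimensional f32 vectors (128 × 4 = 512 bytes) with M=8 subspaces and K=256 centroids:
- Original: 512 bytes
- Compressed: 8 bytes (one 1-byte centroid ID per subspace)
- Compression: 64x (512 / 8), not counting the codebooks, which are stored once and shared by all vectors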
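The arithmetic behind these figures can be checked with a few helper functions. The function names are hypothetical and not part of `diskann_rs`; the sketch assumes 4-byte f32 components and centroid IDs rounded up to whole bytes.

```rust
/// Bytes per uncompressed vector, assuming 4-byte f32 components.
fn raw_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes per PQ code: one ID per subspace, each ID rounded up to whole bytes.
fn code_bytes(num_subspaces: usize, num_centroids: usize) -> usize {
    // Bits needed to address IDs 0..num_centroids-1 (at least one bit).
    let bits = usize::BITS - (num_centroids.max(2) - 1).leading_zeros();
    let bytes_per_id = ((bits + 7) / 8) as usize;
    num_subspaces * bytes_per_id
}

/// Per-vector compression ratio, ignoring the shared codebooks.
fn compression_ratio(dim: usize, num_subspaces: usize, num_centroids: usize) -> f64 {
    raw_bytes(dim) as f64 / code_bytes(num_subspaces, num_centroids) as f64
}

/// One-time codebook cost: M subspaces x K centroids x (dim / M) f32 values,
/// which simplifies to K * dim * 4 bytes.
fn codebook_bytes(dim: usize, num_centroids: usize) -> usize {
    num_centroids * dim * std::mem::size_of::<f32>()
}
```

For dim=128, M=8, K=256 this gives 512 raw bytes, 8 code bytes, and a one-time codebook cost of 131,072 bytes (128 KiB), which is amortized across the whole database.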
§Usage
use diskann_rs::pq::{ProductQuantizer, PQConfig};
// Train a quantizer on sample vectors
let vectors: Vec<Vec<f32>> = load_your_training_data();
let config = PQConfig::default(); // 8 subspaces, 256 centroids each
let pq = ProductQuantizer::train(&vectors, config).unwrap();
// Encode vectors (each becomes M bytes)
let codes: Vec<Vec<u8>> = vectors.iter().map(|v| pq.encode(v)).collect();
// Compute asymmetric distance (query vs quantized database vector)
let query = vec![0.0f32; 128];
let dist = pq.asymmetric_distance(&query, &codes[0]);
§Asymmetric Distance Computation (ADC)
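For search, we compute the exact distance from each query segment to every centroid in that subspace once per query, producing an M × K lookup table. The approximate distance to any quantized database vector is then the sum of M table lookups, one per centroid ID in its code. The computation is asymmetric because the query stays uncompressed while the database vectors are quantized.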
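The lookup-table approach can be sketched as below. This is an illustration under assumptions rather than the crate's actual code: `build_distance_table` and `adc_distance` are hypothetical names, the `codebooks[m][k]` layout is assumed, and squared Euclidean distance stands in for whatever metric the library uses.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Build the M x K table: `table[m][k]` is the exact squared distance from
/// query segment `m` to centroid `k` of subspace `m`. Done once per query.
fn build_distance_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let m = codebooks.len();
    assert!(m > 0 && !query.is_empty() && query.len() % m == 0);
    let sub_dim = query.len() / m;

    query
        .chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(q_seg, centroids)| {
            centroids
                .iter()
                .map(|c| squared_l2(q_seg, c))
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Approximate distance to one encoded vector: M lookups and M - 1 additions,
/// with no floating-point work on the original dimensions.
fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    assert_eq!(table.len(), code.len(), "code must have one ID per subspace");
    table
        .iter()
        .zip(code)
        .map(|(row, &id)| row[id as usize])
        .sum()
}
```

Building the table costs K × D multiply-adds per query; after that, scanning N database codes costs only N × M lookups, which is where the speedup over exact distance comes from.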
Structs§
- PQConfig
- Configuration for Product Quantization
- PQStats
- Statistics about a ProductQuantizer
- ProductQuantizer
- Trained Product Quantizer