Module pq


§Product Quantization (PQ) for Vector Compression

Product Quantization compresses high-dimensional vectors by:

  1. Dividing each vector into M subspaces (segments)
  2. Training K centroids per subspace (codebook)
  3. Representing each segment by the ID of its nearest centroid (one byte per segment when K ≤ 256)
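
The encoding step (steps 1 and 3 above) can be sketched as follows, assuming the codebooks have already been trained. This is a minimal illustration, not the crate's implementation: the names `squared_l2` and `encode_sketch` and the `codebooks[m][k]` layout are hypothetical, and the sketch assumes the vector dimension is divisible by M and that K ≤ 256.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `vector` as one centroid ID per subspace.
/// `codebooks[m][k]` is centroid `k` of subspace `m` (hypothetical layout).
/// Assumes `vector.len()` is divisible by `codebooks.len()` and K <= 256.
fn encode_sketch(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let sub_dim = vector.len() / codebooks.len();
    vector
        .chunks_exact(sub_dim) // step 1: split into M segments
        .zip(codebooks)
        .map(|(segment, centroids)| {
            // step 3: linear scan for the nearest centroid in this subspace
            let mut best_id = 0usize;
            let mut best_dist = f32::INFINITY;
            for (id, centroid) in centroids.iter().enumerate() {
                let d = squared_l2(segment, centroid);
                if d < best_dist {
                    best_dist = d;
                    best_id = id;
                }
            }
            best_id as u8
        })
        .collect()
}
```

With M=2 subspaces of two dimensions each, a 4-dimensional vector is reduced to two bytes, one centroid ID per segment.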

§Compression Ratio

For 128-dimensional f32 vectors (128 × 4 = 512 bytes) with M=8 subspaces and K=256 centroids per subspace:

  • Original: 512 bytes
  • Compressed: 8 bytes (one centroid ID per subspace)
  • Compression: 64x (512 / 8), not counting the codebook, which is shared by all vectors
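
The arithmetic above can be checked with a few helper functions. These helpers are illustrative only and not part of the crate; they assume f32 components, one u8 ID per subspace, and a codebook that stores K full-precision centroids for each subspace.

```rust
/// Bytes used by a raw f32 vector with `dim` components.
fn raw_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes used by a PQ code with `m` subspaces,
/// assuming one u8 centroid ID per subspace (K <= 256).
fn code_bytes(m: usize) -> usize {
    m * std::mem::size_of::<u8>()
}

/// Per-vector compression ratio, ignoring the shared codebook.
fn compression_ratio(dim: usize, m: usize) -> f64 {
    raw_bytes(dim) as f64 / code_bytes(m) as f64
}

/// One-time codebook cost: M subspaces x K centroids x (dim / M) floats,
/// which simplifies to K * dim floats.
fn codebook_bytes(dim: usize, k: usize) -> usize {
    k * dim * std::mem::size_of::<f32>()
}
```

For `dim = 128`, `m = 8`, `k = 256`, these give 512 bytes raw, 8 bytes per code, a ratio of 64, and a one-time codebook cost of 131072 bytes (128 KiB) that is amortized over the whole dataset.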

§Usage

use diskann_rs::pq::{ProductQuantizer, PQConfig};

// Train a quantizer on sample vectors
let vectors: Vec<Vec<f32>> = load_your_training_data();
let config = PQConfig::default(); // 8 subspaces, 256 centroids each
let pq = ProductQuantizer::train(&vectors, config).unwrap();

// Encode vectors (each becomes M bytes)
let codes: Vec<Vec<u8>> = vectors.iter().map(|v| pq.encode(v)).collect();

// Compute asymmetric distance (query vs quantized database vector)
let query = vec![0.0f32; 128];
let dist = pq.asymmetric_distance(&query, &codes[0]);

§Asymmetric Distance Computation (ADC)

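For search, we compute the exact distance from each query segment to all K centroids of the corresponding subspace once per query and store the results in an M × K lookup table. The approximate distance to any encoded vector is then the sum of M table lookups, one per code byte. The computation is asymmetric because the query stays unquantized while database vectors are represented only by their codes.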
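
A minimal sketch of the lookup-table approach follows. It is not the crate's implementation: `build_distance_table`, `adc_distance`, and the `codebooks[m][k]` layout are hypothetical names chosen for illustration, and squared Euclidean distance is an assumption (the metric and return value of `asymmetric_distance` in this crate may differ).

```rust
/// Build an M x K table where `table[m][k]` is the squared L2 distance
/// between query segment `m` and centroid `k` of subspace `m`.
/// Assumes `query.len()` is divisible by `codebooks.len()`.
fn build_distance_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub_dim = query.len() / codebooks.len();
    query
        .chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(segment, centroids)| {
            centroids
                .iter()
                .map(|c| {
                    segment
                        .iter()
                        .zip(c)
                        .map(|(q, x)| (q - x) * (q - x))
                        .sum::<f32>()
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Approximate squared distance from the query to one encoded vector:
/// one table lookup per subspace, summed over all M subspaces.
fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    table
        .iter()
        .zip(code)
        .map(|(row, &id)| row[id as usize])
        .sum()
}
```

Building the table costs K distance computations per subspace, but it is done once per query. Each database vector then costs only M lookups and additions, regardless of the original dimension.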

Structs§

PQConfig
Configuration for Product Quantization
PQStats
Statistics about a ProductQuantizer
ProductQuantizer
Trained Product Quantizer