Vector Quantizer
Simple vector quantization utilities and functions.
cargo add vector_quantizer
Example usage:
use Result;
use Array2;
use RandomExt;
use PQ;
use StandardNormal;
// Helper function to calculate compression ratio
// Helper function to calculate Mean Squared Error
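The helper bodies are not reproduced above. As a rough illustration of what a Mean Squared Error helper could look like, here is a minimal sketch that uses plain slices instead of the arrays in the example. The function name `mean_squared_error` and its signature are hypothetical, not part of the crate's API.

```rust
/// Mean squared error between original and reconstructed vectors,
/// averaged over every component of every vector.
/// Hypothetical helper: plain `Vec<f32>` rows stand in for a 2-D array.
fn mean_squared_error(original: &[Vec<f32>], reconstructed: &[Vec<f32>]) -> f64 {
    assert_eq!(original.len(), reconstructed.len(), "row counts differ");

    let mut sum = 0.0f64;
    let mut count = 0usize;
    for (o, r) in original.iter().zip(reconstructed) {
        assert_eq!(o.len(), r.len(), "vector dimensions differ");
        for (a, b) in o.iter().zip(r) {
            // Accumulate in f64 to limit rounding error on large datasets.
            let diff = (*a - *b) as f64;
            sum += diff * diff;
            count += 1;
        }
    }

    // An empty input has no error to report.
    if count == 0 {
        0.0
    } else {
        sum / count as f64
    }
}
```

Called with the original data and the result of decompressing its codes, a lower value means the quantizer reconstructs the input more faithfully.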
See a more detailed example here: /src/bin/example.rs
Performance Benchmarks
The PQ implementation was tested on datasets ranging from 1,000 to 1,000,000 vectors (128 dimensions each), using 16 subspaces and 256 centroids per subspace. Key findings:
- Memory Efficiency: Consistently achieves 96.88% memory reduction across all dataset sizes
- Processing Speed:
  - Fitting: Scales linearly, processing 100k vectors in ~3.7s (1M vectors in ~38s)
  - Compression: Very efficient, handling ~278k vectors per second (1M vectors in 3.57s)
- Quality Metrics:
  - Reconstruction Error: Remains low (0.013-0.021) across all dataset sizes
  - Recall@10: Ranges from 0.40 (small datasets) to 0.18 (large datasets)
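Two of the metrics above can be made concrete with a short sketch. It assumes vectors are stored as `f32`, that each subspace code fits in one `u8` (256 centroids), that codebook storage is ignored, and that Recall@k is the fraction of the exact top-k neighbours that also appear in the approximate top-k. These are assumptions about the benchmark, and both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Percentage of memory saved by replacing a raw vector with its PQ codes.
/// Assumes f32 components and one u8 code per subspace; the fixed cost of
/// the codebooks is ignored.
fn memory_reduction_percent(dims: usize, subspaces: usize) -> f64 {
    let raw_bytes = (dims * std::mem::size_of::<f32>()) as f64;
    let code_bytes = (subspaces * std::mem::size_of::<u8>()) as f64;
    (1.0 - code_bytes / raw_bytes) * 100.0
}

/// Fraction of the exact top-k neighbour ids that also appear in the
/// approximate top-k list. Both slices are assumed to be sorted by rank.
fn recall_at_k(exact: &[usize], approx: &[usize], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let truth: HashSet<usize> = exact.iter().take(k).copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / k as f64
}
```

Under these assumptions, 128 dimensions and 16 subspaces give 512 raw bytes against 16 code bytes per vector, a reduction of 96.875%, which matches the reported 96.88% after rounding.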
The benchmarks were run on a 2022 MacBook Pro (M2 Pro, 16GB RAM). Run your own tests by running:
Acknowledgements
The code in this repository is mostly adapted from https://github.com/xinyandai/product-quantization, a great Python library for vector quantization.
Both the original code and the code in this repository are derived from "Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search" by Xinyan Dai, Xiao Yan, Kelvin K. W. Ng, Jie Liu, and James Cheng: https://arxiv.org/abs/1911.04654