mnemonist-quant
TurboQuant vector quantization for mnemonist — near-optimal MSE and inner-product quantizers.
Implements the algorithms from TurboQuant (arXiv:2504.19874):
- TurboQuantMse: MSE-optimal quantizer using random rotation + Lloyd-Max codebooks
- TurboQuantProd: unbiased inner-product quantizer (MSE + QJL residual)
- CompressedEmbeddingStore: binary storage format for quantized embeddings
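To make the Lloyd-Max codebook idea concrete, here is a minimal standalone sketch of the classic 1-D Lloyd-Max iteration. It does not use or mirror this crate's API: the function names `nearest`, `lloyd_max`, and `mse` are hypothetical, and the sketch fits a codebook to raw samples rather than to the distribution of rotated coordinates, which is an assumption made purely for illustration.

```rust
// Illustrative sketch only; NOT the mnemonist-quant API (all names are hypothetical).

/// Index of the codebook level closest to `x`.
fn nearest(codebook: &[f32], x: f32) -> usize {
    let mut best = 0;
    for (i, &c) in codebook.iter().enumerate() {
        if (x - c).abs() < (x - codebook[best]).abs() {
            best = i;
        }
    }
    best
}

/// Lloyd-Max on 1-D samples: alternately assign each sample to its nearest
/// level, then move each level to the mean of the samples assigned to it.
fn lloyd_max(samples: &[f32], levels: usize, iters: usize) -> Vec<f32> {
    assert!(!samples.is_empty() && levels > 0);
    let lo = samples.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = samples.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Start from levels spread evenly over the sample range.
    let mut codebook: Vec<f32> = (0..levels)
        .map(|i| lo + (hi - lo) * (i as f32 + 0.5) / levels as f32)
        .collect();
    for _ in 0..iters {
        let mut sum = vec![0.0f32; levels];
        let mut count = vec![0usize; levels];
        for &x in samples {
            let k = nearest(&codebook, x);
            sum[k] += x;
            count[k] += 1;
        }
        for k in 0..levels {
            // Levels with an empty cell keep their previous value.
            if count[k] > 0 {
                codebook[k] = sum[k] / count[k] as f32;
            }
        }
    }
    codebook
}

/// Mean squared error of quantizing `samples` with `codebook`.
fn mse(samples: &[f32], codebook: &[f32]) -> f32 {
    let total: f32 = samples
        .iter()
        .map(|&x| {
            let q = codebook[nearest(codebook, x)];
            (x - q) * (x - q)
        })
        .sum();
    total / samples.len() as f32
}

fn main() {
    let samples: [f32; 6] = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1];
    let codebook = lloyd_max(&samples, 2, 10);
    println!("codebook = {:?}, mse = {}", codebook, mse(&samples, &codebook));
}
```

With two levels, each sample is stored as a single bit (the index returned by `nearest`) and decoded by looking that index up in the codebook.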
Usage
use ;
References
- TurboQuant: Redefining AI Efficiency with Extreme Compression — Google Research blog
- TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate — arXiv:2504.19874
- PolarQuant: Quantizing KV Caches with Polar Transformation — arXiv:2502.02617
- QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead — arXiv:2406.03482
License
See LICENSE in the repository root.