
Crate turboquant


TurboQuant – KV-Cache quantization with zero accuracy loss.

Implements Google’s TurboQuant algorithm (Zandieh et al., ICLR 2026) for compressing LLM key-value caches to 3-4 bits per value.
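The core step, scalar quantization of each value against a small codebook, can be sketched self-contained. The 2-bit codebook and the function names below are illustrative assumptions for this sketch, not this crate's API (see `quantize_vec_with_codebook` and `dequantize_vec_with_codebook` in the re-exports for the real entry points):

```rust
// Illustrative sketch: map each value to the index of its nearest codebook
// entry (2 bits for a 4-entry codebook), then reconstruct from the codebook.
fn nearest(v: f32, codebook: &[f32]) -> u8 {
    let mut best = 0usize;
    for (i, &c) in codebook.iter().enumerate() {
        if (v - c).abs() < (v - codebook[best]).abs() {
            best = i;
        }
    }
    best as u8
}

fn quantize(values: &[f32], codebook: &[f32]) -> Vec<u8> {
    values.iter().map(|&v| nearest(v, codebook)).collect()
}

fn dequantize(codes: &[u8], codebook: &[f32]) -> Vec<f32> {
    codes.iter().map(|&c| codebook[c as usize]).collect()
}

fn main() {
    // Hypothetical uniform 4-entry codebook; a Lloyd-Max codebook instead
    // places entries to minimize expected distortion under the value
    // distribution (see the `codebook` module).
    let codebook = [-1.5f32, -0.5, 0.5, 1.5];
    let values = [-1.2f32, 0.4, 1.9, -0.1];
    let codes = quantize(&values, &codebook);
    let recon = dequantize(&codes, &codebook);
    println!("codes {:?} -> reconstruction {:?}", codes, recon);
    // codes [0, 2, 3, 1] -> reconstruction [-1.5, 0.5, 1.5, -0.5]
}
```

Each code needs only 2 bits, so a packed representation (see the `packed` module) stores four values per byte.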

Re-exports

pub use attention::PackedImport;
pub use attention::QuantizedKVCache;
pub use error::Result;
pub use error::TurboQuantError;
pub use packed::PackedBlock;
pub use packed::TurboQuantConfig;
pub use qjl::compute_qjl_signs;
pub use qjl::estimate_inner_product;
pub use qjl::estimate_inner_product_single;
pub use qjl::estimate_inner_product_with_codebook;
pub use qjl::precompute_query_projections;
pub use qjl::quantize_with_qjl;
pub use qjl::quantize_with_qjl_resources;
pub use qjl::EstimationContext;
pub use qjl::QjlBatchResources;
pub use qjl::QjlBlock;
pub use quantize::dequantize_into_with_codebook;
pub use quantize::dequantize_vec;
pub use quantize::dequantize_vec_with_codebook;
pub use quantize::quantize_vec;
pub use quantize::quantize_vec_with_codebook;
pub use rotation::RotationOrder;

Modules

attention
High-level quantized KV-cache attention API.
codebook
Codebook lookup, Beta PDF, and pre-computed tables for Lloyd-Max quantization.
error
Error types (TurboQuantError) and the crate-wide Result alias.
math
Mathematical helper functions used across TurboQuant modules.
packed
Packed data structures for quantized blocks.
qjl
QJL (Quantized Johnson-Lindenstrauss) bias correction.
quantize
Core quantize / dequantize pipeline.
rotation
Walsh-Hadamard Transform and random rotation.
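The `qjl` module's sign-based estimation can be sketched self-contained: project the key with random Gaussian rows, store only the sign bits, and recover the inner product from the key's norm plus sign agreements with the projected query. This is a minimal sketch with an illustrative PRNG; the crate's actual functions (`compute_qjl_signs`, `estimate_inner_product`) may differ in signature and bias correction.

```rust
use std::f64::consts::PI;

// Small deterministic LCG feeding Box-Muller, so the sketch needs no external
// crates. Illustrative only; any Gaussian source works.
struct Rng(u64);
impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64) / ((1u64 << 53) as f64)
    }
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64().max(1e-12), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

fn estimate(q: &[f64], k: &[f64], m: usize, seed: u64) -> f64 {
    let mut rng = Rng(seed);
    let norm_k = k.iter().map(|x| x * x).sum::<f64>().sqrt();
    let mut acc = 0.0;
    for _ in 0..m {
        // One random Gaussian row s; only sign(<s, k>) would be stored (1 bit).
        let s: Vec<f64> = (0..q.len()).map(|_| rng.gaussian()).collect();
        let sk: f64 = s.iter().zip(k).map(|(a, b)| a * b).sum();
        let sq: f64 = s.iter().zip(q).map(|(a, b)| a * b).sum();
        acc += sk.signum() * sq;
    }
    // Bias correction: E[sign(<s,k>) <s,q>] = sqrt(2/pi) * <q,k> / ||k||.
    (PI / 2.0).sqrt() * norm_k * acc / m as f64
}

fn main() {
    let (q, k) = (vec![0.6, 0.8], vec![1.0, 0.0]);
    let est = estimate(&q, &k, 20_000, 42);
    println!("estimated <q,k> = {est:.3} (true value 0.600)");
}
```

The estimate converges to the true inner product at rate O(1/sqrt(m)), which is why only the key's norm and its sign bits need to be cached.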
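The Walsh-Hadamard Transform used by the `rotation` module can be sketched as the standard in-place butterfly; the function name below is illustrative, not this crate's API:

```rust
// In-place unnormalized Walsh-Hadamard Transform over a power-of-two-length
// slice. Randomly sign-flipping coordinates before applying it yields the
// cheap random rotation used to spread outliers before quantization.
fn wht_in_place(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "WHT requires a power-of-two length");
    let mut h = 1;
    while h < n {
        for start in (0..n).step_by(h * 2) {
            for i in start..start + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

fn main() {
    let mut v = [1.0f32, 0.0, 0.0, 0.0];
    wht_in_place(&mut v); // a single spike spreads across all coordinates
    println!("{:?}", v); // [1.0, 1.0, 1.0, 1.0]
    wht_in_place(&mut v); // applying the unnormalized WHT twice scales by n
    println!("{:?}", v); // [4.0, 0.0, 0.0, 0.0]
}
```

The butterfly runs in O(n log n) time with no multiplications, which is what makes the rotation essentially free compared with a dense random orthogonal matrix.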