Crate turbo_quant

§turbo-quant

Rust implementation of TurboQuant, PolarQuant, and QJL, vector quantization algorithms from Google Research (see References below).

These algorithms compress high-dimensional vectors (embeddings, KV cache entries) to 3–8 bits per value with zero accuracy loss and no dataset-specific calibration.

§Key Properties

  • Data-oblivious: no k-means training, no codebook, no calibration set. The rotation is seeded once and works on any distribution.
  • Deterministic: identical parameters (dim, bits, projections, seed) always produce the same quantizer, so its state can be fully reconstructed from those four integers.
  • Zero accuracy loss: at 3+ bits, inner product estimates are provably unbiased and achieve near-optimal distortion (within ~2.7× of the Shannon limit).
  • Instant indexing: unlike Product Quantization, there is no offline training phase. Vectors can be indexed as they arrive.
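To make the "3–8 bits per value" figure concrete, here is a back-of-envelope storage calculation (plain arithmetic, independent of the crate's actual code layout, which may add small per-vector headers):

```rust
/// Bytes needed for one packed code, rounding up to whole bytes.
fn bytes_per_vector(dim: usize, bits_per_value: usize) -> usize {
    (dim * bits_per_value + 7) / 8
}

fn main() {
    let dim = 1536; // typical embedding size
    let raw = dim * 4; // raw f32 storage: 6144 bytes
    for bits in [3, 4, 8] {
        let packed = bytes_per_vector(dim, bits);
        println!("{bits} bits/value: {packed} bytes ({}x smaller than f32)", raw / packed);
    }
}
```

At 4 bits per value a 1536-dimensional f32 embedding shrinks from 6144 bytes to 768 bytes, an 8× reduction before any index overhead.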

§Quick Start

```rust
use turbo_quant::{TurboQuantizer, PolarQuantizer};

// Compress embeddings (dim = 64 keeps this example fast; OpenAI /
// sentence-transformer embeddings are typically 1536-dimensional).
let dim = 64; // use 1536 in production
let q = TurboQuantizer::new(dim, /* bits */ 8, 32, /* seed */ 42).unwrap();

let database_vector: Vec<f32> = vec![0.1; dim]; // your embedding here
let query_vector: Vec<f32> = vec![0.1; dim];    // your query here

// Compress the database vector (store this, not the raw f32 array).
let code = q.encode(&database_vector).unwrap();

// At query time: estimate the inner product without decompressing.
let score = q.inner_product_estimate(&code, &query_vector).unwrap();

// Or use PolarQuant for a simpler single-stage compressor.
let pq = PolarQuantizer::new(dim, 8, 42).unwrap();
let polar_code = pq.encode(&database_vector).unwrap();
let polar_score = pq.inner_product_estimate(&polar_code, &query_vector).unwrap();
```
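The unbiased inner-product estimation that the `qjl` module provides can be illustrated with a self-contained toy sketch. This is a hedged reimplementation of the underlying idea, not the crate's code: quantize a Gaussian random projection of the stored vector x down to 1 bit per projection (its sign), yet still recover an unbiased estimate of ⟨x, y⟩ at query time. The `Lcg` generator and `qjl_demo` function are our own illustrative names.

```rust
/// Minimal deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in (0, 1].
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gauss(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64(), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

/// Returns (true inner product, 1-bit QJL-style estimate) for fixed x, y.
fn qjl_demo(m: usize, seed: u64) -> (f64, f64) {
    let x = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.5];
    let y = [1.0, 0.5, -1.5, 0.25, 2.0, -1.0, 0.75, 0.5];
    let true_ip: f64 = x.iter().zip(&y).map(|(a, b)| a * b).sum();
    let norm_x: f64 = x.iter().map(|a| a * a).sum::<f64>().sqrt();

    let mut rng = Lcg(seed);
    let mut acc = 0.0;
    for _ in 0..m {
        // One Gaussian row s: the stored code keeps only sign(<s, x>);
        // the query side uses the full <s, y>.
        let (mut sx, mut sy) = (0.0, 0.0);
        for j in 0..x.len() {
            let g = rng.next_gauss();
            sx += g * x[j];
            sy += g * y[j];
        }
        acc += sx.signum() * sy;
    }
    // E[sign(<s,x>) * <s,y>] = sqrt(2/pi) * <x,y> / ||x||, so rescale.
    let estimate = norm_x * (std::f64::consts::PI / 2.0).sqrt() * acc / m as f64;
    (true_ip, estimate)
}

fn main() {
    let (true_ip, estimate) = qjl_demo(50_000, 42);
    println!("true <x,y> = {true_ip:.3}, 1-bit estimate = {estimate:.3}");
}
```

The estimator's variance shrinks like 1/m, so more projections buy accuracy at the cost of code size; the crate's higher-bit quantizers trade along the same curve.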

§Choosing Parameters

| Use case | Recommended bits | Recommended projections |
|---|---|---|
| Semantic search (recall@10) | 8 | dim / 4 |
| KV cache compression | 4–6 | dim / 8 |
| Maximum compression | 3 | dim / 16 |
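A small helper can encode these recommendations; the `UseCase` enum and `recommended_params` function below are hypothetical names of ours, but the (bits, projections) pairs come straight from the table:

```rust
enum UseCase {
    SemanticSearch,
    KvCache,
    MaxCompression,
}

/// Returns (bits, projections) for a given dimension, per the table above.
fn recommended_params(case: UseCase, dim: usize) -> (u32, usize) {
    match case {
        UseCase::SemanticSearch => (8, dim / 4),
        UseCase::KvCache => (4, dim / 8), // table says 4-6 bits; low end here
        UseCase::MaxCompression => (3, dim / 16),
    }
}

fn main() {
    let dim = 1536;
    let (bits, projections) = recommended_params(UseCase::SemanticSearch, dim);
    println!("semantic search: bits = {bits}, projections = {projections}");
    // These values would then feed a constructor call, assuming the
    // (dim, bits, projections, seed) signature shown in the Quick Start.
}
```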

§References

  • TurboQuant: Zandieh et al., ICLR 2026
  • PolarQuant: Zandieh et al., AISTATS 2026
  • QJL: Zandieh et al., AAAI 2025

Re-exports§

pub use error::Result;
pub use error::TurboQuantError;
pub use kv::CompressedToken;
pub use kv::KvCacheCompressor;
pub use kv::KvCacheConfig;
pub use polar::PolarCode;
pub use polar::PolarQuantizer;
pub use qjl::QjlQuantizer;
pub use qjl::QjlSketch;
pub use rotation::Rotation;
pub use rotation::StoredRotation;
pub use turbo::BatchStats;
pub use turbo::TurboCode;
pub use turbo::TurboQuantizer;

Modules§

error
kv
KV cache compression for transformer attention.
polar
PolarQuant: high-efficiency vector compression via polar coordinate encoding.
qjl
Quantized Johnson-Lindenstrauss (QJL) transform for unbiased inner product estimation.
rotation
Random rotation matrices for whitening high-dimensional vectors before quantization.
turbo
TurboQuant: two-stage vector compression combining PolarQuant and QJL.