TurboQuant

TurboQuant is a Rust library for vector quantization on normalized vectors, built around the TurboQuant pipeline for LLM KV-cache research.

Installation

Add the crate to your Cargo.toml:

[dependencies]
turboquant = "0.1.0"

Rust 1.87.0+ is required.

Quick Start

use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
// arguments: dimension, bit width (4-bit codes), RNG seed for the random rotation
let tq = TurboQuantMSE::new(dim, 4, 42)?;

// build a raw vector, then project it onto the unit sphere
let raw: Vec<f64> = (0..dim).map(|i| i as f64).collect();
let x = normalize(&raw)?;

let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;

println!(
    "compression={:.1}x mse={:.6}",
    q.compression_ratio(),
    tq.actual_mse(&x)?
);
# let _ = x_hat;
# Ok::<(), turboquant::TurboQuantError>(())

Core APIs

Type              Purpose                                Notes
TurboQuantMSE     Quantize for reconstruction quality    Random rotation + scalar codebook
TurboQuantProd    Quantize for inner-product estimation  MSE stage plus QJL residual sketch
QJL               Standalone 1-bit sketch                Accepts finite vectors
PolarQuant        Experimental alternative quantizer     Requires power-of-two dimension
QuantizedKVCache  Single-head quantized cache            Useful for attention-score experiments
MultiHeadKVCache  Multi-head cache wrapper               Enforces matching token counts per head

Common Workflows

Reconstruction-Oriented Quantization

use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantMSE::new(dim, 4, 42)?;
let x = normalize(&vec![1.0; dim])?;

let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
let bound = tq.distortion_bound();

# let _ = (x_hat, bound);
# Ok::<(), turboquant::TurboQuantError>(())

Inner-Product Estimation

use turboquant::TurboQuantProd;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantProd::new(dim, 3, 42)?;
let x = normalize(&vec![1.0; dim])?;
let query = normalize(&vec![0.5; dim])?;

let q = tq.quantize(&x)?;
let score = tq.estimate_inner_product(&q, &query)?;

# let _ = score;
# Ok::<(), turboquant::TurboQuantError>(())

Quantized KV Cache

use turboquant::kv_cache::{KVCacheConfig, QuantStrategy, QuantizedKVCache};
use turboquant::utils::normalize;

let dim = 64;
let config = KVCacheConfig::new(dim)
    .with_key_bits(4)
    .with_value_bits(4)
    .with_key_strategy(QuantStrategy::Prod)
    .with_max_tokens(128);

let mut cache = QuantizedKVCache::new(config)?;
let keys = vec![normalize(&vec![1.0; dim])?];
let values = vec![normalize(&vec![1.0; dim])?];
let query = normalize(&vec![1.0; dim])?;

cache.append(&keys, &values)?;
let scores = cache.attention_scores(&query)?;

# let _ = scores;
# Ok::<(), turboquant::TurboQuantError>(())

Input Contract

Most of the library assumes finite, unit-norm vectors.

  • TurboQuantMSE, TurboQuantProd, and KV-cache keys and values require normalized inputs
  • TurboQuantProd::new requires bit_width >= 2
  • PolarQuant::new requires a power-of-two dimension
  • QJL accepts finite vectors and validates sketch lengths before use
  • Multi-head cache append operations require every head to receive the same number of new tokens

If you are feeding raw activations into the library, call turboquant::utils::normalize first unless you already enforce that invariant upstream.

Malformed payloads are rejected before dequantization or score estimation: the crate returns typed errors such as LengthMismatch, InvalidQuantizationIndex, and BitWidthMismatch instead of silently truncating data.

How It Is Structured

input vector
  -> random rotation
  -> scalar quantization                    (TurboQuantMSE)
  -> residual sketch for dot products       (TurboQuantProd / QJL)
  -> packed storage for batches and caches  (batch, bitpack, kv_cache)

License

MIT. See LICENSE.

References

  1. Zandieh, Mirrokni. TurboQuant: A Near-Optimal Vector Quantizer for LLM KV Cache
  2. Quantized Johnson-Lindenstrauss Transform
  3. PolarQuant: Polar Coordinate Quantization for KV Caches