# TurboQuant
TurboQuant is a Rust library for vector quantization on normalized vectors, built around the TurboQuant pipeline for LLM KV-cache research.
## Installation

```toml
[dependencies]
turboquant = "0.1.0"
```

Rust 1.87.0+ is required.
## Quick Start

```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantMSE::new(dim, 4)?; // dimension, bits per component

let raw: Vec<f32> = (0..dim).map(|i| i as f32).collect();
let x = normalize(&raw)?;

let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
println!("reconstructed {} components", x_hat.len());
# let _ = x_hat;
# Ok::<(), turboquant::Error>(())
```
## Core APIs

| Type | Purpose | Notes |
|---|---|---|
| `TurboQuantMSE` | Quantize for reconstruction quality | Random rotation + scalar codebook |
| `TurboQuantProd` | Quantize for inner-product estimation | MSE stage plus QJL residual sketch |
| `QJL` | Standalone 1-bit sketch | Accepts finite vectors |
| `PolarQuant` | Experimental alternative quantizer | Requires power-of-two dimension |
| `QuantizedKVCache` | Single-head quantized cache | Useful for attention-score experiments |
| `MultiHeadKVCache` | Multi-head cache wrapper | Enforces matching token counts per head |
## Common Workflows
### Reconstruction-Oriented Quantization

```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantMSE::new(dim, 4)?;

let x = normalize(&vec![1.0f32; dim])?;
let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
let bound = tq.distortion_bound();
# let _ = (x_hat, bound);
# Ok::<(), turboquant::Error>(())
```
### Inner-Product Estimation

```rust
use turboquant::TurboQuantProd;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantProd::new(dim, 4)?; // bit width must be >= 2

let x = normalize(&vec![1.0f32; dim])?;
let query = normalize(&vec![0.5f32; dim])?;

let q = tq.quantize(&x)?;
let score = tq.estimate_inner_product(&query, &q)?;
# let _ = score;
# Ok::<(), turboquant::Error>(())
```
### Quantized KV Cache

```rust
use turboquant::{CacheConfig, KeyStrategy, QuantizedKVCache};
use turboquant::utils::normalize;

let dim = 64;
let config = CacheConfig::new(dim)
    .with_key_bits(4)
    .with_value_bits(4)
    .with_key_strategy(KeyStrategy::Mse)
    .with_max_tokens(1024);
let mut cache = QuantizedKVCache::new(config)?;

let keys = vec![normalize(&vec![1.0f32; dim])?];
let values = vec![normalize(&vec![1.0f32; dim])?];
let query = normalize(&vec![1.0f32; dim])?;

cache.append(&keys, &values)?;
let scores = cache.attention_scores(&query)?;
# let _ = scores;
# Ok::<(), turboquant::Error>(())
```
## Input Contract

Most of the library assumes finite, unit-norm vectors.

- `TurboQuantMSE`, `TurboQuantProd`, and KV-cache keys and values require normalized inputs
- `TurboQuantProd::new` requires `bit_width >= 2`
- `PolarQuant::new` requires a power-of-two dimension
- `QJL` accepts finite vectors and validates sketch lengths before use
- Multi-head cache append operations require every head to receive the same number of new tokens
If you are feeding raw activations into the library, call `turboquant::utils::normalize` first unless you already enforce that invariant upstream.
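For intuition, the invariant is plain L2 normalization. The sketch below is illustrative only, not the crate's implementation (the crate's `normalize` is fallible via `Result` and its error type; this stand-in uses `Option` for brevity):

```rust
// Illustrative stand-in for L2 normalization. Not the crate's code.
fn l2_normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if !norm.is_finite() || norm == 0.0 {
        // Zero or non-finite input has no unit-norm form.
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}
```

Anything with a finite, nonzero norm maps onto the unit sphere; everything else is rejected up front, which is the same contract the quantizers enforce.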
Malformed payloads are checked before dequantization or score estimation. The crate returns typed errors such as `LengthMismatch`, `InvalidQuantizationIndex`, and `BitWidthMismatch` instead of silently truncating data.
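For orientation, a typed-error API of this shape might look like the following. The enum and its payload fields are illustrative assumptions that mirror the documented variant names, not the crate's actual definitions:

```rust
// Hypothetical sketch of a typed quantization error; the variant names
// come from the README, the payload fields are assumptions.
#[derive(Debug, PartialEq)]
enum QuantError {
    LengthMismatch { expected: usize, got: usize },
    InvalidQuantizationIndex(usize),
    BitWidthMismatch { expected: u8, got: u8 },
}

// A decoder in this style refuses malformed payloads up front
// instead of truncating or padding them.
fn check_len(expected: usize, payload: &[u8]) -> Result<(), QuantError> {
    if payload.len() != expected {
        return Err(QuantError::LengthMismatch { expected, got: payload.len() });
    }
    Ok(())
}
```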
## How It Is Structured

```text
input vector
  -> random rotation
  -> scalar quantization                     (TurboQuantMSE)
  -> residual sketch for dot products        (TurboQuantProd / QJL)
  -> packed storage for batches and caches   (batch, bitpack, kv_cache)
```
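The first two stages can be sketched in plain Rust. This is a deliberately simplified stand-in, not the crate's implementation: the "rotation" here is a fixed per-coordinate sign flip (the trivial orthogonal case, standing in for a proper random rotation), and the codebook is a uniform `b`-bit scalar quantizer over `[-1, 1]`:

```rust
// Simplified stand-in for the rotate-then-scalar-quantize stages.

// Sign-flip "rotation": a diagonal orthogonal transform. A real random
// rotation applies a dense random orthogonal matrix instead.
fn rotate(x: &[f32], signs: &[f32]) -> Vec<f32> {
    x.iter().zip(signs).map(|(v, s)| v * s).collect()
}

// Uniform b-bit scalar quantizer mapping [-1, 1] onto 2^b - 1 levels.
fn quantize(x: &[f32], bits: u32) -> Vec<u32> {
    let levels = ((1u32 << bits) - 1) as f32;
    x.iter()
        .map(|v| {
            let t = ((v + 1.0) / 2.0).clamp(0.0, 1.0); // [-1,1] -> [0,1]
            (t * levels).round() as u32
        })
        .collect()
}

// Inverse map: code index back to the center of its level in [-1, 1].
fn dequantize(codes: &[u32], bits: u32) -> Vec<f32> {
    let levels = ((1u32 << bits) - 1) as f32;
    codes.iter().map(|&c| (c as f32 / levels) * 2.0 - 1.0).collect()
}
```

With `b` bits the reconstruction error per component is at most half a quantization step, i.e. `1 / (2^b - 1)`; the rotation spreads energy evenly across coordinates so that bound is rarely hit in practice.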
## License
MIT. See LICENSE.