# TurboQuant
TurboQuant is a Rust library for vector quantization of normalized vectors, built around the TurboQuant pipeline for LLM KV-cache research.
## Installation
```toml
[dependencies]
turboquant = "0.1.0"
```
Rust `1.87.0` or later is required.
## Quick Start
```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;
let dim = 128;
let tq = TurboQuantMSE::new(dim, 4, 42)?;
let x = normalize(&vec![1.0; dim])?;
let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
println!(
"compression={:.1}x mse={:.6}",
q.compression_ratio(),
tq.actual_mse(&x)?
);
# let _ = x_hat;
# Ok::<(), turboquant::TurboQuantError>(())
```
## Core APIs
| Type | Purpose | Notes |
| --- | --- | --- |
| `TurboQuantMSE` | Quantize for reconstruction quality | Random rotation + scalar codebook |
| `TurboQuantProd` | Quantize for inner-product estimation | MSE stage plus QJL residual sketch |
| `QJL` | Standalone 1-bit sketch | Accepts finite vectors |
| `PolarQuant` | Experimental alternative quantizer | Requires power-of-two dimension |
| `QuantizedKVCache` | Single-head quantized cache | Useful for attention-score experiments |
| `MultiHeadKVCache` | Multi-head cache wrapper | Enforces matching token counts per head |
## Common Workflows
### Reconstruction-Oriented Quantization
```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;
let dim = 128;
let tq = TurboQuantMSE::new(dim, 4, 42)?;
let x = normalize(&vec![1.0; dim])?;
let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
let bound = tq.distortion_bound();
# let _ = (x_hat, bound);
# Ok::<(), turboquant::TurboQuantError>(())
```
### Inner-Product Estimation
```rust
use turboquant::TurboQuantProd;
use turboquant::utils::normalize;
let dim = 128;
let tq = TurboQuantProd::new(dim, 3, 42)?;
let x = normalize(&vec![1.0; dim])?;
let query = normalize(&vec![0.5; dim])?;
let q = tq.quantize(&x)?;
let score = tq.estimate_inner_product(&q, &query)?;
# let _ = score;
# Ok::<(), turboquant::TurboQuantError>(())
```
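To build intuition for how a 1-bit sketch can estimate inner products, here is a self-contained SimHash-style sketch. This is an illustration only, not the crate's actual QJL construction: the LCG-based pseudo-random directions and the function names below are assumptions made for the example.

```rust
use std::f32::consts::PI;

// Minimal LCG (PCG multiplier/increment constants); returns roughly
// uniform pseudo-random values in [-1, 1). Not Gaussian, but adequate
// for a demonstration.
fn next_uniform(state: &mut u64) -> f32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    ((*state >> 33) as f32 / (1u64 << 31) as f32) * 2.0 - 1.0
}

// Each sketch bit is the sign of a pseudo-random projection of x.
fn sketch(x: &[f32], bits: usize, seed: u64) -> Vec<bool> {
    let mut state = seed;
    (0..bits)
        .map(|_| x.iter().map(|&v| v * next_uniform(&mut state)).sum::<f32>() >= 0.0)
        .collect()
}

// For unit vectors the expected bit-mismatch rate is angle / pi,
// so <x, y> is approximately cos(pi * mismatch_rate).
fn estimate_inner_product(a: &[bool], b: &[bool]) -> f32 {
    let mismatches = a.iter().zip(b).filter(|(p, q)| p != q).count();
    (PI * mismatches as f32 / a.len() as f32).cos()
}

fn main() {
    let x = [0.6_f32, 0.8];
    let y = [0.8_f32, -0.6]; // orthogonal to x, so the true inner product is 0
    let (sx, sy) = (sketch(&x, 4096, 42), sketch(&y, 4096, 42));
    println!("estimate = {:.3}", estimate_inner_product(&sx, &sy));
}
```

Note that both vectors must be sketched with the same seed so that they see the same random directions; `TurboQuantProd` analogously fixes its randomness at construction time.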
### Quantized KV Cache
```rust
use turboquant::kv_cache::{KVCacheConfig, QuantStrategy, QuantizedKVCache};
use turboquant::utils::normalize;
let dim = 64;
let config = KVCacheConfig::new(dim)
.with_key_bits(4)
.with_value_bits(4)
.with_key_strategy(QuantStrategy::Prod)
.with_max_tokens(128);
let mut cache = QuantizedKVCache::new(config)?;
let keys = vec![normalize(&vec![1.0; dim])?];
let values = vec![normalize(&vec![1.0; dim])?];
let query = normalize(&vec![1.0; dim])?;
cache.append(&keys, &values)?;
let scores = cache.attention_scores(&query)?;
# let _ = scores;
# Ok::<(), turboquant::TurboQuantError>(())
```
## Input Contract
Most of the library assumes **finite, unit-norm vectors**.
- `TurboQuantMSE`, `TurboQuantProd`, and KV-cache keys and values require normalized inputs
- `TurboQuantProd::new` requires `bit_width >= 2`
- `PolarQuant::new` requires a power-of-two dimension
- `QJL` accepts finite vectors and validates sketch lengths before use
- Multi-head cache append operations require every head to receive the same number of new tokens
If you are feeding raw activations into the library, call `turboquant::utils::normalize` first unless you already enforce that invariant upstream.
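Assuming `turboquant::utils::normalize` performs standard L2 normalization (check the crate docs to confirm), the invariant it enforces can be sketched without the crate as:

```rust
/// L2-normalize a vector; returns None for zero-norm or non-finite input,
/// mirroring the library's requirement of finite, unit-norm vectors.
fn l2_normalize(x: &[f32]) -> Option<Vec<f32>> {
    let norm_sq: f32 = x.iter().map(|v| v * v).sum();
    if !norm_sq.is_finite() || norm_sq == 0.0 {
        return None;
    }
    let inv = norm_sq.sqrt().recip();
    Some(x.iter().map(|v| v * inv).collect())
}

fn main() {
    // A 3-4-5 triangle scales to the unit vector (0.6, 0.8).
    let u = l2_normalize(&[3.0, 4.0]).unwrap();
    println!("{:?}", u);
}
```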
Malformed payloads are checked before dequantization or score estimation. The crate returns typed errors such as `LengthMismatch`, `InvalidQuantizationIndex`, and `BitWidthMismatch` instead of silently truncating data.
## How It Is Structured
```text
input vector
-> random rotation
-> scalar quantization (TurboQuantMSE)
-> residual sketch for dot products (TurboQuantProd / QJL)
-> packed storage for batches and caches (batch, bitpack, kv_cache)
```
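The scalar-quantization stage of the pipeline can be illustrated with a toy uniform `b`-bit quantizer over `[-1, 1]` per coordinate. This is a simplification for intuition only: the crate's random rotation and codebook construction differ, and the function names here are not part of its API.

```rust
/// Map each coordinate of x from [-1, 1] to one of 2^bits - 1 + 1 levels.
fn toy_quantize(x: &[f32], bits: u32) -> Vec<u32> {
    let levels = (1u32 << bits) - 1;
    x.iter()
        .map(|&v| {
            let t = ((v + 1.0) / 2.0).clamp(0.0, 1.0); // [-1, 1] -> [0, 1]
            (t * levels as f32).round() as u32
        })
        .collect()
}

/// Invert the mapping: reconstruct each coordinate at its level's center.
fn toy_dequantize(codes: &[u32], bits: u32) -> Vec<f32> {
    let levels = ((1u32 << bits) - 1) as f32;
    codes.iter().map(|&c| (c as f32 / levels) * 2.0 - 1.0).collect()
}

fn main() {
    let x = [0.6_f32, -0.8, 0.0];
    let codes = toy_quantize(&x, 4);
    let x_hat = toy_dequantize(&codes, 4);
    // Per-coordinate error is bounded by half the step size, 1/(2^bits - 1).
    println!("codes={:?} x_hat={:?}", codes, x_hat);
}
```

The random rotation that precedes this step spreads a vector's energy evenly across coordinates, which is what makes a fixed per-coordinate quantizer effective.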
## License
MIT. See [LICENSE](LICENSE).
## References
1. Zandieh, Mirrokni. [*TurboQuant: A Near-Optimal Vector Quantizer for LLM KV Cache*](https://arxiv.org/abs/2504.19874)
2. [*Quantized Johnson-Lindenstrauss Transform*](https://arxiv.org/abs/2406.03482)
3. [*PolarQuant: Polar Coordinate Quantization for KV Caches*](https://arxiv.org/abs/2502.02617)