# TurboQuant

TurboQuant is a Rust library for vector quantization of normalized vectors, implementing the TurboQuant pipeline used in LLM KV-cache compression research.

## Installation

```toml
[dependencies]
turboquant = "0.1.0"
```

Rust `1.87.0+` is required.

## Quick Start

```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantMSE::new(dim, 4, 42)?;

let raw: Vec<f64> = (0..dim).map(|i| i as f64).collect();
let x = normalize(&raw)?;

let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;

println!(
    "compression={:.1}x mse={:.6}",
    q.compression_ratio(),
    tq.actual_mse(&x)?
);
# let _ = x_hat;
# Ok::<(), turboquant::TurboQuantError>(())
```

## Core APIs

| Type | Purpose | Notes |
|------|---------|-------|
| `TurboQuantMSE` | Quantize for reconstruction quality | Random rotation + scalar codebook |
| `TurboQuantProd` | Quantize for inner-product estimation | MSE stage plus QJL residual sketch |
| `QJL` | Standalone 1-bit sketch | Accepts finite vectors |
| `PolarQuant` | Experimental alternative quantizer | Requires power-of-two dimension |
| `QuantizedKVCache` | Single-head quantized cache | Useful for attention-score experiments |
| `MultiHeadKVCache` | Multi-head cache wrapper | Enforces matching token counts per head |
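
To build intuition for the 1-bit sketch behind `QJL`, here is a self-contained plain-Rust toy, not the crate's implementation: the `Gauss`, `sketch`, and `estimate` names are invented for this example, and a simple LCG stands in for a real RNG. The key vector is reduced to the sign bits of random Gaussian projections, while the query keeps its raw projections, which gives an unbiased inner-product estimate for unit vectors.

```rust
/// Deterministic Gaussian sampler (LCG + Box-Muller) so both sides of the
/// sketch can regenerate the same random directions from a shared seed.
struct Gauss {
    state: u64,
}

impl Gauss {
    fn new(seed: u64) -> Self {
        Gauss { state: seed }
    }

    fn uniform(&mut self) -> f64 {
        // 64-bit LCG; keep the top 53 bits for a uniform in [0, 1).
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.state >> 11) as f64) / ((1u64 << 53) as f64)
    }

    fn sample(&mut self) -> f64 {
        // Box-Muller transform: two uniforms -> one standard normal.
        let u1 = self.uniform().max(1e-12);
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

/// 1-bit sketch of `x`: the sign of `m` random Gaussian projections.
fn sketch(x: &[f64], m: usize, seed: u64) -> Vec<bool> {
    let mut g = Gauss::new(seed);
    (0..m)
        .map(|_| x.iter().map(|&xi| xi * g.sample()).sum::<f64>() >= 0.0)
        .collect()
}

/// Asymmetric estimate: quantized key bits vs. the query's raw projections.
/// For unit vectors, E[estimate] = <x, query> (the sqrt(pi/2) factor makes
/// the sign-times-projection terms unbiased).
fn estimate(bits: &[bool], query: &[f64], seed: u64) -> f64 {
    let mut g = Gauss::new(seed);
    let mut acc = 0.0;
    for &b in bits {
        let qp: f64 = query.iter().map(|&qi| qi * g.sample()).sum();
        acc += if b { qp } else { -qp };
    }
    (std::f64::consts::PI / 2.0).sqrt() * acc / bits.len() as f64
}

fn main() {
    let m = 4096;
    let x = [0.6, 0.8, 0.0];
    let y = [0.0, 0.0, 1.0]; // orthogonal to x
    let bits = sketch(&x, m, 42);
    let self_score = estimate(&bits, &x, 42);
    let ortho_score = estimate(&bits, &y, 42);
    // Near 1 for x against itself, near 0 for an orthogonal query.
    assert!((self_score - 1.0).abs() < 0.15);
    assert!(ortho_score.abs() < 0.15);
    println!("self={:.3} ortho={:.3}", self_score, ortho_score);
}
```

Note the asymmetry: only the stored key is quantized to one bit per projection; the query stays in full precision, which is what makes this shape useful for attention-score estimation.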

## Common Workflows

### Reconstruction-Oriented Quantization

```rust
use turboquant::TurboQuantMSE;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantMSE::new(dim, 4, 42)?;
let x = normalize(&vec![1.0; dim])?;

let q = tq.quantize(&x)?;
let x_hat = tq.dequantize(&q)?;
let bound = tq.distortion_bound();

# let _ = (x_hat, bound);
# Ok::<(), turboquant::TurboQuantError>(())
```

### Inner-Product Estimation

```rust
use turboquant::TurboQuantProd;
use turboquant::utils::normalize;

let dim = 128;
let tq = TurboQuantProd::new(dim, 3, 42)?;
let x = normalize(&vec![1.0; dim])?;
let query = normalize(&vec![0.5; dim])?;

let q = tq.quantize(&x)?;
let score = tq.estimate_inner_product(&q, &query)?;

# let _ = score;
# Ok::<(), turboquant::TurboQuantError>(())
```

### Quantized KV Cache

```rust
use turboquant::kv_cache::{KVCacheConfig, QuantStrategy, QuantizedKVCache};
use turboquant::utils::normalize;

let dim = 64;
let config = KVCacheConfig::new(dim)
    .with_key_bits(4)
    .with_value_bits(4)
    .with_key_strategy(QuantStrategy::Prod)
    .with_max_tokens(128);

let mut cache = QuantizedKVCache::new(config)?;
let keys = vec![normalize(&vec![1.0; dim])?];
let values = vec![normalize(&vec![1.0; dim])?];
let query = normalize(&vec![1.0; dim])?;

cache.append(&keys, &values)?;
let scores = cache.attention_scores(&query)?;

# let _ = scores;
# Ok::<(), turboquant::TurboQuantError>(())
```

## Input Contract

Most of the library assumes **finite, unit-norm vectors**.

- `TurboQuantMSE`, `TurboQuantProd`, and KV-cache keys and values require normalized inputs
- `TurboQuantProd::new` requires `bit_width >= 2`
- `PolarQuant::new` requires a power-of-two dimension
- `QJL` accepts finite vectors and validates sketch lengths before use
- Multi-head cache append operations require every head to receive the same number of new tokens

If you are feeding raw activations into the library, call `turboquant::utils::normalize` first unless you already enforce that invariant upstream.
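
The unit-norm invariant amounts to scaling by the L2 norm and rejecting degenerate inputs. A minimal plain-Rust sketch of that contract (the crate's `normalize` returns a `Result`; this standalone version uses `Option` to stay self-contained):

```rust
/// Scale `v` to unit L2 norm; reject zero-norm or non-finite inputs.
fn normalize(v: &[f64]) -> Option<Vec<f64>> {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if !norm.is_finite() || norm == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}

fn main() {
    // A 3-4-5 triangle collapses to [0.6, 0.8] on the unit circle.
    let x = normalize(&[3.0, 4.0]).unwrap();
    let n: f64 = x.iter().map(|a| a * a).sum::<f64>().sqrt();
    assert!((n - 1.0).abs() < 1e-12);

    // Degenerate inputs are rejected rather than producing NaNs.
    assert!(normalize(&[0.0, 0.0]).is_none());
    assert!(normalize(&[f64::NAN, 1.0]).is_none());

    println!("{:?}", x);
}
```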

Malformed payloads are checked before dequantization or score estimation. The crate returns typed errors such as `LengthMismatch`, `InvalidQuantizationIndex`, and `BitWidthMismatch` instead of silently truncating data.

## How It Is Structured

```text
input vector
  -> random rotation
  -> scalar quantization                      (TurboQuantMSE)
  -> residual sketch for dot products         (TurboQuantProd / QJL)
  -> packed storage for batches and caches    (batch, bitpack, kv_cache)
```
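
The scalar-quantization stage of the pipeline can be pictured with a toy uniform quantizer. This is a plain-Rust sketch of the idea only: the rotation step is omitted, and the crate's actual codebook and error bounds may differ.

```rust
/// Uniform b-bit scalar quantizer over [-1.0, 1.0]: each coordinate maps
/// to one of 2^b equal-width cells.
fn quantize(x: &[f64], bits: u32) -> Vec<u32> {
    let n = 1u32 << bits; // number of codebook levels
    x.iter()
        .map(|&xi| {
            let t = (xi.clamp(-1.0, 1.0) + 1.0) / 2.0; // map [-1, 1] -> [0, 1]
            ((t * n as f64).floor() as u32).min(n - 1) // cell index in [0, n)
        })
        .collect()
}

/// Reconstruct each coordinate at the midpoint of its quantization cell.
fn dequantize(q: &[u32], bits: u32) -> Vec<f64> {
    let n = (1u32 << bits) as f64;
    q.iter().map(|&qi| (qi as f64 + 0.5) / n * 2.0 - 1.0).collect()
}

fn main() {
    let x = [0.6, -0.8, 0.0, 1.0];
    let q = quantize(&x, 4);
    let x_hat = dequantize(&q, 4);
    // With 16 levels over [-1, 1], each coordinate is off by at most 1/16.
    for (a, b) in x.iter().zip(&x_hat) {
        assert!((a - b).abs() <= 1.0 / 16.0);
    }
    println!("{:?} -> {:?}", q, x_hat);
}
```

The random rotation that precedes this step spreads a vector's energy evenly across coordinates, which is what lets a fixed per-coordinate codebook like the one above work well on arbitrary unit vectors.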

## License

MIT. See [LICENSE](LICENSE).

## References

1. Zandieh, Mirrokni. [*TurboQuant: A Near-Optimal Vector Quantizer for LLM KV Cache*](https://arxiv.org/abs/2504.19874)
2. [*Quantized Johnson-Lindenstrauss Transform*](https://arxiv.org/abs/2406.03482)
3. [*PolarQuant: Polar Coordinate Quantization for KV Caches*](https://arxiv.org/abs/2502.02617)