# BitPolar
Near-optimal vector quantization with zero training overhead — compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data required.
Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.
## Key Properties

- Data-oblivious — no training, no codebooks, no calibration data
- Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
- Provably unbiased — inner product estimates satisfy `E[estimate] = exact value` at 3+ bits
- Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
- Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
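The unbiasedness property can be illustrated with a toy quantizer based on stochastic rounding. This is a self-contained sketch of the statistical idea, not BitPolar's actual algorithm; the PRNG, grid, and test vectors are all illustrative:

```rust
// Toy demonstration of an unbiased quantizer via stochastic rounding.
// NOT BitPolar's algorithm, just the statistical property it guarantees.

/// Tiny deterministic xorshift PRNG so the example is self-contained.
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn uniform01(state: &mut u64) -> f32 {
    (next_u64(state) >> 40) as f32 / (1u64 << 24) as f32
}

/// Stochastically round x in [-1, 1] to a grid with `levels` points.
/// E[quantize(x)] == x, so inner products built from it are unbiased.
fn quantize(x: f32, levels: u32, state: &mut u64) -> f32 {
    let step = 2.0 / (levels - 1) as f32;
    let pos = (x + 1.0) / step; // fractional grid position
    let lo = pos.floor();
    let p_hi = pos - lo; // probability of rounding up
    let cell = if uniform01(state) < p_hi { lo + 1.0 } else { lo };
    cell * step - 1.0
}

fn main() {
    let v = [0.3f32, -0.7, 0.5, 0.1];
    let q = [0.2f32, 0.4, -0.6, 0.9];
    let exact: f32 = v.iter().zip(&q).map(|(a, b)| a * b).sum();

    let mut state = 42u64;
    let trials = 200_000;
    let mut mean = 0.0f64;
    for _ in 0..trials {
        let est: f32 = v
            .iter()
            .zip(&q)
            .map(|(a, b)| quantize(*a, 8, &mut state) * b)
            .sum();
        mean += est as f64 / trials as f64;
    }
    // The averaged estimate converges to the exact inner product.
    println!("exact = {exact:.4}, mean estimate = {mean:.4}");
    assert!((mean - exact as f64).abs() < 1e-2);
}
```

Because each quantized coordinate equals the original in expectation, any linear functional of the code, including an inner product against a query, is unbiased as well.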
## Quick Start

```rust
use bitpolar::{TurboQuantizer, VectorQuantizer};

// Create quantizer from 4 integers — no training needed
// (argument values here are illustrative: dimension, bits, projections, seed)
let q = TurboQuantizer::new(128, 4, 32, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 128];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.2_f32; 128];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);
```
## API Overview

| Type | Description |
|---|---|
| `TurboQuantizer` | Two-stage quantizer (Polar + QJL) — the primary API |
| `PolarQuantizer` | Single-stage polar coordinate encoding |
| `QjlQuantizer` | 1-bit Johnson-Lindenstrauss sketching |
| `KvCacheCompressor` | Transformer KV cache compression |
| `MultiHeadKvCache` | Multi-head attention KV cache |
| `DistortionTracker` | Online quality monitoring (EMA MSE/bias) |
## How It Works

```text
Input f32 vector
       |
       v
[Random Rotation]  Haar-distributed orthogonal matrix (QR of Gaussian)
       |           Spreads energy uniformly across coordinates
       v
[PolarQuant]       Groups d dims into d/2 pairs -> polar coords
(Stage 1)          Radii: lossless f32 | Angles: b-bit quantized
       |
       v
[QJL Residual]     Sketches reconstruction error
(Stage 2)          1 sign bit per projection -> unbiased correction
       |
       v
TurboCode { polar: PolarCode, residual: QjlSketch }
```

Inner product estimation combines both stages:

```text
<v, q> ~ IP_polar(code, q) + IP_qjl(residual_sketch, q)
```
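The pipeline above can be sketched end to end in plain Rust. This is a simplified toy (no random rotation, uniform angle quantization, hypothetical helper names), not the crate's implementation. Note that even though midpoint decoding in stage 1 is biased, adding an unbiased estimate of the residual inner product in stage 2 makes the combined estimate unbiased:

```rust
// Toy end-to-end sketch of the two-stage pipeline described above.
// Simplifications: no random rotation, an even dimension, and made-up
// helper names; this is not the crate's real implementation.
use std::f32::consts::PI;

fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn uniform01(state: &mut u64) -> f32 {
    (next_u64(state) >> 40) as f32 / (1u64 << 24) as f32
}

fn gaussian(state: &mut u64) -> f32 {
    // Box-Muller transform from two uniforms.
    let u1 = uniform01(state).max(1e-7);
    let u2 = uniform01(state);
    (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
}

/// Stage 1: pair up coordinates; keep the radius exact, quantize the angle.
fn polar_encode(v: &[f32], bits: u32) -> (Vec<f32>, Vec<u32>) {
    let levels = 1u32 << bits;
    let (mut radii, mut angles) = (Vec::new(), Vec::new());
    for p in v.chunks_exact(2) {
        radii.push((p[0] * p[0] + p[1] * p[1]).sqrt());
        let t = (p[1].atan2(p[0]) + PI) / (2.0 * PI); // angle mapped to [0, 1)
        angles.push(((t * levels as f32) as u32).min(levels - 1));
    }
    (radii, angles)
}

fn polar_decode(radii: &[f32], angles: &[u32], bits: u32) -> Vec<f32> {
    let levels = (1u32 << bits) as f32;
    let mut out = Vec::new();
    for (r, a) in radii.iter().zip(angles) {
        let theta = ((*a as f32 + 0.5) / levels) * 2.0 * PI - PI; // cell midpoint
        out.push(r * theta.cos());
        out.push(r * theta.sin());
    }
    out
}

/// Stage 2: store the residual's norm plus m sign bits of Gaussian projections.
fn qjl_sketch(residual: &[f32], m: usize, seed: u64) -> (f32, Vec<bool>) {
    let norm = residual.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mut state = seed;
    let mut signs = Vec::with_capacity(m);
    for _ in 0..m {
        let mut dot = 0.0f32;
        for x in residual {
            dot += x * gaussian(&mut state);
        }
        signs.push(dot >= 0.0);
    }
    (norm, signs)
}

/// Since E[sign(<g, x>) * <g, q>] = sqrt(2/pi) * <x, q> / ||x|| for Gaussian g,
/// this term is an unbiased estimate of <residual, q>. The projections are
/// regenerated from the seed, so only the norm and the sign bits are stored.
fn qjl_estimate(norm: f32, signs: &[bool], q: &[f32], seed: u64) -> f32 {
    let mut state = seed;
    let mut acc = 0.0f32;
    for s in signs {
        let mut dot = 0.0f32;
        for x in q {
            dot += x * gaussian(&mut state);
        }
        acc += if *s { dot } else { -dot };
    }
    norm * (PI / 2.0).sqrt() * acc / signs.len() as f32
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let v = [0.3f32, -0.7, 0.5, 0.1, -0.2, 0.8, 0.4, -0.6];
    let q = [0.2f32, 0.4, -0.6, 0.9, 0.1, -0.3, 0.7, 0.5];
    let exact = dot(&v, &q);

    let (radii, angles) = polar_encode(&v, 4);
    let decoded = polar_decode(&radii, &angles, 4);
    let residual: Vec<f32> = v.iter().zip(&decoded).map(|(a, b)| a - b).collect();
    let (norm, signs) = qjl_sketch(&residual, 2048, 7);

    // Stage-1 estimate plus the unbiased stage-2 correction.
    let estimate = dot(&decoded, &q) + qjl_estimate(norm, &signs, &q, 7);
    println!("exact = {exact:.4}, estimate = {estimate:.4}");
    assert!((exact - estimate).abs() < 0.1);
}
```

The sketch regenerates the Gaussian projections from the seed at query time, which is what makes a deterministic, data-oblivious scheme possible: nothing data-dependent is stored beyond the code itself.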
## Parameter Selection
| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | -- | dim/4 | QJL standalone (1-bit sketches) |
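For a rough feel of the memory cost behind these presets, the code size implied by the layout described in How It Works (lossless f32 radii, b-bit angles, one sign bit per projection) can be computed directly. This is a back-of-envelope sketch that ignores any alignment or metadata overhead of the real format:

```rust
/// Approximate compressed size in bits for one vector, assuming the layout
/// sketched in How It Works: d/2 lossless f32 radii, d/2 b-bit angles, and
/// one sign bit per residual projection. Not the crate's actual format.
fn code_bits(dim: usize, bits: usize, projections: usize) -> usize {
    let pairs = dim / 2;
    pairs * 32          // radii stored as f32
        + pairs * bits  // quantized angles
        + projections   // 1-bit QJL residual sketch
}

fn main() {
    let dim = 1024;
    println!("f32 baseline:                     {} bits", dim * 32);
    println!("semantic search (8 bits, dim/4):  {} bits", code_bits(dim, 8, dim / 4));
    println!("max compression (3 bits, dim/16): {} bits", code_bits(dim, 3, dim / 16));
}
```

Under this layout the lossless radii dominate the code size, so the angle bit-width mainly trades off accuracy rather than total memory.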
## Feature Flags

| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Standard library (nalgebra QR decomposition) |
| `serde-support` | Yes | Serde serialization for all types |
| `simd` | No | Hand-tuned NEON/AVX2 kernels |
| `parallel` | No | Parallel batch operations via rayon |
| `tracing-support` | No | OpenTelemetry-compatible instrumentation |
## Performance

Run benchmarks with `cargo bench`; run examples with `cargo run --release --example <name>`.
## Traits
BitPolar exposes composable traits for ecosystem integration:
- `VectorQuantizer` — core encode/decode/IP/L2 interface
- `BatchQuantizer` — parallel batch operations (behind the `parallel` feature)
- `RotationStrategy` — pluggable rotation (QR, Walsh-Hadamard, identity)
- `SerializableCode` — compact binary serialization
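As an illustration of the `RotationStrategy` idea, two interchangeable rotations are sketched below: an identity and a normalized fast Walsh-Hadamard transform. The trait shape and method name here are assumptions, not the crate's real signature:

```rust
// Hypothetical sketch of a pluggable rotation; the trait's real methods
// in BitPolar may differ. Any norm-preserving transform can slot in.
trait RotationStrategy {
    fn apply(&self, v: &mut [f32]);
}

struct Identity;
impl RotationStrategy for Identity {
    fn apply(&self, _v: &mut [f32]) {}
}

/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is
/// orthonormal (and therefore its own inverse). Length must be a power of two.
struct WalshHadamard;
impl RotationStrategy for WalshHadamard {
    fn apply(&self, v: &mut [f32]) {
        let n = v.len();
        assert!(n.is_power_of_two());
        let mut h = 1;
        while h < n {
            for i in (0..n).step_by(2 * h) {
                for j in i..i + h {
                    let (x, y) = (v[j], v[j + h]);
                    v[j] = x + y;
                    v[j + h] = x - y;
                }
            }
            h *= 2;
        }
        let scale = 1.0 / (n as f32).sqrt();
        for x in v.iter_mut() {
            *x *= scale;
        }
    }
}

fn main() {
    let mut v = vec![1.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    let norm_before: f32 = v.iter().map(|x| x * x).sum();
    WalshHadamard.apply(&mut v);
    let norm_after: f32 = v.iter().map(|x| x * x).sum();
    // Orthonormal: the norm is preserved while energy spreads across coords.
    assert!((norm_before - norm_after).abs() < 1e-5);
    println!("{v:?}");
}
```

A Walsh-Hadamard rotation costs O(n log n) with no stored matrix, which is why it is a common cheaper alternative to a dense QR-based rotation.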
## References

- TurboQuant (ICLR 2026): arXiv:2504.19874
- PolarQuant (AISTATS 2026): arXiv:2502.02617
- QJL (AAAI 2025): arXiv:2406.03482
## Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, commit message conventions, and how to add a new quantization strategy.
## License
Licensed under:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)