# BitPolar
Near-optimal vector quantization with zero training overhead — compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data required.
Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.
## Key Properties

- Data-oblivious — no training, no codebooks, no calibration data
- Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
- Provably unbiased — inner product estimates satisfy `E[estimate] = exact value` at 3+ bits
- Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
- Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
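The unbiasedness property can be illustrated with a toy quantizer based on stochastic rounding. This is a self-contained sketch of the statistical idea, not BitPolar's actual algorithm; the PRNG, grid, and test vectors are all illustrative:

```rust
// Toy demonstration of an unbiased quantizer via stochastic rounding.
// NOT BitPolar's algorithm, just the statistical property it guarantees.

/// Tiny deterministic xorshift PRNG so the example is self-contained.
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn uniform01(state: &mut u64) -> f32 {
    (next_u64(state) >> 40) as f32 / (1u64 << 24) as f32
}

/// Stochastically round x in [-1, 1] to a grid with `levels` points.
/// E[quantize(x)] == x, so inner products built from it are unbiased.
fn quantize(x: f32, levels: u32, state: &mut u64) -> f32 {
    let step = 2.0 / (levels - 1) as f32;
    let pos = (x + 1.0) / step; // fractional grid position
    let lo = pos.floor();
    let p_hi = pos - lo; // probability of rounding up
    let cell = if uniform01(state) < p_hi { lo + 1.0 } else { lo };
    cell * step - 1.0
}

fn main() {
    let v = [0.3f32, -0.7, 0.5, 0.1];
    let q = [0.2f32, 0.4, -0.6, 0.9];
    let exact: f32 = v.iter().zip(&q).map(|(a, b)| a * b).sum();

    let mut state = 42u64;
    let trials = 200_000;
    let mut mean = 0.0f64;
    for _ in 0..trials {
        let est: f32 = v
            .iter()
            .zip(&q)
            .map(|(a, b)| quantize(*a, 8, &mut state) * b)
            .sum();
        mean += est as f64 / trials as f64;
    }
    // The averaged estimate converges to the exact inner product.
    println!("exact = {exact:.4}, mean estimate = {mean:.4}");
    assert!((mean - exact as f64).abs() < 1e-2);
}
```

Because each quantized coordinate equals the original in expectation, any linear functional of the code, including an inner product against a query, is unbiased as well.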
## Quick Start

```rust
use bitpolar::{TurboQuantizer, VectorQuantizer};

// Create quantizer from 4 integers — no training needed
// (argument values here are illustrative: dimension, bits, projections, seed)
let q = TurboQuantizer::new(128, 4, 32, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 128];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.2_f32; 128];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);
```
## API Overview

| Type | Description |
|---|---|
| `TurboQuantizer` | Two-stage quantizer (Polar + QJL) — the primary API |
| `PolarQuantizer` | Single-stage polar coordinate encoding |
| `QjlQuantizer` | 1-bit Johnson-Lindenstrauss sketching |
| `KvCacheCompressor` | Transformer KV cache compression |
| `MultiHeadKvCache` | Multi-head attention KV cache |
| `DistortionTracker` | Online quality monitoring (EMA MSE/bias) |
## How It Works

```text
Input f32 vector
       |
       v
[Random Rotation]  Haar-distributed orthogonal matrix (QR of Gaussian)
       |           Spreads energy uniformly across coordinates
       v
[PolarQuant]       Groups d dims into d/2 pairs -> polar coords
(Stage 1)          Radii: lossless f32 | Angles: b-bit quantized
       |
       v
[QJL Residual]     Sketches reconstruction error
(Stage 2)          1 sign bit per projection -> unbiased correction
       |
       v
TurboCode { polar: PolarCode, residual: QjlSketch }
```

Inner product estimation combines both stages:

```text
<v, q> ~ IP_polar(code, q) + IP_qjl(residual_sketch, q)
```
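The pipeline above can be sketched end to end in plain Rust. This is a simplified toy (no random rotation, uniform angle quantization, hypothetical helper names), not the crate's implementation. Note that even though midpoint decoding in stage 1 is biased, adding an unbiased estimate of the residual inner product in stage 2 makes the combined estimate unbiased:

```rust
// Toy end-to-end sketch of the two-stage pipeline described above.
// Simplifications: no random rotation, an even dimension, and made-up
// helper names; this is not the crate's real implementation.
use std::f32::consts::PI;

fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn uniform01(state: &mut u64) -> f32 {
    (next_u64(state) >> 40) as f32 / (1u64 << 24) as f32
}

fn gaussian(state: &mut u64) -> f32 {
    // Box-Muller transform from two uniforms.
    let u1 = uniform01(state).max(1e-7);
    let u2 = uniform01(state);
    (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
}

/// Stage 1: pair up coordinates; keep the radius exact, quantize the angle.
fn polar_encode(v: &[f32], bits: u32) -> (Vec<f32>, Vec<u32>) {
    let levels = 1u32 << bits;
    let (mut radii, mut angles) = (Vec::new(), Vec::new());
    for p in v.chunks_exact(2) {
        radii.push((p[0] * p[0] + p[1] * p[1]).sqrt());
        let t = (p[1].atan2(p[0]) + PI) / (2.0 * PI); // angle mapped to [0, 1)
        angles.push(((t * levels as f32) as u32).min(levels - 1));
    }
    (radii, angles)
}

fn polar_decode(radii: &[f32], angles: &[u32], bits: u32) -> Vec<f32> {
    let levels = (1u32 << bits) as f32;
    let mut out = Vec::new();
    for (r, a) in radii.iter().zip(angles) {
        let theta = ((*a as f32 + 0.5) / levels) * 2.0 * PI - PI; // cell midpoint
        out.push(r * theta.cos());
        out.push(r * theta.sin());
    }
    out
}

/// Stage 2: store the residual's norm plus m sign bits of Gaussian projections.
fn qjl_sketch(residual: &[f32], m: usize, seed: u64) -> (f32, Vec<bool>) {
    let norm = residual.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mut state = seed;
    let mut signs = Vec::with_capacity(m);
    for _ in 0..m {
        let mut dot = 0.0f32;
        for x in residual {
            dot += x * gaussian(&mut state);
        }
        signs.push(dot >= 0.0);
    }
    (norm, signs)
}

/// Since E[sign(<g, x>) * <g, q>] = sqrt(2/pi) * <x, q> / ||x|| for Gaussian g,
/// this term is an unbiased estimate of <residual, q>. The projections are
/// regenerated from the seed, so only the norm and the sign bits are stored.
fn qjl_estimate(norm: f32, signs: &[bool], q: &[f32], seed: u64) -> f32 {
    let mut state = seed;
    let mut acc = 0.0f32;
    for s in signs {
        let mut dot = 0.0f32;
        for x in q {
            dot += x * gaussian(&mut state);
        }
        acc += if *s { dot } else { -dot };
    }
    norm * (PI / 2.0).sqrt() * acc / signs.len() as f32
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let v = [0.3f32, -0.7, 0.5, 0.1, -0.2, 0.8, 0.4, -0.6];
    let q = [0.2f32, 0.4, -0.6, 0.9, 0.1, -0.3, 0.7, 0.5];
    let exact = dot(&v, &q);

    let (radii, angles) = polar_encode(&v, 4);
    let decoded = polar_decode(&radii, &angles, 4);
    let residual: Vec<f32> = v.iter().zip(&decoded).map(|(a, b)| a - b).collect();
    let (norm, signs) = qjl_sketch(&residual, 2048, 7);

    // Stage-1 estimate plus the unbiased stage-2 correction.
    let estimate = dot(&decoded, &q) + qjl_estimate(norm, &signs, &q, 7);
    println!("exact = {exact:.4}, estimate = {estimate:.4}");
    assert!((exact - estimate).abs() < 0.1);
}
```

The sketch regenerates the Gaussian projections from the seed at query time, which is what makes a deterministic, data-oblivious scheme possible: nothing data-dependent is stored beyond the code itself.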
## Parameter Selection
| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | -- | dim/4 | QJL standalone (1-bit sketches) |
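For a rough feel of the memory cost behind these presets, the code size implied by the layout described in How It Works (lossless f32 radii, b-bit angles, one sign bit per projection) can be computed directly. This is a back-of-envelope sketch that ignores any alignment or metadata overhead of the real format:

```rust
/// Approximate compressed size in bits for one vector, assuming the layout
/// sketched in How It Works: d/2 lossless f32 radii, d/2 b-bit angles, and
/// one sign bit per residual projection. Not the crate's actual format.
fn code_bits(dim: usize, bits: usize, projections: usize) -> usize {
    let pairs = dim / 2;
    pairs * 32          // radii stored as f32
        + pairs * bits  // quantized angles
        + projections   // 1-bit QJL residual sketch
}

fn main() {
    let dim = 1024;
    println!("f32 baseline:                     {} bits", dim * 32);
    println!("semantic search (8 bits, dim/4):  {} bits", code_bits(dim, 8, dim / 4));
    println!("max compression (3 bits, dim/16): {} bits", code_bits(dim, 3, dim / 16));
}
```

Under this layout the lossless radii dominate the code size, so the angle bit-width mainly trades off accuracy rather than total memory.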
## Feature Flags

| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Standard library (nalgebra QR decomposition) |
| `serde-support` | Yes | Serde serialization for all types |
| `simd` | No | Hand-tuned NEON/AVX2 kernels |
| `parallel` | No | Parallel batch operations via rayon |
| `tracing-support` | No | OpenTelemetry-compatible instrumentation |
## Performance

Run benchmarks with `cargo bench`; run examples with `cargo run --release --example <name>`.
## Traits
BitPolar exposes composable traits for ecosystem integration:
- `VectorQuantizer` — core encode/decode/IP/L2 interface
- `BatchQuantizer` — parallel batch operations (behind the `parallel` feature)
- `RotationStrategy` — pluggable rotation (QR, Walsh-Hadamard, identity)
- `SerializableCode` — compact binary serialization
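As an illustration of the `RotationStrategy` idea, two interchangeable rotations are sketched below: an identity and a normalized fast Walsh-Hadamard transform. The trait shape and method name here are assumptions, not the crate's real signature:

```rust
// Hypothetical sketch of a pluggable rotation; the trait's real methods
// in BitPolar may differ. Any norm-preserving transform can slot in.
trait RotationStrategy {
    fn apply(&self, v: &mut [f32]);
}

struct Identity;
impl RotationStrategy for Identity {
    fn apply(&self, _v: &mut [f32]) {}
}

/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is
/// orthonormal (and therefore its own inverse). Length must be a power of two.
struct WalshHadamard;
impl RotationStrategy for WalshHadamard {
    fn apply(&self, v: &mut [f32]) {
        let n = v.len();
        assert!(n.is_power_of_two());
        let mut h = 1;
        while h < n {
            for i in (0..n).step_by(2 * h) {
                for j in i..i + h {
                    let (x, y) = (v[j], v[j + h]);
                    v[j] = x + y;
                    v[j + h] = x - y;
                }
            }
            h *= 2;
        }
        let scale = 1.0 / (n as f32).sqrt();
        for x in v.iter_mut() {
            *x *= scale;
        }
    }
}

fn main() {
    let mut v = vec![1.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    let norm_before: f32 = v.iter().map(|x| x * x).sum();
    WalshHadamard.apply(&mut v);
    let norm_after: f32 = v.iter().map(|x| x * x).sum();
    // Orthonormal: the norm is preserved while energy spreads across coords.
    assert!((norm_before - norm_after).abs() < 1e-5);
    println!("{v:?}");
}
```

A Walsh-Hadamard rotation costs O(n log n) with no stored matrix, which is why it is a common cheaper alternative to a dense QR-based rotation.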
## References

- TurboQuant (ICLR 2026): arXiv:2504.19874
- PolarQuant (AISTATS 2026): arXiv:2502.02617
- QJL (AAAI 2025): arXiv:2406.03482
## Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, commit message conventions, and how to add a new quantization strategy.
## License
Licensed under:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)