Expand description
§bitpolar
Zero-overhead vector quantization for semantic search and KV cache compression.
Implements three algorithms from Google Research:
- TurboQuant (ICLR 2026) — two-stage composition for near-optimal compression
- PolarQuant (AISTATS 2026) — polar coordinate encoding with lossless radii
- QJL (AAAI 2025) — 1-bit Johnson-Lindenstrauss sketching
§Key Properties
- Data-oblivious: No training, no codebooks, no calibration data
- Deterministic: Fully defined by 4 integers:
(dimension, bits, projections, seed) - Provably unbiased: Inner product estimates are unbiased at 3+ bits
- Near-optimal: Within ~2.7x of Shannon rate-distortion limit
- Instant indexing: Vectors compress on arrival — no offline training
§Quick Start
use bitpolar::TurboQuantizer;
use bitpolar::traits::VectorQuantizer;
// Create quantizer from 4 integers — no training needed
let q = TurboQuantizer::new(128, 4, 32, 42).unwrap();
// Encode a vector
let vector = vec![0.1_f32; 128];
let code = q.encode(&vector).unwrap();
// Estimate inner product without decompression
let query = vec![0.05_f32; 128];
let score = q.inner_product_estimate(&code, &query).unwrap();
println!("Estimated IP: {score}");
// Decode back to approximate vector
let reconstructed = q.decode(&code);
assert_eq!(reconstructed.len(), 128);§Architecture
Input f32 vector
│
▼
┌─────────────────┐
│ Random Rotation │ Haar-distributed orthogonal matrix (QR of Gaussian)
│ (StoredRotation) │ Spreads energy uniformly across coordinates
└────────┬────────┘
│
▼
┌─────────────────┐
│ PolarQuant │ Groups d dims into d/2 pairs → polar coords
│ (Stage 1) │ Radii: lossless f32 | Angles: b-bit quantized
└────────┬────────┘
│
▼
┌─────────────────┐
│ QJL Residual │ Sketches reconstruction error
│ (Stage 2) │ 1 sign bit per projection → unbiased correction
└────────┬────────┘
│
▼
TurboCode { polar: PolarCode, residual_sketch: QjlSketch }§Feature Flags
| Feature | Default | Description |
|---|---|---|
std | Yes | Standard library (nalgebra QR decomposition) |
serde-support | Yes | Serde serialization for all types |
simd | No | Hand-tuned NEON/AVX2 kernels |
parallel | No | Parallel batch operations via rayon |
tracing-support | No | OpenTelemetry-compatible instrumentation |
ffi | No | C FFI exports |
Re-exports§
pub use error::Result;pub use error::TurboQuantError;pub use polar::PolarCode;pub use polar::PolarQuantizer;pub use qjl::QjlQuantizer;pub use qjl::QjlSketch;pub use turbo::TurboCode;pub use turbo::TurboQuantizer;pub use kv_cache::KvCacheCompressor;pub use kv_cache::KvCacheConfig;pub use kv_cache::MultiHeadKvCache;pub use distortion::DistortionTracker;pub use stats::BatchStats;pub use stats::DistortionMetrics;pub use rotation::StoredRotation;pub use wht::WhtRotation;pub use tiered::Tier;pub use tiered::TieredCode;pub use tiered::TieredQuantization;pub use resilient::ResilientCode;pub use resilient::ResilientQuantizer;pub use search::OversampledSearch;
Modules§
- adaptive
- Adaptive per-vector bit-width selection with promote/demote Adaptive bit-width quantization.
- distortion
- Online distortion tracking for quality monitoring Online distortion tracking for quality monitoring.
- error
- Error types — all public APIs return
Result<T, TurboQuantError>Error types for bitpolar. - kv_
cache - KV cache compressor for transformer attention KV cache compressor for transformer attention.
- metrics
- Prometheus-compatible metrics export for monitoring Prometheus-compatible metrics export for BitPolar operations.
- polar
- PolarQuant: polar coordinate vector encoding (Stage 1) PolarQuant: polar coordinate vector encoding (Stage 1 of TurboQuant).
- qjl
- Quantized Johnson-Lindenstrauss 1-bit sketching (Stage 2) Quantized Johnson-Lindenstrauss (QJL) 1-bit sketching.
- resilient
- Resilient quantization with automatic primary→fallback strategy Resilient quantization with automatic fallback.
- rotation
- Haar-distributed orthogonal rotation matrix (O(d²) memory) Haar-distributed orthogonal rotation matrix.
- search
- Oversampled approximate nearest-neighbor search with exact re-ranking Oversampled approximate nearest-neighbor search.
- stats
- Compression statistics and quality metrics Compression statistics and quality metrics.
- tiered
- Tiered quantization: hot, warm, and cold storage tiers Tiered quantization: hot, warm, and cold storage tiers.
- traits
- Core traits for ecosystem integration: VectorQuantizer, BatchQuantizer, etc. Core traits for ecosystem integration.
- turbo
- TurboQuantizer: two-stage composition (Polar + QJL) TurboQuantizer: two-stage composition of PolarQuant + QJL residual.
- wht
- Walsh-Hadamard Transform rotation (O(d) memory, O(d log d) time) Walsh-Hadamard Transform (WHT) rotation strategy.
Constants§
- COMPACT_
FORMAT_ VERSION - Compact binary format version for all serialized codes.