Compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data. Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.
## Key Properties
- Data-oblivious — no training, no codebooks, no calibration data
- Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
- Provably unbiased — inner product estimates satisfy E[estimate] = exact inner product at 3+ bits
- Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
- Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
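The unbiasedness property is easy to sanity-check outside the library with a pure-Python sketch of the one-bit JL estimator that QJL builds on: project onto random Gaussian directions, keep only the sign of the compressed vector's projection, and rescale. Everything here (function name, argument order) illustrates the underlying math and is not BitPolar's API:

```python
import math
import random

def qjl_ip_estimate(x, y, k, seed=0):
    """Estimate <x, y> from k one-bit sketches of x (QJL-style; illustrative)."""
    rng = random.Random(seed)
    norm_x = math.sqrt(sum(v * v for v in x))
    total = 0.0
    for _ in range(k):
        s = [rng.gauss(0.0, 1.0) for _ in x]          # random Gaussian direction
        sx = sum(si * xi for si, xi in zip(s, x))     # projection of x, kept only as a sign
        sy = sum(si * yi for si, yi in zip(s, y))     # projection of the query, unquantized
        total += (1.0 if sx >= 0.0 else -1.0) * sy
    # sqrt(pi/2) * ||x|| corrects the bias introduced by sign quantization
    return norm_x * math.sqrt(math.pi / 2.0) * total / k

x = [1.0, 0.5, -0.3, 0.8]
y = [0.2, -0.1, 0.7, 0.4]
true_ip = sum(a * b for a, b in zip(x, y))  # 0.26
est = qjl_ip_estimate(x, y, k=50000)        # converges to true_ip as k grows
```

Averaged over many random directions the estimate centers on the exact inner product, which is the sense in which E[estimate] equals the exact value even at 1 bit per projection.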
## What's New in 0.3.x
- 58 integrations — every major AI framework, vector database, and ML library
- PyTorch torchao — embedding quantizer, BitPolarLinear, KV cache
- FAISS drop-in — API-compatible IndexBitPolarIP/L2 replacement
- LlamaIndex, Haystack, DSPy — VectorStore and Retriever integrations
- Agentic AI — LangGraph, CrewAI, OpenAI Agents, Google ADK, SmolAgents, PydanticAI
- Agent memory — Mem0, Zep, Letta backends
- 11 vector databases — Milvus, Weaviate, Pinecone, Redis, ES, DuckDB, SQLite, and more
- LLM inference — llama.cpp, SGLang, TensorRT, Ollama, MLX KV cache compression
- ML frameworks — JAX/Flax, TensorFlow/Keras, scikit-learn pipeline
- 30 Python examples covering all integrations
- Walsh-Hadamard Transform — O(d log d) rotation with O(d) memory (577x less than Haar QR)
- Python bindings — PyO3 + maturin, zero-copy numpy integration
- WASM bindings — browser-side vector search via wasm-bindgen
- `no_std` support — embedded/edge deployment with the `alloc` feature
## Quick Start

### Rust
```toml
[dependencies]
bitpolar = "0.3"
```

```rust
use bitpolar::TurboQuantizer;
use bitpolar::VectorQuantizer;

// Create quantizer from 4 integers — no training needed
// (dimension, bits, projections, seed); argument values are illustrative
let q = TurboQuantizer::new(768, 4, 192, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 768];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.2_f32; 768];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);
```
### Python

```python
from bitpolar import TurboQuantizer  # import path assumed

# Create quantizer — no training needed
# (dimension, bits, projections, seed); argument values are illustrative
q = TurboQuantizer(768, 4, 192, 42)

# Encode/decode
code = q.encode(vector)
score = q.inner_product_estimate(code, query)
reconstructed = q.decode(code)

# Build a search index
index = ...  # index construction was elided in the source
scores, ids = index.search(query, k=10)  # method name assumed
```
### JavaScript (WASM)

```javascript
import init, { TurboQuantizer, Index } from 'bitpolar-wasm';  // exported names assumed

await init();
const q = new TurboQuantizer(768, 4, 192, 42);  // (dimension, bits, projections, seed)
const code = q.encode(vector);
const decoded = q.decode(code);
const index = new Index(q);
index.add(id, code);
const results = index.search(query, 10);
```
## Walsh-Hadamard Transform
The WHT provides an O(d log d) alternative to Haar QR rotation:
| Property | Haar QR (0.1.x) | Walsh-Hadamard (0.2.x+) |
|---|---|---|
| Time complexity | O(d²) | O(d log d) |
| Memory | O(d²) — 2.3 MB @ d=768 | O(d) — 4 KB @ d=768 |
| Quality | Exact Haar distribution | Near-Haar (JL guarantees) |
| Deterministic | Yes (seed-based) | Yes (seed-based) |
```rust
use bitpolar::WhtRotation;
use bitpolar::RotationStrategy;

// (dimension, seed); argument values are illustrative
let wht = WhtRotation::new(768, 42).unwrap();
let rotated = wht.rotate(&vector);
let recovered = wht.rotate_inverse(&rotated);
```
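The two ingredients of this rotation, a fast Hadamard butterfly and a seed-derived diagonal of random signs, can be sketched in plain Python (an illustration of the standard "HD" construction, not the crate's code):

```python
import math
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform: O(d log d) time, O(d) extra memory.
    len(v) must be a power of two."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def seeded_signs(d, seed):
    """Deterministic diagonal of +/-1 signs derived from the seed."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(d)]

def rotate(v, seed):
    """Sign flip, then normalized WHT; the result is an orthogonal transform of v."""
    signs = seeded_signs(len(v), seed)
    scale = 1.0 / math.sqrt(len(v))
    return [x * scale for x in fwht([s * x for s, x in zip(signs, v)])]

def rotate_inverse(v, seed):
    """H is its own inverse up to scaling, so apply it again and undo the signs."""
    signs = seeded_signs(len(v), seed)
    scale = 1.0 / math.sqrt(len(v))
    return [s * x * scale for s, x in zip(signs, fwht(v))]

v = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 3.0]
r = rotate(v, seed=42)
back = rotate_inverse(r, seed=42)  # recovers v up to float rounding
```

Because the transform is computed in place and the sign diagonal is regenerated from the seed, only O(d) working memory is needed, versus materializing a d x d Haar matrix for QR-based rotation.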
## API Overview

### Core Quantizers
| Type | Description | Use Case |
|---|---|---|
| `TurboQuantizer` | Two-stage (Polar + QJL) | Primary API — best quality |
| `PolarQuantizer` | Polar coordinate encoding | Simpler, fallback option |
| `QjlQuantizer` | 1-bit JL sketching | Residual correction |
| `WhtRotation` | Walsh-Hadamard rotation | Fast, memory-efficient rotation |
### Specialized Wrappers

| Type | Description |
|---|---|
| `KvCacheCompressor` | Transformer KV cache compression |
| `MultiHeadKvCache` | Multi-head attention KV cache |
| `TieredQuantization` | Hot (8-bit) / Warm (4-bit) / Cold (3-bit) tiers |
| `ResilientQuantizer` | Primary + fallback for production robustness |
| `OversampledSearch` | Two-phase approximate + exact re-ranking |
| `DistortionTracker` | Online quality monitoring (EMA MSE/bias) |
## Language Bindings

| Package | Install | Language |
|---|---|---|
| `bitpolar` | `cargo add bitpolar` | Rust |
| `bitpolar` | `pip install bitpolar` | Python (PyO3) |
| `@mmgehlot/bitpolar-wasm` | `npm install @mmgehlot/bitpolar-wasm` | JavaScript (WASM) |
| `@mmgehlot/bitpolar` | `npm install @mmgehlot/bitpolar` | Node.js (NAPI-RS) |
| `bitpolar-go` | `go get github.com/mmgehlot/bitpolar/...` | Go (CGO) |
| `bitpolar` | Maven Central | Java (JNI) |
| `bitpolar-pg` | `cargo pgrx install` | PostgreSQL |
## 58 Integrations — Every Major AI Framework
BitPolar is the single canonical library for vector quantization across the entire AI/ML ecosystem.
### RAG & Search Frameworks

| Integration | Package | Description |
|---|---|---|
| LangChain | `langchain_bitpolar` | VectorStore with compressed similarity search |
| LlamaIndex | `llamaindex_bitpolar` | BasePydanticVectorStore for LlamaIndex |
| Haystack | `bitpolar_haystack` | DocumentStore + Retriever component |
| DSPy | `bitpolar_dspy` | Retriever module for DSPy pipelines |
| FAISS | `bitpolar_faiss` | Drop-in replacement for `faiss.IndexFlatIP`/`L2` |
| ChromaDB | `bitpolar_chroma` | EmbeddingFunction + two-phase search store |
### Agentic AI Frameworks

| Integration | Package | Description |
|---|---|---|
| LangGraph | `bitpolar_langgraph` | Compressed checkpoint saver for stateful agents |
| CrewAI | `bitpolar_crewai` | Memory backend for agent teams |
| OpenAI Agents SDK | `bitpolar_openai_agents` | Function-calling tools for OpenAI agents |
| Google ADK | `bitpolar_google_adk` | Tool for Google Agent Development Kit |
| Anthropic MCP | `bitpolar_anthropic` | MCP server (stdio + SSE) for Claude |
| AutoGen | `bitpolar_autogen` | Memory store for Microsoft agents |
| SmolAgents | `bitpolar_smolagents` | HuggingFace agent tool |
| PydanticAI | `bitpolar_pydantic_ai` | Type-safe Pydantic tool definitions |
| Agno (Phidata) | `bitpolar_agno` | Knowledge base for high-perf agents |
### Agent Memory Frameworks

| Integration | Package | Description |
|---|---|---|
| Mem0 | `bitpolar_mem0` | Vector store backend for Mem0 |
| Zep | `bitpolar_zep` | Compressed store with time-decay scoring |
| Letta (MemGPT) | `bitpolar_letta` | Archival memory tier |
### Vector Databases

| Integration | Package | Description |
|---|---|---|
| Qdrant | `bitpolar_embeddings.qdrant` | Two-phase HNSW + BitPolar re-ranking |
| Milvus | `bitpolar_milvus` | Client-side compression with reranking |
| Weaviate | `bitpolar_weaviate` | Client-side compression with reranking |
| Pinecone | `bitpolar_pinecone` | Metadata-stored compressed codes |
| Redis | `bitpolar_redis` | Byte string storage with pipeline search |
| Elasticsearch | `bitpolar_elasticsearch` | kNN search + BitPolar reranking |
| PostgreSQL | `bitpolar-pg` | Native pgrx extension (SQL functions) |
| DuckDB | `bitpolar_duckdb` | BLOB storage with SQL queries |
| SQLite | `bitpolar_sqlite_vec` | Zero-dependency embedded vector search |
| Supabase | `bitpolar_supabase` | Serverless pgvector compression |
| Neon | `bitpolar_neon` | Serverless Postgres driver |
### LLM Inference Engines (KV Cache)

| Integration | Package | Description |
|---|---|---|
| vLLM | `bitpolar_vllm` | KV cache quantizer + DynamicCache |
| HuggingFace Transformers | `bitpolar_transformers` | Drop-in DynamicCache replacement |
| llama.cpp | `bitpolar_llamacpp` | KV cache compression |
| SGLang | `bitpolar_sglang` | RadixAttention cache compression |
| TensorRT-LLM | `bitpolar_tensorrt` | KV cache quantizer plugin |
| Ollama | `bitpolar_ollama` | Embedding compression client |
| ONNX Runtime | `bitpolar_onnx` | Model embedding quantizer |
| Apple MLX | `bitpolar_mlx` | Apple Silicon quantizer |
### ML Frameworks

| Integration | Package | Description |
|---|---|---|
| PyTorch | `bitpolar_torch` | Embedding quantizer, BitPolarLinear, KV cache |
| PyTorch (native) | `bitpolar_torch_native` | PT2E quantizer backend |
| JAX/Flax | `bitpolar_jax` | JAX array compression + Flax module |
| TensorFlow | `bitpolar_tensorflow` | Keras layers for compression |
| scikit-learn | `bitpolar_sklearn` | TransformerMixin for sklearn pipelines |
### Cloud & Enterprise

| Integration | Package | Description |
|---|---|---|
| Spring AI | `BitPolarVectorStore.java` | Java VectorStore for Spring Boot |
| Vercel AI SDK | `bitpolar_vercel` | Embedding compression middleware |
| AWS Bedrock | `bitpolar_bedrock` | Titan/Cohere embedding compression |
| Triton | `bitpolar_triton` | NVIDIA Inference Server backend |
| gRPC | `bitpolar-server` | Language-agnostic compression service |
| MCP | `bitpolar_mcp` | AI coding assistant tool server |
| CLI | `bitpolar-cli` | Command-line compress/search/bench |
## How It Works
```
Input f32 vector
        │
        ▼
┌─────────────────┐
│ Random Rotation │  WHT (O(d log d)) or Haar QR (O(d²))
│                 │  Spreads energy uniformly across coordinates
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   PolarQuant    │  Groups d dims into d/2 pairs → polar coords
│   (Stage 1)     │  Radii: lossless f32 │ Angles: b-bit quantized
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  QJL Residual   │  Sketches reconstruction error
│   (Stage 2)     │  1 sign bit per projection → unbiased correction
└────────┬────────┘
         │
         ▼
TurboCode { polar: PolarCode, residual: QjlSketch }
```
Inner product estimation: ⟨v, q⟩ ≈ IP_polar(code, q) + IP_qjl(sketch, q)
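Stage 1 above can be modeled in a few lines of plain Python: group coordinates into pairs, keep each pair's radius exactly, and quantize its angle to b bits. This is a sketch of the idea only; the names and bit layout are illustrative, not BitPolar's actual encoding:

```python
import math

def polar_encode(v, bits):
    """Encode pairs (x, y) as (exact radius, b-bit angle bin). len(v) must be even."""
    levels = 1 << bits
    code = []
    for i in range(0, len(v), 2):
        x, y = v[i], v[i + 1]
        r = math.hypot(x, y)                        # radius kept lossless
        theta = math.atan2(y, x) % (2.0 * math.pi)  # angle in [0, 2*pi)
        code.append((r, int(theta / (2.0 * math.pi) * levels) % levels))
    return code

def polar_decode(code, bits):
    """Reconstruct each pair using the midpoint of its angle bin."""
    levels = 1 << bits
    out = []
    for r, q in code:
        theta = (q + 0.5) * 2.0 * math.pi / levels
        out.extend((r * math.cos(theta), r * math.sin(theta)))
    return out

v = [0.3, -0.7, 1.2, 0.4, -0.9, 0.1]
rec = polar_decode(polar_encode(v, bits=8), bits=8)
# per-pair reconstruction error is bounded by roughly r * pi / 2**bits
```

Because the radius is stored losslessly (matching the diagram's "Radii: lossless f32"), all quantization error lives in the bounded angle term, which is what makes the residual small enough for the 1-bit QJL sketch of Stage 2 to correct.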
## Parameter Selection
| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | — | dim/4 | QJL standalone (1-bit sketches) |
## Feature Flags

| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Standard library (nalgebra QR, full rotation) |
| `alloc` | No | Heap allocation without std (`Vec` via the `alloc` crate) |
| `serde-support` | Yes | Serde serialization for all types |
| `simd` | No | Hand-tuned NEON/AVX2 kernels |
| `parallel` | No | Parallel batch operations via rayon |
| `tracing-support` | No | OpenTelemetry-compatible instrumentation |
| `ffi` | No | C FFI exports for cross-language bindings |
## no_std Support
BitPolar works on embedded/edge targets with no_std:
```toml
[dependencies]
bitpolar = { version = "0.3", default-features = false, features = ["alloc"] }
```

Uses `libm` for math functions and `alloc` for `Vec`/`String`. The Walsh-Hadamard rotation is available without `std` (unlike Haar QR, which requires `nalgebra`).
## Traits
BitPolar exposes composable traits for ecosystem integration:
- `VectorQuantizer` — core encode/decode/IP/L2 interface
- `BatchQuantizer` — parallel batch operations (behind the `parallel` feature)
- `RotationStrategy` — pluggable rotation (QR, Walsh-Hadamard, identity)
- `SerializableCode` — compact binary serialization
## Examples
30 Python examples and 9 Rust examples, plus JavaScript, Go, and Java examples, covering all 58 integrations. See examples/README.md for the full list.
## Performance

Run benchmarks with `cargo bench`.
## References
- TurboQuant (ICLR 2026): arXiv 2504.19874
- PolarQuant (AISTATS 2026): arXiv 2502.02617
- QJL (AAAI 2025): arXiv 2406.03482
## Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to add a new quantization strategy.
## License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.