Compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data. Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.
## Key Properties
- Data-oblivious — no training, no codebooks, no calibration data
- Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
- Provably unbiased — inner product estimates satisfy E[estimate] = exact inner product at 3+ bits
- Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
- Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
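The unbiasedness property is easy to sanity-check outside the library with a pure-Python sketch of the one-bit JL estimator that QJL builds on: project onto random Gaussian directions, keep only the sign of the compressed vector's projection, and rescale. Everything here (function name, argument order) illustrates the underlying math and is not BitPolar's API:

```python
import math
import random

def qjl_ip_estimate(x, y, k, seed=0):
    """Estimate <x, y> from k one-bit sketches of x (QJL-style; illustrative)."""
    rng = random.Random(seed)
    norm_x = math.sqrt(sum(v * v for v in x))
    total = 0.0
    for _ in range(k):
        s = [rng.gauss(0.0, 1.0) for _ in x]          # random Gaussian direction
        sx = sum(si * xi for si, xi in zip(s, x))     # projection of x, kept only as a sign
        sy = sum(si * yi for si, yi in zip(s, y))     # projection of the query, unquantized
        total += (1.0 if sx >= 0.0 else -1.0) * sy
    # sqrt(pi/2) * ||x|| corrects the bias introduced by sign quantization
    return norm_x * math.sqrt(math.pi / 2.0) * total / k

x = [1.0, 0.5, -0.3, 0.8]
y = [0.2, -0.1, 0.7, 0.4]
true_ip = sum(a * b for a, b in zip(x, y))  # 0.26
est = qjl_ip_estimate(x, y, k=50000)        # converges to true_ip as k grows
```

Averaged over many random directions the estimate centers on the exact inner product, which is the sense in which E[estimate] equals the exact value even at 1 bit per projection.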
## What's New in 0.3.x
- 58 integrations — every major AI framework, vector database, and ML library
- PyTorch torchao — embedding quantizer, BitPolarLinear, KV cache
- FAISS drop-in — API-compatible IndexBitPolarIP/L2 replacement
- LlamaIndex, Haystack, DSPy — VectorStore and Retriever integrations
- Agentic AI — LangGraph, CrewAI, OpenAI Agents, Google ADK, SmolAgents, PydanticAI
- Agent memory — Mem0, Zep, Letta backends
- 11 vector databases — Milvus, Weaviate, Pinecone, Redis, ES, DuckDB, SQLite, and more
- LLM inference — llama.cpp, SGLang, TensorRT, Ollama, MLX KV cache compression
- ML frameworks — JAX/Flax, TensorFlow/Keras, scikit-learn pipeline
- 30 Python examples covering all integrations
- Walsh-Hadamard Transform — O(d log d) rotation with O(d) memory (577x less than Haar QR)
- Python bindings — PyO3 + maturin, zero-copy numpy integration
- WASM bindings — browser-side vector search via wasm-bindgen
- `no_std` support — embedded/edge deployment with the `alloc` feature
## Quick Start

### Rust
```toml
[dependencies]
bitpolar = "0.3"
```

```rust
use bitpolar::TurboQuantizer;
use bitpolar::VectorQuantizer;

// Create quantizer from 4 integers — no training needed
// (dimension, bits, projections, seed); argument values are illustrative
let q = TurboQuantizer::new(768, 4, 192, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 768];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.2_f32; 768];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);
```
### Python

```python
from bitpolar import TurboQuantizer  # import path assumed

# Create quantizer — no training needed
# (dimension, bits, projections, seed); argument values are illustrative
q = TurboQuantizer(768, 4, 192, 42)

# Encode/decode
code = q.encode(vector)
score = q.inner_product_estimate(code, query)
reconstructed = q.decode(code)

# Build a search index
index = ...  # index construction was elided in the source
scores, ids = index.search(query, k=10)  # method name assumed
```
### JavaScript (WASM)

```javascript
import init, { TurboQuantizer, Index } from 'bitpolar-wasm';  // exported names assumed

await init();
const q = new TurboQuantizer(768, 4, 192, 42);  // (dimension, bits, projections, seed)
const code = q.encode(vector);
const decoded = q.decode(code);
const index = new Index(q);
index.add(id, code);
const results = index.search(query, 10);
```
## Walsh-Hadamard Transform
The WHT provides an O(d log d) alternative to Haar QR rotation:
| Property | Haar QR (0.1.x) | Walsh-Hadamard (0.2.x+) |
|---|---|---|
| Time complexity | O(d²) | O(d log d) |
| Memory | O(d²) — 2.3 MB @ d=768 | O(d) — 4 KB @ d=768 |
| Quality | Exact Haar distribution | Near-Haar (JL guarantees) |
| Deterministic | Yes (seed-based) | Yes (seed-based) |
```rust
use bitpolar::WhtRotation;
use bitpolar::RotationStrategy;

// (dimension, seed); argument values are illustrative
let wht = WhtRotation::new(768, 42).unwrap();
let rotated = wht.rotate(&vector);
let recovered = wht.rotate_inverse(&rotated);
```
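The two ingredients of this rotation, a fast Hadamard butterfly and a seed-derived diagonal of random signs, can be sketched in plain Python (an illustration of the standard "HD" construction, not the crate's code):

```python
import math
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform: O(d log d) time, O(d) extra memory.
    len(v) must be a power of two."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def seeded_signs(d, seed):
    """Deterministic diagonal of +/-1 signs derived from the seed."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(d)]

def rotate(v, seed):
    """Sign flip, then normalized WHT; the result is an orthogonal transform of v."""
    signs = seeded_signs(len(v), seed)
    scale = 1.0 / math.sqrt(len(v))
    return [x * scale for x in fwht([s * x for s, x in zip(signs, v)])]

def rotate_inverse(v, seed):
    """H is its own inverse up to scaling, so apply it again and undo the signs."""
    signs = seeded_signs(len(v), seed)
    scale = 1.0 / math.sqrt(len(v))
    return [s * x * scale for s, x in zip(signs, fwht(v))]

v = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 3.0]
r = rotate(v, seed=42)
back = rotate_inverse(r, seed=42)  # recovers v up to float rounding
```

Because the transform is computed in place and the sign diagonal is regenerated from the seed, only O(d) working memory is needed, versus materializing a d x d Haar matrix for QR-based rotation.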
## API Overview

### Core Quantizers
| Type | Description | Use Case |
|---|---|---|
| `TurboQuantizer` | Two-stage (Polar + QJL) | Primary API — best quality |
| `PolarQuantizer` | Polar coordinate encoding | Simpler, fallback option |
| `QjlQuantizer` | 1-bit JL sketching | Residual correction |
| `WhtRotation` | Walsh-Hadamard rotation | Fast, memory-efficient rotation |
### Specialized Wrappers

| Type | Description |
|---|---|
| `KvCacheCompressor` | Transformer KV cache compression |
| `MultiHeadKvCache` | Multi-head attention KV cache |
| `TieredQuantization` | Hot (8-bit) / Warm (4-bit) / Cold (3-bit) tiers |
| `ResilientQuantizer` | Primary + fallback for production robustness |
| `OversampledSearch` | Two-phase approximate + exact re-ranking |
| `DistortionTracker` | Online quality monitoring (EMA MSE/bias) |
## Language Bindings

| Package | Install | Language |
|---|---|---|
| `bitpolar` | `cargo add bitpolar` | Rust |
| `bitpolar` | `pip install bitpolar` | Python (PyO3) |
| `@mmgehlot/bitpolar-wasm` | `npm install @mmgehlot/bitpolar-wasm` | JavaScript (WASM) |
| `@mmgehlot/bitpolar` | `npm install @mmgehlot/bitpolar` | Node.js (NAPI-RS) |
| `bitpolar-go` | `go get github.com/mmgehlot/bitpolar/...` | Go (CGO) |
| `bitpolar` | Maven Central | Java (JNI) |
| `bitpolar-pg` | `cargo pgrx install` | PostgreSQL |
## 58 Integrations — Every Major AI Framework
BitPolar is the single canonical library for vector quantization across the entire AI/ML ecosystem.
### RAG & Search Frameworks

| Integration | Package | Description |
|---|---|---|
| LangChain | `langchain_bitpolar` | VectorStore with compressed similarity search |
| LlamaIndex | `llamaindex_bitpolar` | BasePydanticVectorStore for LlamaIndex |
| Haystack | `bitpolar_haystack` | DocumentStore + Retriever component |
| DSPy | `bitpolar_dspy` | Retriever module for DSPy pipelines |
| FAISS | `bitpolar_faiss` | Drop-in replacement for `faiss.IndexFlatIP`/`L2` |
| ChromaDB | `bitpolar_chroma` | EmbeddingFunction + two-phase search store |
### Agentic AI Frameworks

| Integration | Package | Description |
|---|---|---|
| LangGraph | `bitpolar_langgraph` | Compressed checkpoint saver for stateful agents |
| CrewAI | `bitpolar_crewai` | Memory backend for agent teams |
| OpenAI Agents SDK | `bitpolar_openai_agents` | Function-calling tools for OpenAI agents |
| Google ADK | `bitpolar_google_adk` | Tool for Google Agent Development Kit |
| Anthropic MCP | `bitpolar_anthropic` | MCP server (stdio + SSE) for Claude |
| AutoGen | `bitpolar_autogen` | Memory store for Microsoft agents |
| SmolAgents | `bitpolar_smolagents` | HuggingFace agent tool |
| PydanticAI | `bitpolar_pydantic_ai` | Type-safe Pydantic tool definitions |
| Agno (Phidata) | `bitpolar_agno` | Knowledge base for high-perf agents |
### Agent Memory Frameworks

| Integration | Package | Description |
|---|---|---|
| Mem0 | `bitpolar_mem0` | Vector store backend for Mem0 |
| Zep | `bitpolar_zep` | Compressed store with time-decay scoring |
| Letta (MemGPT) | `bitpolar_letta` | Archival memory tier |
### Vector Databases

| Integration | Package | Description |
|---|---|---|
| Qdrant | `bitpolar_embeddings.qdrant` | Two-phase HNSW + BitPolar re-ranking |
| Milvus | `bitpolar_milvus` | Client-side compression with reranking |
| Weaviate | `bitpolar_weaviate` | Client-side compression with reranking |
| Pinecone | `bitpolar_pinecone` | Metadata-stored compressed codes |
| Redis | `bitpolar_redis` | Byte string storage with pipeline search |
| Elasticsearch | `bitpolar_elasticsearch` | kNN search + BitPolar reranking |
| PostgreSQL | `bitpolar-pg` | Native pgrx extension (SQL functions) |
| DuckDB | `bitpolar_duckdb` | BLOB storage with SQL queries |
| SQLite | `bitpolar_sqlite_vec` | Zero-dependency embedded vector search |
| Supabase | `bitpolar_supabase` | Serverless pgvector compression |
| Neon | `bitpolar_neon` | Serverless Postgres driver |
### LLM Inference Engines (KV Cache)

| Integration | Package | Description |
|---|---|---|
| vLLM | `bitpolar_vllm` | KV cache quantizer + DynamicCache |
| HuggingFace Transformers | `bitpolar_transformers` | Drop-in DynamicCache replacement |
| llama.cpp | `bitpolar_llamacpp` | KV cache compression |
| SGLang | `bitpolar_sglang` | RadixAttention cache compression |
| TensorRT-LLM | `bitpolar_tensorrt` | KV cache quantizer plugin |
| Ollama | `bitpolar_ollama` | Embedding compression client |
| ONNX Runtime | `bitpolar_onnx` | Model embedding quantizer |
| Apple MLX | `bitpolar_mlx` | Apple Silicon quantizer |
### ML Frameworks

| Integration | Package | Description |
|---|---|---|
| PyTorch | `bitpolar_torch` | Embedding quantizer, BitPolarLinear, KV cache |
| PyTorch (native) | `bitpolar_torch_native` | PT2E quantizer backend |
| JAX/Flax | `bitpolar_jax` | JAX array compression + Flax module |
| TensorFlow | `bitpolar_tensorflow` | Keras layers for compression |
| scikit-learn | `bitpolar_sklearn` | TransformerMixin for sklearn pipelines |
### Cloud & Enterprise

| Integration | Package | Description |
|---|---|---|
| Spring AI | `BitPolarVectorStore.java` | Java VectorStore for Spring Boot |
| Vercel AI SDK | `bitpolar_vercel` | Embedding compression middleware |
| AWS Bedrock | `bitpolar_bedrock` | Titan/Cohere embedding compression |
| Triton | `bitpolar_triton` | NVIDIA Inference Server backend |
| gRPC | `bitpolar-server` | Language-agnostic compression service |
| MCP | `bitpolar_mcp` | AI coding assistant tool server |
| CLI | `bitpolar-cli` | Command-line compress/search/bench |
## How It Works
```
Input f32 vector
        │
        ▼
┌─────────────────┐
│ Random Rotation │  WHT (O(d log d)) or Haar QR (O(d²))
│                 │  Spreads energy uniformly across coordinates
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   PolarQuant    │  Groups d dims into d/2 pairs → polar coords
│   (Stage 1)     │  Radii: lossless f32 │ Angles: b-bit quantized
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  QJL Residual   │  Sketches reconstruction error
│   (Stage 2)     │  1 sign bit per projection → unbiased correction
└────────┬────────┘
         │
         ▼
TurboCode { polar: PolarCode, residual: QjlSketch }
```
Inner product estimation: ⟨v, q⟩ ≈ IP_polar(code, q) + IP_qjl(sketch, q)
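Stage 1 above can be modeled in a few lines of plain Python: group coordinates into pairs, keep each pair's radius exactly, and quantize its angle to b bits. This is a sketch of the idea only; the names and bit layout are illustrative, not BitPolar's actual encoding:

```python
import math

def polar_encode(v, bits):
    """Encode pairs (x, y) as (exact radius, b-bit angle bin). len(v) must be even."""
    levels = 1 << bits
    code = []
    for i in range(0, len(v), 2):
        x, y = v[i], v[i + 1]
        r = math.hypot(x, y)                        # radius kept lossless
        theta = math.atan2(y, x) % (2.0 * math.pi)  # angle in [0, 2*pi)
        code.append((r, int(theta / (2.0 * math.pi) * levels) % levels))
    return code

def polar_decode(code, bits):
    """Reconstruct each pair using the midpoint of its angle bin."""
    levels = 1 << bits
    out = []
    for r, q in code:
        theta = (q + 0.5) * 2.0 * math.pi / levels
        out.extend((r * math.cos(theta), r * math.sin(theta)))
    return out

v = [0.3, -0.7, 1.2, 0.4, -0.9, 0.1]
rec = polar_decode(polar_encode(v, bits=8), bits=8)
# per-pair reconstruction error is bounded by roughly r * pi / 2**bits
```

Because the radius is stored losslessly (matching the diagram's "Radii: lossless f32"), all quantization error lives in the bounded angle term, which is what makes the residual small enough for the 1-bit QJL sketch of Stage 2 to correct.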
## Parameter Selection
| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | — | dim/4 | QJL standalone (1-bit sketches) |
## Feature Flags

| Feature | Default | Description |
|---|---|---|
| `std` | Yes | Standard library (nalgebra QR, full rotation) |
| `alloc` | No | Heap allocation without std (`Vec` via the `alloc` crate) |
| `serde-support` | Yes | Serde serialization for all types |
| `simd` | No | Hand-tuned NEON/AVX2 kernels |
| `parallel` | No | Parallel batch operations via rayon |
| `tracing-support` | No | OpenTelemetry-compatible instrumentation |
| `ffi` | No | C FFI exports for cross-language bindings |
## no_std Support
BitPolar works on embedded/edge targets with no_std:
```toml
[dependencies]
bitpolar = { version = "0.3", default-features = false, features = ["alloc"] }
```

Uses `libm` for math functions and `alloc` for `Vec`/`String`. The Walsh-Hadamard rotation is available without `std` (unlike Haar QR, which requires `nalgebra`).
## Traits
BitPolar exposes composable traits for ecosystem integration:
- `VectorQuantizer` — core encode/decode/IP/L2 interface
- `BatchQuantizer` — parallel batch operations (behind the `parallel` feature)
- `RotationStrategy` — pluggable rotation (QR, Walsh-Hadamard, identity)
- `SerializableCode` — compact binary serialization
## Examples
30 Python examples and 9 Rust examples, plus JavaScript, Go, and Java examples, covering all 58 integrations. See examples/README.md for the full list.
## Performance

Run benchmarks with `cargo bench`.
## References
- TurboQuant (ICLR 2026): arXiv 2504.19874
- PolarQuant (AISTATS 2026): arXiv 2502.02617
- QJL (AAAI 2025): arXiv 2406.03482
## Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to add a new quantization strategy.
## License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.