§turbo-quant
Experimental Rust implementation of the TurboQuant, PolarQuant, and QJL families of vector quantization algorithms, each configured through a codec profile.
The crate builds derived, compressed sidecars for high-dimensional vectors. Quality is workload-dependent and must be measured with gates that can fall back to exact scoring.
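Measuring quality usually means comparing the sidecar's ranking against an exact ranking on a benchmark query set. The sketch below is a minimal illustration, assuming you already have exact scores and sidecar estimates per query; `recall_at_k`, `mean_recall_at_k`, and `sidecar_passes_gate` are hypothetical helpers, not part of `turbo_quant` (whose own benchmark types live in the `eval` module).

```rust
/// Fraction of the exact top-k candidates that the approximate scores also rank in their top-k.
fn recall_at_k(exact: &[f32], approx: &[f32], k: usize) -> f64 {
    let top_k = |scores: &[f32]| -> Vec<usize> {
        let mut idx: Vec<usize> = (0..scores.len()).collect();
        // Highest score first; `total_cmp` gives a total order even if a NaN slips in.
        idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
        idx.truncate(k);
        idx
    };
    let (exact_top, approx_top) = (top_k(exact), top_k(approx));
    let hits = exact_top.iter().filter(|i| approx_top.contains(*i)).count();
    hits as f64 / exact_top.len().max(1) as f64
}

/// Mean recall@k over a benchmark query set (one score list per query).
fn mean_recall_at_k(exact: &[Vec<f32>], approx: &[Vec<f32>], k: usize) -> f64 {
    assert_eq!(exact.len(), approx.len(), "one approximate score list per query");
    if exact.is_empty() {
        return 0.0;
    }
    let total: f64 = exact.iter().zip(approx).map(|(e, a)| recall_at_k(e, a, k)).sum();
    total / exact.len() as f64
}

/// Hypothetical gate: serve from the sidecar only when measured recall clears the bar.
fn sidecar_passes_gate(exact: &[Vec<f32>], approx: &[Vec<f32>], k: usize, min_recall: f64) -> bool {
    mean_recall_at_k(exact, approx, k) >= min_recall
}

fn main() {
    // Two benchmark queries over four candidates: exact scores vs. sidecar estimates.
    let exact: Vec<Vec<f32>> = vec![vec![0.9, 0.1, 0.5, 0.3], vec![0.2, 0.8, 0.7, 0.1]];
    let approx: Vec<Vec<f32>> = vec![vec![0.85, 0.15, 0.45, 0.35], vec![0.65, 0.6, 0.75, 0.1]];
    if sidecar_passes_gate(&exact, &approx, 2, 0.95) {
        println!("gate passed: rank candidates with the compressed sidecar");
    } else {
        println!("gate failed: fall back to exact scoring");
    }
}
```

The threshold and `k` are deployment choices; the gate only makes sense when the benchmark queries resemble real traffic, filters included.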
§Key Properties
- Data-oblivious codec construction: no k-means or trained codebook is built inside the crate. Retrieval quality still depends on the deployment distribution, filters, and workload-specific benchmark gates.
- Deterministic: identical `(dim, bits, seed)` always produces the same quantizer, so state can be fully reconstructed from a few integers (`TurboQuantizer::new` takes four).
- Measured quality: inner product estimates are approximate, and retrieval deployments still need recall/rank gates.
- Instant indexing: unlike Product Quantization, there is no offline training phase. Vectors can be indexed as they arrive.
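The determinism and data-oblivious properties both follow from deriving every random choice from a seed rather than from the data. As a sketch of that idea, assuming a seeded PRNG drives the random parts of the codec, the hypothetical `random_signs` helper below builds a ±1 pattern (one ingredient of a seeded rotation) from `(dim, seed)` alone. SplitMix64 is a stand-in generator; the crate's actual PRNG and rotation construction are not documented here.

```rust
/// SplitMix64: a small, well-known 64-bit PRNG, used here only for illustration.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: a ±1 sign pattern determined entirely by `dim` and `seed`.
/// No vectors are inspected, so nothing has to be trained or stored.
fn random_signs(dim: usize, seed: u64) -> Vec<f32> {
    let mut rng = SplitMix64(seed);
    (0..dim)
        .map(|_| if rng.next_u64() & 1 == 0 { 1.0 } else { -1.0 })
        .collect()
}

fn main() {
    let original = random_signs(8, 42);
    // Rebuilding from the same integers reproduces the exact same state.
    let rebuilt = random_signs(8, 42);
    assert_eq!(original, rebuilt);
    println!("seed 42: {original:?}");
    println!("seed 43: {:?}", random_signs(8, 43));
}
```

Because vectors are never consulted, each new vector can be encoded as soon as it arrives, which is what makes the instant-indexing property possible.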
§Quick Start
```rust
use turbo_quant::{TurboQuantizer, PolarQuantizer};

// Compress high-dimensional embeddings (e.g. 1536 dimensions, a common OpenAI embedding size).
let dim = 64; // use your embedding dimension (e.g. 1536) in production
let q = TurboQuantizer::new(dim, 8, 32, /* seed */ 42).unwrap();
let database_vector: Vec<f32> = vec![0.1; dim]; // your embedding here
let query_vector: Vec<f32> = vec![0.1; dim]; // your query here

// Create a compressed sidecar for the database vector.
let code = q.encode(&database_vector).unwrap();

// At query time: estimate the inner product without decompressing.
let score = q.inner_product_estimate(&code, &query_vector).unwrap();

// Or use PolarQuant alone as a simpler single-stage compressor.
let pq = PolarQuantizer::new(dim, 8, 42).unwrap();
let polar_code = pq.encode(&database_vector).unwrap();
let polar_score = pq.inner_product_estimate(&polar_code, &query_vector).unwrap();
```

§Choosing Parameters
| Use case | Recommended bits | Recommended projections |
|---|---|---|
| Semantic search (recall@10) | 8 | dim / 4 |
| KV cache compression | 4–6 | dim / 8 |
| Maximum compression | 3 | dim / 16 |
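If several call sites pick parameters, the table can be kept in one place. This is a hypothetical sketch, not a crate API: `UseCase` and `recommended_params` are illustrative names, the KV cache row uses the low end (4 bits) of its 4–6 range, and the results still need to be passed to the quantizer constructor after checking its parameter order.

```rust
/// Hypothetical use-case labels mirroring the rows of the table above.
#[derive(Clone, Copy, Debug)]
enum UseCase {
    SemanticSearch,
    KvCache,
    MaxCompression,
}

/// Returns `(bits, projections)` for a use case and embedding dimension.
fn recommended_params(use_case: UseCase, dim: usize) -> (u8, usize) {
    let (bits, divisor) = match use_case {
        UseCase::SemanticSearch => (8, 4),
        UseCase::KvCache => (4, 8), // table allows 4–6 bits; 4 chosen here
        UseCase::MaxCompression => (3, 16),
    };
    // Guard against tiny dimensions so the projection count never reaches zero.
    (bits, (dim / divisor).max(1))
}

fn main() {
    for use_case in [UseCase::SemanticSearch, UseCase::KvCache, UseCase::MaxCompression] {
        let (bits, projections) = recommended_params(use_case, 1536);
        println!("{use_case:?}: bits = {bits}, projections = {projections}");
    }
}
```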
§References
- TurboQuant-style two-stage polar plus residual sketch compression.
- Polar-coordinate quantization after seeded rotation.
- Quantized Johnson-Lindenstrauss sign-projection sketches.
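To make polar-coordinate quantization after a seeded rotation concrete, here is a minimal sketch under simplifying assumptions that are not taken from the crate: the rotation is a fixed ±1 sign flip followed by a normalized fast Walsh-Hadamard transform, coordinates are grouped into 2-D pairs, and each pair is stored as an exact `f32` radius plus a `bits`-bit angle index. `fwht`, `rotate`, `polar_encode`, and `polar_decode` are hypothetical helpers; `PolarQuantizer` may group coordinates, compress radii, and pack codes differently.

```rust
use std::f32::consts::PI;

/// In-place fast Walsh-Hadamard transform scaled by 1/sqrt(n), which makes it orthogonal.
/// `v.len()` must be a power of two.
fn fwht(v: &mut [f32]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(2 * h) {
            for j in i..i + h {
                let (a, b) = (v[j], v[j + h]);
                v[j] = a + b;
                v[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    v.iter_mut().for_each(|x| *x *= scale);
}

/// Seeded rotation sketch: apply a fixed ±1 pattern, then the Hadamard transform.
fn rotate(x: &[f32], signs: &[f32]) -> Vec<f32> {
    let mut v: Vec<f32> = x.iter().zip(signs).map(|(a, s)| a * s).collect();
    fwht(&mut v);
    v
}

/// Encode consecutive coordinate pairs as (radius, quantized angle index).
fn polar_encode(v: &[f32], bits: u32) -> Vec<(f32, u32)> {
    let levels = 1u32 << bits;
    v.chunks_exact(2)
        .map(|p| {
            let r = p[0].hypot(p[1]);
            let t = (p[1].atan2(p[0]) + PI) / (2.0 * PI); // angle mapped to [0, 1]
            let idx = ((t * levels as f32).round() as u32) % levels; // +PI wraps to -PI
            (r, idx)
        })
        .collect()
}

/// Reconstruct the rotated vector from radii and angle indices.
fn polar_decode(code: &[(f32, u32)], bits: u32) -> Vec<f32> {
    let levels = (1u32 << bits) as f32;
    code.iter()
        .flat_map(|&(r, idx)| {
            let theta = idx as f32 / levels * 2.0 * PI - PI;
            [r * theta.cos(), r * theta.sin()]
        })
        .collect()
}

fn main() {
    // Illustrative values; a real codec would derive the ±1 pattern from a seed.
    let signs: [f32; 8] = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0];
    let x: [f32; 8] = [0.3, -1.2, 0.8, 0.05, 2.0, -0.4, 0.9, 1.1];
    let y: [f32; 8] = [1.0, 0.5, -0.3, 0.7, 0.2, -1.0, 0.4, 0.6];
    let (rx, ry) = (rotate(&x, &signs), rotate(&y, &signs));
    let bits = 6;
    let decoded = polar_decode(&polar_encode(&rx, bits), bits);
    let exact: f32 = x.iter().zip(&y).map(|(a, b)| a * b).sum();
    // The rotation is orthogonal, so scoring in rotated space preserves inner products.
    let estimate: f32 = decoded.iter().zip(&ry).map(|(a, b)| a * b).sum();
    println!("exact = {exact:.4}, polar estimate = {estimate:.4}");
}
```

With this pairing, each pair's reconstruction error is at most its radius times π / 2^bits, so the relative error of the whole rotated vector is bounded by the same factor.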
§Re-exports
pub use baseline::ByteAccountingV1;
pub use codebook::ScalarCodebook;
pub use error::Result;
pub use error::TurboQuantError;
pub use eval::BenchmarkComparisonV1;
pub use eval::BenchmarkCorpus;
pub use eval::BenchmarkReceiptV1;
pub use eval::CompressionEvalV1;
pub use index::ScoredCandidate;
pub use index::SearchOptions;
pub use index::SearchReceiptV1;
pub use index::TurboSidecarEntry;
pub use index::TurboSidecarIndex;
pub use kv::AttentionScale;
pub use kv::AttentionScoreOptions;
pub use kv::CompressedToken;
pub use kv::KvCacheCompressor;
pub use kv::KvCacheConfig;
pub use kv::KvMemoryReportV1;
pub use kv::KvQuantPolicy;
pub use kv::KvRuntimeConfig;
pub use kv::KvShadowScore;
pub use kv::KvShadowToken;
pub use packed::PackedPolarCode;
pub use packed::PackedQjlSketch;
pub use packed::PackedTurboCode;
pub use polar::PolarCode;
pub use polar::PolarProjectedQuery;
pub use polar::PolarQuantizer;
pub use profile::CodecProfileV1;
pub use profile::CompressionPolicyV1;
pub use profile::CompressionReceiptV1;
pub use profile::ValidationState;
pub use qjl::QjlProjectedQuery;
pub use qjl::QjlQuantizer;
pub use qjl::QjlSketch;
pub use qjl::QjlSketchProvenanceV1;
pub use radius::CompressedRadiiV1;
pub use radius::RadiusCodecProfileV1;
pub use rotation::FastHadamardRotation;
pub use rotation::Rotation;
pub use rotation::RotationBackend;
pub use rotation::RotationKind;
pub use rotation::StoredRotation;
pub use turbo::BatchStats;
pub use turbo::TurboCode;
pub use turbo::TurboMode;
pub use turbo::TurboProjectedQuery;
pub use turbo::TurboQuantizer;
pub use wire::TurboCodeWireHeader;
pub use wire::TurboCodeWireV1;
pub use wire::TURBO_CODE_WIRE_MAGIC;
§Modules
- `baseline`: Byte accounting helpers for sidecar and shadow-mode receipts.
- `bitpack`: Bitpacking helpers for production codec payloads.
- `codebook`: Small scalar codebook utilities used by quantizer profiles and tests.
- `error`
- `eval`: Evaluation and benchmark receipt data structures.
- `index`: Caller-owned-ID sidecar candidate index.
- `kv`: KV cache compression for transformer attention.
- `packed`: Explicit packed sidecar payloads.
- `polar`: PolarQuant, high-efficiency vector compression via polar-coordinate encoding.
- `profile`: Stable codec profiles, compression policies, and receipts.
- `qjl`: Quantized Johnson-Lindenstrauss (QJL) transform for approximate inner product estimation.
- `radius`: Radius compression profiles for packed sidecar payloads.
- `rotation`: Random rotation matrices for whitening high-dimensional vectors before quantization.
- `turbo`: TurboQuant, profile-selected vector compression using PolarQuant with optional QJL.
- `wire`: Deterministic compact wire encoding for `TurboCode`.
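The `qjl` module's approach can be illustrated with the general QJL estimator. This is a hedged sketch, not the crate's `QjlQuantizer` API: the stored vector `k` is projected with a seeded Gaussian matrix and only the signs plus its norm are kept, while the query is projected without quantization. For a Gaussian row `s`, E[sign(⟨s, k⟩)⟨s, q⟩] = sqrt(2/π)⟨q, k⟩/‖k‖, so rescaling by sqrt(π/2)‖k‖/m gives an unbiased estimate. SplitMix64, Box-Muller sampling, and the names `gaussian_projection`, `qjl_sketch`, and `qjl_estimate` are illustrative assumptions.

```rust
use std::f64::consts::PI;

/// SplitMix64 PRNG, a stand-in for whatever seeded generator a real codec uses.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform sample in (0, 1], so `ln` below never sees zero.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// m x d Gaussian projection, fully determined by (m, d, seed).
fn gaussian_projection(m: usize, d: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = SplitMix64(seed);
    (0..m)
        .map(|_| (0..d).map(|_| rng.next_gaussian()).collect::<Vec<f64>>())
        .collect()
}

/// Sketch of a stored vector: one sign bit per projection plus its L2 norm.
fn qjl_sketch(proj: &[Vec<f64>], k: &[f64]) -> (Vec<bool>, f64) {
    let signs: Vec<bool> = proj.iter().map(|row| dot(row, k) >= 0.0).collect();
    (signs, dot(k, k).sqrt())
}

/// Unbiased inner product estimate from the 1-bit sketch and an unquantized query projection.
fn qjl_estimate(proj: &[Vec<f64>], signs: &[bool], norm: f64, q: &[f64]) -> f64 {
    let sum: f64 = proj
        .iter()
        .zip(signs)
        .map(|(row, &positive)| {
            let sq = dot(row, q);
            if positive { sq } else { -sq }
        })
        .sum();
    (PI / 2.0).sqrt() * norm / proj.len() as f64 * sum
}

fn main() {
    let d = 16;
    let k: Vec<f64> = (0..d).map(|i| (i as f64 * 0.37).sin()).collect();
    let q: Vec<f64> = (0..d).map(|i| (i as f64 * 0.11).cos()).collect();
    let proj = gaussian_projection(4096, d, 42);
    let (signs, norm) = qjl_sketch(&proj, &k);
    let estimate = qjl_estimate(&proj, &signs, norm, &q);
    println!("exact = {:.4}, QJL estimate = {:.4}", dot(&q, &k), estimate);
}
```

The estimator's variance shrinks roughly as 1/m, which is the trade-off behind the projection counts in the parameter table: more projections cost more sketch bits but tighten the estimate.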