Crate turbo_quant


§turbo-quant

Experimental Rust implementation of the TurboQuant, PolarQuant, and QJL families of profile-defined vector quantization algorithms.

The crate creates derived compressed sidecars for high-dimensional vectors. Quality is workload-dependent and must be measured with exact fallback gates.
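One way to picture an exact fallback gate is to score the same candidates twice, once exactly and once from the compressed sidecar, and accept the compressed path only if recall@k clears a threshold. The sketch below is a minimal, self-contained illustration; `top_k`, `recall_at_k`, and `passes_gate` are hypothetical helpers written for this example, not functions exported by the crate.

```rust
/// Exact inner product, used as the ground-truth scorer.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Indices of the `k` highest scores, best first (ties go to the lower index).
fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..scores.len()).collect();
    ids.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]).then(a.cmp(&b)));
    ids.truncate(k);
    ids
}

/// Fraction of the exact top-k that the approximate top-k also returned.
fn recall_at_k(exact: &[f32], approx: &[f32], k: usize) -> f32 {
    let truth = top_k(exact, k);
    let got = top_k(approx, k);
    let hits = truth.iter().filter(|id| got.contains(*id)).count();
    hits as f32 / truth.len().max(1) as f32
}

/// Hypothetical gate: trust the compressed scores only if recall clears `min_recall`.
/// A caller that fails the gate would fall back to exact scoring.
fn passes_gate(exact: &[f32], approx: &[f32], k: usize, min_recall: f32) -> bool {
    recall_at_k(exact, approx, k) >= min_recall
}
```

In practice `exact` would come from `dot` over the original vectors and `approx` from the quantizer's inner product estimates for the same candidates; the threshold is a deployment decision.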

§Key Properties

  • Data-oblivious codec construction: no k-means or trained codebook is built inside the crate. Retrieval quality still depends on the deployment distribution, filters, and workload-specific benchmark gates.
  • Deterministic: identical constructor arguments always produce the same quantizer. A TurboQuantizer's state can be fully reconstructed from four integers (dim, bits, projections, seed).
  • Measured quality: inner product estimates are approximate and retrieval deployments still need recall/rank gates.
  • Instant indexing: unlike Product Quantization, there is no offline training phase. Vectors can be indexed as they arrive.
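The determinism property can be illustrated with a toy construction in which every piece of random state is derived from the seed. This is a sketch under stated assumptions: the source does not say which generator the crate uses, so SplitMix64 is swapped in here, and `ToyQuantizer` and `toy_quantizer` are hypothetical names.

```rust
/// One step of SplitMix64, a small deterministic PRNG (an assumption for this
/// sketch; the crate's actual generator is not documented here).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for a quantizer whose state is a seeded sign vector.
#[derive(Debug, PartialEq)]
struct ToyQuantizer {
    dim: usize,
    bits: u8,
    signs: Vec<f32>,
}

/// Rebuilds the full state from three integers; nothing is trained or stored.
fn toy_quantizer(dim: usize, bits: u8, seed: u64) -> ToyQuantizer {
    let mut state = seed;
    let signs = (0..dim)
        .map(|_| if splitmix64(&mut state) & 1 == 1 { 1.0 } else { -1.0 })
        .collect();
    ToyQuantizer { dim, bits, signs }
}
```

Because the state is a pure function of the arguments, only the integers need to be persisted alongside the compressed codes.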

§Quick Start

use turbo_quant::{TurboQuantizer, PolarQuantizer};

// Compress embeddings. A small dimension keeps this example fast;
// production embeddings are often 1536-dimensional.
let dim = 64; // use 1536 in production
let q = TurboQuantizer::new(dim, 8, 32, /* seed */ 42).unwrap();

let database_vector: Vec<f32> = vec![0.1; dim]; // your embedding here
let query_vector: Vec<f32> = vec![0.1; dim];    // your query here

// Create a compressed sidecar for the database vector.
let code = q.encode(&database_vector).unwrap();

// At query time: estimate inner product without decompressing.
let score = q.inner_product_estimate(&code, &query_vector).unwrap();

// Or just use PolarQuant for a simpler single-stage compressor.
let pq = PolarQuantizer::new(dim, 8, 42).unwrap();
let polar_code = pq.encode(&database_vector).unwrap();
let polar_score = pq.inner_product_estimate(&polar_code, &query_vector).unwrap();

§Choosing Parameters

Use case                       Recommended bits   Recommended projections
Semantic search (recall@10)    8                  dim / 4
KV cache compression           4–6                dim / 8
Maximum compression            3                  dim / 16
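The recommendations above can be written down as a small lookup so that parameters scale with the embedding dimension. This is only a sketch: `UseCase` and `recommended_params` are hypothetical names introduced for illustration and are not part of the crate's API.

```rust
use std::ops::RangeInclusive;

/// Hypothetical mirror of the table's rows; not a type exported by the crate.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UseCase {
    SemanticSearch,
    KvCache,
    MaxCompression,
}

/// Returns (recommended bits range, recommended projections) for a dimension.
/// Projections use integer division, so small dimensions round down.
fn recommended_params(use_case: UseCase, dim: usize) -> (RangeInclusive<u8>, usize) {
    match use_case {
        UseCase::SemanticSearch => (8..=8, dim / 4),
        UseCase::KvCache => (4..=6, dim / 8),
        UseCase::MaxCompression => (3..=3, dim / 16),
    }
}
```

For a 1536-dimensional embedding this gives 384, 192, and 96 projections respectively. These are starting points; the benchmark gates still decide what a given workload can tolerate.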

§References

  • TurboQuant-style two-stage polar plus residual sketch compression.
  • Polar-coordinate quantization after seeded rotation.
  • Quantized Johnson-Lindenstrauss sign-projection sketches.
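To make the sign-projection idea concrete, the following sketch stores one sign bit per Gaussian projection plus the vector's norm, then estimates an inner product against an uncompressed query. It is a textbook-style illustration, not the crate's implementation: the generator (SplitMix64 with Box-Muller), the dense matrix layout, and all function names are assumptions made for this example.

```rust
use std::f64::consts::PI;

/// SplitMix64 step, used here only as a small seeded PRNG.
fn next_u64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Standard normal sample via the Box-Muller transform.
fn next_gaussian(state: &mut u64) -> f64 {
    let u1 = ((next_u64(state) >> 11) as f64 + 0.5) / (1u64 << 53) as f64;
    let u2 = ((next_u64(state) >> 11) as f64 + 0.5) / (1u64 << 53) as f64;
    (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
}

/// Row-major `m x dim` Gaussian projection matrix derived from `seed`.
fn projection_matrix(m: usize, dim: usize, seed: u64) -> Vec<f64> {
    let mut state = seed;
    (0..m * dim).map(|_| next_gaussian(&mut state)).collect()
}

/// Applies every projection row to `v`.
fn project(matrix: &[f64], dim: usize, v: &[f64]) -> Vec<f64> {
    matrix
        .chunks(dim)
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// Compressed form: one sign bit per projection, plus the vector's norm.
fn sign_sketch(matrix: &[f64], dim: usize, x: &[f64]) -> (Vec<bool>, f64) {
    let signs = project(matrix, dim, x).iter().map(|p| *p >= 0.0).collect();
    let norm = x.iter().map(|v| v * v).sum::<f64>().sqrt();
    (signs, norm)
}

/// Estimates <x, query> as norm * sqrt(pi/2) / m * sum(sign_i * (s_i . query)).
fn estimate_inner_product(
    matrix: &[f64],
    dim: usize,
    signs: &[bool],
    norm: f64,
    query: &[f64],
) -> f64 {
    let sum: f64 = project(matrix, dim, query)
        .iter()
        .zip(signs)
        .map(|(p, &positive)| if positive { *p } else { -*p })
        .sum();
    norm * (PI / 2.0).sqrt() * sum / signs.len() as f64
}
```

The estimate is noisy, and its error shrinks roughly with the square root of the number of projections, which is why the projection count is a tunable parameter.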

Re-exports§

pub use baseline::ByteAccountingV1;
pub use codebook::ScalarCodebook;
pub use error::Result;
pub use error::TurboQuantError;
pub use eval::BenchmarkComparisonV1;
pub use eval::BenchmarkCorpus;
pub use eval::BenchmarkReceiptV1;
pub use eval::CompressionEvalV1;
pub use index::ScoredCandidate;
pub use index::SearchOptions;
pub use index::SearchReceiptV1;
pub use index::TurboSidecarEntry;
pub use index::TurboSidecarIndex;
pub use kv::AttentionScale;
pub use kv::AttentionScoreOptions;
pub use kv::CompressedToken;
pub use kv::KvCacheCompressor;
pub use kv::KvCacheConfig;
pub use kv::KvMemoryReportV1;
pub use kv::KvQuantPolicy;
pub use kv::KvRuntimeConfig;
pub use kv::KvShadowScore;
pub use kv::KvShadowToken;
pub use packed::PackedPolarCode;
pub use packed::PackedQjlSketch;
pub use packed::PackedTurboCode;
pub use polar::PolarCode;
pub use polar::PolarProjectedQuery;
pub use polar::PolarQuantizer;
pub use profile::CodecProfileV1;
pub use profile::CompressionPolicyV1;
pub use profile::CompressionReceiptV1;
pub use profile::ValidationState;
pub use qjl::QjlProjectedQuery;
pub use qjl::QjlQuantizer;
pub use qjl::QjlSketch;
pub use qjl::QjlSketchProvenanceV1;
pub use radius::CompressedRadiiV1;
pub use radius::RadiusCodecProfileV1;
pub use rotation::FastHadamardRotation;
pub use rotation::Rotation;
pub use rotation::RotationBackend;
pub use rotation::RotationKind;
pub use rotation::StoredRotation;
pub use turbo::BatchStats;
pub use turbo::TurboCode;
pub use turbo::TurboMode;
pub use turbo::TurboProjectedQuery;
pub use turbo::TurboQuantizer;
pub use wire::TurboCodeWireHeader;
pub use wire::TurboCodeWireV1;
pub use wire::TURBO_CODE_WIRE_MAGIC;

Modules§

baseline
Byte accounting helpers for sidecar and shadow-mode receipts.
bitpack
Bitpacking helpers for production codec payloads.
codebook
Small scalar codebook utilities used by quantizer profiles and tests.
error
eval
Evaluation and benchmark receipt data structures.
index
Caller-owned-ID sidecar candidate index.
kv
KV cache compression for transformer attention.
packed
Explicit packed sidecar payloads.
polar
PolarQuant: high-efficiency vector compression via polar coordinate encoding.
profile
Stable codec profiles, compression policies, and receipts.
qjl
Quantized Johnson-Lindenstrauss (QJL) transform for approximate inner product estimation.
radius
Radius compression profiles for packed sidecar payloads.
rotation
Random rotation matrices for whitening high-dimensional vectors before quantization.
turbo
TurboQuant: profile-selected vector compression using PolarQuant with optional QJL.
wire
Deterministic compact wire encoding for TurboCode.
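As background for the bitpack and packed modules, the sketch below shows the general idea of packing fixed-width codes into bytes so that a 3-bit code costs 3 bits rather than a whole byte. The LSB-first layout and the names `pack_bits` and `unpack_bits` are assumptions for illustration; the crate's actual payload layout is not described here.

```rust
/// Packs `codes` (each below 2^bits, with 1 <= bits <= 8) LSB-first into bytes.
fn pack_bits(codes: &[u8], bits: u32) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let width = bits as usize;
    let mut out = vec![0u8; (codes.len() * width + 7) / 8];
    for (i, &code) in codes.iter().enumerate() {
        assert!((code as u16) < (1u16 << bits), "code does not fit in `bits`");
        for b in 0..width {
            if (code >> b) & 1 == 1 {
                let pos = i * width + b; // absolute bit position in the stream
                out[pos / 8] |= 1 << (pos % 8);
            }
        }
    }
    out
}

/// Inverse of `pack_bits`: reads `count` codes of `bits` width each.
fn unpack_bits(packed: &[u8], bits: u32, count: usize) -> Vec<u8> {
    let width = bits as usize;
    (0..count)
        .map(|i| {
            let mut code = 0u8;
            for b in 0..width {
                let pos = i * width + b;
                if (packed[pos / 8] >> (pos % 8)) & 1 == 1 {
                    code |= 1 << b;
                }
            }
            code
        })
        .collect()
}
```

Five 3-bit codes occupy 15 bits and therefore fit in two bytes, instead of five bytes unpacked. The element count must be stored separately, since trailing padding bits are indistinguishable from zero codes.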