turbo-quant
turbo-quant is an experimental Rust crate for derived vector-compression sidecars inspired by TurboQuant, PolarQuant, and Quantized Johnson-Lindenstrauss (QJL) style sketches.
It is designed for systems that keep canonical vectors elsewhere and use compact sidecars for candidate generation, memory accounting, compression experiments, and exact-rerank workflows. It is not a canonical vector store or a replacement for exact vectors, and its approximate scores are not ground truth.
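The memory-accounting use case comes down to simple arithmetic. The sketch below compares a canonical f32 vector with a hypothetical 1-bit sidecar that packs one sign bit per projection and keeps one f32 norm. The layout and function names are assumptions for illustration. They are not the crate's wire format or its `CompressionReceiptV1` fields.

```rust
/// Bytes needed to store a canonical f32 vector of dimension `dim`.
fn f32_vector_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes for a hypothetical 1-bit sketch: `projections` sign bits packed
/// eight per byte, plus one f32 norm stored alongside the signs.
fn sign_sketch_bytes(projections: usize) -> usize {
    (projections + 7) / 8 + std::mem::size_of::<f32>()
}

/// How many times smaller the sidecar is than the canonical vector.
fn compression_ratio(dim: usize, projections: usize) -> f64 {
    f32_vector_bytes(dim) as f64 / sign_sketch_bytes(projections) as f64
}
```

Under these assumptions, a 768-dimensional embedding takes 3,072 bytes and a 256-projection sketch takes 36 bytes of sidecar, about 85x smaller. The canonical vector still has to be stored elsewhere.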
What this crate is
- A deterministic vector sidecar codec for embedding/search experiments.
- A PolarQuant-style compressor with optional QJL residual sketches.
- A compact sidecar index that returns approximate candidates plus an explicit search receipt.
- A KV-cache shadow-mode experiment surface for measuring compressed key/value behavior.
- A source-compatible update to the original 0.1.x API surface.
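The PolarQuant-style idea can be made concrete with a deliberately simplified, hypothetical encoder. It pairs adjacent coordinates, keeps each pair's radius as an f32, and quantizes only the angle to `bits` bits. This is not the crate's `PolarQuantizer`. Published PolarQuant also applies random preconditioning and a recursive polar transform, and this toy omits both.

```rust
use std::f32::consts::PI;

/// Hypothetical simplified polar code: one f32 radius and one `bits`-bit
/// angle index per coordinate pair.
struct ToyPolarCode {
    bits: u32,
    radii: Vec<f32>,
    angle_indices: Vec<u32>,
}

fn polar_encode(x: &[f32], bits: u32) -> ToyPolarCode {
    assert!(x.len() % 2 == 0, "toy encoder expects an even dimension");
    let bins = 1u32 << bits;
    let width = 2.0 * PI / bins as f32;
    let mut radii = Vec::with_capacity(x.len() / 2);
    let mut angle_indices = Vec::with_capacity(x.len() / 2);
    for pair in x.chunks_exact(2) {
        radii.push(pair[0].hypot(pair[1]));
        // atan2 yields an angle in [-pi, pi]; shift it to [0, 2pi] and bin it.
        let theta = pair[1].atan2(pair[0]) + PI;
        angle_indices.push(((theta / width) as u32).min(bins - 1));
    }
    ToyPolarCode { bits, radii, angle_indices }
}

fn polar_decode(code: &ToyPolarCode) -> Vec<f32> {
    let width = 2.0 * PI / (1u32 << code.bits) as f32;
    let mut out = Vec::with_capacity(code.radii.len() * 2);
    for (&r, &idx) in code.radii.iter().zip(&code.angle_indices) {
        // Reconstruct at the bin centre, undoing the +pi shift.
        let theta = (idx as f32 + 0.5) * width - PI;
        out.push(r * theta.cos());
        out.push(r * theta.sin());
    }
    out
}
```

Each pair's radius is kept exactly, so the error for that pair is at most the radius times half an angle bin. The total reconstruction error is therefore at most (π / 2^bits)·‖x‖.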
What this crate is not
- It is not a canonical vector store.
- It is not a replacement for exact vectors in correctness-sensitive retrieval.
- It is not a reversible compression library.
- It does not guarantee quality for every corpus, model, embedding distribution, or KV-cache workload.
- It should not be promoted into a production retrieval path without local benchmark gates and exact fallback.
The safe integration pattern is:
canonical vectors / raw KV state
+
derived turbo-quant sidecars
-> approximate candidate generation
-> exact rerank / exact fallback
-> measured promotion decision
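The final step, a measured promotion decision, can be sketched as a recall gate. On benchmark queries, compare the approximate top-k against the exact top-k, and promote the sidecar only if mean recall clears a locally chosen threshold. The function names and the `Promotion` enum are hypothetical, and recall@k is only one possible metric.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k ids that the approximate top-k also found.
fn recall_at_k(exact_top_k: &[usize], approx_top_k: &[usize]) -> f64 {
    if exact_top_k.is_empty() {
        return 1.0;
    }
    let approx: HashSet<usize> = approx_top_k.iter().copied().collect();
    let hits = exact_top_k.iter().filter(|&&id| approx.contains(&id)).count();
    hits as f64 / exact_top_k.len() as f64
}

#[derive(Debug, PartialEq)]
enum Promotion {
    Promote,
    KeepExact,
}

/// Each entry pairs (exact top-k, approximate top-k) for one benchmark query.
/// Promote only if mean recall meets the threshold; no evidence means no promotion.
fn promotion_decision(per_query: &[(Vec<usize>, Vec<usize>)], min_mean_recall: f64) -> Promotion {
    if per_query.is_empty() {
        return Promotion::KeepExact;
    }
    let mean = per_query
        .iter()
        .map(|(exact, approx)| recall_at_k(exact, approx))
        .sum::<f64>()
        / per_query.len() as f64;
    if mean >= min_mean_recall {
        Promotion::Promote
    } else {
        Promotion::KeepExact
    }
}
```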
Installation
Add the crate to Cargo.toml:
[dependencies]
turbo-quant = "0.2"
Minimum supported Rust version: 1.75.
Quick start: encode and score one vector
use turbo_quant::TurboQuantizer;
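Encoding and scoring can be illustrated without the crate's API. The hypothetical sketch below stores the sign bits of seeded Gaussian projections plus the vector norm, which is a QJL-style sketch. It then estimates the inner product with an unquantized query. None of these names come from turbo-quant. The seed makes the projection matrix deterministic, and no external crates are used.

```rust
/// SplitMix64: a small deterministic PRNG so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform in (0, 1], which avoids ln(0) in Box-Muller.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical QJL-style sketch: sign bits of projections plus the norm.
struct SignSketch {
    signs: Vec<bool>,
    norm: f64,
}

/// `projections` rows of i.i.d. N(0, 1) entries, fully determined by `seed`.
fn projection_matrix(dim: usize, projections: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = SplitMix64(seed);
    (0..projections)
        .map(|_| (0..dim).map(|_| rng.next_gaussian()).collect::<Vec<f64>>())
        .collect()
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn encode(x: &[f64], s: &[Vec<f64>]) -> SignSketch {
    SignSketch {
        signs: s.iter().map(|row| dot(row, x) >= 0.0).collect(),
        norm: dot(x, x).sqrt(),
    }
}

/// Asymmetric estimate of <q, x>: the query stays exact, only x is sketched.
fn estimate_dot(q: &[f64], code: &SignSketch, s: &[Vec<f64>]) -> f64 {
    let acc: f64 = s
        .iter()
        .zip(&code.signs)
        .map(|(row, &pos)| if pos { dot(row, q) } else { -dot(row, q) })
        .sum();
    (std::f64::consts::PI / 2.0).sqrt() * code.norm * acc / s.len() as f64
}
```

Keeping the query exact and sketching only the stored vector is what makes this estimator unbiased. For a Gaussian row s, E[⟨s, q⟩·sign(⟨s, x⟩)] = √(2/π)·⟨q, x⟩ / ‖x‖, and the √(π/2)·‖x‖ factor undoes that scaling. The estimate is still noisy, with variance shrinking as 1/m in the number of projections m.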
Sidecar candidate search
TurboSidecarIndex is intentionally a sidecar index. It returns approximate candidates and a receipt that declares exact rerank is required.
use ;
After candidate generation, rerank candidates against caller-owned exact vectors or a trusted exact scorer.
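The two-stage flow works as follows. Approximate scores pick candidates and produce a receipt flagging that exact rerank is required. Caller-owned exact vectors then rescore those candidates. `ToySearchReceipt` and the helper functions are hypothetical stand-ins, not `TurboSidecarIndex` or `SearchReceiptV1`.

```rust
/// Hypothetical receipt: candidates are approximate-only until reranked.
#[derive(Debug)]
struct ToySearchReceipt {
    scanned: usize,
    candidates: usize,
    exact_rerank_required: bool,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Indices of the `k` highest scores, best first.
fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..scores.len()).collect();
    ids.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    ids.truncate(k);
    ids
}

/// Stage 1: approximate candidates from caller-supplied sidecar scores.
fn candidates(approx_scores: &[f32], k: usize) -> (Vec<usize>, ToySearchReceipt) {
    let ids = top_k(approx_scores, k);
    let receipt = ToySearchReceipt {
        scanned: approx_scores.len(),
        candidates: ids.len(),
        exact_rerank_required: true,
    };
    (ids, receipt)
}

/// Stage 2: rescore candidates against caller-owned exact vectors.
fn exact_rerank(query: &[f32], exact: &[Vec<f32>], candidate_ids: &[usize], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = candidate_ids
        .iter()
        .map(|&id| (id, dot(query, &exact[id])))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Over-fetching in stage 1, by asking for more candidates than the final `k`, gives the exact rerank room to correct ordering mistakes made by the sidecar.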
KV-cache shadow mode
KV-cache compression is exposed as an experiment surface. Keep exact shadows while measuring quality before promotion.
use ;
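A shadow-mode step might look like the sketch below. Exact attention weights are computed and returned. A compressed copy of the keys is scored alongside them, but only to measure logit error, probability error, and top-1 agreement. The compressor here is plain per-key uniform scalar quantization, used as a hypothetical stand-in. It is not PolarQuant, QJL, or the crate's `KvCacheCompressor`, and values are left uncompressed.

```rust
/// Hypothetical stand-in compressor: uniform scalar quantization of one key
/// to `bits` bits over that key's own [min, max] range, then dequantization.
fn quantize_dequantize(key: &[f32], bits: u32) -> Vec<f32> {
    let (lo, hi) = key
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &v| (lo.min(v), hi.max(v)));
    let levels = ((1u32 << bits) - 1) as f32;
    let step = if hi > lo { (hi - lo) / levels } else { 1.0 };
    key.iter().map(|&v| lo + ((v - lo) / step).round() * step).collect()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn softmax(logits: &[f32]) -> Vec<f32> {
    let m = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - m).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn argmax(v: &[f32]) -> usize {
    v.iter().enumerate().max_by(|a, b| a.1.total_cmp(b.1)).map(|(i, _)| i).unwrap_or(0)
}

fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0f32, f32::max)
}

struct ShadowReport {
    max_logit_error: f32,
    max_prob_error: f32,
    top1_agrees: bool,
}

/// The exact attention weights are returned and served; the compressed
/// path is only measured.
fn shadow_attention(query: &[f32], keys: &[Vec<f32>], bits: u32) -> (Vec<f32>, ShadowReport) {
    let scale = 1.0 / (query.len() as f32).sqrt();
    let exact_logits: Vec<f32> = keys.iter().map(|k| dot(query, k) * scale).collect();
    let approx_logits: Vec<f32> = keys
        .iter()
        .map(|k| dot(query, &quantize_dequantize(k, bits)) * scale)
        .collect();
    let exact_p = softmax(&exact_logits);
    let approx_p = softmax(&approx_logits);
    let report = ShadowReport {
        max_logit_error: max_abs_diff(&exact_logits, &approx_logits),
        max_prob_error: max_abs_diff(&exact_p, &approx_p),
        top1_agrees: argmax(&exact_logits) == argmax(&approx_logits),
    };
    (exact_p, report)
}
```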
API compatibility
0.2.x preserves the original public compatibility surface from 0.1.x while adding new sidecar, wire, receipt, and runtime-policy APIs.
Compatibility-preserved examples include:
- PolarCode { dim, bits, radii, angle_indices }
- QjlSketch { dim, projections, signs }
- TurboCode { polar_code, residual_sketch }
- KvCacheConfig { head_dim, bits, projections, seed }
- CompressedToken { compressed_key, compressed_value }
- legacy constructors such as PolarQuantizer::new, QjlQuantizer::new, TurboQuantizer::new, and KvCacheCompressor::new
The new APIs are additive and should be treated as the preferred integration surface for measured sidecar workflows.
Release honesty
Compressed codes are derived artifacts. Any quality-sensitive use should preserve:
- the source vector or exact KV state,
- the codec profile,
- benchmark receipts for the target workload,
- exact rerank or exact fallback, and
- clear degradation behavior when approximation is insufficient.
Do not treat approximate scores as ground truth.
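The checklist above implies a routing rule for degradation. Use the sidecar only when there is benchmark evidence for the same codec profile and workload, and otherwise fall back to exact scoring. The receipt struct, profile names, and threshold in this sketch are hypothetical.

```rust
/// Hypothetical benchmark evidence for one codec profile on one workload.
struct BenchmarkReceipt {
    codec_profile: String,
    workload: String,
    recall_at_k: f64,
}

#[derive(Debug, PartialEq)]
enum ScoringPath {
    /// The sidecar generates candidates; exact vectors still rerank them.
    ApproximateThenRerank,
    /// No usable evidence: score directly against exact vectors.
    ExactOnly,
}

/// Route to exact scoring unless a receipt for this exact profile and
/// workload shows recall at or above the gate.
fn choose_path(receipts: &[BenchmarkReceipt], profile: &str, workload: &str, min_recall: f64) -> ScoringPath {
    let evidence = receipts.iter().any(|r| {
        r.codec_profile == profile && r.workload == workload && r.recall_at_k >= min_recall
    });
    if evidence {
        ScoringPath::ApproximateThenRerank
    } else {
        ScoringPath::ExactOnly
    }
}
```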
Testing before release
The release gate for this crate is intentionally strict:
This repository includes release helper scripts under scripts/ that run these gates, validate this README, validate the crates.io package scope, and write local release receipts.
Feature and design notes
- Rotation/profile selection is deterministic from explicit parameters.
- Packed/wire representations are acceleration artifacts, not truth-bearing storage.
- SearchReceiptV1 makes approximate-only candidate generation explicit.
- CompressionReceiptV1 records byte accounting and warnings for derived sidecars.
- KV-cache support is shadow-mode-first and should remain benchmark-gated.
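One way to make a rotation deterministic from explicit parameters is a seeded randomized Hadamard transform. First flip signs using a PRNG seeded by the caller, then apply a normalized fast Walsh-Hadamard transform. The sketch below is a hypothetical illustration of that technique, not the crate's rotation or profile logic. The transform is orthogonal, so it preserves norms and inner products, and the same seed always yields the same output.

```rust
/// One SplitMix64 step, used to derive a pseudo-random sign per coordinate.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical seeded rotation: random sign flips followed by a normalized
/// fast Walsh-Hadamard transform.
fn seeded_rotate(x: &[f32], seed: u64) -> Vec<f32> {
    let n = x.len();
    assert!(n.is_power_of_two(), "toy rotation expects a power-of-two dimension");
    let mut state = seed;
    let mut v: Vec<f32> = x
        .iter()
        .map(|&xi| if (splitmix64(&mut state) & 1) == 1 { -xi } else { xi })
        .collect();
    // In-place fast Walsh-Hadamard transform (unnormalized butterflies).
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (v[i], v[i + h]);
                v[i] = a + b;
                v[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Scaling by 1/sqrt(n) makes the transform orthogonal.
    let norm = 1.0 / (n as f32).sqrt();
    v.iter_mut().for_each(|e| *e *= norm);
    v
}
```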
License
MIT