scr-runtime-compression
Runtime integration adapter for semantic-memory's compression
layer.
scr-runtime-compression is the runtime adapter that lets
semantic-memory use turbo-quant and fib-quant without
taking a hard dependency on either. The two key types are:
- CompressedSearchPath: a search path that uses a compressed candidate index (from turbo-quant or fib-quant) followed by exact rerank against the raw vectors. This is the production-mode path for memory recall with a compressible corpus.
- ExactFallbackAdapter: a typed wrapper that takes any compressed representation and a raw-fallback, and always returns the raw result. The contract: the adapter emits a FallbackReceiptV1 on every call, so the audit trail captures which path served the request.
The crate is alpha. The runtime adapter works, but the GPU path is gated off by default because the per-call host-to-device and device-to-host (H2D/D2H) transfer overhead negates the kernel speedup at the current call granularity.
What's in the box
CompressedSearchPath
The oversample factor is the key control: higher oversample
gives better recall at the cost of more rerank work. The
default is 4 (matches the turbo-quant smoke benchmark setup).
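The two-stage flow described above (compressed candidates, then exact rerank) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `quantize` is a crude rounding stand-in rather than the turbo-quant or fib-quant codec, and `top_n` and `compressed_search` are hypothetical helpers, not the crate's API.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Stand-in for a lossy codec: round each component to the nearest multiple
/// of `step`. Hypothetical; not the turbo-quant or fib-quant scheme.
fn quantize(v: &[f32], step: f32) -> Vec<f32> {
    v.iter().map(|x| (x / step).round() * step).collect()
}

/// Indices of the `n` vectors in `corpus` closest to `query`, nearest first.
fn top_n(corpus: &[Vec<f32>], query: &[f32], n: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, sq_dist(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(n).map(|(i, _)| i).collect()
}

/// Two-stage search: take `k * oversample` candidates from the compressed
/// vectors, then rerank only those candidates against the raw vectors.
fn compressed_search(
    raw: &[Vec<f32>],
    compressed: &[Vec<f32>],
    query: &[f32],
    k: usize,
    oversample: usize,
) -> Vec<usize> {
    let candidates = top_n(compressed, query, k * oversample);
    let mut reranked: Vec<(usize, f32)> = candidates
        .into_iter()
        .map(|i| (i, sq_dist(&raw[i], query)))
        .collect();
    reranked.sort_by(|a, b| a.1.total_cmp(&b.1));
    reranked.into_iter().take(k).map(|(i, _)| i).collect()
}
```

With raw vectors `[0.4]`, `[1.4]`, `[5.0]`, a rounding step of 1.0, and the query `[0.8]`, the compressed stage alone ranks index 1 first because quantization error reorders the two nearest vectors. An oversample of 2 brings index 0 into the candidate set, and the exact rerank then restores it as the true nearest neighbor. The rerank stage scores `k * oversample` raw vectors, which is where the extra cost of a higher oversample comes from.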
ExactFallbackAdapter
The adapter's contract:
- If the compressed index is admissible for the query (size, accuracy, latency), it serves the result from the compressed index and emits a FallbackReceiptV1 { path: "compressed" }.
- If the compressed index is not admissible (e.g. caller asked for Admissibility::Exact), it serves from the raw store and emits a FallbackReceiptV1 { path: "raw" }.
- Every call emits exactly one receipt. The audit trail records the path taken.
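The contract can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: the two-variant `Admissibility` enum, the single-field receipt, the closure-based backends, and the `search` method are all hypothetical shapes chosen to keep the example self-contained. The admissibility check is reduced to the caller's request, whereas the real check also weighs size, accuracy, and latency.

```rust
/// Hypothetical admissibility levels; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Admissibility {
    Exact,
    Approximate,
}

/// Hypothetical receipt; the real FallbackReceiptV1 likely carries more
/// fields (for example a timestamp).
#[derive(Debug, Clone, PartialEq, Eq)]
struct FallbackReceiptV1 {
    path: &'static str,
}

/// Wraps a compressed backend and a raw backend, both modeled as closures
/// that map a query vector to result ids (an assumption for this sketch).
struct ExactFallbackAdapter<C, R> {
    compressed: C,
    raw: R,
    receipts: Vec<FallbackReceiptV1>,
}

impl<C, R> ExactFallbackAdapter<C, R>
where
    C: Fn(&[f32]) -> Vec<usize>,
    R: Fn(&[f32]) -> Vec<usize>,
{
    fn new(compressed: C, raw: R) -> Self {
        Self { compressed, raw, receipts: Vec::new() }
    }

    /// Route the query, then record exactly one receipt naming the path
    /// that actually served it.
    fn search(&mut self, query: &[f32], admissibility: Admissibility) -> Vec<usize> {
        let (path, result) = match admissibility {
            Admissibility::Approximate => ("compressed", (self.compressed)(query)),
            Admissibility::Exact => ("raw", (self.raw)(query)),
        };
        self.receipts.push(FallbackReceiptV1 { path });
        result
    }
}
```

Because the receipt is pushed in the same function that picks the backend, and only once per call, the recorded path cannot drift from the path taken. That is the property the exact-fallback contract tests assert.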
Feature flags
| Feature | Default | What it enables |
|---|---|---|
| turbo | yes | turbo-quant codec adapter |
| fib | yes | fib-quant codec adapter |
| polar | yes | Polar-only compression (asymmetric) |
| qjl | yes | QJL sketches for residual recovery |
| gpu | no | GPU dispatch via gpu-backend |
The default is ["turbo", "fib", "polar", "qjl"] — all four
codecs available, no GPU (because the GPU path is slower in
integration at this time).
Quick Start
use ;
use ;
use ;
Run it: cargo run --example basic_search (see examples/).
Test coverage
- Exact-fallback contract tests: every adapter call emits exactly one receipt, and the path recorded matches the actual path taken.
- Oversample sweep tests: k=10 with oversample = 1, 4, 16, 64; assert recall@10 and rerank-cost.
- Admissibility routing: caller says Admissibility::Exact, the adapter routes to raw; caller says Admissibility::Approximate, the adapter routes to compressed.
- cargo test --all-features clean.
- cargo clippy --all-targets -- -D warnings clean.
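For readers unfamiliar with the recall@10 metric used in the oversample sweep, here is a minimal sketch of how recall@k is commonly computed. The function name and signature are hypothetical and are not taken from the crate's test suite. It assumes both inputs are ranked id lists with no duplicate ids.

```rust
/// Fraction of the exact top-k ids that also appear in the approximate
/// top-k. Both slices are assumed to be ranked best-first with unique ids.
fn recall_at_k(exact: &[usize], approx: &[usize], k: usize) -> f64 {
    // Ground truth is the first k exact results (fewer if the list is short).
    let truth = &exact[..k.min(exact.len())];
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

A sweep test in this style would run the compressed path at each oversample value, compare its top 10 against the exact top 10, and assert that the recall clears a chosen threshold.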
MSRV
Rust 1.75 (2021 edition). Stable features only.
Dependencies
- bytemuck (with derive): for safe zero-copy codec output.
- serde (with derive).
- serde_json.
- thiserror.
- chrono (for receipt timestamps).
- quant-governor: for the policy routing layer.
- turbo-quant (optional): for the turbo feature.
- fib-quant (optional): for the fib feature.
License
MIT. See LICENSE-MIT for the full text.
Changelog
See CHANGELOG.md for the release history.
Where it's used
scr-runtime-compression is the integration layer for:
- semantic-memory: every recall over a corpus with Admissibility::Standard or below routes through CompressedSearchPath.
- The quant-governor policy engine: when the policy routes to a compressed codec, the ExactFallbackAdapter is the one that actually executes the call.
Any system that wants to add governed compression to an
existing search path can adopt scr-runtime-compression
directly.