scr-runtime-compression 0.1.0

Runtime integration adapter for the semantic-memory compression layer: CompressedSearchPath and ExactFallbackAdapter delegate to turbo-quant/fib-quant

scr-runtime-compression

Runtime integration adapter for semantic-memory's compression layer.

scr-runtime-compression is the runtime adapter that lets semantic-memory use turbo-quant and fib-quant without taking a hard dependency on either. The two key types are:

  • CompressedSearchPath: a search path that retrieves candidates from a compressed index (from turbo-quant or fib-quant) and then reranks them exactly against the raw vectors. This is the production-mode path for memory recall with a compressible corpus.
  • ExactFallbackAdapter: a typed wrapper around a compressed representation and a raw fallback. It serves from the compressed index when that index is admissible for the request and from the raw store otherwise. The contract: the adapter emits a FallbackReceiptV1 on every call, so the audit trail captures which path served the request.

The crate is alpha. The runtime adapter works, but the GPU path is gated off by default because the per-call host-to-device and device-to-host (H2D/D2H) transfer overhead negates the kernel speedup at the current call granularity.

What's in the box

CompressedSearchPath

pub struct CompressedSearchPath {
    compressed_index: Box<dyn CompressedIndex>,
    raw_corpus: Vec<Vec<f32>>,
    profile_digest: CodecProfileDigest,
}

impl CompressedSearchPath {
    pub fn search(&self, query: &[f32], k: usize) -> Result<SearchResult, CompressionError> {
        // 1. Get top-k * oversample candidates from compressed index
        // 2. Exact rerank on raw_corpus
        // 3. Return reranked top-k with a Receipt
    }
}

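The oversample factor is the key control: the compressed index returns k * oversample candidates, so a higher oversample gives better recall at the cost of more rerank work. The default is 4 (matching the turbo-quant smoke benchmark setup), which means a k = 10 search reranks 40 candidates.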
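
The candidate-then-rerank flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the crate's implementation: an 8-bit scalar quantizer stands in for the turbo-quant or fib-quant index, scoring is by dot product, and all names (quantize, approx_score, exact_score, top_n, search_with_rerank) are hypothetical.

```rust
/// Hypothetical stand-in for a compressed codec: map each component
/// from [-1.0, 1.0] to an 8-bit code.
pub fn quantize(v: &[f32]) -> Vec<i8> {
    v.iter()
        .map(|x| (x.clamp(-1.0, 1.0) * 127.0).round() as i8)
        .collect()
}

/// Approximate dot product computed against the 8-bit codes.
pub fn approx_score(query: &[f32], code: &[i8]) -> f32 {
    query
        .iter()
        .zip(code)
        .map(|(q, c)| q * (*c as f32 / 127.0))
        .sum()
}

/// Exact dot product computed against the raw vector.
pub fn exact_score(query: &[f32], v: &[f32]) -> f32 {
    query.iter().zip(v).map(|(q, x)| q * x).sum()
}

/// Indices of the `n` highest-scoring entries, best first.
/// The sort is stable, so ties keep their original order.
pub fn top_n(mut scored: Vec<(usize, f32)>, n: usize) -> Vec<usize> {
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(n).map(|(i, _)| i).collect()
}

/// Step 1: take k * oversample candidates from the compressed codes.
/// Step 2: rerank those candidates exactly against the raw vectors.
/// Step 3: return the top k after reranking (receipt omitted here).
pub fn search_with_rerank(
    codes: &[Vec<i8>],
    raw: &[Vec<f32>],
    query: &[f32],
    k: usize,
    oversample: usize,
) -> Vec<usize> {
    let n_candidates = k.saturating_mul(oversample).min(codes.len());
    let approx: Vec<(usize, f32)> = codes
        .iter()
        .enumerate()
        .map(|(i, c)| (i, approx_score(query, c)))
        .collect();
    let candidates = top_n(approx, n_candidates);

    let exact: Vec<(usize, f32)> = candidates
        .into_iter()
        .map(|i| (i, exact_score(query, &raw[i])))
        .collect();
    top_n(exact, k)
}
```

In this toy setup the raw vectors [0.500] and [0.503] quantize to the same code. With query [1.0], k = 1, and oversample = 1, the compressed pass alone decides and returns index 0. With oversample = 2, both candidates reach the exact rerank, which returns index 1, the true nearest.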

ExactFallbackAdapter

pub struct ExactFallbackAdapter<C: CompressedIndex, R: RawStore> {
    compressed: C,
    raw: R,
    // Emits FallbackReceiptV1 on every call
}

The adapter's contract:

  • If the compressed index is admissible for the query (size, accuracy, latency), it serves the result from the compressed index and emits a FallbackReceiptV1 { path: "compressed" }.
  • If the compressed index is not admissible (e.g. caller asked for Admissibility::Exact), it serves from the raw store and emits a FallbackReceiptV1 { path: "raw" }.
  • Every call emits exactly one receipt. The audit trail records the path taken.
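
The contract above can be illustrated with a small routing sketch. Everything here is an assumption made for illustration: the trait methods, the two-variant Admissibility enum, the single-field receipt, and the FixedBackend test double are hypothetical, and the real crate's types may carry more fields (for example a timestamp). Returning the receipt together with the result is one way to guarantee exactly one receipt per call by construction.

```rust
/// Hypothetical request-level admissibility (the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Admissibility {
    Exact,
    Approximate,
}

/// Hypothetical single-field receipt recording which path served the call.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FallbackReceiptV1 {
    pub path: &'static str,
}

/// Assumed trait shapes; the crate's actual traits are not shown in this README.
pub trait CompressedIndex {
    fn search(&self, query: &[f32], k: usize) -> Vec<usize>;
}
pub trait RawStore {
    fn search(&self, query: &[f32], k: usize) -> Vec<usize>;
}

pub struct ExactFallbackAdapter<C: CompressedIndex, R: RawStore> {
    compressed: C,
    raw: R,
}

impl<C: CompressedIndex, R: RawStore> ExactFallbackAdapter<C, R> {
    pub fn new(compressed: C, raw: R) -> Self {
        Self { compressed, raw }
    }

    /// Routes the call and returns the result with exactly one receipt.
    pub fn search(
        &self,
        query: &[f32],
        k: usize,
        admissibility: Admissibility,
    ) -> (Vec<usize>, FallbackReceiptV1) {
        match admissibility {
            Admissibility::Approximate => (
                self.compressed.search(query, k),
                FallbackReceiptV1 { path: "compressed" },
            ),
            Admissibility::Exact => (
                self.raw.search(query, k),
                FallbackReceiptV1 { path: "raw" },
            ),
        }
    }
}

/// Test double that returns a fixed ranking, usable as either backend.
pub struct FixedBackend(pub Vec<usize>);

impl CompressedIndex for FixedBackend {
    fn search(&self, _query: &[f32], k: usize) -> Vec<usize> {
        self.0.iter().take(k).copied().collect()
    }
}
impl RawStore for FixedBackend {
    fn search(&self, _query: &[f32], k: usize) -> Vec<usize> {
        self.0.iter().take(k).copied().collect()
    }
}
```

Because the receipt is part of the return value, a caller cannot obtain a result without also receiving the record of which path produced it.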

Feature flags

Feature   Default   What it enables
turbo     yes       turbo-quant codec adapter
fib       yes       fib-quant codec adapter
polar     yes       Polar-only compression (asymmetric)
qjl       yes       QJL sketches for residual recovery
gpu       no        GPU dispatch via gpu-backend

The default is ["turbo", "fib", "polar", "qjl"] — all four codecs available, no GPU (because the GPU path is slower in integration at this time).
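
Inside Rust code, Cargo features surface as cfg predicates. As an illustration only, the sketch below uses the standard cfg! macro to report which of the flags in the table were compiled into a build. The function enabled_features is a hypothetical helper, not part of the crate's API.

```rust
/// Hypothetical helper: lists which of the documented feature flags
/// were enabled at compile time. `cfg!` evaluates to a constant bool.
pub fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "turbo") {
        enabled.push("turbo");
    }
    if cfg!(feature = "fib") {
        enabled.push("fib");
    }
    if cfg!(feature = "polar") {
        enabled.push("polar");
    }
    if cfg!(feature = "qjl") {
        enabled.push("qjl");
    }
    if cfg!(feature = "gpu") {
        enabled.push("gpu");
    }
    enabled
}
```

A helper like this can be useful in diagnostics or receipts to confirm that a deployment was built without the gpu feature.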

Quick Start

use scr_runtime_compression::{CompressedSearchPath, ExactFallbackAdapter};
use turbo_quant::{TurboSidecarCode, TurboSidecarIndex};
use quant_codec_core::{KvTensorShape, DType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
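    // `profile`, `corpus`, `query`, and `profile_digest` are assumed to be
    // constructed earlier; their setup is omitted from this snippet.

    // Build a compressed index.
    let code = TurboSidecarCode::encode(&profile, &corpus)?;
    let index = TurboSidecarIndex::build(&profile, code)?;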

    // Wrap it in the search path.
    let path = CompressedSearchPath::new(index, corpus.clone(), profile_digest);

    // Search.
    let result = path.search(&query, 10)?;
    assert_eq!(result.results.len(), 10);
    println!("Receipt: {:?}", result.receipt);
    Ok(())
}

Run it: cargo run --example basic_search (see examples/).

Test coverage

  • Exact-fallback contract tests — every adapter call emits exactly one receipt, and the path recorded matches the actual path taken.
  • Oversample sweep tests — k=10 with oversample = 1, 4, 16, 64; assert recall@10 and rerank-cost.
  • Admissibility routing — caller says Admissibility::Exact, the adapter routes to raw; caller says Admissibility::Approximate, the adapter routes to compressed.
  • cargo test --all-features passes.
  • cargo clippy --all-targets -- -D warnings reports no warnings.
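
For readers unfamiliar with the metrics in the oversample sweep, the following sketch shows one way to compute them. It is not taken from the crate's test suite. The function names are hypothetical, and rerank cost is assumed here to mean the number of exact distance computations, that is k * oversample capped at the corpus size.

```rust
use std::collections::HashSet;

/// recall@k: fraction of the exact top-k ids that also appear in the
/// approximate top-k. Order within the top-k is ignored.
pub fn recall_at_k(exact: &[usize], approx: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find, so nothing was missed
    }
    let got: HashSet<usize> = approx.iter().take(k).copied().collect();
    truth.intersection(&got).count() as f64 / truth.len() as f64
}

/// Assumed cost model: one exact comparison per candidate handed to rerank.
pub fn rerank_cost(k: usize, oversample: usize, corpus_len: usize) -> usize {
    k.saturating_mul(oversample).min(corpus_len)
}

/// (oversample, rerank cost) pairs for the sweep values 1, 4, 16, 64.
pub fn sweep_costs(k: usize, corpus_len: usize) -> Vec<(usize, usize)> {
    [1, 4, 16, 64]
        .iter()
        .map(|&o| (o, rerank_cost(k, o, corpus_len)))
        .collect()
}
```

Under this cost model, k = 10 over a 500-vector corpus gives rerank costs of 10, 40, 160, and 500 for the four sweep points; the last is capped by the corpus size.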

MSRV

Rust 1.75 (2021 edition). Stable features only.

Dependencies

  • bytemuck (with derive) — for safe zero-copy codec output.
  • serde (with derive).
  • serde_json.
  • thiserror.
  • chrono (for receipt timestamps).
  • quant-governor — for the policy routing layer.
  • turbo-quant (optional) — for the turbo feature.
  • fib-quant (optional) — for the fib feature.

License

MIT. See LICENSE-MIT for the full text.

Changelog

See CHANGELOG.md for the release history.

Where it's used

scr-runtime-compression is the integration layer for:

  • semantic-memory — every recall over a corpus with Admissibility::Standard or below routes through CompressedSearchPath.
  • The quant-governor policy engine — when the policy routes to a compressed codec, the ExactFallbackAdapter is the one that actually executes the call.

Any system that wants to add governed compression to an existing search path can adopt scr-runtime-compression directly.