
Crate lean_semantic_search_store


Persistent SQLite-backed semantic index implementing the retrieval Corpus seam.

This crate is the large-corpus counterpart to the in-memory inverted index in lean-semantic-search-retrieval. It owns the semantic index only: opaque-key postings, per-key fanout, the document total, and the contract DeclarationFeatureRows needed to rebuild an anchor from a corpus member. It carries no declaration display text, module or kind fields, provenance, labels, probe caches, or any duplicate-audit or proof-agent vocabulary — those stay with consumers.

Build a corpus with StoreBuilder, publishing it atomically; open it read-only with Store, which implements Corpus so retrieval ranks over a persisted index without loading it into memory. The ranking algorithm, anchor planning, policy, and output shape are unchanged: a Store is just another Corpus, and retrieve_across fans one anchor across several of them.
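
The idea that a persisted store is "just another Corpus" can be sketched with a small trait. This is only an illustration: `SketchCorpus`, `MemoryCorpus`, and `fan_across` are hypothetical names invented here, not the real `Corpus` trait or `retrieve_across` signature, and the overlap count stands in for the real ranking algorithm, which this page does not describe.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the retrieval seam (NOT the crate's `Corpus`):
/// anything that can list the documents posted under an opaque key.
trait SketchCorpus {
    fn postings(&self, key: u64) -> Vec<u32>;
}

/// An in-memory implementation. A disk-backed store would implement the same
/// trait by querying its index instead of a map.
struct MemoryCorpus {
    postings: HashMap<u64, Vec<u32>>,
}

impl SketchCorpus for MemoryCorpus {
    fn postings(&self, key: u64) -> Vec<u32> {
        self.postings.get(&key).cloned().unwrap_or_default()
    }
}

/// Fan one anchor (a set of opaque keys) across several corpora.
/// Returns `(corpus_index, document, shared_key_count)` triples.
/// Assumes each posting list holds a document at most once.
fn fan_across(corpora: &[&dyn SketchCorpus], anchor_keys: &[u64]) -> Vec<(usize, u32, usize)> {
    let mut hits = Vec::new();
    for (corpus_index, corpus) in corpora.iter().enumerate() {
        let mut overlap: HashMap<u32, usize> = HashMap::new();
        for key in anchor_keys {
            for doc in corpus.postings(*key) {
                *overlap.entry(doc).or_insert(0) += 1;
            }
        }
        hits.extend(overlap.into_iter().map(|(doc, n)| (corpus_index, doc, n)));
    }
    // Highest overlap first; ties broken by corpus, then document, so the
    // output order is deterministic.
    hits.sort_by(|a, b| b.2.cmp(&a.2).then(a.0.cmp(&b.0)).then(a.1.cmp(&b.1)));
    hits
}
```

Because `fan_across` only sees the trait, the caller can mix in-memory and persisted corpora in one slice without the ranking code knowing which is which.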

Reuse is gated by Store::open_fresh, which accepts a persisted corpus only on a matching opaque corpus_token and matching versions and reports every mismatch or corruption as a CacheMiss rather than an error. The neutral set_latest/cleanup primitives manage content-addressed corpus directories and the atomic latest-pointer the caller resolves.
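
The open-or-reject contract described above can be sketched as a pure function. Everything here is an assumption made for illustration: `StoredMeta`, `SketchMiss`, `SketchLookup`, and `open_or_reject` are hypothetical names, and the real `CacheMiss` and `CorpusLookup` variants may differ. The point it shows is that absence, corruption, and mismatches all come back as ordinary values telling the caller to rebuild, not as errors.

```rust
/// Hypothetical metadata a persisted corpus might record about itself.
#[derive(Debug, Clone, PartialEq)]
struct StoredMeta {
    corpus_token: String,
    schema_version: u32,
}

/// Hypothetical miss reasons (NOT the crate's `CacheMiss` variants).
#[derive(Debug, PartialEq)]
enum SketchMiss {
    Absent,
    Corrupt(String),
    SchemaMismatch { expected: u32, found: u32 },
    TokenMismatch { expected: String, found: String },
}

/// Either a usable corpus or the reason it was rejected.
#[derive(Debug, PartialEq)]
enum SketchLookup {
    Fresh(StoredMeta),
    Miss(SketchMiss),
}

/// `stored` models what reading the on-disk metadata produced:
/// `None` if nothing is there, `Some(Err(_))` if it could not be decoded.
fn open_or_reject(
    stored: Option<Result<StoredMeta, String>>,
    expected_token: &str,
    expected_schema: u32,
) -> SketchLookup {
    let meta = match stored {
        None => return SketchLookup::Miss(SketchMiss::Absent),
        Some(Err(why)) => return SketchLookup::Miss(SketchMiss::Corrupt(why)),
        Some(Ok(meta)) => meta,
    };
    // Check the schema first: a token read under another layout is meaningless.
    if meta.schema_version != expected_schema {
        return SketchLookup::Miss(SketchMiss::SchemaMismatch {
            expected: expected_schema,
            found: meta.schema_version,
        });
    }
    // The token is opaque: it is only compared for equality, never parsed.
    if meta.corpus_token != expected_token {
        return SketchLookup::Miss(SketchMiss::TokenMismatch {
            expected: expected_token.to_string(),
            found: meta.corpus_token,
        });
    }
    SketchLookup::Fresh(meta)
}
```

A caller would typically match on the result, use the corpus on `Fresh`, and rebuild on any `Miss`, optionally logging the reason.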

See docs/architecture/05-sqlite-store.md for the schema and the read/write design, and docs/architecture/06-cache-lifecycle.md for the freshness contract and the lifecycle primitives.

Structs

CleanupEntry
One corpus directory a cleanup classified.
CleanupReport
What a cleanup found and, if executed, did.
Store
A persisted corpus opened read-only.
StoreBuilder
Builds a persisted corpus, then publishes it atomically.

Enums

CacheMiss
Why a persisted corpus cannot be reused for a request. Every variant is a cache miss that tells the caller to rebuild — never a transport error.
CleanupMode
Whether a cleanup reports its plan or carries it out.
CorpusLookup
The outcome of an open-or-reject: either a usable corpus or the reason it was rejected.
Ingest
One item in the build stream.
StoreError
An error from building or opening a persisted corpus.

Constants

STORE_SCHEMA_VERSION
The store’s own on-disk schema identity. Bumped when the table layout changes. Stored in metadata and verified on open.

Functions

cleanup
Remove every corpus directory under root except those the caller still wants and the one the latest pointer targets.
corpus_dir
The directory a corpus with content address name occupies under root. The caller owns the name; the store never parses it.
index_path
The index file a builder writes to (and a reader opens) for corpus name. Pass it to StoreBuilder::create and Store::open_fresh.
latest_index_path
The index path the latest pointer under root resolves to, ready to open, or None if nothing is published.
latest_name
The content address the latest pointer under root names, or None if no pointer is published or it is unreadable.
open_latest_fresh
Resolve the latest published corpus under root and open-or-reject it against expected_token.
set_latest
Atomically publish corpus name as the latest under root.
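
The lifecycle functions listed above (publishing a latest pointer, then cleaning up everything the caller no longer wants) can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the crate's implementation: the pointer is assumed to be a plain file named `latest` holding the corpus name, and `publish_latest`, `read_latest`, and `plan_removals` are hypothetical helpers. It relies on the common write-then-rename technique; on POSIX filesystems `rename` replaces the destination atomically, and a durable version would also sync the file and its directory.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Publish `name` as the latest corpus under `root`.
/// Assumption: the pointer is a file called `latest` (hypothetical layout).
fn publish_latest(root: &Path, name: &str) -> io::Result<()> {
    // Write the new value beside the pointer, then rename over it, so a
    // reader sees either the old name or the new one, never a partial write.
    let tmp = root.join("latest.tmp");
    fs::write(&tmp, name)?;
    fs::rename(&tmp, root.join("latest"))
}

/// The name the pointer holds, or `None` if it is missing, unreadable, or empty.
fn read_latest(root: &Path) -> Option<String> {
    let text = fs::read_to_string(root.join("latest")).ok()?;
    let name = text.trim();
    if name.is_empty() {
        None
    } else {
        Some(name.to_string())
    }
}

/// Dry-run cleanup: list the corpus directories under `root` that are neither
/// in `keep` nor the target of the latest pointer. Nothing is deleted here.
fn plan_removals(root: &Path, keep: &HashSet<String>) -> io::Result<Vec<String>> {
    let latest = read_latest(root);
    let mut doomed = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // Only directories are corpora; the pointer file itself is skipped.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        if keep.contains(&name) || latest.as_deref() == Some(name.as_str()) {
            continue;
        }
        doomed.push(name);
    }
    doomed.sort();
    Ok(doomed)
}
```

Separating the plan from the deletion mirrors the report-or-execute split that `CleanupMode` describes: a caller can inspect the returned list first and remove the directories in a second step.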