Persistent SQLite-backed semantic index implementing the retrieval
Corpus seam.
This crate is the large-corpus counterpart to the in-memory inverted index
in lean-semantic-search-retrieval. It owns the semantic index only:
opaque-key postings, per-key fanout, the document total, and the
DeclarationFeatureRows contract needed to rebuild an anchor from a corpus
member. It carries no declaration display text, module or kind fields,
provenance, labels, probe caches, or any duplicate-audit or proof-agent
vocabulary; those stay with consumers.
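To make the three owned quantities concrete, here is a minimal in-memory sketch. It assumes documents are identified by u64 ids and reads "per-key fanout" as the number of documents that carry a key; IndexModel and its methods are hypothetical, not the crate's types or its SQLite tables, and DeclarationFeatureRows is left out because its fields are not described here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory model of the state the store persists.
/// Keys are opaque strings; the model never interprets them.
#[derive(Default)]
pub struct IndexModel {
    /// Opaque key -> ids of the documents carrying that key.
    by_key: BTreeMap<String, BTreeSet<u64>>,
    /// Every document id seen, backing the document total.
    documents: BTreeSet<u64>,
}

impl IndexModel {
    /// Record that document `doc` carries each key in `keys`.
    /// Re-adding the same (key, doc) pair has no effect.
    pub fn add(&mut self, doc: u64, keys: &[&str]) {
        self.documents.insert(doc);
        for key in keys {
            self.by_key.entry((*key).to_string()).or_default().insert(doc);
        }
    }

    /// Per-key fanout (assumed meaning): how many documents carry `key`.
    pub fn fanout(&self, key: &str) -> usize {
        self.by_key.get(key).map_or(0, |docs| docs.len())
    }

    /// Posting list for `key`, in ascending document-id order.
    pub fn postings(&self, key: &str) -> Vec<u64> {
        self.by_key
            .get(key)
            .map_or_else(Vec::new, |docs| docs.iter().copied().collect())
    }

    /// Total number of distinct documents in the corpus.
    pub fn document_total(&self) -> usize {
        self.documents.len()
    }
}
```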
Build a corpus with StoreBuilder, publishing it atomically; open it
read-only with Store, which implements Corpus so retrieval ranks over a
persisted index without loading it into memory. The ranking algorithm, anchor
planning, policy, and output shape are unchanged: a Store is just another
Corpus, and retrieve_across fans one anchor across several of them.
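The point of the seam is that retrieval code depends only on a trait, so a persisted store can stand in for the in-memory index. A minimal sketch, assuming a trait that exposes postings and a document total: CorpusSeam, MemCorpus, and fan_out are hypothetical names, not the retrieval crate's Corpus trait or retrieve_across, whose signatures are not given here. The sketch only gathers hits per corpus; it does not rank them.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the retrieval crate's Corpus seam.
pub trait CorpusSeam {
    /// Document ids in this corpus that carry `key`.
    fn postings(&self, key: &str) -> Vec<u64>;
    /// Number of documents in this corpus.
    fn document_total(&self) -> u64;
}

/// In-memory corpus; a persisted store would implement the same trait.
pub struct MemCorpus {
    pub postings: HashMap<String, Vec<u64>>,
    pub total: u64,
}

impl CorpusSeam for MemCorpus {
    fn postings(&self, key: &str) -> Vec<u64> {
        self.postings.get(key).cloned().unwrap_or_default()
    }
    fn document_total(&self) -> u64 {
        self.total
    }
}

/// Fan one anchor's keys across several corpora. Document ids are only
/// unique within a corpus, so each hit is tagged with its corpus index.
/// Duplicate hits (a doc matching several keys) are collapsed.
pub fn fan_out(corpora: &[&dyn CorpusSeam], anchor_keys: &[&str]) -> Vec<(usize, u64)> {
    let mut hits = BTreeSet::new();
    for (index, corpus) in corpora.iter().enumerate() {
        for key in anchor_keys {
            for doc in corpus.postings(key) {
                hits.insert((index, doc));
            }
        }
    }
    hits.into_iter().collect()
}
```

Because fan_out sees only the trait, an in-memory corpus and a persisted one are interchangeable, which is the property the paragraph above relies on.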
Reuse is gated by Store::open_fresh, which accepts a persisted corpus only
when its opaque corpus_token and its versions match, and reports every
mismatch or corruption as a CacheMiss rather than an error. The neutral
set_latest/cleanup primitives manage content-addressed corpus
directories and the atomic latest-pointer the caller resolves.
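An atomic latest-pointer is commonly built with the write-then-rename pattern. The sketch below assumes the pointer is a plain file named latest that holds the corpus name and that each corpus occupies root/name; those file names and the *_sketch functions are hypothetical, and the crate's own set_latest and latest_name may work differently.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical layout: each content-addressed corpus lives in root/<name>.
pub fn corpus_dir_sketch(root: &Path, name: &str) -> PathBuf {
    root.join(name)
}

/// Publish `name` as latest: write a temp file, flush it to disk, then
/// rename it over root/latest. On POSIX, a rename within one directory
/// replaces the target atomically, so readers see the old or the new name.
/// Assumes a single publisher; concurrent callers would race on latest.tmp.
pub fn set_latest_sketch(root: &Path, name: &str) -> io::Result<()> {
    let tmp = root.join("latest.tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(name.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp, root.join("latest"))
}

/// Read the pointer; a missing, unreadable, or empty pointer yields None.
pub fn latest_name_sketch(root: &Path) -> Option<String> {
    let raw = fs::read_to_string(root.join("latest")).ok()?;
    let name = raw.trim();
    if name.is_empty() {
        None
    } else {
        Some(name.to_string())
    }
}
```

The sketch does not fsync the parent directory after the rename, which a durable POSIX implementation would also do.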
See docs/architecture/05-sqlite-store.md for the schema and the read/write
design, and docs/architecture/06-cache-lifecycle.md for the freshness
contract and the lifecycle primitives.
Structs
- CleanupEntry - One corpus directory a `cleanup` classified.
- CleanupReport - What a `cleanup` found and, if executed, did.
- Store - A persisted corpus opened read-only.
- StoreBuilder - Builds a persisted corpus, then publishes it atomically.
Enums
- CacheMiss - Why a persisted corpus cannot be reused for a request. Every variant is a cache miss that tells the caller to rebuild, never a transport error.
- CleanupMode - Whether a `cleanup` reports its plan or carries it out.
- CorpusLookup - The outcome of an open-or-reject: either a usable corpus or the reason it was rejected.
- Ingest - One item in the build stream.
- StoreError - An error from building or opening a persisted corpus.
Constants
- STORE_SCHEMA_VERSION - The store’s own on-disk schema identity. Bumped when the table layout changes. Stored in `metadata` and verified on open.
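The open-or-reject contract can be sketched as a pure check over whatever was read from the store's metadata. This sketch assumes the metadata serializes to two lines (token, then schema version) and checks only the schema version, although the crate also matches other versions; MissSketch, LookupSketch, StoredMetadata, open_fresh_sketch, and the value 1 for SKETCH_SCHEMA_VERSION are hypothetical stand-ins for CacheMiss, CorpusLookup, and STORE_SCHEMA_VERSION, whose variants and value are not listed here.

```rust
/// Hypothetical value; the real STORE_SCHEMA_VERSION is not given here.
pub const SKETCH_SCHEMA_VERSION: u32 = 1;

#[derive(Debug, PartialEq)]
pub struct StoredMetadata {
    pub corpus_token: String,
    pub schema_version: u32,
}

/// Every reason to reject is a miss, never an error.
#[derive(Debug, PartialEq)]
pub enum MissSketch {
    Missing,
    Corrupt,
    SchemaMismatch { found: u32, expected: u32 },
    TokenMismatch,
}

#[derive(Debug, PartialEq)]
pub enum LookupSketch {
    Fresh(StoredMetadata),
    Miss(MissSketch),
}

/// Accept only metadata that parses and matches both version and token.
pub fn open_fresh_sketch(raw_metadata: Option<&str>, expected_token: &str) -> LookupSketch {
    // Nothing was ever written here.
    let Some(raw) = raw_metadata else {
        return LookupSketch::Miss(MissSketch::Missing);
    };
    // Exactly two lines are expected; anything else counts as corruption.
    let mut lines = raw.lines();
    let (Some(token), Some(version), None) = (lines.next(), lines.next(), lines.next()) else {
        return LookupSketch::Miss(MissSketch::Corrupt);
    };
    let Ok(found) = version.trim().parse::<u32>() else {
        return LookupSketch::Miss(MissSketch::Corrupt);
    };
    // Check the layout first: under an unknown schema the token is not trustworthy.
    if found != SKETCH_SCHEMA_VERSION {
        return LookupSketch::Miss(MissSketch::SchemaMismatch {
            found,
            expected: SKETCH_SCHEMA_VERSION,
        });
    }
    if token != expected_token {
        return LookupSketch::Miss(MissSketch::TokenMismatch);
    }
    LookupSketch::Fresh(StoredMetadata {
        corpus_token: token.to_string(),
        schema_version: found,
    })
}
```

Returning a miss instead of an error keeps the caller's control flow uniform: every non-fresh outcome leads to the same rebuild path.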
Functions
- cleanup - Remove every corpus directory under `root` except those the caller still wants and the one the `latest` pointer targets.
- corpus_dir - The directory a corpus with content address `name` occupies under `root`. The caller owns the name; the store never parses it.
- index_path - The index file a builder writes to (and a reader opens) for corpus `name`. Pass it to `StoreBuilder::create` and `Store::open_fresh`.
- latest_index_path - The index path the `latest` pointer under `root` resolves to, ready to open, or `None` if nothing is published.
- latest_name - The content address the `latest` pointer under `root` names, or `None` if no pointer is published or it is unreadable.
- open_latest_fresh - Resolve the latest published corpus under `root` and open-or-reject it against `expected_token`.
- set_latest - Atomically publish corpus `name` as the latest under `root`.
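The cleanup rule (keep what the caller wants and what latest targets, remove the rest) separates naturally into a pure classification step and an optional delete step, which is the plan-versus-execute split CleanupMode describes. A minimal sketch, assuming corpus directories are the immediate subdirectories of root with UTF-8 names and the pointer is a plain file beside them; ModeSketch, Verdict, classify, and cleanup_sketch are hypothetical and are not CleanupMode, CleanupEntry, CleanupReport, or the crate's cleanup.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ModeSketch {
    Plan,
    Execute,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    KeepWanted,
    KeepLatest,
    Remove,
}

/// Pure rule: the latest target and wanted names survive, everything else goes.
pub fn classify(name: &str, wanted: &HashSet<&str>, latest: Option<&str>) -> Verdict {
    if latest == Some(name) {
        Verdict::KeepLatest
    } else if wanted.contains(name) {
        Verdict::KeepWanted
    } else {
        Verdict::Remove
    }
}

/// Classify every corpus directory under `root`, sorted by name.
/// In Execute mode, directories marked Remove are deleted; Plan never deletes.
pub fn cleanup_sketch(
    root: &Path,
    wanted: &HashSet<&str>,
    latest: Option<&str>,
    mode: ModeSketch,
) -> io::Result<Vec<(String, Verdict)>> {
    // Collect names first so no directory is removed mid-iteration.
    let mut names = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // Skip the pointer file and anything else that is not a directory.
        if entry.file_type()?.is_dir() {
            names.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    names.sort();
    let mut report = Vec::with_capacity(names.len());
    for name in names {
        let verdict = classify(&name, wanted, latest);
        if mode == ModeSketch::Execute && verdict == Verdict::Remove {
            fs::remove_dir_all(root.join(&name))?;
        }
        report.push((name, verdict));
    }
    Ok(report)
}
```

Running it in Plan mode and inspecting the report before switching to Execute gives a dry run with exactly the same classification.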