Resident, bounded file-content cache shared across the search-index build and
ctx_search (issue #148).
Before this module the trigram search_index
build read every file in the corpus to extract trigrams and then threw the
content away, after which ctx_search read the narrowed candidate files
again to run the regex line-by-line — the corpus was read from disk
twice. This cache lets the first reader (whichever it is) populate file
contents once, keyed by absolute path and validated by (mtime, size), and
every subsequent reader reuse them as an in-memory hit.
Correctness: an entry is only ever served when the file’s current
(mtime, size) exactly matches the stored identity, so any edit (which
changes mtime, and usually size) is a guaranteed miss — results can never go
stale. A miss simply falls back to a disk read.
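The identity check described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the field names on FileState, the Entry and SketchCache types, and the method signatures are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical stand-in for the module's FileState; field names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileState {
    pub mtime: SystemTime,
    pub size: u64,
}

/// Cached content together with the file version it was read at.
struct Entry {
    state: FileState,
    content: String,
}

/// Hypothetical cache keyed by absolute path (no size bound in this sketch).
#[derive(Default)]
pub struct SketchCache {
    entries: HashMap<PathBuf, Entry>,
}

impl SketchCache {
    /// Store (or replace) the content for `path` at version `state`.
    pub fn insert(&mut self, path: &Path, state: FileState, content: String) {
        self.entries.insert(path.to_path_buf(), Entry { state, content });
    }

    /// Serve content only when the caller's current identity equals the
    /// stored one; a mismatch removes the stale entry and reports a miss.
    pub fn get(&mut self, path: &Path, current: FileState) -> Option<String> {
        let fresh = self.entries.get(path)?.state == current;
        if fresh {
            self.entries.get(path).map(|e| e.content.clone())
        } else {
            self.entries.remove(path);
            None
        }
    }
}
```

The caller supplies the current identity, so the sketch never touches the filesystem itself; a miss is the caller's cue to read from disk and insert again.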
Bounds & safety:
- Total resident bytes are capped (LEAN_CTX_CONTENT_CACHE_MB, default 128 MB) with approximate-LRU eviction, so a large corpus cannot grow the cache without limit.
- Inserts are skipped while the process is under memory pressure, and the eviction orchestrator can clear the cache on UnloadIndices/EmergencyDrop.
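To make the byte cap concrete, here is a small sketch of a size-bounded cache. It is an illustration under stated assumptions: cap_bytes and BoundedCache are hypothetical names, the parsing rules for the environment value are guessed, and eviction uses a plain scan for the least recently used entry rather than whatever approximation the module implements. Memory-pressure gating is not modelled.

```rust
use std::collections::HashMap;

/// Turn a megabyte setting (as it might arrive from LEAN_CTX_CONTENT_CACHE_MB)
/// into a byte cap, falling back to the documented 128 MB default.
/// The parsing rules here are an assumption.
pub fn cap_bytes(raw: Option<&str>) -> usize {
    let mb = raw.and_then(|s| s.trim().parse::<usize>().ok()).unwrap_or(128);
    mb * 1024 * 1024
}

/// Hypothetical bounded cache: path -> (last-used tick, content).
pub struct BoundedCache {
    cap: usize,
    bytes: usize,
    tick: u64,
    evictions: u64,
    entries: HashMap<String, (u64, String)>,
}

impl BoundedCache {
    pub fn new(cap: usize) -> Self {
        Self { cap, bytes: 0, tick: 0, evictions: 0, entries: HashMap::new() }
    }

    /// Return the content and mark the entry as recently used.
    pub fn get(&mut self, path: &str) -> Option<&str> {
        self.tick += 1;
        let tick = self.tick;
        let entry = self.entries.get_mut(path)?;
        entry.0 = tick;
        Some(entry.1.as_str())
    }

    /// Insert content, then evict least recently used entries until the
    /// total is back under the cap.
    pub fn insert(&mut self, path: &str, content: String) {
        if content.len() > self.cap {
            return; // a single oversized file is never cached
        }
        self.tick += 1;
        self.bytes += content.len();
        if let Some((_, old)) = self.entries.insert(path.to_string(), (self.tick, content)) {
            self.bytes -= old.len();
        }
        while self.bytes > self.cap {
            let victim = self.entries.iter().min_by_key(|(_, v)| v.0).map(|(k, _)| k.clone());
            match victim {
                Some(key) => {
                    if let Some((_, gone)) = self.entries.remove(&key) {
                        self.bytes -= gone.len();
                        self.evictions += 1;
                    }
                }
                None => break,
            }
        }
    }

    pub fn bytes(&self) -> usize { self.bytes }
    pub fn evictions(&self) -> u64 { self.evictions }
}
```

The linear scan keeps the example short; a real cache holding many entries would more likely sample candidates or keep an ordering structure, which is presumably where the "approximate" in approximate-LRU comes from.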
Structs
- CacheStats - Observability snapshot: (hits, misses, entries, bytes, evictions).
- FileState - Identity of one file version. A changed mtime or size ⇒ stale ⇒ cache miss. Mirrors the (mtime, size) pair the BM25 index already trusts for staleness.
Functions
- clear - Drop all entries, freeing the heap. Called by the eviction orchestrator under memory pressure; the cache simply re-warms on subsequent reads.
- get - Look up path; returns the cached content only when the supplied current (mtime, size) matches the stored identity. A mismatch evicts the stale entry and reports a miss. state is passed in (not re-stated) because hot callers already hold the metadata.
- get_or_read - Read a file through the cache: returns cached content on a fresh hit, else reads from disk (UTF-8), populates the cache, and returns it. None on a non-UTF-8/unreadable/unstatable file. Convenience for callers without their own size/special-file gating (the search-index build and ctx_search use the explicit get/insert pair so they keep their own skip rules).
- insert - Insert (or replace) the content for path at version state. Skipped while the process is under memory pressure or when the cache is disabled, so the cache never adds to a memory problem.
- memory_usage_bytes - Approximate resident heap used by cached contents, in bytes.
- stats
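As a rough picture of the read-through behaviour described for get_or_read, the sketch below stats the file, serves a cached copy when (mtime, size) still matches, and otherwise reads from disk and stores the result. The ReadThrough type, its fields, and the method signature are assumptions for illustration only; the module's real functions may differ, and this sketch has no byte cap or memory-pressure check.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// (mtime, size) pair used as the file-version identity in this sketch.
type Identity = (SystemTime, u64);

/// Hypothetical read-through cache with simple hit/miss counters.
#[derive(Default)]
pub struct ReadThrough {
    entries: HashMap<PathBuf, (Identity, String)>,
    pub hits: u64,
    pub misses: u64,
}

impl ReadThrough {
    /// Return the file's content, from memory when the stored identity still
    /// matches, else from disk. Yields None when the file cannot be statted,
    /// cannot be read, or is not valid UTF-8.
    pub fn get_or_read(&mut self, path: &Path) -> Option<String> {
        let meta = fs::metadata(path).ok()?;
        let current: Identity = (meta.modified().ok()?, meta.len());

        if let Some((stored, content)) = self.entries.get(path) {
            if *stored == current {
                self.hits += 1;
                return Some(content.clone());
            }
        }

        self.misses += 1;
        // read_to_string fails on unreadable files and on non-UTF-8 bytes.
        let content = fs::read_to_string(path).ok()?;
        self.entries.insert(path.to_path_buf(), (current, content.clone()));
        Some(content)
    }
}
```

Callers that already hold the file metadata, or that apply their own size and special-file rules, would skip the stat step here and use a get/insert pair instead, as the description of get_or_read notes.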