Module content_cache

Resident, bounded file-content cache shared across the search-index build and ctx_search (issue #148).

Before this module, the trigram search_index build read every file in the corpus to extract trigrams and then discarded the content. ctx_search then read the narrowed candidate files again to run the regex line by line, so those files were read from disk twice. This cache lets the first reader (whichever it is) populate file contents once, keyed by absolute path and validated by (mtime, size); every subsequent reader reuses them as an in-memory hit.

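Correctness: an entry is served only when the file’s current (mtime, size) exactly matches the stored identity, so any edit that changes mtime or size is a guaranteed miss and the cache never serves content from an older file version. The one blind spot is an edit that preserves both values, for example a same-size rewrite within the filesystem’s timestamp granularity. A miss simply falls back to a disk read.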
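The validation rule can be illustrated with a minimal sketch. Version and SketchCache are hypothetical names invented for this example (Version stands in for FileState), and the layout is an assumption, not the module's actual implementation.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical stand-in for `FileState`: the identity of one file version.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Version {
    pub mtime: SystemTime,
    pub size: u64,
}

/// Hypothetical minimal cache: path -> (stored identity, content).
#[derive(Default)]
pub struct SketchCache {
    entries: HashMap<PathBuf, (Version, String)>,
}

impl SketchCache {
    /// Store (or replace) the content for `path` at `version`.
    pub fn insert(&mut self, path: &Path, version: Version, content: String) {
        self.entries.insert(path.to_path_buf(), (version, content));
    }

    /// Serve content only when `current` equals the stored identity exactly.
    /// A mismatch drops the stale entry and reports a miss.
    pub fn get(&mut self, path: &Path, current: Version) -> Option<String> {
        let fresh = matches!(
            self.entries.get(path),
            Some((stored, _)) if *stored == current
        );
        if fresh {
            self.entries.get(path).map(|(_, content)| content.clone())
        } else {
            // Removes a stale entry if one exists; a no-op for an unknown path.
            self.entries.remove(path);
            None
        }
    }

    /// Number of resident entries.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch the caller supplies the current identity, so a lookup costs no extra stat call, and a stale entry is removed on the first mismatching lookup rather than lingering until eviction.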

Bounds & safety:

  • Total resident bytes are capped (LEAN_CTX_CONTENT_CACHE_MB, default 128 MB) with approximate-LRU eviction, so a large corpus cannot grow the cache without limit.
  • Inserts are skipped while the process is under memory pressure, and the eviction orchestrator can clear the cache on UnloadIndices / EmergencyDrop.

Structs§

CacheStats
Observability snapshot: (hits, misses, entries, bytes, evictions).
FileState
Identity of one file version. A changed mtime or size ⇒ stale ⇒ cache miss. Mirrors the (mtime, size) pair the BM25 index already trusts for staleness.

Functions§

clear
Drop all entries, freeing the heap. Called by the eviction orchestrator under memory pressure; the cache simply re-warms on subsequent reads.
get
Look up path; returns the cached content only when the supplied current (mtime, size) matches the stored identity. A mismatch evicts the stale entry and reports a miss. state is passed in by the caller (rather than obtained through a fresh stat call) because hot callers already hold the metadata.
get_or_read
Read a file through the cache: returns cached content on a fresh hit; otherwise reads from disk (UTF-8), populates the cache, and returns the content. Returns None if the file is not valid UTF-8, cannot be read, or cannot be stat-ed. A convenience for callers without their own size/special-file gating (the search-index build and ctx_search use the explicit get/insert pair so they keep their own skip rules).
insert
Insert (or replace) the content for path at version state. Skipped while the process is under memory pressure or when the cache is disabled, so the cache never adds to a memory problem.
memory_usage_bytes
Approximate resident heap used by cached contents, in bytes.
stats
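The read-through behavior described for get_or_read can be sketched with the standard library alone. ReadThrough and Identity are hypothetical names, the hit and miss counters are illustrative, and the sketch keys on whatever path it is given rather than normalizing to an absolute path; it is one possible shape under those assumptions, not the module's code.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical (mtime, size) identity of one file version.
type Identity = (SystemTime, u64);

/// Hypothetical read-through cache with simple hit/miss counters.
#[derive(Default)]
pub struct ReadThrough {
    entries: HashMap<PathBuf, (Identity, String)>,
    pub hits: u64,
    pub misses: u64,
}

impl ReadThrough {
    /// Return cached content on a fresh hit; otherwise read from disk,
    /// populate the cache, and return the content.
    /// `None` if the file cannot be stat-ed, cannot be read, or is not UTF-8.
    pub fn get_or_read(&mut self, path: &Path) -> Option<String> {
        let meta = fs::metadata(path).ok()?;
        let current: Identity = (meta.modified().ok()?, meta.len());

        if let Some((stored, content)) = self.entries.get(path) {
            if *stored == current {
                self.hits += 1;
                return Some(content.clone());
            }
        }

        // Miss: either no entry, or the stored identity is out of date.
        self.misses += 1;
        // `read_to_string` fails on invalid UTF-8, which maps to `None` here.
        let content = fs::read_to_string(path).ok()?;
        self.entries
            .insert(path.to_path_buf(), (current, content.clone()));
        Some(content)
    }
}
```

Unlike the explicit get/insert pair, this form pays for one metadata call on every lookup, which is the price of not requiring the caller to hold the file's metadata already.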