Expand description
Semantic cache for LLM responses (ML Feature 3, MVP).
Caches (prompt, response) pairs keyed by the prompt’s embedding
vector. A lookup returns the cached response when any entry has
cosine similarity to the query embedding ≥ the caller’s threshold
and has not expired.
This is a linear-scan implementation — fine for caches up to ~10k entries. A future sprint swaps the scan for the existing HNSW index when entry counts make that worth the added complexity. The external surface stays the same.
Eviction is lazy: lookup drops any expired entry it skips
over, and [Self::evict_expired] sweeps the whole set on
demand. Bounded size is enforced on insert via LRU-by-age
(oldest entries dropped first once max_entries is reached).
Structs§
- Semantic
Cache - The cache itself. Cloning shares state via the inner
Arc. - Semantic
Cache Config - Compile-time tuneables for a cache instance.
- Semantic
Cache Entry - One cached entry.
- Semantic
Cache Stats - Runtime statistics — exposed via
SELECT * FROM ML_CACHE_STATSlater, and useful in tests now.