Module semantic_cache

Expand description

Semantic cache for LLM responses (ML Feature 3, MVP).

Caches (prompt, response) pairs keyed by the prompt’s embedding vector. A lookup returns the cached response when any entry has cosine similarity to the query embedding ≥ the caller’s threshold and has not expired.

This is a linear-scan implementation — fine for caches up to ~10k entries. A future sprint swaps the scan for the existing HNSW index when entry counts make that worth the added complexity. The external surface stays the same.

Eviction is lazy: lookup drops any expired entry it skips over, and [Self::evict_expired] sweeps the whole set on demand. Bounded size is enforced on insert via LRU-by-age (oldest entries dropped first once max_entries is reached).

Structs§

SemanticCache: The cache itself. Cloning shares state via the inner Arc.
SemanticCacheConfig: Compile-time tuneables for a cache instance.
SemanticCacheEntry: One cached entry.
SemanticCacheStats: Runtime statistics — exposed via SELECT * FROM ML_CACHE_STATS later, and useful in tests now.