Skip to main content

Module ann_cache

Module ann_cache 

Source
Expand description

Process-wide cache for the HNSW AnnIndex used by dense semantic search.

Building an HNSW graph is O(n log n) with a wide construction beam, so doing it per query would be slower than brute force. This cache keeps one built index keyed by a content fingerprint of the embedding set: repeated queries over the same corpus reuse the graph and get sub-linear search, while a changed corpus (different fingerprint) transparently triggers a rebuild.

It is threshold-gated — corpora below ANN_MIN_VECTORS skip the cache and use exact SIMD brute-force top-k, which is both faster (no graph overhead) and exact. The threshold is deliberately high: at lean-ctx’s typical scale (a few thousand chunks) exact brute force over int8/SIMD dot products is only ~1-2 ms, so HNSW’s approximate recall is not worth trading. HNSW activates only for genuinely large corpora where exact scan would dominate latency. On any lock failure it falls back to brute force, so correctness never depends on the cache being available.

Constants§

ANN_MIN_VECTORS
Minimum corpus size before an HNSW graph is worth building and caching. Below this, exact SIMD brute force is faster and exact (no recall loss).

Functions§

topk
Returns the top-k (index, similarity) pairs for query over embeddings, sorted by descending similarity.