Skip to main content

Module cache

Module cache 

Source
Expand description

Bounded adjacency cache for graph traversal optimization.

This module provides a bounded, thread-safe cache of adjacency lists used by crate::graph::SqliteGraph to avoid re-querying SQLite for neighbor sets that were recently read. It sits on the hot read path exercised by every adjacency lookup (see SqliteGraphBackend::neighborsfetch_outgoing → this cache), so its shape is performance-relevant.

§What this is

  • Bounded. Backed by the lru crate’s LruCache with a fixed capacity (default DEFAULT_CACHE_CAPACITY). When the capacity is exceeded an entry is evicted, so the cache’s memory footprint is bounded by capacity, not by the graph size. The hit path uses peek (a shared read lock) rather than get (an exclusive write lock that would refresh access recency), so effective eviction order is insertion-order (FIFO), not access-order (LRU). This is a deliberate tradeoff — see AdjacencyCache::get — chosen to keep the hit path on a cheap read lock (matching the old AHashMap cost profile) while preserving the correctness property that matters: bounded memory.
  • Shared values via Arc. Values are stored as Arc<Vec<i64>>. A cache hit hands the caller a cheaply-cloned Arc (one atomic refcount bump) — it never copies the Vec<i64>. This matters for high-degree “hub” nodes (call sites, definition hubs): copying a degree-1000 adjacency list on every hit dominated the read path before; an Arc clone is O(1).
  • Thread-safe. The LruCache is guarded by a parking_lot::RwLock. A hit takes a shared read lock (peek); an insert takes an exclusive write lock. The lock is not contended because writes (cache invalidation) are rare relative to reads.

§What this is not (historical note)

An earlier revision of this module documented an “LRU-K (K=2)” policy with “traversal-score tracking” and “high-degree pinning”. That policy was never implemented — the struct was an unbounded AHashMap with no eviction, no capacity, and no scoring, that cloned the whole Vec<i64> on every hit. The doc described an aspirational design the code did not match. This revision replaces both the code and the doc with a real, bounded cache, which is what runs here. A simple bound is used because the cache is invalidated wholesale on every write (see crate::graph::SqliteGraph::invalidate_caches), so sophisticated recency policies add little beyond a bound; the bound itself is the correctness property that was missing.

§Invalidation

Adjacency data is invalidated wholesale whenever the graph is mutated (edge insert, entity change, recovery, transaction commit) via crate::graph::SqliteGraph::invalidate_caches. This guarantees read-after-write consistency: a stale adjacency list can never be returned after an edge is added or removed. The wholesale-clear model is why the cache does not need per-entry write invalidation.

§Performance characteristics (measured)

Measured 2026-06-19 against a synthetic hub/leaf graph (benches/substrate_readpath.rs):

operationbefore (AHashMap + Vec clone)after (Arc, peek)
cache hit, degree 10~17.6 ns~15 ns
cache hit, degree 100~24.4 ns~13 ns
cache hit, degree 1000~131 ns (Vec copy dominated)~13 ns
cache miss~12.7 µs (SQLite)~12.7 µs (SQLite)
BFS traversal (consumer path)baseline-4 %

The remaining cost floor on a hit is the RwLock acquire/release pair (~13 ns). Eliminating it requires a lock-free container and is tracked as a follow-up; it is independent of the bounded-eviction and zero-copy-hit work shipped here.

Structs§

AdjacencyCache
Bounded, thread-safe LRU cache of adjacency lists.
CacheStats

Constants§

DEFAULT_CACHE_CAPACITY
Default maximum number of adjacency lists held in one AdjacencyCache.