
Module cache


LRU-K adjacency cache for graph traversal optimization.

This module provides a traversal-aware caching system designed specifically for graph workloads. It uses an LRU-K eviction policy with K=2, which distinguishes between sequential access patterns (graph traversals) and random access (node lookups).

§Cache Design

§LRU-K Eviction (K=2)

Traditional LRU evicts based on the most recent access alone, which behaves poorly for graph traversals, where nodes are revisited in predictable patterns. LRU-K instead tracks each entry's K most recent access times and makes eviction decisions using the K-th most recent access (the backward K-distance):

  • K=1: Traditional LRU (evicts least recently used)
  • K=2: Correlated reference frequency (ideal for traversals)
  • K>2: Too much overhead for marginal benefit

With K=2, nodes revisited during a traversal acquire a recent second-to-last access timestamp, which protects them from eviction, while one-off lookups never receive a second access and are evicted quickly.
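The bookkeeping behind this policy can be illustrated with a minimal, self-contained sketch (this is not the crate's actual implementation; the `Lru2` type and its fields are invented for illustration). Each key stores its two most recent logical timestamps, and the eviction victim is the key whose second-to-last access is oldest, with once-only keys evicted first:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Illustrative LRU-2 bookkeeping: per key, the two most recent
/// logical access times (last, second-to-last).
struct Lru2<K> {
    history: HashMap<K, (u64, Option<u64>)>,
    clock: u64,
}

impl<K: Hash + Eq + Clone> Lru2<K> {
    fn new() -> Self {
        Self { history: HashMap::new(), clock: 0 }
    }

    /// Record an access, shifting the previous timestamp into second place.
    fn touch(&mut self, key: K) {
        self.clock += 1;
        let now = self.clock;
        let entry = self.history.entry(key).or_insert((now, None));
        if entry.0 != now {
            *entry = (now, Some(entry.0));
        }
    }

    /// Eviction candidate: oldest second-to-last access; keys seen only
    /// once (`None`) sort before any twice-seen key, so they go first.
    fn victim(&self) -> Option<K> {
        self.history
            .iter()
            .min_by_key(|&(_, &(_, second))| second.unwrap_or(0))
            .map(|(k, _)| k.clone())
    }
}

fn main() {
    let mut c = Lru2::new();
    c.touch("a"); // traversal node, revisited below
    c.touch("b"); // one-off lookup
    c.touch("a"); // revisit: "a" now has a recent second-to-last access
    assert_eq!(c.victim(), Some("b")); // the one-off lookup is evicted first
}
```

Under plain LRU the victim would be whichever key was touched longest ago ("a" after its first touch); LRU-2 instead evicts "b" because it was never referenced twice.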

§Traversal Score Tracking

The cache tracks traversal patterns by computing a score for each node:

  • Sequential hits (node visited again in same traversal) → higher score
  • Random access (isolated lookups) → lower score
  • High-degree nodes protected from eviction (cache pinning)

This prevents cache pollution from random queries while preserving traversal working sets.
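A toy score update following these rules might look like the following (the function name, decay factor, and degree threshold are all illustrative assumptions, not the crate's actual internals):

```rust
/// Hypothetical traversal-score update: sequential hits raise the
/// score, isolated lookups decay it, and high-degree hubs are pinned
/// by keeping their score above an eviction floor. All constants here
/// are invented for illustration.
fn update_score(score: f64, sequential_hit: bool, degree: usize) -> f64 {
    // Sequential hit during a traversal: reward; random lookup: decay.
    let next = if sequential_hit { score + 1.0 } else { score * 0.5 };
    // Cache pinning: a high-degree node never drops below the floor.
    if degree > 100 { next.max(1.0) } else { next }
}

fn main() {
    // A hub stays protected even after an isolated random access.
    assert_eq!(update_score(0.0, false, 500), 1.0);
    // An ordinary node decays toward eviction on random accesses.
    assert_eq!(update_score(2.0, false, 10), 1.0);
    // Sequential hits accumulate score.
    assert_eq!(update_score(1.0, true, 10), 2.0);
}
```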

§Cache Invalidation Policies

§Insert-Based Invalidation

When edges are inserted, affected adjacency lists are automatically invalidated:

graph.insert_edge(node1, "CONNECTS", node2, vec![])?;
// Cache for node1 and node2 is automatically invalidated

This ensures read-after-write consistency: you never see stale edges after inserting new ones.
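The mechanism amounts to dropping both endpoints' cached adjacency lists on every edge insert, so the next read reloads them from storage. A simplified stand-in (u64 node ids, plain `Vec` adjacency lists; `AdjCache` and its methods are invented for illustration) looks like this:

```rust
use std::collections::HashMap;

/// Simplified stand-in for an adjacency cache. On edge insert, both
/// endpoints' cached lists are dropped; untouched nodes keep theirs.
struct AdjCache {
    lists: HashMap<u64, Vec<u64>>,
}

impl AdjCache {
    /// Invalidate the adjacency lists affected by a new (from, to) edge.
    fn on_edge_inserted(&mut self, from: u64, to: u64) {
        self.lists.remove(&from);
        self.lists.remove(&to);
    }
}

fn main() {
    let mut cache = AdjCache { lists: HashMap::new() };
    cache.lists.insert(1, vec![2]);
    cache.lists.insert(2, vec![1]);
    cache.lists.insert(3, vec![]);

    cache.on_edge_inserted(1, 2);

    assert!(!cache.lists.contains_key(&1)); // stale list dropped
    assert!(!cache.lists.contains_key(&2)); // stale list dropped
    assert!(cache.lists.contains_key(&3));  // unaffected node stays cached
}
```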

§Manual Invalidation

For advanced use cases, manual cache control is available:

use sqlitegraph::cache::AdjacencyCache;

let cache = AdjacencyCache::new();

// Clear all cache entries
cache.clear();

// Remove specific node from cache
cache.remove(node_id);

§Performance Characteristics

§Cache Hit Ratio

For BFS/DFS traversal workloads:

  • Expected hit ratio: 95%+ for depth-first traversals
  • BFS hit ratio: 80-95% depending on branching factor
  • Random access: 20-40% (limited benefit for pure lookups)

§Memory Overhead

  • Per-entry: O(degree) for stored adjacency lists
  • Metadata: ~32 bytes per entry (timestamps, scores)
  • Total overhead: ~10-20% additional memory vs uncached

§Latency Impact

  • Cache hit: ~100ns (hash lookup + clone)
  • Cache miss: ~10-100μs (SQLite query + cache insert)
  • Speedup: 100-1000x for cached accesses
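Combining the hit ratio with these costs gives a back-of-envelope expected latency per access (using the order-of-magnitude figures quoted above, not measurements):

```rust
/// Expected per-access latency: hit_ratio * hit cost + miss_ratio * miss cost.
fn expected_latency_ns(hit_ratio: f64, hit_ns: f64, miss_ns: f64) -> f64 {
    hit_ratio * hit_ns + (1.0 - hit_ratio) * miss_ns
}

fn main() {
    // 95% hits at 100 ns, misses at 10 us:
    // 0.95 * 100 + 0.05 * 10_000 = 595 ns average
    let avg = expected_latency_ns(0.95, 100.0, 10_000.0);
    assert!((avg - 595.0).abs() < 1e-9);
}
```

Even at the low end of the miss cost (10 µs), a 95% hit ratio brings the average access under 600 ns, which is where the quoted traversal speedups come from.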

§When to Use Cache

§Good Cache Workloads

  • Graph traversals: BFS, DFS, k-hop queries
  • Shortest path: Repeated neighbor access
  • PageRank: Repeated neighbor iteration on every pass
  • Community detection: Repeated local queries

§Poor Cache Workloads

  • Random node lookups: No repeated access pattern
  • Write-heavy workloads: Frequent invalidation
  • One-shot queries: No benefit for single accesses
  • Large graphs with low locality: Cache thrashing

§Usage Examples

The cache is automatically enabled in SqliteGraph for traversal operations:

use sqlitegraph::{SqliteGraph, algo::connected_components};

let graph = SqliteGraph::open("my_graph.db")?;

// BFS traversal uses cache automatically
let neighbors = graph.fetch_outgoing(node_id)?;

// Algorithm with repeated traversal benefits from cache
let components = connected_components(&graph)?;

For custom caching, use AdjacencyCache directly:

use sqlitegraph::cache::AdjacencyCache;

let cache = AdjacencyCache::new();

// Cache miss - loads from database
let neighbors = cache.get(node_id)
    .unwrap_or_else(|| load_neighbors_from_db(node_id));

// Cache hit - returns the stored neighbors
let neighbors = cache.get(node_id).unwrap();

Structs§

AdjacencyCache
CacheStats