Module semantic_cache

Expand description

Semantic Query Caching with Predicate Subsumption

This module implements intelligent query result caching that goes beyond simple string matching. It detects when a new query’s results can be computed by filtering cached results from a previous query, without re-executing against storage.

§Key Insight

If Query A’s predicate P_A is LESS RESTRICTIVE than Query B’s predicate P_B, then B’s results are a SUBSET of A’s results:

Query A: SELECT * FROM orders WHERE amount > 100  (cached: 500 rows)
Query B: SELECT * FROM orders WHERE amount > 150  (new query)

Since amount > 150 is STRICTER than amount > 100,
B's results ⊆ A's results

Instead of scanning storage: Filter A's cached 500 rows → B's results

§Supported Subsumption Patterns

Numeric Range Tightening:
- col > 100 (cached) → col > 150 (new): Filter cached
- col < 500 (cached) → col < 300 (new): Filter cached
- col BETWEEN 100 AND 500 (cached) → col BETWEEN 200 AND 400 (new): Filter cached
Equality Subset:
- col IN (1,2,3,4,5) (cached) → col IN (2,3) (new): Filter cached
AND Conjunction Strengthening:
- A (cached) → A AND B (new): Filter cached
- A AND B (cached) → A AND B AND C (new): Filter cached

§Not Supported (Triggers Re-execution)

OR predicates (may expand result set)
Different tables
Different column sets
Non-comparable predicates (LIKE, function calls)

§Transaction Isolation Considerations

Important: The semantic cache is currently global and does not account for MVCC transaction isolation. This means:

Cache entries are shared across all transactions
A transaction might see cached results from another transaction’s read
Cache invalidation on DML ensures committed changes are reflected

This is safe for:

Single-connection usage
Read-only workloads
Scenarios where eventual consistency is acceptable

For strict serializable isolation with concurrent writes, consider:

Disabling the cache during critical transactions
Using explicit cache invalidation between operations

Future enhancement: Per-transaction cache scoping with timestamp-based invalidation

Structs§

CachedResult: Cached query result with metadata
QueryFingerprint: Fingerprint for a cacheable query pattern
SemanticCache: Semantic Query Cache
SemanticCacheStats: Statistics for the semantic cache (lock-free with atomics)
SemanticCacheStatsSnapshot: Snapshot of cache statistics (plain values for reading)

Enums§

CacheLookupResult: Result of a cache lookup
SubsumptionResult: Subsumption relationship between predicates

Constants§

DEFAULT_CACHE_TTL_SECS: Time-to-live for cached results in seconds.
DEFAULT_MAX_CACHED_ROWS: Maximum number of rows to cache per query result.
DEFAULT_MAX_GLOBAL_CACHED_ROWS: Global maximum total rows across all cache entries.
DEFAULT_SEMANTIC_CACHE_SIZE: Maximum number of cached query results per table+column combination.

Functions§

check_subsumption: Check if new_predicate is subsumed by cached_predicate