Expand description
Semantic Query Caching with Predicate Subsumption
This module implements intelligent query result caching that goes beyond simple string matching. It detects when a new query’s results can be computed by filtering cached results from a previous query, without re-executing against storage.
§Key Insight
If Query A’s predicate P_A is LESS RESTRICTIVE than Query B’s predicate P_B, then B’s results are a SUBSET of A’s results:
Query A: SELECT * FROM orders WHERE amount > 100 (cached: 500 rows)
Query B: SELECT * FROM orders WHERE amount > 150 (new query)
Since amount > 150 is STRICTER than amount > 100,
B's results ⊆ A's results
Instead of scanning storage: Filter A's cached 500 rows → B's results§Supported Subsumption Patterns
-
Numeric Range Tightening:
col > 100(cached) →col > 150(new): Filter cachedcol < 500(cached) →col < 300(new): Filter cachedcol BETWEEN 100 AND 500(cached) →col BETWEEN 200 AND 400(new): Filter cached
-
Equality Subset:
col IN (1,2,3,4,5)(cached) →col IN (2,3)(new): Filter cached
-
AND Conjunction Strengthening:
A(cached) →A AND B(new): Filter cachedA AND B(cached) →A AND B AND C(new): Filter cached
§Not Supported (Triggers Re-execution)
- OR predicates (may expand result set)
- Different tables
- Different column sets
- Non-comparable predicates (LIKE, function calls)
§Transaction Isolation Considerations
Important: The semantic cache is currently global and does not account for MVCC transaction isolation. This means:
- Cache entries are shared across all transactions
- A transaction might see cached results from another transaction’s read
- Cache invalidation on DML ensures committed changes are reflected
This is safe for:
- Single-connection usage
- Read-only workloads
- Scenarios where eventual consistency is acceptable
For strict serializable isolation with concurrent writes, consider:
- Disabling the cache during critical transactions
- Using explicit cache invalidation between operations
Future enhancement: Per-transaction cache scoping with timestamp-based invalidation
Structs§
- Cached
Result - Cached query result with metadata
- Query
Fingerprint - Fingerprint for a cacheable query pattern
- Semantic
Cache - Semantic Query Cache
- Semantic
Cache Stats - Statistics for the semantic cache (lock-free with atomics)
- Semantic
Cache Stats Snapshot - Snapshot of cache statistics (plain values for reading)
Enums§
- Cache
Lookup Result - Result of a cache lookup
- Subsumption
Result - Subsumption relationship between predicates
Constants§
- DEFAULT_
CACHE_ TTL_ SECS - Time-to-live for cached results in seconds.
- DEFAULT_
MAX_ CACHED_ ROWS - Maximum number of rows to cache per query result.
- DEFAULT_
MAX_ GLOBAL_ CACHED_ ROWS - Global maximum total rows across all cache entries.
- DEFAULT_
SEMANTIC_ CACHE_ SIZE - Maximum number of cached query results per table+column combination.
Functions§
- check_
subsumption - Check if new_predicate is subsumed by cached_predicate