Module semantic_cache

Module semantic_cache 

Source
Expand description

Semantic Query Caching with Predicate Subsumption

This module implements intelligent query result caching that goes beyond simple string matching. It detects when a new query’s results can be computed by filtering cached results from a previous query, without re-executing against storage.

§Key Insight

If Query A’s predicate P_A is LESS RESTRICTIVE than Query B’s predicate P_B, then B’s results are a SUBSET of A’s results:

Query A: SELECT * FROM orders WHERE amount > 100  (cached: 500 rows)
Query B: SELECT * FROM orders WHERE amount > 150  (new query)

Since amount > 150 is STRICTER than amount > 100,
B's results ⊆ A's results

Instead of scanning storage: Filter A's cached 500 rows → B's results

§Supported Subsumption Patterns

  1. Numeric Range Tightening:

    • col > 100 (cached) → col > 150 (new): Filter cached
    • col < 500 (cached) → col < 300 (new): Filter cached
    • col BETWEEN 100 AND 500 (cached) → col BETWEEN 200 AND 400 (new): Filter cached
  2. Equality Subset:

    • col IN (1,2,3,4,5) (cached) → col IN (2,3) (new): Filter cached
  3. AND Conjunction Strengthening:

    • A (cached) → A AND B (new): Filter cached
    • A AND B (cached) → A AND B AND C (new): Filter cached

§Not Supported (Triggers Re-execution)

  • OR predicates (may expand result set)
  • Different tables
  • Different column sets
  • Non-comparable predicates (LIKE, function calls)

§Transaction Isolation Considerations

Important: The semantic cache is currently global and does not account for MVCC transaction isolation. This means:

  • Cache entries are shared across all transactions
  • A transaction might see cached results from another transaction’s read
  • Cache invalidation on DML ensures committed changes are reflected

This is safe for:

  • Single-connection usage
  • Read-only workloads
  • Scenarios where eventual consistency is acceptable

For strict serializable isolation with concurrent writes, consider:

  • Disabling the cache during critical transactions
  • Using explicit cache invalidation between operations

Future enhancement: Per-transaction cache scoping with timestamp-based invalidation

Structs§

CachedResult
Cached query result with metadata
QueryFingerprint
Fingerprint for a cacheable query pattern
SemanticCache
Semantic Query Cache
SemanticCacheStats
Statistics for the semantic cache (lock-free with atomics)
SemanticCacheStatsSnapshot
Snapshot of cache statistics (plain values for reading)

Enums§

CacheLookupResult
Result of a cache lookup
SubsumptionResult
Subsumption relationship between predicates

Constants§

DEFAULT_CACHE_TTL_SECS
Time-to-live for cached results in seconds.
DEFAULT_MAX_CACHED_ROWS
Maximum number of rows to cache per query result.
DEFAULT_MAX_GLOBAL_CACHED_ROWS
Global maximum total rows across all cache entries.
DEFAULT_SEMANTIC_CACHE_SIZE
Maximum number of cached query results per table+column combination.

Functions§

check_subsumption
Check if new_predicate is subsumed by cached_predicate