Crate llm_edge_cache


Multi-Tier Caching System for LLM Edge Agent

This module implements a high-performance multi-tier caching system with:

  • L1: In-memory cache (Moka) - <1ms latency, TinyLFU eviction
  • L2: Distributed cache (Redis) - 1-2ms latency, persistent across instances

§Architecture

Request → L1 Lookup (in-memory)
           ├─ HIT → Return (0.1ms)
           └─ MISS
               ↓
          L2 Lookup (Redis)
           ├─ HIT → Populate L1 + Return (2ms)
           └─ MISS
               ↓
          Provider Execution
               ↓
          Async Write → L1 + L2 (non-blocking)

§Performance Targets

  • L1 Latency: <1ms (typically <100μs)
  • L2 Latency: 1-2ms
  • Overall Hit Rate: >50% (MVP), >70% (Beta)
  • L1 TTL: 5 minutes (default)
  • L2 TTL: 1 hour (default)

Modules§

key
Cache key generation using SHA-256 hashing
l1
L1 In-Memory Cache using Moka
l2
L2 Distributed Cache using Redis
metrics
Cache metrics tracking and reporting

Structs§

CacheHealthStatus
Cache health status
CacheManager
Multi-tier cache orchestrator

Enums§

CacheLookupResult
Result of a cache lookup operation