Crate llm_edge_cache


Multi-Tier Caching System for LLM Edge Agent

This module implements a high-performance multi-tier caching system with:

  • L1: In-memory cache (Moka) - <1ms latency, TinyLFU eviction
  • L2: Distributed cache (Redis) - 1-2ms latency, persistent across instances

§Architecture

Request → L1 Lookup (in-memory)
           ├─ HIT → Return (0.1ms)
           └─ MISS
               ↓
          L2 Lookup (Redis)
           ├─ HIT → Populate L1 + Return (2ms)
           └─ MISS
               ↓
          Provider Execution
               ↓
          Async Write → L1 + L2 (non-blocking)

§Performance Targets

  • L1 Latency: <1ms (typically <100μs)
  • L2 Latency: 1-2ms
  • Overall Hit Rate: >50% (MVP), >70% (Beta)
  • L1 TTL: 5 minutes (default)
  • L2 TTL: 1 hour (default)

Modules§

key
Cache key generation using SHA-256 hashing
l1
L1 In-Memory Cache using Moka
l2
L2 Distributed Cache using Redis
metrics
Cache metrics tracking and reporting

Structs§

CacheHealthStatus
Cache health status
CacheManager
Multi-tier cache orchestrator

Enums§

CacheLookupResult
Result of a cache lookup operation