Tensor-native LLM response cache - Module 10 of Neumann
Tensor-native semantic caching for LLM responses with exact and semantic matching, cost tracking, and background eviction.
§Architecture
Uses TensorStore as its backing store, aligning with the tensor-native
paradigm used by tensor_vault and tensor_blob. Cache entries are stored as
TensorData with standardized field prefixes.
§Cache Layers
- Exact Cache: O(1) hash-based lookup for identical queries
- Semantic Cache: O(log n) HNSW-based similarity search
- Embedding Cache: O(1) cached embeddings for queries
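The layered lookup described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: `ToyCache`, `Entry`, and the `threshold` field are hypothetical names, and the semantic layer here is a brute-force linear scan with cosine similarity standing in for the HNSW index, so it is O(n) rather than O(log n).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cached entry; not the crate's actual type.
struct Entry {
    embedding: Vec<f32>,
    response: String,
}

/// Toy two-layer cache: an exact-match map plus a linear similarity scan.
struct ToyCache {
    exact: HashMap<String, usize>, // prompt -> index into `entries`
    entries: Vec<Entry>,
    threshold: f32, // minimum cosine similarity for a semantic hit (assumed)
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

impl ToyCache {
    fn new(threshold: f32) -> Self {
        Self {
            exact: HashMap::new(),
            entries: Vec::new(),
            threshold,
        }
    }

    fn put(&mut self, prompt: &str, embedding: &[f32], response: &str) {
        self.entries.push(Entry {
            embedding: embedding.to_vec(),
            response: response.to_string(),
        });
        self.exact.insert(prompt.to_string(), self.entries.len() - 1);
    }

    /// Tries the exact layer first, then the semantic layer if an
    /// embedding is supplied.
    fn get(&self, prompt: &str, embedding: Option<&[f32]>) -> Option<&str> {
        if let Some(&i) = self.exact.get(prompt) {
            return Some(self.entries[i].response.as_str());
        }
        let query = embedding?;
        self.entries
            .iter()
            .map(|e| (cosine(query, &e.embedding), e))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, e)| e.response.as_str())
    }
}
```

An identical prompt is answered from the map without touching any embeddings; a rephrased prompt only hits if its embedding is close enough to a stored one.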
§Example
use tensor_cache::{Cache, CacheConfig};
// Configure cache with 3-dimensional embeddings
let mut config = CacheConfig::default();
config.embedding_dim = 3;
let cache = Cache::with_config(config).unwrap();
// Store a response
let embedding = vec![0.1, 0.2, 0.3];
cache
    .put("What is 2+2?", &embedding, "4", "gpt-4", None)
    .unwrap();
// Look up (tries exact first, then semantic)
if let Some(hit) = cache.get("What is 2+2?", Some(&embedding)) {
    println!("Cached: {}", hit.response);
}
Structs§
- Cache - LLM response cache with tensor-native storage.
- CacheConfig - Cache configuration with capacity limits, TTL, and eviction settings.
- CacheHit - Result of a successful cache lookup.
- CacheStats - Thread-safe cache statistics with atomic counters.
- EvictionManager - Manages background eviction for cache maintenance.
- EvictionScorer - Calculates eviction priority scores. Lower scores are evicted first.
- ModelPricing - Model pricing (cost per 1000 tokens in dollars).
- SparseVector - A vector that only stores non-zero values.
- StatsSnapshot - Point-in-time snapshot of cache statistics for reporting.
- TokenCounter - Token counter using tiktoken's cl100k_base encoding (GPT-4, GPT-3.5-turbo, ada-002). Falls back to character-based estimation (~4 chars per token) if tiktoken is unavailable.
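As a rough illustration of how the character-based fallback and per-1000-token pricing could combine, consider the sketch below. The names `Pricing`, `estimate_tokens`, and `cost_dollars` are hypothetical, the split into input and output rates is an assumption, and rounding the estimate up is a choice made here rather than documented behavior. The prices used in any call are placeholders, not real model prices.

```rust
/// Hypothetical pricing record; field names are illustrative, not the crate's.
struct Pricing {
    input_per_1k: f64,  // dollars per 1000 input tokens (assumed split)
    output_per_1k: f64, // dollars per 1000 output tokens (assumed split)
}

/// Character-based fallback: roughly 4 characters per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Dollar cost of a request given token counts and per-1000-token rates.
fn cost_dollars(p: &Pricing, input_tokens: usize, output_tokens: usize) -> f64 {
    (input_tokens as f64 / 1000.0) * p.input_per_1k
        + (output_tokens as f64 / 1000.0) * p.output_per_1k
}
```

Under these assumptions, a cache hit saves approximately `cost_dollars` for the tokens that would otherwise have been sent to and returned by the model.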
Enums§
- CacheError
- CacheLayer
- DistanceMetric - Distance metric for vector similarity/distance computation.
- EvictionStrategy - Eviction strategy for cache entries.
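The "lower scores are evicted first" rule can be sketched as below. This page does not list the variants of EvictionStrategy or the inputs EvictionScorer uses, so `Strategy`, `Meta`, `score`, and `pick_victims` are all hypothetical, and the two strategies shown (least recently used, least frequently used) are common choices assumed for illustration.

```rust
/// Hypothetical strategies; the crate's actual variants are not listed here.
#[derive(Clone, Copy)]
enum Strategy {
    Lru, // least recently used
    Lfu, // least frequently used
}

/// Hypothetical per-entry metadata used for scoring.
struct Meta {
    key: String,
    last_access_secs: u64, // seconds since some fixed reference point
    access_count: u64,
}

/// Lower score means the entry is evicted sooner.
fn score(strategy: Strategy, m: &Meta) -> f64 {
    match strategy {
        Strategy::Lru => m.last_access_secs as f64,
        Strategy::Lfu => m.access_count as f64,
    }
}

/// Returns the keys of the `n` lowest-scoring entries, lowest first.
fn pick_victims(strategy: Strategy, entries: &[Meta], n: usize) -> Vec<String> {
    let mut scored: Vec<(f64, &Meta)> = entries
        .iter()
        .map(|m| (score(strategy, m), m))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored
        .into_iter()
        .take(n)
        .map(|(_, m)| m.key.clone())
        .collect()
}
```

A background task could call something like `pick_victims` periodically and remove the returned keys once the cache exceeds its capacity limit.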