Crate tensor_cache


Tensor-native LLM response cache - Module 10 of Neumann

Tensor-native semantic caching for LLM responses with exact and semantic matching, cost tracking, and background eviction.

§Architecture

Uses TensorStore as its backing store, aligning with the tensor-native paradigm used by tensor_vault and tensor_blob. Cache entries are stored as TensorData with standardized field prefixes.

§Cache Layers

  • Exact Cache: O(1) hash-based lookup for identical queries
  • Semantic Cache: O(log n) HNSW-based similarity search
  • Embedding Cache: O(1) cached embeddings for queries
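
The layered lookup described above (exact match first, semantic match as a fallback) can be sketched roughly as follows. This is a toy illustration, not the crate's implementation: `TwoLayerCache`, `Entry`, `threshold`, and `cosine` are hypothetical names, and a brute-force O(n) cosine scan stands in for the HNSW index.

```rust
use std::collections::HashMap;

/// Hypothetical cached entry; not the crate's real type.
struct Entry {
    embedding: Vec<f32>,
    response: String,
}

/// Toy two-layer cache: an exact-match map plus a semantic fallback.
struct TwoLayerCache {
    exact: HashMap<String, usize>, // prompt -> index into `entries`
    entries: Vec<Entry>,
    threshold: f32, // assumed minimum cosine similarity for a semantic hit
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl TwoLayerCache {
    fn new(threshold: f32) -> Self {
        Self { exact: HashMap::new(), entries: Vec::new(), threshold }
    }

    fn put(&mut self, prompt: &str, embedding: &[f32], response: &str) {
        self.exact.insert(prompt.to_string(), self.entries.len());
        self.entries.push(Entry {
            embedding: embedding.to_vec(),
            response: response.to_string(),
        });
    }

    fn get(&self, prompt: &str, embedding: Option<&[f32]>) -> Option<&str> {
        // Layer 1: O(1) hash lookup on the exact prompt text.
        if let Some(&i) = self.exact.get(prompt) {
            return Some(self.entries[i].response.as_str());
        }
        // Layer 2: semantic fallback, only possible when an embedding is given.
        // A linear scan is used here in place of an HNSW index.
        let query = embedding?;
        self.entries
            .iter()
            .map(|e| (cosine(query, &e.embedding), e))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, e)| e.response.as_str())
    }
}
```

In this sketch, a prompt that misses the exact layer can still hit if its embedding is close enough to a stored one; with no embedding supplied, only the exact layer is consulted.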

§Example

use tensor_cache::{Cache, CacheConfig};

// Configure cache with 3-dimensional embeddings
let mut config = CacheConfig::default();
config.embedding_dim = 3;
let cache = Cache::with_config(config).unwrap();

// Store a response
let embedding = vec![0.1, 0.2, 0.3];
cache
    .put("What is 2+2?", &embedding, "4", "gpt-4", None)
    .unwrap();

// Look up (tries exact first, then semantic)
if let Some(hit) = cache.get("What is 2+2?", Some(&embedding)) {
    println!("Cached: {}", hit.response);
}

Structs§

Cache
LLM response cache with tensor-native storage.
CacheConfig
Cache configuration with capacity limits, TTL, and eviction settings.
CacheHit
Result of a successful cache lookup.
CacheStats
Thread-safe cache statistics with atomic counters.
EvictionManager
Manages background eviction for cache maintenance.
EvictionScorer
Calculates eviction priority scores. Lower scores are evicted first.
ModelPricing
Model pricing (cost per 1000 tokens in dollars).
SparseVector
A vector that only stores non-zero values.
StatsSnapshot
Point-in-time snapshot of cache statistics for reporting.
TokenCounter
Token counter using tiktoken’s cl100k_base encoding (GPT-4, GPT-3.5-turbo, ada-002). Falls back to character-based estimation (~4 chars per token) if tiktoken is unavailable.
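
The character-based fallback and the per-1000-token pricing described above can be illustrated with a small sketch. Everything here is an assumption for illustration: `estimate_tokens`, `Pricing`, its field names, and `cost_usd` are hypothetical, rounding up is a guess, and the crate's actual `TokenCounter` and `ModelPricing` may differ.

```rust
/// Rough token estimate at about 4 characters per token.
/// Rounding up is an assumption; the crate may round differently.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical pricing record: dollars per 1000 tokens.
/// Field names are illustrative, not the crate's.
struct Pricing {
    input_per_1k: f64,
    output_per_1k: f64,
}

/// Dollar cost of a request given input and output token counts.
fn cost_usd(p: &Pricing, input_tokens: usize, output_tokens: usize) -> f64 {
    (input_tokens as f64 / 1000.0) * p.input_per_1k
        + (output_tokens as f64 / 1000.0) * p.output_per_1k
}
```

Under these assumptions, a 12-character prompt such as "What is 2+2?" is estimated at 3 tokens, and the cost a cache hit avoids is the cost of the request that was not sent.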

Enums§

CacheError
CacheLayer
DistanceMetric
Distance metric for vector similarity/distance computation.
EvictionStrategy
Eviction strategy for cache entries.

Type Aliases§

Result