§Caching Module
This module provides caching for the Ultrafast Models SDK. It supports both in-memory and distributed caching with automatic expiration and deterministic, hash-based cache key generation, reducing provider API calls and response latency.
§Overview
The caching system provides:
- Multiple Cache Backends: In-memory and distributed caching
- Automatic Expiration: TTL-based cache invalidation
- Intelligent Key Generation: Hash-based cache keys for consistency
- Performance Optimization: Reduces API calls and improves response times
- Thread Safety: Safe concurrent access across multiple threads
- Cache Statistics: Hit rates, memory usage, and performance metrics
§Cache Backends
§In-Memory Cache
Fast local caching suitable for single-instance deployments:
- Low Latency: Sub-millisecond access times
- Memory Efficient: LRU eviction with configurable size limits (see the sketch after this list)
- Automatic Cleanup: Expired entries removed automatically
- Thread Safe: Concurrent access support with mutex protection
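A minimal sketch of the LRU behavior. The get/set signatures mirror the usage examples below; the tiny capacity, the literal string keys, and the lru_demo wrapper are illustrative assumptions, not part of the SDK:
use std::time::Duration;
use ultrafast_models_sdk::cache::{Cache, CachedResponse, InMemoryCache};
use ultrafast_models_sdk::models::ChatResponse;

// `resp` is a response obtained from an earlier provider call.
fn lru_demo(resp: ChatResponse) {
    let ttl = Duration::from_secs(60);
    // Capacity of 2: inserting a third entry evicts the least
    // recently used of the first two.
    let cache = InMemoryCache::new(2);
    cache.set("key-a", CachedResponse::new(resp.clone(), ttl), ttl);
    cache.set("key-b", CachedResponse::new(resp.clone(), ttl), ttl);

    // Reading "key-a" marks it as recently used, so "key-b" is now
    // the eviction candidate.
    let _ = cache.get("key-a");

    cache.set("key-c", CachedResponse::new(resp, ttl), ttl);
    assert!(cache.get("key-b").is_none()); // evicted by LRU
}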
§Distributed Cache
Hybrid caching for multi-instance deployments:
- Local + Distributed: Combines in-memory and distributed storage
- Shared State: Cache shared across multiple instances
- Fallback Mechanism: Local cache as backup for distributed cache
- Consistency: Eventual consistency with local caching
§Cache Key Strategy
The system uses intelligent cache key generation:
- Chat Completions: chat:{model}:{messages_hash}
- Embeddings: embedding:{model}:{input_hash}
- Image Generation: image:{model}:{prompt_hash}
- Consistent Hashing: Deterministic keys for identical requests (see the sketch below)
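Determinism is what makes the cache effective: repeated identical requests map to the same entry. A minimal sketch of that property, assuming the CacheKeyBuilder API used in the examples below:
use ultrafast_models_sdk::cache::CacheKeyBuilder;
use ultrafast_models_sdk::models::{ChatRequest, Message};

// Two structurally identical requests...
let request_a = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Hello, world!")],
    ..Default::default()
};
let request_b = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Hello, world!")],
    ..Default::default()
};

// ...hash to the same key, so the second request is a cache hit.
assert_eq!(
    CacheKeyBuilder::build_chat_key(&request_a),
    CacheKeyBuilder::build_chat_key(&request_b)
);

// Any change to the payload produces a different key.
let request_c = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Goodbye, world!")],
    ..Default::default()
};
assert_ne!(
    CacheKeyBuilder::build_chat_key(&request_a),
    CacheKeyBuilder::build_chat_key(&request_c)
);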
§Usage Examples
§Basic Caching
use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, InMemoryCache};
use ultrafast_models_sdk::models::{ChatRequest, ChatResponse, Message};
use std::time::Duration;

// Create an in-memory cache with capacity for 1000 entries
let cache = InMemoryCache::new(1000);

// Create a chat request
let request = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Hello, world!")],
    ..Default::default()
};

// Generate a deterministic cache key from the request
let cache_key = CacheKeyBuilder::build_chat_key(&request);

// Check the cache first
if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit: {}", cached.response.choices[0].message.content);
    return Ok(cached.response);
}

// Cache miss: make the API call and cache the result for one hour
let response = provider.chat_completion(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(3600));
cache.set(&cache_key, cached_response, Duration::from_secs(3600));
Ok(response)

§Distributed Caching
use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, DistributedCache};
use std::time::Duration;

// Create a distributed cache with a 500-entry local tier
let cache = DistributedCache::new(500);

// Cache operations work the same way as with the in-memory cache
let cache_key = CacheKeyBuilder::build_embedding_key("text-embedding-ada-002", "Hello");
if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit from distributed cache");
    return Ok(cached.response);
}

// Cache miss: make the API call and cache the result for 30 minutes
let response = provider.embedding(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(1800));
cache.set(&cache_key, cached_response, Duration::from_secs(1800));
Ok(response)

§Cache Configuration
use ultrafast_models_sdk::cache::{CacheConfig, CacheType};
use ultrafast_models_sdk::UltrafastClient;
use std::time::Duration;

let config = CacheConfig {
    enabled: true,
    ttl: Duration::from_secs(3600), // 1 hour TTL
    max_size: 1000,
    cache_type: CacheType::InMemory,
};

// Use with the client
let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_cache(config)
    .build()?;

§Performance Benefits
The caching system provides significant performance improvements:
- Reduced Latency: Cached responses served in <1ms
- Lower Costs: Fewer API calls to providers
- Improved Throughput: Higher request handling capacity
- Better User Experience: Faster response times
- Reduced Load: Less stress on provider APIs
§Cache Invalidation
The system supports multiple invalidation strategies:
- TTL-based: Automatic expiration after configured time
- Manual Invalidation: Explicit cache entry removal (see the sketch after this list)
- Pattern-based: Remove entries matching patterns
- Full Clear: Clear entire cache (admin only)
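A sketch of the non-TTL strategies. Only get and set appear in the examples above, so the method names used below (remove, remove_matching, clear) are assumptions for illustration; check the Cache trait for the actual API:
use ultrafast_models_sdk::cache::{Cache, InMemoryCache};

let cache = InMemoryCache::new(1000);

// TTL-based invalidation needs no call: entries created with
// CachedResponse::new(response, ttl) expire on their own.

// Manual invalidation (assumed method name `remove`): drop a single
// entry after the underlying data changes.
cache.remove("chat:gpt-4:abc123");

// Pattern-based invalidation (assumed method name `remove_matching`),
// e.g. drop every cached gpt-4 completion after a model upgrade.
cache.remove_matching("chat:gpt-4:*");

// Full clear (assumed method name `clear`, admin only): wipe the cache.
cache.clear();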
§Best Practices
- Appropriate TTL: Set TTL based on response freshness requirements
- Monitor Hit Rates: Track cache effectiveness and adjust accordingly
- Memory Management: Configure appropriate cache sizes for your workload
- Key Design: Use consistent and unique cache keys
- Error Handling: Implement fallback for cache failures (see the sketch after this list)
- Monitoring: Track cache performance and memory usage
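A sketch tying two of these practices together: hit-rate monitoring and falling back to the provider so a cache miss never fails a request. The counters, the cached_chat helper, the fetch closure, and the trait bounds are illustrative assumptions; only get, set, build_chat_key, and CachedResponse::new come from the examples above:
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;
use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse};
use ultrafast_models_sdk::models::{ChatRequest, ChatResponse};

// Illustrative hit/miss counters; not part of the SDK.
static HITS: AtomicU64 = AtomicU64::new(0);
static MISSES: AtomicU64 = AtomicU64::new(0);

// Serve from the cache when possible; otherwise run `fetch` (the real
// provider call) and store the result.
async fn cached_chat<C, F, Fut, E>(
    cache: &C,
    request: ChatRequest,
    fetch: F,
) -> Result<ChatResponse, E>
where
    C: Cache,
    F: FnOnce(ChatRequest) -> Fut,
    Fut: std::future::Future<Output = Result<ChatResponse, E>>,
{
    let key = CacheKeyBuilder::build_chat_key(&request);
    if let Some(cached) = cache.get(&key) {
        HITS.fetch_add(1, Ordering::Relaxed);
        return Ok(cached.response);
    }
    MISSES.fetch_add(1, Ordering::Relaxed);

    let response = fetch(request).await?;
    let ttl = Duration::from_secs(3600); // tune to your freshness requirements
    cache.set(&key, CachedResponse::new(response.clone(), ttl), ttl);
    Ok(response)
}

// Report the hit rate periodically to judge whether TTL and capacity
// fit the workload.
fn report_hit_rate() {
    let hits = HITS.load(Ordering::Relaxed);
    let total = hits + MISSES.load(Ordering::Relaxed);
    if total > 0 {
        println!("cache hit rate: {:.1}%", 100.0 * hits as f64 / total as f64);
    }
}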
Structs§
- CacheConfig - Configuration for cache behavior.
- CacheKeyBuilder - Utility for generating consistent cache keys.
- CachedResponse - Cached response with metadata.
- DistributedCache - Distributed cache implementation with local fallback.
- InMemoryCache - In-memory cache implementation using LRU eviction.
Enums§
- CacheType - Available cache backend types.
Traits§
- Cache - Trait for cache implementations.