Module cache

§Caching Module

This module provides the caching layer for the Ultrafast Models SDK. It supports both in-memory and distributed backends with automatic TTL-based expiration and deterministic, hash-based cache keys, reducing provider API calls and improving response times.

§Overview

The caching system provides:

  • Multiple Cache Backends: In-memory and distributed caching
  • Automatic Expiration: TTL-based cache invalidation
  • Intelligent Key Generation: Hash-based cache keys for consistency
  • Performance Optimization: Reduces API calls and improves response times
  • Thread Safety: Safe concurrent access across multiple threads
  • Cache Statistics: Hit rates, memory usage, and performance metrics

§Cache Backends

§In-Memory Cache

Fast local caching suitable for single-instance deployments; a sketch of the expiration semantics follows the list:

  • Low Latency: Sub-millisecond access times
  • Memory Efficient: LRU eviction with configurable size limits
  • Automatic Cleanup: Expired entries removed automatically
  • Thread Safe: Concurrent access support with mutex protection
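To make those semantics concrete, here is a minimal, SDK-independent sketch of a TTL-aware store: reads treat expired entries as misses and clean them up lazily. The TtlMap type below is illustrative only; the SDK's InMemoryCache layers LRU eviction on top of this idea.

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative TTL-aware store; InMemoryCache adds LRU eviction on top.
struct TtlMap<V> {
    entries: HashMap<String, (V, Instant)>, // value paired with its expiry instant
}

impl<V> TtlMap<V> {
    fn get(&mut self, key: &str) -> Option<&V> {
        let expired = match self.entries.get(key) {
            Some((_, expires_at)) => *expires_at <= Instant::now(),
            None => return None,
        };
        if expired {
            // Expired entries count as misses and are removed lazily on access
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value)
    }

    fn set(&mut self, key: String, value: V, ttl: Duration) {
        // A full implementation would evict the least recently used entry
        // here once a configured capacity is reached.
        self.entries.insert(key, (value, Instant::now() + ttl));
    }
}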

§Distributed Cache

Hybrid caching for multi-instance deployments; the read path is sketched after the list:

  • Local + Distributed: Combines in-memory and distributed storage
  • Shared State: Cache shared across multiple instances
  • Fallback Mechanism: Local cache as backup for distributed cache
  • Consistency: Eventual consistency with local caching
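The read path is essentially a read-through lookup with local backfill: a local hit returns immediately, a remote hit repopulates the local tier, and a miss in both tiers falls through to the provider. The Store trait below is an illustrative stand-in, not the SDK's Cache trait:

use std::time::Duration;

// Illustrative two-tier lookup; DistributedCache wires this up internally.
trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: Vec<u8>, ttl: Duration);
}

/// Try the fast local tier first, then the shared tier,
/// backfilling the local tier on a remote hit.
fn get_with_fallback(
    local: &dyn Store,
    remote: &dyn Store,
    key: &str,
    ttl: Duration,
) -> Option<Vec<u8>> {
    if let Some(value) = local.get(key) {
        return Some(value); // local hit: the sub-millisecond path
    }
    if let Some(value) = remote.get(key) {
        local.set(key, value.clone(), ttl); // backfill so the next read is local
        return Some(value);
    }
    None // miss in both tiers: the caller falls through to the provider API
}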

§Cache Key Strategy

The system generates deterministic, hash-based cache keys (see the sketch after this list):

  • Chat Completions: chat:{model}:{messages_hash}
  • Embeddings: embedding:{model}:{input_hash}
  • Image Generation: image:{model}:{prompt_hash}
  • Consistent Hashing: Deterministic keys for identical requests
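A rough sketch of how such keys can be derived. Note that std's DefaultHasher stands in here for whatever hash CacheKeyBuilder actually uses; it is not stable across Rust versions, so a production key builder would prefer a stable hash such as SHA-256:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Build a key of the form chat:{model}:{messages_hash}.
/// Identical model + messages always produce the same key.
fn build_chat_key(model: &str, messages: &[String]) -> String {
    let mut hasher = DefaultHasher::new();
    messages.hash(&mut hasher);
    format!("chat:{}:{:x}", model, hasher.finish())
}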

§Usage Examples

§Basic Caching

use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, InMemoryCache};
use ultrafast_models_sdk::models::{ChatRequest, ChatResponse, Message};
use std::time::Duration;

// Create in-memory cache
let cache = InMemoryCache::new(1000); // 1000 entries capacity

// Create a chat request
let request = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Hello, world!")],
    ..Default::default()
};

// Generate cache key
let cache_key = CacheKeyBuilder::build_chat_key(&request);

// Check cache first
if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit: {}", cached.response.choices[0].message.content);
    return Ok(cached.response);
}

// Cache miss: make the API call (`provider` is assumed in scope) and cache the result
let response = provider.chat_completion(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(3600));
cache.set(&cache_key, cached_response, Duration::from_secs(3600));

Ok(response)

§Distributed Caching

use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, DistributedCache};
use std::time::Duration;

// Create distributed cache
let cache = DistributedCache::new(500); // 500 local entries

// Cache operations work the same way
let cache_key = CacheKeyBuilder::build_embedding_key("text-embedding-ada-002", "Hello");

if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit from distributed cache");
    return Ok(cached.response);
}

// Cache miss - make the API call (`provider` and `request` assumed in scope)
let response = provider.embedding(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(1800));
cache.set(&cache_key, cached_response, Duration::from_secs(1800));

Ok(response)

§Cache Configuration

use ultrafast_models_sdk::cache::{CacheConfig, CacheType};
use ultrafast_models_sdk::UltrafastClient; // import path assumed
use std::time::Duration;

let config = CacheConfig {
    enabled: true,
    ttl: Duration::from_secs(3600), // 1 hour TTL
    max_size: 1000,
    cache_type: CacheType::InMemory,
};

// Use with client
let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_cache(config)
    .build()?;

§Performance Benefits

The caching system provides significant performance improvements:

  • Reduced Latency: Cached responses served in <1ms
  • Lower Costs: Fewer API calls to providers
  • Improved Throughput: Higher request handling capacity
  • Better User Experience: Faster response times
  • Reduced Load: Less stress on provider APIs

§Cache Invalidation

The system supports multiple invalidation strategies; pattern-based removal is sketched after the list:

  • TTL-based: Automatic expiration after configured time
  • Manual Invalidation: Explicit cache entry removal
  • Pattern-based: Remove entries matching patterns
  • Full Clear: Clear entire cache (admin only)
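Pattern-based removal can be sketched generically over the key scheme above; the plain HashMap store below is illustrative, not the SDK's Cache trait:

use std::collections::HashMap;

/// Remove every entry whose key starts with the given prefix, e.g.
/// invalidate_prefix(&mut store, "chat:gpt-4:") to drop all gpt-4 chat entries.
fn invalidate_prefix<V>(store: &mut HashMap<String, V>, prefix: &str) {
    store.retain(|key, _| !key.starts_with(prefix));
}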

§Best Practices

  • Appropriate TTL: Set TTL based on response freshness requirements
  • Monitor Hit Rates: Track cache effectiveness and adjust accordingly (a tracking sketch follows this list)
  • Memory Management: Configure appropriate cache sizes for your workload
  • Key Design: Use consistent and unique cache keys
  • Error Handling: Implement fallback for cache failures
  • Monitoring: Track cache performance and memory usage
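Hit-rate tracking can be bolted on around any cache with two atomic counters; this sketch is independent of whatever statistics the SDK itself exposes:

use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    fn record(&self, hit: bool) {
        let counter = if hit { &self.hits } else { &self.misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of lookups served from cache; use this to tune TTL and capacity.
    fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let total = hits + self.misses.load(Ordering::Relaxed) as f64;
        if total == 0.0 { 0.0 } else { hits / total }
    }
}

Call record(true) on every cache hit and record(false) on every miss, then alert or resize when hit_rate() drops below your target.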

Structs§

CacheConfig
Configuration for cache behavior.
CacheKeyBuilder
Utility for generating consistent cache keys.
CachedResponse
Cached response with metadata.
DistributedCache
Distributed cache implementation with local fallback.
InMemoryCache
In-memory cache implementation using LRU eviction.

Enums§

CacheType
Available cache backend types.

Traits§

Cache
Trait for cache implementations.