Module cache

§Caching Module

This module provides the caching layer for the Ultrafast Models SDK. It supports both in-memory and distributed backends with automatic TTL-based expiration and deterministic, hash-based cache keys, reducing provider API calls and improving response times.

§Overview

The caching system provides:

  • Multiple Cache Backends: In-memory and distributed caching
  • Automatic Expiration: TTL-based cache invalidation
  • Intelligent Key Generation: Hash-based cache keys for consistency
  • Performance Optimization: Reduces API calls and improves response times
  • Thread Safety: Safe concurrent access across multiple threads
  • Cache Statistics: Hit rates, memory usage, and performance metrics

§Cache Backends

§In-Memory Cache

Fast local caching suitable for single-instance deployments; a sketch of the expiration semantics follows the list:

  • Low Latency: Sub-millisecond access times
  • Memory Efficient: LRU eviction with configurable size limits
  • Automatic Cleanup: Expired entries removed automatically
  • Thread Safe: Concurrent access support with mutex protection
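To make those semantics concrete, here is a minimal, SDK-independent sketch of a TTL-aware store: reads treat expired entries as misses and clean them up lazily. The TtlMap type below is illustrative only; the SDK's InMemoryCache layers LRU eviction on top of this idea.

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative TTL-aware store; InMemoryCache adds LRU eviction on top.
struct TtlMap<V> {
    entries: HashMap<String, (V, Instant)>, // value paired with its expiry instant
}

impl<V> TtlMap<V> {
    fn get(&mut self, key: &str) -> Option<&V> {
        let expired = match self.entries.get(key) {
            Some((_, expires_at)) => *expires_at <= Instant::now(),
            None => return None,
        };
        if expired {
            // Expired entries count as misses and are removed lazily on access
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value)
    }

    fn set(&mut self, key: String, value: V, ttl: Duration) {
        // A full implementation would evict the least recently used entry
        // here once a configured capacity is reached.
        self.entries.insert(key, (value, Instant::now() + ttl));
    }
}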

§Distributed Cache

Hybrid caching for multi-instance deployments; the read path is sketched after the list:

  • Local + Distributed: Combines in-memory and distributed storage
  • Shared State: Cache shared across multiple instances
  • Fallback Mechanism: Local cache as backup for distributed cache
  • Consistency: Eventual consistency with local caching
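The read path is essentially a read-through lookup with local backfill: a local hit returns immediately, a remote hit repopulates the local tier, and a miss in both tiers falls through to the provider. The Store trait below is an illustrative stand-in, not the SDK's Cache trait:

use std::time::Duration;

// Illustrative two-tier lookup; DistributedCache wires this up internally.
trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: Vec<u8>, ttl: Duration);
}

/// Try the fast local tier first, then the shared tier,
/// backfilling the local tier on a remote hit.
fn get_with_fallback(
    local: &dyn Store,
    remote: &dyn Store,
    key: &str,
    ttl: Duration,
) -> Option<Vec<u8>> {
    if let Some(value) = local.get(key) {
        return Some(value); // local hit: the sub-millisecond path
    }
    if let Some(value) = remote.get(key) {
        local.set(key, value.clone(), ttl); // backfill so the next read is local
        return Some(value);
    }
    None // miss in both tiers: the caller falls through to the provider API
}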

§Cache Key Strategy

The system generates deterministic, hash-based cache keys (see the sketch after this list):

  • Chat Completions: chat:{model}:{messages_hash}
  • Embeddings: embedding:{model}:{input_hash}
  • Image Generation: image:{model}:{prompt_hash}
  • Consistent Hashing: Deterministic keys for identical requests
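A rough sketch of how such keys can be derived. Note that std's DefaultHasher stands in here for whatever hash CacheKeyBuilder actually uses; it is not stable across Rust versions, so a production key builder would prefer a stable hash such as SHA-256:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Build a key of the form chat:{model}:{messages_hash}.
/// Identical model + messages always produce the same key.
fn build_chat_key(model: &str, messages: &[String]) -> String {
    let mut hasher = DefaultHasher::new();
    messages.hash(&mut hasher);
    format!("chat:{}:{:x}", model, hasher.finish())
}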

§Usage Examples

§Basic Caching

use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, InMemoryCache};
use ultrafast_models_sdk::models::{ChatRequest, ChatResponse, Message};
use std::time::Duration;

// Create in-memory cache
let cache = InMemoryCache::new(1000); // 1000 entries capacity

// Create a chat request
let request = ChatRequest {
    model: "gpt-4".to_string(),
    messages: vec![Message::user("Hello, world!")],
    ..Default::default()
};

// Generate cache key
let cache_key = CacheKeyBuilder::build_chat_key(&request);

// Check cache first
if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit: {}", cached.response.choices[0].message.content);
    return Ok(cached.response);
}

// Cache miss: make the API call (`provider` is assumed in scope) and cache the result
let response = provider.chat_completion(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(3600));
cache.set(&cache_key, cached_response, Duration::from_secs(3600));

Ok(response)

§Distributed Caching

use ultrafast_models_sdk::cache::{Cache, CacheKeyBuilder, CachedResponse, DistributedCache};
use std::time::Duration;

// Create distributed cache
let cache = DistributedCache::new(500); // 500 local entries

// Cache operations work the same way
let cache_key = CacheKeyBuilder::build_embedding_key("text-embedding-ada-002", "Hello");

if let Some(cached) = cache.get(&cache_key) {
    println!("Cache hit from distributed cache");
    return Ok(cached.response);
}

// Cache miss - make the API call (`provider` and `request` assumed in scope)
let response = provider.embedding(request).await?;
let cached_response = CachedResponse::new(response.clone(), Duration::from_secs(1800));
cache.set(&cache_key, cached_response, Duration::from_secs(1800));

Ok(response)

§Cache Configuration

use ultrafast_models_sdk::cache::{CacheConfig, CacheType};
use ultrafast_models_sdk::UltrafastClient; // import path assumed
use std::time::Duration;

let config = CacheConfig {
    enabled: true,
    ttl: Duration::from_secs(3600), // 1 hour TTL
    max_size: 1000,
    cache_type: CacheType::InMemory,
};

// Use with client
let client = UltrafastClient::standalone()
    .with_openai("your-key")
    .with_cache(config)
    .build()?;

§Performance Benefits

The caching system provides significant performance improvements:

  • Reduced Latency: Cached responses served in <1ms
  • Lower Costs: Fewer API calls to providers
  • Improved Throughput: Higher request handling capacity
  • Better User Experience: Faster response times
  • Reduced Load: Less stress on provider APIs

§Cache Invalidation

The system supports multiple invalidation strategies; pattern-based removal is sketched after the list:

  • TTL-based: Automatic expiration after configured time
  • Manual Invalidation: Explicit cache entry removal
  • Pattern-based: Remove entries matching patterns
  • Full Clear: Clear entire cache (admin only)
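Pattern-based removal can be sketched generically over the key scheme above; the plain HashMap store below is illustrative, not the SDK's Cache trait:

use std::collections::HashMap;

/// Remove every entry whose key starts with the given prefix, e.g.
/// invalidate_prefix(&mut store, "chat:gpt-4:") to drop all gpt-4 chat entries.
fn invalidate_prefix<V>(store: &mut HashMap<String, V>, prefix: &str) {
    store.retain(|key, _| !key.starts_with(prefix));
}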

§Best Practices

  • Appropriate TTL: Set TTL based on response freshness requirements
  • Monitor Hit Rates: Track cache effectiveness and adjust accordingly (a tracking sketch follows this list)
  • Memory Management: Configure appropriate cache sizes for your workload
  • Key Design: Use consistent and unique cache keys
  • Error Handling: Implement fallback for cache failures
  • Monitoring: Track cache performance and memory usage
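Hit-rate tracking can be bolted on around any cache with two atomic counters; this sketch is independent of whatever statistics the SDK itself exposes:

use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    fn record(&self, hit: bool) {
        let counter = if hit { &self.hits } else { &self.misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of lookups served from cache; use this to tune TTL and capacity.
    fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let total = hits + self.misses.load(Ordering::Relaxed) as f64;
        if total == 0.0 { 0.0 } else { hits / total }
    }
}

Call record(true) on every cache hit and record(false) on every miss, then alert or resize when hit_rate() drops below your target.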

Structs§

CacheConfig
Configuration for cache behavior.
CacheKeyBuilder
Utility for generating consistent cache keys.
CachedResponse
Cached response with metadata.
DistributedCache
Distributed cache implementation with local fallback.
InMemoryCache
In-memory cache implementation using LRU eviction.

Enums§

CacheType
Available cache backend types.

Traits§

Cache
Trait for cache implementations.