Cachelito

A lightweight, thread-safe caching library for Rust that provides automatic memoization through procedural macros.

Features

  • 🚀 Easy to use: Simply add the #[cache] attribute to any function or method
  • 🔒 Thread-safe: Uses thread_local! storage for cache isolation by default
  • 🌐 Global scope: Optional global cache shared across all threads with scope = "global"
  • High-performance synchronization: Uses parking_lot::RwLock for global caches, enabling concurrent reads
  • 🎯 Flexible key generation: Supports custom cache key implementations
  • 🎨 Result-aware: Intelligently caches only successful Result::Ok values
  • 🗑️ Cache limits: Control memory usage with configurable cache size limits
  • 📊 Eviction policies: Choose between FIFO (First In, First Out) and LRU (Least Recently Used)
  • ⏱️ TTL support: Time-to-live expiration for automatic cache invalidation
  • Type-safe: Full compile-time type checking
  • 📦 Minimal dependencies: Uses parking_lot for optimal performance

Quick Start

Add this to your Cargo.toml:

[dependencies]
cachelito = "0.5.0"

Usage

Basic Function Caching

use cachelito::cache;

#[cache]
fn fibonacci(n: u32) -> u64 {
    if n <= 1 {
        return n as u64;
    }
    fibonacci(n - 1) + fibonacci(n - 2)
}

fn main() {
    // First call computes the result
    let result1 = fibonacci(10);

    // Second call returns cached result instantly
    let result2 = fibonacci(10);

    assert_eq!(result1, result2);
}

Caching with Methods

The #[cache] attribute also works with methods:

use cachelito::cache;
use cachelito_core::DefaultCacheableKey;

#[derive(Debug, Clone)]
struct Calculator {
    precision: u32,
}

impl DefaultCacheableKey for Calculator {}

impl Calculator {
    #[cache]
    fn compute(&self, x: f64, y: f64) -> f64 {
        // Expensive computation
        x.powf(y) * self.precision as f64
    }
}

Custom Cache Keys

For complex types, you can implement custom cache key generation:

Option 1: Use Default Debug-based Key

use cachelito_core::DefaultCacheableKey;

#[derive(Debug, Clone)]
struct Product {
    id: u32,
    name: String,
}

// Enable default cache key generation based on Debug
impl DefaultCacheableKey for Product {}
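
With this in place, Product can be passed to a cached function and its Debug output becomes the cache key. A minimal sketch reusing the Product type above (lookup_price and its body are hypothetical):

use cachelito::cache;

// Product and its DefaultCacheableKey impl are defined above.
#[cache]
fn lookup_price(product: Product) -> f64 {
    // Expensive lookup stands in here
    product.id as f64 * 1.5
}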

Option 2: Custom Key Implementation

use cachelito_core::CacheableKey;

#[derive(Debug, Clone)]
struct User {
    id: u64,
    name: String,
}

// More efficient custom key implementation
impl CacheableKey for User {
    fn to_cache_key(&self) -> String {
        format!("user:{}", self.id)
    }
}
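
Because the key is "user:{id}", any two User values with the same id map to the same cache entry, regardless of their other fields. A minimal usage sketch building on the User type above (get_display_name is hypothetical):

use cachelito::cache;

// User and its CacheableKey impl are defined above.
#[cache]
fn get_display_name(user: User) -> String {
    // Expensive lookup or formatting stands in here
    format!("{} (#{})", user.name, user.id)
}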

Caching Result Types

For functions returning Result<T, E>, only successful Ok results are cached:

use cachelito::cache;

#[cache]
fn divide(a: i32, b: i32) -> Result<i32, String> {
    if b == 0 {
        Err("Division by zero".to_string())
    } else {
        Ok(a / b)
    }
}

fn main() {
    // Ok results are cached
    let _ = divide(10, 2); // Computes and caches Ok(5)
    let _ = divide(10, 2); // Returns cached Ok(5)

    // Err results are NOT cached (will retry each time)
    let _ = divide(10, 0); // Returns Err, not cached
    let _ = divide(10, 0); // Computes again, returns Err
}

Cache Limits and Eviction Policies

Control memory usage by setting cache limits and choosing an eviction policy:

FIFO (First In, First Out) - Default

use cachelito::cache;

// Cache with a limit of 100 entries using FIFO eviction
#[cache(limit = 100, policy = "fifo")]
fn expensive_computation(x: i32) -> i32 {
    // When cache is full, oldest entry is evicted
    x * x
}

// FIFO is the default policy, so this is equivalent:
#[cache(limit = 100)]
fn another_computation(x: i32) -> i32 {
    x * x
}

LRU (Least Recently Used)

use cachelito::cache;

// Cache with a limit of 100 entries using LRU eviction
#[cache(limit = 100, policy = "lru")]
fn expensive_computation(x: i32) -> i32 {
    // When cache is full, least recently accessed entry is evicted
    // Accessing a cached value moves it to the end of the queue
    x * x
}

Key Differences:

  • FIFO: Evicts the oldest inserted entry, regardless of usage
  • LRU: Evicts the least recently accessed entry, keeping frequently used items longer
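
The difference is easiest to see with a tiny cache and an execution counter. A sketch illustrating the documented semantics (the expected call count assumes the LRU reordering described above):

use cachelito::cache;
use std::sync::atomic::{AtomicUsize, Ordering};

static CALLS: AtomicUsize = AtomicUsize::new(0);

#[cache(limit = 2, policy = "lru")]
fn square(x: i32) -> i32 {
    CALLS.fetch_add(1, Ordering::Relaxed);
    x * x
}

fn main() {
    square(1); // miss -> cache holds {1}
    square(2); // miss -> cache holds {1, 2}
    square(1); // hit  -> 1 becomes the most recently used entry
    square(3); // miss -> evicts 2 under LRU (1 would be evicted under FIFO)
    square(1); // hit under LRU; under FIFO this would recompute
    assert_eq!(CALLS.load(Ordering::Relaxed), 3); // 4 under FIFO
}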

Time-To-Live (TTL) Expiration

Set automatic expiration times for cached entries:

use cachelito::cache;

// Cache entries expire after 60 seconds
#[cache(ttl = 60)]
fn fetch_user_data(user_id: u32) -> UserData {
    // Entries older than 60 seconds are automatically removed
    // when accessed
    fetch_from_database(user_id)
}

// Combine TTL with limits and policies
#[cache(limit = 100, policy = "lru", ttl = 300)]
fn api_call(endpoint: &str) -> Result<Response, Error> {
    // Max 100 entries, LRU eviction, 5 minute TTL
    make_http_request(endpoint)
}

Benefits:

  • Automatic expiration: Old data is automatically removed
  • Per-entry tracking: Each entry has its own timestamp
  • Lazy eviction: Expired entries removed on access
  • Works with policies: Compatible with FIFO and LRU
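
A small sketch of the expiration behavior, assuming ttl is expressed in seconds as in the examples above (current_config and its body are hypothetical):

use cachelito::cache;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::sleep;
use std::time::Duration;

static CALLS: AtomicUsize = AtomicUsize::new(0);

// Entries expire one second after insertion
#[cache(ttl = 1)]
fn current_config(key: u32) -> String {
    CALLS.fetch_add(1, Ordering::Relaxed);
    format!("config-{}", key)
}

fn main() {
    current_config(7);             // miss: computed and cached
    current_config(7);             // hit: served from the cache
    sleep(Duration::from_secs(2)); // wait past the 1-second TTL
    current_config(7);             // expired entry is removed lazily and recomputed
    assert_eq!(CALLS.load(Ordering::Relaxed), 2);
}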

Global Scope Cache

By default, each thread has its own cache (thread-local). Use scope = "global" to share the cache across all threads:

use cachelito::cache;

// Thread-local cache (default) - each thread has its own cache
#[cache(limit = 100)]
fn thread_local_computation(x: i32) -> i32 {
    // Cache is NOT shared across threads
    x * x
}

// Global cache - shared across all threads
#[cache(limit = 100, scope = "global")]
fn global_computation(x: i32) -> i32 {
    // Cache IS shared across all threads
    // Uses parking_lot locks (RwLock + Mutex) for thread-safe access
    x * x
}

When to use global scope:

  • Cross-thread sharing: When you want all threads to benefit from cached results
  • Expensive operations: When the cost of computation outweighs the synchronization overhead
  • Shared data: When the same function is called with the same arguments across multiple threads

Performance considerations:

  • Thread-local (default): No synchronization overhead, but cache is not shared
  • Global: Synchronized with parking_lot locks (RwLock for the map, Mutex for the eviction queue); adds some overhead but shares the cache across all threads

Example:

use cachelito::cache;
use std::thread;

#[cache(scope = "global", limit = 50)]
fn expensive_api_call(endpoint: &str) -> String {
    // This expensive call is cached globally
    // All threads benefit from the same cache
    format!("Response from {}", endpoint)
}

fn main() {
    let handles: Vec<_> = (0..10)
        .map(|_| {
            thread::spawn(move || {
                // All threads share the same cache
                // First thread computes, others get cached result
                expensive_api_call("/api/users")
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}

Performance with Large Values

The cache clones values on every get operation. For large values (big structs, vectors, strings), this can be expensive. Wrap your return values in Arc<T> to share ownership without copying data:

Problem: Expensive Cloning

use cachelito::cache;

#[derive(Clone, Debug)]
struct LargeData {
    payload: Vec<u8>, // Could be megabytes of data
    metadata: String,
}

#[cache(limit = 100)]
fn process_data(id: u32) -> LargeData {
    LargeData {
        payload: vec![0u8; 1_000_000], // 1MB of data
        metadata: format!("Data for {}", id),
    }
}

fn main() {
    // First call: computes and caches (1MB allocation)
    let data1 = process_data(42);

    // Second call: clones the ENTIRE 1MB! (expensive)
    let data2 = process_data(42);
}

Solution: Use Arc

use cachelito::cache;
use std::sync::Arc;

#[derive(Debug)]
struct LargeData {
    payload: Vec<u8>,
    metadata: String,
}

// Return Arc instead of the value directly
#[cache(limit = 100)]
fn process_data(id: u32) -> Arc<LargeData> {
    Arc::new(LargeData {
        payload: vec![0u8; 1_000_000], // 1MB of data
        metadata: format!("Data for {}", id),
    })
}

fn main() {
    // First call: computes and caches Arc (1MB allocation)
    let data1 = process_data(42);

    // Second call: clones only the Arc pointer (cheap!)
    // The 1MB payload is NOT cloned
    let data2 = process_data(42);

    // Both Arc point to the same underlying data
    assert!(Arc::ptr_eq(&data1, &data2));
}

Real-World Example: Caching Parsed Data

use cachelito::cache;
use std::sync::Arc;

#[derive(Debug)]
struct ParsedDocument {
    title: String,
    content: String,
    tokens: Vec<String>,
    word_count: usize,
}

// Cache expensive parsing operations
#[cache(limit = 50, policy = "lru", ttl = 3600)]
fn parse_document(file_path: &str) -> Arc<ParsedDocument> {
    // Expensive parsing operation
    let content = std::fs::read_to_string(file_path).unwrap();
    let tokens: Vec<String> = content
        .split_whitespace()
        .map(|s| s.to_string())
        .collect();

    Arc::new(ParsedDocument {
        title: extract_title(&content),
        content,
        word_count: tokens.len(),
        tokens,
    })
}

fn analyze_document(path: &str) {
    // First access: parses file (expensive)
    let doc = parse_document(path);
    println!("Title: {}", doc.title);

    // Subsequent accesses: returns Arc clone (cheap)
    let doc2 = parse_document(path);
    println!("Words: {}", doc2.word_count);

    // The underlying ParsedDocument is shared, not cloned
}

When to Use Arc

Use Arc when:

  • ✅ Values are large (>1KB)
  • ✅ Values contain collections (Vec, HashMap, String)
  • ✅ Values are frequently accessed from cache
  • ✅ Multiple parts of your code need access to the same data

Don't need Arc when:

  • ❌ Values are small primitives (i32, f64, bool)
  • ❌ Values are rarely accessed from cache
  • ❌ Clone is already cheap (e.g., types with Copy trait)

Combining Arc with Global Scope

For maximum efficiency with multi-threaded applications:

use cachelito::cache;
use std::sync::Arc;
use std::thread;

#[cache(scope = "global", limit = 100, policy = "lru")]
fn fetch_user_profile(user_id: u64) -> Arc<UserProfile> {
    // Expensive database or API call
    Arc::new(UserProfile::fetch_from_db(user_id))
}

fn main() {
    let handles: Vec<_> = (0..10)
        .map(|_| {
            thread::spawn(move || {
                // All threads share the global cache
                // Cloning Arc is cheap across threads
                let profile = fetch_user_profile(42);
                println!("User: {}", profile.name);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}

Benefits:

  • 🚀 Only one database/API call across all threads
  • 💾 Minimal memory overhead (Arc clones are just pointer + ref count)
  • 🔒 Thread-safe sharing with minimal synchronization cost
  • ⚡ Fast cache access with no data copying

Synchronization with parking_lot

Starting from version 0.5.0, Cachelito uses parking_lot for synchronization in global scope caches. The implementation uses RwLock for the cache map and Mutex for the eviction queue, providing optimal performance for read-heavy workloads.

Why parking_lot + RwLock?

RwLock Benefits (for the cache map):

  • Concurrent reads: Multiple threads can read simultaneously without blocking
  • 4-5x faster for read-heavy workloads (typical for caches)
  • Perfect for 90/10 read/write ratio (common in cache scenarios)
  • Only writes acquire exclusive lock

parking_lot Advantages over std::sync:

  • 30-50% faster under high contention scenarios
  • Adaptive spinning for short critical sections (faster than kernel-based locks)
  • Fair scheduling prevents thread starvation
  • No lock poisoning - simpler API without Result wrapping
  • ~40x smaller memory footprint per lock (~1 byte vs ~40 bytes)

Architecture

GlobalCache Structure:
┌─────────────────────────────────────┐
│ map: RwLock<HashMap<...>>           │ ← Multiple readers OR one writer
│ order: Mutex<VecDeque<...>>         │ ← Always exclusive (needs modification)
└─────────────────────────────────────┘

Read Operation (cache hit):
Thread 1 ──┐
Thread 2 ──┼──> RwLock.read() ──> ✅ Concurrent, no blocking
Thread 3 ──┘

Write Operation (cache miss):
Thread 1 ──> RwLock.write() ──> ⏳ Exclusive access
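
The shape the diagram describes can be approximated in a few lines. This is a simplified sketch only; field names and method bodies are illustrative, not the actual cachelito-core definitions:

use parking_lot::{Mutex, RwLock};
use std::collections::{HashMap, VecDeque};

struct GlobalCacheSketch<V> {
    // Read-heavy: many concurrent readers, one exclusive writer
    map: RwLock<HashMap<String, V>>,
    // Eviction bookkeeping: every touch mutates it, so a plain Mutex suffices
    order: Mutex<VecDeque<String>>,
}

impl<V: Clone> GlobalCacheSketch<V> {
    fn get(&self, key: &str) -> Option<V> {
        // Concurrent: other readers are not blocked
        self.map.read().get(key).cloned()
    }

    fn insert(&self, key: String, value: V) {
        // Exclusive: blocks all readers and writers for the duration of the insert
        self.map.write().insert(key.clone(), value);
        self.order.lock().push_back(key);
    }
}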

Benchmark Results

Performance comparison on concurrent cache access:

Mixed workload (8 threads, 100 operations, 90% reads / 10% writes):

Thread-Local Cache:      1.26ms  (no synchronization baseline)
Global + RwLock:         1.84ms  (concurrent reads)
Global + Mutex only:     ~3.20ms (all operations serialized)
std::sync::RwLock:       ~2.80ms (less optimized)

Improvement: RwLock is ~74% faster than Mutex for read-heavy workloads

Pure concurrent reads (20 threads, 100 reads each):

With RwLock:    ~2ms   (all threads read simultaneously)
With Mutex:     ~40ms  (threads wait in queue)

20x improvement for concurrent reads!

Code Simplification

With parking_lot, the internal code is cleaner:

// Read operation (concurrent with RwLock)
let value = self.map.read().get(key).cloned();

// Write operation (exclusive)
self.map.write().insert(key, value);
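
For comparison, the same two operations written against std::sync have to deal with lock poisoning, which is where the extra unwrap() calls (or explicit error handling) come from. A standalone sketch, not code from the crate:

use std::collections::HashMap;
use std::sync::RwLock;

fn lookup_and_store(map: &RwLock<HashMap<String, i32>>, key: &str) -> Option<i32> {
    // std::sync guards are wrapped in a Result because a panicking thread poisons the lock
    let value = map.read().unwrap().get(key).cloned();

    // Writes likewise need unwrap() or explicit poison handling
    map.write().unwrap().insert(key.to_string(), 42);

    value
}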

Running the Benchmarks

You can run the included benchmarks to see the performance on your hardware:

# Run cache benchmarks (includes RwLock concurrent reads)
cd cachelito-core
cargo bench --bench cache_benchmark

# Run RwLock concurrent reads demo
cargo run --example rwlock_concurrent_reads

# Run parking_lot demo
cargo run --example parking_lot_performance

# Compare thread-local vs global
cargo run --example cache_comparison

How It Works

The #[cache] macro generates code that:

  1. Creates a thread-local cache using thread_local! and RefCell<HashMap>
  2. Creates a thread-local order queue using VecDeque for eviction tracking
  3. Wraps cached values in CacheEntry to track insertion timestamps
  4. Builds a cache key from function arguments using CacheableKey::to_cache_key()
  5. Checks the cache before executing the function body
  6. Validates TTL expiration if configured, removing expired entries
  7. Stores the result in the cache after execution
  8. For Result<T, E> types, only caches Ok values
  9. When cache limit is reached, evicts entries according to the configured policy:
    • FIFO: Removes the oldest inserted entry
    • LRU: Removes the least recently accessed entry
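
Conceptually, for a thread-local FIFO cache with a limit, the expansion is roughly equivalent to the hand-written sketch below (names are illustrative, and TTL, Result handling, and the real CacheableKey-based key are omitted; the macro's actual output differs):

use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};

// Hand-written approximation of #[cache(limit = 2)] applied to:
//     fn square(x: i32) -> i32 { x * x }
fn square(x: i32) -> i32 {
    thread_local! {
        static CACHE: RefCell<HashMap<String, i32>> = RefCell::new(HashMap::new());
        static ORDER: RefCell<VecDeque<String>> = RefCell::new(VecDeque::new());
    }

    // Build the key from the arguments (the real macro calls CacheableKey::to_cache_key)
    let key = format!("{:?}", x);

    // Check the cache before running the body
    if let Some(hit) = CACHE.with(|c| c.borrow().get(&key).cloned()) {
        return hit;
    }

    // Run the original function body
    let result = x * x;

    // Store the result, evicting the oldest entry (FIFO) once the limit is reached
    CACHE.with(|c| {
        ORDER.with(|o| {
            let mut cache = c.borrow_mut();
            let mut order = o.borrow_mut();
            if cache.len() >= 2 {
                if let Some(oldest) = order.pop_front() {
                    cache.remove(&oldest);
                }
            }
            cache.insert(key.clone(), result);
            order.push_back(key);
        });
    });

    result
}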

Examples

The library includes several comprehensive examples demonstrating different features:

Run Examples

# Basic caching with custom types (default cache key)
cargo run --example custom_type_default_key

# Custom cache key implementation
cargo run --example custom_type_custom_key

# Result type caching (only Ok values cached)
cargo run --example result_caching

# Cache limits with LRU policy
cargo run --example cache_limit

# LRU eviction policy
cargo run --example lru

# FIFO eviction policy
cargo run --example fifo

# Default policy (FIFO)
cargo run --example fifo_default

# TTL (Time To Live) expiration
cargo run --example ttl

# Global scope cache (shared across threads)
cargo run --example global_scope

Example Output (LRU Policy):

=== Testing LRU Cache Policy ===

Calling compute_square(1)...
Executing compute_square(1)
Result: 1

Calling compute_square(2)...
Executing compute_square(2)
Result: 4

Calling compute_square(3)...
Executing compute_square(3)
Result: 9

Calling compute_square(2)...
Result: 4 (should be cached)

Calling compute_square(4)...
Executing compute_square(4)
Result: 16

...

Total executions: 6
✅ LRU Policy Test PASSED

Performance Considerations

  • Thread-local storage (default): Each thread has its own cache, so cached data is not shared across threads. This means no locks or synchronization overhead.
  • Global scope: When using scope = "global", the cache is shared across all threads and synchronized with parking_lot locks (RwLock for the map, Mutex for the eviction queue). This adds some overhead but allows cache sharing.
  • Memory usage: Without a limit, the cache grows unbounded. Use the limit parameter to control memory usage.
  • Cache key generation: Uses CacheableKey::to_cache_key() method. The default implementation uses Debug formatting, which may be slow for complex types. Consider implementing CacheableKey directly for better performance.
  • Value cloning: The cache clones values on every access. For large values (>1KB), wrap them in Arc<T> to avoid expensive clones. See the Performance with Large Values section for details.
  • Cache hit performance: O(1) hash map lookup; LRU adds an O(n) reordering step on each hit
    • FIFO: Minimal overhead, O(1) eviction
    • LRU: Slightly higher overhead because each access moves the entry to the back of the order queue (O(n)), but still efficient for typical cache sizes

Limitations

  • Cannot be used with generic functions (lifetime and type parameter support is limited)
  • The function must be deterministic for correct caching behavior
  • By default, each thread maintains its own cache (use scope = "global" to share across threads)
  • LRU policy has O(n) overhead on cache hits for reordering (where n is the number of cached entries)
  • Global scope adds synchronization overhead due to locking (reads are concurrent under RwLock, but writes are exclusive)

Documentation

For detailed API documentation, run:

cargo doc --no-deps --open

Changelog

See CHANGELOG.md for a detailed history of changes.

Latest Release: Version 0.5.0

Highlights:

  • RwLock for concurrent reads - 4-5x faster for read-heavy workloads
  • 🚀 20x improvement for pure concurrent reads
  • 💾 40x smaller memory footprint with parking_lot
  • 📊 Enhanced benchmarks and examples
  • 🔧 Idiomatic crate naming (cachelito-core, cachelito-macros)

For full details, see the complete changelog.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

See Also