Cachelito
A lightweight, thread-safe caching library for Rust that provides automatic memoization through procedural macros.
Features
- 🚀 Easy to use: Simply add the #[cache] attribute to any function or method
- 🔒 Thread-safe: Uses thread_local! storage for cache isolation by default
- 🌐 Global scope: Optional global cache shared across all threads with scope = "global"
- ⚡ High-performance synchronization: Uses parking_lot::RwLock for global caches, enabling concurrent reads
- 🎯 Flexible key generation: Supports custom cache key implementations
- 🎨 Result-aware: Intelligently caches only successful Result::Ok values
- 🗑️ Cache limits: Control memory usage with configurable cache size limits
- 📊 Eviction policies: Choose between FIFO (First In, First Out) and LRU (Least Recently Used)
- ⏱️ TTL support: Time-to-live expiration for automatic cache invalidation
- ✅ Type-safe: Full compile-time type checking
- 📦 Minimal dependencies: Uses parking_lot for optimal performance
Quick Start
Add this to your Cargo.toml:
[dependencies]
cachelito = "0.5.0"
Usage
Basic Function Caching
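A minimal sketch, assuming the attribute is exported as cachelito::cache:

```rust
use cachelito::cache;

// The first call per argument executes the body; later calls with the
// same argument return the value from the thread-local cache.
#[cache]
fn compute_square(n: u64) -> u64 {
    println!("Executing compute_square({n})");
    n * n
}

fn main() {
    println!("{}", compute_square(4)); // executes and caches
    println!("{}", compute_square(4)); // cache hit, no "Executing" line
}
```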
Caching with Methods
The #[cache] attribute also works with methods:
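A sketch of a cached method. The Calculator type is hypothetical, the import paths are assumed, and DefaultCacheableKey is assumed to be a marker trait enabled with an empty impl so the receiver can participate in the Debug-based key:

```rust
use cachelito::{cache, DefaultCacheableKey};

#[derive(Debug)]
struct Calculator {
    offset: i64,
}

// Opt the receiver type into the default Debug-based cache key.
impl DefaultCacheableKey for Calculator {}

impl Calculator {
    #[cache]
    fn add(&self, value: i64) -> i64 {
        println!("Executing add({value})");
        self.offset + value
    }
}

fn main() {
    let calc = Calculator { offset: 10 };
    println!("{}", calc.add(5)); // executes
    println!("{}", calc.add(5)); // cache hit
}
```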
Custom Cache Keys
For complex types, you can implement custom cache key generation:
Option 1: Use Default Debug-based Key
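A sketch of the Debug-based default. The UserQuery type is hypothetical, and the import path and marker-style impl are assumptions:

```rust
use cachelito::{cache, DefaultCacheableKey};

#[derive(Debug, Clone)]
struct UserQuery {
    id: u64,
    region: String,
}

// Enable default cache key generation based on the Debug representation.
impl DefaultCacheableKey for UserQuery {}

#[cache]
fn find_user(query: UserQuery) -> String {
    format!("user-{} ({})", query.id, query.region)
}
```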
Option 2: Custom Key Implementation
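A sketch of a hand-written key; the String return type of to_cache_key() is an assumption:

```rust
use cachelito::{cache, CacheableKey};

#[derive(Clone)]
struct UserQuery {
    id: u64,
    region: String,
}

// More efficient custom key: skips Debug formatting and uses only the
// fields that identify the query.
impl CacheableKey for UserQuery {
    fn to_cache_key(&self) -> String {
        format!("{}:{}", self.region, self.id)
    }
}

#[cache]
fn find_user(query: UserQuery) -> String {
    format!("user-{}", query.id)
}
```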
Caching Result Types
Functions returning Result<T, E> only cache successful results:
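A sketch of Result-aware caching:

```rust
use cachelito::cache;

// Only Ok values are stored. An Err is returned to the caller but not
// cached, so the call is retried on the next invocation.
#[cache]
fn checked_div(a: u64, b: u64) -> Result<u64, String> {
    if b == 0 {
        Err("division by zero".to_string())
    } else {
        Ok(a / b)
    }
}
```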
Cache Limits and Eviction Policies
Control memory usage by setting cache limits and choosing an eviction policy:
FIFO (First In, First Out) - Default
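A sketch of the FIFO configuration; the limit and policy attribute parameters are written in an assumed syntax:

```rust
use cachelito::cache;

// Cache with a limit of 100 entries using FIFO eviction.
#[cache(limit = 100, policy = "fifo")]
fn compute_square(n: u64) -> u64 {
    n * n
}

// FIFO is the default policy, so this is equivalent:
#[cache(limit = 100)]
fn compute_cube(n: u64) -> u64 {
    n * n * n
}
```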
LRU (Least Recently Used)
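The LRU variant, with the same assumed attribute syntax:

```rust
use cachelito::cache;

// Cache with a limit of 100 entries using LRU eviction.
#[cache(limit = 100, policy = "lru")]
fn compute_square(n: u64) -> u64 {
    n * n
}
```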
Key Differences:
- FIFO: Evicts the oldest inserted entry, regardless of usage
- LRU: Evicts the least recently accessed entry, keeping frequently used items longer
Time-To-Live (TTL) Expiration
Set automatic expiration times for cached entries:
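A sketch of TTL configuration; the ttl parameter name and its unit (seconds) are assumptions:

```rust
use cachelito::cache;

// Cache entries expire after 60 seconds.
#[cache(ttl = 60)]
fn fetch_rate(currency: String) -> f64 {
    println!("Fetching rate for {currency}");
    1.2345
}

// Combine TTL with limits and policies:
#[cache(ttl = 60, limit = 100, policy = "lru")]
fn fetch_quote(symbol: String) -> f64 {
    println!("Fetching quote for {symbol}");
    42.0
}
```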
Benefits:
- Automatic expiration: Old data is automatically removed
- Per-entry tracking: Each entry has its own timestamp
- Lazy eviction: Expired entries removed on access
- Works with policies: Compatible with FIFO and LRU
Global Scope Cache
By default, each thread has its own cache (thread-local). Use scope = "global" to share the cache across all threads:
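A sketch contrasting the two scopes:

```rust
use cachelito::cache;

// Thread-local cache (default): each thread keeps its own entries.
#[cache]
fn local_square(n: u64) -> u64 {
    n * n
}

// Global cache: shared across all threads.
#[cache(scope = "global")]
fn global_square(n: u64) -> u64 {
    n * n
}
```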
When to use global scope:
- Cross-thread sharing: When you want all threads to benefit from cached results
- Expensive operations: When the cost of computation outweighs the synchronization overhead
- Shared data: When the same function is called with the same arguments across multiple threads
Performance considerations:
- Thread-local (default): No synchronization overhead, but the cache is not shared
- Global: Uses parking_lot locks (RwLock for the cache map, Mutex for the eviction queue); adds overhead but shares the cache across threads
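A multi-threaded sketch; the function name and thread count are illustrative:

```rust
use cachelito::cache;
use std::thread;

#[cache(scope = "global")]
fn expensive_square(n: u64) -> u64 {
    println!("Computing {n}");
    n * n
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| expensive_square(7)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // With the global cache, "Computing 7" is typically printed once;
    // with the default thread-local scope, every thread would compute it.
}
```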
Performance with Large Values
The cache clones values on every get operation. For large values (big structs, vectors, strings), this can be
expensive. Wrap your return values in Arc<T> to share ownership without copying data:
Problem: Expensive Cloning
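A sketch of the problem; the report size is illustrative:

```rust
use cachelito::cache;

// Every cache hit clones the whole Vec: roughly 8 MB copied per call.
#[cache]
fn load_report(id: u64) -> Vec<u8> {
    vec![id as u8; 8 * 1024 * 1024]
}
```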
Solution: Use Arc
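The same function rewritten to share the buffer through Arc:

```rust
use cachelito::cache;
use std::sync::Arc;

// Return Arc instead of the value directly: a cache hit clones only the
// pointer and reference count, not the 8 MB buffer.
#[cache]
fn load_report(id: u64) -> Arc<Vec<u8>> {
    Arc::new(vec![id as u8; 8 * 1024 * 1024])
}
```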
Real-World Example: Caching Parsed Data
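A sketch of caching a parsed file; ParsedConfig, the file handling, and the limit value are illustrative:

```rust
use cachelito::cache;
use std::sync::Arc;

#[derive(Debug)]
struct ParsedConfig {
    lines: Vec<String>,
}

// Cache expensive parsing operations; every caller shares the same
// parsed result through the Arc.
#[cache(limit = 10)]
fn parse_config(path: String) -> Arc<ParsedConfig> {
    let text = std::fs::read_to_string(&path).unwrap_or_default();
    Arc::new(ParsedConfig {
        lines: text.lines().map(str::to_owned).collect(),
    })
}
```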
When to Use Arc
Use Arc when:
- ✅ Values are large (>1KB)
- ✅ Values contain collections (Vec, HashMap, String)
- ✅ Values are frequently accessed from cache
- ✅ Multiple parts of your code need access to the same data
Don't need Arc when:
- ❌ Values are small primitives (i32, f64, bool)
- ❌ Values are rarely accessed from cache
- ❌ Clone is already cheap (e.g., types with the Copy trait)
Combining Arc with Global Scope
For maximum efficiency with multi-threaded applications:
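A sketch combining both; the catalog loader is hypothetical:

```rust
use cachelito::cache;
use std::sync::Arc;
use std::thread;

// One load shared by every thread: the global cache stores a single Arc,
// and each hit clones only the pointer.
#[cache(scope = "global")]
fn load_catalog(name: String) -> Arc<Vec<String>> {
    println!("Loading catalog {name}");
    Arc::new(vec![format!("{name}-item-1"), format!("{name}-item-2")])
}

fn main() {
    let handles: Vec<_> = (0..8)
        .map(|_| thread::spawn(|| load_catalog("main".to_string())))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```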
Benefits:
- 🚀 Only one database/API call across all threads
- 💾 Minimal memory overhead (Arc clones are just pointer + ref count)
- 🔒 Thread-safe sharing with minimal synchronization cost
- ⚡ Fast cache access with no data copying
Synchronization with parking_lot
Starting from version 0.5.0, Cachelito uses parking_lot for
synchronization in global scope caches. The implementation uses RwLock for the cache map and Mutex for the
eviction queue, providing optimal performance for read-heavy workloads.
Why parking_lot + RwLock?
RwLock Benefits (for the cache map):
- Concurrent reads: Multiple threads can read simultaneously without blocking
- 4-5x faster for read-heavy workloads (typical for caches)
- Perfect for 90/10 read/write ratio (common in cache scenarios)
- Only writes acquire exclusive lock
parking_lot Advantages over std::sync:
- 30-50% faster under high contention scenarios
- Adaptive spinning for short critical sections (faster than kernel-based locks)
- Fair scheduling prevents thread starvation
- No lock poisoning - simpler API without Result wrapping
- ~40x smaller memory footprint per lock (~1 byte vs ~40 bytes)
Architecture
GlobalCache Structure:
┌─────────────────────────────────────┐
│ map: RwLock<HashMap<...>> │ ← Multiple readers OR one writer
│ order: Mutex<VecDeque<...>> │ ← Always exclusive (needs modification)
└─────────────────────────────────────┘
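In code terms, that layout is roughly the following sketch (field and type parameters are illustrative, not the crate's exact internals):

```rust
use parking_lot::{Mutex, RwLock};
use std::collections::{HashMap, VecDeque};

// The map allows many concurrent readers; the eviction-order queue
// always needs exclusive access because inserts and evictions modify it.
struct GlobalCache<V> {
    map: RwLock<HashMap<String, V>>,
    order: Mutex<VecDeque<String>>,
}
```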
Read Operation (cache hit):
Thread 1 ──┐
Thread 2 ──┼──> RwLock.read() ──> ✅ Concurrent, no blocking
Thread 3 ──┘
Write Operation (cache miss):
Thread 1 ──> RwLock.write() ──> ⏳ Exclusive access
Benchmark Results
Performance comparison on concurrent cache access:
Mixed workload (8 threads, 100 operations, 90% reads / 10% writes):
Thread-Local Cache: 1.26ms (no synchronization baseline)
Global + RwLock: 1.84ms (concurrent reads)
Global + Mutex only: ~3.20ms (all operations serialized)
std::sync::RwLock: ~2.80ms (less optimized)
Improvement: RwLock is ~74% faster than Mutex for read-heavy workloads
Pure concurrent reads (20 threads, 100 reads each):
With RwLock: ~2ms (all threads read simultaneously)
With Mutex: ~40ms (threads wait in queue)
20x improvement for concurrent reads!
Code Simplification
With parking_lot, the internal code is cleaner:
// Read operation (concurrent with RwLock)
let value = self.map.read().get(&key).cloned();

// Write operation (exclusive)
self.map.write().insert(key, entry);
Running the Benchmarks
You can run the included benchmarks to see the performance on your hardware:
- Cache benchmarks (includes RwLock concurrent reads)
- RwLock concurrent reads demo
- parking_lot demo
- Thread-local vs global comparison
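The bench and example target names are repository-specific; the generic Cargo invocations look like this (substitute the actual names):

```sh
# Run all benchmarks
cargo bench

# Run a specific demo (replace <example_name> with the example's name)
cargo run --release --example <example_name>
```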
How It Works
The #[cache] macro generates code that (a simplified sketch follows this list):
- Creates a thread-local cache using thread_local! and RefCell<HashMap>
- Creates a thread-local order queue using VecDeque for eviction tracking
- Wraps cached values in CacheEntry to track insertion timestamps
- Builds a cache key from function arguments using CacheableKey::to_cache_key()
- Checks the cache before executing the function body
- Validates TTL expiration if configured, removing expired entries
- Stores the result in the cache after execution
- For Result<T, E> types, only caches Ok values
- When the cache limit is reached, evicts entries according to the configured policy:
  - FIFO: Removes the oldest inserted entry
  - LRU: Removes the least recently accessed entry
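As a rough, heavily simplified sketch (ignoring TTL, limits, order tracking, and Result handling), the expansion of a cached function looks conceptually like this:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// What #[cache] on `fn compute_square(n: u64) -> u64` roughly turns into.
fn compute_square(n: u64) -> u64 {
    thread_local! {
        static CACHE: RefCell<HashMap<String, u64>> = RefCell::new(HashMap::new());
    }

    // The real expansion builds this via CacheableKey::to_cache_key().
    let key = n.to_string();

    if let Some(hit) = CACHE.with(|cache| cache.borrow().get(&key).cloned()) {
        return hit;
    }

    let result = { n * n }; // original function body
    CACHE.with(|cache| cache.borrow_mut().insert(key, result));
    result
}
```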
Examples
The library includes several comprehensive examples demonstrating different features:
Run Examples
- Basic caching with custom types (default cache key)
- Custom cache key implementation
- Result type caching (only Ok values cached)
- Cache limits with LRU policy
- LRU eviction policy
- FIFO eviction policy
- Default policy (FIFO)
- TTL (Time To Live) expiration
- Global scope cache (shared across threads)
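Each example is run with Cargo's --example flag; replace the placeholder with the name of the example you want:

```sh
cargo run --release --example <example_name>
```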
Example Output (LRU Policy):
=== Testing LRU Cache Policy ===
Calling compute_square(1)...
Executing compute_square(1)
Result: 1
Calling compute_square(2)...
Executing compute_square(2)
Result: 4
Calling compute_square(3)...
Executing compute_square(3)
Result: 9
Calling compute_square(2)...
Result: 4 (should be cached)
Calling compute_square(4)...
Executing compute_square(4)
Result: 16
...
Total executions: 6
✅ LRU Policy Test PASSED
Performance Considerations
- Thread-local storage (default): Each thread has its own cache, so cached data is not shared across threads. This means no locks or synchronization overhead.
- Global scope: When using scope = "global", the cache is shared across all threads behind parking_lot locks (RwLock for the cache map, Mutex for the eviction queue). This adds synchronization overhead but allows cache sharing.
- Memory usage: Without a limit, the cache grows unbounded. Use the limit parameter to control memory usage.
- Cache key generation: Uses the CacheableKey::to_cache_key() method. The default implementation uses Debug formatting, which may be slow for complex types. Consider implementing CacheableKey directly for better performance.
- Value cloning: The cache clones values on every access. For large values (>1KB), wrap them in Arc<T> to avoid expensive clones. See the Performance with Large Values section for details.
- Cache hit performance: O(1) hash map lookup, with LRU having an additional O(n) reordering cost on hits
- FIFO: Minimal overhead, O(1) eviction
- LRU: Slightly higher overhead due to reordering on access, O(n) for reordering but still efficient
Limitations
- Cannot be used with generic functions (lifetime and type parameter support is limited)
- The function must be deterministic for correct caching behavior
- By default, each thread maintains its own cache (use scope = "global" to share across threads)
- LRU policy has O(n) overhead on cache hits for reordering (where n is the number of cached entries)
- Global scope adds synchronization overhead due to locking (parking_lot RwLock/Mutex)
Documentation
For detailed API documentation, run:
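The standard Cargo command builds and opens the rendered documentation locally:

```sh
cargo doc --open
```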
Changelog
See CHANGELOG.md for a detailed history of changes.
Latest Release: Version 0.5.0
Highlights:
- ⚡ RwLock for concurrent reads - 4-5x faster for read-heavy workloads
- 🚀 20x improvement for pure concurrent reads
- 💾 40x smaller memory footprint with parking_lot
- 📊 Enhanced benchmarks and examples
- 🔧 Idiomatic crate naming (cachelito-core, cachelito-macros)
For full details, see the complete changelog.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
See Also
- CHANGELOG - Detailed version history and release notes
- Macro Expansion Guide - How to view generated code and understand format!("{:?}") key generation
- API Documentation - Full API reference