# Cachelito
A lightweight, thread-safe caching library for Rust that provides automatic memoization through procedural macros.
## Features
- Easy to use: Simply add the `#[cache]` attribute to any function or method
- Global scope by default: Cache shared across all threads (use `scope = "thread"` for thread-local)
- High-performance synchronization: Uses `parking_lot::RwLock` for global caches, enabling concurrent reads
- Thread-local option: Optional thread-local storage with `scope = "thread"` for maximum performance
- Flexible key generation: Supports custom cache key implementations
- Result-aware: Caches only successful `Result::Ok` values
- Cache limits: Control memory usage with configurable cache size limits
- Eviction policies: Choose between FIFO (First In, First Out) and LRU (Least Recently Used)
- TTL support: Time-to-live expiration for automatic cache invalidation
- Statistics: Track cache hit/miss rates and performance metrics (with the `stats` feature)
- Type-safe: Full compile-time type checking
- Minimal dependencies: Uses `parking_lot` for optimal performance
## Quick Start

Add this to your `Cargo.toml`:

```toml
[dependencies]
cachelito = "0.6.0"

# Optional: enable statistics tracking
cachelito = { version = "0.6.0", features = ["stats"] }
```
## Usage

### Basic Function Caching
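A minimal sketch. The function body and `main` are illustrative; the import path `cachelito::cache` is assumed from the crate name:

```rust
use cachelito::cache;

// The first call for a given argument runs the body; subsequent calls
// with the same argument return the cached value.
#[cache]
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => n,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    println!("{}", fibonacci(30)); // computed (recursive calls are memoized too)
    println!("{}", fibonacci(30)); // served from the cache
}
```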
### Caching with Methods

The `#[cache]` attribute also works with methods:
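A sketch under two assumptions: the receiver type opts into the default `Debug`-based key with an empty `DefaultCacheableKey` impl (the marker-trait usage shown in Option 1 below), and the method body is invented for illustration:

```rust
use cachelito::cache;
use cachelito::DefaultCacheableKey;

#[derive(Debug)]
struct Calculator {
    precision: i32,
}

// Assumed marker impl: derive the cache key from the Debug output.
impl DefaultCacheableKey for Calculator {}

impl Calculator {
    // The cache key covers the receiver and the arguments.
    #[cache]
    fn compute(&self, x: f64) -> f64 {
        x.powi(self.precision)
    }
}
```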
### Custom Cache Keys

For complex types, you can implement custom cache key generation:

#### Option 1: Use the Default Debug-based Key
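A sketch, assuming `DefaultCacheableKey` is a marker trait whose empty impl enables `Debug`-based key generation for the type:

```rust
use cachelito::DefaultCacheableKey;

#[derive(Debug)]
struct Query {
    table: String,
    id: u64,
}

// Enable default cache key generation based on Debug output.
impl DefaultCacheableKey for Query {}
```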
#### Option 2: Custom Key Implementation
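A sketch of a more efficient custom key. The `to_cache_key()` method name comes from the How It Works section below; its exact signature (here returning `String`) is an assumption:

```rust
use cachelito::CacheableKey;

struct Query {
    table: String,
    id: u64,
}

// Cheaper than Debug formatting: build the key directly.
impl CacheableKey for Query {
    fn to_cache_key(&self) -> String {
        format!("{}:{}", self.table, self.id)
    }
}
```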
### Caching Result Types

Functions returning `Result<T, E>` only cache successful results:
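A sketch (the function body is illustrative): an `Err` is returned to the caller but never cached, so the call is retried next time, while an `Ok` is cached as usual:

```rust
use cachelito::cache;

// Only Ok values are stored; a failed read is retried on the next call.
#[cache]
fn read_config(path: String) -> Result<String, String> {
    std::fs::read_to_string(&path).map_err(|e| e.to_string())
}
```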
### Cache Limits and Eviction Policies
Control memory usage by setting cache limits and choosing an eviction policy:
#### FIFO (First In, First Out) - Default
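A sketch. The `limit` parameter is documented in Performance Considerations below; the `policy = "fifo"` spelling is an assumption based on the policy names in this section:

```rust
use cachelito::cache;

// Cache with a limit of 100 entries using FIFO eviction.
#[cache(limit = 100, policy = "fifo")]
fn compute_square(n: u64) -> u64 {
    n * n
}

// FIFO is the default policy, so this is equivalent:
#[cache(limit = 100)]
fn compute_square_default(n: u64) -> u64 {
    n * n
}
```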
#### LRU (Least Recently Used)
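Same assumptions as the FIFO sketch, with the LRU policy selected:

```rust
use cachelito::cache;

// Cache with a limit of 100 entries using LRU eviction.
#[cache(limit = 100, policy = "lru")]
fn compute_square(n: u64) -> u64 {
    n * n
}
```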
Key Differences:
- FIFO: Evicts the oldest inserted entry, regardless of usage
- LRU: Evicts the least recently accessed entry, keeping frequently used items longer
### Time-To-Live (TTL) Expiration
Set automatic expiration times for cached entries:
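A sketch; the `ttl` attribute name and its unit (seconds) are assumptions, and the function bodies are placeholders:

```rust
use cachelito::cache;

// Cache entries expire after 60 seconds.
#[cache(ttl = 60)]
fn fetch_rate(currency: String) -> f64 {
    // ...expensive lookup elided...
    1.0
}

// Combine TTL with limits and policies.
#[cache(ttl = 60, limit = 100, policy = "lru")]
fn fetch_quote(symbol: String) -> f64 {
    42.0
}
```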
Benefits:
- Automatic expiration: Old data is automatically removed
- Per-entry tracking: Each entry has its own timestamp
- Lazy eviction: Expired entries removed on access
- Works with policies: Compatible with FIFO and LRU
### Global Scope Cache

By default, the cache is shared across all threads (global scope). Use `scope = "thread"` for thread-local caches, where each thread has its own independent cache:
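Both `scope` values are documented; the function bodies here are illustrative:

```rust
use cachelito::cache;

// Global cache (default) - shared across all threads.
#[cache]
fn shared_lookup(id: u64) -> u64 {
    id * 2
}

// Thread-local cache - each thread has its own cache.
#[cache(scope = "thread")]
fn local_lookup(id: u64) -> u64 {
    id * 2
}
```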
When to use global scope (default):

- Cross-thread sharing: All threads benefit from cached results
- Statistics monitoring: Full access to cache statistics via `stats_registry`
- Expensive operations: Computation cost outweighs synchronization overhead
- Shared data: Same function called with the same arguments across threads

When to use thread-local (`scope = "thread"`):

- Maximum performance: No synchronization overhead
- Thread isolation: Each thread needs an independent cache
- Thread-specific data: Different threads process different data
Performance considerations:

- Global (default): Uses `RwLock` for synchronization and allows concurrent reads
- Thread-local: No synchronization overhead, but the cache is not shared
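A sketch of the practical difference: with the default global scope, a value computed by one thread is a cache hit for every other thread (the function and assertions are invented for illustration):

```rust
use cachelito::cache;
use std::thread;

// Global by default: all spawned threads share this cache.
#[cache]
fn expensive(n: u64) -> u64 {
    // Normally printed once per distinct n across all threads
    // (concurrent first calls may race and each compute once).
    println!("computing {n}");
    n * n
}

fn main() {
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(|| expensive(7))).collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 49);
    }
}
```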
## Performance with Large Values

The cache clones values on every get operation. For large values (big structs, vectors, strings), this can be expensive. Wrap your return values in `Arc<T>` to share ownership without copying data.
### Problem: Expensive Cloning
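A sketch of the problem (sizes are illustrative): every cache hit clones the entire vector:

```rust
use cachelito::cache;

// Each cache hit clones the whole 1 MB vector.
#[cache]
fn load_big_data(id: u32) -> Vec<u8> {
    vec![id as u8; 1_000_000]
}
```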
### Solution: Use Arc
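The same function with `Arc` (illustrative):

```rust
use cachelito::cache;
use std::sync::Arc;

// Return Arc instead of the value directly: a cache hit clones only
// the pointer and reference count, not the underlying megabyte.
#[cache]
fn load_big_data(id: u32) -> Arc<Vec<u8>> {
    Arc::new(vec![id as u8; 1_000_000])
}
```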
### Real-World Example: Caching Parsed Data
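A sketch; the `Config` type and the key=value file format are invented for illustration:

```rust
use cachelito::cache;
use std::sync::Arc;

#[derive(Debug)]
struct Config {
    entries: Vec<(String, String)>,
}

// Cache expensive parsing operations; hits hand out a cheap Arc clone.
#[cache]
fn parse_config(path: String) -> Arc<Config> {
    let raw = std::fs::read_to_string(&path).unwrap_or_default();
    let entries = raw
        .lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim().to_owned(), v.trim().to_owned()))
        .collect();
    Arc::new(Config { entries })
}
```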
### When to Use Arc

Use Arc when:

- Values are large (>1KB)
- Values contain collections (Vec, HashMap, String)
- Values are frequently accessed from the cache
- Multiple parts of your code need access to the same data

You don't need Arc when:

- Values are small primitives (i32, f64, bool)
- Values are rarely accessed from the cache
- Clone is already cheap (e.g., types with the `Copy` trait)
### Combining Arc with Global Scope
For maximum efficiency with multi-threaded applications:
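A sketch, assuming a zero-argument function is keyed on its empty argument list; the data and thread count are illustrative:

```rust
use cachelito::cache;
use std::sync::Arc;
use std::thread;

// One cached Arc serves every thread: the expensive load runs once,
// and each subsequent hit is a cheap pointer clone.
#[cache]
fn load_users() -> Arc<Vec<String>> {
    println!("loading from the database..."); // runs once, not per thread
    Arc::new(vec!["alice".to_owned(), "bob".to_owned()])
}

fn main() {
    let handles: Vec<_> = (0..8)
        .map(|_| thread::spawn(|| load_users().len()))
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 2);
    }
}
```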
Benefits:

- Only one database/API call across all threads
- Minimal memory overhead (Arc clones are just a pointer plus a reference count)
- Thread-safe sharing with minimal synchronization cost
- Fast cache access with no data copying
## Synchronization with parking_lot

Starting from version 0.5.0, Cachelito uses `parking_lot` for synchronization in global-scope caches. The implementation uses `RwLock` for the cache map and `Mutex` for the eviction queue, providing optimal performance for read-heavy workloads.
### Why parking_lot + RwLock?
RwLock Benefits (for the cache map):
- Concurrent reads: Multiple threads can read simultaneously without blocking
- 4-5x faster for read-heavy workloads (typical for caches)
- Perfect for 90/10 read/write ratio (common in cache scenarios)
- Only writes acquire exclusive lock
`parking_lot` advantages over `std::sync`:

- 30-50% faster under high contention
- Adaptive spinning for short critical sections (faster than kernel-based locks)
- Fair scheduling prevents thread starvation
- No lock poisoning: a simpler API without `Result` wrapping
- ~40x smaller memory footprint per lock (~1 byte vs ~40 bytes)
### Architecture

GlobalCache structure:

```text
┌──────────────────────────────┐
│ map:   RwLock<HashMap<...>>  │ ← multiple readers OR one writer
│ order: Mutex<VecDeque<...>>  │ ← always exclusive (needs modification)
└──────────────────────────────┘
```

Read operation (cache hit):

```text
Thread 1 ──┐
Thread 2 ──┼──> RwLock.read() ──> concurrent, no blocking
Thread 3 ──┘
```

Write operation (cache miss):

```text
Thread 1 ──> RwLock.write() ──> exclusive access
```
### Benchmark Results
Performance comparison on concurrent cache access:
Mixed workload (8 threads, 100 operations, 90% reads / 10% writes):

```text
Thread-Local Cache:    1.26ms   (no synchronization baseline)
Global + RwLock:       1.84ms   (concurrent reads)
Global + Mutex only:  ~3.20ms   (all operations serialized)
std::sync::RwLock:    ~2.80ms   (less optimized)
```

Improvement: RwLock is ~74% faster than Mutex for read-heavy workloads.

Pure concurrent reads (20 threads, 100 reads each):

```text
With RwLock:  ~2ms    (all threads read simultaneously)
With Mutex:  ~40ms    (threads wait in a queue)
```

That is a 20x improvement for concurrent reads!
### Code Simplification

With parking_lot, the internal code is cleaner:
```rust
// Read operation (concurrent with RwLock)
let value = self.map.read().get(&key).cloned();

// Write operation (exclusive)
self.map.write().insert(key, entry);
```
### Running the Benchmarks

You can run the included benchmarks with Cargo's standard `cargo bench` to see the performance on your hardware. The benchmark and demo suite covers:

- Cache benchmarks (including RwLock concurrent reads)
- An RwLock concurrent-reads demo
- A parking_lot demo
- A thread-local vs. global comparison
## How It Works

The `#[cache]` macro generates code that does the following (a simplified expansion sketch follows the list):

- Creates the backing cache: a shared `RwLock<HashMap>` for the default global scope, or a `thread_local!` `RefCell<HashMap>` for `scope = "thread"`
- Creates an order queue using `VecDeque` for eviction tracking
- Wraps cached values in `CacheEntry` to track insertion timestamps
- Builds a cache key from the function arguments using `CacheableKey::to_cache_key()`
- Checks the cache before executing the function body
- Validates TTL expiration if configured, removing expired entries
- Stores the result in the cache after execution
- For `Result<T, E>` types, only caches `Ok` values
- When the cache limit is reached, evicts entries according to the configured policy:
  - FIFO: Removes the oldest inserted entry
  - LRU: Removes the least recently accessed entry
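As a rough, hand-written equivalent of the expansion for a thread-local cache (this is not the macro's literal output; the names and the `String` key are simplifications, and TTL/eviction are omitted):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Backing store that `#[cache(scope = "thread")]` on
    // `fn square(n: u64) -> u64` might generate.
    static SQUARE_CACHE: RefCell<HashMap<String, u64>> =
        RefCell::new(HashMap::new());
}

fn square(n: u64) -> u64 {
    // 1. Build the key from the arguments (default: Debug formatting).
    let key = format!("{:?}", n);
    // 2. Check the cache before running the body.
    if let Some(hit) = SQUARE_CACHE.with(|c| c.borrow().get(&key).copied()) {
        return hit;
    }
    // 3. Run the original body and store the result.
    let result = n * n;
    SQUARE_CACHE.with(|c| c.borrow_mut().insert(key, result));
    result
}
```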
## Examples

The library includes several comprehensive examples demonstrating different features.

### Run Examples

Examples are run with Cargo's standard `cargo run --example <name>`; the set covers:

- Basic caching with custom types (default cache key)
- Custom cache key implementation
- Result type caching (only Ok values are cached)
- Cache limits with LRU policy
- LRU eviction policy
- FIFO eviction policy
- Default policy (FIFO)
- TTL (Time To Live) expiration
- Global scope cache (shared across threads)
Example Output (LRU Policy):

```text
=== Testing LRU Cache Policy ===
Calling compute_square(1)...
Executing compute_square(1)
Result: 1
Calling compute_square(2)...
Executing compute_square(2)
Result: 4
Calling compute_square(3)...
Executing compute_square(3)
Result: 9
Calling compute_square(2)...
Result: 4 (should be cached)
Calling compute_square(4)...
Executing compute_square(4)
Result: 16
...
Total executions: 6
LRU Policy Test PASSED
```
## Performance Considerations

- Global scope (default): The cache is shared across all threads behind `parking_lot::RwLock`. This adds some synchronization overhead but allows cache sharing and concurrent reads.
- Thread-local scope (`scope = "thread"`): Each thread has its own cache, so cached data is not shared across threads, but there are no locks or synchronization overhead.
- Memory usage: Without a limit, the cache grows unbounded. Use the `limit` parameter to control memory usage.
- Cache key generation: Uses the `CacheableKey::to_cache_key()` method. The default implementation uses `Debug` formatting, which may be slow for complex types. Consider implementing `CacheableKey` directly for better performance.
- Value cloning: The cache clones values on every access. For large values (>1KB), wrap them in `Arc<T>` to avoid expensive clones. See the Performance with Large Values section for details.
- Cache hit performance: O(1) hash map lookup; the LRU policy adds an O(n) reordering cost on hits.
  - FIFO: Minimal overhead, O(1) eviction
  - LRU: Slightly higher overhead due to O(n) reordering on access, but still efficient
## Cache Statistics

Available since v0.6.0 with the `stats` feature flag.

Track cache performance metrics including hit/miss rates and access counts. Statistics are automatically collected for global-scoped caches and can be queried programmatically.
### Enabling Statistics

Add the `stats` feature to your `Cargo.toml`:

```toml
[dependencies]
cachelito = { version = "0.6.0", features = ["stats"] }
```
### Basic Usage

Statistics are automatically tracked for global caches (the default):
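A sketch that produces the output shown below. The function and call pattern are invented; `stats_registry::get` and the metric methods are the ones documented in this section:

```rust
use cachelito::cache;
use cachelito::stats_registry;

#[cache] // global by default, so statistics are collected automatically
fn double(n: u64) -> u64 {
    n * 2
}

fn main() {
    double(1); // miss
    double(1); // hit
    double(2); // miss
    double(2); // hit

    // Assumption: stats are registered under the function name.
    if let Some(stats) = stats_registry::get("double") {
        println!("Total accesses: {}", stats.total_accesses());
        println!("Cache hits: {}", stats.hits());
        println!("Cache misses: {}", stats.misses());
        println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
        println!("Miss rate: {:.2}%", stats.miss_rate() * 100.0);
    }
}
```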
Output:

```text
Total accesses: 4
Cache hits: 2
Cache misses: 2
Hit rate: 50.00%
Miss rate: 50.00%
```
### Statistics Registry API

The `stats_registry` module provides centralized access to all cache statistics.

#### Get Statistics
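A sketch reconstructed from the fragments above: `get` returns a snapshot and `get_ref` a direct reference (the exact return types are assumptions; `"my_function"` is a placeholder name):

```rust
use cachelito::stats_registry;

fn main() {
    // Get a snapshot of statistics for a function.
    if let Some(stats) = stats_registry::get("my_function") {
        println!("hit rate: {:.2}", stats.hit_rate());
    }

    // Get a direct reference (no cloning).
    if let Some(stats) = stats_registry::get_ref("my_function") {
        println!("hits: {}", stats.hits());
    }
}
```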
#### List All Cached Functions
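A sketch using the `list` function named above (its exact return type is an assumption):

```rust
use cachelito::stats_registry;

fn main() {
    // Get the names of all registered cache functions.
    let functions = stats_registry::list();
    for name in functions {
        println!("registered cache: {name}");
    }
}
```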
#### Reset Statistics
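A sketch using the `reset` and `clear` functions named above (that `reset` returns a `bool` is inferred from the original `if reset` fragment):

```rust
use cachelito::stats_registry;

fn main() {
    // Reset stats for a specific function.
    if stats_registry::reset("my_function") {
        println!("statistics reset");
    }

    // Clear all registrations (useful for testing).
    stats_registry::clear();
}
```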
### Statistics Metrics

The `CacheStats` struct provides the following metrics:

- `hits()` - Number of successful cache lookups
- `misses()` - Number of cache misses (computation required)
- `total_accesses()` - Total number of get operations
- `hit_rate()` - Ratio of hits to total accesses (0.0 to 1.0)
- `miss_rate()` - Ratio of misses to total accesses (0.0 to 1.0)
- `reset()` - Resets all counters to zero
### Concurrent Statistics Example

Statistics are thread-safe and work correctly with concurrent access:
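A sketch (the function and thread counts are illustrative):

```rust
use cachelito::cache;
use cachelito::stats_registry;
use std::thread;

#[cache] // global by default
fn square(n: u64) -> u64 {
    n * n
}

fn main() {
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(|| square(3))).collect();
    for handle in handles {
        handle.join().unwrap();
    }

    if let Some(stats) = stats_registry::get("square") {
        // 8 accesses in total; once the cache is warm, the rest are hits.
        println!("accesses: {}", stats.total_accesses());
        println!("hits: {}, misses: {}", stats.hits(), stats.misses());
    }
}
```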
### Monitoring Cache Performance

Use statistics to monitor and optimize cache performance:
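A sketch of one monitoring approach (the function, call pattern, and the 50% threshold are all invented for illustration):

```rust
use cachelito::{cache, stats_registry};

#[cache] // global by default
fn lookup(id: u64) -> String {
    format!("value-{id}")
}

fn main() {
    for id in [1, 2, 1, 3, 1] {
        lookup(id);
    }

    // Flag caches that are not earning their keep.
    if let Some(stats) = stats_registry::get("lookup") {
        if stats.hit_rate() < 0.5 {
            eprintln!("warning: hit rate only {:.0}%", stats.hit_rate() * 100.0);
        }
    }
}
```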
### Custom Cache Names

Use the `name` attribute to give your caches custom identifiers in the statistics registry:
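A sketch; the `#[cache(name = "...")]` form is documented in the changelog below, while the functions and names here are placeholders:

```rust
use cachelito::cache;
use cachelito::stats_registry;

// API V1 - using a custom name (global by default)
#[cache(name = "api_v1")]
fn fetch_v1(id: u64) -> String {
    format!("v1-{id}")
}

// API V2 - using a custom name (global by default)
#[cache(name = "api_v2")]
fn fetch_v2(id: u64) -> String {
    format!("v2-{id}")
}

fn main() {
    fetch_v1(1);
    fetch_v2(1);

    // Access statistics using the custom names.
    for name in ["api_v1", "api_v2"] {
        if let Some(stats) = stats_registry::get(name) {
            println!("{name}: {} accesses", stats.total_accesses());
        }
    }
}
```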
Benefits:
- Descriptive names: Use meaningful identifiers instead of function names
- Multiple versions: Track different implementations separately
- Easier debugging: Identify caches by purpose rather than function name
- Better monitoring: Compare performance of different cache strategies
Default behavior: If `name` is not provided, the function name is used as the identifier.
### Important Notes

- Global scope by default: Statistics are automatically available via `stats_registry` (default behavior)
- Thread-local statistics: Thread-local caches (`scope = "thread"`) DO track statistics internally via the `ThreadLocalCache::stats` field, but these are NOT accessible via `stats_registry::get()` due to architectural limitations. See THREAD_LOCAL_STATS.md for a detailed explanation.
- Performance: Statistics use atomic operations (minimal overhead)
- Feature flag: Statistics are only compiled when the `stats` feature is enabled
Why thread-local stats aren't in `stats_registry`:

- Each thread has its own independent cache and statistics
- Thread-local statics (`thread_local!`) cannot be registered in a global registry
- Global scope (the default) provides full statistics access via `stats_registry`
- Thread-local stats are still useful for testing and internal debugging
## Limitations

- Cannot be used with generic functions (lifetime and type parameter support is limited)
- The function must be deterministic for correct caching behavior
- The cache is global by default (use `scope = "thread"` for thread-local isolation)
- The LRU policy has O(n) overhead on cache hits for reordering (where n is the number of cached entries)
- Global scope adds synchronization overhead (though this is optimized with RwLock)
- Statistics are automatically available for global caches (the default); thread-local caches track stats internally, but they're not accessible via `stats_registry`
## Documentation

For detailed API documentation, run:
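```bash
cargo doc --open
```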
## Changelog
See CHANGELOG.md for a detailed history of changes.
### Latest Release: Version 0.6.0
Highlights:

- Global scope by default - the cache is now shared across threads by default, for better statistics and sharing
- Cache statistics - track hit/miss rates and performance metrics with the `stats` feature
- Stats registry - centralized API for querying statistics: `stats_registry::get("function_name")`
- Custom cache names - use the `name` attribute to give caches custom identifiers: `#[cache(name = "my_cache")]`
- Performance monitoring - monitor cache effectiveness with detailed metrics
- Thread-safe statistics - atomic counters for concurrent access
- Rich metrics - access hits, misses, total accesses, hit rate, and miss rate
- Statistics management - reset, clear, and list all cached functions
Breaking Change:

- The default scope changed from `thread` to `global`. If you need thread-local caches, explicitly use `scope = "thread"`.
Statistics Features:

```toml
# Enable with the feature flag
cachelito = { version = "0.6.0", features = ["stats"] }
```

```rust
use cachelito::stats_registry;

// Access statistics with the default name (the function name)
if let Some(stats) = stats_registry::get("my_function") {
    println!("hit rate: {:.2}", stats.hit_rate());
}

// Or use a custom name for better organization
if let Some(stats) = stats_registry::get("my_cache") {
    println!("hit rate: {:.2}", stats.hit_rate());
}
```
For full details, see the complete changelog.
### Previous Release: Version 0.5.0
Highlights:

- RwLock for concurrent reads - 4-5x faster for read-heavy workloads
- 20x improvement for pure concurrent reads
- 40x smaller memory footprint with parking_lot
- Enhanced benchmarks and examples
- Idiomatic crate naming (`cachelito-core`, `cachelito-macros`)
For full details, see the complete changelog.
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## See Also

- CHANGELOG - Detailed version history and release notes
- Macro Expansion Guide - How to view generated code and understand `format!("{:?}")`
- Thread-Local Statistics - Why thread-local cache stats aren't in `stats_registry` and how they work
- API Documentation - Full API reference