SetAssociativeCache - High-Performance CLOCK Eviction Cache
A hardware-optimized, set-associative cache implementation using CLOCK eviction policy. Achieves 30-50% higher throughput than standard LRU/LFU/FIFO caches through SIMD vectorization, compile-time optimization, and cache-friendly memory layout.
Features
- CLOCK Eviction: O(1) amortized complexity, good approximation of LRU
- SIMD Vectorized Search: Parallel tag comparison using x86-64 intrinsics (SSE2/AVX2)
- Compile-Time Optimization: CacheLayout trait enables constant folding and dead code elimination
- Hardware-Aligned Memory: Single contiguous allocation with cache line alignment
- Flexible Key Extraction: Generic KeyExtract trait supports custom key types
- Custom Allocators: Full support for custom allocator implementations
- No Dependencies: Core cache uses only std library
Performance
Benchmark results (sequential workload):
Cache Size 256 items:
SetAssociativeCache: 11.38 Melem/s
LRU/FIFO/Random: 8-9 Melem/s (+27% slower)
LFU: 6.75 Melem/s (+60% slower)
Cache Size 512 items:
SetAssociativeCache: 13.29 Melem/s
LRU/FIFO/Random: 8-9 Melem/s (+47% slower)
LFU: 6.49 Melem/s (+100% slower)
See BENCHMARK_RESULTS.md for detailed analysis and comparisons.
Usage
Basic Example
use ;
use RandomState;
// Define key-value pair type
; // (key, value)
// Implement KeyExtract trait
;
Custom Cache Layout
use ;
// Define custom geometry (2 ways, 16-bit tags, 4 clock bits)
;
// Use with custom layout
let mut cache: =
new;
Custom Allocator
use SetAssociativeCache;
use Global;
let mut cache: = new;
How It Works
Memory Layout
SetAssociativeCache organizes data into sets, each containing multiple ways:
Set 0: [Tag0][Tag1]...[Tag15]
[Val0][Val1]...[Val15] <- All in one cache line
[Clock bits for each slot]
Set 1: [Tag0][Tag1]...[Tag15]
[Val0][Val1]...[Val15]
[Clock bits]
...
All data for a set fits in a single CPU cache line, enabling prefetching and cache hits.
Search Algorithm
- Hash key to get (set index, tag)
- Load all tags for that set via SIMD
- Compare all 16 tags in parallel
- On match, verify key equality and update clock bit
- Return value
Cost: 1-2 CPU cycles (vectorized) vs 10+ cycles for HashMap.
CLOCK Eviction
When cache is full:
- Locate eviction pointer for the set
- Scan forward in circular order
- Clear clock bits as we scan
- Evict first slot with bit = 0
- Insert new value
- Advance pointer for next eviction
Cost: O(1) amortized with predictable latency.
Configuration
Cache Size Requirements
Value count must be a multiple of:
- WAYS (number of associative ways)
- (CACHE_LINE_SIZE * 8) / CLOCK_BITS for clock counters
Default layout (WAYS=16, CLOCK_BITS=2):
Required multiple = max(16, 256) = 256
Valid sizes: 256, 512, 1024, 2048, ...
Layout Selection
- DefaultLayout: 16 ways, 8-bit tags, 2 clock bits (recommended)
- Ways2Layout: 2 ways, minimal overhead
- Ways4Layout: 4 ways, balance
- Ways16Layout: 16 ways, maximum parallelism
Larger WAYS = better SIMD parallelism but more memory per set.
Comparison with Other Policies
| Policy | Throughput | Hit Rate | Complexity | Memory | Flexibility |
|---|---|---|---|---|---|
| SetAssociative | 11-14 Melem/s | Good (CLOCK) | O(1) | Fixed | Limited |
| LRU | 8-9 Melem/s | Optimal | O(n) | Dynamic | Full |
| LFU | 6.5 Melem/s | Optimal | O(log n) | Dynamic | Full |
| FIFO | 8-9 Melem/s | Good | O(1) | Dynamic | Full |
| Random | 8-9 Melem/s | Fair | O(1) | Dynamic | Full |
Trade-offs
Choose SetAssociativeCache if:
- Throughput is critical (>100K ops/sec)
- Cache size is known in advance
- CPU cache efficiency matters
- Hit rate variance of 2-5% is acceptable
- Standard CLOCK eviction semantics sufficient
Choose HashMap + LFU if:
- Optimal hit rate required
- Cache size varies dynamically
- Support for heterogeneous value types needed
Building and Testing
# Build
# Run tests
# Run benchmarks
# Full benchmarks with HTML report
# Open report
Benchmarking
The included benchmark suite compares SetAssociativeCache against:
- LRU (Least Recently Used)
- FIFO (First In First Out)
- LFU (Least Frequently Used)
- W-TinyLFU (Windowed TinyLFU)
- Random eviction
Benchmarks cover:
- Cache sizes: 256, 512, 1024 items
- Access patterns: Sequential workload
- Access counts: 10K and 100K operations
See BENCHMARKS.md for detailed methodology.
When to Use SetAssociativeCache
Good Fit
- Database query result caching
- CPU cache replacement simulation
- High-frequency trading data cache
- Game engine asset cache
- Connection/resource pooling
- DNS resolver cache
Poor Fit
- Variable-size item storage
- Unbounded growth scenarios
- Applications requiring exact LRU semantics
Implementation Notes
Thread Safety
SetAssociativeCache is not thread-safe. For concurrent access, wrap with:
- Arc<Mutex>
- DashMap with per-item locks
- Sharding across multiple caches
Memory Overhead
Base overhead: ~400 bytes
Per item: tag(1-2B) + clock_bits(fractional) + bookkeeping
Total = 400 + (cache_size * 2)
CPU Architecture
Currently optimized for x86-64 with SSE2/AVX2. Generic implementations available via feature flags.
Design Rationale
Why CLOCK Eviction?
- O(1) amortized complexity vs O(n) for LRU retention
- Good hit rate approximation (95%+ of LRU)
- Predictable latency (no worst-case O(n))
- Hardware-friendly scan pattern
Why Set-Associative?
- Mimics CPU cache structure
- Enables SIMD vectorization
- Single allocation with predictable layout
- Aligns with CPU cache lines
Why Compile-Time Constants?
- LLVM optimizations: division -> shift, masks -> constants
- Dead code elimination for unused ways/tags
- Monomorphization specializes code per layout
- Zero-cost abstractions
References
- CLOCK Algorithm: Corbato, "A Paging Experiment with the Multics System"
- Set-Associative Caches: CPU cache architecture literature
- SIMD Optimization: Intel x86 optimization manuals
- ARM: https://developer.arm.com/documentation/den0042/0100/Caches/Cache-architecture/Set-associative-caches
License
Apache License 2.0
Contributing
Contributions welcome. Please ensure:
- All tests pass:
cargo test - Benchmarks don't regress:
cargo bench - Code follows project style
- Documentation updated