Greedy Dual-Size Frequency (GDSF) cache implementation.
A cache replacement algorithm that combines frequency, size, and aging, assigning each entry a priority of (Frequency / Size) + Global_Age.
GDSF is a sophisticated cache replacement algorithm designed for variable-sized objects. It combines object size, access frequency, and aging into a unified priority formula, making it ideal for CDN caches, image caches, and any scenario where cached objects have different sizes.
GDSF is a sophisticated cache replacement algorithm designed for variable-sized objects. It combines object size, access frequency, and aging into a unified priority formula, making it ideal for CDN caches, image caches, and any scenario where cached objects have different sizes.
§How the Algorithm Works
GDSF calculates priority for each cached item using the formula:
Priority = (Frequency / Size) + GlobalAge
This formula balances multiple factors:
- Frequency: More frequently accessed items get higher priority
- Size: Smaller items are favored (more items fit in cache)
- Aging: Prevents old popular items from staying forever
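As a minimal standalone sketch (plain Rust, not the crate's API), the priority computation and its size bias look like this:

```rust
// Hypothetical helper, not part of the crate: GDSF priority for one entry.
fn priority(frequency: u64, size_bytes: u64, global_age: f64) -> f64 {
    frequency as f64 / size_bytes as f64 + global_age
}

fn main() {
    let global_age = 0.0;
    // A small icon accessed twice vs. a large video accessed once.
    let icon = priority(2, 1_024, global_age); // ≈ 0.00195
    let video = priority(1, 10 * 1024 * 1024, global_age); // ≈ 0.0000001
    // The small item wins by a wide margin: frequency per byte dominates.
    assert!(icon > video);
}
```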
§Mathematical Formulation
For each cache entry i:
- F_i = access frequency of item i
- S_i = size of item i (in bytes)
- GlobalAge = increases on eviction (set to evicted item's priority)
- Priority_i = (F_i / S_i) + GlobalAge
On eviction: select item j where Priority_j = min{Priority_i for all i}
§Why Size Matters
Consider a cache with 10KB capacity:
- Option A: Cache one 10KB file accessed 10 times → Priority = 10/10000 = 0.001
- Option B: Cache ten 1KB files accessed once each → Each Priority = 1/1000 = 0.001
The per-item priorities are equal, but Option B can satisfy ten distinct requests from cache instead of one. GDSF recognizes that caching many small frequently-accessed items often yields better hit rates than caching fewer large items.
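The arithmetic in the two options can be checked directly (a standalone sketch, not crate code):

```rust
fn main() {
    // Option A: one 10KB file accessed 10 times.
    let a = 10.0_f64 / 10_000.0;
    // Option B: ten 1KB files accessed once each.
    let b = 1.0_f64 / 1_000.0;
    // Identical per-item priority (0.001), so the tie-break is hit-rate:
    // Option B keeps ten distinct keys resident instead of one.
    assert!((a - b).abs() < 1e-12);
}
```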
§Data Structure
┌─────────────────────────────────────────────────────────────────────────────┐
│ GDSF Cache (global_age=1.5, max_size=10MB) │
│ │
│ HashMap<K, *Node> BTreeMap<Priority, List> │
│ ┌──────────────┐ ┌─────────────────────────────────────────┐ │
│ │ "icon.png" ─────────────────│ pri=3.5: [icon.png] (f=2, s=1KB) │ │
│ │ "thumb.jpg" ────────────────│ pri=2.5: [thumb.jpg] (f=1, s=1KB) │ │
│ │ "video.mp4" ────────────────│ pri=1.5: [video.mp4] (f=10, s=100MB)←E │ │
│ └──────────────┘ └─────────────────────────────────────────┘ │
│ ▲ │
│ │ │
│ min_priority=1.5 │
│ │
│ Note: video.mp4 has high frequency (10) but large size (100MB), │
│ so its priority = 10/100000000 + 1.5 ≈ 1.5 (eviction candidate) │
└─────────────────────────────────────────────────────────────────────────────┘
§Operations
| Operation | Action | Time |
|---|---|---|
| get(key) | Increment frequency, recalculate priority | O(log P) |
| put(key, value, size) | Insert with priority = (1/size) + age | O(log P) |
| remove(key) | Remove from priority list, update min_priority | O(log P) |
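The O(log P) bounds above come from ordering entries by quantized priority in an ordered map. A minimal standalone sketch (the SCALE factor and bucket layout are assumptions for illustration, not the crate's internals):

```rust
use std::collections::BTreeMap;

// f64 priorities are not Ord, so quantize them to integer keys.
// SCALE is a hypothetical factor chosen for this sketch.
const SCALE: f64 = 1_000_000.0;

fn bucket(pri: f64) -> u64 {
    (pri * SCALE) as u64
}

fn main() {
    // BTreeMap<quantized priority, keys at that priority>.
    let mut by_priority: BTreeMap<u64, Vec<&str>> = BTreeMap::new();
    by_priority.entry(bucket(2.0 / 1_024.0)).or_default().push("icon.png");
    by_priority
        .entry(bucket(10.0 / 100_000_000.0))
        .or_default()
        .push("video.mp4");

    // Eviction selects the minimum-priority bucket: O(log P).
    let (&lowest, victims) = by_priority.iter().next().unwrap();
    assert_eq!(victims[0], "video.mp4");
    assert_eq!(lowest, 0); // 1e-7 quantizes to bucket 0 at this scale
}
```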
§Size-Aware Example
Cache: max_size=5KB, global_age=0
put("small.txt", data, 1KB) → pri=1.0, total_size=1KB
put("medium.txt", data, 2KB) → pri=0.5, total_size=3KB
put("large.txt", data, 3KB) → pri=0.33, total_size=6KB OVERFLOW!
Eviction needed: evict "large.txt" (lowest priority=0.33)
global_age = 0.33
put("large.txt", data, 3KB) → pri=0.33+0.33=0.66, total_size=6KB OVERFLOW!
Evict again: now "medium.txt" has lowest priority (0.5 < 0.66)
Result: small.txt + large.txt fit in 4KB
§Dual-Limit Capacity
GDSF naturally works with size-based limits:
- max_entries: Maximum number of items (prevents too many tiny items)
- max_size: Maximum total size (primary constraint for GDSF)
§Performance Characteristics
| Metric | Value |
|---|---|
| Get | O(log P) |
| Put | O(log P) amortized |
| Remove | O(log P) |
| Memory per entry | ~120 bytes overhead + key×2 + value |
Where P = number of distinct priority buckets. Priority = (frequency/size) + age, quantized to integer keys. BTreeMap provides O(log P) lookups.
Higher overhead than simpler algorithms due to priority calculation and BTreeMap-based priority lists.
§When to Use GDSF
Good for:
- CDN and proxy caches with variable object sizes
- Image thumbnail caches
- API response caches with varying payload sizes
- File system caches
- Any size-constrained cache with heterogeneous objects
Not ideal for:
- Uniform-size objects (simpler algorithms work equally well)
- Entry-count-constrained caches (LRU/LFU are simpler)
- Very small caches (overhead not justified)
§GDSF vs Other Algorithms
| Aspect | LRU | LFU | GDSF |
|---|---|---|---|
| Size-aware | No | No | Yes |
| Frequency-aware | No | Yes | Yes |
| Aging | Implicit | No | Yes |
| Best for | Recency | Frequency | Variable-size objects |
§Thread Safety
GdsfCache is not thread-safe. For concurrent access, either:
- Wrap with Mutex or RwLock
- Use ConcurrentGdsfCache (requires the concurrent feature)
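A minimal sketch of the Mutex-wrapping approach; SimpleCache is a stand-in type so the example is self-contained, but the same pattern applies to a GdsfCache value:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for a non-thread-safe cache (hypothetical, not the crate's type).
struct SimpleCache {
    hits: u64,
}

// Spawn `n` threads that each take the lock before touching the shared cache.
fn concurrent_increments(n: u64) -> u64 {
    let cache = Arc::new(Mutex::new(SimpleCache { hits: 0 }));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // Each operation holds the lock for exclusive access.
                cache.lock().unwrap().hits += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let hits = cache.lock().unwrap().hits;
    hits
}

fn main() {
    assert_eq!(concurrent_increments(4), 4);
}
```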
§Examples
§Basic Usage
use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;
// Create cache with max 1000 entries and 10MB size limit
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(1000).unwrap(),
    initial_age: 0.0,
    max_size: 10 * 1024 * 1024, // 10MB
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);
// Insert with explicit size tracking
let small_data = vec![0u8; 1024]; // 1KB
cache.put("small.txt".to_string(), small_data, 1024);
let large_data = vec![0u8; 1024 * 1024]; // 1MB
cache.put("large.bin".to_string(), large_data, 1024 * 1024);
// Small items get higher priority per byte
assert!(cache.get(&"small.txt".to_string()).is_some());
§CDN-Style Caching
use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;
// 100MB cache for web assets
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(10000).unwrap(),
    initial_age: 0.0,
    max_size: 100 * 1024 * 1024,
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);
// Cache various asset types with their sizes
fn cache_asset(cache: &mut GdsfCache<String, Vec<u8>>, url: &str, data: Vec<u8>) {
    let size = data.len() as u64;
    cache.put(url.to_string(), data, size);
}
// Small, frequently-accessed assets get priority over large, rarely-used ones
Structs§
- GdsfCache - An implementation of a Greedy Dual-Size Frequency (GDSF) cache.