Greedy Dual-Size Frequency (GDSF) cache implementation.
A cache replacement algorithm that combines frequency, size, and aging, assigning each entry a priority of (Frequency / Size) + Global_Age.
GDSF is a sophisticated cache replacement algorithm designed for variable-sized objects. It combines object size, access frequency, and aging into a unified priority formula, making it ideal for CDN caches, image caches, and any scenario where cached objects have different sizes.
GDSF is a sophisticated cache replacement algorithm designed for variable-sized objects. It combines object size, access frequency, and aging into a unified priority formula, making it ideal for CDN caches, image caches, and any scenario where cached objects have different sizes.
§How the Algorithm Works
GDSF calculates priority for each cached item using the formula:
Priority = (Frequency / Size) + GlobalAge
This formula balances multiple factors:
- Frequency: More frequently accessed items get higher priority
- Size: Smaller items are favored (more items fit in cache)
- Aging: Prevents old popular items from staying forever
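As a minimal standalone sketch (plain Rust, not the crate's API), the priority computation and its size bias look like this:

```rust
// Hypothetical helper, not part of the crate: GDSF priority for one entry.
fn priority(frequency: u64, size_bytes: u64, global_age: f64) -> f64 {
    frequency as f64 / size_bytes as f64 + global_age
}

fn main() {
    let global_age = 0.0;
    // A small icon accessed twice vs. a large video accessed once.
    let icon = priority(2, 1_024, global_age); // ≈ 0.00195
    let video = priority(1, 10 * 1024 * 1024, global_age); // ≈ 0.0000001
    // The small item wins by a wide margin: frequency per byte dominates.
    assert!(icon > video);
}
```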
§Mathematical Formulation
For each cache entry i:
- F_i = access frequency of item i
- S_i = size of item i (in bytes)
- GlobalAge = increases on eviction (set to evicted item's priority)
- Priority_i = (F_i / S_i) + GlobalAge
On eviction: select item j where Priority_j = min{Priority_i for all i}
§Why Size Matters
Consider a cache with 10KB capacity:
- Option A: Cache one 10KB file accessed 10 times → Priority = 10/10000 = 0.001
- Option B: Cache ten 1KB files accessed once each → Each Priority = 1/1000 = 0.001
The per-item priorities are equal, but Option B can satisfy ten distinct requests from cache instead of one. GDSF recognizes that caching many small frequently-accessed items often yields better hit rates than caching fewer large items.
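The arithmetic in the two options can be checked directly (a standalone sketch, not crate code):

```rust
fn main() {
    // Option A: one 10KB file accessed 10 times.
    let a = 10.0_f64 / 10_000.0;
    // Option B: ten 1KB files accessed once each.
    let b = 1.0_f64 / 1_000.0;
    // Identical per-item priority (0.001), so the tie-break is hit-rate:
    // Option B keeps ten distinct keys resident instead of one.
    assert!((a - b).abs() < 1e-12);
}
```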
§Data Structure
┌─────────────────────────────────────────────────────────────────────────────┐
│ GDSF Cache (global_age=1.5, max_size=10MB) │
│ │
│ HashMap<K, *Node> BTreeMap<Priority, List> │
│ ┌──────────────┐ ┌─────────────────────────────────────────┐ │
│ │ "icon.png" ─────────────────│ pri=3.5: [icon.png] (f=2, s=1KB) │ │
│ │ "thumb.jpg" ────────────────│ pri=2.5: [thumb.jpg] (f=1, s=1KB) │ │
│ │ "video.mp4" ────────────────│ pri=1.5: [video.mp4] (f=10, s=100MB)←E │ │
│ └──────────────┘ └─────────────────────────────────────────┘ │
│ ▲ │
│ │ │
│ min_priority=1.5 │
│ │
│ Note: video.mp4 has high frequency (10) but large size (100MB), │
│ so its priority = 10/100000000 + 1.5 ≈ 1.5 (eviction candidate) │
└─────────────────────────────────────────────────────────────────────────────┘
§Operations
| Operation | Action | Time |
|---|---|---|
| get(key) | Increment frequency, recalculate priority | O(log P) |
| put(key, value, size) | Insert with priority = (1/size) + age | O(log P) |
| remove(key) | Remove from priority list, update min_priority | O(log P) |
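The O(log P) bounds above come from ordering entries by quantized priority in an ordered map. A minimal standalone sketch (the SCALE factor and bucket layout are assumptions for illustration, not the crate's internals):

```rust
use std::collections::BTreeMap;

// f64 priorities are not Ord, so quantize them to integer keys.
// SCALE is a hypothetical factor chosen for this sketch.
const SCALE: f64 = 1_000_000.0;

fn bucket(pri: f64) -> u64 {
    (pri * SCALE) as u64
}

fn main() {
    // BTreeMap<quantized priority, keys at that priority>.
    let mut by_priority: BTreeMap<u64, Vec<&str>> = BTreeMap::new();
    by_priority.entry(bucket(2.0 / 1_024.0)).or_default().push("icon.png");
    by_priority
        .entry(bucket(10.0 / 100_000_000.0))
        .or_default()
        .push("video.mp4");

    // Eviction selects the minimum-priority bucket: O(log P).
    let (&lowest, victims) = by_priority.iter().next().unwrap();
    assert_eq!(victims[0], "video.mp4");
    assert_eq!(lowest, 0); // 1e-7 quantizes to bucket 0 at this scale
}
```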
§Size-Aware Example
Cache: max_size=5KB, global_age=0
put("small.txt", data, 1KB) → pri=1.0, total_size=1KB
put("medium.txt", data, 2KB) → pri=0.5, total_size=3KB
put("large.txt", data, 3KB) → pri=0.33, total_size=6KB OVERFLOW!
Eviction needed: evict "large.txt" (lowest priority=0.33)
global_age = 0.33
put("large.txt", data, 3KB) → pri=0.33+0.33=0.66, total_size=6KB OVERFLOW!
Evict again: now "medium.txt" has lowest priority (0.5 < 0.66)
Result: small.txt + large.txt fit in 4KB
§Dual-Limit Capacity
GDSF naturally works with size-based limits:
- max_entries: Maximum number of items (prevents too many tiny items)
- max_size: Maximum total size (primary constraint for GDSF)
§Performance Characteristics
| Metric | Value |
|---|---|
| Get | O(log P) |
| Put | O(log P) amortized |
| Remove | O(log P) |
| Memory per entry | ~120 bytes overhead + key×2 + value |
Where P = number of distinct priority buckets. Priority = (frequency/size) + age, quantized to integer keys. BTreeMap provides O(log P) lookups.
Higher overhead than simpler algorithms due to priority calculation and BTreeMap-based priority lists.
§When to Use GDSF
Good for:
- CDN and proxy caches with variable object sizes
- Image thumbnail caches
- API response caches with varying payload sizes
- File system caches
- Any size-constrained cache with heterogeneous objects
Not ideal for:
- Uniform-size objects (simpler algorithms work equally well)
- Entry-count-constrained caches (LRU/LFU are simpler)
- Very small caches (overhead not justified)
§GDSF vs Other Algorithms
| Aspect | LRU | LFU | GDSF |
|---|---|---|---|
| Size-aware | No | No | Yes |
| Frequency-aware | No | Yes | Yes |
| Aging | Implicit | No | Yes |
| Best for | Recency | Frequency | Variable-size objects |
§Thread Safety
GdsfCache is not thread-safe. For concurrent access, either:
- Wrap with Mutex or RwLock
- Use ConcurrentGdsfCache (requires the concurrent feature)
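A minimal sketch of the Mutex-wrapping approach; SimpleCache is a stand-in type so the example is self-contained, but the same pattern applies to a GdsfCache value:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for a non-thread-safe cache (hypothetical, not the crate's type).
struct SimpleCache {
    hits: u64,
}

// Spawn `n` threads that each take the lock before touching the shared cache.
fn concurrent_increments(n: u64) -> u64 {
    let cache = Arc::new(Mutex::new(SimpleCache { hits: 0 }));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // Each operation holds the lock for exclusive access.
                cache.lock().unwrap().hits += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let hits = cache.lock().unwrap().hits;
    hits
}

fn main() {
    assert_eq!(concurrent_increments(4), 4);
}
```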
§Examples
§Basic Usage
use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;
// Create cache with max 1000 entries and 10MB size limit
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(1000).unwrap(),
    initial_age: 0.0,
    max_size: 10 * 1024 * 1024, // 10MB
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);
// Insert with explicit size tracking
let small_data = vec![0u8; 1024]; // 1KB
cache.put("small.txt".to_string(), small_data, 1024);
let large_data = vec![0u8; 1024 * 1024]; // 1MB
cache.put("large.bin".to_string(), large_data, 1024 * 1024);
// Small items get higher priority per byte
assert!(cache.get(&"small.txt".to_string()).is_some());
§CDN-Style Caching
use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;
// 100MB cache for web assets
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(10000).unwrap(),
    initial_age: 0.0,
    max_size: 100 * 1024 * 1024,
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);
// Cache various asset types with their sizes
fn cache_asset(cache: &mut GdsfCache<String, Vec<u8>>, url: &str, data: Vec<u8>) {
    let size = data.len() as u64;
    cache.put(url.to_string(), data, size);
}
// Small, frequently-accessed assets get priority over large, rarely-used ones
Structs§
- GdsfCache - An implementation of a Greedy Dual-Size Frequency (GDSF) cache.