Skip to main content

Module gdsf

Module gdsf 

Source
Expand description

Greedy Dual-Size Frequency (GDSF) cache implementation.

A cache replacement algorithm that combines frequency, size, and aging. Assigns priority based on (Frequency / Size) + Global_Age formula. Greedy Dual-Size Frequency (GDSF) Cache Implementation

GDSF is a sophisticated cache replacement algorithm designed for variable-sized objects. It combines object size, access frequency, and aging into a unified priority formula, making it ideal for CDN caches, image caches, and any scenario where cached objects have different sizes.

§How the Algorithm Works

GDSF calculates priority for each cached item using the formula:

Priority = (Frequency / Size) + GlobalAge

This formula cleverly balances multiple factors:

  • Frequency: More frequently accessed items get higher priority
  • Size: Smaller items are favored (more items fit in cache)
  • Aging: Prevents old popular items from staying forever

§Mathematical Formulation

For each cache entry i:
  - F_i = access frequency of item i
  - S_i = size of item i (in bytes)
  - GlobalAge = increases on eviction (set to evicted item's priority)
  - Priority_i = (F_i / S_i) + GlobalAge

On eviction: select item j where Priority_j = min{Priority_i for all i}

§Why Size Matters

Consider a cache with 10KB capacity:

  • Option A: Cache one 10KB file accessed 10 times → Priority = 10/10000 = 0.001
  • Option B: Cache ten 1KB files accessed once each → Each Priority = 1/1000 = 0.001

GDSF recognizes that caching many small frequently-accessed items often yields better hit rates than caching fewer large items.

§Data Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                    GDSF Cache (global_age=1.5, max_size=10MB)               │
│                                                                              │
│  HashMap<K, *Node>              BTreeMap<Priority, List>                     │
│  ┌──────────────┐              ┌─────────────────────────────────────────┐   │
│  │ "icon.png" ─────────────────│ pri=3.5: [icon.png]   (f=2, s=1KB)      │   │
│  │ "thumb.jpg" ────────────────│ pri=2.5: [thumb.jpg]  (f=1, s=1KB)      │   │
│  │ "video.mp4" ────────────────│ pri=1.5: [video.mp4]  (f=10, s=100MB)←E │   │
│  └──────────────┘              └─────────────────────────────────────────┘   │
│                                        ▲                                     │
│                                        │                                     │
│                                   min_priority=1.5                           │
│                                                                              │
│  Note: video.mp4 has high frequency (10) but large size (100MB),            │
│        so its priority = 10/100000000 + 1.5 ≈ 1.5 (eviction candidate)      │
└─────────────────────────────────────────────────────────────────────────────┘

§Operations

OperationActionTime
get(key)Increment frequency, recalculate priorityO(log P)
put(key, value, size)Insert with priority=(1/size)+ageO(log P)
remove(key)Remove from priority list, update min_priorityO(log P)

§Size-Aware Example

Cache: max_size=5KB, global_age=0

put("small.txt", data, 1KB)   →  pri=1.0, total_size=1KB
put("medium.txt", data, 2KB)  →  pri=0.5, total_size=3KB
put("large.txt", data, 3KB)   →  pri=0.33, total_size=6KB  OVERFLOW!

Eviction needed: evict "large.txt" (lowest priority=0.33)
global_age = 0.33

put("large.txt", data, 3KB)   →  pri=0.33+0.33=0.66, total_size=6KB  OVERFLOW!

Evict again: now "medium.txt" has lowest priority (0.5 < 0.66)
Result: small.txt + large.txt fit in 4KB

§Dual-Limit Capacity

GDSF naturally works with size-based limits:

  • max_entries: Maximum number of items (prevents too many tiny items)
  • max_size: Maximum total size (primary constraint for GDSF)

§Performance Characteristics

MetricValue
GetO(log P)
PutO(log P) amortized
RemoveO(log P)
Memory per entry~120 bytes overhead + key×2 + value

Where P = number of distinct priority buckets. Priority = (frequency/size) + age, quantized to integer keys. BTreeMap provides O(log P) lookups.

Higher overhead than simpler algorithms due to priority calculation and BTreeMap-based priority lists.

§When to Use GDSF

Good for:

  • CDN and proxy caches with variable object sizes
  • Image thumbnail caches
  • API response caches with varying payload sizes
  • File system caches
  • Any size-constrained cache with heterogeneous objects

Not ideal for:

  • Uniform-size objects (simpler algorithms work equally well)
  • Entry-count-constrained caches (LRU/LFU are simpler)
  • Very small caches (overhead not justified)

§GDSF vs Other Algorithms

AspectLRULFUGDSF
Size-awareNoNoYes
Frequency-awareNoYesYes
AgingImplicitNoYes
Best forRecencyFrequencyVariable-size objects

§Thread Safety

GdsfCache is not thread-safe. For concurrent access, either:

  • Wrap with Mutex or RwLock
  • Use ConcurrentGdsfCache (requires concurrent feature)

§Examples

§Basic Usage

use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;

// Create cache with max 1000 entries and 10MB size limit
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(1000).unwrap(),
    initial_age: 0.0,
    max_size: 10 * 1024 * 1024,  // 10MB
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);

// Insert with explicit size tracking
let small_data = vec![0u8; 1024];  // 1KB
cache.put("small.txt".to_string(), small_data, 1024);

let large_data = vec![0u8; 1024 * 1024];  // 1MB
cache.put("large.bin".to_string(), large_data, 1024 * 1024);

// Small items get higher priority per byte
assert!(cache.get(&"small.txt".to_string()).is_some());

§CDN-Style Caching

use cache_rs::GdsfCache;
use cache_rs::config::GdsfCacheConfig;
use core::num::NonZeroUsize;

// 100MB cache for web assets
let config = GdsfCacheConfig {
    capacity: NonZeroUsize::new(10000).unwrap(),
    initial_age: 0.0,
    max_size: 100 * 1024 * 1024,
};
let mut cache: GdsfCache<String, Vec<u8>> = GdsfCache::init(config, None);

// Cache various asset types with their sizes
fn cache_asset(cache: &mut GdsfCache<String, Vec<u8>>, url: &str, data: Vec<u8>) {
    let size = data.len() as u64;
    cache.put(url.to_string(), data, size);
}

// Small, frequently-accessed assets get priority over large, rarely-used ones

Structs§

GdsfCache
An implementation of a Greedy Dual-Size Frequency (GDSF) cache.