size_lru 0.1.2

Size-aware cache library with intelligent eviction strategies. / 具备智能淘汰策略的大小感知缓存库。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
[English](#en) | [中文](#zh)

---

<a id="en"></a>

# size_lru : Intelligent Size-Aware Cache Library

High-performance, size-aware cache library implementing intelligent eviction strategies for optimal memory usage.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [API Reference](#api-reference)
- [Design](#design)
- [Technology Stack](#technology-stack)
- [Directory Structure](#directory-structure)
- [History](#history)

## Features

- **Size Awareness**: Optimizes storage based on actual object size rather than count.
- **Intelligent Eviction**: Implements LHD (Least Hit Density) algorithm to maximize hit rate.
- **Constant Complexity**: Guarantees O(1) time access for get, set, and remove operations.
- **Adaptive Tuning**: Automatically adjusts internal parameters to match workload patterns.
- **Zero Overhead**: Provides `NoCache` implementation for performance baselining.

## Installation

Add to `Cargo.toml`:

```toml
[dependencies]
size_lru = { version = "0.1.0", features = ["lhd"] }
```

## Usage

Demonstration code based on `tests/main.rs`.

### Basic Operations

```rust
use size_lru::Lhd;

fn main() {
  // Initialize cache with capacity
  let mut cache: Lhd<&str, &str> = Lhd::new(1024);

  // Set values with explicit size
  cache.set("k1", "v1", 10);
  cache.set("k2", "v2", 20);

  // Retrieve value
  assert_eq!(cache.get(&"k1"), Some(&"v1"));

  // Check state
  assert_eq!(cache.len(), 2);
  assert_eq!(cache.size(), 30);

  // Remove value
  cache.rm(&"k2");
  assert_eq!(cache.get(&"k2"), None);
}
```

### Generic Trait Usage

```rust
use size_lru::{SizeLru, Lhd};

fn cache_op<K, V>(cache: &mut impl SizeLru<K, V>, key: K, val: V, size: u32) {
  cache.set(key, val, size);
}
```

## API Reference

### `trait SizeLru<K, V>`

Core interface for cache implementations.

- `fn get(&mut self, key: &K) -> Option<&V>`: Retrieve reference to value. Updates hit statistics.
- `fn set(&mut self, key: K, val: V, size: u32)`: Insert or update value. Triggers eviction if capacity exceeded.
- `fn rm(&mut self, key: &K)`: Remove value by key.

### `struct Lhd<K, V>`

LHD algorithm implementation.

- `fn new(max: usize) -> Self`: Create new instance with maximum byte capacity.
- `fn size(&self) -> usize`: Return total size of stored items in bytes.
- `fn len(&self) -> usize`: Return count of stored items.
- `fn is_empty(&self) -> bool`: Return true if cache contains no items.

## Design

### Architecture

```mermaid
graph TD
  User[User Code] --> Trait[SizeLru Trait]
  Trait --> |impl| Lhd[Lhd Struct]
  Trait --> |impl| No[NoCache Struct]
  
  subgraph Lhd_Internals [Lhd Implementation]
    Lhd --> Index[HashMap Index]
    Lhd --> Entries[Vec Entry]
    Lhd --> Stats[Class Stats]
    
    Entries --> EntryData[Key, Val, Size, TS]
    Stats --> Hits[Hit Counters]
    Stats --> Evicts[Eviction Counters]
  end
```

### Eviction Logic

```mermaid
graph TD
  Start[Set Operation] --> CheckExist{Key Exists?}
  CheckExist --Yes--> Update[Update Value & Size]
  CheckExist --No--> CheckCap{Over Capacity?}
  
  CheckCap --No--> Insert[Insert New Entry]
  CheckCap --Yes--> EvictStart[Start Eviction]
  
  subgraph Eviction_Process [LHD Eviction]
    EvictStart --> Sample[Sample N Candidates]
    Sample --> Calc[Calculate Hit Density]
    Calc --> Select["Select Victim (Min Density)"]
    Select --> Remove[Remove Victim]
    Remove --> CheckCap
  end
  
  Update --> End[Done]
  Insert --> End
```

### Generational Mechanism

```mermaid
graph TD
  Access[Access Entry] --> AgeCalc[Calculate Age: current_ts - entry_ts]
  AgeCalc --> Coarsen[Age Coarsening: age >> shift]
  Coarsen --> AgeBucket["Age Bucket: min(coarsened_age, MAX_AGE-1)"]
  
  subgraph ClassMapping [Class Mapping]
    AgeBucket --> Sum[last_age + prev_age]
    Sum --> LogScale["Log Mapping: class_id = 32 - leading_zeros(sum) - 19"]
    LogScale --> ClassSelect["Select Class: min(log_result, AGE_CLASSES-1)"]
  end
  
  ClassSelect --> UpdateStats[Update Class Statistics]
  
  subgraph AgeClasses [Age Class Structure]
    ClassSelect --> Class0[Class 0: New Access]
    ClassSelect --> Class1[Class 1: Occasional Access]
    ClassSelect --> Class2[Class 2: Medium Frequency]
    ClassSelect --> ClassN["Class N: High Frequency (N=15)"]
    
    Class0 --> Buckets0[4096 Age Buckets]
    Class1 --> Buckets1[4096 Age Buckets]
    Class2 --> Buckets2[4096 Age Buckets]
    ClassN --> BucketsN[4096 Age Buckets]
  end
```

### Hit Rate Calculation Mechanism

```mermaid
graph TD
  Reconfig[Reconfiguration Trigger] --> Decay[Apply EWMA Decay]
  Decay --> Iterate[Iterate Age Buckets Backwards]
  
  subgraph DensityCalc [Density Calculation]
    Iterate --> Init[Initialize: events=0, hits=0, life=0]
    Init --> Loop["Loop from MAX_AGE-1 to 0"]
    Loop --> AddHits["hits += hits[age]"]
    Loop --> AddEvents["events += hits[age] + evicts[age]"]
    Loop --> AddLife["life += events"]
    Loop --> CalcDensity["density[age] = hits / life"]
  end
  
  CalcDensity --> NextAge{More Buckets?}
  NextAge --Yes--> Loop
  NextAge --No--> Complete[Density Calculation Complete]
  
  subgraph HitStats [Hit Statistics Update]
    AccessEntry[Entry Accessed] --> GetClass[Get Class ID]
    GetClass --> GetAge[Get Age Bucket]
    GetAge --> Increment["hits[class][age] += 1.0"]
  end
```

### Density Calculation and Eviction Flow

```mermaid
graph TD
  EvictStart[Start Eviction] --> Sample[Sample N Candidates]
  Sample --> CalcDensity[Calculate Hit Density for Each Candidate]
  
  subgraph DensityFormula [Density Calculation Formula]
    CalcDensity --> GetEntry[Get Entry Information]
    GetEntry --> CalcAge["Calculate Age: (ts - entry_ts) >> shift"]
    CalcAge --> GetClass["Get Class: class_id(last_age + prev_age)"]
    GetClass --> Lookup["Lookup: density = classes[class].density[age]"]
    Lookup --> Normalize["Normalize: density / size"]
  end
  
  Normalize --> Compare[Compare Density Values]
  Compare --> Select["Select Victim (Min Density)"]
  Select --> Remove[Remove Victim]
  Remove --> UpdateEvictStats["Update Eviction Stats: evicts[class][age] += 1.0"]
  UpdateEvictStats --> CheckCapacity{Still Over Capacity?}
  CheckCapacity --Yes--> Sample
  CheckCapacity --No--> End[Eviction Complete]
```

## Technology Stack

- **Rust**: Systems programming language.
- **gxhash**: High-performance, non-cryptographic hashing.
- **fastrand**: Efficient pseudo-random number generation.

## Directory Structure

```
src/
  lib.rs    # Trait definitions and module exports
  lhd.rs    # LHD algorithm implementation
  no.rs     # No-op implementation
tests/
  main.rs   # Integration tests and demos
readme/
  en.md     # English documentation
  zh.md     # Chinese documentation
```

## History

The **LHD (Least Hit Density)** algorithm originates from the NSDI '18 paper "LHD: Improving Cache Hit Rate by Maximizing Hit Density". The authors (Beckmann et al.) proposed replacing complex heuristics with a probabilistic framework. Instead of asking "which item was least recently used?", LHD asks "which item has the least expected hits per unit of space?". By estimating the probability of future hits based on object age and size, LHD maximizes the total hit rate of the cache. This implementation brings these theoretical gains to a practical Rust library.

### References

- **Paper**: [LHD: Improving Cache Hit Rate by Maximizing Hit Density](https://www.usenix.org/conference/nsdi18/presentation/beckmann) (NSDI '18)
- **Implementation**: [Official Simulation Code](https://github.com/beckmann/cache_replacement)
- **PDF**: [Download Paper](https://www.usenix.org/system/files/conference/nsdi18/nsdi18-beckmann.pdf)

---

## About

This project is an open-source component of [js0.site ⋅ Refactoring the Internet Plan](https://js0.site).

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:

* [Google Group](https://groups.google.com/g/js0-site)
* [js0site.bsky.social](https://bsky.app/profile/js0site.bsky.social)

---

<a id="zh"></a>

# size_lru : 智能大小感知缓存库

具备智能淘汰策略的高性能大小感知缓存库,优化内存使用。

## 目录

- [功能特性](#功能特性)
- [安装指南](#安装指南)
- [使用演示](#使用演示)
- [接口参考](#接口参考)
- [设计架构](#设计架构)
- [技术堆栈](#技术堆栈)
- [目录结构](#目录结构)
- [历史背景](#历史背景)

## 功能特性

- **大小感知**:基于对象实际大小而非数量优化存储。
- **智能淘汰**:实现 LHD (最低命中密度) 算法以最大化命中率。
- **常数复杂度**:确保获取、设置和删除操作的 O(1) 时间复杂度。
- **自适应调整**:自动调整内部参数以匹配工作负载模式。
- **零开销**:提供 `NoCache` 实现用于性能基准测试。

## 安装指南

在 `Cargo.toml` 中添加:

```toml
[dependencies]
size_lru = { version = "0.1.0", features = ["lhd"] }
```

## 使用演示

演示代码基于 `tests/main.rs`。

### 基础操作

```rust
use size_lru::Lhd;

fn main() {
  // 初始化指定容量的缓存
  let mut cache: Lhd<&str, &str> = Lhd::new(1024);

  // 设置带有明确大小的值
  cache.set("k1", "v1", 10);
  cache.set("k2", "v2", 20);

  // 获取值
  assert_eq!(cache.get(&"k1"), Some(&"v1"));

  // 检查状态
  assert_eq!(cache.len(), 2);
  assert_eq!(cache.size(), 30);

  // 删除值
  cache.rm(&"k2");
  assert_eq!(cache.get(&"k2"), None);
}
```

### 通用 Trait 用法

```rust
use size_lru::{SizeLru, Lhd};

fn cache_op<K, V>(cache: &mut impl SizeLru<K, V>, key: K, val: V, size: u32) {
  cache.set(key, val, size);
}
```

## 接口参考

### `trait SizeLru<K, V>`

缓存实现的各项核心接口。

- `fn get(&mut self, key: &K) -> Option<&V>`: 获取值引用。更新命中统计信息。
- `fn set(&mut self, key: K, val: V, size: u32)`: 插入或更新值。若超出容量将触发淘汰。
- `fn rm(&mut self, key: &K)`: 按键删除值。

### `struct Lhd<K, V>`

LHD 算法实现。

- `fn new(max: usize) -> Self`: 创建具有最大字节容量的新实例。
- `fn size(&self) -> usize`: 返回存储项目的总大小(字节)。
- `fn len(&self) -> usize`: 返回存储项目的数量。
- `fn is_empty(&self) -> bool`: 如果缓存不包含任何项目,返回真。

## 设计架构

### 架构图

```mermaid
graph TD
  User[用户代码] --> Trait[SizeLru Trait]
  Trait --> |impl| Lhd[Lhd 结构体]
  Trait --> |impl| No[NoCache 结构体]
  
  subgraph Lhd_Internals [Lhd 实现]
    Lhd --> Index[HashMap 索引]
    Lhd --> Entries[Vec 条目]
    Lhd --> Stats[分类统计]
    
    Entries --> EntryData[键, 值, 大小, 时间戳]
    Stats --> Hits[命中计数]
    Stats --> Evicts[淘汰计数]
  end
```

### 淘汰逻辑

```mermaid
graph TD
  Start[设置操作] --> CheckExist{键是否存在?}
  CheckExist --是--> Update[更新值和大小]
  CheckExist --否--> CheckCap{超出容量?}
  
  CheckCap --否--> Insert[插入新条目]
  CheckCap --是--> EvictStart[开始淘汰]
  
  subgraph Eviction_Process [LHD 淘汰]
    EvictStart --> Sample[采样 N 个候选]
    Sample --> Calc[计算命中密度]
    Calc --> Select["选择牺牲者 (最小密度)"]
    Select --> Remove[移除牺牲者]
    Remove --> CheckCap
  end
  
  Update --> End[完成]
  Insert --> End
```

### 分代机制详解

```mermaid
graph TD
  Access[访问条目] --> AgeCalc[计算年龄: current_ts - entry_ts]
  AgeCalc --> Coarsen[年龄粗化: age >> shift]
  Coarsen --> AgeBucket["年龄桶: min(coarsened_age, MAX_AGE-1)"]
  
  subgraph ClassMapping [类别映射]
    AgeBucket --> Sum[last_age + prev_age]
    Sum --> LogScale["对数映射: class_id = 32 - leading_zeros(sum) - 19"]
    LogScale --> ClassSelect["选择类别: min(log_result, AGE_CLASSES-1)"]
  end
  
  ClassSelect --> UpdateStats[更新类别统计]
  
  subgraph AgeClasses [年龄类别结构]
    ClassSelect --> Class0[类别 0: 新访问]
    ClassSelect --> Class1[类别 1: 偶尔访问]
    ClassSelect --> Class2[类别 2: 中等频率]
    ClassSelect --> ClassN["类别 N: 高频率 (N=15)"]
    
    Class0 --> Buckets0[4096 个年龄桶]
    Class1 --> Buckets1[4096 个年龄桶]
    Class2 --> Buckets2[4096 个年龄桶]
    ClassN --> BucketsN[4096 个年龄桶]
  end
```

### 命中率计算机制

```mermaid
graph TD
  Reconfig[重新配置触发] --> Decay[应用 EWMA 衰减]
  Decay --> Iterate[反向迭代年龄桶]
  
  subgraph DensityCalc [密度计算]
    Iterate --> Init[初始化: events=0, hits=0, life=0]
    Init --> Loop["从 MAX_AGE-1 到 0 循环"]
    Loop --> AddHits["hits += hits[age]"]
    Loop --> AddEvents["events += hits[age] + evicts[age]"]
    Loop --> AddLife["life += events"]
    Loop --> CalcDensity["density[age] = hits / life"]
  end
  
  CalcDensity --> NextAge{还有年龄桶?}
  NextAge --是--> Loop
  NextAge --否--> Complete[密度计算完成]
  
  subgraph HitStats [命中统计更新]
    AccessEntry[条目被访问] --> GetClass[获取类别 ID]
    GetClass --> GetAge[获取年龄桶]
    GetAge --> Increment["hits[class][age] += 1.0"]
  end
```

### 密度计算与淘汰流程

```mermaid
graph TD
  EvictStart[开始淘汰] --> Sample[采样 N 个候选]
  Sample --> CalcDensity[计算每个候选的命中密度]
  
  subgraph DensityFormula [密度计算公式]
    CalcDensity --> GetEntry[获取条目信息]
    GetEntry --> CalcAge["计算年龄: (ts - entry_ts) >> shift"]
    CalcAge --> GetClass["获取类别: class_id(last_age + prev_age)"]
    GetClass --> Lookup["查表: density = classes[class].density[age]"]
    Lookup --> Normalize["归一化: density / size"]
  end
  
  Normalize --> Compare[比较密度值]
  Compare --> Select["选择牺牲者 (最小密度)"]
  Select --> Remove[移除牺牲者]
  Remove --> UpdateEvictStats["更新淘汰统计: evicts[class][age] += 1.0"]
  UpdateEvictStats --> CheckCapacity{仍超容量?}
  CheckCapacity --是--> Sample
  CheckCapacity --否--> End[淘汰完成]
```

## 技术堆栈

- **Rust**: 系统编程语言。
- **gxhash**: 高性能非加密哈希。
- **fastrand**: 高效伪随机数生成。

## 目录结构

```
src/
  lib.rs    # Trait 定义和模块导出
  lhd.rs    # LHD 算法实现
  no.rs     # 空操作实现
tests/
  main.rs   # 集成测试和演示
readme/
  en.md     # 英文文档
  zh.md     # 中文文档
```

## 历史背景

**LHD (最低命中密度)** 算法源自 NSDI '18 论文《LHD: Improving Cache Hit Rate by Maximizing Hit Density》。作者 (Beckmann 等人) 提议用概率框架替代复杂的启发式算法。LHD 不问“哪一项最近最少使用?”,而是问“哪一项单位空间的预期命中率最低?”。通过根据对象年龄和大小估算未来命中的概率,LHD 最大化了缓存的总命中率。本实现将这些理论成果转化为实用的 Rust 库。

### 参考文献

- **论文**: [LHD: Improving Cache Hit Rate by Maximizing Hit Density](https://www.usenix.org/conference/nsdi18/presentation/beckmann) (NSDI '18)
- **实现**: [官方模拟代码](https://github.com/beckmann/cache_replacement)
- **PDF**: [下载论文](https://www.usenix.org/system/files/conference/nsdi18/nsdi18-beckmann.pdf)

---

## 关于

本项目为 [js0.site ⋅ 重构互联网计划](https://js0.site) 的开源组件。

我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注:

* [谷歌邮件列表](https://groups.google.com/g/js0-site)
* [js0site.bsky.social](https://bsky.app/profile/js0site.bsky.social)