Crate lru_tokens


§lru-tokens

LRU cache where eviction is weighted by token count (or any other size unit you supply), not entry count.

Prompt caches are usually bounded by tokens, not by entries: a few 100k-token system prompts can consume a budget that would otherwise hold thousands of small entries. This crate inverts the usual LruCache<K, V> policy. Each entry carries a weight whose unit you choose (tokens, bytes, dollars), and the cache evicts least-recently-used entries until the cumulative weight fits within the configured capacity.

§Example

use lru_tokens::LruTokens;

// Capacity is a weight budget (tokens here), not an entry count.
let mut cache: LruTokens<&str, String> = LruTokens::new(1_000);

cache.put("system-prompt-a", "...".into(), 800);
cache.put("system-prompt-b", "...".into(), 300); // total 1100 > 1000
// Inserting `b` evicted `a` (the LRU) to bring the total back within the 1000 budget.
assert!(cache.get(&"system-prompt-a").is_none());
assert!(cache.get(&"system-prompt-b").is_some());
assert_eq!(cache.weight(), 300);
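
To make the eviction policy concrete, here is a minimal standalone sketch of a weight-bounded LRU built only on the standard library. It is not this crate's implementation: the `WeightedLru` type, its fields, and the `touch` helper are hypothetical names chosen for illustration, and two behaviours are assumptions rather than documented guarantees: an entry heavier than the whole budget is simply not stored, and re-inserting an existing key replaces its value and weight.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical illustration of a weight-bounded LRU; not the crate's code.
struct WeightedLru<K, V> {
    capacity: usize,                 // maximum cumulative weight
    total: usize,                    // current cumulative weight
    entries: HashMap<K, (V, usize)>, // key -> (value, weight)
    order: VecDeque<K>,              // front = least recently used
}

impl<K: Hash + Eq + Clone, V> WeightedLru<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, total: 0, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position (O(n) in this sketch).
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if !self.entries.contains_key(key) {
            return None;
        }
        self.touch(key);
        self.entries.get(key).map(|(v, _)| v)
    }

    fn put(&mut self, key: K, value: V, weight: usize) {
        // Assumption: re-inserting a key replaces its old value and weight.
        if let Some((_, old)) = self.entries.remove(&key) {
            self.total -= old;
            self.order.retain(|k| k != &key);
        }
        // Assumption: an entry that can never fit is dropped, not stored.
        if weight > self.capacity {
            return;
        }
        self.entries.insert(key.clone(), (value, weight));
        self.order.push_back(key);
        self.total += weight;
        // Evict from the LRU end until the cumulative weight fits.
        while self.total > self.capacity {
            let Some(lru) = self.order.pop_front() else { break };
            if let Some((_, w)) = self.entries.remove(&lru) {
                self.total -= w;
            }
        }
    }

    fn weight(&self) -> usize {
        self.total
    }
}
```

The loop at the end of `put` is the whole policy: eviction is driven by cumulative weight, so a single heavy insert may push out many light entries, or none. The `VecDeque` scan keeps the sketch short at the cost of O(n) recency updates; a production cache would more likely use a linked structure for O(1) updates.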

Structs§

LruTokens
LRU cache bounded by cumulative weight.