§lru-tokens
LRU cache where eviction is weighted by token count (or any other size unit you supply), not entry count.
Prompt caches are usually bounded by tokens, not by entries: a few
100k-token system prompts can consume a budget that would
otherwise hold thousands of small entries. Instead of the usual
count-based LruCache<K, V> policy, each entry carries a weight (you
decide what it means: tokens, bytes, dollars), and the cache evicts
least-recently-used entries until the cumulative weight fits the capacity.
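The eviction policy described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the crate's actual implementation: `WeightedLru` and its fields are hypothetical names, keys and values are fixed to `String` for brevity, and recency is tracked with a plain `VecDeque` (linear-time updates) rather than whatever structure the crate uses internally.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for illustration only; not the crate's `LruTokens`.
struct WeightedLru {
    capacity: u64,
    total: u64,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<String>,
    /// Key -> (value, weight).
    entries: HashMap<String, (String, u64)>,
}

impl WeightedLru {
    fn new(capacity: u64) -> Self {
        Self {
            capacity,
            total: 0,
            order: VecDeque::new(),
            entries: HashMap::new(),
        }
    }

    /// Look up a key and mark it as most recently used.
    fn get(&mut self, key: &str) -> Option<&String> {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
        self.entries.get(key).map(|(value, _)| value)
    }

    /// Insert an entry, then evict from the LRU end until the total fits.
    fn put(&mut self, key: &str, value: String, weight: u64) {
        // Replacing an existing key first releases its old weight.
        if let Some((_, old_weight)) = self.entries.remove(key) {
            self.total -= old_weight;
            self.order.retain(|k| k != key);
        }
        self.entries.insert(key.to_string(), (value, weight));
        self.order.push_back(key.to_string());
        self.total += weight;

        // Assumption of this sketch: an entry heavier than the whole
        // capacity ends up evicting itself.
        while self.total > self.capacity {
            match self.order.pop_front() {
                Some(lru) => {
                    if let Some((_, w)) = self.entries.remove(&lru) {
                        self.total -= w;
                    }
                }
                None => break,
            }
        }
    }
}
```

The point of the design is that the loop condition compares cumulative weight against capacity, so a single heavy insert may evict several light entries, or none at all if there is still room.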
§Example
use lru_tokens::LruTokens;
let mut cache: LruTokens<&str, String> = LruTokens::new(1_000);
cache.put("system-prompt-a", "...".into(), 800);
cache.put("system-prompt-b", "...".into(), 300); // total 1100 > 1000
// Inserting `b` evicted `a` (the LRU) to bring total under 1000.
assert!(cache.get(&"system-prompt-a").is_none());
assert!(cache.get(&"system-prompt-b").is_some());
assert_eq!(cache.weight(), 300);
Structs§
- LruTokens: LRU cache bounded by cumulative weight.