tool-result-cache
Content-addressable LRU cache for LLM agent tool calls. Same tool, same args, same answer, returned from memory. Optional TTL. One tiny runtime dep (serde_json) for the argument type.
use Duration;
use json;
use ToolCache;
let mut cache: = new
.with_capacity
.with_ttl;
let args = json!;
// First call: miss, run the closure.
let result = cache
.get_or_set
.clone;
// Second call: hit, never invokes the closure.
let same = cache
.get_or_set
.clone;
assert_eq!;
println!;
#
Why
Agents repeat themselves. search_web("anthropic prompt cache"). Two minutes later, after a tool that returned something confusing: search_web("anthropic prompt cache"). There is no upstream rate limiter to save you. There is just a bill that keeps growing.
tool-result-cache is a HashMap-backed LRU plus an optional TTL plus a stable content-addressable key. JSON-canonical arg keys mean {"a": 1, "b": 2} and {"b": 2, "a": 1} hit the same entry. Object keys are sorted recursively before hashing.
For loop detection (raise on repeats), pair with tool-loop-guard-rs. For idempotency keys (no caching, just hash), see llm-message-hash.
Install
[]
= "0.1"
The only runtime dep is serde_json, used as the argument value type.
API
use Duration;
use ;
use ;
let mut cache: = new
.with_capacity // 0 disables capacity eviction
.with_ttl; // optional default TTL
cache.set;
cache.set_with_ttl;
let _hit: = cache.get;
// Memoize: returns &V (cached or freshly computed).
let _v: &String = cache.get_or_set;
// Convenience wrapper: returns an owned clone.
let _v: String = cached_call;
// Drop a single entry; reports whether it was present.
let _was_present: bool = cache.invalidate;
// Drop everything and reset stats.
cache.clear;
// Observability.
let _h: u64 = cache.hits;
let _m: u64 = cache.misses;
let _e: u64 = cache.evictions;
let _x: u64 = cache.expirations;
let _snap: CacheStats = cache.stats;
// Stable key (lowercase hex SHA-256 of tool name + "\0" + canonical-JSON args).
let _k: String = make_key;
Internals
- Storage:
HashMap<String, Entry<V>>. - LRU recency:
Vec<String>with index 0 as the oldest and the tail as most recent._touchswaps the key to the tail; capacity eviction pops index 0. This keeps the crate at one tiny dep (serde_json) at the cost of O(n)Vec::removeon touch; for the typical agent cache size (a few hundred to a few thousand entries) this is fine. - Canonical JSON: object keys are sorted recursively before hashing. Array order is significant.
- Hashing: bundled pure-Rust SHA-256 implementation (no
sha2crate dependency). - Clock: defaults to
Instant::now. Override viaToolCache::with_clock(...)for tests.
What it does NOT do
- No HTTP, no I/O, no SDK dependency.
- No persistence. The cache lives in process memory.
- No async story. Wrap the cache in a
Mutexif you need to share across tasks. - No automatic tool wrapping. Rust does not decorate functions; call
get_or_setorcached_callat the call site.
Companion crates
tool-loop-guard-rs— raises on repeated tool calls, catches a stuck agent.llm-message-hash— canonical hash for LLM-request idempotency.cachebench— measure hit ratios over time (Python).
License
MIT OR Apache-2.0