tool-result-cache

Content-addressable LRU cache for LLM agent tool calls. Same tool, same args, same answer, returned from memory. Optional TTL. One tiny runtime dep (serde_json) for the argument type.

use std::time::Duration;
use serde_json::json;
use tool_result_cache::ToolCache;

let mut cache: ToolCache<String> = ToolCache::new()
    .with_capacity(128)
    .with_ttl(Duration::from_secs(300));

let args = json!({"q": "anthropic prompt cache"});

// First call: miss, run the closure.
let result = cache
    .get_or_set("search_web", &args, || expensive_search("anthropic prompt cache"))
    .clone();

// Second call: hit, never invokes the closure.
let same = cache
    .get_or_set("search_web", &args, || unreachable!())
    .clone();

assert_eq!(result, same);

println!(
    "hits={} misses={} evictions={} expirations={}",
    cache.hits(),
    cache.misses(),
    cache.evictions(),
    cache.expirations(),
);
// Stub so the example compiles; replace with a real tool call.
fn expensive_search(_: &str) -> String { String::from("ok") }

Why

Agents repeat themselves. search_web("anthropic prompt cache"). Two minutes later, after a tool that returned something confusing: search_web("anthropic prompt cache"). There is no upstream rate limiter to save you. There is just a bill that keeps growing.

tool-result-cache is a HashMap-backed LRU plus an optional TTL plus a stable content-addressable key. Arguments are canonicalized before hashing: object keys are sorted recursively, so {"a": 1, "b": 2} and {"b": 2, "a": 1} hit the same entry.
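
To make the key canonicalization concrete, here is a minimal std-only sketch. It is not the crate's code: the `Json` enum is a hypothetical stand-in for serde_json::Value, `canonical` and `key_material` are illustrative names, and string quoting uses Rust's Debug formatting rather than exact JSON escaping. The crate additionally hashes the resulting bytes with SHA-256, which is omitted here.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value (the crate uses serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Json>),
    /// Object members in whatever order the caller supplied them.
    Obj(Vec<(String, Json)>),
}

/// Serialize with object keys sorted at every nesting level.
/// Array order is preserved because it is significant.
pub fn canonical(v: &Json) -> String {
    match v {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Num(n) => n.to_string(),
        // Debug quoting is close to, but not exactly, JSON string escaping.
        Json::Str(s) => format!("{:?}", s),
        Json::Arr(items) => {
            let parts: Vec<String> = items.iter().map(canonical).collect();
            format!("[{}]", parts.join(","))
        }
        Json::Obj(members) => {
            // BTreeMap iterates in sorted key order; a duplicate key keeps its last value.
            let sorted: BTreeMap<&str, &Json> =
                members.iter().map(|(k, v)| (k.as_str(), v)).collect();
            let parts: Vec<String> = sorted
                .iter()
                .map(|(k, v)| format!("{:?}:{}", k, canonical(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Pre-hash key material: tool name, a NUL separator, then the canonical args.
pub fn key_material(tool: &str, args: &Json) -> String {
    format!("{}\0{}", tool, canonical(args))
}
```

Two objects that differ only in member order produce identical key material under this scheme, while two arrays with the same elements in a different order do not.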

For loop detection (raise on repeats), pair with tool-loop-guard-rs. For idempotency keys (no caching, just hash), see llm-message-hash.

Install

[dependencies]
tool-result-cache = "0.1"

The only runtime dep is serde_json, used as the argument value type.

API

use std::time::Duration;
use serde_json::{json, Value};
use tool_result_cache::{cached_call, make_key, CacheStats, ToolCache};

let mut cache: ToolCache<String> = ToolCache::new()
    .with_capacity(1024)               // 0 disables capacity eviction
    .with_ttl(Duration::from_secs(60)); // optional default TTL

cache.set("tool", &json!({"q": "x"}), "v".to_string());
cache.set_with_ttl("tool", &json!({"q": "x"}), "v".to_string(), Duration::from_secs(5));

let _hit: Option<&String> = cache.get("tool", &json!({"q": "x"}));

// Memoize: returns &V (cached or freshly computed).
let _v: &String = cache.get_or_set("tool", &json!({"q": "x"}), || "computed".to_string());

// Convenience wrapper: returns an owned clone.
let _v: String = cached_call(&mut cache, "tool", &json!({"q": "x"}), || "computed".to_string());

// Drop a single entry; reports whether it was present.
let _was_present: bool = cache.invalidate("tool", &json!({"q": "x"}));

// Drop everything and reset stats.
cache.clear();

// Observability.
let _h: u64 = cache.hits();
let _m: u64 = cache.misses();
let _e: u64 = cache.evictions();
let _x: u64 = cache.expirations();
let _snap: CacheStats = cache.stats();

// Stable key (lowercase hex SHA-256 of tool name + "\0" + canonical-JSON args).
let _k: String = make_key("tool", &json!({"q": "x"}));

Internals

Storage: HashMap<String, Entry<V>>.
LRU recency: Vec<String> with index 0 as the oldest and the tail as most recent. _touch moves the key to the tail (remove, then push); capacity eviction pops index 0. This keeps the crate at one tiny dep (serde_json) at the cost of O(n) Vec::remove on touch; for the typical agent cache size (a few hundred to a few thousand entries) this is fine.
Canonical JSON: object keys are sorted recursively before hashing. Array order is significant.
Hashing: bundled pure-Rust SHA-256 implementation (no sha2 crate dependency).
Clock: defaults to Instant::now. Override via ToolCache::with_clock(...) for tests.
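
The storage and recency layout described above can be sketched in a few lines of std-only Rust. This is an illustration under assumptions, not the crate's implementation: `MiniLru` and its methods are hypothetical names, keys are plain strings rather than SHA-256 digests, and TTL handling is left out.

```rust
use std::collections::HashMap;

/// Hypothetical miniature of the HashMap + Vec recency layout.
pub struct MiniLru<V> {
    map: HashMap<String, V>,
    order: Vec<String>, // index 0 = oldest, tail = most recently used
    capacity: usize,    // 0 means "never evict on capacity"
    pub evictions: u64,
}

impl<V> MiniLru<V> {
    pub fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), order: Vec::new(), capacity, evictions: 0 }
    }

    /// Move `key` to the tail. The position scan and Vec::remove are both O(n).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos);
            self.order.push(k);
        }
    }

    /// Look up a value and mark it as most recently used on a hit.
    pub fn get(&mut self, key: &str) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Insert or overwrite a value, evicting the oldest entries if over capacity.
    pub fn set(&mut self, key: &str, value: V) {
        if self.map.insert(key.to_string(), value).is_some() {
            self.touch(key); // overwrite: only the recency changes
            return;
        }
        self.order.push(key.to_string());
        // Evict from the front until the cache is back within capacity.
        while self.capacity > 0 && self.order.len() > self.capacity {
            let oldest = self.order.remove(0);
            self.map.remove(&oldest);
            self.evictions += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

A VecDeque or an intrusive linked list would make touch and eviction cheaper, but for caches of a few thousand entries the linear scan is unlikely to matter next to the cost of the tool calls being cached.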

What it does NOT do

No HTTP, no I/O, no SDK dependency.
No persistence. The cache lives in process memory.
No async story. Wrap the cache in a Mutex if you need to share across tasks.
No automatic tool wrapping. Rust does not decorate functions; call get_or_set or cached_call at the call site.
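
As a rough sketch of the Mutex advice above, the following std-only example shares one memoizing cache across threads through Arc<Mutex<...>>. `StandInCache` and `shared_misses` are hypothetical names used in place of ToolCache so the block compiles without the crate, and OS threads stand in for async tasks; the locking pattern is the part being illustrated.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the cache: any `&mut self` memoizer is shared the same way.
#[derive(Default)]
pub struct StandInCache {
    entries: HashMap<String, String>,
    pub misses: u64,
}

impl StandInCache {
    /// Return the cached value, computing and storing it on a miss.
    pub fn get_or_set(&mut self, key: &str, compute: impl FnOnce() -> String) -> &String {
        if !self.entries.contains_key(key) {
            self.misses += 1;
            self.entries.insert(key.to_string(), compute());
        }
        &self.entries[key]
    }
}

/// Spawn `workers` threads that all request the same key; return the miss count.
pub fn shared_misses(workers: usize) -> u64 {
    let cache = Arc::new(Mutex::new(StandInCache::default()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // The closure runs while the lock is held, so only one
                // thread computes the value; the rest see a hit.
                let mut guard = cache.lock().unwrap();
                let value = guard.get_or_set("search_web:q", || "result".to_string()).clone();
                value // owned clone, so the guard can be released
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "result");
    }
    let misses = cache.lock().unwrap().misses;
    misses
}
```

Because the compute closure executes under the lock, a slow tool call blocks every other user of the cache for its duration. If that matters, one option is to check for a hit under the lock, run the tool call unlocked, and then store the result, accepting that two callers may occasionally compute the same value.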

Companion crates

tool-loop-guard-rs — raises on repeated tool calls, catches a stuck agent.
llm-message-hash — canonical hash for LLM-request idempotency.
cachebench — measure hit ratios over time (Python).

License

MIT OR Apache-2.0

tool-result-cache 0.1.0