# cachebench
Prompt-cache observability for LLM APIs. Per-call hit ratio, cost saved, regression alerts. Anthropic, OpenAI, Bedrock.
```toml
[dependencies]
cachebench = "0.1"
```
## Why
Prompt caching saves 50–90% on input tokens with Anthropic and OpenAI, but per-request hit rate is invisible from the SDK. Misses are silent: a deploy that appends a timestamp to a system prompt can quietly halve your cache hit rate and double your bill, and you'll find out from the invoice. Even back-to-back requests through Anthropic's SDK can silently miss around 40% of the time at certain timing windows.
cachebench wraps your LLM call site, takes the Usage returned by the provider, and tells you per call what hit and what didn't.
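The arithmetic behind "what hit and what didn't" is simple once the usage counts are in hand. A standalone sketch (not cachebench's API; the multipliers follow Anthropic's published pricing, where cache reads bill at 0.1x the base input rate and cache writes at 1.25x):

```rust
/// Hit rate and dollars saved for one call, from the provider's usage counts.
/// Illustrative only, not cachebench's API. Pricing multipliers follow
/// Anthropic's docs: cache reads bill at 0.1x base, cache writes at 1.25x.
fn savings(fresh: u64, cache_read: u64, cache_write: u64, base_per_mtok: f64) -> (f64, f64) {
    let total = (fresh + cache_read + cache_write) as f64;
    let hit_rate = cache_read as f64 / total;
    // What the call actually cost, given how each token class is billed.
    let paid = (fresh as f64 * base_per_mtok
        + cache_read as f64 * base_per_mtok * 0.10
        + cache_write as f64 * base_per_mtok * 1.25)
        / 1e6;
    // What it would have cost with no cache at all.
    let uncached = total * base_per_mtok / 1e6;
    (hit_rate, uncached - paid)
}

fn main() {
    // 2048 of 2060 input tokens served from cache, $3/MTok base price.
    let (hit, saved) = savings(12, 2048, 0, 3.0);
    println!("hit rate {:.1}%, saved ${:.6} on this call", hit * 100.0, saved);
}
```

At high hit rates the per-call saving approaches 90% of the uncached input cost, which is where the headline "50–90%" range comes from.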
## Quick start
```rust
use cachebench::{fingerprint, CacheTracker, Usage, DEFAULT_ANTHROPIC_PRICING};
use serde_json::json;
use std::time::Duration;

let tracker = CacheTracker::new(DEFAULT_ANTHROPIC_PRICING)
    .with_alert_threshold(0.5) // flag cacheable calls that hit below 50%
    .with_alert_hook(|alert| eprintln!("cache regression: {alert:?}"));

// After your Anthropic call returns, hand the usage to the tracker:
let messages = vec![json!({"role": "user", "content": "Summarize this ticket."})];
let prefix = fingerprint(model, system_prompt, &tools, &messages); // same inputs you sent the API
let usage = Usage {
    input_tokens: 12,
    cache_read_input_tokens: 2048,
    cache_creation_input_tokens: 0,
    output_tokens: 180,
};

let m = tracker.record(&prefix, &usage);
println!("hit rate: {:.0}%", m.hit_rate * 100.0);
println!("saved this call: ${:.4}", m.cost_saved);

let agg = tracker.aggregate(Duration::from_secs(3600));
println!("last hour: {:.0}% hits, ${:.2} saved", agg.hit_rate * 100.0, agg.cost_saved);
```
## Features
- Per-call attribution. A stable `prefix_id` (sha256 of system + tools + model + prefix messages, trailing user turn excluded) lets you group calls by what was supposed to be cached.
- Regression alerts. Configurable threshold; fires the alert hook when a cacheable call hits below it.
- Multi-provider pricing. `DEFAULT_ANTHROPIC_PRICING`, `DEFAULT_OPENAI_PRICING`, and `DEFAULT_BEDROCK_PRICING` constants; pass your own `Pricing` if rates change.
- Per-prefix grouping. `tracker.by_prefix()` shows the hit rate per prefix, so you can see at a glance which system prompt regressed.
- Cheap to share. `CacheTracker` is `Clone` and shares one inner history, so it's safe to hand to many spawned tasks.
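The attribution scheme in the first bullet fits in a few lines. A dependency-free stand-in (names are illustrative, not the crate's API; cachebench uses sha256, while this sketch uses std's `DefaultHasher` to avoid pulling in a hashing crate):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stable id for "what was supposed to be cached": model + system + tools +
/// prefix messages, with the trailing user turn excluded by the caller.
/// Illustrative sketch: cachebench hashes with sha256, which is also stable
/// across processes; DefaultHasher is only guaranteed stable within one run.
fn prefix_id(model: &str, system: &str, tools: &[&str], prefix_messages: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    system.hash(&mut h);
    tools.hash(&mut h);
    prefix_messages.hash(&mut h);
    h.finish()
}

fn main() {
    let a = prefix_id("claude", "You are a support bot.", &[], &["earlier turn"]);
    let b = prefix_id("claude", "You are a support bot.", &[], &["earlier turn"]);
    assert_eq!(a, b); // identical prefix, identical id: calls group together

    // One appended timestamp and the id changes, exactly as the cache key does:
    let c = prefix_id("claude", "You are a support bot. [deployed 2024-06-01]", &[], &["earlier turn"]);
    assert_ne!(a, c);
    println!("ok");
}
```

Excluding the trailing user turn matters: it changes on every request, so hashing it would give every call its own id and make grouping useless.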
## What it doesn't do
- Not a proxy. Not a router. Not a cache itself: it observes the provider's cache, and doesn't store responses.
- Doesn't make HTTP calls; you do, then hand the `Usage` to `record()`.
- No HTTP middleware in this crate (yet). For automatic capture from `reqwest`-based clients, watch for a future `cachebench-reqwest` companion crate or hook up your own middleware.
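Because the crate only observes, the whole integration is "call the provider yourself, then record the usage", even with many concurrent tasks. A dependency-free stand-in for that handoff (not the real API; this toy `Tracker` plays the role `CacheTracker` has in the crate, with the same `Clone`-over-shared-history shape):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative stand-in, not cachebench's internals: a Clone handle whose
// clones all push into one shared history, the shape the "cheap to share"
// feature describes.
#[derive(Clone)]
struct Tracker {
    history: Arc<Mutex<Vec<(u64, u64)>>>, // (cache_read, total_input) per call
}

impl Tracker {
    fn record(&self, cache_read: u64, total_input: u64) {
        self.history.lock().unwrap().push((cache_read, total_input));
    }
    fn hit_rate(&self) -> f64 {
        let h = self.history.lock().unwrap();
        let (read, total) = h.iter().fold((0u64, 0u64), |(r, t), &(cr, ti)| (r + cr, t + ti));
        if total == 0 { 0.0 } else { read as f64 / total as f64 }
    }
}

fn main() {
    let tracker = Tracker { history: Arc::new(Mutex::new(Vec::new())) };
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let t = tracker.clone(); // cheap: clones the Arc, not the history
            // Each task would call the provider here, then record its usage:
            thread::spawn(move || t.record(900, 1000))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // prints "aggregate hit rate: 90%"
    println!("aggregate hit rate: {:.0}%", tracker.hit_rate() * 100.0);
}
```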
## Sibling: Python cachebench
Python users get the same library, the same fingerprinting, and the same metrics: see MukundaKatta/cachebench (Python).
## License
MIT