Prompt-cache observability for LLM APIs.
Wrap your LLM call site with a CacheTracker, feed it the Usage
returned by the provider, and get per-call cache hit ratio, cost saved,
and regression alerts. Cross-provider (Anthropic, OpenAI, Bedrock).
§Quick start
use cachebench::{CacheTracker, Provider, Usage};
use std::time::Duration;
let tracker = CacheTracker::new(Provider::Anthropic)
    .with_alert_threshold(0.6);

// After your LLM call:
let usage = Usage {
    input_tokens: 100,
    cache_read_tokens: 800,
    cache_creation_tokens: 0,
    output_tokens: 50,
};

let metrics = tracker.record("prefix-abc".into(), usage, Duration::from_millis(420));
assert_eq!(metrics.hit_ratio(), Some(1.0));

Structs§
- Aggregate - Aggregate statistics across many calls.
- CacheTracker - Records per-call cache metrics and exposes aggregate / per-prefix views.
- CallMetrics - One recorded LLM call with cache metrics attached.
- PrefixStats - Per-prefix stats group; lets you spot which system prompt regressed.
- Pricing - Per-million-token USD prices for one provider’s tier.
- Usage - Token usage breakdown extracted from a provider response.
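Taken together, CacheTracker, Aggregate, and PrefixStats suggest a read path after a batch of calls. The following is a minimal sketch only: the accessor names aggregate() and per_prefix(), and the fields on the values they return, are assumptions for illustration and are not confirmed by this listing.

// Hypothetical read path -- aggregate() and per_prefix() are assumed names.
let overall = tracker.aggregate();
println!("overall hit ratio: {:?}", overall.hit_ratio());

// Per-prefix stats make a regressed system prompt stand out as one
// prefix whose hit ratio drops while the others stay flat.
for prefix in tracker.per_prefix() {
    println!("{}: {:?}", prefix.fingerprint, prefix.hit_ratio());
}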
Enums§
- Provider - LLM provider whose cache mechanics we’re tracking.
Constants§
- DEFAULT_ANTHROPIC_PRICING - Anthropic Claude Sonnet 4 default pricing as of late 2025.
- DEFAULT_BEDROCK_PRICING - Bedrock Claude default pricing as of late 2025.
- DEFAULT_OPENAI_PRICING - OpenAI GPT-4o default pricing as of late 2025.
Functions§
- fingerprint - Stable hash of the cacheable prefix portion of a call.
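One plausible way to combine fingerprint with CacheTracker::record, continuing from the quick start above. The exact signature of fingerprint is not shown in this listing, so treat the call and the key conversion below as assumptions.

// Hash the cacheable prefix (e.g. the system prompt) into a stable key.
// Assumed: fingerprint takes the prefix text and its result converts into
// the key type that record() expects.
let key = cachebench::fingerprint("You are a helpful assistant...");
let usage = Usage {
    input_tokens: 120,
    cache_read_tokens: 760,
    cache_creation_tokens: 40,
    output_tokens: 64,
};
let metrics = tracker.record(key.to_string(), usage, Duration::from_millis(180));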