Crate cachebench

Prompt-cache observability for LLM APIs.

Wrap your LLM call site with a CacheTracker, feed it the Usage returned by the provider, and get per-call cache hit ratios, cost-saved estimates, and regression alerts. Works across providers (Anthropic, OpenAI, Bedrock).

§Quick start

use cachebench::{CacheTracker, Provider, Usage};
use std::time::Duration;

let tracker = CacheTracker::new(Provider::Anthropic)
    .with_alert_threshold(0.6);

// After your LLM call:
let usage = Usage {
    input_tokens: 100,
    cache_read_tokens: 800,
    cache_creation_tokens: 0,
    output_tokens: 50,
};
let metrics = tracker.record("prefix-abc".into(), usage, Duration::from_millis(420));
// All 800 cacheable prefix tokens were read from cache and none were written
// (cache_creation_tokens == 0), so the cache hit ratio is 1.0.
assert_eq!(metrics.hit_ratio(), Some(1.0));
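
Beyond single calls, the tracker keeps the aggregate and per-prefix views mentioned above. The accessor names in this sketch (aggregate, cost_saved_usd) are assumptions for illustration, not confirmed API; consult the CacheTracker docs for the real methods.

// Hypothetical accessors, sketched for illustration only:
let agg = tracker.aggregate();
println!("overall hit ratio: {:?}", agg.hit_ratio());
println!("estimated cost saved: ${:.4}", agg.cost_saved_usd());
// With the 0.6 alert threshold set above, any prefix whose hit ratio
// drops below 60% would be flagged as a regression.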

Structs§

Aggregate
Aggregate statistics across many calls.
CacheTracker
Records per-call cache metrics and exposes aggregate / per-prefix views.
CallMetrics
One recorded LLM call with cache metrics attached.
PrefixStats
Per-prefix statistics; lets you spot which system prompt's caching regressed.
Pricing
Per-million-token USD prices for one provider's tier; see the sketch after this list.
Usage
Token usage breakdown extracted from a provider response.
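
How these structs fit together shows up in the cost-saved arithmetic. As a hedged sketch (the Pricing field names below are assumptions for illustration, not the crate's confirmed layout), the saving on one call is the cache-read volume times the gap between the regular input price and the discounted cache-read price:

use cachebench::Pricing;

// Illustrative field names; prices are USD per million tokens.
let sonnet = Pricing {
    input_per_mtok: 3.00,
    cache_write_per_mtok: 3.75, // cache writes are billed at a premium
    cache_read_per_mtok: 0.30,  // cache reads are billed at a steep discount
    output_per_mtok: 15.00,
};

// For the quick-start call: 800 cached tokens * ($3.00 - $0.30) / 1_000_000.
let saved = 800.0 * (sonnet.input_per_mtok - sonnet.cache_read_per_mtok) / 1_000_000.0;
assert!((saved - 0.00216).abs() < 1e-9);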

Enums§

Provider
LLM provider whose cache mechanics we’re tracking.

Constants§

DEFAULT_ANTHROPIC_PRICING
Anthropic Claude Sonnet 4 default pricing as of late 2025.
DEFAULT_BEDROCK_PRICING
Bedrock Claude default pricing as of late 2025.
DEFAULT_OPENAI_PRICING
OpenAI GPT-4o default pricing as of late 2025.
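
The defaults are a convenience; if your negotiated rates or model tier differ from the late-2025 list prices, supply your own Pricing. The with_pricing builder and field name below are assumptions for illustration, not confirmed API:

use cachebench::{CacheTracker, Pricing, Provider, DEFAULT_ANTHROPIC_PRICING};

// Hypothetical builder method and field name, sketched for illustration:
// start from the late-2025 defaults and override a negotiated input rate.
let custom = Pricing { input_per_mtok: 2.50, ..DEFAULT_ANTHROPIC_PRICING };
let tracker = CacheTracker::new(Provider::Anthropic).with_pricing(custom);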

Functions§

fingerprint
Stable hash of the cacheable prefix portion of a call.
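
A hedged usage sketch: assuming fingerprint takes the cacheable prefix text as a &str and returns a stable key (the signature is an assumption for illustration), the same system prompt always hashes to the same key, which is what lets PrefixStats group calls correctly:

use cachebench::fingerprint;

// The signature assumed here (&str -> key) is illustrative, not confirmed.
let system_prompt = "You are a helpful assistant. Always answer in JSON.";
let key = fingerprint(system_prompt);

// A stable hash must be deterministic: same prefix, same key.
assert_eq!(key, fingerprint(system_prompt));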