Prompt-cache observability for LLM APIs.
Wrap your LLM call site with a CacheTracker, feed it the Usage
returned by the provider, and get per-call cache hit ratio, cost saved,
and regression alerts. Cross-provider (Anthropic, OpenAI, Bedrock).
§Quick start
use cachebench::{CacheTracker, Provider, Usage};
use std::time::Duration;
let tracker = CacheTracker::new(Provider::Anthropic)
    .with_alert_threshold(0.6);

// After your LLM call:
let usage = Usage {
    input_tokens: 100,
    cache_read_tokens: 800,
    cache_creation_tokens: 0,
    output_tokens: 50,
};

let metrics = tracker.record("prefix-abc".into(), usage, Duration::from_millis(420));
assert_eq!(metrics.hit_ratio(), Some(1.0));

Structs§
- Aggregate - Aggregate statistics across many calls.
- CacheTracker - Records per-call cache metrics and exposes aggregate / per-prefix views.
- CallMetrics - One recorded LLM call with cache metrics attached.
- PrefixStats - Per-prefix stats group; lets you spot which system prompt regressed.
- Pricing - Per-million-token USD prices for one provider’s tier.
- Usage - Token usage breakdown extracted from a provider response.
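Taken together, CacheTracker, Aggregate, and PrefixStats suggest a read path after a batch of calls. The following is a minimal sketch only: the accessor names aggregate() and per_prefix(), and the fields on the values they return, are assumptions for illustration and are not confirmed by this listing.

// Hypothetical read path -- aggregate() and per_prefix() are assumed names.
let overall = tracker.aggregate();
println!("overall hit ratio: {:?}", overall.hit_ratio());

// Per-prefix stats make a regressed system prompt stand out as one
// prefix whose hit ratio drops while the others stay flat.
for prefix in tracker.per_prefix() {
    println!("{}: {:?}", prefix.fingerprint, prefix.hit_ratio());
}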
Enums§
- Provider - LLM provider whose cache mechanics we’re tracking.
Constants§
- DEFAULT_ANTHROPIC_PRICING - Anthropic Claude Sonnet 4 default pricing as of late 2025.
- DEFAULT_BEDROCK_PRICING - Bedrock Claude default pricing as of late 2025.
- DEFAULT_OPENAI_PRICING - OpenAI GPT-4o default pricing as of late 2025.
Functions§
- fingerprint - Stable hash of the cacheable prefix portion of a call.
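One plausible way to combine fingerprint with CacheTracker::record, continuing from the quick start above. The exact signature of fingerprint is not shown in this listing, so treat the call and the key conversion below as assumptions.

// Hash the cacheable prefix (e.g. the system prompt) into a stable key.
// Assumed: fingerprint takes the prefix text and its result converts into
// the key type that record() expects.
let key = cachebench::fingerprint("You are a helpful assistant...");
let usage = Usage {
    input_tokens: 120,
    cache_read_tokens: 760,
    cache_creation_tokens: 40,
    output_tokens: 64,
};
let metrics = tracker.record(key.to_string(), usage, Duration::from_millis(180));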