§prompt-cache-warmer
Pre-warm the Anthropic prompt cache before user traffic arrives.
Anthropic bills tokens written to the cache at 25% above the base input price on the request that creates a cache entry, and bills subsequent cache reads at 10% of the base input price. If the first hit of a new system prompt is slow or expensive, you want that first hit to be a cheap synthetic warmup, not a real user request.
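To make the pricing arithmetic concrete, here is a minimal sketch. The 1.25 and 0.10 multipliers come from the paragraph above; the constant and function names are local to this sketch and are not this crate's API, and the base price is left as a parameter because real prices vary by model.

```rust
/// Multiplier for tokens written to the cache: 25% above the base input price.
const WRITE_MULTIPLIER: f64 = 1.25;
/// Multiplier for tokens read from the cache: 10% of the base input price.
const READ_MULTIPLIER: f64 = 0.10;

/// Dollar cost of `tokens` input tokens at `base_per_mtok` dollars per
/// million tokens, scaled by `multiplier`.
fn scaled_cost(tokens: u64, base_per_mtok: f64, multiplier: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * base_per_mtok * multiplier
}

/// Cost of the cached prefix on the request that creates the cache entry.
fn cache_write_cost(tokens: u64, base_per_mtok: f64) -> f64 {
    scaled_cost(tokens, base_per_mtok, WRITE_MULTIPLIER)
}

/// Cost of the cached prefix on a later request that reads it from the cache.
fn cache_read_cost(tokens: u64, base_per_mtok: f64) -> f64 {
    scaled_cost(tokens, base_per_mtok, READ_MULTIPLIER)
}
```

With a 12,000-token prompt and an assumed (illustrative) base price of $3.00 per million input tokens, the uncached cost is $0.036, the cache write costs about $0.045, and each later read costs about $0.0036.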
This crate:
- Takes your long system prompt (string or block list) and a model name.
- Inserts up to N cache_control breakpoints in the right places.
- Fires a tiny warmup call (max_tokens = 8 by default).
- Optionally fires a second verification call and asserts cache_read_input_tokens > 0.
- Returns a WarmResult with timings, token counts, and estimated cost.
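The warm-then-verify flow in the list above can be sketched as follows. This is a standalone illustration, not the crate's implementation: the `Transport` trait, `CacheUsage` struct, `warm_then_verify` function, and `FakeCache` type are all hypothetical names invented here, and the real `WarmCall` trait and `WarmResult` type may be shaped differently.

```rust
use std::cell::Cell;

/// Cache-related usage counters, named after the fields mentioned above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CacheUsage {
    cache_creation_input_tokens: u64,
    cache_read_input_tokens: u64,
}

/// Hypothetical transport abstraction for this sketch only.
trait Transport {
    fn call(&self, system: &str, max_tokens: u32) -> Result<CacheUsage, String>;
}

/// Fire a tiny warmup call, then a second call that must read from the cache.
fn warm_then_verify<T: Transport>(
    transport: &T,
    system: &str,
) -> Result<(CacheUsage, CacheUsage), String> {
    let warm = transport.call(system, 8)?; // max_tokens = 8, as in the list above
    let verify = transport.call(system, 8)?;
    if verify.cache_read_input_tokens == 0 {
        return Err("verification call did not read from the cache".to_string());
    }
    Ok((warm, verify))
}

/// Fake transport: the first call writes the cache, later calls read it.
struct FakeCache {
    warmed: Cell<bool>,
    prompt_tokens: u64,
}

impl FakeCache {
    fn new(prompt_tokens: u64) -> Self {
        FakeCache { warmed: Cell::new(false), prompt_tokens }
    }
}

impl Transport for FakeCache {
    fn call(&self, _system: &str, _max_tokens: u32) -> Result<CacheUsage, String> {
        // `replace` returns the previous flag, so only the first call sees `false`.
        let (created, read) = if self.warmed.replace(true) {
            (0, self.prompt_tokens)
        } else {
            (self.prompt_tokens, 0)
        };
        Ok(CacheUsage {
            cache_creation_input_tokens: created,
            cache_read_input_tokens: read,
        })
    }
}
```

Calling `warm_then_verify(&FakeCache::new(12_000), "long system text")` returns a pair in which the first usage reports a cache write and the second a cache read. A transport that never reports a read makes the function return an error instead of silently succeeding.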
§Quick example
use prompt_cache_warmer::{Block, Usage, WarmCall, WarmRequest, WarmResponse, Warmer};

// BYO transport: anything that implements `WarmCall`.
struct FakeClient;

impl WarmCall for FakeClient {
    type Error = std::convert::Infallible;

    fn call(&self, _req: &WarmRequest) -> Result<WarmResponse, Self::Error> {
        Ok(WarmResponse {
            usage: Usage {
                input_tokens: 10,
                output_tokens: 4,
                cache_creation_input_tokens: 12_000,
                cache_read_input_tokens: 0,
            },
        })
    }
}

let warmer = Warmer::new(FakeClient);
let out = warmer
    .warm("claude-opus-4-7", "long system text")
    .unwrap();
assert_eq!(out.cache_creation_input_tokens, 12_000);

Warmer is generic over the transport, so you can plug in the real
Anthropic HTTP client, a Bedrock wrapper, or a fake for tests.
Structs§
- Block: One block of a system prompt.
- Message: A single chat message (role + content text).
- ModelPrice: Per-million-token list price for a single model.
- Tool: An opaque tool definition. The transport owns serialization.
- Usage: Token usage returned by the model on a single warm call.
- WarmInput: High-level input for Warmer::warm / Warmer::warm_verified.
- WarmRequest: The request payload handed to a WarmCall transport.
- WarmResponse: Response shape returned by a WarmCall transport.
- WarmResult: Result of a single Warmer::warm (or Warmer::warm_verified) call.
- Warmer: Caller-facing entry point. Generic over a WarmCall transport.
Enums§
- CacheControl: Cache control marker for a Block.
Constants§
- CACHE_READ_MULTIPLIER: Cache-read multiplier applied to cache_read_input_tokens.
- CACHE_WRITE_MULTIPLIER: Cache-write multiplier applied to cache_creation_input_tokens.
Traits§
- WarmCall: A transport that can execute a WarmRequest and return a WarmResponse.
Functions§
- add_cache_breakpoints: Add up to n ephemeral cache_control markers, evenly spaced and ending with the last block.
- default_prices: Built-in best-effort pricing as of 2026-Q2. Override with Warmer::with_prices for unsupported models.
- to_system_blocks: Coerce a system arg into the Anthropic block list shape.
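One way to read "evenly spaced and ending with the last block" is sketched below. The spacing rule, the `SketchBlock` type, and both function names are assumptions made for illustration; the crate's own `add_cache_breakpoints` may choose positions differently.

```rust
/// Hypothetical block type for this sketch; `ephemeral` stands in for an
/// ephemeral cache_control marker.
#[derive(Debug, Clone, PartialEq)]
struct SketchBlock {
    text: String,
    ephemeral: bool,
}

/// Choose up to `n` block indices, evenly spaced, with the final index
/// always being the last block. Returns an empty list if either input is 0.
fn breakpoint_indices(len: usize, n: usize) -> Vec<usize> {
    let k = n.min(len); // never more markers than blocks
    if k == 0 {
        return Vec::new();
    }
    // The i-th marker closes the i-th of k roughly equal segments.
    (1..=k).map(|i| i * len / k - 1).collect()
}

/// Mark the chosen blocks in place.
fn mark_breakpoints(blocks: &mut [SketchBlock], n: usize) {
    for idx in breakpoint_indices(blocks.len(), n) {
        blocks[idx].ephemeral = true;
    }
}
```

Under this rule, ten blocks with `n = 3` are marked at indices 2, 5, and 9, and two blocks with `n = 4` are both marked, since the marker count is capped at the number of blocks.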
Type Aliases§
- PriceTable: Lookup table mapping model id -> ModelPrice.
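A lookup table of this kind, with caller-supplied overrides layered on top of built-in defaults, might look like the sketch below. The field names, the `Sketch` prefixed types, and the `merge_prices` helper are hypothetical; the crate's actual `ModelPrice` fields and `Warmer::with_prices` behavior are not shown on this page. The model id and prices in the usage note are placeholders, not real list prices.

```rust
use std::collections::HashMap;

/// Hypothetical per-million-token prices for one model.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SketchModelPrice {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Model id -> price, mirroring the alias described above.
type SketchPriceTable = HashMap<String, SketchModelPrice>;

/// Layer caller overrides on top of defaults; overrides win on conflicts.
fn merge_prices(mut defaults: SketchPriceTable, overrides: SketchPriceTable) -> SketchPriceTable {
    defaults.extend(overrides);
    defaults
}

/// Look up a model, returning `None` for ids the table does not know.
fn price_for<'a>(table: &'a SketchPriceTable, model: &str) -> Option<&'a SketchModelPrice> {
    table.get(model)
}
```

For instance, merging a default entry for a placeholder id such as "example-model" with an override for the same id yields the override's prices, while `price_for` on an unknown id returns `None` so the caller can decide whether to skip cost estimation or fail.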