Crate prompt_cache_warmer


§prompt-cache-warmer

Pre-warm Anthropic prompt cache before user traffic.

Anthropic bills the first request that creates a cache entry (a cache write) at 25% above the base input price, and bills subsequent cache reads at 10% of the base input price. If the first hit on a new system prompt is slow or expensive, you want that hit to be a cheap synthetic warmup call, not a real user request.
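The pricing above can be turned into a quick break-even check. The following is a minimal sketch, not code from this crate: the function names and the local constants `WRITE_MULTIPLIER` and `READ_MULTIPLIER` are assumptions that restate the 25% and 10% figures, and costs are expressed relative to one uncached send of the same prompt.

```rust
/// Multipliers restating the figures above, relative to the base input price.
/// These are local stand-ins for this sketch, not the crate's constants.
const WRITE_MULTIPLIER: f64 = 1.25; // first request creates the cache entry
const READ_MULTIPLIER: f64 = 0.10; // later requests read the entry

/// Relative input cost of sending the same prompt `requests` times with
/// caching: one cache write followed by `requests - 1` cache reads.
/// The unit is "one uncached send of the prompt".
fn cached_cost(requests: u32) -> f64 {
    if requests == 0 {
        return 0.0;
    }
    WRITE_MULTIPLIER + READ_MULTIPLIER * f64::from(requests - 1)
}

/// Relative input cost of the same traffic with no caching at all.
fn uncached_cost(requests: u32) -> f64 {
    f64::from(requests)
}
```

Under these assumptions a single request costs more with caching (1.25 versus 1.0), and caching is already cheaper from the second request onward, which is why the write premium is worth spending on a warmup call.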

This crate:

Takes your long system prompt (string or block list) and a model name.
Inserts up to N cache_control breakpoints in the right places.
Fires a tiny warmup call (max_tokens = 8 by default).
Optionally fires a second verification call and asserts cache_read_input_tokens > 0.
Returns a WarmResult with timings, token counts, and estimated cost.

§Quick example

use prompt_cache_warmer::{Usage, WarmCall, WarmRequest, WarmResponse, Warmer};

// BYO transport: anything that implements `WarmCall`.
struct FakeClient;
impl WarmCall for FakeClient {
    type Error = std::convert::Infallible;
    fn call(&self, _req: &WarmRequest) -> Result<WarmResponse, Self::Error> {
        Ok(WarmResponse {
            usage: Usage {
                input_tokens: 10,
                output_tokens: 4,
                cache_creation_input_tokens: 12_000,
                cache_read_input_tokens: 0,
            },
        })
    }
}

let warmer = Warmer::new(FakeClient);
let out = warmer
    .warm("claude-opus-4-7", "long system text")
    .unwrap();
assert_eq!(out.cache_creation_input_tokens, 12_000);

Warmer is generic over the transport, so you can plug in the real Anthropic HTTP client, a Bedrock wrapper, or a fake for tests.
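To illustrate why a pluggable transport is useful in tests, here is a minimal, self-contained sketch of a stateful fake that reports a cache write on the first call and a cache read afterwards. It does not use this crate: `Usage`, `WarmRequest`, `WarmResponse`, and `WarmCall` are local stand-ins that mirror the shapes in the quick example (the `u64` field types are an assumption), and `StatefulFake` and `warm_and_verify` are hypothetical names, not the crate's `Warmer::warm_verified`.

```rust
use std::cell::Cell;
use std::convert::Infallible;

// Local stand-ins mirroring the quick example; the real types may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_creation_input_tokens: u64,
    cache_read_input_tokens: u64,
}

struct WarmRequest; // request contents do not matter for this sketch

struct WarmResponse {
    usage: Usage,
}

trait WarmCall {
    type Error;
    fn call(&self, req: &WarmRequest) -> Result<WarmResponse, Self::Error>;
}

/// Hypothetical fake: the first call "writes" the cache, later calls "read" it.
struct StatefulFake {
    calls: Cell<u32>,
    prompt_tokens: u64,
}

impl WarmCall for StatefulFake {
    type Error = Infallible;

    fn call(&self, _req: &WarmRequest) -> Result<WarmResponse, Self::Error> {
        let seen = self.calls.get();
        self.calls.set(seen + 1);
        let (created, read) = if seen == 0 {
            (self.prompt_tokens, 0)
        } else {
            (0, self.prompt_tokens)
        };
        Ok(WarmResponse {
            usage: Usage {
                input_tokens: 10,
                output_tokens: 4,
                cache_creation_input_tokens: created,
                cache_read_input_tokens: read,
            },
        })
    }
}

/// Fire a warmup call, then a second call, and return both usages so the
/// caller can check that the second one read from the cache.
fn warm_and_verify<C: WarmCall>(client: &C) -> Result<(Usage, Usage), C::Error> {
    let req = WarmRequest;
    let first = client.call(&req)?.usage;
    let second = client.call(&req)?.usage;
    Ok((first, second))
}
```

A test can then construct `StatefulFake { calls: Cell::new(0), prompt_tokens: 12_000 }`, call `warm_and_verify`, and assert that the second usage has `cache_read_input_tokens > 0`, all without network access.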

Structs§

Block: One block of a system prompt.
Message: A single chat message (role + content text).
ModelPrice: Per-million-token list price for a single model.
Tool: An opaque tool definition. The transport owns serialization.
Usage: Token usage returned by the model on a single warm call.
WarmInput: High-level input for Warmer::warm/Warmer::warm_verified.
WarmRequest: The request payload handed to a WarmCall transport.
WarmResponse: Response shape returned by a WarmCall transport.
WarmResult: Result of a single Warmer::warm (or Warmer::warm_verified) call.
Warmer: Caller-facing entry point. Generic over a WarmCall transport.

Enums§

CacheControl: Cache control marker for a Block.

Constants§

CACHE_READ_MULTIPLIER: Cache-read multiplier applied to cache_read_input_tokens.
CACHE_WRITE_MULTIPLIER: Cache-write multiplier applied to cache_creation_input_tokens.

Traits§

WarmCall: A transport that can execute a WarmRequest and return a WarmResponse.

Functions§

add_cache_breakpoints: Add up to n ephemeral cache_control markers, evenly spaced and ending with the last block.
default_prices: Built-in best-effort pricing as of 2026-Q2. Override with Warmer::with_prices for unsupported models.
to_system_blocks: Coerce a system arg into the Anthropic block list shape.
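The "evenly spaced and ending with the last block" placement described for add_cache_breakpoints can be sketched as an index computation. This is one possible reading under stated assumptions, not the crate's actual algorithm: `breakpoint_indices` is a hypothetical helper, and the rounding rule (integer division) is a choice made here for illustration.

```rust
/// Pick up to `n` block indices, evenly spaced, with the final index always
/// pointing at the last block. Returns an empty vector when there are no
/// blocks or when `n` is zero.
fn breakpoint_indices(block_count: usize, n: usize) -> Vec<usize> {
    // Never place more breakpoints than there are blocks.
    let k = n.min(block_count);
    // The i-th breakpoint closes the i-th of k roughly equal segments.
    // When k == 0 the range is empty, so the division is never evaluated.
    (1..=k).map(|i| i * block_count / k - 1).collect()
}
```

For example, ten blocks with `n = 4` yields indices `[1, 4, 6, 9]`, and three blocks with `n = 4` yields `[0, 1, 2]`, because the count is capped at the number of blocks.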

Type Aliases§

PriceTable: Lookup table mapping model id -> ModelPrice.
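As a rough illustration of how a price lookup table and the two cache multipliers could combine into an estimated cost, consider the sketch below. Everything in it is an assumption made for illustration: the field names on `ModelPrice`, the `HashMap` representation of `PriceTable`, the multiplier values (taken from the 25% and 10% figures in the crate description), and the helper `estimate_cost_usd`. The model id and prices used with it are placeholders, not real list prices.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: list prices in USD per million tokens.
#[derive(Debug, Clone, Copy)]
struct ModelPrice {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Hypothetical stand-in: model id -> price.
type PriceTable = HashMap<String, ModelPrice>;

// Assumed values, restating the 25% write premium and 10% read rate.
const CACHE_WRITE_MULTIPLIER: f64 = 1.25;
const CACHE_READ_MULTIPLIER: f64 = 0.10;

/// Estimate the USD cost of one call from its token counts.
/// Returns `None` when the model is missing from the table.
fn estimate_cost_usd(
    table: &PriceTable,
    model: &str,
    input_tokens: u64,
    output_tokens: u64,
    cache_creation_input_tokens: u64,
    cache_read_input_tokens: u64,
) -> Option<f64> {
    let price = table.get(model)?;
    // Weight cache writes and reads relative to plain input tokens.
    let weighted_input = input_tokens as f64
        + cache_creation_input_tokens as f64 * CACHE_WRITE_MULTIPLIER
        + cache_read_input_tokens as f64 * CACHE_READ_MULTIPLIER;
    let usd = (weighted_input * price.input_per_mtok
        + output_tokens as f64 * price.output_per_mtok)
        / 1_000_000.0;
    Some(usd)
}
```

Returning `Option` keeps an unknown model from silently producing a zero cost; a caller would typically add an entry to the table for models the built-in prices do not cover.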
