Crate prompt_cache_warmer


§prompt-cache-warmer

Pre-warm the Anthropic prompt cache before user traffic arrives.

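Anthropic bills tokens written to the cache (on the first request that creates a cache entry) at 25% above the base input price, and tokens read from the cache on subsequent requests at 10% of the base input price. The first request against a new system prompt is therefore slower and more expensive than the ones that follow, so you want that first hit to be a small synthetic warmup call rather than a real user request.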
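To make the pricing above concrete, here is a minimal arithmetic sketch. The constants are local stand-ins with the values quoted in the paragraph (they are not read from this crate), `base_per_mtok` is a hypothetical base input price in USD per million tokens, and the sketch assumes every request after the first lands while the cache entry is still alive.

```rust
// Local stand-ins for the multipliers quoted above (assumed values,
// not imported from the crate).
const WRITE_MULT: f64 = 1.25; // cache write: 25% above the base input price
const READ_MULT: f64 = 0.10; // cache read: 10% of the base input price

/// Cost in USD of sending a cacheable prefix of `prefix_tokens` tokens
/// `requests` times: one cache write followed by `requests - 1` cache reads.
fn cost_with_cache(prefix_tokens: u64, requests: u64, base_per_mtok: f64) -> f64 {
    if requests == 0 {
        return 0.0;
    }
    let per_token = base_per_mtok / 1_000_000.0;
    let write = prefix_tokens as f64 * per_token * WRITE_MULT;
    let reads = (requests - 1) as f64 * prefix_tokens as f64 * per_token * READ_MULT;
    write + reads
}

/// Cost in USD of the same traffic with no caching at all.
fn cost_without_cache(prefix_tokens: u64, requests: u64, base_per_mtok: f64) -> f64 {
    requests as f64 * prefix_tokens as f64 * base_per_mtok / 1_000_000.0
}
```

Under these assumptions a single request costs more with caching than without (the write premium), and caching is already cheaper from the second request on. That first, more expensive request is the one the warmup call is meant to absorb.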

This crate:

  1. Takes your long system prompt (string or block list) and a model name.
  2. Inserts up to N cache_control breakpoints, evenly spaced across the blocks, with the last one on the final block.
  3. Fires a tiny warmup call (max_tokens = 8 by default).
  4. Optionally fires a second verification call and asserts cache_read_input_tokens > 0.
  5. Returns a WarmResult with timings, token counts, and estimated cost.
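Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `CacheUsage` is a local stand-in carrying only the two cache counters that appear in the quick example, and `warm_and_verify` and its closure-based transport are hypothetical names.

```rust
/// Local stand-in with the two cache counters from the usage payload.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CacheUsage {
    cache_creation_input_tokens: u64,
    cache_read_input_tokens: u64,
}

/// Fire a warmup call, then a verification call, and fail if the second
/// call read nothing from the cache. `call` stands in for the transport.
fn warm_and_verify<F>(mut call: F) -> Result<(CacheUsage, CacheUsage), String>
where
    F: FnMut() -> CacheUsage,
{
    let first = call(); // warmup: expected to create the cache entry
    let second = call(); // verification: expected to read it back
    if second.cache_read_input_tokens == 0 {
        return Err(format!(
            "verification call read 0 cached tokens (warmup wrote {})",
            first.cache_creation_input_tokens
        ));
    }
    Ok((first, second))
}
```

A transport that reports a cache write on the first call and a cache read on the second passes the check; one that never reports a read produces the error.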

§Quick example

use prompt_cache_warmer::{Block, Usage, WarmCall, WarmRequest, WarmResponse, Warmer};

// BYO transport: anything that implements `WarmCall`.
struct FakeClient;
impl WarmCall for FakeClient {
    type Error = std::convert::Infallible;
    fn call(&self, _req: &WarmRequest) -> Result<WarmResponse, Self::Error> {
        Ok(WarmResponse {
            usage: Usage {
                input_tokens: 10,
                output_tokens: 4,
                cache_creation_input_tokens: 12_000,
                cache_read_input_tokens: 0,
            },
        })
    }
}

let warmer = Warmer::new(FakeClient);
let out = warmer
    .warm("claude-opus-4-7", "long system text")
    .unwrap();
assert_eq!(out.cache_creation_input_tokens, 12_000);

Warmer is generic over the transport, so you can plug in the real Anthropic HTTP client, a Bedrock wrapper, or a fake for tests.

Structs§

Block
One block of a system prompt.
Message
A single chat message (role + content text).
ModelPrice
Per-million-token list price for a single model.
Tool
An opaque tool definition. The transport owns serialization.
Usage
Token usage returned by the model on a single warm call.
WarmInput
High-level input for Warmer::warm/Warmer::warm_verified.
WarmRequest
The request payload handed to a WarmCall transport.
WarmResponse
Response shape returned by a WarmCall transport.
WarmResult
Result of a single Warmer::warm (or Warmer::warm_verified) call.
Warmer
Caller-facing entry point. Generic over a WarmCall transport.

Enums§

CacheControl
Cache control marker for a Block.

Constants§

CACHE_READ_MULTIPLIER
Cache-read multiplier applied to cache_read_input_tokens.
CACHE_WRITE_MULTIPLIER
Cache-write multiplier applied to cache_creation_input_tokens.

Traits§

WarmCall
A transport that can execute a WarmRequest and return a WarmResponse.

Functions§

add_cache_breakpoints
Add up to n ephemeral cache_control markers, evenly spaced and ending with the last block.
default_prices
Built-in best-effort pricing as of 2026-Q2. Override with Warmer::with_prices for unsupported models.
to_system_blocks
Coerce a system arg into the Anthropic block list shape.
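The placement rule described for add_cache_breakpoints ("up to n, evenly spaced and ending with the last block") can be pictured with a small index calculation. This is one plausible reading of that description, not the crate's actual code; `breakpoint_indices` is a hypothetical helper that returns which block positions would receive a marker.

```rust
/// Indices of the blocks that would receive a cache_control marker:
/// at most `n` of them, roughly evenly spaced, always ending on the
/// last block. Returns an empty list if there are no blocks or n == 0.
fn breakpoint_indices(block_count: usize, n: usize) -> Vec<usize> {
    // Never place more markers than there are blocks.
    let k = n.min(block_count);
    // For k == 0 the range is empty, so the division below never runs.
    // Because block_count >= k, each step advances by at least one block
    // and the final step (i == k) lands on block_count - 1.
    (1..=k).map(|i| i * block_count / k - 1).collect()
}
```

For ten blocks and four markers this picks blocks 1, 4, 6 and 9; when n exceeds the number of blocks, every block gets a marker.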

Type Aliases§

PriceTable
Lookup table mapping model id -> ModelPrice.