# Prompt Caching
phi-core automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to **90% savings** on input tokens.
## How It Works
In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.
### Provider Support
| **Anthropic** | Explicit (cache breakpoints) | 90% on hits | ✅ Auto-placed |
| **OpenAI** | Automatic (>1024 tokens) | 50% on hits | None needed |
| **Google Gemini** | Implicit (automatic) | Varies | None needed |
| **Azure OpenAI** | Automatic (same as OpenAI) | 50% on hits | None needed |
| **Amazon Bedrock** | Not yet implemented | N/A | `CacheConfig` accepted but no breakpoints placed |
### What Gets Cached (Anthropic)
phi-core places up to 3 cache breakpoints automatically:
1. **System prompt** — stable across all turns
2. **Tool definitions** — rarely change between turns
3. **Conversation history** — second-to-last message, so the growing prefix is cached
This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.
## Configuration
Caching is **enabled by default** with automatic breakpoint placement. No configuration needed for optimal behavior.
### Disable Caching
Use `CacheStrategy::Disabled` to turn off all cache breakpoint placement while keeping the config structure intact. Alternatively, set `enabled: false` on the `CacheConfig` master switch.
```rust
use phi_core::{BasicAgent, CacheConfig, CacheStrategy};
use phi_core::provider::ModelConfig;
let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();
// Option 1: CacheStrategy::Disabled (preferred — explicit intent)
let agent = BasicAgent::new(ModelConfig::anthropic(
"claude-sonnet-4-20250514",
"Claude Sonnet 4",
&api_key,
))
.with_cache_config(CacheConfig {
strategy: CacheStrategy::Disabled,
..Default::default()
});
// Option 2: Master switch (equivalent effect)
let agent = BasicAgent::new(ModelConfig::anthropic(
"claude-sonnet-4-20250514",
"Claude Sonnet 4",
&api_key,
))
.with_cache_config(CacheConfig {
enabled: false,
..Default::default()
});
```
### Fine-Grained Control
```rust
let agent = BasicAgent::new(ModelConfig::anthropic(
"claude-sonnet-4-20250514",
"Claude Sonnet 4",
&api_key,
))
.with_cache_config(CacheConfig {
enabled: true,
strategy: CacheStrategy::Manual {
cache_system: true,
cache_tools: true,
cache_messages: false, // Don't cache conversation history
},
});
```
## Monitoring Cache Usage
Every `Usage` struct includes cache statistics:
```rust
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
```
- **`cache_read`** — tokens served from cache (cheap)
- **`cache_write`** — tokens written to cache (slightly more than base price)
- **`cache_hit_rate()`** — fraction of input tokens from cache (0.0–1.0)
## Cost Impact
For a typical 10-turn agent conversation with Anthropic Claude:
| ~500K input tokens billed at full price | ~50K at full price + ~450K at 10% price |
| **$2.50** (Sonnet) | **$0.39** (Sonnet) |
That's an **84% cost reduction** with zero configuration.
## Best Practices
1. **Keep system prompts stable** — changing the system prompt between turns invalidates the cache
2. **Don't shuffle tools** — tool order matters for cache prefix matching
3. **Let it work automatically** — the default `CacheStrategy::Auto` is optimal for most use cases. The three strategies are `Auto` (recommended), `Disabled` (no breakpoints), and `Manual` (fine-grained control)
4. **Monitor `cache_hit_rate()`** — if it's consistently low, check if your system prompt or tools are changing unexpectedly