# Context Management
Long-running agents accumulate messages that exceed the model's context window. phi-core provides token tracking, overflow detection, tiered compaction, and execution limits.
The `context` module is split into sub-modules: `token`, `config`, `tracker`, `compaction`, `strategy`, `compact_messages`, `execution`, `orchestration`.
## Token Estimation
Fast estimation without external tokenizer dependencies:
```rust
use phi_core::context::{estimate_tokens, message_tokens, total_tokens};
estimate_tokens("Hello world"); // ~3 tokens (chars / 4)
message_tokens(&agent_message); // estimate for a single message
total_tokens(&messages); // estimate for all messages
```
## Context Tracking
`ContextTracker` combines real token counts from provider responses with estimation for new messages — more accurate than pure estimation:
```rust
use phi_core::context::ContextTracker;
let mut tracker = ContextTracker::new();
// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);
// Get current context size (real usage + estimated trailing):
let tokens = tracker.estimate_context_tokens(agent.messages());
// After compaction, reset the tracker:
tracker.reset();
```
When no usage data is available, it falls back to chars/4 estimation.
## Context Overflow Detection
When the context exceeds a model's window, providers return overflow errors. phi-core detects these automatically across all major providers.
### HTTP-level detection
Providers that check before streaming (Google, Bedrock, Vertex) return `ProviderError::ContextOverflow`:
```rust
use phi_core::provider::ProviderError;
match agent.prompt("...").await {
// The loop already handles this — but you can also match it:
Err(ProviderError::ContextOverflow { message }) => {
// Compact and retry
}
_ => {}
}
```
`ProviderError::classify()` auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.
### Message-level detection
SSE-based providers (Anthropic, OpenAI) return overflow as a `StopReason::Error` message. Check with:
```rust
if message.is_context_overflow() {
// Compact and retry
}
```
### Handling overflow in your application
phi-core provides the detection and building blocks. Your application wires the compaction strategy:
```rust
// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window - reserve {
let compacted = compact_messages(agent.messages().to_vec(), &config);
agent.replace_messages(compacted);
}
// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
// compact, then retry with agent.continue_loop()
```
For LLM-based summarization (asking the model to summarize old messages), implement that in your application layer — phi-core provides `replace_messages()` and `compact_messages()` as building blocks.
## ContextConfig
```rust
pub struct ContextConfig {
pub max_context_tokens: usize, // Default: 100,000
pub system_prompt_tokens: usize, // Default: 4,000
pub compaction: CompactionConfig, // Primary compaction settings
// Custom token counter (serde-skipped). None → HeuristicTokenCounter (chars/4).
pub token_counter: Option<Arc<dyn TokenCounter>>,
// Legacy backward-compat fields (prefer CompactionConfig equivalents):
pub keep_recent: usize, // Default: 10
pub keep_first: usize, // Default: 2
pub tool_output_max_lines: usize, // Default: 50
}
pub struct CompactionConfig {
// ── WHEN to compact ──
pub compact_at_pct: f64, // Default: 0.90 (90%)
pub compact_budget_threshold_pct: f64, // Default: 0.05 (5%)
pub compaction_scope: CompactionScope, // Default: FixedCount(3)
// ── HOW to compact ──
pub keep_first_turns: usize, // Default: 2
pub keep_recent_turns: usize, // Default: 10
pub max_summary_tokens: usize, // Default: 2_000 (budget, not per-turn)
pub tool_output_max_lines: usize, // Default: 50
}
```
### `CompactionScope`
Controls how many earlier loops are included in compaction and context loading:
| `FixedCount(usize)` | Compact a fixed number of earlier loops on the active chain. Default: `FixedCount(3)`. |
| `TokenBudget` | Walk the chain backward, accumulating per-loop token estimates, and stop when `max_context_tokens` would be exceeded. Loops whose raw messages exceed the budget are still included — their compacted summaries will fit. |
See [compaction.md](compaction.md) for full details on the non-destructive overlay model.
## Tiered Compaction
`compact_messages()` tries each level in order, stopping as soon as messages fit the budget:
### Level 1: Truncate Tool Outputs
Replaces long tool outputs with head + tail (keeping first N/2 and last N/2 lines). This is the cheapest — preserves conversation structure, typically saves 50-70% in coding sessions.
### Level 2: Summarize Old Turns
Keeps the last `keep_recent` messages in full detail. Older assistant messages are replaced with one-line summaries like `"[Summary] [Assistant used 3 tool(s)]"`, and their tool results are dropped.
### Level 3: Drop Middle Messages
Keeps `keep_first` messages from the start and `keep_recent` from the end, dropping everything in between. A marker message notes how many were removed.
## ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
pub max_turns: usize, // Default: 50
pub max_total_tokens: usize, // Default: 1,000,000
pub max_duration: Duration, // Default: 600s (10 min)
pub max_cost: Option<f64>, // Default: None (no cost cap)
}
```
`max_cost` caps cumulative dollar cost for the run. Requires `AgentLoopConfig.cost_config` to be set — without pricing rates the accumulated cost is always 0.0 and this limit has no effect.
When a limit is reached, the agent stops with a message like `"[Agent stopped: Max turns reached (50/50)]"`.
## Disabling Context Management
```rust
let agent = BasicAgent::new(model_config)
.without_context_management();
```
This sets both `context_config` and `execution_limits` to `None`.