History compaction.
Manages conversation history size by summarizing older messages as the context window limit approaches. Implements three compaction strategies:
- Auto-compact: triggered when estimated tokens exceed the auto-compact threshold
- Reactive compact: triggered by API `prompt_too_long` errors
- Microcompact: clears stale tool results to free tokens
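A hypothetical sketch of how these three strategies might be dispatched. The names (`Compaction`, `choose`, the parameters) are illustrative only, not this module's actual API; the assumed policy is that an API rejection forces reactive compaction, and that microcompact is preferred whenever clearing stale tool results alone would drop usage back under the threshold:

```rust
/// Illustrative strategy enum; not part of this module's real API.
enum Compaction {
    /// Summarize older messages with the LLM (auto-compact).
    Auto,
    /// Same summarization, but forced by an API `prompt_too_long` error.
    Reactive { token_gap: u64 },
    /// Clear stale tool results without summarizing.
    Micro,
    /// Nothing to do yet.
    None,
}

/// Pick a strategy from the current token estimate, the auto-compact
/// threshold, any token gap reported by the API, and an estimate of
/// how many tokens stale tool results are holding.
fn choose(
    estimated_tokens: u64,
    threshold: u64,
    api_gap: Option<u64>,
    stale_tool_tokens: u64,
) -> Compaction {
    match api_gap {
        // The API already rejected the prompt: compact reactively.
        Some(gap) => Compaction::Reactive { token_gap: gap },
        None if estimated_tokens >= threshold => {
            // Prefer the cheap path when clearing stale tool results
            // alone would bring us back under the threshold.
            if estimated_tokens - stale_tool_tokens < threshold {
                Compaction::Micro
            } else {
                Compaction::Auto
            }
        }
        None => Compaction::None,
    }
}

fn main() {
    assert!(matches!(choose(150_000, 167_000, None, 0), Compaction::None));
    assert!(matches!(choose(170_000, 167_000, None, 10_000), Compaction::Micro));
    assert!(matches!(choose(170_000, 167_000, None, 1_000), Compaction::Auto));
    assert!(matches!(
        choose(100_000, 167_000, Some(5_000), 0),
        Compaction::Reactive { .. }
    ));
    println!("dispatch ok");
}
```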
§Thresholds
|<--- context window (e.g., 200K) -------------------------------->|
|<--- effective window (context - 20K reserved) ------------------>|
|<--- auto-compact threshold (effective - 13K buffer) ------------>|
|                                       ↑ compact fires here

Structs§
- CompactTracking - Tracking state for auto-compact across turns.
- TokenWarningState - Token warning state for the UI.
Constants§
- MAX_OUTPUT_TOKENS_RECOVERY_LIMIT - Maximum recovery attempts for max-output-tokens errors.
Functions§
- auto_compact_threshold - Calculate the auto-compact threshold.
- build_compact_summary_prompt - Build a compact summary request: asks the LLM to summarize the conversation up to a certain point.
- compact_boundary_message - Create a compact boundary marker message.
- compact_with_llm - Perform full LLM-based compaction of the conversation history.
- effective_context_window - Calculate the effective context window (total minus output reservation).
- max_output_recovery_message - Build the recovery message injected when max-output-tokens is hit.
- microcompact - Perform microcompact: clear stale tool results to free tokens.
- parse_prompt_too_long_gap - Parse a “prompt too long” error to extract the token gap.
- should_auto_compact - Check whether auto-compact should fire for this conversation.
- token_warning_state - Calculate token warning state for the current conversation.