Experimental context-management strategies applied before RLM compaction.
The agentic loop re-sends the entire conversation every step, which means two structural wastes dominate token usage:
- Duplicate tool outputs. Agents frequently re-read the same file, re-run the same ls, or re-grep for the same pattern across many steps, so the verbatim content appears multiple times in the history. See dedup.
- Stale oversized tool outputs. A 40 KB read_file result from step 2 is rarely relevant at step 30, yet it still costs full input tokens every turn. See snippet.
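The dedup idea can be sketched with simplified stand-in types. This is a minimal sketch, not the crate's implementation: the real ContentPart and Message types live in codetether_agent::provider, and the ToolResult struct, dedup_tool_results helper, and stub wording below are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a tool-result content block.
#[derive(Debug, Clone, PartialEq)]
struct ToolResult {
    tool_call_id: String,
    content: String,
}

/// Replace later verbatim duplicates with a short stub pointing at the
/// first occurrence. Returns the total bytes saved.
fn dedup_tool_results(results: &mut [ToolResult]) -> usize {
    // content -> tool_call_id of the first occurrence
    let mut seen: HashMap<String, String> = HashMap::new();
    let mut saved = 0;
    for r in results.iter_mut() {
        if let Some(first_id) = seen.get(&r.content) {
            let stub = format!(
                "[duplicate of tool result {first_id}; re-run the call to refetch]"
            );
            saved += r.content.len().saturating_sub(stub.len());
            r.content = stub;
        } else {
            seen.insert(r.content.clone(), r.tool_call_id.clone());
        }
    }
    saved
}
```

Because the stub names the original tool_call_id, the model can still ask the agent to re-run that call, which is what keeps the strategy referenceable despite being lossy.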
Both strategies are lossy in the strict sense but preserve referenceability: the model can always ask the agent to re-run the original tool call if it needs the full output back.
§Composition
apply_all runs every strategy in a fixed order against the live
Message buffer, mutating in place. Callers (the two prompt loops)
invoke it immediately before
enforce_context_window
so the RLM compaction pass sees the already-shrunken buffer. The
returned ExperimentalStats is logged at info level for
observability.
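The ordering can be sketched in a self-contained way. Everything here is a stand-in: the Message and ExperimentalStats definitions, the placeholder strategy inside apply_all, and the message-count budget in enforce_context_window are assumptions for illustration, not the crate's actual behavior.

```rust
#[derive(Debug)]
struct Message {
    content: String,
}

#[derive(Debug, Default)]
struct ExperimentalStats {
    total_bytes_saved: usize,
}

// Stand-in for the crate's apply_all: trims trailing whitespace as a
// placeholder "strategy" and reports bytes saved.
fn apply_all(msgs: &mut Vec<Message>) -> ExperimentalStats {
    let mut saved = 0;
    for m in msgs.iter_mut() {
        let before = m.content.len();
        m.content.truncate(m.content.trim_end().len());
        saved += before - m.content.len();
    }
    ExperimentalStats { total_bytes_saved: saved }
}

// Stand-in for RLM compaction: drop the oldest messages over a budget.
fn enforce_context_window(msgs: &mut Vec<Message>, max_msgs: usize) {
    while msgs.len() > max_msgs {
        msgs.remove(0);
    }
}

fn prepare_context(msgs: &mut Vec<Message>) -> ExperimentalStats {
    let stats = apply_all(msgs); // cheap lossy shrinking first...
    enforce_context_window(msgs, 50); // ...so compaction sees the smaller buffer
    stats
}
```

The point of the ordering is that the compaction pass operates on an already-shrunken buffer, so it has to discard less.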
§Default-on, no config
These strategies are always active — there is intentionally no env
flag to disable them. If a future regression requires an escape
hatch, add a field to crate::config::Config rather than a magic
env var so the setting is discoverable.
§Examples
use codetether_agent::provider::{ContentPart, Message, Role};
use codetether_agent::session::helper::experimental::apply_all;
let tool_result = ContentPart::ToolResult {
    tool_call_id: "call_a".into(),
    content: "file contents: hello world".repeat(40),
};
let duplicate = ContentPart::ToolResult {
    tool_call_id: "call_b".into(),
    content: "file contents: hello world".repeat(40),
};
let mut msgs = vec![
    Message { role: Role::Tool, content: vec![tool_result] },
    Message { role: Role::Tool, content: vec![duplicate] },
];
let stats = apply_all(&mut msgs);
assert!(stats.total_bytes_saved > 0);
assert!(stats.dedup_hits >= 1);

Modules§
- dedup
- Content-addressed deduplication of tool-result blocks.
- lingua
- Heuristic LLMLingua-style token/line pruning on stale assistant text.
- pairing
- Invariant-repair pass: ensure every tool_call has its tool_result and vice versa.
- snippet
- Head/tail snippet compaction of stale oversized tool outputs.
- streaming_llm
- StreamingLLM-style middle-drop with attention sinks.
- thinking_prune
- Strip extended-thinking blocks from older messages.
- tool_call_dedup
- Collapse redundant identical tool calls.
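As an illustration of the head/tail approach the snippet module describes, here is a minimal sketch. The snippet_compact function, the byte thresholds, and the elision-marker wording are assumptions made for this example, not the crate's actual API or defaults.

```rust
/// Keep the first `head` and last `tail` bytes of an oversized output
/// and elide the middle, noting how much was dropped.
fn snippet_compact(s: &str, head: usize, tail: usize) -> String {
    if s.len() <= head + tail {
        return s.to_owned(); // already small enough: pass through
    }
    // Snap cut points to char boundaries so slicing stays valid UTF-8.
    let mut h = head;
    while !s.is_char_boundary(h) {
        h -= 1;
    }
    let mut t = s.len() - tail;
    while !s.is_char_boundary(t) {
        t += 1;
    }
    let elided = t - h;
    format!(
        "{}\n[... {elided} bytes elided; re-run the tool call for the full output ...]\n{}",
        &s[..h],
        &s[t..]
    )
}
```

As with dedup, the elision marker tells the model the full output is recoverable by re-running the tool call.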
Structs§
- ExperimentalStats
- Aggregate outcome of every strategy in apply_all.
Functions§
- apply_all
- Apply every experimental strategy in order, mutating messages in place. Returns aggregate statistics suitable for logging.