Experimental context-management strategies applied before RLM compaction.
The agentic loop re-sends the entire conversation every step, which means two structural wastes dominate token usage:
- Duplicate tool outputs. Agents frequently re-read the same file, re-run the same ls, or re-grep for the same pattern across many steps, so the verbatim content appears multiple times in the history. See dedup.
- Stale oversized tool outputs. A 40 KB read_file result from step 2 is rarely relevant at step 30, yet it still costs full input tokens every turn. See snippet.
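The dedup idea can be sketched with simplified stand-in types. This is a minimal sketch, not the crate's implementation: the real ContentPart and Message types live in codetether_agent::provider, and the ToolResult struct, dedup_tool_results helper, and stub wording below are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a tool-result content block.
#[derive(Debug, Clone, PartialEq)]
struct ToolResult {
    tool_call_id: String,
    content: String,
}

/// Replace later verbatim duplicates with a short stub pointing at the
/// first occurrence. Returns the total bytes saved.
fn dedup_tool_results(results: &mut [ToolResult]) -> usize {
    // content -> tool_call_id of the first occurrence
    let mut seen: HashMap<String, String> = HashMap::new();
    let mut saved = 0;
    for r in results.iter_mut() {
        if let Some(first_id) = seen.get(&r.content) {
            let stub = format!(
                "[duplicate of tool result {first_id}; re-run the call to refetch]"
            );
            saved += r.content.len().saturating_sub(stub.len());
            r.content = stub;
        } else {
            seen.insert(r.content.clone(), r.tool_call_id.clone());
        }
    }
    saved
}
```

Because the stub names the original tool_call_id, the model can still ask the agent to re-run that call, which is what keeps the strategy referenceable despite being lossy.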
Both strategies are lossy in the strict sense but preserve referenceability: the model can always ask the agent to re-run the original tool call if it needs the full output back.
§Composition
apply_all runs every strategy in a fixed order against the live
Message buffer, mutating in place. Callers (the two prompt loops)
invoke it immediately before
enforce_context_window
so the RLM compaction pass sees the already-shrunken buffer. The
returned ExperimentalStats is logged at info level for
observability.
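The ordering can be sketched in a self-contained way. Everything here is a stand-in: the Message and ExperimentalStats definitions, the placeholder strategy inside apply_all, and the message-count budget in enforce_context_window are assumptions for illustration, not the crate's actual behavior.

```rust
#[derive(Debug)]
struct Message {
    content: String,
}

#[derive(Debug, Default)]
struct ExperimentalStats {
    total_bytes_saved: usize,
}

// Stand-in for the crate's apply_all: trims trailing whitespace as a
// placeholder "strategy" and reports bytes saved.
fn apply_all(msgs: &mut Vec<Message>) -> ExperimentalStats {
    let mut saved = 0;
    for m in msgs.iter_mut() {
        let before = m.content.len();
        m.content.truncate(m.content.trim_end().len());
        saved += before - m.content.len();
    }
    ExperimentalStats { total_bytes_saved: saved }
}

// Stand-in for RLM compaction: drop the oldest messages over a budget.
fn enforce_context_window(msgs: &mut Vec<Message>, max_msgs: usize) {
    while msgs.len() > max_msgs {
        msgs.remove(0);
    }
}

fn prepare_context(msgs: &mut Vec<Message>) -> ExperimentalStats {
    let stats = apply_all(msgs); // cheap lossy shrinking first...
    enforce_context_window(msgs, 50); // ...so compaction sees the smaller buffer
    stats
}
```

The point of the ordering is that the compaction pass operates on an already-shrunken buffer, so it has to discard less.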
§Default-on, no config
These strategies are always active — there is intentionally no env
flag to disable them. If a future regression requires an escape
hatch, add a field to crate::config::Config rather than a magic
env var so the setting is discoverable.
§Examples
use codetether_agent::provider::{ContentPart, Message, Role};
use codetether_agent::session::helper::experimental::apply_all;
let tool_result = ContentPart::ToolResult {
    tool_call_id: "call_a".into(),
    content: "file contents: hello world".repeat(40),
};
let duplicate = ContentPart::ToolResult {
    tool_call_id: "call_b".into(),
    content: "file contents: hello world".repeat(40),
};
let mut msgs = vec![
    Message { role: Role::Tool, content: vec![tool_result] },
    Message { role: Role::Tool, content: vec![duplicate] },
];
let stats = apply_all(&mut msgs);
assert!(stats.total_bytes_saved > 0);
assert!(stats.dedup_hits >= 1);

Modules§
- dedup
- Content-addressed deduplication of tool-result blocks.
- lingua
- Heuristic LLMLingua-style token/line pruning on stale assistant text.
- pairing
- Invariant-repair pass: ensure every tool_call has its tool_result and vice versa.
- snippet
- Head/tail snippet compaction of stale oversized tool outputs.
- streaming_llm
- StreamingLLM-style middle-drop with attention sinks.
- thinking_prune
- Strip extended-thinking blocks from older messages.
- tool_call_dedup
- Collapse redundant identical tool calls.
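As an illustration of the head/tail approach the snippet module describes, here is a minimal sketch. The snippet_compact function, the byte thresholds, and the elision-marker wording are assumptions made for this example, not the crate's actual API or defaults.

```rust
/// Keep the first `head` and last `tail` bytes of an oversized output
/// and elide the middle, noting how much was dropped.
fn snippet_compact(s: &str, head: usize, tail: usize) -> String {
    if s.len() <= head + tail {
        return s.to_owned(); // already small enough: pass through
    }
    // Snap cut points to char boundaries so slicing stays valid UTF-8.
    let mut h = head;
    while !s.is_char_boundary(h) {
        h -= 1;
    }
    let mut t = s.len() - tail;
    while !s.is_char_boundary(t) {
        t += 1;
    }
    let elided = t - h;
    format!(
        "{}\n[... {elided} bytes elided; re-run the tool call for the full output ...]\n{}",
        &s[..h],
        &s[t..]
    )
}
```

As with dedup, the elision marker tells the model the full output is recoverable by re-running the tool call.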
Structs§
- ExperimentalStats
- Aggregate outcome of every strategy in apply_all.
Functions§
- apply_all
- Apply every experimental strategy in order, mutating messages in place. Returns aggregate statistics suitable for logging.