Module compression

Session history compression via the RLM router.

This module contains the context-window enforcement logic that keeps the prompt under the model’s token budget. It is invoked automatically at the start of every agent step by Session::run_loop.

§Strategy

1. Estimate the current request token cost (system + messages + tools).
2. If it exceeds 90% of the model’s usable budget, compress the prefix of the conversation via RlmRouter::auto_process, keeping the most recent keep_last messages verbatim.
3. Progressively shrink keep_last (16 → 12 → 8 → 6) until the budget is met or nothing more can be compressed.
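
The steps above can be sketched as pure token arithmetic. This is a minimal illustration and not the module's actual code: `Plan`, `over_threshold`, `plan_compression`, and the `summary_tokens` parameter are hypothetical names, and the sketch assumes the compressed request is checked against the same 90% threshold that triggered compression. Only the threshold and the 16 → 12 → 8 → 6 schedule come from the description.

```rust
/// keep_last schedule taken from the strategy description.
pub const KEEP_LAST_SCHEDULE: [usize; 4] = [16, 12, 8, 6];

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    /// The request is at or below the threshold; leave it alone.
    NotNeeded,
    /// Compressing everything before the last `keep_last` messages fits.
    Compress { keep_last: usize },
    /// No value in the schedule brought the estimate under the threshold.
    Exhausted,
}

/// Integer form of `tokens > 0.9 * usable_budget`.
pub fn over_threshold(tokens: usize, usable_budget: usize) -> bool {
    tokens * 10 > usable_budget * 9
}

/// Picks the first keep_last value whose compressed request fits.
pub fn plan_compression(
    fixed_tokens: usize,      // system prompt + tool definitions
    message_tokens: &[usize], // estimated cost per message, oldest first
    summary_tokens: usize,    // assumed cost of the synthetic summary message
    usable_budget: usize,
) -> Plan {
    let total = fixed_tokens + message_tokens.iter().sum::<usize>();
    if !over_threshold(total, usable_budget) {
        return Plan::NotNeeded;
    }
    for &keep_last in &KEEP_LAST_SCHEDULE {
        if message_tokens.len() <= keep_last {
            continue; // no prefix left to compress at this setting
        }
        let tail = &message_tokens[message_tokens.len() - keep_last..];
        let compressed = fixed_tokens + summary_tokens + tail.iter().sum::<usize>();
        if !over_threshold(compressed, usable_budget) {
            return Plan::Compress { keep_last };
        }
    }
    Plan::Exhausted
}
```

With 20 messages of 100 tokens each, 500 fixed tokens, a 200-token summary, and a 2500-token budget, keep_last = 16 still exceeds the threshold (2300 > 2250), so the sketch settles on keep_last = 12.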

The compressed prefix is replaced by a single synthetic assistant message tagged [AUTO CONTEXT COMPRESSION] so the model sees a coherent summary rather than a truncated tail.
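
A small sketch of that replacement, under stated assumptions: the `Message` struct, the `replace_prefix` function, and the exact layout of the tagged text (tag, newline, summary) are hypothetical, and the summary string is passed in rather than produced by RlmRouter::auto_process.

```rust
/// Hypothetical message type; the real session type is not shown here.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Tag named in the module description.
pub const COMPRESSION_TAG: &str = "[AUTO CONTEXT COMPRESSION]";

/// Replaces everything before the last `keep_last` messages with one
/// synthetic assistant message carrying `summary`.
pub fn replace_prefix(messages: &[Message], keep_last: usize, summary: &str) -> Vec<Message> {
    if messages.len() <= keep_last {
        // Nothing precedes the kept tail, so there is no prefix to replace.
        return messages.to_vec();
    }
    let split = messages.len() - keep_last;
    let mut out = Vec::with_capacity(keep_last + 1);
    out.push(Message {
        role: "assistant".to_string(),
        // Assumed layout: tag on the first line, summary below it.
        text: format!("{}\n{}", COMPRESSION_TAG, summary),
    });
    // The most recent `keep_last` messages are copied over verbatim.
    out.extend_from_slice(&messages[split..]);
    out
}
```

Applied to a ten-message history with keep_last = 6, this yields seven messages: the tagged summary followed by the six newest messages unchanged.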

§Fallback decision table

┌────────────────────────────────────┬─────────────────────────────────────┬────────────────────────────────────┐
│ State after attempt                │ Action                              │ Events emitted                     │
├────────────────────────────────────┼─────────────────────────────────────┼────────────────────────────────────┤
│ RLM keep_last ∈ {16,12,8,6} fits   │ Stop; request is ready              │ CompactionStarted → Completed(Rlm) │
│ RLM auto_process errors on prefix  │ Fall back to chunk compression      │ (internal; logged via tracing)     │
│ All 4 keep_last values exhausted   │ Apply terminal truncation           │ Completed(Truncate) + Truncated    │
│ Truncation still over budget       │ Surface error to caller             │ Failed(fell_back_to = Truncate)    │
└────────────────────────────────────┴─────────────────────────────────────┴────────────────────────────────────┘

Terminal truncation drops the oldest messages outright, with no summary. It deliberately emits its own event, distinct from CompactionCompleted, so that consumers can warn the user about context loss that would otherwise be silent.
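
The last two rows of the table could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Event` enum only mirrors the event names in the table, `terminal_truncate` and `user_warning` are hypothetical, the newest message is always kept, and the failure path emits only the failure event.

```rust
/// Hypothetical event type modelled on the table's last two rows.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    CompletedTruncate,
    Truncated { dropped: usize },
    FailedAfterTruncate,
}

/// Drops the oldest messages (by token cost) until the request fits
/// `token_limit`, always keeping the newest message (assumption).
pub fn terminal_truncate(
    message_tokens: &mut Vec<usize>, // estimated cost per message, oldest first
    fixed_tokens: usize,             // system prompt + tool definitions
    token_limit: usize,
) -> Vec<Event> {
    let mut total = fixed_tokens + message_tokens.iter().sum::<usize>();
    let mut dropped = 0;
    while total > token_limit && message_tokens.len() - dropped > 1 {
        total -= message_tokens[dropped];
        dropped += 1;
    }
    message_tokens.drain(..dropped);
    if total > token_limit {
        // Still over the limit: the caller must surface an error.
        return vec![Event::FailedAfterTruncate];
    }
    vec![Event::CompletedTruncate, Event::Truncated { dropped }]
}

/// Consumer side: turn the separate truncation event into a warning.
pub fn user_warning(events: &[Event]) -> Option<String> {
    events.iter().find_map(|e| match e {
        Event::Truncated { dropped } => Some(format!(
            "{} older messages were dropped without a summary",
            dropped
        )),
        _ => None,
    })
}
```

Because the truncation event is separate from the completion event, a consumer that only cares about data loss can ignore every other event and still produce the warning.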