Session history compression via the RLM router.
This module contains the context-window enforcement logic that keeps
the prompt under the model’s token budget. It is invoked automatically
at the start of every agent step by Session::run_loop.
§Strategy
- Estimate the current request token cost (system + messages + tools).
- If it exceeds 90% of the model’s usable budget, compress the prefix
of the conversation via RlmRouter::auto_process, keeping the most
recent keep_last messages verbatim.
- Progressively shrink keep_last (16 → 12 → 8 → 6) until the budget is
met or nothing more can be compressed.
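The steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: Msg, Outcome, estimate_tokens, over_threshold and compress_to_fit are hypothetical names, the 4-characters-per-token estimate is an assumption, and the summarize closure stands in for RlmRouter::auto_process, whose real signature is not shown on this page. The sketch also assumes "the budget is met" means dropping back under the same 90% threshold.

```rust
/// Hypothetical message type; the real session message type is not shown here.
#[derive(Clone, Debug, PartialEq)]
struct Msg {
    text: String,
}

/// Hypothetical result of one compression pass.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The request was already under the threshold.
    NotNeeded,
    /// Compression with this keep_last brought the request under the threshold.
    Fit { keep_last: usize, messages: Vec<Msg> },
    /// Every keep_last value was tried without success.
    Exhausted,
}

/// keep_last values tried in order, as listed in the strategy.
const KEEP_LAST_SCHEDULE: [usize; 4] = [16, 12, 8, 6];

/// Crude estimate (assumption: about 4 characters per token, rounded up).
fn estimate_tokens(msgs: &[Msg]) -> usize {
    msgs.iter().map(|m| (m.text.len() + 3) / 4).sum()
}

/// True when `estimated` exceeds 90% of `usable_budget`.
/// Integer arithmetic avoids floating-point rounding.
fn over_threshold(estimated: usize, usable_budget: usize) -> bool {
    estimated * 10 > usable_budget * 9
}

/// Tries each keep_last in turn; `summarize` is a stand-in for the
/// RLM router call that turns the prefix into one summary message.
fn compress_to_fit<F>(msgs: &[Msg], usable_budget: usize, mut summarize: F) -> Outcome
where
    F: FnMut(&[Msg]) -> Msg,
{
    if !over_threshold(estimate_tokens(msgs), usable_budget) {
        return Outcome::NotNeeded;
    }
    for keep_last in KEEP_LAST_SCHEDULE.iter().copied() {
        if msgs.len() <= keep_last {
            continue; // no prefix left to compress at this setting
        }
        let split = msgs.len() - keep_last;
        let mut candidate = Vec::with_capacity(keep_last + 1);
        candidate.push(summarize(&msgs[..split]));
        candidate.extend_from_slice(&msgs[split..]);
        if !over_threshold(estimate_tokens(&candidate), usable_budget) {
            return Outcome::Fit { keep_last, messages: candidate };
        }
    }
    Outcome::Exhausted
}
```

Each attempt restarts from the original message list, so a smaller keep_last summarizes a longer prefix instead of summarizing an earlier summary.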
The compressed prefix is replaced by a single synthetic assistant
message tagged [AUTO CONTEXT COMPRESSION] so the model sees a
coherent summary rather than a truncated tail.
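To illustrate the replacement step, here is a small sketch of swapping a compressed prefix for one tagged assistant message. Role, Message and replace_prefix_with_summary are hypothetical names, and placing the tag on its own line ahead of the summary text is an assumption about the layout; only the tag string itself comes from the description above.

```rust
/// Hypothetical role and message types for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Tag named in the module description.
const COMPRESSION_TAG: &str = "[AUTO CONTEXT COMPRESSION]";

/// Replaces everything except the last `keep_last` messages with a single
/// synthetic assistant message carrying `summary`.
/// Returns the input unchanged when there is no prefix to replace.
fn replace_prefix_with_summary(
    messages: &[Message],
    keep_last: usize,
    summary: &str,
) -> Vec<Message> {
    let split = messages.len().saturating_sub(keep_last);
    if split == 0 {
        return messages.to_vec();
    }
    let mut out = Vec::with_capacity(messages.len() - split + 1);
    out.push(Message {
        role: Role::Assistant,
        // Assumed layout: tag line first, then the summary body.
        content: format!("{}\n{}", COMPRESSION_TAG, summary),
    });
    out.extend_from_slice(&messages[split..]);
    out
}
```

Because the tail is copied verbatim, the most recent turns keep their original roles and ordering; only the prefix collapses into the summary.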
§Fallback decision table
┌────────────────────────────────────┬─────────────────────────────────────┬────────────────────────────────────┐
│ State after attempt │ Action │ Events emitted │
├────────────────────────────────────┼─────────────────────────────────────┼────────────────────────────────────┤
│ RLM keep_last ∈ {16,12,8,6} fits │ Stop; request is ready │ CompactionStarted → Completed(Rlm) │
│ RLM auto_process errors on prefix │ Fall back to chunk compression │ (internal; logged via tracing) │
│ All 4 keep_last values exhausted │ Apply terminal truncation │ Completed(Truncate) + Truncated │
│ Terminal truncation still over bud │ Surface error to caller │ Failed(fell_back_to = Truncate) │
└────────────────────────────────────┴─────────────────────────────────────┴────────────────────────────────────┘
Terminal truncation drops the oldest messages outright (no summary)
and is deliberately a distinct event from CompactionCompleted so
consumers can warn the user about silent context loss.
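A consumer of these events might decide when to warn the user along the following lines. This is only a sketch: the Event and Strategy enums below are hypothetical shapes inferred from the table's event names, the real types and their payloads may differ, and context_loss_warning is an invented helper.

```rust
/// Hypothetical strategy marker, mirroring the Rlm / Truncate labels in the table.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strategy {
    Rlm,
    Truncate,
}

/// Hypothetical event shapes inferred from the table; real payloads may differ.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    CompactionStarted,
    CompactionCompleted(Strategy),
    Truncated,
    CompactionFailed { fell_back_to: Strategy },
}

/// Returns a user-facing warning when the event stream shows context loss
/// (terminal truncation) or a failed compaction; otherwise returns None.
fn context_loss_warning(events: &[Event]) -> Option<String> {
    for event in events {
        match event {
            Event::Truncated => {
                return Some(
                    "Older messages were dropped without a summary.".to_string(),
                );
            }
            Event::CompactionFailed { fell_back_to } => {
                return Some(format!(
                    "Context compaction failed after falling back to {:?}.",
                    fell_back_to
                ));
            }
            // A successful compaction needs no warning.
            Event::CompactionStarted | Event::CompactionCompleted(_) => {}
        }
    }
    None
}
```

Matching on Truncated separately from CompactionCompleted is what lets the consumer stay quiet after an RLM summary but speak up when history was discarded.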