Module compression

Session history compression via the RLM router.

This module contains the context-window enforcement logic that keeps the prompt under the model’s token budget. It is invoked automatically at the start of every agent step by Session::run_loop.

§Strategy

  1. Estimate the current request token cost (system + messages + tools).
  2. If it exceeds 90% of the model’s usable budget, compress the prefix of the conversation via RlmRouter::auto_process, keeping the most recent keep_last messages verbatim.
  3. Progressively shrink keep_last (16 → 12 → 8 → 6) until the budget is met or nothing more can be compressed.
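The three steps above can be sketched roughly as follows. This is an illustrative sketch, not the module's actual code: `estimate_tokens`, `enforce_budget`, the `KEEP_LAST_SCHEDULE` constant, and the four-bytes-per-token heuristic are all hypothetical, messages are simplified to plain strings, and the `summarize` closure stands in for the call to RlmRouter::auto_process, whose real signature is not shown on this page.

```rust
/// keep_last values tried in order, as listed in step 3.
const KEEP_LAST_SCHEDULE: [usize; 4] = [16, 12, 8, 6];

/// Rough token estimate (assumption: about 4 bytes per token).
/// The real estimate also covers the system prompt and tools.
fn estimate_tokens(messages: &[String]) -> usize {
    messages.iter().map(|m| m.len() / 4 + 1).sum()
}

/// Compresses the prefix of `messages` until the estimate is at or below
/// 90% of `usable_budget`, or until the schedule is exhausted.
/// Returns true if the history fits afterwards.
/// `summarize` is a hypothetical stand-in for the router call.
fn enforce_budget<F>(messages: &mut Vec<String>, usable_budget: usize, mut summarize: F) -> bool
where
    F: FnMut(&[String]) -> String,
{
    let threshold = usable_budget * 9 / 10;
    for keep_last in KEEP_LAST_SCHEDULE {
        if estimate_tokens(messages) <= threshold {
            return true; // budget met, stop shrinking
        }
        if messages.len() <= keep_last {
            continue; // no prefix to compress at this setting
        }
        let split = messages.len() - keep_last;
        let summary = summarize(&messages[..split]);
        // Replace the prefix with one summary entry; the tail stays verbatim.
        drop(messages.splice(..split, vec![summary]));
    }
    estimate_tokens(messages) <= threshold
}
```

In this sketch a second pass re-summarizes the earlier summary together with the messages that have just dropped out of the kept tail, so the history always holds at most one summary entry. Whether the real router behaves the same way is an assumption.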

The compressed prefix is replaced by a single synthetic assistant message tagged [AUTO CONTEXT COMPRESSION] so the model sees a coherent summary rather than a truncated tail.
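A minimal sketch of that replacement, under stated assumptions: the `Role` and `Message` types, the `replace_prefix` helper, and the layout of the synthetic message (tag on the first line, summary below it) are hypothetical and chosen only for illustration. Only the tag text and the assistant role come from the description above.

```rust
/// Tag named in the description above.
const COMPRESSION_TAG: &str = "[AUTO CONTEXT COMPRESSION]";

/// Hypothetical role and message types; the real session types may differ.
#[derive(Clone, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Replaces everything except the last `keep_last` messages with a single
/// synthetic assistant message carrying `summary`.
/// Returns false (and changes nothing) when there is no prefix to replace.
fn replace_prefix(messages: &mut Vec<Message>, keep_last: usize, summary: &str) -> bool {
    if messages.len() <= keep_last {
        return false;
    }
    let split = messages.len() - keep_last;
    let synthetic = Message {
        role: Role::Assistant,
        // Assumed layout: tag first, then the summary text.
        content: format!("{}\n{}", COMPRESSION_TAG, summary),
    };
    // Swap the prefix for the synthetic message; the tail is untouched.
    drop(messages.splice(..split, vec![synthetic]));
    true
}
```

For example, calling `replace_prefix` on ten messages with `keep_last` set to 3 leaves four messages: the tagged summary followed by the last three originals in their original order.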