Module compaction

Conversation-history compaction planning.

When a chat thread grows past its slice of the token budget, the assembler would otherwise drop the oldest turns outright. Compaction instead keeps the recent turns verbatim and folds the overflow prefix into a short summary, so older context survives in compressed form rather than vanishing.

This module is the pure planning half: it decides which turns fit and which overflow, with no LLM and no I/O, so the policy is fully unit-tested. The signal pipeline owns the LLM-backed summarization and caching, and works from the kept/overflow split computed here.
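As a rough illustration of the kept/overflow decision, the sketch below walks the turns from newest to oldest and keeps them for as long as they fit. The names `SketchPlan` and `split_by_budget`, and the representation of each turn as a precomputed token count, are assumptions made for this example; they are not this module's actual types or signatures.

```rust
/// Hypothetical stand-in for a plan; NOT the module's real `HistoryPlan`.
#[derive(Debug, PartialEq)]
struct SketchPlan {
    /// Number of oldest turns that overflow and would be folded into a summary.
    overflow: usize,
    /// Number of newest turns kept verbatim.
    kept: usize,
}

/// Assumption: `turn_tokens[i]` is the token count of turn `i`, oldest first.
fn split_by_budget(turn_tokens: &[usize], budget: usize) -> SketchPlan {
    let mut used = 0usize;
    let mut kept = 0usize;
    // Walk from newest to oldest and stop at the first turn that does not fit,
    // so the kept turns always form a contiguous suffix of the history.
    for &tokens in turn_tokens.iter().rev() {
        match used.checked_add(tokens) {
            Some(total) if total <= budget => {
                used = total;
                kept += 1;
            }
            _ => break,
        }
    }
    SketchPlan {
        overflow: turn_tokens.len() - kept,
        kept,
    }
}
```

For turns of 40, 30, 20 and 10 tokens (oldest first) and a budget of 35, this sketch keeps the two newest turns (10 + 20 = 30 tokens) and marks the two oldest as overflow. Because the function is pure, a case like this can be asserted directly in a unit test without mocking an LLM or any I/O.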

Structs

HistoryPlan: How history should be split to fit a token budget.

Functions

plan_history_compaction: Plan history compaction against a budget (in tokens), holding back reserve tokens for the summary note the caller will prepend, so that the final summary plus keep_recent still fits within the budget.
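To make the role of the reserve concrete: one plausible reading is that the turns kept verbatim are planned against the budget minus the reserve, so that a summary note of up to reserve tokens can be prepended without exceeding the budget. The helpers below are a hypothetical sketch of that arithmetic only; `budget_for_recent` and `fits` are illustrative names, not part of this module, and the real signature of plan_history_compaction is not shown here.

```rust
/// Hypothetical helper: tokens left for verbatim turns once the summary
/// reserve is held back. Saturating subtraction avoids underflow when
/// `reserve` is larger than `budget`.
fn budget_for_recent(budget: usize, reserve: usize) -> usize {
    budget.saturating_sub(reserve)
}

/// Assumed invariant the reserve protects: summary plus kept turns must not
/// exceed the overall budget. Overflowing sums are treated as not fitting.
fn fits(budget: usize, summary_tokens: usize, kept_tokens: usize) -> bool {
    summary_tokens
        .checked_add(kept_tokens)
        .map_or(false, |total| total <= budget)
}
```

With a budget of 100 tokens and a reserve of 20, the recent turns would be planned against 80 tokens under this reading; any summary of at most 20 tokens then still fits alongside them.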