Conversation-history compaction planning.
When a chat thread grows past its slice of the token budget, the assembler would otherwise drop the oldest turns outright. Compaction instead keeps the recent turns verbatim and folds the overflow prefix into a short summary, so older context survives in compressed form rather than vanishing.
This module is the pure planning half: it decides which turns fit and which overflow, with no LLM and no I/O, so the policy is fully unit-tested. The signal pipeline owns the LLM-backed summarization and caching, and consumes the kept/overflow split computed here.
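Because the planning half needs only token counts, the core split can be sketched as a pure function over a slice. The sketch below is hypothetical (split_index and its token-count input are illustrative names, not this module's API) and assumes turns are ordered oldest first. It walks backwards from the newest turn, keeping turns while they fit, and stops at the first turn that does not, so the overflow is always a contiguous prefix and the kept turns a contiguous suffix.

```rust
/// Hypothetical sketch: given per-turn token counts (oldest first) and a
/// budget, return the index where the verbatim suffix begins. Turns before
/// the index are the overflow prefix; turns from the index on are kept.
fn split_index(token_counts: &[usize], budget: usize) -> usize {
    let mut used = 0usize;
    // Walk from the newest turn backwards, keeping turns while they fit.
    for (i, &tokens) in token_counts.iter().enumerate().rev() {
        if used.saturating_add(tokens) > budget {
            // This turn and everything older overflow.
            return i + 1;
        }
        used += tokens;
    }
    // Every turn fits: nothing overflows.
    0
}

fn main() {
    let counts = [400, 300, 200, 100];
    let split = split_index(&counts, 350);
    // Prints: overflow = [400, 300], keep = [200, 100]
    println!("overflow = {:?}, keep = {:?}", &counts[..split], &counts[split..]);
}
```

Stopping at the first turn that does not fit, rather than skipping it and trying smaller older turns, keeps the kept region contiguous, which is what lets the overflow be summarized as a single prefix.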
Structs
- HistoryPlan - How history should be split to fit a token budget.
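A plan of this kind can be pictured as two borrowed, contiguous slices of the original history. The shape below is a hypothetical sketch: the field names overflow and keep_recent, the constructor at, and the helper needs_summary are assumptions for illustration, not the crate's actual definition of HistoryPlan.

```rust
/// Hypothetical shape of a plan: two borrowed, contiguous slices of the
/// original history. Field and method names are illustrative assumptions.
#[derive(Debug)]
struct HistoryPlan<'a, T> {
    /// Oldest turns that will be folded into the summary note.
    overflow: &'a [T],
    /// Most recent turns that are kept verbatim.
    keep_recent: &'a [T],
}

impl<'a, T> HistoryPlan<'a, T> {
    /// Split `history` (oldest first) at `split`; panics if `split > history.len()`.
    fn at(history: &'a [T], split: usize) -> Self {
        let (overflow, keep_recent) = history.split_at(split);
        HistoryPlan { overflow, keep_recent }
    }

    /// A summary is only needed when something actually overflowed.
    fn needs_summary(&self) -> bool {
        !self.overflow.is_empty()
    }
}

fn main() {
    let history = ["t1", "t2", "t3", "t4"];
    let plan = HistoryPlan::at(&history[..], 1);
    assert_eq!(plan.overflow, &["t1"]);
    assert_eq!(plan.keep_recent, &["t2", "t3", "t4"]);
    assert!(plan.needs_summary());
    println!("{:?}", plan);
}
```

Borrowing slices instead of cloning turns keeps planning allocation-free, leaving the caller free to decide how to render the overflow into a summary.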
Functions
- plan_history_compaction - Plan history compaction against a budget (in tokens), holding back reserve tokens for the summary note the caller will prepend, so that the final summary + keep_recent still fits.
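The reserve arithmetic can be illustrated with a small sketch. It is hypothetical: plan_split is not the real signature of plan_history_compaction, it works on per-turn token counts ordered oldest first, and it assumes (an interpretation, not something the description states) that the reserve is only held back when some turns actually overflow, since no summary note is prepended otherwise. When compaction is needed, the kept suffix must fit in budget minus reserve, so summary (at most reserve tokens) plus kept turns stays within budget.

```rust
/// Hypothetical sketch of reserve-aware planning over per-turn token counts
/// (oldest first). Returns the split index: `[..split]` overflows into the
/// summary, `[split..]` is kept verbatim.
fn plan_split(token_counts: &[usize], budget: usize, reserve: usize) -> usize {
    let total = token_counts.iter().fold(0usize, |acc, &t| acc.saturating_add(t));
    // Assumption: if everything fits, no summary is written, so no reserve is held back.
    if total <= budget {
        return 0;
    }
    // Otherwise leave `reserve` tokens for the summary note the caller prepends.
    let available = budget.saturating_sub(reserve);
    let mut used = 0usize;
    // Keep the newest turns while they fit in what remains after the reserve.
    for (i, &tokens) in token_counts.iter().enumerate().rev() {
        if used.saturating_add(tokens) > available {
            return i + 1;
        }
        used += tokens;
    }
    0
}

fn main() {
    let counts = [400, 300, 200, 100];
    // Without a reserve, three turns (600 tokens) fit in 700.
    println!("no reserve: split at {}", plan_split(&counts, 700, 0));
    // With 150 reserved, only 550 is available, so the 300-token turn also overflows.
    println!("reserve 150: split at {}", plan_split(&counts, 700, 150));
}
```

In this example the reserve moves one more turn into the overflow: the kept turns total 300 tokens, and even a full 150-token summary brings the result to 450, well under the 700-token budget.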