//! Types for explicit message trimming, summarization, and compression policies.
//!
//! All policy decisions — when to summarize, what to keep, and what provenance
//! to record — are expressed as data types so they can be inspected, tested,
//! and audited without coupling to any particular LLM provider.
use async_trait::async_trait;
use ;
use crateResult;
use crate::harness::message::Message;
// ---------------------------------------------------------------------------
// Token estimation
// ---------------------------------------------------------------------------
/// A cheap heuristic estimate of the number of tokens in a piece of text.
///
/// The value is derived by [`estimate_tokens`] and should be treated as an
/// approximation only — it does not use a real tokenizer.
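// Illustrative sketch only: the crate's actual estimator is not shown in this
// listing. The version below assumes a "roughly four characters per token"
// rule of thumb; the divisor and the `TokenEstimate` newtype are hypothetical
// stand-ins. Only the function name `estimate_tokens` comes from the docs above.

```rust
/// Hypothetical newtype wrapping an approximate token count.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct TokenEstimate(pub usize);

/// Assumed heuristic: one token per four characters, rounded up.
/// This is not a real tokenizer and will drift for non-English text or code.
pub fn estimate_tokens(text: &str) -> TokenEstimate {
    // Count Unicode scalar values rather than bytes so that multi-byte
    // characters are not over-counted.
    let chars = text.chars().count();
    TokenEstimate((chars + 3) / 4)
}
```

// A heuristic like this is cheap enough to run on every turn, which is
// presumably why the docs warn that the value is an approximation.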
// ---------------------------------------------------------------------------
// Trim strategy
// ---------------------------------------------------------------------------
/// How to trim a message list when it grows too long.
///
/// Trimming is a best-effort, synchronous operation that does not call an LLM.
/// It simply drops messages from the slice according to the chosen rule.
/// System messages are kept by default. They are dropped only when the
/// strategy is `MaxTokens` and the budget is so tight that even system
/// content must be shed.
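// Illustrative sketch only: the real strategy enum and `Message` type are not
// shown in this listing. `Role`, `Msg`, `KeepLast`, `approx_tokens`, and `trim`
// are hypothetical names; only `MaxTokens` is mentioned in the docs above. The
// sketch assumes system messages get first claim on the token budget.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role { System, User, Assistant }

/// Hypothetical, simplified message type.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg { pub role: Role, pub text: String }

/// Hypothetical trim rules.
#[derive(Debug, Clone, Copy)]
pub enum TrimStrategy {
    /// Keep every system message plus the last `n` other messages.
    KeepLast(usize),
    /// Keep as much as fits in an approximate token budget.
    MaxTokens(usize),
}

/// Assumed heuristic: four characters per token, rounded up.
fn approx_tokens(text: &str) -> usize { (text.chars().count() + 3) / 4 }

/// Synchronous, LLM-free trimming that preserves the original order.
pub fn trim(messages: &[Msg], strategy: TrimStrategy) -> Vec<Msg> {
    match strategy {
        TrimStrategy::KeepLast(n) => {
            let others = messages.iter().filter(|m| m.role != Role::System).count();
            // Number of oldest non-system messages to drop.
            let mut to_skip = others.saturating_sub(n);
            messages.iter().filter(|m| {
                if m.role == Role::System { return true; }
                if to_skip > 0 { to_skip -= 1; false } else { true }
            }).cloned().collect()
        }
        TrimStrategy::MaxTokens(budget) => {
            let mut keep = vec![false; messages.len()];
            let mut used = 0usize;
            // Pass 1: system messages, oldest first, while they still fit.
            for (i, m) in messages.iter().enumerate() {
                let cost = approx_tokens(&m.text);
                if m.role == Role::System && used + cost <= budget {
                    keep[i] = true;
                    used += cost;
                }
            }
            // Pass 2: other messages, newest first, stopping at the first
            // one that would exceed the budget.
            for (i, m) in messages.iter().enumerate().rev() {
                if m.role == Role::System { continue; }
                let cost = approx_tokens(&m.text);
                if used + cost > budget { break; }
                keep[i] = true;
                used += cost;
            }
            messages.iter().zip(keep).filter(|(_, k)| *k).map(|(m, _)| m.clone()).collect()
        }
    }
}
```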
// ---------------------------------------------------------------------------
// Compression provenance
// ---------------------------------------------------------------------------
/// Metadata that records *why* a set of messages was removed or replaced by a
/// summary.
///
/// Provenance is required by the summarization spec so that users can audit
/// what was compressed and under which policy.
// ---------------------------------------------------------------------------
// Summary record
// ---------------------------------------------------------------------------
/// A single summary produced by a [`Summarizer`], together with the provenance
/// that explains which messages it replaced and why.
// ---------------------------------------------------------------------------
// Summarizer trait
// ---------------------------------------------------------------------------
/// Async trait for turning a slice of messages into a [`SummaryRecord`].
///
/// Implementations range from deterministic concatenation stubs (see
/// [`ConcatSummarizer`]) to real LLM-backed compressors. The trait is
/// object-safe so harness layers can store `Box<dyn Summarizer>`.
// ---------------------------------------------------------------------------
// ConcatSummarizer
// ---------------------------------------------------------------------------
/// A deterministic, LLM-free summarizer for testing and fallback use.
///
/// It concatenates the text of all provided messages into a single system
/// message, prefixed by a header. No external call is made; the result is
/// fully reproducible.
///
/// # Provenance
///
/// Because [`Message`] carries no stable id, `ConcatSummarizer` assigns
/// synthetic positional ids of the form `"msg-0"`, `"msg-1"`, … based on
/// the index of each message within the supplied slice.
pub struct ConcatSummarizer;
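// Illustrative sketch only: a synchronous, std-only approximation of the
// concatenation behaviour described above. `SketchSummaryRecord`, its fields,
// `concat_summary`, and the header text are hypothetical; the real
// `Summarizer` trait is async and returns the crate's own `SummaryRecord`.
// Only the `"msg-N"` positional id scheme comes from the docs.

```rust
/// Hypothetical stand-in for a summary plus its provenance.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchSummaryRecord {
    /// Text of the single replacement message.
    pub summary_text: String,
    /// Synthetic positional ids of the messages that were replaced.
    pub replaced_ids: Vec<String>,
}

/// Concatenates message texts under a fixed header. No external call is
/// made, so the same input always yields the same record.
pub fn concat_summary(texts: &[&str]) -> SketchSummaryRecord {
    // The header wording is an assumption made for this sketch.
    let mut summary_text = String::from("[Summary of earlier messages]");
    let mut replaced_ids = Vec::with_capacity(texts.len());
    for (i, text) in texts.iter().enumerate() {
        summary_text.push('\n');
        summary_text.push_str(text);
        // Ids are positions within the supplied slice, not stable identifiers.
        replaced_ids.push(format!("msg-{}", i));
    }
    SketchSummaryRecord { summary_text, replaced_ids }
}
```

// Because the ids are positional, they are only meaningful relative to the
// exact slice that was summarized.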
// ---------------------------------------------------------------------------
// Summarization policy
// ---------------------------------------------------------------------------
/// Policy describing *when* to summarize and *how much* to retain verbatim.
///
/// The policy does not perform summarization itself — it only decides whether
/// summarization is needed and splits the message list accordingly. Pass the
/// split output to a [`Summarizer`] implementation.
///
/// # Context-window awareness
///
/// When [`context_window`][Self::context_window] is set (typically from a
/// model's [`ModelProfile::max_input_tokens`]), the policy only triggers once
/// the estimated tokens reach [`threshold_fraction`][Self::threshold_fraction]
/// of that window (default `0.9`, i.e. 90%). When `context_window` is `None`
/// the policy falls back to the raw [`trigger_tokens`][Self::trigger_tokens]
/// threshold, preserving the original behaviour.
///
/// [`ModelProfile::max_input_tokens`]: crate::harness::model::ModelProfile::max_input_tokens
///
/// # Example
///
/// ```
/// use tinyagents::harness::message::Message;
/// use tinyagents::harness::summarization::SummarizationPolicy;
///
/// let policy = SummarizationPolicy {
///     trigger_tokens: 2000,
///     keep_last: 4,
///     ..Default::default()
/// };
/// let msgs = vec![Message::user("hello"), Message::assistant("world")];
/// assert!(!policy.should_summarize(&msgs));
/// ```
/// The default [`SummarizationPolicy::threshold_fraction`] (90% of the context
/// window).
pub
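// Illustrative sketch only: one way the trigger and split logic described in
// the `SummarizationPolicy` docs could work. `PolicySketch`,
// `effective_threshold`, and `split` are hypothetical names, and the sketch
// takes a precomputed token estimate instead of a message slice. The field
// names mirror those mentioned in the docs; the `>=` comparison for the
// `trigger_tokens` fallback is an assumption.

```rust
/// Hypothetical stand-in for the policy's data.
#[derive(Debug, Clone)]
pub struct PolicySketch {
    pub trigger_tokens: usize,
    pub keep_last: usize,
    pub context_window: Option<usize>,
    pub threshold_fraction: f64,
}

impl PolicySketch {
    /// Token count at which summarization triggers: a fraction of the context
    /// window when one is known, otherwise the raw `trigger_tokens` value.
    pub fn effective_threshold(&self) -> usize {
        match self.context_window {
            Some(window) => (window as f64 * self.threshold_fraction) as usize,
            None => self.trigger_tokens,
        }
    }

    /// True once the estimate reaches the effective threshold.
    pub fn should_summarize(&self, estimated_tokens: usize) -> bool {
        estimated_tokens >= self.effective_threshold()
    }

    /// Splits into (older messages to summarize, last `keep_last` kept verbatim).
    pub fn split<'a, T>(&self, messages: &'a [T]) -> (&'a [T], &'a [T]) {
        let cut = messages.len().saturating_sub(self.keep_last);
        messages.split_at(cut)
    }
}
```

// With `context_window: Some(1000)` and `threshold_fraction: 0.5`, the sketch
// triggers at 500 estimated tokens regardless of `trigger_tokens`.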