Context window management with token budgeting.
This module provides ContextWindow, a token-aware message buffer that
tracks conversation history and signals when compaction is needed.
§Design Philosophy
The library doesn’t tokenize text (that requires model-specific tokenizers). Instead:
- Token counts are fed from provider-reported Usage after each call
- estimate_tokens provides a rough heuristic for pre-call estimation
- Compaction is the caller’s responsibility — the library signals when to compact and returns messages to summarize, but summarization is an LLM call the application controls
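The heuristic behind estimate_tokens is not specified here; a common rule of thumb for English text is roughly four characters per token. The sketch below is an illustrative re-implementation under that assumption, not the library’s actual code:

```rust
/// Rough pre-call token estimate, assuming ~4 characters per token.
/// (Illustrative only; the real estimate_tokens may weigh things differently.)
fn estimate_tokens(text: &str) -> usize {
    // Round up so short non-empty strings never estimate zero tokens.
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(estimate_tokens(""), 0);
    assert_eq!(estimate_tokens("Hello!"), 2); // 6 chars -> ceil(6/4) = 2
    assert_eq!(estimate_tokens("Hello, world!"), 4); // 13 chars -> 4
}
```

Estimates like this are deliberately conservative inputs for budgeting; the authoritative counts still come from provider-reported Usage after each call.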
§Example
use llm_stack_core::context::ContextWindow;
use llm_stack_core::ChatMessage;
// 8K context window, reserve 1K for output
let mut window = ContextWindow::new(8000, 1000);
// Add messages with their token counts (from provider usage)
window.push(ChatMessage::system("You are helpful."), 10);
window.push(ChatMessage::user("Hello!"), 5);
window.push(ChatMessage::assistant("Hi there!"), 8);
// Check available space
assert_eq!(window.available(), 8000 - 1000 - 10 - 5 - 8);
// Protect recent messages from compaction
window.protect_recent(2);
// Check if compaction is needed (e.g., when 80% full)
if window.needs_compaction(0.8) {
let old_messages = window.compact();
// Summarize old_messages with an LLM call, then:
// window.push(ChatMessage::system("Summary: ..."), summary_tokens);
}
§Structs
- ContextWindow - A token-budgeted message buffer for managing conversation context.
§Functions
- estimate_message_tokens - Estimates tokens for a chat message.
- estimate_tokens - Estimates the token count for a string.
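The bookkeeping the example above relies on can be sketched as a minimal stand-alone type. Field names and method bodies here are assumptions for illustration, not llm_stack_core’s implementation; messages are plain strings rather than ChatMessage to keep the sketch self-contained:

```rust
// Minimal sketch of token-budget bookkeeping (assumed semantics, not the
// library's actual ContextWindow).
struct ContextWindow {
    max_tokens: usize,
    reserved_output: usize,
    used: usize,
    messages: Vec<(String, usize)>, // (message, token count)
    protected: usize,               // newest N messages exempt from compaction
}

impl ContextWindow {
    fn new(max_tokens: usize, reserved_output: usize) -> Self {
        Self { max_tokens, reserved_output, used: 0, messages: Vec::new(), protected: 0 }
    }

    fn push(&mut self, msg: String, tokens: usize) {
        self.used += tokens;
        self.messages.push((msg, tokens));
    }

    fn available(&self) -> usize {
        (self.max_tokens - self.reserved_output).saturating_sub(self.used)
    }

    fn protect_recent(&mut self, n: usize) {
        self.protected = n;
    }

    fn needs_compaction(&self, threshold: f64) -> bool {
        let budget = (self.max_tokens - self.reserved_output) as f64;
        self.used as f64 >= budget * threshold
    }

    /// Drains every unprotected (oldest) message for the caller to summarize.
    fn compact(&mut self) -> Vec<(String, usize)> {
        let keep_from = self.messages.len().saturating_sub(self.protected);
        let old: Vec<_> = self.messages.drain(..keep_from).collect();
        self.used -= old.iter().map(|(_, t)| t).sum::<usize>();
        old
    }
}

fn main() {
    let mut w = ContextWindow::new(8000, 1000);
    w.push("You are helpful.".into(), 10);
    w.push("Hello!".into(), 5);
    w.push("Hi there!".into(), 8);
    assert_eq!(w.available(), 8000 - 1000 - 23);
    w.protect_recent(2);
    let old = w.compact();
    assert_eq!(old.len(), 1); // only the oldest message was unprotected
    assert_eq!(w.available(), 8000 - 1000 - 13);
}
```

The key design point this sketch mirrors: compact() never summarizes anything itself — it only hands back the drained messages and subtracts their tokens, leaving the LLM summarization call to the application.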