
Module context


Context window management with token budgeting.

This module provides ContextWindow, a token-aware message buffer that tracks conversation history and signals when compaction is needed.

Design Philosophy

The library doesn’t tokenize text (that requires model-specific tokenizers). Instead:

  • Token counts are fed from provider-reported Usage after each call
  • estimate_tokens provides a rough heuristic for pre-call estimation
  • Compaction is the caller’s responsibility — the library signals when to compact and returns messages to summarize, but summarization is an LLM call the application controls
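The budgeting arithmetic this design implies can be sketched independently of the crate. The `Budget` type and its methods below are illustrative stand-ins, not the llm_stack_core API: `available` is the window size minus the output reservation minus tokens already used, and compaction is signalled when usage crosses a fraction of the usable budget.

```rust
/// Minimal sketch of token budgeting (illustrative; not the crate's types).
struct Budget {
    max_tokens: usize,      // total context window size
    reserved_output: usize, // tokens held back for the model's reply
    used: usize,            // sum of provider-reported token counts
}

impl Budget {
    /// Tokens still available for input messages.
    fn available(&self) -> usize {
        self.max_tokens - self.reserved_output - self.used
    }

    /// True once usage reaches `threshold` of the usable (non-reserved) budget.
    fn needs_compaction(&self, threshold: f64) -> bool {
        let usable = (self.max_tokens - self.reserved_output) as f64;
        self.used as f64 >= usable * threshold
    }
}

fn main() {
    let b = Budget { max_tokens: 8000, reserved_output: 1000, used: 5600 };
    assert_eq!(b.available(), 1400);
    // 5600 / 7000 = 0.8, so the 80% threshold is reached but 90% is not.
    assert!(b.needs_compaction(0.8));
    assert!(!b.needs_compaction(0.9));
    println!("available: {}", b.available());
}
```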

Example

use llm_stack_core::context::ContextWindow;
use llm_stack_core::ChatMessage;

// 8K context window, reserve 1K for output
let mut window = ContextWindow::new(8000, 1000);

// Add messages with their token counts (from provider usage)
window.push(ChatMessage::system("You are helpful."), 10);
window.push(ChatMessage::user("Hello!"), 5);
window.push(ChatMessage::assistant("Hi there!"), 8);

// Check available space
assert_eq!(window.available(), 8000 - 1000 - 10 - 5 - 8);

// Protect recent messages from compaction
window.protect_recent(2);

// Check if compaction is needed (e.g., when 80% full)
if window.needs_compaction(0.8) {
    let old_messages = window.compact();
    // Summarize old_messages with an LLM call, then:
    // window.push(ChatMessage::system("Summary: ..."), summary_tokens);
}
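The summarize step the comments above leave to the application might be wired up as follows. `Message` and `summarize` here are hypothetical stand-ins: in practice the messages come from ContextWindow's compaction and `summarize` would be the application's own LLM call, but a placeholder shows the shape of the round trip.

```rust
// Hypothetical compaction round trip; `Message` and `summarize` are
// stand-ins, not llm_stack_core types.
#[derive(Clone)]
struct Message {
    role: &'static str,
    text: String,
}

/// Placeholder summarizer: a real implementation would call the model.
fn summarize(old: &[Message]) -> String {
    let joined: Vec<String> = old
        .iter()
        .map(|m| format!("{}: {}", m.role, m.text))
        .collect();
    format!("Summary of {} messages: {}", old.len(), joined.join(" | "))
}

fn main() {
    // Messages evicted by compaction.
    let old = vec![
        Message { role: "user", text: "Hello!".into() },
        Message { role: "assistant", text: "Hi there!".into() },
    ];
    let summary = summarize(&old);
    // The caller would now push the summary back into the window with its
    // token count, e.g.:
    // window.push(ChatMessage::system(&summary), summary_tokens);
    assert!(summary.starts_with("Summary of 2 messages"));
    println!("{summary}");
}
```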

Structs

ContextWindow
A token-budgeted message buffer for managing conversation context.

Functions

estimate_message_tokens
Estimates tokens for a chat message.
estimate_tokens
Estimates the token count for a string.
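A common rule of thumb for pre-call estimation is roughly four characters per token for English-like text. The function below illustrates that heuristic; it is an assumption about the general approach, not the crate's actual estimate_tokens implementation, whose results may differ.

```rust
/// Rough pre-call token estimate, assuming ~4 characters per token.
/// Illustrative only; the crate's estimate_tokens may use a different rule.
fn rough_token_estimate(text: &str) -> usize {
    // Round up so any non-empty string estimates at least one token.
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(rough_token_estimate(""), 0);
    assert_eq!(rough_token_estimate("Hello!"), 2);          // 6 chars
    assert_eq!(rough_token_estimate("You are helpful."), 4); // 16 chars
    println!("ok");
}
```

Because this is only a heuristic, the design above treats provider-reported Usage as the source of truth and uses estimates solely for pre-call budgeting.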