Streaming Context Generation (Task 1)
This module implements reactive context streaming for reduced TTFT (time-to-first-token) and progressive budget enforcement.
§Design
Instead of materializing all sections before returning, execute_streaming()
returns a Stream<Item = SectionChunk> that yields chunks as they become ready.
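The difference between materializing and streaming can be illustrated with a minimal sketch. It uses a standard-library `Iterator` as a stand-in for the asynchronous `Stream`, and the `Chunk` enum, `stream_chunks` function, and `rendered` counter are hypothetical names chosen for illustration; the real `SectionChunk` variants and `execute_streaming()` signature may differ.

```rust
use std::cell::Cell;

/// Hypothetical chunk type; the real `SectionChunk` variants may differ.
#[derive(Debug, PartialEq)]
enum Chunk {
    SectionHeader(String),
    RowBlock(String),
}

/// Lazily turns section names into chunks.
/// `rendered` counts how many sections have actually been processed.
fn stream_chunks<'a>(
    sections: &'a [&'a str],
    rendered: &'a Cell<usize>,
) -> impl Iterator<Item = Chunk> + 'a {
    sections.iter().flat_map(move |name| {
        // This closure runs only when the consumer asks for more chunks.
        rendered.set(rendered.get() + 1);
        vec![
            Chunk::SectionHeader(name.to_string()),
            Chunk::RowBlock(format!("rows for {name}")),
        ]
    })
}
```

Pulling a single chunk with `next()` processes only the first section, which is the property that reduces time-to-first-token: later sections are not touched until the consumer asks for them.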
Priority Queue          Stream Output
┌─────────────┐       ┌───────────────┐
│ P0: USER    │──────►│ SectionHeader │
│ P1: HISTORY │       │ RowBlock      │
│ P2: SEARCH  │       │ RowBlock      │
└─────────────┘       │ SearchResult  │
                      │ ...           │
                      └───────────────┘
§Budget Enforcement
A rolling sum of emitted tokens is maintained: B = Σ tokens(chunk_i).
The stream terminates when B ≥ token_limit.
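The rolling-sum rule can be sketched as follows. This is a minimal illustration under assumptions, not the module's `RollingBudget`: the `Budget` type, its methods, and `emit_within_budget` are hypothetical, and an `AtomicUsize` is only one possible way to make the counter thread-safe. The sketch checks the budget before each chunk, so the chunk that crosses the limit is still emitted; a stricter variant would check before emitting and drop that chunk.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a thread-safe rolling budget.
struct Budget {
    used: AtomicUsize,
    limit: usize,
}

impl Budget {
    fn new(limit: usize) -> Self {
        Self { used: AtomicUsize::new(0), limit }
    }

    /// Adds `tokens` to the rolling sum and returns the new total B.
    fn consume(&self, tokens: usize) -> usize {
        self.used.fetch_add(tokens, Ordering::SeqCst) + tokens
    }

    /// True once B >= token_limit.
    fn exhausted(&self) -> bool {
        self.used.load(Ordering::SeqCst) >= self.limit
    }
}

/// Emits chunk sizes in order until the budget is exhausted.
/// Returns the emitted sizes and the final rolling sum B.
fn emit_within_budget(chunk_tokens: &[usize], limit: usize) -> (Vec<usize>, usize) {
    let budget = Budget::new(limit);
    let mut emitted = Vec::new();
    let mut total = 0;
    for &tokens in chunk_tokens {
        if budget.exhausted() {
            break; // B >= token_limit: terminate the stream
        }
        total = budget.consume(tokens);
        emitted.push(tokens);
    }
    (emitted, total)
}
```

With four 40-token chunks and a limit of 100, the first three chunks are emitted (B = 120) and the fourth is never produced.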
§Complexity
- Scheduling: O(log S) per section where S = number of sections
- Budget tracking: O(m) where m = total chunks
- Tokenization: depends on exact vs estimated mode
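The O(log S) scheduling cost is what a binary heap provides for each push and pop. The sketch below assumes that a lower priority number means earlier emission (P0 before P1 before P2, as in the diagram) and uses `std::collections::BinaryHeap`; the `schedule` function and its tuple layout are hypothetical, and the module's actual queue may be implemented differently.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns section names in emission order: lowest priority number first.
/// Each push and pop costs O(log S) for S queued sections.
fn schedule(sections: &[(u32, &str)]) -> Vec<String> {
    let mut heap = BinaryHeap::new();
    for (index, (priority, name)) in sections.iter().enumerate() {
        // `Reverse` turns the max-heap into a min-heap; `index` breaks ties
        // so that sections with equal priority keep their insertion order.
        heap.push(Reverse((*priority, index, *name)));
    }
    let mut order = Vec::with_capacity(sections.len());
    while let Some(Reverse((_, _, name))) = heap.pop() {
        order.push(name.to_string());
    }
    order
}
```

Calling `schedule(&[(2, "SEARCH"), (0, "USER"), (1, "HISTORY")])` yields the sections in the order USER, HISTORY, SEARCH.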
Structs§
- RollingBudget - Thread-safe rolling budget tracker for streaming
- StreamingConfig - Configuration for streaming context generation
- StreamingContextExecutor - Streaming context executor
- StreamingContextIter - Iterator over streaming context chunks
- StreamingSearchResult - Streaming search result (subset of VectorSearchResult)
Enums§
- SectionChunk - A chunk of context output during streaming
Functions§
- collect_streaming_chunks - Collect all chunks from a streaming context execution
- create_streaming_executor - Create a streaming context executor with default configuration
- materialize_context - Materialize streaming chunks into a final context string