Token Optimization Engine for PiSovereign
Adaptively compresses the full inference pipeline — input prompts, conversation history, RAG context, tool definitions, tool results, and output streams — to minimize token usage while preserving response quality. Operates as a decorator in the inference chain.
§Architecture
The optimizer sits inside the sanitization decorator:
SanitizedInferencePort → TokenOptimizedInferencePort → Cache → Ollama
§Strategy
Uses an adaptive approach: lossless compression when within budget, progressively lossy (rolling summaries, extractive truncation) under token pressure. Falls through transparently on any error.
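The strategy above can be illustrated with a minimal sketch of a decorator that compresses losslessly when the prompt fits the budget, truncates when it does not, and forwards the original prompt if optimization fails. Every name here (`InferencePort`, `TokenOptimized`, `OptError`, `Echo`, and the helper functions) is a hypothetical stand-in, and the four-characters-per-token estimate is an assumption; none of this is the crate's actual API.

```rust
/// Hypothetical error type; the crate's real `TokenOptError` may look different.
#[derive(Debug)]
pub struct OptError(pub String);

/// Hypothetical stand-in for an inference port.
pub trait InferencePort {
    fn generate(&self, prompt: &str) -> String;
}

/// Rough estimate, assuming about four characters per token.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Lossless step: collapse runs of whitespace into single spaces.
fn compress_lossless(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Lossy step: keep leading words while the estimate stays within budget.
fn truncate_to_budget(text: &str, budget: usize) -> Result<String, OptError> {
    if budget == 0 {
        return Err(OptError("budget must be non-zero".into()));
    }
    let mut out = String::new();
    for word in text.split(' ') {
        let sep = usize::from(!out.is_empty());
        let candidate = out.chars().count() + sep + word.chars().count();
        if (candidate + 3) / 4 > budget {
            break;
        }
        if sep == 1 {
            out.push(' ');
        }
        out.push_str(word);
    }
    Ok(out)
}

/// Adaptive choice: lossless when within budget, lossy under pressure.
fn optimize(prompt: &str, budget: usize) -> Result<String, OptError> {
    let compact = compress_lossless(prompt);
    if estimate_tokens(&compact) <= budget {
        Ok(compact)
    } else {
        truncate_to_budget(&compact, budget)
    }
}

/// Decorator that optimizes the prompt before delegating to `inner`.
pub struct TokenOptimized<P> {
    inner: P,
    budget: usize,
}

impl<P> TokenOptimized<P> {
    pub fn new(inner: P, budget: usize) -> Self {
        Self { inner, budget }
    }
}

impl<P: InferencePort> InferencePort for TokenOptimized<P> {
    fn generate(&self, prompt: &str) -> String {
        match optimize(prompt, self.budget) {
            Ok(optimized) => self.inner.generate(&optimized),
            // Fall through: on any error, forward the original prompt unchanged.
            Err(_) => self.inner.generate(prompt),
        }
    }
}

/// Toy backend that returns the prompt it receives, for demonstration only.
pub struct Echo;

impl InferencePort for Echo {
    fn generate(&self, prompt: &str) -> String {
        prompt.to_string()
    }
}
```

Wrapping `Echo` makes it easy to see which prompt reaches the backend: with a generous budget only whitespace is collapsed, with a tight budget trailing words are dropped, and with a zero budget the optimizer errors and the original prompt passes through untouched.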
Re-exports§
pub use config::TokenOptimizationConfig;
pub use error::TokenOptError;
pub use estimator::TokenEstimator;
pub use estimator_hf::HfTokenEstimator;
pub use optimizer::TokenOptimizer;
pub use ports::SummarizationPort;
pub use prompt::template_loader::TemplateLoader;
pub use pipeline::Pipeline;
pub use types::OptimizationMetadata;
pub use types::OptimizedPrompt;
Modules§
- budget
- Token budget allocation engine
- config
- Configuration for the token optimization engine
- error
- Error types for the token optimization engine
- estimator
- Token estimation using a character-based heuristic
- estimator_hf
- HuggingFace tokenizer-based token estimation.
- estimator_language
- Language-aware token estimation ratios.
- estimator_tuning
- Per-model token estimation calibration.
- history
- Conversation history compaction and summarization
- metrics
- Prometheus-compatible optimization metrics.
- optimizer
- Token optimization orchestrator
- output
- Output token control — query complexity classification and dynamic budget
- pipeline
- Fluent pipeline builder for standalone token optimization.
- ports
- Port definitions for the token optimization engine.
- profile
- Hardware profile auto-detection and adaptive configuration.
- prompt
- Prompt optimization — system prompt and RAG context
- stream
- Output stream optimization — repetition detection
- tools
- Tool calling optimization — schema compression, selection, and result truncation
- types
- Type definitions for the token optimization engine.
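As a rough illustration of the character-based heuristic mentioned for the `estimator` module, the sketch below divides the character count by a configurable characters-per-token ratio. `CharRatioEstimator` and its methods are hypothetical names chosen for this example, and the ratios used are assumptions rather than the crate's calibrated values; the real `TokenEstimator` may work differently.

```rust
/// Hypothetical character-ratio estimator; not the crate's `TokenEstimator`.
pub struct CharRatioEstimator {
    chars_per_token: f64,
}

impl CharRatioEstimator {
    /// `chars_per_token` is an assumed average, e.g. about 4.0 for English text.
    pub fn new(chars_per_token: f64) -> Self {
        assert!(chars_per_token > 0.0, "ratio must be positive");
        Self { chars_per_token }
    }

    /// Estimate tokens by counting Unicode scalar values and rounding up.
    pub fn estimate(&self, text: &str) -> usize {
        let chars = text.chars().count() as f64;
        (chars / self.chars_per_token).ceil() as usize
    }

    /// Check whether the estimated token count fits within `budget`.
    pub fn fits(&self, text: &str, budget: usize) -> bool {
        self.estimate(text) <= budget
    }
}
```

Counting `chars()` rather than bytes keeps the estimate stable for non-ASCII text. A language-aware or per-model variant could swap in a different ratio per language or model, which is presumably the role of the ratio and calibration modules listed above.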
Constants§
- YAML_PROMPTS
- Pre-converted YAML prompts generated at build time.