Crate ai_tokenopt

Token Optimization Engine for PiSovereign

Adaptively compresses the full inference pipeline — input prompts, conversation history, RAG context, tool definitions, tool results, and output streams — to minimize token usage while preserving response quality. Operates as a decorator in the inference chain.

§Architecture

The optimizer sits directly inside the sanitization decorator and wraps the cache and the Ollama backend:

SanitizedInferencePort  →  TokenOptimizedInferencePort  →  Cache  →  Ollama
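The chain above can be illustrated with a minimal decorator sketch. Everything in it is an assumption made for illustration: the `InferencePort` trait, its synchronous `infer` signature, and the `EchoBackend`, `WhitespaceOptimized`, and `Sanitized` types are hypothetical stand-ins, not the crate's actual API (the real port is likely async and carries richer request types). The cache layer is omitted for brevity.

```rust
/// Hypothetical port trait; assumed here to be synchronous and string-based.
trait InferencePort {
    fn infer(&self, prompt: &str) -> String;
}

/// Stand-in for the innermost backend (Ollama in the diagram).
struct EchoBackend;

impl InferencePort for EchoBackend {
    fn infer(&self, prompt: &str) -> String {
        format!("echo:{}", prompt)
    }
}

/// Stand-in for the token optimizer: collapses whitespace runs, then delegates.
struct WhitespaceOptimized<P: InferencePort> {
    inner: P,
}

impl<P: InferencePort> InferencePort for WhitespaceOptimized<P> {
    fn infer(&self, prompt: &str) -> String {
        let compact = prompt.split_whitespace().collect::<Vec<_>>().join(" ");
        self.inner.infer(&compact)
    }
}

/// Stand-in for the outer sanitization decorator: drops non-whitespace
/// control characters, then delegates to the wrapped port.
struct Sanitized<P: InferencePort> {
    inner: P,
}

impl<P: InferencePort> InferencePort for Sanitized<P> {
    fn infer(&self, prompt: &str) -> String {
        let clean: String = prompt
            .chars()
            .filter(|c| !c.is_control() || c.is_whitespace())
            .collect();
        self.inner.infer(&clean)
    }
}
```

Composing `Sanitized { inner: WhitespaceOptimized { inner: EchoBackend } }` mirrors the left-to-right order of the diagram: each layer transforms the request and hands it to the next, so a layer can be added or removed without touching its neighbours.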

§Strategy

Uses an adaptive approach: lossless compression while the request is within budget, then progressively lossy techniques (rolling summaries, extractive truncation) as token pressure grows. Falls through transparently on any error.
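A minimal sketch of this escalation, under stated assumptions: the function names, the fixed ratio of four characters per token, and the sentence-based truncation are all hypothetical choices for illustration and are not taken from the crate. Rolling summaries are left out because they would need a summarization backend.

```rust
/// Rough token estimate, assuming about 4 characters per token.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Lossless step (for prompts where whitespace carries no meaning):
/// collapse whitespace runs into single spaces.
fn compress_lossless(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Lossy step: keep leading sentences for as long as they fit the budget.
/// Returns an error if not even the first sentence fits.
fn truncate_extractive(text: &str, budget: usize) -> Result<String, String> {
    let mut kept = String::new();
    for sentence in text.split_inclusive(". ") {
        let candidate = format!("{}{}", kept, sentence);
        if estimate_tokens(candidate.trim_end()) > budget {
            break;
        }
        kept = candidate;
    }
    let kept = kept.trim_end().to_string();
    if kept.is_empty() {
        Err("no sentence fits the budget".to_string())
    } else {
        Ok(kept)
    }
}

/// Adaptive strategy: try lossless first, go lossy only under pressure,
/// and fall through to the untouched input if the lossy step fails.
fn optimize(original: &str, budget: usize) -> String {
    let lossless = compress_lossless(original);
    if estimate_tokens(&lossless) <= budget {
        return lossless;
    }
    match truncate_extractive(&lossless, budget) {
        Ok(shorter) => shorter,
        Err(_) => original.to_string(),
    }
}
```

The fall-through branch returns the original text rather than an error, so a failed optimization in this sketch never blocks the request itself.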

Re-exports§

pub use config::TokenOptimizationConfig;
pub use error::TokenOptError;
pub use estimator::TokenEstimator;
pub use estimator_hf::HfTokenEstimator;
pub use optimizer::TokenOptimizer;
pub use ports::SummarizationPort;
pub use prompt::template_loader::TemplateLoader;
pub use pipeline::Pipeline;
pub use types::OptimizationMetadata;
pub use types::OptimizedPrompt;

Modules§

budget: Token budget allocation engine.
config: Configuration for the token optimization engine.
error: Error types for the token optimization engine.
estimator: Token estimation using a character-based heuristic.
estimator_hf: HuggingFace tokenizer-based token estimation.
estimator_language: Language-aware token estimation ratios.
estimator_tuning: Per-model token estimation calibration.
history: Conversation history compaction and summarization.
metrics: Prometheus-compatible optimization metrics.
optimizer: Token optimization orchestrator.
output: Output token control (query complexity classification and dynamic budget).
pipeline: Fluent pipeline builder for standalone token optimization.
ports: Port definitions for the token optimization engine.
profile: Hardware profile auto-detection and adaptive configuration.
prompt: Prompt optimization (system prompt and RAG context).
stream: Output stream optimization (repetition detection).
tools: Tool calling optimization (schema compression, selection, and result truncation).
types: Type definitions for the token optimization engine.
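
As an illustration of what the `estimator` and `estimator_language` entries describe, a character-based heuristic with language-aware ratios could look roughly like the sketch below. The `Lang` enum, the function names, and the ratio values are assumptions chosen for the example; they are not calibrated and are not the crate's actual types or numbers.

```rust
/// Hypothetical language tag; the real crate may detect or configure this differently.
#[derive(Clone, Copy)]
enum Lang {
    English,
    German,
    Cjk,
}

/// Assumed average characters per token. Illustrative values only.
fn chars_per_token(lang: Lang) -> f64 {
    match lang {
        Lang::English => 4.0,
        Lang::German => 3.5,
        Lang::Cjk => 1.5,
    }
}

/// Estimate the token count from the character count, rounding up so that
/// any non-empty text counts as at least one token.
fn estimate_tokens(text: &str, lang: Lang) -> usize {
    let chars = text.chars().count() as f64;
    (chars / chars_per_token(lang)).ceil() as usize
}
```

A heuristic like this trades accuracy for speed: it needs no tokenizer files, which is presumably why a tokenizer-based estimator is listed as a separate module.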

Constants§

YAML_PROMPTS: Pre-converted YAML prompts generated at build time.