Crate ai_tokenopt

Token Optimization Engine for PiSovereign

Adaptively compresses the full inference pipeline — input prompts, conversation history, RAG context, tool definitions, tool results, and output streams — to minimize token usage while preserving response quality. Operates as a decorator in the inference chain.

§Architecture

The optimizer is wrapped by the sanitization decorator and sits ahead of the cache and the Ollama backend:

SanitizedInferencePort  →  TokenOptimizedInferencePort  →  Cache  →  Ollama
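
The chain above follows the decorator pattern: each layer implements the same port and delegates to the layer it wraps. A minimal sketch of that shape follows. The `InferencePort` trait, its `infer` signature, and the three types are hypothetical stand-ins, since the crate's real trait is not shown on this page, and the cache layer is omitted for brevity.

```rust
/// Hypothetical stand-in for the crate's inference port (assumed signature).
pub trait InferencePort {
    fn infer(&self, prompt: &str) -> Result<String, String>;
}

/// Innermost layer: stands in for the Ollama backend and echoes what it received.
pub struct EchoBackend;

impl InferencePort for EchoBackend {
    fn infer(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("backend saw: {}", prompt))
    }
}

/// Decorator standing in for the token optimizer: rewrites the prompt, then delegates.
pub struct OptimizingLayer<P: InferencePort> {
    inner: P,
}

impl<P: InferencePort> OptimizingLayer<P> {
    pub fn new(inner: P) -> Self {
        Self { inner }
    }
}

impl<P: InferencePort> InferencePort for OptimizingLayer<P> {
    fn infer(&self, prompt: &str) -> Result<String, String> {
        // Placeholder "optimization": collapse runs of whitespace.
        let compact = prompt.split_whitespace().collect::<Vec<_>>().join(" ");
        self.inner.infer(&compact)
    }
}

/// Decorator standing in for sanitization: drops non-whitespace control characters.
pub struct SanitizingLayer<P: InferencePort> {
    inner: P,
}

impl<P: InferencePort> SanitizingLayer<P> {
    pub fn new(inner: P) -> Self {
        Self { inner }
    }
}

impl<P: InferencePort> InferencePort for SanitizingLayer<P> {
    fn infer(&self, prompt: &str) -> Result<String, String> {
        let clean: String = prompt
            .chars()
            .filter(|c| !c.is_control() || c.is_whitespace())
            .collect();
        self.inner.infer(&clean)
    }
}
```

Composing the layers as `SanitizingLayer::new(OptimizingLayer::new(EchoBackend))` mirrors the left-to-right order of the diagram: the outermost layer sees the request first.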

§Strategy

Uses an adaptive approach: compression is lossless while the request fits within budget and becomes progressively lossy (rolling summaries, extractive truncation) under token pressure. On any error, the optimizer falls through transparently.
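
The adaptive strategy can be sketched as a three-way decision. Everything here is an illustrative assumption and not the crate's implementation: the function names, the `Level` enum, the 4-characters-per-token ratio, whitespace normalization as a stand-in for the lossless step, and word-level truncation as a stand-in for the lossy step.

```rust
/// Rough token estimate; assumes about 4 characters per token (illustrative ratio).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Stand-in for a lossless step: normalize whitespace while keeping every word.
fn collapse_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Stand-in for a lossy step: keep leading words while the estimate fits the budget.
fn truncate_to_budget(text: &str, budget: usize) -> Result<String, String> {
    if budget == 0 {
        return Err("budget must be non-zero".to_string());
    }
    let mut kept = String::new();
    for word in text.split_whitespace() {
        let candidate = if kept.is_empty() {
            word.to_string()
        } else {
            format!("{} {}", kept, word)
        };
        if estimate_tokens(&candidate) > budget {
            break;
        }
        kept = candidate;
    }
    if kept.is_empty() {
        Err("nothing fits in the budget".to_string())
    } else {
        Ok(kept)
    }
}

/// Which path the sketch took for a given prompt.
#[derive(Debug, PartialEq)]
pub enum Level {
    Lossless,
    Lossy,
    PassThrough,
}

/// Try the lossless step first, escalate to the lossy step under pressure,
/// and return the original prompt untouched if the lossy step fails.
pub fn optimize(prompt: &str, budget: usize) -> (String, Level) {
    let compact = collapse_whitespace(prompt);
    if estimate_tokens(&compact) <= budget {
        return (compact, Level::Lossless);
    }
    match truncate_to_budget(&compact, budget) {
        Ok(short) => (short, Level::Lossy),
        Err(_) => (prompt.to_string(), Level::PassThrough),
    }
}
```

The `PassThrough` arm is the point of the design: a failure inside the optimizer degrades to an unoptimized request and does not surface as an inference error.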

Re-exports§

pub use config::TokenOptimizationConfig;
pub use error::TokenOptError;
pub use estimator::TokenEstimator;
pub use estimator_hf::HfTokenEstimator;
pub use optimizer::TokenOptimizer;
pub use ports::SummarizationPort;
pub use prompt::template_loader::TemplateLoader;
pub use pipeline::Pipeline;
pub use types::OptimizationMetadata;
pub use types::OptimizedPrompt;

Modules§

budget
Token budget allocation engine
config
Configuration for the token optimization engine
error
Error types for the token optimization engine
estimator
Token estimation using a character-based heuristic
estimator_hf
HuggingFace tokenizer-based token estimation.
estimator_language
Language-aware token estimation ratios.
estimator_tuning
Per-model token estimation calibration.
history
Conversation history compaction and summarization
metrics
Prometheus-compatible optimization metrics.
optimizer
Token optimization orchestrator
output
Output token control — query complexity classification and dynamic budget
pipeline
Fluent pipeline builder for standalone token optimization.
ports
Port definitions for the token optimization engine.
profile
Hardware profile auto-detection and adaptive configuration.
prompt
Prompt optimization — system prompt and RAG context
stream
Output stream optimization — repetition detection
tools
Tool calling optimization — schema compression, selection, and result truncation
types
Type definitions for the token optimization engine.
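
The stream module's summary mentions repetition detection. One simple way to detect a looping output is to check whether the tail of the generated text repeats back to back. The function below is a hypothetical sketch of that idea; its name, parameters, and whitespace tokenization are assumptions, and the crate's actual algorithm is not shown on this page.

```rust
/// Returns true when the last `window` whitespace-separated tokens of `text`
/// occur `min_repeats` times back to back at the end of the text.
/// Returns false for `window == 0` or `min_repeats < 2`.
pub fn tail_repeats(text: &str, window: usize, min_repeats: usize) -> bool {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if window == 0 || min_repeats < 2 || tokens.len() < window * min_repeats {
        return false;
    }
    let tail = &tokens[tokens.len() - window..];
    // Walk backwards one window at a time and compare each chunk to the tail.
    (1..min_repeats).all(|i| {
        let end = tokens.len() - window * i;
        &tokens[end - window..end] == tail
    })
}
```

A streaming consumer could call such a check after each chunk and stop generation once it returns true, which saves the output tokens a looping model would otherwise produce.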

Constants§

YAML_PROMPTS
Pre-converted YAML prompts generated at build time.