Shared token counting utilities.
Uses the tokenizers crate for accurate counts against Qwen models, falling back to tiktoken-rs and, finally, to a conservative heuristic if tokenizer initialization fails.
A per-content hash cache avoids redundant tokenization for repeated strings. The cache is capped at a fixed size and cleared entirely when full (simple eviction that avoids the overhead of an LRU bookkeeping structure).
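The fallback chain and clear-when-full cache described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the cache cap, the heuristic ratio, and the stubbed-out tokenizer step are all assumptions, and the real code would call the Qwen tokenizer and tiktoken-rs where the comment indicates.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Hypothetical cap; the real module's cap is not documented here.
const CACHE_CAP: usize = 1024;

// Per-content cache keyed by a hash of the content. When full it is
// cleared entirely -- simple eviction with no LRU bookkeeping.
struct TokenCache {
    map: Mutex<HashMap<u64, usize>>,
}

impl TokenCache {
    fn new() -> Self {
        Self { map: Mutex::new(HashMap::new()) }
    }

    fn get_or_insert_with(&self, key: u64, f: impl FnOnce() -> usize) -> usize {
        let mut map = self.map.lock().unwrap();
        if let Some(&n) = map.get(&key) {
            return n; // repeated string: skip tokenization entirely
        }
        if map.len() >= CACHE_CAP {
            map.clear(); // drop everything rather than tracking recency
        }
        let n = f();
        map.insert(key, n);
        n
    }
}

fn hash_content(content: &str) -> u64 {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

// Conservative heuristic fallback: roughly one token per four bytes,
// rounded up (an assumed ratio; the crate's actual heuristic may differ).
fn heuristic_tokens(content: &str) -> usize {
    (content.len() + 3) / 4
}

fn estimate_content_tokens(cache: &TokenCache, content: &str) -> usize {
    cache.get_or_insert_with(hash_content(content), || {
        // The real code would try the Qwen tokenizer, then tiktoken-rs,
        // before falling back to this heuristic; both are elided here.
        heuristic_tokens(content)
    })
}

fn main() {
    let cache = TokenCache::new();
    // 12 bytes -> ceil(12 / 4) = 3 tokens under the heuristic.
    println!("{}", estimate_content_tokens(&cache, "hello world!"));
}
```

The second lookup for an identical string hits the cache and never reaches the tokenization closure, which is the point of hashing per content rather than per call site.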
Functions§
- estimate_content_tokens - Estimate tokens for raw content.
- estimate_tokens_with_overhead - Estimate token count for content and add a fixed per-message overhead.
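The relationship between the two functions can be sketched as below. The overhead constant and the heuristic stand-in for estimate_content_tokens are assumptions for illustration; the module's actual values are not documented on this page.

```rust
// Hypothetical per-message overhead (e.g. chat-template framing tokens);
// the actual constant is not documented here.
const PER_MESSAGE_OVERHEAD: usize = 4;

// Heuristic stand-in for estimate_content_tokens: one token per four
// bytes, rounded up (an assumed ratio).
fn estimate_content_tokens(content: &str) -> usize {
    (content.len() + 3) / 4
}

// Content estimate plus the fixed per-message overhead.
fn estimate_tokens_with_overhead(content: &str) -> usize {
    estimate_content_tokens(content) + PER_MESSAGE_OVERHEAD
}

fn main() {
    // "hi" is 2 bytes -> 1 content token + 4 overhead = 5.
    println!("{}", estimate_tokens_with_overhead("hi"));
}
```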