Accurate token counting using actual BPE tokenizers
This module produces exact token counts via tiktoken for OpenAI models and falls back to characters-per-token estimation for other vendors.
§Supported Models
§OpenAI (Exact tokenization via tiktoken)
- o200k_base: GPT-5.2, GPT-5.1, GPT-5, GPT-4o, o1, o3, o4 (all current models)
- cl100k_base: GPT-4, GPT-3.5-turbo (legacy models)
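For the OpenAI encodings, exact counts come straight from a BPE tokenizer. A minimal sketch using the tiktoken-rs crate (the crate choice is an assumption about this module's backing implementation; `o200k_base` and `encode_with_special_tokens` are that crate's API, not this module's):

```rust
use tiktoken_rs::o200k_base;

fn main() {
    // o200k_base is the encoding shared by GPT-4o and later models.
    let bpe = o200k_base().expect("failed to load o200k_base encoding");
    let tokens = bpe.encode_with_special_tokens("Accurate token counting matters.");
    println!("{} tokens", tokens.len());
}
```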
§Other Vendors (Estimation-based)
- Claude (Anthropic): ~3.5 chars/token
- Gemini (Google): ~3.8 chars/token
- Llama (Meta): ~3.5 chars/token
- Mistral: ~3.5 chars/token
- DeepSeek: ~3.5 chars/token
- Qwen (Alibaba): ~3.5 chars/token
- Cohere: ~3.6 chars/token
- Grok (xAI): ~3.5 chars/token
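For these vendors the count is approximated by dividing character length by the ratio above. A minimal sketch of that arithmetic (the function name and ceiling rounding are illustrative, not necessarily this module's exact implementation):

```rust
/// Estimate token count from a characters-per-token ratio,
/// e.g. Claude at ~3.5 chars/token: 140 chars -> ~40 tokens.
fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
    // Count Unicode scalar values rather than bytes so multi-byte
    // characters do not inflate the estimate.
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog.";
    println!("~{} tokens (Claude ratio)", estimate_tokens(text, 3.5));
    println!("~{} tokens (Gemini ratio)", estimate_tokens(text, 3.8));
}
```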
Structs§
- TokenCounts - Token counts for multiple models
- Tokenizer - Accurate token counter with fallback to estimation
Enums§
- TokenModel - Supported LLM models for token counting
Functions§
- quick_estimate - Quick estimation without creating a Tokenizer instance
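Putting the pieces together, usage might look like the following. Only the item names (Tokenizer, TokenModel, quick_estimate) come from this page; the module path, constructor, `count` method, and enum variants are assumptions, not a verified API:

```rust
// Hypothetical usage sketch; signatures and variant names are assumed.
use tokens::{quick_estimate, TokenModel, Tokenizer};

fn main() {
    let text = "How many tokens is this?";

    // Exact count for an OpenAI model (tiktoken-backed), with
    // automatic fallback to estimation for other vendors.
    let tokenizer = Tokenizer::new();
    let n = tokenizer.count(text, TokenModel::Gpt4o);
    println!("GPT-4o: {n} tokens");

    // One-off estimate without constructing a Tokenizer.
    let approx = quick_estimate(text, TokenModel::Claude);
    println!("Claude: ~{approx} tokens");
}
```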