Module tokenizer

Accurate token counting using actual BPE tokenizers

This module provides exact token counts via tiktoken for OpenAI models and estimation-based counts for other vendors.

§Supported Models

§OpenAI (Exact tokenization via tiktoken)

  • o200k_base: GPT-5.2, GPT-5.1, GPT-5, GPT-4o, o1, o3, o4 (all current models)
  • cl100k_base: GPT-4, GPT-3.5-turbo (legacy models)
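
For reference, exact counting against these encodings with the tiktoken-rs crate looks like the sketch below. This is a minimal illustration of the underlying mechanism, not necessarily this module's internals; `o200k_base` and `encode_with_special_tokens` are tiktoken-rs APIs.

```rust
// Sketch: exact BPE token counting with tiktoken-rs.
use tiktoken_rs::o200k_base;

fn main() {
    // Load the o200k_base encoding used by GPT-4o and newer models.
    let bpe = o200k_base().expect("failed to load o200k_base encoding");
    // Encode (special tokens allowed) and count the resulting tokens.
    let tokens = bpe.encode_with_special_tokens("Accurate counts beat guesses.");
    println!("{} tokens", tokens.len());
}
```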

§Other Vendors (Estimation-based)

  • Claude (Anthropic): ~3.5 chars/token
  • Gemini (Google): ~3.8 chars/token
  • Llama (Meta): ~3.5 chars/token
  • Mistral: ~3.5 chars/token
  • DeepSeek: ~3.5 chars/token
  • Qwen (Alibaba): ~3.5 chars/token
  • Cohere: ~3.6 chars/token
  • Grok (xAI): ~3.5 chars/token
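
Estimation reduces to dividing the character count by the model's ratio; a minimal sketch using the ratios listed above (the function name and the ceiling rounding are illustrative assumptions, not this module's documented API):

```rust
/// Estimate a token count from a chars-per-token ratio.
/// Illustrative only; not this module's documented API.
fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
    // Count Unicode scalar values, not bytes, so multi-byte
    // characters are not over-counted.
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}

fn main() {
    let text = "How many tokens might Claude see here?";
    println!("~{} tokens", estimate_tokens(text, 3.5)); // Claude: ~3.5 chars/token
}
```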

Structs§

TokenCounts
Token counts for multiple models
Tokenizer
Accurate token counter with fallback to estimation
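
The fallback behavior implies a dispatch along these lines; the enum and function below are hypothetical stand-ins for the actual `Tokenizer`/`TokenModel` API, shown only to illustrate the exact-vs-estimate split:

```rust
// Hypothetical sketch of exact-with-fallback dispatch; the names here
// are stand-ins, not this module's real API.
use tiktoken_rs::o200k_base;

enum Model {
    Gpt4o,  // exact path via tiktoken (o200k_base)
    Claude, // estimation path (~3.5 chars/token)
}

fn count_tokens(model: &Model, text: &str) -> usize {
    match model {
        Model::Gpt4o => o200k_base()
            .expect("failed to load o200k_base encoding")
            .encode_with_special_tokens(text)
            .len(),
        Model::Claude => (text.chars().count() as f64 / 3.5).ceil() as usize,
    }
}

fn main() {
    let text = "fallback example";
    println!("gpt-4o: {}", count_tokens(&Model::Gpt4o, text));
    println!("claude: ~{}", count_tokens(&Model::Claude, text));
}
```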

Enums§

TokenModel
Supported LLM models for token counting

Functions§

quick_estimate
Quick estimation without creating a Tokenizer instance
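
A typical use for such a quick estimate is a cheap pre-flight context-window check; a self-contained sketch (the `quick_estimate` body below is a hypothetical stand-in mirroring the estimation math above, and the window size is an example value):

```rust
// Sketch: gate a request on an estimated token count before doing any
// expensive work. This quick_estimate is a hypothetical stand-in using
// the same chars/token math as the estimation sketch above.
fn quick_estimate(text: &str, chars_per_token: f64) -> usize {
    (text.chars().count() as f64 / chars_per_token).ceil() as usize
}

fn main() {
    const CONTEXT_WINDOW: usize = 200_000; // example window size
    let prompt = "lorem ipsum ".repeat(10_000); // stand-in for a large prompt
    if quick_estimate(&prompt, 3.5) > CONTEXT_WINDOW {
        eprintln!("prompt likely exceeds the context window; truncate first");
    } else {
        println!("prompt probably fits");
    }
}
```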