Accurate token counting using actual BPE tokenizers
This module produces exact token counts via tiktoken for OpenAI models and falls back to characters-per-token estimation for other vendors.
§Supported Models
§OpenAI (Exact tokenization via tiktoken)
- o200k_base: GPT-5.2, GPT-5.1, GPT-5, GPT-4o, o1, o3, o4 (all current models)
- cl100k_base: GPT-4, GPT-3.5-turbo (legacy models)
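For the OpenAI encodings, exact counts come straight from a BPE tokenizer. A minimal sketch using the tiktoken-rs crate (the crate choice is an assumption about this module's backing implementation; `o200k_base` and `encode_with_special_tokens` are that crate's API, not this module's):

```rust
use tiktoken_rs::o200k_base;

fn main() {
    // o200k_base is the encoding shared by GPT-4o and later models.
    let bpe = o200k_base().expect("failed to load o200k_base encoding");
    let tokens = bpe.encode_with_special_tokens("Accurate token counting matters.");
    println!("{} tokens", tokens.len());
}
```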
§Other Vendors (Estimation-based)
- Claude (Anthropic): ~3.5 chars/token
- Gemini (Google): ~3.8 chars/token
- Llama (Meta): ~3.5 chars/token
- Mistral: ~3.5 chars/token
- DeepSeek: ~3.5 chars/token
- Qwen (Alibaba): ~3.5 chars/token
- Cohere: ~3.6 chars/token
- Grok (xAI): ~3.5 chars/token
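For these vendors the count is approximated by dividing character length by the ratio above. A minimal sketch of that arithmetic (the function name and ceiling rounding are illustrative, not necessarily this module's exact implementation):

```rust
/// Estimate token count from a characters-per-token ratio,
/// e.g. Claude at ~3.5 chars/token: 140 chars -> ~40 tokens.
fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
    // Count Unicode scalar values rather than bytes so multi-byte
    // characters do not inflate the estimate.
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog.";
    println!("~{} tokens (Claude ratio)", estimate_tokens(text, 3.5));
    println!("~{} tokens (Gemini ratio)", estimate_tokens(text, 3.8));
}
```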
Structs§
- TokenCounts - Token counts for multiple models
- Tokenizer - Accurate token counter with fallback to estimation
Enums§
- TokenModel - Supported LLM models for token counting
Functions§
- quick_estimate - Quick estimation without creating a Tokenizer instance
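Putting the pieces together, usage might look like the following. Only the item names (Tokenizer, TokenModel, quick_estimate) come from this page; the module path, constructor, `count` method, and enum variants are assumptions, not a verified API:

```rust
// Hypothetical usage sketch; signatures and variant names are assumed.
use tokens::{quick_estimate, TokenModel, Tokenizer};

fn main() {
    let text = "How many tokens is this?";

    // Exact count for an OpenAI model (tiktoken-backed), with
    // automatic fallback to estimation for other vendors.
    let tokenizer = Tokenizer::new();
    let n = tokenizer.count(text, TokenModel::Gpt4o);
    println!("GPT-4o: {n} tokens");

    // One-off estimate without constructing a Tokenizer.
    let approx = quick_estimate(text, TokenModel::Claude);
    println!("Claude: ~{approx} tokens");
}
```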