Module tokens

Token counting via tiktoken BPE tokenizer.

All token estimation goes through tiktoken’s cl100k_base encoding, the one used by GPT-4 and GPT-3.5-turbo. BPE tokenizers are similar enough across providers that the resulting counts are a reasonable approximation, though not exact, for Anthropic, Gemini, and other models.

Provider-reported exact token counts (from API responses) should always be preferred when available. This module is for pre-call budget estimation and offline token sizing where no provider response exists yet.
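The preference order described above can be sketched as follows. This is a minimal illustration, not this module's code: `effective_tokens`, `fits_budget`, and `word_count_estimate` are hypothetical names, and the estimator is passed in as a closure because the real signature of `estimate_tokens` is not shown on this page. The word-count estimator is a toy stand-in and is not a BPE tokenizer.

```rust
/// Picks the token count to charge against a budget.
/// `reported` is the provider's exact count when an API response exists.
/// `estimate` stands in for this module's estimator (signature assumed).
fn effective_tokens<F>(reported: Option<usize>, text: &str, estimate: F) -> usize
where
    F: Fn(&str) -> usize,
{
    match reported {
        // An exact provider count always wins.
        Some(exact) => exact,
        // No response yet: fall back to the local estimate.
        None => estimate(text),
    }
}

/// Pre-call budget check: does the prompt plus the reserved output
/// fit inside the context window? Saturating add avoids overflow.
fn fits_budget(prompt_tokens: usize, reserved_output: usize, context_window: usize) -> bool {
    prompt_tokens.saturating_add(reserved_output) <= context_window
}

/// Toy estimator for illustration only: counts whitespace-separated words.
/// It is NOT a BPE tokenizer and will undercount real token usage.
fn word_count_estimate(text: &str) -> usize {
    text.split_whitespace().count()
}
```

In real use, the closure would presumably wrap `estimate_tokens`. Because an estimate for a non-OpenAI model is only approximate, leaving some headroom in `reserved_output` is a sensible precaution.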

Functions

estimate_tokens
Count the number of tokens in text using tiktoken BPE.
truncate_to_tokens
Truncate text to at most max_tokens tokens.
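Truncating to a token limit generally follows an encode, cut, decode pattern. The sketch below shows that shape with a toy whitespace "tokenizer" so it stays self-contained. `truncate_words` is a hypothetical helper, not `truncate_to_tokens`, and its word tokens are not the cl100k_base tokens this module counts.

```rust
/// Encode-cut-decode with a toy whitespace tokenizer (illustration only).
/// A real BPE implementation would encode to token ids, keep the first
/// `max_tokens` ids, and decode those ids back to text.
fn truncate_words(text: &str, max_tokens: usize) -> String {
    text.split_whitespace() // "encode": split into word tokens
        .take(max_tokens)   // "cut": keep at most max_tokens tokens
        .collect::<Vec<_>>()
        .join(" ")          // "decode": reassemble the kept tokens
}
```

Unlike a byte-level BPE round trip, this toy version collapses runs of whitespace. Text that is already within the limit comes back with the same words.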