Token counting via the tiktoken BPE tokenizer.
All token estimation goes through tiktoken's cl100k_base encoding
(GPT-4, GPT-3.5-turbo). BPE tokenizers are similar enough across providers
that this gives a reasonable approximation for Anthropic, Gemini, and others, though not an exact count.
Provider-reported exact token counts (from API responses) should always be preferred when available. This module is for pre-call budget estimation and offline token sizing where no provider response exists yet.
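The preference order described above can be sketched as follows. This is a minimal illustration, not this module's implementation: `rough_estimate`, `effective_tokens`, and `fits_budget` are hypothetical names, and the estimator is a crude bytes-per-token stand-in used only so the example compiles without external crates. The real module would call tiktoken's cl100k_base encoding at that point.

```rust
/// Stand-in estimator (assumption): about 4 bytes per token, rounded up.
/// The real module would count tokens with tiktoken's cl100k_base encoding.
fn rough_estimate(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Prefer the provider-reported count when a response exists;
/// otherwise fall back to the local estimate.
fn effective_tokens(reported: Option<usize>, text: &str) -> usize {
    reported.unwrap_or_else(|| rough_estimate(text))
}

/// Pre-call budget check: does the prompt still leave room for
/// `reserved_output` tokens inside `context_window`?
fn fits_budget(text: &str, context_window: usize, reserved_output: usize) -> bool {
    rough_estimate(text).saturating_add(reserved_output) <= context_window
}
```

Before a call, only the estimate is available, so `fits_budget` uses it directly. Once the provider returns usage data, passing it as `Some(count)` to `effective_tokens` overrides the estimate.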
Functions
- estimate_tokens - Count the number of tokens in `text` using tiktoken BPE.
- truncate_to_tokens - Truncate `text` to at most `max_tokens` tokens.
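The truncation contract (keep at most `max_tokens` tokens and return valid text) can be illustrated with a toy tokenizer. The sketch below is an assumption-laden stand-in: `truncate_to_words` is a hypothetical function that treats each whitespace-delimited word as one token, which is not how BPE splits text. A BPE-based version would typically encode the text, keep the first `max_tokens` token ids, and decode them back to a string.

```rust
/// Toy stand-in (assumption): one whitespace-delimited word = one token.
/// Returns the longest prefix of `text` containing at most `max_tokens` words.
fn truncate_to_words(text: &str, max_tokens: usize) -> &str {
    let mut count = 0;
    let mut in_word = false;
    for (i, ch) in text.char_indices() {
        if ch.is_whitespace() {
            in_word = false;
        } else if !in_word {
            // A new word starts at byte offset `i`.
            if count == max_tokens {
                // Budget exhausted: cut before this word. `i` comes from
                // `char_indices`, so the slice ends on a char boundary.
                return text[..i].trim_end();
            }
            count += 1;
            in_word = true;
        }
    }
    // The whole text fits within the budget.
    text
}
```

Text that already fits is returned unchanged, and a budget of zero yields an empty string.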