Function count_tokens 

pub fn count_tokens(text: &str) -> usize

Returns the exact cl100k_base (OpenAI tiktoken) token count of text.

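This is a deliberately conservative proxy for the qwen/qwen3-embedding-8b tokenizer used by the OpenRouter embedding backend. cl100k_base generally emits at least as many tokens as Qwen’s BPE for the same input, so a count comfortably under the model’s ~32K-token effective ceiling means the input should also fit Qwen’s window. Because that relationship holds in general rather than for every input, leave headroom below the ceiling instead of treating the count as a hard guarantee.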
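As a rough illustration of how such a conservative count might gate an embedding request, here is a minimal sketch. The constants `MAX_INPUT_TOKENS` and `SAFETY_MARGIN`, the helper `fits_embedding_window`, and the placeholder counter `whitespace_count` are all assumptions made for this example and are not part of the crate. In real use the `count` argument would be `count_tokens`.

```rust
/// Assumed budget values: the documentation only states a ceiling of roughly 32K tokens.
const MAX_INPUT_TOKENS: usize = 32_000;
/// Assumed headroom to absorb inputs where the proxy undercounts.
const SAFETY_MARGIN: usize = 2_000;

/// Returns true when the counted length leaves `SAFETY_MARGIN` tokens of headroom.
/// `count` stands in for `count_tokens`; any `Fn(&str) -> usize` works.
fn fits_embedding_window(text: &str, count: impl Fn(&str) -> usize) -> bool {
    count(text) + SAFETY_MARGIN <= MAX_INPUT_TOKENS
}

/// Placeholder counter for this sketch only: one token per whitespace-separated word.
/// This is NOT cl100k_base.
fn whitespace_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```

Passing the counter as a parameter keeps the guard testable without loading tokenizer data. A caller would write something like `fits_embedding_window(doc, count_tokens)`.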

Unlike approx_tokens, this is exact for arbitrary input. It uses the process-wide cached BPE singleton, so repeated calls do not re-initialise the tokenizer.
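The "process-wide cached singleton" behaviour can be pictured with `std::sync::OnceLock`. The sketch below is an assumption about the general pattern, not the crate's actual implementation: `Tokenizer`, `tokenizer`, `count_tokens_sketch`, and `INIT_CALLS` are hypothetical names, and the word-splitting counter is a stand-in for real BPE encoding.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Records how many times the stand-in tokenizer is constructed.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the real BPE tables.
struct Tokenizer;

impl Tokenizer {
    /// Simulates the expensive one-time load.
    fn load() -> Self {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Tokenizer
    }

    /// Placeholder counting logic (whitespace split), not real BPE.
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Returns the shared instance, constructing it on first use only.
fn tokenizer() -> &'static Tokenizer {
    static CELL: OnceLock<Tokenizer> = OnceLock::new();
    CELL.get_or_init(Tokenizer::load)
}

/// Mirrors the shape of `count_tokens`: every call reuses the cached instance.
fn count_tokens_sketch(text: &str) -> usize {
    tokenizer().count(text)
}
```

`OnceLock::get_or_init` runs the initialiser at most once even when several threads call it concurrently, which is why repeated calls pay the load cost only the first time.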