pub struct TokenizerProfile {
    pub chars_per_token: f32,
    pub bpe: Tokenizer,
    pub inline_json_cost: f32,
    pub toon_overhead: f32,
    pub format_factors: BTreeMap<String, f32>,
}
Cost model for one tokenizer family.
Captures the empirical observation that the output of the same encoder costs substantially different numbers of tokens depending on the receiving model's tokenizer (e.g., inline_json_cost is 2.2x for Anthropic-class tokenizers but 1.0x for Ollama BPE).
See Paper 2 §Encoder Bug Postmortem (2026-04-25).
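To make the cross-family difference concrete, here is a minimal sketch that prices the same inline-JSON cell under two profiles. ProfileSketch is a hypothetical, trimmed-down stand-in for TokenizerProfile, not the crate's type. Only the 2.2 and 1.0 multipliers come from the description above; chars_per_token = 4.0 is a placeholder, and rounding the penalized estimate to the nearest token is an assumption.

```rust
// Hypothetical, trimmed-down stand-in for TokenizerProfile (not the crate's type).
struct ProfileSketch {
    name: &'static str,
    chars_per_token: f32, // placeholder value in main(), not a measured number
    inline_json_cost: f32,
}

impl ProfileSketch {
    /// Heuristic token count: byte length / chars_per_token, rounded up.
    fn heuristic_tokens(&self, text: &str) -> usize {
        (text.len() as f32 / self.chars_per_token).ceil() as usize
    }

    /// Cost of `text` as an inline-JSON table cell: the heuristic count scaled
    /// by inline_json_cost (rounding to the nearest token is an assumption).
    fn inline_json_tokens(&self, text: &str) -> usize {
        (self.heuristic_tokens(text) as f32 * self.inline_json_cost).round() as usize
    }
}

fn main() {
    let profiles = [
        ProfileSketch { name: "anthropic-class", chars_per_token: 4.0, inline_json_cost: 2.2 },
        ProfileSketch { name: "ollama-bpe", chars_per_token: 4.0, inline_json_cost: 1.0 },
    ];
    let cell = r#"{"id":1,"tags":["a","b"],"ok":true}"#;
    for p in &profiles {
        // Same text and same heuristic count, but a different penalized cost per family.
        println!("{}: {} tokens", p.name, p.inline_json_tokens(cell));
    }
}
```

With these placeholder values the 35-byte cell has a heuristic count of 9 tokens, which prices at 20 tokens for the Anthropic-class profile and 9 for the Ollama profile.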
The bpe field selects the actual byte-pair encoder used to count tokens. When it is set
to Tokenizer::Heuristic (the default, for backward compatibility), chars_per_token
drives the estimate. When it is set to a real BPE variant, that BPE is used and
chars_per_token is informational only.
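The selection rule can be sketched as a match on the tokenizer choice. TokenizerSketch, BpeCounter, and WhitespaceCounter are hypothetical: the real Tokenizer variants and encoder API are not shown in this documentation, and the whitespace counter is a toy placeholder so the example runs, not a BPE implementation.

```rust
/// Hypothetical interface for a real encoder; the crate's actual API is not shown here.
trait BpeCounter {
    fn count(&self, text: &str) -> usize;
}

/// Toy placeholder "encoder": counts whitespace-separated words. Not BPE.
struct WhitespaceCounter;

impl BpeCounter for WhitespaceCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Hypothetical mirror of the documented choice: heuristic estimate or real encoder.
enum TokenizerSketch {
    Heuristic,
    Bpe(Box<dyn BpeCounter>),
}

fn count_tokens(tok: &TokenizerSketch, chars_per_token: f32, text: &str) -> usize {
    match tok {
        // Estimate path: chars_per_token drives the result.
        TokenizerSketch::Heuristic => (text.len() as f32 / chars_per_token).ceil() as usize,
        // Encoder path: chars_per_token is ignored (informational only).
        TokenizerSketch::Bpe(enc) => enc.count(text),
    }
}

fn main() {
    let text = "hello world foo";
    let heuristic = TokenizerSketch::Heuristic;
    let bpe = TokenizerSketch::Bpe(Box::new(WhitespaceCounter));
    println!("heuristic: {}", count_tokens(&heuristic, 4.0, text));
    println!("encoder:   {}", count_tokens(&bpe, 4.0, text));
}
```

Passing a different chars_per_token changes only the heuristic result, which mirrors the "informational only" wording above.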
Fields
chars_per_token: f32
Average characters per token observed for this tokenizer. Only used when bpe == Heuristic.
bpe: Tokenizer
Real BPE tokenizer to use for accurate counts. Falls back to the chars_per_token heuristic when set to Tokenizer::Heuristic.
inline_json_cost: f32
Penalty multiplier for inline-JSON cells inside markdown tables. Use it to decide between inline-JSON nested cells and recursive sections.
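One way the multiplier could feed the inline-versus-recursive decision is to compare the penalized inline cost against the cost of emitting the nested value as its own section. choose_layout and NestedLayout are hypothetical, the "inline wins ties" rule is an assumption, and both token inputs are assumed to be estimated elsewhere (for example with count_tokens).

```rust
/// Hypothetical layout choice; the crate's real decision procedure is not documented here.
#[derive(Debug, PartialEq)]
enum NestedLayout {
    InlineJsonCell,
    RecursiveSection,
}

/// Inline JSON is chosen only if its penalized cost does not exceed the section cost.
fn choose_layout(inline_tokens: usize, section_tokens: usize, inline_json_cost: f32) -> NestedLayout {
    let penalized = inline_tokens as f32 * inline_json_cost;
    if penalized <= section_tokens as f32 {
        NestedLayout::InlineJsonCell
    } else {
        NestedLayout::RecursiveSection
    }
}

fn main() {
    // Same nested value and same raw estimates, evaluated for two tokenizer families.
    for cost in [1.0_f32, 2.2] {
        println!("inline_json_cost = {}: {:?}", cost, choose_layout(10, 15, cost));
    }
}
```

Under this rule a value that fits comfortably inline at 1.0x (10 vs 15 tokens) flips to a recursive section at 2.2x (22 vs 15 tokens).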
toon_overhead: f32
Multiplicative cost of TOON encoding vs. json_compact for this tokenizer.
(TOON’s “−40% tokens” claim is only valid for openai_o200k.)
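A minimal sketch of using toon_overhead to choose between TOON and json_compact, assuming the TOON cost is estimated by scaling a json_compact token count. toon_estimate and prefer_toon are hypothetical helpers; an overhead of 0.6 would correspond to the claimed −40% on openai_o200k, while 1.15 is a purely illustrative value for a tokenizer on which TOON costs more. Rounding to the nearest token is also an assumption.

```rust
/// Hypothetical: scale a json_compact token count by the profile's toon_overhead.
fn toon_estimate(json_compact_tokens: usize, toon_overhead: f32) -> usize {
    (json_compact_tokens as f32 * toon_overhead).round() as usize
}

/// Hypothetical: prefer TOON only when it is estimated to be strictly cheaper.
fn prefer_toon(json_compact_tokens: usize, toon_overhead: f32) -> bool {
    toon_estimate(json_compact_tokens, toon_overhead) < json_compact_tokens
}

fn main() {
    let json_compact_tokens = 1000;
    for (label, overhead) in [("-40% claim (o200k-like)", 0.6_f32), ("illustrative other", 1.15)] {
        println!(
            "{}: TOON ~{} tokens, prefer TOON: {}",
            label,
            toon_estimate(json_compact_tokens, overhead),
            prefer_toon(json_compact_tokens, overhead)
        );
    }
}
```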
format_factors: BTreeMap<String, f32>
Optional per-format cost factors, multiplied into the raw character-based estimate.
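Applying format_factors can be sketched as a map lookup. apply_format_factor is hypothetical, the format keys ("markdown_table", "yaml") are illustrative rather than keys the crate is known to use, and treating a missing entry as a neutral factor of 1.0 is an assumption consistent with the field being optional.

```rust
use std::collections::BTreeMap;

/// Hypothetical: scale a raw character-based estimate by the factor for `format`.
fn apply_format_factor(raw_estimate: usize, format: &str, factors: &BTreeMap<String, f32>) -> usize {
    // Formats without an entry fall back to 1.0 (assumption).
    let factor = factors.get(format).copied().unwrap_or(1.0);
    (raw_estimate as f32 * factor).round() as usize
}

fn main() {
    let mut factors = BTreeMap::new();
    factors.insert("markdown_table".to_string(), 1.25_f32);
    factors.insert("yaml".to_string(), 0.9_f32);
    for format in ["markdown_table", "yaml", "csv"] {
        println!("{}: {}", format, apply_format_factor(100, format, &factors));
    }
}
```

Because the keys are String, lookups can use a plain &str thanks to String's Borrow<str> implementation.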
Implementations
impl TokenizerProfile
pub fn count_tokens(&self, text: &str) -> usize
Count tokens in text using this profile's resolved tokenizer.
- If bpe == Heuristic, applies text.len() / chars_per_token, rounded up. Note that text.len() is the UTF-8 byte length, not the character count.
- Otherwise, delegates to the real BPE encoder.
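The heuristic branch reduces to a single ceiling division. This sketch (heuristic_count is a hypothetical free function, not the crate's method) also shows a consequence of using text.len(): it counts UTF-8 bytes, so accented text is charged for more "characters" than it visibly has.

```rust
/// Hypothetical free-function version of the documented heuristic:
/// text.len() / chars_per_token, rounded up.
fn heuristic_count(text: &str, chars_per_token: f32) -> usize {
    // str::len() returns the UTF-8 byte length, not text.chars().count().
    (text.len() as f32 / chars_per_token).ceil() as usize
}

fn main() {
    for text in ["tokenizer", "tökenizér", ""] {
        println!(
            "{:?}: {} bytes, {} chars, ~{} tokens",
            text,
            text.len(),
            text.chars().count(),
            heuristic_count(text, 3.0)
        );
    }
}
```

With chars_per_token = 3.0, "tokenizer" (9 bytes) estimates 3 tokens, while "tökenizér" (9 characters, 11 bytes) estimates 4.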
Trait Implementations
impl Clone for TokenizerProfile
fn clone(&self) -> TokenizerProfile
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.