pub struct ByteCountTokenCounter;
Zero-dependency, conservative token counter: estimates tokens as bytes.div_ceil(4).
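A minimal sketch of the estimate, assuming "bytes" means the UTF-8 byte length of the input text (Rust's `str::len`). `estimate_tokens` is a hypothetical free function used for illustration, not part of this crate's API.

```rust
/// Hypothetical helper mirroring the `bytes.div_ceil(4)` rule:
/// UTF-8 byte length divided by 4, rounded up.
fn estimate_tokens(text: &str) -> u64 {
    // `str::len` returns the byte length, not the number of chars.
    (text.len() as u64).div_ceil(4)
}
```

With this rule, "abcd" (4 bytes) estimates to 1 token, "hello" (5 bytes) to 2, and "hello world" (11 bytes) to 3.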
Approximates English text with the ~4-bytes-per-token rule of thumb
that roughly matches tiktoken’s cl100k_base. The estimate is systematically
inaccurate for CJK, Devanagari, Arabic, and other scripts whose
UTF-8 byte cost diverges from typical token boundaries. Operators
shipping multilingual workloads inject a vendor-accurate counter
(entelix-tokenizer-tiktoken, entelix-tokenizer-hf, locale-aware
companions) through ChatModel::with_token_counter(...).
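To illustrate why the rule drifts for non-Latin scripts, the sketch below reports what the byte-count heuristic sees for a sample string: byte length, character count, and the `div_ceil(4)` estimate. ASCII letters cost one UTF-8 byte each, while CJK ideographs and Devanagari letters cost three, so one estimated token covers very different amounts of text. `profile` is a hypothetical helper, and it does not reproduce any vendor tokenizer's actual token count.

```rust
/// Returns (UTF-8 bytes, chars, byte-count estimate) for a sample.
/// Only measures the input; no real tokenizer is involved.
fn profile(text: &str) -> (usize, usize, u64) {
    let bytes = text.len();
    let chars = text.chars().count();
    (bytes, chars, (bytes as u64).div_ceil(4))
}
```

For example, "hello" profiles as (5, 5, 2), "你好世界" as (12, 4, 3), and "नमस्ते" as (18, 6, 5): one estimated token spans about four ASCII characters but only about one and a third CJK or Devanagari characters.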
The bias direction is deliberate: div_ceil rounds up, so the
estimate never falls below bytes / 4 and skews over the real count on
average. Pre-flight RunBudget checks built on top therefore remain
conservative: a near-budget call is more likely to be refused than
admitted, which is the correct direction to err for budget enforcement.
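A minimal sketch of why rounding up is the safe direction for a pre-flight check. `admits` is a hypothetical stand-in for a RunBudget-style comparison (the real RunBudget API is not documented here), and `floor_estimate` exists only for contrast with the rounded-up estimate.

```rust
/// Hypothetical pre-flight check: admit a call only if its
/// estimated token cost fits in the remaining allowance.
fn admits(estimate: u64, remaining: u64) -> bool {
    estimate <= remaining
}

/// Rounds up, matching the bytes.div_ceil(4) rule.
fn ceil_estimate(bytes: u64) -> u64 {
    bytes.div_ceil(4)
}

/// Rounds down; shown only to contrast with the ceiling.
fn floor_estimate(bytes: u64) -> u64 {
    bytes / 4
}
```

For a 9-byte input with 2 tokens remaining, the rounded-down estimate (2) would admit the call while the rounded-up estimate (3) refuses it. The two estimates only differ when the byte count is not a multiple of 4.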
Implementations
Trait Implementations
impl Clone for ByteCountTokenCounter
fn clone(&self) -> ByteCountTokenCounter
1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ByteCountTokenCounter
impl Default for ByteCountTokenCounter
fn default() -> ByteCountTokenCounter
impl TokenCounter for ByteCountTokenCounter
fn encoding_name(&self) -> &'static str
Name of the tokenizer encoding (e.g. "cl100k_base",
"o200k_base", "claude", "gemini-tokenizer", …), surfaced
on OTel gen_ai.tokenizer.name and in operator diagnostics.
fn count_messages(&self, msgs: &[Message]) -> u64
Sums token counts over the crate::ir::ContentPart::Text parts of
msgs. Non-text parts (image, tool-use, and tool-result blocks) have
vendor-specific token costs, so counters that need an exact tally
for those shapes override this method.
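A minimal sketch of the default behaviour described above, using simplified stand-in types. The `Message`, `ContentPart`, and `TokenCounter` definitions and the `ByteCount` struct below are hypothetical reductions for illustration, not the crate's real `crate::ir` types or trait signature. The default `count_messages` sums `count` over text parts and treats every other part as zero; a vendor-accurate counter would override it to price image and tool blocks.

```rust
/// Hypothetical stand-in for crate::ir::ContentPart.
enum ContentPart {
    Text(String),
    Image(Vec<u8>),
}

/// Hypothetical stand-in for crate::ir::Message.
struct Message {
    parts: Vec<ContentPart>,
}

/// Hypothetical reduction of the TokenCounter trait.
trait TokenCounter {
    fn count(&self, text: &str) -> u64;

    /// Default: tally only text parts; non-text parts contribute 0.
    fn count_messages(&self, msgs: &[Message]) -> u64 {
        msgs.iter()
            .flat_map(|m| m.parts.iter())
            .map(|part| match part {
                ContentPart::Text(t) => self.count(t),
                _ => 0,
            })
            .sum()
    }
}

/// Hypothetical byte-count counter relying on the default method.
struct ByteCount;

impl TokenCounter for ByteCount {
    fn count(&self, text: &str) -> u64 {
        (text.len() as u64).div_ceil(4)
    }
}
```

A message holding "hello" plus a 1 KiB image, followed by a message holding "abcd", counts as 2 + 0 + 1 = 3 tokens: the image adds nothing, which is exactly the gap an overriding counter would fill.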