
Module encoder


Claude-compatible token counting.

WCGW counts tokens with the Xenova/claude-tokenizer (Hugging Face tokenizers). We embed that same tokenizer definition in the binary and load it lazily, so token budgets and truncation match the model that actually runs the agent. If the tokenizer fails to load, we fall back to a cheap character/word estimate.
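
The load-once-then-fall-back behavior described above can be sketched with only the standard library. This is an illustration under stated assumptions, not this module's actual code: `ToyTokenizer`, `EMBEDDED_DEFINITION`, `loaded_tokenizer`, `count_with`, `count_sketch`, and the four-characters-per-token heuristic are all hypothetical stand-ins. The real module loads the embedded Xenova/claude-tokenizer definition instead.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real tokenizer type.
struct ToyTokenizer;

impl ToyTokenizer {
    /// Pretend to parse an embedded definition; `None` models a load failure.
    fn from_embedded(definition: &str) -> Option<Self> {
        if definition.trim().is_empty() {
            None
        } else {
            Some(ToyTokenizer)
        }
    }

    /// Toy counting rule: one token per whitespace-separated word.
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Placeholder for a definition baked into the binary (assumption:
/// something like `include_str!` would supply the real one).
const EMBEDDED_DEFINITION: &str = "{}";

/// Initialized on first use; a failed load is cached as `None`,
/// so the load is attempted only once.
static TOKENIZER: OnceLock<Option<ToyTokenizer>> = OnceLock::new();

fn loaded_tokenizer() -> Option<&'static ToyTokenizer> {
    TOKENIZER
        .get_or_init(|| ToyTokenizer::from_embedded(EMBEDDED_DEFINITION))
        .as_ref()
}

/// Count with the tokenizer when present, otherwise use a rough
/// estimate (assumed here: about four characters per token, rounded up).
fn count_with(tokenizer: Option<&ToyTokenizer>, text: &str) -> usize {
    match tokenizer {
        Some(t) => t.count(text),
        None => (text.chars().count() + 3) / 4,
    }
}

/// Entry point mirroring the "count, else estimate" contract.
fn count_sketch(text: &str) -> usize {
    count_with(loaded_tokenizer(), text)
}
```

Caching the `Option` rather than retrying means a broken definition costs one failed load, after which every call takes the cheap estimate path.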

Functions

count_tokens
Count tokens the way Claude does. Falls back to estimate_tokens on failure.
decode_ids
Decode Claude token ids back into text. Returns None on failure.
encode_ids
Encode text into Claude token ids. Returns None if the tokenizer is unavailable so callers can pick a byte-based fallback.
estimate_tokens
Cheap fallback estimate used only when the tokenizer is unavailable.
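
As one way a caller might use the `None` results from encode_ids and decode_ids, the sketch below truncates text to a token budget and drops to a byte-based cut when the tokenizer is unavailable. The signatures are assumptions, since this page does not show them: the encoder is taken to be `Fn(&str) -> Option<Vec<u32>>` and the decoder `Fn(&[u32]) -> Option<String>`, passed in as closures so the example stays self-contained. `truncate_to_budget` and the four-bytes-per-token ratio are hypothetical.

```rust
/// Truncate `text` to at most `max_tokens` tokens.
///
/// `encode` and `decode` stand in for the module's `encode_ids` and
/// `decode_ids`; their exact signatures are assumed, not confirmed.
fn truncate_to_budget<E, D>(text: &str, max_tokens: usize, encode: E, decode: D) -> String
where
    E: Fn(&str) -> Option<Vec<u32>>,
    D: Fn(&[u32]) -> Option<String>,
{
    if let Some(ids) = encode(text) {
        // Already within budget: return the text unchanged.
        if ids.len() <= max_tokens {
            return text.to_string();
        }
        // Keep the first `max_tokens` ids and turn them back into text.
        if let Some(decoded) = decode(&ids[..max_tokens]) {
            return decoded;
        }
    }

    // Byte-based fallback (assumed ratio: about four bytes per token).
    let mut end = text.len().min(max_tokens.saturating_mul(4));
    // Back off so the cut never lands inside a multi-byte character.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    text[..end].to_string()
}
```

With the real functions in scope, a call might look like `truncate_to_budget(text, 1000, encode_ids, decode_ids)`, provided their signatures match the assumptions above.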