char-token-est
Tokenizer-free token-count estimator for LLM prompts. Estimates land within roughly 10% of the true count on typical prompts; it is fast and has zero dependencies. Use it when a real BPE tokenizer is too heavy (routing, budget gates, log lines, progress bars).
Usage
use ;
let n = estimate;
println!;
Or supply your own ratio:
use estimate_with_ratio;
let n = estimate_with_ratio;
Calibration
| Family | chars/token |
|---|---|
| Gpt | 4.0 |
| Claude | 3.5 |
| Gemini | 4.0 |
| Llama | 3.7 |
| Cohere | 3.8 |
Calibration is best-effort on English + code + JSON. Pure non-Latin input deviates further; use a real tokenizer for billing.
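
To make the ratio table concrete, here is a minimal sketch of how a chars-per-token estimate could be computed. It is not this crate's source: `SketchFamily`, `sketch_estimate`, and `sketch_estimate_with_ratio` are hypothetical names, and both counting Unicode scalar values (rather than bytes) and rounding up are assumptions made for illustration. Only the ratios come from the table above.

```rust
/// Hypothetical enum mirroring the rows of the calibration table.
/// It is an illustration, not the crate's real type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchFamily {
    Gpt,
    Claude,
    Gemini,
    Llama,
    Cohere,
}

impl SketchFamily {
    /// Characters-per-token ratio, copied from the calibration table.
    pub fn chars_per_token(self) -> f64 {
        match self {
            SketchFamily::Gpt => 4.0,
            SketchFamily::Claude => 3.5,
            SketchFamily::Gemini => 4.0,
            SketchFamily::Llama => 3.7,
            SketchFamily::Cohere => 3.8,
        }
    }
}

/// Estimate the token count as ceil(chars / ratio).
/// Assumption: characters are Unicode scalar values and the result rounds up.
pub fn sketch_estimate_with_ratio(text: &str, chars_per_token: f64) -> usize {
    // Rejects zero, negative, and NaN ratios, which would give nonsense counts.
    assert!(chars_per_token > 0.0, "ratio must be positive");
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}

/// Estimate using the table ratio for a given model family.
pub fn sketch_estimate(text: &str, family: SketchFamily) -> usize {
    sketch_estimate_with_ratio(text, family.chars_per_token())
}
```

Under these assumptions, an 8-character string at a ratio of 4.0 comes out as 2 tokens, and an empty string as 0. Rounding up keeps budget gates slightly conservative; a sketch that rounds to nearest would be equally plausible.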
License
MIT or Apache-2.0.