char-token-est 0.1.0

Tokenless byte/char-based token-count estimator for LLM prompts. Per-model-family calibration for Claude, GPT, Gemini, Llama. Zero deps.

char-token-est

Tokenless token-count estimator for LLM prompts. Estimates are typically within about 10% of the real token count on typical prompts, and the crate is fast with zero dependencies. Use it when a real BPE tokenizer is too heavy (routing, budget gates, log lines, progress bars).

Usage

use char_token_est::{estimate, Family};

let n = estimate("The quick brown fox jumps over the lazy dog.", Family::Gpt);
println!("~{n} tokens");

Or supply your own ratio:

use char_token_est::estimate_with_ratio;
let n = estimate_with_ratio("...", 4.0);
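
To make the ratio idea concrete, here is a minimal sketch of how a chars-per-token estimate could be computed. It is not the crate's implementation: `sketch_estimate` is a hypothetical name, and counting Unicode scalar values (rather than bytes), rounding up, and returning 0 for a non-positive ratio are all assumptions made for illustration.

```rust
/// Hypothetical stand-in, NOT the crate's actual code:
/// estimates tokens as ceil(char_count / chars_per_token).
fn sketch_estimate(text: &str, chars_per_token: f64) -> usize {
    // Assumed policy: a NaN, zero, or negative ratio yields 0 instead of panicking.
    if chars_per_token.is_nan() || chars_per_token <= 0.0 {
        return 0;
    }
    // Assumption: count Unicode scalar values, not bytes.
    let chars = text.chars().count() as f64;
    // Assumption: round up so a short non-empty string never estimates to 0.
    (chars / chars_per_token).ceil() as usize
}
```

Under these assumptions, the 44-character sentence from the usage example comes out at 11 tokens with a ratio of 4.0 and 13 tokens with a ratio of 3.5.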

Calibration

Family    chars/token
Gpt       4.0
Claude    3.5
Gemini    4.0
Llama     3.7
Cohere    3.8

Calibration is best-effort, based on English text, code, and JSON. Purely non-Latin input deviates further from these ratios; use a real tokenizer for billing.
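
The following sketch illustrates one reason non-Latin input can drift from ratios tuned on English: character counts and UTF-8 byte counts diverge. `SketchFamily`, `sketch_ratio`, and `char_and_byte_len` are hypothetical names that only mirror the table above; they are not the crate's types or functions, and whether the crate counts bytes or chars internally is not assumed here.

```rust
/// Hypothetical mirror of the calibration table; NOT the crate's `Family` type.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum SketchFamily {
    Gpt,
    Claude,
    Gemini,
    Llama,
    Cohere,
}

/// Returns the chars/token ratio listed in the table for a family.
fn sketch_ratio(family: SketchFamily) -> f64 {
    match family {
        SketchFamily::Gpt => 4.0,
        SketchFamily::Claude => 3.5,
        SketchFamily::Gemini => 4.0,
        SketchFamily::Llama => 3.7,
        SketchFamily::Cohere => 3.8,
    }
}

/// Returns (Unicode scalar count, UTF-8 byte count) for a string.
fn char_and_byte_len(text: &str) -> (usize, usize) {
    (text.chars().count(), text.len())
}
```

For ASCII text the two counts agree, while a string such as "こんにちは" has 5 chars but 15 UTF-8 bytes, so a single ratio tuned on English cannot fit both measures at once.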

License

MIT or Apache-2.0.