§tokenx-rs
Fast token count estimation for LLMs at 96% accuracy without a full tokenizer.
This is a Rust port of tokenx by Johann Schopplich. It uses heuristic rules to estimate how many tokens a piece of text will consume when sent to an LLM, without needing any vocabulary files.
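The core idea can be sketched as a characters-per-token heuristic. The sketch below is a simplified illustration under an assumed flat ratio of ~4 characters per token (a common rule of thumb for English text with BPE tokenizers); the crate's actual rules are more elaborate.

```rust
/// Rough token estimate: assume ~4 characters per token on average.
/// This is a simplified stand-in for tokenx-rs's heuristics, not the
/// crate's actual rule set.
fn rough_estimate(text: &str) -> usize {
    if text.is_empty() {
        return 0;
    }
    // Count Unicode scalar values, not bytes, so multi-byte
    // characters don't inflate the estimate.
    let chars = text.chars().count();
    // Round up so short non-empty strings still count as >= 1 token.
    (chars + 3) / 4
}

fn main() {
    let text = "Hello, world!";
    // 13 chars -> estimate of 4 tokens under the 4-chars-per-token assumption.
    println!("{} -> ~{} tokens", text, rough_estimate(text));
}
```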
§Quick start
```rust
use tokenx_rs::estimate_token_count;

let tokens = estimate_token_count("Hello, world!");
assert!(tokens > 0);
```

§When to use this
- Token budget estimation before sending requests to an LLM API.
- Streaming display of approximate token counts in real time.
- Pre-flight checks to see if a prompt fits a model’s context window.
For exact counts, use a full BPE tokenizer like tiktoken-rs.
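Because estimates are approximate (the crate claims ~96% accuracy), a pre-flight check should leave headroom rather than filling the context window exactly. A hypothetical sketch, again assuming a flat 4-characters-per-token ratio; the function name and the 10% margin are illustrative choices, not part of the tokenx-rs API:

```rust
/// Hypothetical pre-flight check: accept a prompt only if its rough
/// estimate, padded by a safety margin, fits the model's context window.
/// The 4-chars-per-token ratio and 10% margin are illustrative,
/// not values from tokenx-rs.
fn fits_context(prompt: &str, context_window: usize) -> bool {
    let estimate = (prompt.chars().count() + 3) / 4;
    // Pad the estimate by 10% to absorb estimation error.
    let padded = estimate + estimate / 10;
    padded <= context_window
}

fn main() {
    let prompt = "Summarize the following document in three bullet points.";
    println!("fits 8k window: {}", fits_context(prompt, 8192));
}
```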
§Structs

- `EstimationOptions` - Options for `estimate_token_count_with_options`.
- `LanguageConfig` - A language-specific rule that adjusts characters-per-token when matched.
- `SplitOptions` - Options for `split_by_tokens`.
§Functions

- `estimate_token_count` - Estimates the number of tokens in `text` using default options.
- `estimate_token_count_with_options` - Estimates the number of tokens in `text` using custom options.
- `is_within_token_limit` - Returns `true` if the estimated token count of `text` is at most `limit`.
- `is_within_token_limit_with_options` - Returns `true` if the estimated token count of `text` (with custom options) is at most `limit`.
- `slice_by_tokens` - Extracts a substring from `text` by estimated token positions.
- `slice_by_tokens_with_options` - Like `slice_by_tokens` but with custom estimation options.
- `split_by_tokens` - Splits `text` into chunks of approximately `tokens_per_chunk` tokens each.
- `split_by_tokens_with_options` - Like `split_by_tokens` but with custom split options (including overlap).