
Crate tokenx_rs


§tokenx-rs

Fast token-count estimation for LLMs at roughly 96% accuracy, without a full tokenizer.

This is a Rust port of tokenx by Johann Schopplich. It uses heuristic rules to estimate how many tokens a piece of text will consume when sent to an LLM, without needing any vocabulary files.

§Quick start

use tokenx_rs::estimate_token_count;

let tokens = estimate_token_count("Hello, world!");
assert!(tokens > 0);

§When to use this

  • Token budget estimation before sending requests to an LLM API.
  • Streaming display of approximate token counts in real time.
  • Pre-flight checks to see if a prompt fits a model’s context window.

For exact counts, use a full BPE tokenizer like tiktoken-rs.

Structs§

EstimationOptions
Options for estimate_token_count_with_options.
LanguageConfig
A language-specific rule that adjusts characters-per-token when matched.
SplitOptions
Options for split_by_tokens.

Functions§

estimate_token_count
Estimates the number of tokens in text using default options.
estimate_token_count_with_options
Estimates the number of tokens in text using custom options.
is_within_token_limit
Returns true if the estimated token count of text is at most limit.
is_within_token_limit_with_options
Returns true if the estimated token count of text (with custom options) is at most limit.
slice_by_tokens
Extracts a substring from text by estimated token positions.
slice_by_tokens_with_options
Like slice_by_tokens but with custom estimation options.
split_by_tokens
Splits text into chunks of approximately tokens_per_chunk tokens each.
split_by_tokens_with_options
Like split_by_tokens but with custom split options (including overlap).