§tokenx-rs
Fast token count estimation for LLMs at 96% accuracy without a full tokenizer.
This is a Rust port of tokenx by Johann Schopplich. It uses heuristic rules to estimate how many tokens a piece of text will consume when sent to an LLM, without needing any vocabulary files.
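The core idea can be sketched as a characters-per-token heuristic. The sketch below is a simplified illustration under an assumed flat ratio of ~4 characters per token (a common rule of thumb for English text with BPE tokenizers); the crate's actual rules are more elaborate.

```rust
/// Rough token estimate: assume ~4 characters per token on average.
/// This is a simplified stand-in for tokenx-rs's heuristics, not the
/// crate's actual rule set.
fn rough_estimate(text: &str) -> usize {
    if text.is_empty() {
        return 0;
    }
    // Count Unicode scalar values, not bytes, so multi-byte
    // characters don't inflate the estimate.
    let chars = text.chars().count();
    // Round up so short non-empty strings still count as >= 1 token.
    (chars + 3) / 4
}

fn main() {
    let text = "Hello, world!";
    // 13 chars -> estimate of 4 tokens under the 4-chars-per-token assumption.
    println!("{} -> ~{} tokens", text, rough_estimate(text));
}
```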
§Quick start
```rust
use tokenx_rs::estimate_token_count;

let tokens = estimate_token_count("Hello, world!");
assert!(tokens > 0);
```

§When to use this
- Token budget estimation before sending requests to an LLM API.
- Streaming display of approximate token counts in real time.
- Pre-flight checks to see if a prompt fits a model’s context window.
For exact counts, use a full BPE tokenizer like tiktoken-rs.
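Because estimates are approximate (the crate claims ~96% accuracy), a pre-flight check should leave headroom rather than filling the context window exactly. A hypothetical sketch, again assuming a flat 4-characters-per-token ratio; the function name and the 10% margin are illustrative choices, not part of the tokenx-rs API:

```rust
/// Hypothetical pre-flight check: accept a prompt only if its rough
/// estimate, padded by a safety margin, fits the model's context window.
/// The 4-chars-per-token ratio and 10% margin are illustrative,
/// not values from tokenx-rs.
fn fits_context(prompt: &str, context_window: usize) -> bool {
    let estimate = (prompt.chars().count() + 3) / 4;
    // Pad the estimate by 10% to absorb estimation error.
    let padded = estimate + estimate / 10;
    padded <= context_window
}

fn main() {
    let prompt = "Summarize the following document in three bullet points.";
    println!("fits 8k window: {}", fits_context(prompt, 8192));
}
```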
§Structs

- `EstimationOptions` - Options for `estimate_token_count_with_options`.
- `LanguageConfig` - A language-specific rule that adjusts characters-per-token when matched.
- `SplitOptions` - Options for `split_by_tokens`.
§Functions

- `estimate_token_count` - Estimates the number of tokens in `text` using default options.
- `estimate_token_count_with_options` - Estimates the number of tokens in `text` using custom options.
- `is_within_token_limit` - Returns `true` if the estimated token count of `text` is at most `limit`.
- `is_within_token_limit_with_options` - Returns `true` if the estimated token count of `text` (with custom options) is at most `limit`.
- `slice_by_tokens` - Extracts a substring from `text` by estimated token positions.
- `slice_by_tokens_with_options` - Like `slice_by_tokens` but with custom estimation options.
- `split_by_tokens` - Splits `text` into chunks of approximately `tokens_per_chunk` tokens each.
- `split_by_tokens_with_options` - Like `split_by_tokens` but with custom split options (including overlap).