⚡ fastokens
fastokens is a fast BPE tokenizer for use with popular open-weight LLMs, built on top of a high-performance Rust backend.
fastokens can be installed from source:
```shell
git clone https://github.com/atero-ai/fast-tokens
uv pip install fast-tokens/python
```
The Python API lives in the python directory. To use fastokens as a drop-in replacement with
transformers, see the
patching example below.
Performance
fastokens achieves, on average, 10x+ faster tokenization than the tokenizers library.
The gap widens as prompt sizes grow, as shown in the graphs below.


Faster tokenization directly impacts live workloads. In tests with SGLang's benchmark suite, fastokens reduces time-to-first-token (TTFT) across prompt sizes:
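The numbers above come from the project's own benchmarks. A minimal harness along these lines (the `benchmark` helper and the stand-in `encode` function are illustrative, not part of fastokens) can compare any two encode callables on the same prompts:

```python
import time

def benchmark(encode, texts, iters=5):
    """Time repeated encoding passes and return the best wall-clock run."""
    best = float("inf")
    for _ in range(iters):
        start = time.perf_counter()
        for text in texts:
            encode(text)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in encode function; swap in the fastokens and tokenizers
# encode calls to reproduce the comparison on real prompts.
texts = ["lorem ipsum dolor sit amet " * n for n in (10, 100, 1000)]
elapsed = benchmark(str.split, texts)
print(f"best of 5 runs: {elapsed:.6f}s")
```

Taking the best of several runs reduces noise from warm-up and background load; scaling the prompt lengths reproduces the widening gap shown in the graphs.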

Note that fastokens is focused on inference and does not support all features of tokenizers.
In particular, additional encoding outputs and some normalizers/pretokenizers are not available.
Tested models
The following models have been tested, but fastokens should generally work with most BPE tokenizers supported by the transformers library:
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
- openai/gpt-oss-120b
- deepseek-ai/DeepSeek-V3.2
- deepseek-ai/DeepSeek-V3
- deepseek-ai/DeepSeek-R1
- Qwen/Qwen3-Next-80B-A3B-Thinking
- Qwen/Qwen3-Next-80B-A3B-Instruct
- Qwen/Qwen3-235B-A22B-Instruct-2507
- Qwen/Qwen3.5-397B-A17B
- MiniMaxAI/MiniMax-M2.1
- MiniMaxAI/MiniMax-M2.5
- mistralai/Devstral-Small-2-24B-Instruct-2512
- zai-org/GLM-4.7
- zai-org/GLM-5
Usage
Using with transformers
Note that fastokens currently works with transformers 4.57.1 (the version used by the current SGLang release).
A patching flow looks roughly like the sketch below; the `fastokens.patch` entry point is illustrative, so check the repository for the actual API:

```python
from transformers import AutoTokenizer

import fastokens

# Load a Hugging Face tokenizer, then swap in the fastokens backend.
# The patch() name here is a hypothetical placeholder.
hf_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")
fast_tok = fastokens.patch(hf_tok)

# Both tokenizers should produce identical token IDs.
text = "Hello, world!"
assert hf_tok.encode(text) == fast_tok.encode(text)
```
Standalone usage
fastokens can also be used directly, without transformers. The class and method names below are illustrative; see the repository for the real API:

```python
import fastokens

# Hypothetical standalone entry point: load a tokenizer by model name,
# then encode text to token IDs.
tok = fastokens.Tokenizer.from_pretrained("openai/gpt-oss-120b")
ids = tok.encode("Hello, world!")
```
Acknowledgements
This library builds on the well-known and widely used Hugging Face tokenizers library and uses code written for HF tokenizers in several flows.