# riptoken

A fast BPE tokenizer for LLMs. Drop-in compatible with OpenAI's tiktoken: 2.5×–6.1× faster single-threaded and up to ~4× faster in parallel batch mode.
riptoken is a Rust-core BPE tokenizer that reads tiktoken-format vocabularies
and produces byte-identical output to tiktoken. It is written from scratch in
Rust, with a thin PyO3 layer, and is designed to be the fastest open-source
tokenizer you can drop into an existing tiktoken pipeline.
## Why
If you are running an LLM service and tokenizing millions of requests per hour,
every microsecond of tokenizer overhead shows up on your invoice. tiktoken is
great but leaves performance on the table — in its own source code the authors
comment "I tried using rayon. It wasn't really faster." riptoken is a
ground-up re-implementation that takes a different set of trade-offs and comes
out ahead on every corpus tested.
## Benchmarks
Apple Silicon (M-series), Python 3.13, o200k_base vocab, release builds of
both libraries, outputs verified byte-identical. Median of 3 runs.
### Single-threaded
| Corpus | Tokens | riptoken (tok/s) | tiktoken (tok/s) | Speedup |
|---|---|---|---|---|
| English prose | 40,001 | 15,660,106 | 3,111,537 | 5.03× |
| Python source code | 72,501 | 16,373,214 | 2,669,412 | 6.13× |
| Rust source code | 88,001 | 18,028,338 | 3,066,479 | 5.88× |
| Multilingual + emoji | 85,600 | 8,866,190 | 3,590,639 | 2.47× |
| Random-ish bytes | 120,000 | 18,028,338 | 4,328,077 | 4.17× |
### Parallel batch (256 docs, rayon + GIL release)
| Corpus | Tokens | riptoken (tok/s) | tiktoken (tok/s) | Speedup |
|---|---|---|---|---|
| English prose | 10,240,256 | 33,966,313 | 13,783,321 | 2.46× |
| Python source code | 18,560,256 | 43,965,336 | 11,430,058 | 3.85× |
| Rust source code | 22,528,256 | 48,320,152 | 13,880,179 | 3.48× |
| Multilingual + emoji | 21,913,600 | 31,110,914 | 15,041,445 | 2.07× |
| Random-ish bytes | 30,720,000 | 46,700,000 | 18,188,264 | 2.57× |
Parallel batch scaling improves further on wider machines: on a 32-core
Sapphire Rapids box, o200k_base throughput hits ~290 M tok/s (19× the
single-threaded baseline).
Reproduce with the benchmark script in the repository.
## Install

### Python
Pre-built wheels are published for CPython 3.9–3.14 on Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).
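Assuming the package is published on PyPI under the project's name (an assumption, not confirmed here), installation is the usual:

```shell
pip install riptoken
```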
### Rust
The `python` Cargo feature is for the PyO3 bindings; you do not need it unless you are building the Python extension yourself.
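Assuming the crate is published on crates.io under the same name (an assumption), add it with:

```shell
cargo add riptoken
```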
## Quick start

### Python
```python
import riptoken

# One-liner: load any tiktoken encoding by name or model.
enc = riptoken.get_encoding("o200k_base")
# or: enc = riptoken.encoding_for_model("gpt-4o")

tokens = enc.encode("hello world")
assert enc.decode(tokens) == "hello world"

# With allowed special tokens
tokens = enc.encode("<|endoftext|>", allowed_special="all")

# Every tiktoken.Encoding attribute works transparently
enc.n_vocab    # 200_019
enc.eot_token  # 199_999
```
`riptoken.get_encoding` and `riptoken.encoding_for_model` are drop-in
equivalents of the tiktoken helpers of the same name. They return a
`riptoken.Encoding` wrapper whose hot-path methods (`encode`,
`encode_ordinary`, `decode`, `decode_bytes`, and their batch variants)
execute in riptoken's faster Rust core; every other attribute and method
(`n_vocab`, `eot_token`, `special_tokens_set`, `encode_with_unstable`,
etc.) forwards transparently to the underlying `tiktoken.Encoding`.
Vocabulary files and regex patterns come from tiktoken's on-disk cache
at `~/.cache/tiktoken/`. Byte-identical output, single import change.
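The forwarding pattern can be sketched like this (a hypothetical minimal version, not riptoken's actual source):

```python
class Encoding:
    """Wrapper that runs hot paths in a fast core and forwards the rest."""

    def __init__(self, fast_core, tiktoken_enc):
        self._core = fast_core   # Rust-backed hot-path methods
        self._tk = tiktoken_enc  # full-featured tiktoken.Encoding fallback

    def encode_ordinary(self, text):
        # Hot path: handled by the fast core, never forwarded.
        return self._core.encode_ordinary(text)

    def __getattr__(self, name):
        # Only attributes *not* defined on the wrapper reach here,
        # so every other tiktoken.Encoding attribute still works.
        return getattr(self._tk, name)
```

Because `__getattr__` fires only on misses, the hot-path methods defined on the wrapper never pay the forwarding cost.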
If you'd rather skip the tiktoken dependency and load a `.tiktoken` file
yourself:

```python
from riptoken import CoreBPE, load_tiktoken_bpe

ranks = load_tiktoken_bpe("o200k_base.tiktoken")
specials = {"<|endoftext|>": 199_999}
pat = r"..."  # the encoding's regex split pattern
bpe = CoreBPE(ranks, specials, pat)
```
### Rust
```rust
use riptoken::{CoreBPE, Rank};
use rustc_hash::FxHashMap;

// Populate `encoder` from your vocabulary file (see `load_tiktoken_bpe` in
// the Python package for the format).
let encoder: FxHashMap<Vec<u8>, Rank> = load_ranks();
let specials: FxHashMap<String, Rank> = FxHashMap::default();
let pat = r"\w+|\s+";

let bpe = CoreBPE::new(encoder, specials, pat)?;
let tokens = bpe.encode_ordinary("hello world");
let bytes = bpe.decode_bytes(&tokens)?;
assert_eq!(bytes, b"hello world".to_vec());
```
## How it works
riptoken ports tiktoken's algorithm to Rust and applies a small set of targeted optimizations:
- Zero-allocation hash lookups. The BPE merge loop queries the vocabulary thousands of times per input. We store the vocab as `FxHashMap<Vec<u8>, Rank>` and look up with `&[u8]` directly via `Vec<u8>: Borrow<[u8]>`, so there is no per-lookup `Vec` allocation.
- Inlined initial min-scan. The first pass that populates the `parts` vector also tracks the minimum rank, avoiding a redundant linear scan.
- Cache-aware merge update. When the linear-scan path merges two adjacent parts, we update `parts[i-1]` and `parts[i]` before calling `Vec::remove(i+1)`. The remove shifts memory leftwards, evicting the cells we just read; doing the reads first keeps them hot.
- Heap path for long pieces. Pieces ≥ 500 bytes use an `O(m log n)` min-heap with lazy invalidation and an intrusive doubly-linked list inside a flat `Vec<State>`. This avoids the `O(n²)` cliff of repeated `Vec::remove`.
- Whole-piece fast path. Before running BPE on any regex-split piece, we check whether the piece is already a full vocabulary entry. For common English text, this hits over 99% of the time and skips BPE entirely.
- Pre-compiled dense DFA. Stock tiktoken patterns (`gpt2`, `r50k_base`, `p50k_base`, `cl100k_base`, `o200k_base`) are compiled into fully materialized dense DFAs at build time via `regex-automata`. All states are computed upfront and embedded in the binary, so there is zero lazy building at search time, eliminating the ~55 ms cold-start penalty the `regex` crate's lazy DFA incurs on large Unicode patterns. The `precompiled-dfa` Cargo feature (on by default) controls this; disable it for smaller binaries at the cost of a first-call warm-up.
- Immutable shared regex. The pre-compiled dense DFA has no mutable state, so a single instance is shared across all threads with no per-thread clones. For non-stock patterns and the `fancy-regex` fallback, per-thread clones are still used to avoid mutex contention on internal scratch buffers.
- Parallel batch API. `encode_ordinary_batch` / `encode_batch` fan out to rayon's global thread pool, so a batch of independent documents encodes in parallel. The Python bindings release the GIL for the full batch.
- GIL release. Every Python-facing encode/decode call is wrapped in `py.detach(|| ...)` so Python threads can make real forward progress.
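For intuition, the merge loop with the whole-piece fast path and the min-rank scan can be sketched in a few lines of toy Python (this is an illustration, not riptoken's Rust implementation; `bpe_encode` and `ranks` are hypothetical names):

```python
def bpe_encode(piece: bytes, ranks: dict) -> list:
    """Toy BPE: `ranks` maps byte strings to merge ranks (lower = earlier)."""
    # Whole-piece fast path: if the piece is already a vocab entry, skip BPE.
    if piece in ranks:
        return [ranks[piece]]
    # Start from individual bytes.
    parts = [piece[i:i + 1] for i in range(len(piece))]
    while True:
        # Scan for the lowest-rank adjacent pair.
        best_i, best_rank = None, None
        for i in range(len(parts) - 1):
            r = ranks.get(parts[i] + parts[i + 1])
            if r is not None and (best_rank is None or r < best_rank):
                best_i, best_rank = i, r
        if best_i is None:
            break  # no more mergeable pairs
        # Merge the winning pair in place.
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]
```

riptoken's core does the same work, but with the allocation, cache, and heap tricks listed above layered on top.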
## API

### Python (`riptoken.Encoding`)
`get_encoding` / `encoding_for_model` return a `riptoken.Encoding`.
Hot-path methods run in the Rust core and release the GIL; every other
attribute forwards to the underlying `tiktoken.Encoding` via
`__getattr__`, so the full `tiktoken.Encoding` API is available.
| Method | Returns |
|---|---|
| `encode_ordinary(text)` | `list[int]` |
| `encode(text, allowed_special=None)` | `list[int]` |
| `encode_ordinary_batch(texts)` | `list[list[int]]` |
| `encode_batch(texts, allowed_special=None)` | `list[list[int]]` |
| `decode(tokens)` | `str` |
| `decode_bytes(tokens)` | `bytes` |
| `n_vocab`, `eot_token`, `special_tokens_set`, … | forwarded to tiktoken |
`allowed_special` accepts a `set[str]` or the sentinel `"all"`.
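As a rough illustration of what special-token handling involves (a sketch, not riptoken's actual implementation; `encode_with_specials`, `special_ids`, and `encode_ordinary` are hypothetical names): allowed special tokens are matched verbatim and emitted as their reserved ids, while the plain spans between them go through ordinary BPE.

```python
import re

def encode_with_specials(text, allowed, special_ids, encode_ordinary):
    """Encode `text`, treating tokens in `allowed` as atomic specials."""
    if not allowed:
        return encode_ordinary(text)
    # Longest-first alternation so overlapping specials match greedily.
    pat = "|".join(re.escape(s) for s in sorted(allowed, key=len, reverse=True))
    out, pos = [], 0
    for m in re.finditer(pat, text):
        out += encode_ordinary(text[pos:m.start()])  # plain span before it
        out.append(special_ids[m.group()])           # the special's own id
        pos = m.end()
    out += encode_ordinary(text[pos:])               # trailing plain span
    return out
```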
You can also construct a `riptoken.CoreBPE` directly from a `.tiktoken`
file via `load_tiktoken_bpe` if you want to avoid the tiktoken
dependency. `CoreBPE` exposes the same hot-path methods as `Encoding`
plus `encode_single_token`, `decode_single_token_bytes`, `n_vocab()`,
and `token_byte_values()`.
### Rust (`riptoken::CoreBPE`)
See [docs.rs/riptoken](https://docs.rs/riptoken) for full Rust API
documentation. The same methods are available, returning `Vec<Rank>`, `Vec<u8>`,
etc.
## Compatibility
riptoken reads the same `.tiktoken` vocabulary files as tiktoken and produces
identical token sequences. We run a CI parity check against tiktoken on every
commit across multiple corpora (English, code, multilingual, emoji, random
bytes).

If you find a string where riptoken produces different output from tiktoken, that is a bug; please open an issue with the input and both outputs.
## Development
```shell
# Rust tests
cargo test

# Rust linting
cargo clippy

# Python extension + test suite (maturin assumed; adjust to your setup)
maturin develop --release && pytest

# Benchmark: see the benchmark script in the repository
```
The Python test suite and benchmark use `riptoken.get_encoding("o200k_base")`
under the hood, which reads the vocabulary through tiktoken and its on-disk
cache at `~/.cache/tiktoken/`. No local `.tiktoken` file is required; the
first run downloads it automatically.
## Contributing
Issues and PRs welcome. Please include a benchmark or test case demonstrating any performance or behavior change.
## License

MIT; see LICENSE.
## Credits
riptoken is a re-implementation of the ideas in OpenAI's
[tiktoken](https://github.com/openai/tiktoken). The core BPE algorithm is due
to them; riptoken reuses vocabulary files in the `.tiktoken` format.