# riptoken
**A fast BPE tokenizer for LLMs. Drop-in compatible with OpenAI's [`tiktoken`](https://github.com/openai/tiktoken), 2.5×–6.1× faster single-threaded and up to ~4× faster in parallel batch mode.**
riptoken is a Rust-core BPE tokenizer that reads `tiktoken`-format vocabularies
and produces byte-identical output to `tiktoken`. It is written from scratch in
Rust, with a thin PyO3 layer, and is designed to be the fastest open-source
tokenizer you can drop into an existing `tiktoken` pipeline.
---
## Why
If you are running an LLM service and tokenizing millions of requests per hour,
every microsecond of tokenizer overhead shows up on your invoice. `tiktoken` is
great but leaves performance on the table — in its own source code the authors
comment "I tried using rayon. It wasn't really faster." riptoken is a
ground-up re-implementation that takes a different set of trade-offs and comes
out ahead on every corpus tested.
## Benchmarks
Apple Silicon (M-series), Python 3.13, `o200k_base` vocab, release builds of
both libraries, outputs verified byte-identical. Median of 3 runs.
### Single-threaded

| Corpus | Input bytes | riptoken (tok/s) | tiktoken (tok/s) | Speedup |
|---|---|---|---|---|
| English prose | 40,001 | 15,660,106 | 3,111,537 | **5.03×** |
| Python source code | 72,501 | 16,373,214 | 2,669,412 | **6.13×** |
| Rust source code | 88,001 | 18,028,338 | 3,066,479 | **5.88×** |
| Multilingual + emoji | 85,600 | 8,866,190 | 3,590,639 | **2.47×** |
| Random-ish bytes | 120,000 | 18,028,338 | 4,328,077 | **4.17×** |
### Parallel batch (256 docs, rayon + GIL release)

| Corpus | Input bytes | riptoken (tok/s) | tiktoken (tok/s) | Speedup |
|---|---|---|---|---|
| English prose | 10,240,256 | 33,966,313 | 13,783,321 | **2.51×** |
| Python source code | 18,560,256 | 43,965,336 | 11,430,058 | **3.86×** |
| Rust source code | 22,528,256 | 48,320,152 | 13,880,179 | **3.60×** |
| Multilingual + emoji | 21,913,600 | 31,110,914 | 15,041,445 | **2.03×** |
| Random-ish bytes | 30,720,000 | 46,700,000 | 18,188,264 | **2.56×** |
Parallel batch scaling improves further on wider machines: on a 32-core
Sapphire Rapids box, `o200k_base` throughput hits ~290 M tok/s (19× the
single-threaded baseline).
Reproduce with:
```bash
python scripts/bench.py
```
## Install
### Python
```bash
pip install riptoken
```
Pre-built wheels are published for CPython 3.9–3.14 on Linux (x86_64, aarch64),
macOS (x86_64, arm64), and Windows (x86_64).
### Rust
```bash
cargo add riptoken
```
The `python` Cargo feature is for the PyO3 bindings — you do not need it
unless you are building the Python extension yourself.
## Quick start
### Python
```python
import riptoken
# One-liner: load any tiktoken encoding by name or model.
enc = riptoken.get_encoding("o200k_base")
# or: enc = riptoken.encoding_for_model("gpt-4o")
tokens = enc.encode_ordinary("Hello, world!")
assert enc.decode(tokens) == "Hello, world!"
# With allowed special tokens
tokens = enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"})
assert tokens == [enc.eot_token]
# Every tiktoken.Encoding attribute works transparently
enc.n_vocab # 200_019
enc.eot_token # 199_999
enc.special_tokens_set
```
`riptoken.get_encoding` and `riptoken.encoding_for_model` are drop-in
equivalents of the `tiktoken` helpers of the same name. They return a
`riptoken.Encoding` wrapper whose hot-path methods (`encode`,
`encode_ordinary`, `decode`, `decode_bytes`, and their batch variants)
execute in riptoken's faster Rust core; every other attribute and method
— `n_vocab`, `eot_token`, `special_tokens_set`, `encode_with_unstable`,
etc. — forwards transparently to the underlying `tiktoken.Encoding`.
Vocabulary files and regex patterns come from tiktoken's on-disk cache
at `~/.cache/tiktoken/`. Byte-identical output, single import change.
If you'd rather skip the `tiktoken` dependency and load a `.tiktoken` file
yourself:
```python
import riptoken
ranks = riptoken.load_tiktoken_bpe("o200k_base.tiktoken")
special_tokens = {"<|endoftext|>": 199999, "<|endofprompt|>": 200018}
pat = (
r"""[^\r\n\p{L}\p{N}]?[\p{Lu}\p{Lt}\p{Lm}\p{Lo}\p{M}]*[\p{Ll}\p{Lm}\p{Lo}\p{M}]+(?i:'s|'t|'re|'ve|'m|'ll|'d)?|"""
r"""[^\r\n\p{L}\p{N}]?[\p{Lu}\p{Lt}\p{Lm}\p{Lo}\p{M}]+[\p{Ll}\p{Lm}\p{Lo}\p{M}]*(?i:'s|'t|'re|'ve|'m|'ll|'d)?|"""
r"""\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n/]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
)
enc = riptoken.CoreBPE(ranks, special_tokens, pat)
```
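For reference, the `.tiktoken` vocabulary format itself is plain text: one line per token, the base64-encoded token bytes followed by the integer rank. A minimal pure-Python reader, shown here only to illustrate the format (use `riptoken.load_tiktoken_bpe` in practice):

```python
import base64

def read_tiktoken_ranks(text: str) -> dict[bytes, int]:
    # Each non-empty line is "<base64 token bytes> <rank>".
    ranks = {}
    for line in text.splitlines():
        if not line:
            continue
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks

# Two sample lines: b"Hello" -> rank 0, b" world" -> rank 1
sample = "SGVsbG8= 0\nIHdvcmxk 1\n"
```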
### Rust
```rust
use riptoken::CoreBPE;
use rustc_hash::FxHashMap;
// Populate `encoder` from your vocabulary file (see `load_tiktoken_bpe` in
// the Python package for the format).
let encoder: FxHashMap<Vec<u8>, u32> = load_ranks("o200k_base.tiktoken");
let specials: FxHashMap<String, u32> = FxHashMap::default();
let pat = r"\w+|\s+";
let bpe = CoreBPE::new(encoder, specials, pat)?;
let tokens = bpe.encode_ordinary("Hello, world!");
let bytes = bpe.decode_bytes(&tokens);
assert_eq!(bytes, b"Hello, world!");
```
## How it works
riptoken ports tiktoken's algorithm to Rust and applies a small set of targeted
optimizations:
1. **Zero-allocation hash lookups.** The BPE merge loop queries the
vocabulary thousands of times per input. We store the vocab as
`FxHashMap<Vec<u8>, Rank>` and look up with `&[u8]` directly via
`Vec<u8>: Borrow<[u8]>` — no per-lookup `Vec` allocation.
2. **Inlined initial min-scan.** The first pass that populates the `parts`
vector also tracks the minimum rank, avoiding a redundant linear scan.
3. **Cache-aware merge update.** When the linear-scan path merges two
adjacent parts, we update `parts[i-1]` and `parts[i]` **before** calling
`Vec::remove(i+1)`. The remove shifts memory leftwards, evicting the cells
we just read — doing the reads first keeps them hot.
4. **Heap path for long pieces.** Pieces ≥ 500 bytes use an `O(m log n)`
min-heap with lazy invalidation and an intrusive doubly-linked list inside
a flat `Vec<State>`. This avoids the `O(n²)` cliff of repeated
`Vec::remove`.
5. **Whole-piece fast path.** Before running BPE on any regex-split piece,
we check whether the piece is already a full vocabulary entry. For common
English text, this hits over 99% of the time and skips BPE entirely.
6. **SIMD regex fast path.** Every stock tiktoken pattern (`gpt2`,
`r50k_base`, `p50k_base`, `cl100k_base`, `o200k_base`) compiles on the
`regex` crate's DFA/SIMD engine after a small peephole rewrite that
peels off the one lookaround feature they use (`\s+(?!\S)`) and
reproduces its semantics in Rust. Patterns we can't rewrite fall back
to `fancy-regex`.
7. **Thread-local regex clones.** Both the fast and fancy engines hold
per-thread clones. `fancy-regex` keeps mutable scratch state inside
each `Regex`, and the `regex` crate uses an internal `Pool<Cache>`
guarded by a mutex — under high thread counts that pool becomes a
contention point. Per-thread clones get out of its way: on 32-core
Sapphire Rapids, parallel `o200k_base` batch encoding scales from
6.2× to 19× vs single-threaded.
8. **Parallel batch API.** `encode_ordinary_batch` / `encode_batch` fan
out to rayon's global thread pool, so a batch of independent documents
encodes in parallel. The Python bindings release the GIL for the full
batch.
9. **GIL release.** Every Python-facing encode/decode call is wrapped in
`py.detach(|| ...)` so Python threads can make real forward progress.
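The heart of items 1–5 is a short merge loop. As a didactic pure-Python sketch of the algorithm (riptoken's Rust core implements the same logic with the zero-allocation lookups, inlined min-scan, cache-aware updates, and heap path described above):

```python
def bpe_encode(piece: bytes, ranks: dict[bytes, int]) -> list[int]:
    # Whole-piece fast path (item 5): if the regex-split piece is already
    # a full vocabulary entry, emit its rank directly and skip BPE.
    if piece in ranks:
        return [ranks[piece]]
    # Start from individual bytes, then repeatedly merge the adjacent
    # pair with the lowest (best) rank until no mergeable pair remains.
    parts = [piece[i:i + 1] for i in range(len(piece))]
    while True:
        best_rank, best_i = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_i is None:
            break
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return [ranks[p] for p in parts]
```

With a toy rank table `{b"a": 0, b"b": 1, b"c": 2, b"ab": 3}`, `bpe_encode(b"abc", ranks)` merges `a`+`b` first (the lowest-ranked mergeable pair) and yields `[3, 2]`, while `bpe_encode(b"ab", ranks)` takes the whole-piece fast path and returns `[3]`.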
## API
### Python (`riptoken.Encoding`)
`get_encoding` / `encoding_for_model` return a `riptoken.Encoding`.
Hot-path methods run in the Rust core and release the GIL; every other
attribute forwards to the underlying `tiktoken.Encoding` via
`__getattr__`, so the full `tiktoken.Encoding` API is available.
| Method / attribute | Returns |
|---|---|
| `encode_ordinary(text)` | `list[int]` |
| `encode(text, allowed_special=None)` | `list[int]` |
| `encode_ordinary_batch(texts)` | `list[list[int]]` |
| `encode_batch(texts, allowed_special=None)` | `list[list[int]]` |
| `decode(tokens)` | `str` |
| `decode_bytes(tokens)` | `bytes` |
| `n_vocab`, `eot_token`, `special_tokens_set`, … | forwarded to `tiktoken` |
`allowed_special` accepts a `set[str]` or the sentinel `"all"`.
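The semantics of `allowed_special` can be illustrated with a pure-Python sketch (a hypothetical helper for illustration only; the actual splitting happens in the Rust core, and `"all"` simply expands to every registered special token): allowed special tokens are matched literally in the input, mapped to their reserved ids, and the text between matches is encoded ordinarily.

```python
import re

def encode_with_special(text, allowed_special, encode_ordinary, special_tokens):
    # No allowed specials: the whole text goes through ordinary encoding.
    if not allowed_special:
        return encode_ordinary(text)
    # Match longest specials first so no token shadows a longer one.
    pattern = "|".join(
        re.escape(tok) for tok in sorted(allowed_special, key=len, reverse=True)
    )
    out, pos = [], 0
    for m in re.finditer(pattern, text):
        out.extend(encode_ordinary(text[pos:m.start()]))  # text before the special
        out.append(special_tokens[m.group()])             # the special's reserved id
        pos = m.end()
    out.extend(encode_ordinary(text[pos:]))               # trailing text
    return out
```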
You can also construct a `riptoken.CoreBPE` directly from a `.tiktoken`
file via `load_tiktoken_bpe` if you want to avoid the `tiktoken`
dependency. `CoreBPE` exposes the same hot-path methods as `Encoding`
plus `encode_single_token`, `decode_single_token_bytes`, `n_vocab()`,
and `token_byte_values()`.
### Rust (`riptoken::CoreBPE`)
See [docs.rs/riptoken](https://docs.rs/riptoken) for full Rust API
documentation. The same methods are available, returning `Vec<Rank>`, `Vec<u8>`,
etc.
## Compatibility
riptoken reads the same `.tiktoken` vocabulary files as `tiktoken` and produces
identical token sequences. We run a CI parity check against `tiktoken` on every
commit across multiple corpora (English, code, multilingual, emoji, random
bytes).
If you find a string where riptoken produces different output from tiktoken,
that is a bug — please open an issue with the input and both outputs.
## Development
```bash
# Rust tests
cargo test
# Rust linting
cargo clippy --all-targets -- -D warnings
# Python extension + test suite
python -m venv .venv && source .venv/bin/activate
pip install -e .[test]
maturin develop --features python --release
pytest
# Benchmark
python scripts/bench.py
```
The Python test suite and benchmark use `riptoken.get_encoding("o200k_base")`
under the hood, which reads the vocabulary through `tiktoken` and its on-disk
cache at `~/.cache/tiktoken/`. No local `.tiktoken` file is required — the
first run downloads it automatically.
## Contributing
Issues and PRs welcome. Please include a benchmark or test case demonstrating
any performance or behavior change.
## License
MIT — see [LICENSE](LICENSE).
## Credits
riptoken is a re-implementation of the ideas in OpenAI's
[tiktoken](https://github.com/openai/tiktoken). The core BPE algorithm is due
to them; riptoken reuses vocabulary files in the `.tiktoken` format.