# tiktoken

The fastest Rust BPE tokenizer. Compatible with OpenAI's tiktoken, it supports all mainstream LLM tokenizers: OpenAI, Llama 3, DeepSeek, Qwen, and Mistral.

## Features
- Multi-provider: 9 encodings across 5 vendors (OpenAI, Meta, DeepSeek, Alibaba, Mistral)
- Fast: arena-based vocabulary, heap-accelerated BPE merge, DFA regex
- Parallel encoding: optional rayon-based multi-threaded encoding for large texts
- Pricing: cost estimation for 57 models across 7 providers
- Compact: ruzstd-compressed vocabulary data embedded at compile time
- Zero-alloc counting: the `count()` path avoids allocating a token vector
## Performance

All benchmarks on Apple M4 Mac mini, single-threaded. Token output verified identical across all three implementations.

### cl100k_base encode
| Input | Python tiktoken 0.12 | tiktoken-rs 0.9 | tiktoken 3.1 | vs tiktoken-rs | vs Python |
|---|---|---|---|---|---|
| short (13 B) | 1,700 ns | 1,248 ns | 118 ns | 10.6x | 14x |
| medium (900 B) | 32.2 us | 53.8 us | 7.2 us | 7.5x | 4.5x |
| long (45 KB) | 1,500 us | 2,611 us | 366 us | 7.1x | 4.1x |
| unicode (4.5 KB) | 141 us | 164 us | 101 us | 1.6x | 1.4x |
| code (3.9 KB) | 247 us | 264 us | 42 us | 6.3x | 5.9x |
### o200k_base encode
| Input | Python tiktoken 0.12 | tiktoken-rs 0.9 | tiktoken 3.1 | vs tiktoken-rs | vs Python |
|---|---|---|---|---|---|
| short (13 B) | 1,600 ns | 1,051 ns | 115 ns | 9.1x | 14x |
| medium (900 B) | 58.3 us | 56.2 us | 7.1 us | 7.9x | 8.2x |
| long (45 KB) | 2,900 us | 2,799 us | 365 us | 7.7x | 7.9x |
| unicode (4.5 KB) | 204 us | 187 us | 99 us | 1.9x | 2.1x |
| code (3.9 KB) | 332 us | 253 us | 41 us | 6.2x | 8.1x |
| | tiktoken | tiktoken-rs | Python tiktoken |
|---|---|---|---|
| Vocab storage | Arena-based (single alloc, cache-friendly) | `HashMap<Vec<u8>>` (200k allocs) | Rust HashMap behind PyO3 |
| Regex engine | regex (DFA, linear time) | fancy-regex (backtracking) | regex via PyO3 + FFI overhead |
| Hash map | Custom open-addressing + FxHash | rustc-hash v1 | standard HashMap |
| BPE merge | Heap-accelerated O(n log n) | O(n*m) linear scan | O(n*m) linear scan |
| `count()` without alloc | yes | no | no |
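The heap-accelerated merge can be sketched in a few dozen lines. This is an illustrative toy, not the crate's actual internals (the function name `bpe_encode`, the index-linked cell layout, and the rank table are all assumptions): candidate pairs wait in a min-heap keyed by merge rank, and entries invalidated by earlier merges are skipped lazily on pop, so each merge costs O(log n) rather than a full rescan of all pairs.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Toy heap-accelerated BPE merge (illustrative; not the crate's code).
// Symbols live in index-linked cells; candidate pairs sit in a min-heap
// keyed by merge rank. Popped entries whose pair no longer exists are
// simply skipped ("lazy deletion").
fn bpe_encode(text: &[u8], ranks: &HashMap<Vec<u8>, u32>) -> Vec<Vec<u8>> {
    let n = text.len();
    let mut syms: Vec<Option<Vec<u8>>> = text.iter().map(|&b| Some(vec![b])).collect();
    let mut next: Vec<usize> = (1..=n).collect(); // next[i] == n is the end sentinel
    let mut prev: Vec<isize> = (0..n).map(|i| i as isize - 1).collect();

    let rank_of = |a: &[u8], b: &[u8]| ranks.get(&[a, b].concat()).copied();

    // seed the heap with every adjacent pair that has a rank
    let mut heap = BinaryHeap::new();
    for i in 0..n.saturating_sub(1) {
        if let Some(r) = rank_of(&[text[i]], &[text[i + 1]]) {
            heap.push(Reverse((r, i, vec![text[i]], vec![text[i + 1]])));
        }
    }

    while let Some(Reverse((_, i, a, b))) = heap.pop() {
        // lazy deletion: skip entries whose pair was invalidated by an earlier merge
        let Some(left) = syms[i].clone() else { continue };
        let j = next[i];
        if left != a || j >= n || syms[j].as_deref() != Some(b.as_slice()) {
            continue;
        }
        // merge cell j into cell i
        let right = syms[j].take().unwrap();
        syms[i].as_mut().unwrap().extend_from_slice(&right);
        next[i] = next[j];
        if next[i] < n {
            prev[next[i]] = i as isize;
        }
        // the merged symbol forms new candidate pairs with both neighbours
        if prev[i] >= 0 {
            let p = prev[i] as usize;
            if let Some(r) = rank_of(syms[p].as_ref().unwrap(), syms[i].as_ref().unwrap()) {
                heap.push(Reverse((r, p, syms[p].clone().unwrap(), syms[i].clone().unwrap())));
            }
        }
        if next[i] < n {
            let k = next[i];
            if let Some(r) = rank_of(syms[i].as_ref().unwrap(), syms[k].as_ref().unwrap()) {
                heap.push(Reverse((r, i, syms[i].clone().unwrap(), syms[k].clone().unwrap())));
            }
        }
    }
    syms.into_iter().flatten().collect()
}
```

Lazy deletion is what keeps the merge loop at O(n log n): a stale heap entry costs one pop and one comparison instead of forcing a rebuild of the pair set after every merge.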
Benchmark source: `benches/`. Reproduce with `cargo bench`.
## Installation

```toml
[dependencies]
tiktoken = "3"

# optional: multi-threaded encoding for large texts
tiktoken = { version = "3", features = ["parallel"] }
```
## Quick Start

```rust
use tiktoken::{encoding_for_model, get_encoding};

// by encoding name
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world");
let text = enc.decode_to_string(&tokens).unwrap();
assert_eq!(text, "hello world");

// by model name - works across all providers
// (model-name strings below are illustrative; see Supported Encodings)
let enc = encoding_for_model("gpt-4o").unwrap();
let count = enc.count("hello world"); // zero-alloc fast path

let enc = encoding_for_model("llama-3.1").unwrap();
let enc = encoding_for_model("deepseek-v3").unwrap();
let enc = encoding_for_model("qwen-2.5").unwrap();
```
## Supported Encodings

| Encoding | Provider | Models |
|---|---|---|
| `o200k_base` | OpenAI | GPT-4o, GPT-4o-mini, o1, o3, o4-mini |
| `cl100k_base` | OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, text-embedding-* |
| `p50k_base` | OpenAI | text-davinci-002/003, code-davinci-*, code-cushman-* |
| `p50k_edit` | OpenAI | text-davinci-edit-*, code-davinci-edit-* |
| `r50k_base` | OpenAI | GPT-3 era: davinci, curie, babbage, ada |
| `llama3` | Meta | Llama 3, 3.1, 3.2, 3.3, 4 |
| `deepseek_v3` | DeepSeek | DeepSeek V3, R1 |
| `qwen2` | Alibaba | Qwen 2.5, Qwen 3 |
| `mistral_v3` | Mistral | Mistral, Mixtral (Tekken tokenizer) |
## API

### Encode / decode

```rust
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world");            // Vec<u32>
let text = enc.decode_to_string(&tokens).unwrap(); // String
let bytes = enc.decode(&tokens);                   // Vec<u8>
```
### Special tokens

```rust
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode_with_special_tokens("hello <|endoftext|>");
// tokens will include the special token id for <|endoftext|>
```
### Count tokens

```rust
let enc = get_encoding("cl100k_base").unwrap();
let count = enc.count("hello world");
// faster than encode().len() -- no token vector allocated

// count with special token recognition
let count = enc.count_with_special_tokens("hello <|endoftext|>");
```
### Parallel encoding

```rust
// requires the `parallel` feature
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode_parallel(&large_text);
// identical output to encode(); uses rayon for texts >= 4 KB
```
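For intuition, the chunking idea behind parallel encoding can be sketched as follows. Everything here is an assumption for illustration (`encode_chunked`, the whitespace split, and plain std threads standing in for rayon): split the input at whitespace so a chunk boundary never bisects a token, encode chunks concurrently, and concatenate the results in order.

```rust
use std::thread;

// Illustrative chunked encoding (assumed strategy, not the crate's code).
// Splitting at whitespace keeps a GPT-style token's leading space attached
// to its chunk, so per-chunk outputs concatenate to the single-threaded result.
fn encode_chunked(text: &str, encode: fn(&str) -> Vec<u32>, chunk_bytes: usize) -> Vec<u32> {
    // cut at the first whitespace at or after each chunk boundary (ASCII-safe sketch)
    let mut parts: Vec<&str> = Vec::new();
    let mut rest = text;
    while rest.len() > chunk_bytes {
        let cut = rest[chunk_bytes..]
            .find(char::is_whitespace)
            .map(|i| chunk_bytes + i)
            .unwrap_or(rest.len());
        let (head, tail) = rest.split_at(cut);
        parts.push(head);
        rest = tail; // tail begins with the whitespace, keeping " word" tokens intact
    }
    parts.push(rest);

    // encode chunks on scoped worker threads, then concatenate in order
    thread::scope(|s| {
        let handles: Vec<_> = parts.iter().map(|&p| s.spawn(move || encode(p))).collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Because each chunk ends exactly where the pretokenizer would split anyway, the concatenated output matches the single-threaded result token for token.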
### Cost estimation

```rust
use tiktoken::pricing::{estimate_cost, get_model, models_by_provider};

// quick estimate (token counts and argument order are illustrative)
let cost = estimate_cost("gpt-4o", 1_000, 500).unwrap();

// with prompt caching
let model = get_model("gpt-4o").unwrap();
let cost = model.estimate_cost_with_cache(1_000, 800, 500);

// list all models for a provider
let models = models_by_provider("openai");
```
Supports 57 models across OpenAI, Anthropic, Google, Meta, DeepSeek, Alibaba, and Mistral.
## WebAssembly

Available as `@goliapkg/tiktoken-wasm` on npm.

```js
import init, { getEncoding, encodingForModel, estimateCost } from '@goliapkg/tiktoken-wasm'

await init()
const enc = getEncoding('cl100k_base')
const tokens = enc.encode('hello world') // Uint32Array
const text = enc.decode(tokens)          // "hello world"
const count = enc.count('hello world')   // 2
enc.free() // release WASM memory when done
```
| Bundler | Plugin |
|---|---|
| Vite | vite-plugin-wasm + vite-plugin-top-level-await |
| webpack 5 | Built-in asyncWebAssembly experiment |
| Next.js | next.config.js webpack.experiments |
See `examples/react-app` for a complete demo. Building the WASM package yourself requires wasm-pack (`cargo install wasm-pack`).
## License

Third-party vocabulary data licenses: see `LICENSE-3RD-PARTY`.