
Crate tiktoken


High-performance pure-Rust BPE tokenizer compatible with OpenAI’s tiktoken and with the tokenizers of several mainstream LLM providers.

Supports 9 encodings across 5 providers: OpenAI (cl100k_base, o200k_base, p50k_base, p50k_edit, r50k_base), Meta (llama3), DeepSeek (deepseek_v3), Alibaba (qwen2), and Mistral (mistral_v3).

Includes token encoding, decoding, counting, and multi-provider pricing.

Quick Start

// by encoding name
let enc = tiktoken::get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world");
let text = enc.decode_to_string(&tokens).unwrap();
assert_eq!(text, "hello world");

// by model name
let enc = tiktoken::encoding_for_model("gpt-4o").unwrap();
let count = enc.count("hello world");
assert_eq!(count, 2);

Modules

encoding
Encoding definitions and data parsing for tiktoken-compatible BPE vocabularies.
pricing
Per-model pricing data and cost estimation for OpenAI, Anthropic, Google, Meta, DeepSeek, Alibaba, and Mistral.

Structs

CoreBpe
A Byte Pair Encoding tokenizer engine.

Functions

encoding_for_model
Get a cached tokenizer by model name.
get_encoding
Get a cached tokenizer by encoding name.
list_encodings
All available encoding names.
model_to_encoding
Map a model name to its encoding name.