High-performance pure-Rust BPE tokenizer compatible with OpenAI’s tiktoken and with the tokenizers of several other mainstream LLM providers.
Supports 9 encodings across 5 providers: OpenAI (cl100k_base, o200k_base,
p50k_base, p50k_edit, r50k_base), Meta (llama3), DeepSeek (deepseek_v3),
Alibaba (qwen2), and Mistral (mistral_v3).
Includes token encoding, decoding, counting, and multi-provider pricing.
§Quick Start
// by encoding name
let enc = tiktoken::get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world");
let text = enc.decode_to_string(&tokens).unwrap();
assert_eq!(text, "hello world");
// by model name
let enc = tiktoken::encoding_for_model("gpt-4o").unwrap();
let count = enc.count("hello world");
assert_eq!(count, 2);
Modules§
- encoding
- Encoding definitions and data parsing for tiktoken-compatible BPE vocabularies.
- pricing
- Per-model pricing data and cost estimation for OpenAI, Anthropic, Google, Meta, DeepSeek, Alibaba, and Mistral.
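The pricing module’s exact API isn’t detailed on this page; as a self-contained sketch of the arithmetic such cost estimation performs — per-million-token rates applied separately to input and output tokens — with placeholder rates that are not real model prices:

```rust
// Cost estimation as typically computed for LLM usage: separate
// per-million-token rates for input (prompt) and output (completion).
fn estimate_cost_usd(
    input_tokens: u64,
    output_tokens: u64,
    input_per_million: f64,
    output_per_million: f64,
) -> f64 {
    input_tokens as f64 / 1_000_000.0 * input_per_million
        + output_tokens as f64 / 1_000_000.0 * output_per_million
}

fn main() {
    // Placeholder rates: $2.50 / 1M input tokens, $10.00 / 1M output tokens.
    let cost = estimate_cost_usd(1_000, 500, 2.50, 10.00);
    // 1000/1e6 * 2.50 + 500/1e6 * 10.00 = 0.0025 + 0.005 = 0.0075
    assert!((cost - 0.0075).abs() < 1e-12);
    println!("${cost:.4}");
}
```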
Structs§
- CoreBpe
- A Byte Pair Encoding tokenizer engine.
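For orientation, here is a minimal, self-contained sketch of the merge loop at the heart of any BPE engine like CoreBpe — repeatedly merge the adjacent pair with the lowest rank until none remains. The toy merge table below is illustrative only, not the crate’s actual vocabulary:

```rust
use std::collections::HashMap;

// Greedy BPE: repeatedly merge the adjacent pair with the lowest
// (highest-priority) rank until no mergeable pair remains.
fn bpe_merge(mut parts: Vec<String>, ranks: &HashMap<(String, String), u32>) -> Vec<String> {
    loop {
        // Find the adjacent pair with the best (lowest) merge rank.
        let best = parts
            .windows(2)
            .enumerate()
            .filter_map(|(i, w)| ranks.get(&(w[0].clone(), w[1].clone())).map(|&r| (r, i)))
            .min();
        match best {
            Some((_, i)) => {
                // Replace parts[i] and parts[i + 1] with their concatenation.
                let merged = format!("{}{}", parts[i], parts[i + 1]);
                parts.splice(i..=i + 1, [merged]);
            }
            None => return parts,
        }
    }
}

fn main() {
    // Toy merge table: ("h","e") merges first, then ("l","l"), and so on.
    let ranks: HashMap<(String, String), u32> = [
        (("h".into(), "e".into()), 0),
        (("l".into(), "l".into()), 1),
        (("he".into(), "ll".into()), 2),
        (("hell".into(), "o".into()), 3),
    ]
    .into_iter()
    .collect();

    let parts: Vec<String> = "hello".chars().map(|c| c.to_string()).collect();
    let tokens = bpe_merge(parts, &ranks);
    assert_eq!(tokens, vec!["hello"]); // all four merges fire in rank order
}
```

A real engine works over bytes rather than chars and stores ranks against byte sequences, but the greedy lowest-rank merge order is the same idea.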
Functions§
- encoding_for_model
- Get a cached tokenizer by model name.
- get_encoding
- Get a cached tokenizer by encoding name.
- list_encodings
- All available encoding names.
- model_to_encoding
- Map a model name to its encoding name.