tiktoken_rust
A high-performance Rust implementation of OpenAI's tiktoken library.
This library provides fast byte pair encoding (BPE) tokenization compatible with OpenAI's models. It supports all current OpenAI encodings, including cl100k_base, p50k_base, r50k_base, and o200k_base.
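Byte pair encoding turns a piece of text into tokens by starting from individual bytes and repeatedly merging the adjacent pair whose combined byte string has the lowest rank in the vocabulary. The minimal sketch below illustrates that merge loop with a tiny, made-up rank table. `bpe_encode` is a hypothetical name, not part of this crate's API, and the loop favors clarity over speed. Real encodings such as cl100k_base have on the order of 100,000 ranks, and tiktoken first splits text into pieces with a regular expression before running BPE on each piece.

```rust
use std::collections::HashMap;

/// Hypothetical toy BPE (not this crate's implementation): repeatedly merge
/// the adjacent pair of parts whose concatenation has the lowest rank in
/// `ranks`, until no adjacent pair appears in the table.
/// Assumption: every single byte of `piece` has a rank, as in real BPE tables.
pub fn bpe_encode(piece: &[u8], ranks: &HashMap<Vec<u8>, u32>) -> Vec<u32> {
    // Start with one part per input byte.
    let mut parts: Vec<Vec<u8>> = piece.iter().map(|&b| vec![b]).collect();
    loop {
        // Find the leftmost adjacent pair with the lowest merge rank.
        let mut best: Option<(usize, u32)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut joined = parts[i].clone();
            joined.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&joined) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            // Merge the winning pair into one part and scan again.
            Some((i, _)) => {
                let right = parts.remove(i + 1);
                parts[i].extend(right);
            }
            // No adjacent pair is in the table: merging is finished.
            None => break,
        }
    }
    // Each remaining part is a vocabulary entry; its rank is the token ID.
    parts
        .iter()
        .map(|p| *ranks.get(p).expect("every byte must have a rank"))
        .collect()
}
```

With ranks h=0, e=1, l=2, o=3, ll=4, he=5, hell=6, the word "hello" merges `ll` first (rank 4), then `he`, then `hell`, ending as the tokens `[6, 3]`.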
Features
- High Performance: Optimized Rust implementation with zero-copy operations where possible
- Full Compatibility: Produces the same tokens as OpenAI's tiktoken library
- All Encodings: Support for all OpenAI model encodings
- Pure Rust: Minimal dependencies, using only the standard library and well-maintained crates
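In Rust, "zero-copy" usually means handing back views into the caller's input instead of allocating new strings. As an illustration only, the hypothetical `split_pieces` function below splits text into pieces that are `&str` slices borrowed from the input, keeping a leading space attached to the following word. It is a deliberately simplified stand-in for tiktoken's regex-based pre-tokenizer, not this crate's actual splitting logic.

```rust
/// Hypothetical, simplified pre-tokenizer (NOT tiktoken's regex splitter).
/// Starts a new piece at each space that follows a non-space character, so
/// the leading space stays attached to the next word. Every returned piece
/// borrows from `text`, so no string data is copied or allocated.
pub fn split_pieces(text: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut start = 0;
    // Treat the start of the text as if preceded by a space, so a leading
    // space never produces an empty first piece.
    let mut prev_is_space = true;
    for (i, c) in text.char_indices() {
        let is_space = c == ' ';
        if is_space && !prev_is_space {
            // Close the current piece just before this space.
            pieces.push(&text[start..i]);
            start = i;
        }
        prev_is_space = is_space;
    }
    if start < text.len() {
        pieces.push(&text[start..]);
    }
    pieces
}
```

For "hello world" this yields `["hello", " world"]`, and the first piece points at the same memory as the input string.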
Quick Start
use tiktokenx::{get_encoding, encoding_for_model};
// Get encoding by name
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world", &[], &[]).unwrap();
let text = enc.decode(&tokens).unwrap();
// Decoding the tokens round-trips to the original text
assert_eq!(text, "hello world");
// Get encoding for a specific model
let enc = encoding_for_model("gpt-4").unwrap();
let token_count = enc.encode("Hello, world!", &[], &[]).unwrap().len();
println!("{} tokens", token_count);
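`encoding_for_model` maps a model name such as "gpt-4" to an encoding. OpenAI's Python tiktoken does this with an exact-name table plus a table of name prefixes for dated or variant models. The minimal sketch below mirrors that approach with a handful of entries. `lookup_encoding_name` and both tables are hypothetical and abbreviated, so they should not be read as this crate's actual mapping.

```rust
/// Hypothetical, abbreviated exact-name table (modeled on Python tiktoken).
const EXACT: &[(&str, &str)] = &[
    ("gpt-4o", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
    ("text-davinci-003", "p50k_base"),
    ("davinci", "r50k_base"),
];

/// Hypothetical prefix table for dated or variant names such as "gpt-4-0613".
const PREFIX: &[(&str, &str)] = &[
    ("gpt-4o-", "o200k_base"),
    ("gpt-4-", "cl100k_base"),
    ("gpt-3.5-turbo-", "cl100k_base"),
];

/// Hypothetical lookup: an exact match wins; otherwise the first matching
/// prefix is used; unknown models yield `None`.
pub fn lookup_encoding_name(model: &str) -> Option<&'static str> {
    if let Some(&(_, enc)) = EXACT.iter().find(|(m, _)| *m == model) {
        return Some(enc);
    }
    PREFIX
        .iter()
        .find(|(p, _)| model.starts_with(*p))
        .map(|&(_, enc)| enc)
}
```

Keeping the exact table separate from the prefix table means a bare name like "gpt-4" never has to rely on prefix order.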
Re-exports
pub use core::CoreBPE;
pub use core::Encoding;
pub use encodings::get_encoding;
pub use encodings::list_encodings;
pub use errors::Result;
pub use errors::TiktokenError;
pub use models::encoding_for_model;
pub use models::encoding_name_for_model;
pub use vendors::VendorProvider;
pub use vendors::VendorRegistry;
Modules
- core: Core BPE implementation and encoding structures
- encodings: Encoding definitions and registry
- errors: Error types for the tiktoken_rust library
- models: Model-to-encoding mappings
- vendors: Vendor-specific implementations for different AI providers
- vocab: Vocabulary loading utilities for tiktoken encodings
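OpenAI publishes each tiktoken vocabulary as a text file with one `<base64-encoded token bytes> <rank>` pair per line. Assuming the `vocab` module reads files in that format, loading one amounts to decoding each line into a byte-string-to-rank map, as in the minimal sketch below. `parse_ranks` and `base64_decode` are hypothetical helpers; the hand-written base64 decoder only keeps the example dependency-free, and a real loader would likely use a base64 crate and also handle downloading and caching.

```rust
use std::collections::HashMap;

/// Hypothetical helper: decode standard base64 (RFC 4648 alphabet, '='
/// padding). Returns `None` on any invalid character.
fn base64_decode(s: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::new();
    let mut buf: u32 = 0; // pending bits not yet emitted as a byte
    let mut bits: u32 = 0; // number of pending bits in `buf`
    for &c in s.trim_end_matches('=').as_bytes() {
        buf = (buf << 6) | val(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1u32 << bits) - 1; // keep only the unconsumed bits
        }
    }
    Some(out)
}

/// Hypothetical loader: parse "<base64 token> <rank>" lines into a rank map.
pub fn parse_ranks(contents: &str) -> Result<HashMap<Vec<u8>, u32>, String> {
    let mut ranks = HashMap::new();
    for (i, line) in contents.lines().enumerate() {
        let n = i + 1; // 1-based line number for error messages
        if line.trim().is_empty() {
            continue; // tolerate blank lines, e.g. a trailing newline
        }
        let mut fields = line.split_whitespace();
        let (token_b64, rank_str) = match (fields.next(), fields.next()) {
            (Some(t), Some(r)) => (t, r),
            _ => return Err(format!("line {}: expected `<token> <rank>`", n)),
        };
        let token = base64_decode(token_b64)
            .ok_or_else(|| format!("line {}: invalid base64", n))?;
        let rank: u32 = rank_str
            .parse()
            .map_err(|e| format!("line {}: invalid rank: {}", n, e))?;
        ranks.insert(token, rank);
    }
    Ok(ranks)
}
```

For example, the two lines `IQ== 0` and `aGVsbG8= 1` load as `"!" -> 0` and `"hello" -> 1`.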
Functions
- get_encoding_for_any_model: Get encoding for any model from any supported vendor
- get_encoding_from_any_vendor: Get encoding from any supported vendor
- list_all_supported_encodings: List all supported encodings from all vendors
- list_all_supported_models: List all supported models from all vendors
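Functions like `get_encoding_for_any_model` suggest a registry that asks each vendor in turn until one recognizes the model. The minimal sketch below shows that dispatch pattern with one trait object per vendor. The `Provider` trait, `Registry`, and the toy `PrefixVendor` are hypothetical stand-ins; this crate's `VendorProvider` and `VendorRegistry` may be shaped differently, and the vendor names and encodings in the test are placeholders.

```rust
/// Hypothetical stand-in for a vendor provider: each vendor knows which
/// models it serves and which encoding each one uses.
pub trait Provider {
    fn name(&self) -> &str;
    fn encoding_for_model(&self, model: &str) -> Option<String>;
}

/// Hypothetical registry that queries vendors in registration order.
pub struct Registry {
    providers: Vec<Box<dyn Provider>>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { providers: Vec::new() }
    }

    pub fn register(&mut self, provider: Box<dyn Provider>) {
        self.providers.push(provider);
    }

    /// Return (vendor name, encoding name) from the first vendor that
    /// recognizes `model`, or `None` if no vendor does.
    pub fn encoding_for_any_model(&self, model: &str) -> Option<(String, String)> {
        self.providers.iter().find_map(|p| {
            p.encoding_for_model(model)
                .map(|enc| (p.name().to_string(), enc))
        })
    }
}

/// Toy vendor (hypothetical): serves every model whose name starts with
/// `prefix`, always using the same `encoding`.
pub struct PrefixVendor {
    pub vendor: String,
    pub prefix: String,
    pub encoding: String,
}

impl Provider for PrefixVendor {
    fn name(&self) -> &str {
        &self.vendor
    }

    fn encoding_for_model(&self, model: &str) -> Option<String> {
        model.starts_with(&self.prefix).then(|| self.encoding.clone())
    }
}
```

Because the first match wins, registration order decides which vendor answers when two vendors claim the same model name.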
Type Aliases
- TiktokenResult: The main result type used throughout the library
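A crate-wide alias such as `TiktokenResult` typically fixes the error type once, so every fallible function can return `SomeResult<T>` and errors propagate with `?`. The minimal sketch below shows that pattern under hypothetical names (`TokenizerError`, `TokenizerResult`, `check_encoding`, `describe`); the real `TiktokenError` variants and the exact alias definition are not listed on this page.

```rust
use std::fmt;

/// Hypothetical error type standing in for the crate's TiktokenError.
#[derive(Debug, PartialEq)]
pub enum TokenizerError {
    UnknownEncoding(String),
}

impl fmt::Display for TokenizerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TokenizerError::UnknownEncoding(name) => write!(f, "unknown encoding: {}", name),
        }
    }
}

impl std::error::Error for TokenizerError {}

/// Hypothetical crate-wide alias: the error type is fixed once here.
pub type TokenizerResult<T> = Result<T, TokenizerError>;

/// Hypothetical check that accepts only the four encoding names above.
pub fn check_encoding(name: &str) -> TokenizerResult<&str> {
    match name {
        "cl100k_base" | "p50k_base" | "r50k_base" | "o200k_base" => Ok(name),
        other => Err(TokenizerError::UnknownEncoding(other.to_string())),
    }
}

/// `?` passes the error through unchanged because both functions share
/// the same alias and therefore the same error type.
pub fn describe(name: &str) -> TokenizerResult<String> {
    let enc = check_encoding(name)?;
    Ok(format!("using {}", enc))
}
```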