Crate tiktokenx

§tiktoken_rust

A high-performance Rust implementation of OpenAI’s tiktoken library.

This library provides fast byte pair encoding (BPE) tokenization compatible with OpenAI’s models. It supports all current OpenAI encodings, including cl100k_base, p50k_base, r50k_base, and o200k_base.
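For readers new to BPE, the following is a minimal sketch of the greedy merge step that tiktoken-style tokenizers perform on each piece of text. It is not this crate's implementation: `bpe_encode`, `toy_ranks`, and the five-entry vocabulary are hypothetical names and data invented for illustration, and the sketch assumes every single byte of the input has an entry in the rank table.

```rust
use std::collections::HashMap;

/// Hypothetical toy vocabulary: byte sequence -> rank (also used as token id).
fn toy_ranks() -> HashMap<Vec<u8>, u32> {
    let mut ranks = HashMap::new();
    ranks.insert(b"a".to_vec(), 0);
    ranks.insert(b"b".to_vec(), 1);
    ranks.insert(b"c".to_vec(), 2);
    ranks.insert(b"ab".to_vec(), 3);
    ranks.insert(b"abc".to_vec(), 4);
    ranks
}

/// Greedy BPE over one piece: repeatedly merge the adjacent pair whose
/// concatenation has the lowest rank, until no adjacent pair is in `ranks`.
/// Panics if a remaining part is missing from `ranks` (assumed not to happen).
fn bpe_encode(piece: &[u8], ranks: &HashMap<Vec<u8>, u32>) -> Vec<u32> {
    // Start with one part per input byte.
    let mut parts: Vec<Vec<u8>> = piece.iter().map(|&b| vec![b]).collect();

    loop {
        // Find the leftmost adjacent pair with the lowest merge rank, if any.
        let mut best: Option<(usize, u32)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut merged = parts[i].clone();
            merged.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&merged) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            Some((i, _)) => {
                // Merge parts[i] and parts[i + 1] into a single part.
                let right = parts.remove(i + 1);
                parts[i].extend_from_slice(&right);
            }
            None => break,
        }
    }

    // Look up the token id of each remaining part.
    parts.iter().map(|p| ranks[p]).collect()
}

fn main() {
    let ranks = toy_ranks();
    // "abcab" merges to ["abc", "ab"], i.e. token ids [4, 3].
    let tokens = bpe_encode(b"abcab", &ranks);
    assert_eq!(tokens, vec![4, 3]);
    println!("{:?}", tokens);
}
```

This quadratic loop favors clarity over speed; production tokenizers split the input with a regex first and use faster merge bookkeeping.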

§Features

  • High Performance: Optimized Rust implementation with zero-copy operations where possible
  • Full Compatibility: Exact compatibility with OpenAI’s tiktoken library
  • All Encodings: Support for all OpenAI model encodings
  • Pure Rust: Minimal dependencies, using only the standard library and well-maintained crates

§Quick Start

use tiktokenx::{get_encoding, encoding_for_model};

// Get encoding by name
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world", &[], &[]).unwrap();
let text = enc.decode(&tokens).unwrap();
assert_eq!(text, "hello world");

// Get encoding for a specific model
let enc = encoding_for_model("gpt-4").unwrap();
let token_count = enc.encode("Hello, world!", &[], &[]).unwrap().len();

Re-exports§

pub use core::CoreBPE;
pub use core::Encoding;
pub use encodings::get_encoding;
pub use encodings::list_encodings;
pub use errors::Result;
pub use errors::TiktokenError;
pub use models::encoding_for_model;
pub use models::encoding_name_for_model;
pub use vendors::VendorProvider;
pub use vendors::VendorRegistry;

Modules§

core
Core BPE implementation and encoding structures
encodings
Encoding definitions and registry
errors
Error types for the tiktoken_rust library
models
Model to encoding mappings
vendors
Vendor-specific implementations for different AI providers
vocab
Vocabulary loading utilities for tiktoken encodings
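As an illustration of what a model-to-encoding mapping such as the models module involves, here is a minimal sketch of a lookup that tries exact model names first and then known name prefixes. The function `sketch_encoding_name` and its tables are hypothetical; the entries mirror a few mappings from OpenAI's Python tiktoken, and whether this crate resolves names the same way is an assumption.

```rust
/// Hypothetical lookup, not this crate's API: resolve a model name to an
/// encoding name by exact match first, then by known prefix.
fn sketch_encoding_name(model: &str) -> Option<&'static str> {
    // A few exact model names (subset chosen for illustration).
    const EXACT: &[(&str, &str)] = &[
        ("gpt-4o", "o200k_base"),
        ("gpt-4", "cl100k_base"),
        ("gpt-3.5-turbo", "cl100k_base"),
        ("text-davinci-003", "p50k_base"),
        ("davinci", "r50k_base"),
    ];
    // Prefixes that cover dated or suffixed variants such as "gpt-4-0613".
    const PREFIXES: &[(&str, &str)] = &[
        ("gpt-4o-", "o200k_base"),
        ("gpt-4-", "cl100k_base"),
        ("gpt-3.5-turbo-", "cl100k_base"),
    ];

    if let Some(&(_, enc)) = EXACT.iter().find(|(name, _)| *name == model) {
        return Some(enc);
    }
    PREFIXES
        .iter()
        .find(|(prefix, _)| model.starts_with(*prefix))
        .map(|&(_, enc)| enc)
}

fn main() {
    assert_eq!(sketch_encoding_name("gpt-4"), Some("cl100k_base"));
    assert_eq!(sketch_encoding_name("gpt-4-0613"), Some("cl100k_base"));
    assert_eq!(sketch_encoding_name("gpt-4o-mini"), Some("o200k_base"));
    assert_eq!(sketch_encoding_name("not-a-model"), None);
}
```

Checking exact names before prefixes keeps a name like "gpt-4o" from being claimed by a shorter, unrelated prefix.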

Functions§

get_encoding_for_any_model
Get encoding for any model from any supported vendor
get_encoding_from_any_vendor
Get encoding from any supported vendor
list_all_supported_encodings
List all supported encodings from all vendors
list_all_supported_models
List all supported models from all vendors

Type Aliases§

TiktokenResult
The main result type used throughout the library