Crate riptoken

§riptoken

Fast BPE tokenizer for LLMs — a drop-in compatible, faster reimplementation of OpenAI’s tiktoken.

§Design

riptoken is structured as three layers:

1. A pure-Rust core (CoreBPE) that can be used directly from Rust.
2. An optional PyO3 binding (enabled with the python feature).
3. A Python wrapper package shipped on PyPI.

The core BPE algorithm is a Rust port of tiktoken’s with several optimizations applied — see README.md for benchmarks and details.
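
For readers unfamiliar with the algorithm being ported, the following is a minimal sketch of the greedy lowest-rank merge that tiktoken-style BPE applies to each regex-split piece. It is an illustration only, not riptoken's implementation: the function name `byte_pair_encode`, the local `Rank` alias (assumed here to be `u32`), and the use of `std::collections::HashMap` are all assumptions made for this example, and the quadratic loop does not reflect any of the crate's optimizations.

```rust
use std::collections::HashMap;

/// Local stand-in for the crate's rank type (assumed to be u32 in this sketch).
type Rank = u32;

/// Hypothetical helper: encode one piece by repeatedly merging the adjacent
/// pair of parts whose concatenated bytes have the lowest rank in `ranks`.
/// Assumes every single byte occurring in `piece` is present in `ranks`;
/// the final lookup panics otherwise.
fn byte_pair_encode(piece: &[u8], ranks: &HashMap<Vec<u8>, Rank>) -> Vec<Rank> {
    // Start with one part per byte, stored as half-open (start, end) ranges.
    let mut parts: Vec<(usize, usize)> = (0..piece.len()).map(|i| (i, i + 1)).collect();

    loop {
        // Scan adjacent pairs and remember the one with the lowest rank.
        // On ties, the leftmost pair wins because the comparison is strict.
        let mut best: Option<(usize, Rank)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let merged = &piece[parts[i].0..parts[i + 1].1];
            if let Some(&rank) = ranks.get(merged) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }

        match best {
            // Merge the winning pair into a single part and rescan.
            Some((i, _)) => {
                parts[i].1 = parts[i + 1].1;
                parts.remove(i + 1);
            }
            // No adjacent pair is in the vocabulary: merging is finished.
            None => break,
        }
    }

    // Map each remaining part to its rank.
    parts.iter().map(|&(s, e)| ranks[&piece[s..e]]).collect()
}
```

With a toy vocabulary of `h` → 0, `i` → 1, `hi` → 2, this sketch encodes `b"hi"` as `[2]` and `b"hih"` as `[2, 0]`: the only mergeable pair is `hi`, and the trailing `h` stays a single-byte token.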

§Example

use riptoken::{CoreBPE, Rank};
use rustc_hash::FxHashMap;

// In practice you would load `encoder` from an o200k_base / cl100k_base
// vocabulary file via `riptoken::load_tiktoken_bpe`.
let mut encoder: FxHashMap<Vec<u8>, Rank> = FxHashMap::default();
encoder.insert(b"h".to_vec(), 0);
encoder.insert(b"i".to_vec(), 1);
encoder.insert(b"hi".to_vec(), 2);

let specials = FxHashMap::default();
let bpe = CoreBPE::new(encoder, specials, r"\w+").unwrap();

let tokens = bpe.encode_ordinary("hi");
assert_eq!(tokens, vec![2]);

Structs§

CoreBPE: The core BPE encoder/decoder.

Enums§

BuildError: Errors produced when constructing a CoreBPE.
DecodeError: Errors produced during decoding.

Type Aliases§

Rank: Integer rank of a token in the BPE vocabulary.
