Crate bpe_openai

Modules§

appendable_encoder
backtrack_encoder
byte_pair_encoding
interval_encoding
prependable_encoder

Structs§

Pretokenizer
Tokenizer
A byte-pair encoding tokenizer that supports a pre-tokenization regex. The direct methods on this type pre-tokenize the input text and should produce the same output as the tiktoken tokenizers. The type gives access to the regex and underlying byte-pair encoding if needed. Note that using the byte-pair encoding directly does not take the regex into account and may result in output that differs from tiktoken.
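A typical use of `Tokenizer` goes through one of the crate's constructor functions and then calls the direct (pre-tokenizing) methods. The sketch below assumes the method names `count`, `encode`, and `decode`, which may differ from the actual API; consult the generated item docs for the exact signatures.

```rust
// Sketch only: assumes `bpe_openai` is a dependency and that `Tokenizer`
// exposes `count`, `encode`, and `decode` methods (hypothetical names).
use bpe_openai::cl100k_base;

fn main() {
    // Obtain the cl100k tokenizer (see Functions below).
    let tokenizer = cl100k_base();

    let text = "Hello, world!";

    // Direct methods pre-tokenize the input with the regex first,
    // so the output should match the tiktoken tokenizers.
    let count = tokenizer.count(text);
    let tokens = tokenizer.encode(text);

    println!("{count} tokens: {tokens:?}");

    // Round-trip back to text.
    if let Some(decoded) = tokenizer.decode(&tokens) {
        assert_eq!(decoded, text);
    }
}
```

Bypassing these methods and calling the underlying byte-pair encoding directly skips the pre-tokenization regex, which is why the docs warn that doing so may diverge from tiktoken's output.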

Functions§

cl100k_base
o200k_base
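The two functions correspond to OpenAI's cl100k and o200k encodings (used by models such as GPT-4 and GPT-4o, respectively). A minimal sketch of selecting between them, assuming both return a reference to a ready-to-use `Tokenizer`:

```rust
// Sketch only: assumes both functions return a usable `Tokenizer`
// reference and that `Tokenizer` has a `count` method (hypothetical name).
use bpe_openai::{cl100k_base, o200k_base};

fn token_count(model: &str, text: &str) -> usize {
    // Pick the encoding that matches the target model family.
    let tokenizer = match model {
        "gpt-4o" => o200k_base(),
        _ => cl100k_base(),
    };
    tokenizer.count(text)
}

fn main() {
    let text = "Byte-pair encoding splits text into subword tokens.";
    // The two vocabularies generally produce different token counts
    // for the same input.
    println!("cl100k: {}", token_count("gpt-4", text));
    println!("o200k:  {}", token_count("gpt-4o", text));
}
```

Because building a tokenizer involves loading a large vocabulary, such constructor functions are typically backed by lazily initialized statics, so calling them repeatedly is cheap; check the function docs to confirm whether that holds here.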