Pure Rust implementation of the GPT-2 byte-pair encoder (aka “text tokenizer”).
Structs
- Tokenizer
- Tokenizer which converts strings into token sequences consumable by the GPT-2 model, and vice versa.
Constants
- END_OF_TEXT_STRING
- END_OF_TEXT_TOKEN
- PAD_TOKEN - Token 50256 is used as the padding token, which corresponds to the `<|endoftext|>` token in the OpenAI GPT-2 encoder.
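As a rough illustration of how these constants relate, here is a minimal self-contained sketch. The constant names mirror the list above, and the value 50256 for the padding/end-of-text token is stated in the docs; the concrete types, the string value of `END_OF_TEXT_STRING`, and the `pad_to` helper are assumptions for illustration, not this crate's actual API.

```rust
// Hypothetical definitions mirroring the documented constants.
// PAD_TOKEN = 50256 is from the docs; everything else is assumed.
pub const END_OF_TEXT_STRING: &str = "<|endoftext|>";
pub const END_OF_TEXT_TOKEN: u32 = 50256;
pub const PAD_TOKEN: u32 = END_OF_TEXT_TOKEN; // padding reuses <|endoftext|>

/// Illustrative helper (not part of the crate): right-pad a token
/// sequence to a fixed length with PAD_TOKEN.
fn pad_to(mut tokens: Vec<u32>, len: usize) -> Vec<u32> {
    while tokens.len() < len {
        tokens.push(PAD_TOKEN);
    }
    tokens
}

fn main() {
    // Two arbitrary token ids, padded out to length 4.
    let padded = pad_to(vec![15496, 995], 4);
    assert_eq!(padded, vec![15496, 995, 50256, 50256]);
    println!("{:?}", padded);
}
```

Reusing `<|endoftext|>` as the pad token is common for GPT-2, since the original encoder defines no dedicated padding symbol.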