Crate gpt_tokenizer

This file includes code modified from https://github.com/openai/gpt-2 and https://github.com/latitudegames/GPT-3-Encoder/blob/master/Encoder.js. It was converted from JavaScript with the help of ChatGPT 4.0.

Structs§

Default
Default tokenizer that uses embedded encoder and vocab values to create the encode and decode functions.

Constants§

ENCODER_JSON
VOCAB_BPE

Functions§

bpe_ranks
Constructs the bpe_ranks HashMap from a vocab.bpe file provided as a list of lines.
bytes_to_unicode
Constructs a byte-to-Unicode HashMap.
decode
Decodes an encoded string using a custom decoder and byte decoder created from the encoder that encoded the original string.
encode
Encodes a string using a custom bpe_ranks and encoder HashMaps.
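
Example§

The two table-building helpers above follow the scheme used by the upstream GPT-2 encoder. The sketch below illustrates that scheme in standalone Rust; it is not this crate's code. The names bytes_to_unicode_sketch and bpe_ranks_sketch, their signatures, and the choice to skip a leading "#version" header line are assumptions made for illustration, so check the item pages for the real signatures.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the GPT-2 byte-to-Unicode table.
/// Printable bytes map to themselves; every other byte is assigned
/// the next free code point starting at 256, in ascending byte order.
fn bytes_to_unicode_sketch() -> HashMap<u8, char> {
    let mut map = HashMap::new();
    let mut next = 0u32; // how many non-printable bytes have been seen so far
    for b in 0u32..=255 {
        // '!'..='~', '¡'..='¬', '®'..='ÿ' are kept as-is.
        let printable = (33..=126).contains(&b)
            || (161..=172).contains(&b)
            || (174..=255).contains(&b);
        let code = if printable {
            b
        } else {
            let c = 256 + next;
            next += 1;
            c
        };
        map.insert(b as u8, char::from_u32(code).expect("valid code point"));
    }
    map
}

/// Hypothetical sketch of building merge ranks from vocab.bpe lines.
/// Each non-empty line holds one merge pair; its position is its rank.
/// Assumption: a "#version" header line, if present, is skipped.
fn bpe_ranks_sketch(lines: &[&str]) -> HashMap<(String, String), usize> {
    lines
        .iter()
        .filter(|l| !l.starts_with("#version") && !l.trim().is_empty())
        .filter_map(|l| {
            let mut parts = l.split_whitespace();
            Some((parts.next()?.to_string(), parts.next()?.to_string()))
        })
        .enumerate()
        .map(|(rank, pair)| (pair, rank))
        .collect()
}
```

In this sketch a lower rank means the pair is merged earlier during encoding, and the byte table guarantees that every input byte, including whitespace and control bytes, has a visible single-character stand-in (a space becomes U+0120, for instance).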