Crate tokengeex

Structs§

CrlfProcessor
Replaces occurrences of \r\n with \n.
Lattice
Structure implementing the Viterbi algorithm to find the best encoding of a given sentence, or to sample from all of its possible encodings.
LocalTask
Model
Node
A node of the lattice that helps reconstruct the underlying String.
ScoredToken
A token and its score.
Task
Tokenizer
Trie
TrieIterator
VecPool
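
To make the Lattice description more concrete, here is a minimal sketch of Viterbi-style best-path search over a scored vocabulary. It does not use the crate's Lattice, Node, or ScoredToken types: the function name `viterbi_segment`, the `HashMap` vocabulary, and the log-probability-like scores are all assumptions chosen for illustration, and the sketch covers only the best-encoding case, not sampling.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of tokengeex): returns the segmentation of
/// `input` into vocabulary tokens with the highest total score, or `None`
/// if no segmentation exists.
fn viterbi_segment(input: &str, vocab: &HashMap<&str, f64>) -> Option<Vec<String>> {
    let n = input.len();
    // best[i] holds the best score of any segmentation of input[..i]
    // together with the byte offset where its last token starts.
    let mut best: Vec<Option<(f64, usize)>> = vec![None; n + 1];
    best[0] = Some((0.0, 0));

    for end in 1..=n {
        for start in 0..end {
            // Only extend prefixes that are themselves reachable.
            let Some((prev_score, _)) = best[start] else { continue };
            // `get` returns None when the span splits a UTF-8 character.
            let Some(piece) = input.get(start..end) else { continue };
            let Some(&score) = vocab.get(piece) else { continue };
            let candidate = prev_score + score;
            if best[end].map_or(true, |(current, _)| candidate > current) {
                best[end] = Some((candidate, start));
            }
        }
    }

    // Walk the back-pointers from the end of the input to recover the tokens.
    let mut tokens = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end]?;
        tokens.push(input[start..end].to_string());
        end = start;
    }
    tokens.reverse();
    Some(tokens)
}
```

With scores read as log-probabilities, a longer token wins whenever its score is higher than the sum of the scores of the shorter tokens it replaces.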

Enums§

Error
ProcessorWrapper
UnicodeProcessor
Unicode normalizer.

Traits§

Processor
A processor is a step of the tokenization pipeline. It can be used to transform input sequences before they are fed to the model and to transform the output sequences after they are generated by the model.
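
The idea of a pipeline step that transforms text both before and after the model can be sketched as follows. The trait `TextStep`, its method names, and the helpers `preprocess_all` and `postprocess_all` are hypothetical and chosen for illustration only; the crate's actual Processor trait may use different names and signatures. The `CrlfStep` type mirrors the documented behavior of CrlfProcessor (replacing \r\n with \n) without being its implementation.

```rust
/// Hypothetical trait for illustration; not the crate's `Processor`.
trait TextStep {
    /// Transforms input text before it reaches the model.
    fn preprocess(&self, input: &str) -> String;
    /// Transforms text produced by the model.
    fn postprocess(&self, output: &str) -> String;
}

/// Normalizes Windows line endings, as CrlfProcessor is documented to do.
struct CrlfStep;

impl TextStep for CrlfStep {
    fn preprocess(&self, input: &str) -> String {
        input.replace("\r\n", "\n")
    }

    fn postprocess(&self, output: &str) -> String {
        // Assumption: line endings are not restored on the way out.
        output.to_string()
    }
}

/// Applies every step's `preprocess` in order.
fn preprocess_all(steps: &[Box<dyn TextStep>], input: &str) -> String {
    steps
        .iter()
        .fold(input.to_string(), |acc, step| step.preprocess(&acc))
}

/// Applies every step's `postprocess` in reverse order, so the last step
/// applied on the way in is the first one undone on the way out.
fn postprocess_all(steps: &[Box<dyn TextStep>], output: &str) -> String {
    steps
        .iter()
        .rev()
        .fold(output.to_string(), |acc, step| step.postprocess(&acc))
}
```

Running post-processing in reverse order is a design choice of this sketch, not something the documentation above states.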

Functions§

make_vocab
mb_per_sec
new_default_vocab
par_chunk_size

Type Aliases§

Result
Token
An arbitrary sequence of bytes. Almost always valid UTF-8, but this is not guaranteed.
TokenID
A numerical ID for a token. Cannot be larger than u32::MAX.
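
The two constraints above (tokens may not be valid UTF-8, and IDs must fit in a u32) can be handled as in the sketch below. The alias definitions `Vec<u8>` and `u32` are assumptions made for illustration, and `display_token` and `to_token_id` are hypothetical helpers, not functions exported by the crate.

```rust
use std::convert::TryFrom;

// Assumed definitions for illustration; the crate's actual aliases may differ.
type Token = Vec<u8>;
type TokenID = u32;

/// Renders a token for display, replacing invalid UTF-8 sequences with
/// U+FFFD instead of failing.
fn display_token(token: &Token) -> String {
    String::from_utf8_lossy(token).into_owned()
}

/// Converts a vocabulary index to a token ID, returning `None` when the
/// index does not fit in a `u32`.
fn to_token_id(index: usize) -> Option<TokenID> {
    TokenID::try_from(index).ok()
}
```

Using a lossy conversion keeps display code infallible, while the checked conversion surfaces an oversized vocabulary index instead of silently truncating it.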