Structs§
- Replaces occurences of \r\n by \n.
- Structure to implement Viterbi algorithm to find the best encoding, or sample from all possible encodings of a given sentence.
- A node from the lattice, that helps reconstruct the underlying
String
- A token and its score.
Enums§
- Unicode normalizer.
Traits§
- A processor is a step of the tokenization pipeline. It can be used to transform input sequences before they are fed to the model and to transform the output sequences after they are generated by the model.
Functions§
Type Aliases§
- An arbitrary sequence of bytes. Almost always valid UTF-8 but not guaranteed.
- A numerical ID for a token. Cannot be larger than
u32::MAX
.