Text tokenizer for embeddings.
Provides BPE (Byte Pair Encoding) and WordPiece tokenization, vocabulary management, special token handling, bidirectional token-to-ID mapping, text encoding/decoding, max-sequence-length truncation, and batch tokenization.
Structs
- EncodeResult - The result of encoding a piece of text.
- MergeRule - A single BPE merge rule: the pair (left, right) is merged into merged.
- Tokenizer - A text tokenizer supporting BPE and WordPiece sub-word algorithms.
- TokenizerConfig - Configuration for building a Tokenizer.
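To make the merge-rule description concrete, here is a minimal sketch of how a list of BPE merge rules might be applied to one word. The struct below is a hypothetical stand-in, not the crate's actual MergeRule: its field names (left, right, merged) are taken from the description above, while the field types and the apply_merges helper are assumptions made for illustration. It uses the simplified strategy of replaying each rule once in priority order.

```rust
/// Hypothetical stand-in for the crate's `MergeRule`.
/// Field names follow the docs; the `String` types are an assumption.
#[derive(Debug, Clone, PartialEq)]
struct MergeRule {
    left: String,
    right: String,
    merged: String,
}

/// Illustrative helper (not part of the documented API): splits `word`
/// into single characters, then replays each rule once, in order,
/// merging every adjacent (left, right) pair into `merged`.
fn apply_merges(word: &str, rules: &[MergeRule]) -> Vec<String> {
    let mut symbols: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    for rule in rules {
        let mut out = Vec::with_capacity(symbols.len());
        let mut i = 0;
        while i < symbols.len() {
            let is_pair = i + 1 < symbols.len()
                && symbols[i] == rule.left
                && symbols[i + 1] == rule.right;
            if is_pair {
                // Consume both symbols and emit the merged one.
                out.push(rule.merged.clone());
                i += 2;
            } else {
                out.push(symbols[i].clone());
                i += 1;
            }
        }
        symbols = out;
    }
    symbols
}
```

With the rules (l, o) -> lo followed by (lo, w) -> low, calling apply_merges on "lower" yields the symbols low, e, r. Rule order matters: the second rule can only fire because the first has already produced lo.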
Enums
- SpecialToken - Well-known special tokens used by transformer models.
- TokenizerMode - The sub-word algorithm used by the tokenizer.
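For the WordPiece side of TokenizerMode, the usual approach is greedy longest-match-first lookup against a vocabulary. The sketch below illustrates that general technique and is not this crate's implementation: the wordpiece function, the "##" continuation prefix, and passing the unknown token as a plain string are all assumptions borrowed from common BERT-style tokenizers.

```rust
use std::collections::HashSet;

/// Illustrative greedy longest-match-first WordPiece split (hypothetical helper).
/// Non-initial pieces carry a "##" prefix; if any position cannot be matched,
/// the whole word maps to the `unk` token.
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found: Option<String> = None;
        // Try the longest remaining substring first, shrinking from the right.
        while end > start {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            // No vocabulary entry covers this position.
            None => return vec![unk.to_string()],
        }
    }
    pieces
}
```

Given a vocabulary containing un, ##aff and ##able, the word "unaffable" splits into un, ##aff, ##able, while a word with no matching pieces collapses to the unknown token, which is where a special token such as those listed under SpecialToken would typically come in.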