This crate is a Rust port of Google’s BERT WordPiece tokenizer.
Structs§
- BasicTokenizer - A basic tokenizer that runs basic tokenization (punctuation splitting, lower casing, etc.). By default, it does not lower case the input.
- BasicTokenizerBuilder
- FullTokenizer - A full tokenizer that runs basic tokenization and WordPiece tokenization.
- FullTokenizerBuilder
- WordPieceTokenizer - A subword tokenizer that runs the WordPiece tokenization algorithm.
- WordPieceTokenizerBuilder
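To make the struct listing above concrete, here is a minimal sketch of the greedy longest-match-first WordPiece algorithm that a `WordPieceTokenizer` performs, assuming the standard BERT conventions (a `##` prefix on continuation pieces, an `[UNK]` token for out-of-vocabulary words). The function name and signature below are illustrative, not the crate's actual API.

```rust
use std::collections::HashSet;

// Greedy longest-match-first WordPiece: repeatedly take the longest
// vocabulary entry matching at the current position; continuation
// pieces (not at the start of the word) carry a "##" prefix.
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found: Option<String> = None;
        // Shrink the candidate span until it matches a vocabulary entry.
        while start < end {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece = format!("##{}", piece);
            }
            if vocab.contains(piece.as_str()) {
                found = Some(piece);
                break;
            }
            end -= 1;
        }
        match found {
            Some(p) => {
                pieces.push(p);
                start = end;
            }
            // No prefix matches: the whole word maps to the unknown token.
            None => return vec![unk.to_string()],
        }
    }
    pieces
}

fn main() {
    let vocab: HashSet<&str> = ["un", "##aff", "##able", "play", "##ing"]
        .into_iter()
        .collect();
    assert_eq!(
        wordpiece("unaffable", &vocab, "[UNK]"),
        vec!["un", "##aff", "##able"]
    );
    assert_eq!(wordpiece("playing", &vocab, "[UNK]"), vec!["play", "##ing"]);
    assert_eq!(wordpiece("xyz", &vocab, "[UNK]"), vec!["[UNK]"]);
}
```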
Traits§
- Tokenizer - A trait for tokenizing text. This trait is implemented by BasicTokenizer and WordPieceTokenizer.
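As a rough illustration of the trait pattern described above, the sketch below defines a tokenizing trait with a single method and one toy implementor. Everything except the trait name `Tokenizer` is an assumption; the crate's actual method names and signatures may differ.

```rust
// Hypothetical sketch: a trait exposing one tokenize method, as the
// crate's Tokenizer trait is described; details are assumed.
trait Tokenizer {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

// Toy implementor standing in for the crate's BasicTokenizer:
// splits on whitespace and lower-cases each token.
struct WhitespaceLower;

impl Tokenizer for WhitespaceLower {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

fn main() {
    let t = WhitespaceLower;
    assert_eq!(t.tokenize("Hello  World"), vec!["hello", "world"]);
}
```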
Functions§
- load_vocab - Load a vocabulary from a vocabulary file. Using this function directly is not recommended; use FullTokenizerBuilder::vocab_from_file instead.
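For reference, BERT vocabulary files conventionally hold one token per line, with the token's id given by its line number. The sketch below loads such a file with only the standard library; the function name `load_vocab_sketch` and the returned map type are assumptions, not the crate's `load_vocab` signature.

```rust
use std::collections::HashMap;
use std::fs;

// Load a BERT-style vocabulary file: one token per line,
// id = zero-based line number. Illustrative only.
fn load_vocab_sketch(path: &str) -> std::io::Result<HashMap<String, usize>> {
    let contents = fs::read_to_string(path)?;
    Ok(contents
        .lines()
        .enumerate()
        .map(|(i, tok)| (tok.to_string(), i))
        .collect())
}

fn main() -> std::io::Result<()> {
    // Write a tiny demo vocabulary to a temp file, then load it back.
    let path = std::env::temp_dir().join("vocab_demo.txt");
    fs::write(&path, "[PAD]\n[UNK]\nhello\n##world\n")?;
    let vocab = load_vocab_sketch(path.to_str().unwrap())?;
    assert_eq!(vocab["[UNK]"], 1);
    assert_eq!(vocab["##world"], 3);
    Ok(())
}
```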