Crate bert_tokenizer
This crate is a Rust port of Google's BERT WordPiece tokenizer.
Structs
BasicTokenizer: A basic tokenizer that runs basic tokenization (punctuation splitting, lower casing, etc.). By default, it does not lower case the input.
FullTokenizer: Runs basic tokenization followed by WordPiece tokenization.
WordPieceTokenizer: A subword tokenizer that runs the WordPiece tokenization algorithm (sketched below).
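As a rough illustration of what the WordPiece step does, here is a minimal sketch of BERT's greedy longest-match-first subword splitting. The "##" continuation prefix and the [UNK] fallback follow the original BERT reference tokenizer; the function and variable names are illustrative and are not this crate's API.

```rust
use std::collections::HashSet;

/// Greedy longest-match-first WordPiece splitting (illustrative sketch,
/// not this crate's API). `vocab` holds whole-word pieces plus
/// continuation pieces prefixed with "##".
fn wordpiece_split(word: &str, vocab: &HashSet<String>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found: Option<String> = None;
        // Try the longest remaining substring first, shrinking it until
        // some piece is found in the vocabulary.
        while start < end {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece = format!("##{}", piece);
            }
            if vocab.contains(&piece) {
                found = Some(piece);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            // No piece matches: the whole word becomes the unknown token.
            None => return vec![unk.to_string()],
        }
    }
    pieces
}

fn main() {
    let vocab: HashSet<String> = ["un", "##aff", "##able", "[UNK]"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Prints ["un", "##aff", "##able"].
    println!("{:?}", wordpiece_split("unaffable", &vocab, "[UNK]"));
}
```

The reference implementation also maps words longer than a maximum character count straight to the unknown token; that guard is omitted here for brevity.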
Traits
A trait for tokenizing text, implemented by both BasicTokenizer and WordPieceTokenizer.
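To illustrate the pattern (not this crate's actual trait definition), a shared tokenizing trait lets callers accept any implementor behind one interface. The `Tokenize` trait, its `tokenize` method, and the stand-in `WhitespaceTokenizer` below are assumed names; consult the trait's documentation for the real signature.

```rust
// Assumed names for illustration only; not this crate's API.
trait Tokenize {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// A toy tokenizer that splits on whitespace, standing in for a real implementor.
struct WhitespaceTokenizer;

impl Tokenize for WhitespaceTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_string).collect()
    }
}

/// Generic code can work with any implementor through a trait object.
fn count_tokens(tokenizer: &dyn Tokenize, text: &str) -> usize {
    tokenizer.tokenize(text).len()
}

fn main() {
    let tokenizer = WhitespaceTokenizer;
    println!("{}", count_tokens(&tokenizer, "hello wordpiece world")); // prints 3
}
```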
Functions
Loads a vocabulary from a vocabulary file. Using this function directly is not recommended; use FullTokenizerBuilder::vocab_from_file instead.
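For reference, a BERT-style vocabulary file is plain text with one token per line, and a token's id is its zero-based line number. A minimal reader could look like the sketch below; `read_vocab` is an illustrative name rather than this crate's function, and callers should prefer FullTokenizerBuilder::vocab_from_file as noted above.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;

/// Reads a BERT-style vocab file: one token per line, where the
/// zero-based line number is the token id. Illustrative sketch only.
fn read_vocab(path: &str) -> io::Result<HashMap<String, usize>> {
    let contents = fs::read_to_string(path)?;
    Ok(contents
        .lines()
        .enumerate()
        .map(|(id, line)| (line.trim().to_string(), id))
        .collect())
}

fn main() -> io::Result<()> {
    // "vocab.txt" is a placeholder path.
    let vocab = read_vocab("vocab.txt")?;
    println!("loaded {} tokens", vocab.len());
    Ok(())
}
```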