Crate bert_tokenizer


This crate is a Rust port of Google’s BERT WordPiece tokenizer.

Structs

BasicTokenizer — A tokenizer that runs basic tokenization (punctuation splitting, lower casing, etc.). By default, it does not lower case the input.
FullTokenizer — A tokenizer that runs basic tokenization followed by WordPiece tokenization.
WordPieceTokenizer — A subword tokenizer that runs the WordPiece tokenization algorithm.
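The WordPiece step listed above uses greedy longest-match-first segmentation: starting from the beginning of a word, take the longest prefix found in the vocabulary, mark every non-initial piece with `##`, and fall back to `[UNK]` when no piece matches. The following is an illustrative re-implementation of that algorithm, not this crate's actual code; the vocabulary contents and the `max_chars` cutoff are assumptions for the example:

```rust
use std::collections::HashSet;

// Greedy longest-match-first WordPiece segmentation, as used by BERT.
// Illustrative sketch only; bert_tokenizer's WordPieceTokenizer may differ in detail.
fn wordpiece(word: &str, vocab: &HashSet<String>, max_chars: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    // Overly long words are mapped straight to the unknown token.
    if chars.len() > max_chars {
        return vec!["[UNK]".to_string()];
    }
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut cur: Option<String> = None;
        // Shrink the candidate substring from the right until it is in the vocab.
        while start < end {
            let mut substr: String = chars[start..end].iter().collect();
            if start > 0 {
                // Non-initial pieces carry the "##" continuation marker.
                substr = format!("##{}", substr);
            }
            if vocab.contains(&substr) {
                cur = Some(substr);
                break;
            }
            end -= 1;
        }
        match cur {
            Some(p) => {
                pieces.push(p);
                start = end;
            }
            // No prefix matched at all: the whole word becomes [UNK].
            None => return vec!["[UNK]".to_string()],
        }
    }
    pieces
}

fn main() {
    let vocab: HashSet<String> = ["un", "##aff", "##able"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Prints ["un", "##aff", "##able"]
    println!("{:?}", wordpiece("unaffable", &vocab, 100));
}
```

The basic-tokenization pass (punctuation splitting, optional lower casing) runs before this step, so `wordpiece` only ever sees a single whitespace-free word at a time.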

Traits

A trait for tokenizing text, implemented by BasicTokenizer and WordPieceTokenizer.

Functions

Loads a vocabulary from a vocabulary file. Calling this function directly is not recommended; prefer FullTokenizerBuilder::vocab_from_file.
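BERT vocabulary files conventionally store one token per line, with the zero-based line number serving as the token id. A minimal sketch of such a loader, assuming that format (the helper names here are hypothetical and not this crate's API):

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Hypothetical helper: build a token -> id map from vocab text,
// where each line is one token and its id is the line index.
fn parse_vocab(text: &str) -> HashMap<String, usize> {
    text.lines()
        .enumerate()
        .map(|(id, token)| (token.trim_end().to_string(), id))
        .collect()
}

// Hypothetical helper: same, but reading from a vocab.txt on disk.
fn load_vocab(path: &str) -> std::io::Result<HashMap<String, usize>> {
    let reader = BufReader::new(File::open(path)?);
    let mut vocab = HashMap::new();
    for (id, line) in reader.lines().enumerate() {
        vocab.insert(line?.trim_end().to_string(), id);
    }
    Ok(vocab)
}

fn main() {
    let vocab = parse_vocab("[PAD]\n[UNK]\n[CLS]\n[SEP]\nthe");
    // "the" sits on line 4 (zero-based), so its id is 4.
    println!("{:?}", vocab.get("the"));
}
```

Going through a builder instead of loading the vocabulary by hand keeps the token-id mapping and the tokenizer configuration consistent, which is presumably why the documentation steers users toward FullTokenizerBuilder::vocab_from_file.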

Type Definitions