pub struct BertTokenizer { /* private fields */ }
BERT word piece tokenizer.
This tokenizer splits CoNLL-X tokens into word pieces. For example, a sentence such as:
Veruntreute die AWO Spendengeld ?
Could be split (depending on the vocabulary) into the following word pieces:
Ver ##unt ##reute die A ##W ##O Spenden ##geld [UNK]
The tokenizer then returns the vocabulary index of each piece.
The unknown token (here [UNK]) can be specified while constructing a tokenizer.
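The splitting described above can be illustrated with a minimal, self-contained sketch of greedy longest-match word piece splitting. This is not the crate's implementation: the helper names `split_token` and `encode`, the toy vocabulary, and the choice to map a token with any unmatched part to the unknown piece are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split one token into word pieces by greedy
/// longest-prefix matching. Continuation pieces are stored in `vocab`
/// with a `##` prefix. If any part of the token cannot be matched, the
/// whole token is replaced by `unk` (an assumption of this sketch).
fn split_token<'a>(
    vocab: &HashMap<&'a str, usize>,
    token: &str,
    unk: &'a str,
) -> Vec<&'a str> {
    let mut pieces = Vec::new();
    let mut start = 0;

    while start < token.len() {
        // Try the longest candidate first, shrinking one byte at a time
        // and skipping positions that are not UTF-8 char boundaries.
        let mut end = token.len();
        let mut found = None;
        while end > start {
            if token.is_char_boundary(end) {
                let candidate = if start == 0 {
                    token[start..end].to_string()
                } else {
                    format!("##{}", &token[start..end])
                };
                if let Some((piece, _)) = vocab.get_key_value(candidate.as_str()) {
                    found = Some(*piece);
                    break;
                }
            }
            end -= 1;
        }

        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            None => return vec![unk],
        }
    }

    pieces
}

/// Hypothetical helper: split a whitespace-separated sentence and map
/// every piece to its vocabulary index. Panics if `unk` is not in `vocab`.
fn encode<'a>(vocab: &HashMap<&'a str, usize>, sentence: &str, unk: &'a str) -> Vec<usize> {
    sentence
        .split_whitespace()
        .flat_map(|token| split_token(vocab, token, unk))
        .map(|piece| vocab[piece])
        .collect()
}
```

With a toy vocabulary that contains the pieces from the example sentence, `split_token(&vocab, "Veruntreute", "[UNK]")` yields `Ver`, `##unt`, `##reute`, while `?` is not in the vocabulary and falls back to `[UNK]`.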
Implementations
impl BertTokenizer
pub fn new(word_pieces: WordPieces, unknown_piece: impl Into<String>) -> Self
Construct a tokenizer from wordpieces and the unknown piece.
pub fn open<P>(
    model_path: P,
    unknown_piece: impl Into<String>,
) -> Result<Self, TokenizerError>
pub fn read<R>(
    buf_read: R,
    unknown_piece: impl Into<String>,
) -> Result<BertTokenizer, TokenizerError>
where
    R: BufRead,
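As a rough illustration of what reading a vocabulary from a `BufRead` can look like, the sketch below loads one word piece per line and numbers the pieces by line order. The one-piece-per-line layout (as in BERT `vocab.txt` files) is an assumption here, and `read_vocab` is a hypothetical helper, not part of this crate's API.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Hypothetical helper: read one word piece per line from `buf_read`
/// and assign each piece the zero-based number of its line as index.
/// The file layout is an assumption, not taken from the crate.
fn read_vocab<R: BufRead>(buf_read: R) -> io::Result<HashMap<String, usize>> {
    let mut vocab = HashMap::new();
    for (idx, line) in buf_read.lines().enumerate() {
        // Propagate I/O errors; strip trailing whitespace such as '\r'.
        let piece = line?;
        vocab.insert(piece.trim_end().to_string(), idx);
    }
    Ok(vocab)
}
```

Because the function is generic over `BufRead`, it accepts a `BufReader<File>` as well as an in-memory byte slice, which is convenient for tests.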
Trait Implementations
impl Tokenize for BertTokenizer
Auto Trait Implementations
impl Freeze for BertTokenizer
impl RefUnwindSafe for BertTokenizer
impl Send for BertTokenizer
impl Sync for BertTokenizer
impl Unpin for BertTokenizer
impl UnwindSafe for BertTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.