Crate sentencepiece
This crate binds the sentencepiece library. sentencepiece is an unsupervised text tokenizer.
The main data structure of this crate is SentencePieceProcessor, which is used to tokenize sentences:
use sentencepiece::SentencePieceProcessor;

let spp = SentencePieceProcessor::open("testdata/toy.model").unwrap();
let pieces = spp.encode("I saw a girl with a telescope.").unwrap()
    .into_iter().map(|p| p.piece).collect::<Vec<_>>();
assert_eq!(pieces, vec!["▁I", "▁saw", "▁a", "▁girl", "▁with", "▁a", "▁t",
    "el", "es", "c", "o", "pe", "."]);
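The pieces in the example above use the sentencepiece meta symbol "▁" (U+2581) to mark where whitespace stood in the input. As a minimal sketch of how such pieces map back to text, the helper below concatenates them and turns the meta symbol into spaces. The name detokenize is hypothetical and not part of this crate; it uses only the standard library and assumes the default whitespace convention.

```rust
/// Joins sentencepiece pieces back into plain text.
/// Hypothetical helper, not part of the crate's API.
/// Assumption: whitespace is encoded with the meta symbol '▁' (U+2581).
fn detokenize(pieces: &[&str]) -> String {
    // Concatenate all pieces, then turn each meta symbol into a space.
    let joined: String = pieces.concat().replace('\u{2581}', " ");
    // The first piece usually starts with the meta symbol, which leaves
    // a single leading space; drop it if present.
    joined.strip_prefix(' ').unwrap_or(joined.as_str()).to_string()
}
```

Applied to the pieces from the example, this would give back the original sentence, "I saw a girl with a telescope."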
Structs
PieceWithId | Sentence piece with its identifier and string span. |
SentencePieceProcessor | Sentence piece tokenizer. |
Enums
CSentencePieceError | Errors that are returned by the sentencepiece library. |
SentencePieceError |