Struct sentencepiece::SentencePieceProcessor
pub struct SentencePieceProcessor { /* private fields */ }
Sentence piece tokenizer.
Instances of SentencePieceProcessor can be used to tokenize a sentence using a sentencepiece model.
Implementations§
impl SentencePieceProcessor
pub fn from_serialized_proto(data: &[u8]) -> Result<Self, SentencePieceError>
pub fn to_serialized_proto(&self) -> Vec<u8>
Serialize the model to protobuf.
pub fn open(path: impl AsRef<Path>) -> Result<Self, SentencePieceError>
Open a sentencepiece model.
pub fn bos_id(&self) -> Option<u32>
pub fn decode_piece_ids( &self, pieces: &[u32] ) -> Result<String, SentencePieceError>
Decode a sentence from piece identifiers.
pub fn decode_pieces( &self, pieces: &[impl AsRef<str>] ) -> Result<String, SentencePieceError>
pub fn encode( &self, sentence: &str ) -> Result<Vec<PieceWithId>, SentencePieceError>
Encode a sentence as sentence pieces and their identifiers.
pub fn eos_id(&self) -> Option<u32>
pub fn is_empty(&self) -> bool
pub fn len(&self) -> usize
pub fn pad_id(&self) -> Option<u32>
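The signatures above show that bos_id, eos_id, and pad_id return Option<u32>, presumably because a given model may not define those tokens. The sketch below shows one way to consume such optional identifiers. The helper with_sentence_markers is hypothetical and not part of this crate; it takes plain values so that it does not depend on a loaded model.

```rust
/// Hypothetical helper (not part of the sentencepiece crate): surround a
/// sequence of piece identifiers with BOS/EOS markers when they are defined.
fn with_sentence_markers(ids: &[u32], bos: Option<u32>, eos: Option<u32>) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len() + 2);
    // `Option<u32>` is an iterator over zero or one items, so `extend`
    // appends the marker only when it is `Some`.
    out.extend(bos);
    out.extend_from_slice(ids);
    out.extend(eos);
    out
}
```

With a processor named spp (an assumed variable name), the call would look like with_sentence_markers(&ids, spp.bos_id(), spp.eos_id()).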
pub fn piece_to_id(&self, piece: &str) -> Result<Option<u32>, NulError>
Get the identifier of a sentence piece.
pub fn sample_encode( &self, sentence: &str, n_best: usize, alpha: f32 ) -> Result<Vec<PieceWithId>, SentencePieceError>
Encode a sentence using sampling (subword regularization).
Sample for the n_best segmentations, where alpha controls the smoothness of the distribution.
This method panics when n_best > 512 or when alpha is not a (normal) positive floating point number.
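Because sample_encode panics on out-of-range arguments, callers that take n_best and alpha from user input may want to validate them first. The following is a minimal sketch of such a guard. The helper check_sampling_params and the constant MAX_N_BEST are hypothetical names, and the alpha check is one reading of "(normal) positive"; the crate's internal assertion may differ in detail.

```rust
/// Upper bound on `n_best`, taken from the panic condition documented above.
const MAX_N_BEST: usize = 512;

/// Hypothetical helper (not part of the sentencepiece crate): report invalid
/// sampling parameters as an error instead of letting `sample_encode` panic.
fn check_sampling_params(n_best: usize, alpha: f32) -> Result<(), String> {
    if n_best > MAX_N_BEST {
        return Err(format!(
            "n_best is {}, but must not exceed {}",
            n_best, MAX_N_BEST
        ));
    }
    // `is_normal` rejects zero, subnormal values, infinities, and NaN;
    // the sign check additionally rejects negative values.
    if !alpha.is_normal() || alpha.is_sign_negative() {
        return Err(format!(
            "alpha is {}, but must be a normal positive float",
            alpha
        ));
    }
    Ok(())
}
```

A caller would run check_sampling_params(n_best, alpha)? before invoking sample_encode(sentence, n_best, alpha), so that bad input surfaces as a recoverable error.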
pub fn unk_id(&self) -> u32
Trait Implementations§
impl Debug for SentencePieceProcessor
impl Drop for SentencePieceProcessor
impl Send for SentencePieceProcessor
impl Sync for SentencePieceProcessor
Auto Trait Implementations§
impl RefUnwindSafe for SentencePieceProcessor
impl Unpin for SentencePieceProcessor
impl UnwindSafe for SentencePieceProcessor
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.