SentencePiece runtime in Rust.
This crate loads existing SentencePiece .model / .spm files and exposes
a small processor API for normalization, encoding, and decoding.
Structs§
- Normalizer
- SentencePiece-compatible normalizer.
- Piece
- A vocabulary entry from a SentencePiece model.
- SentencePieceModel
- Loaded SentencePiece model and vocabulary metadata.
- SentencePieceProcessor
- Main API for loading a SentencePiece model and tokenizing text.
Enums§
Constants§
- DEFAULT_UNKNOWN_SURFACE
- Default decoded surface for the <unk> piece.
- REPLACEMENT_CHARACTER
- Unicode replacement character, U+FFFD.
- SPACE_SYMBOL
- SentencePiece’s visible whitespace marker, U+2581 LOWER ONE EIGHTH BLOCK.
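
To illustrate what the whitespace marker is for, here is a minimal sketch of the usual SentencePiece convention: spaces become U+2581 before tokenization and are restored on decode. It does not use this crate's API. The constant is redefined locally (the crate's own `SPACE_SYMBOL` may have a different type), and `escape_whitespace` and `unescape_whitespace` are hypothetical helper names. The leading marker assumes the common "dummy prefix" setting, which a given model may disable.

```rust
/// U+2581 LOWER ONE EIGHTH BLOCK, redefined locally for this sketch.
const SPACE_SYMBOL: char = '\u{2581}';

/// Hypothetical helper: prepend the marker and replace every ASCII space with it.
/// Assumes the model uses a dummy prefix, which is a common default.
fn escape_whitespace(text: &str) -> String {
    let marker = SPACE_SYMBOL.to_string();
    format!("{}{}", marker, text.replace(' ', &marker))
}

/// Hypothetical helper: join pieces, turn markers back into spaces,
/// and drop the single leading space introduced by the dummy prefix.
fn unescape_whitespace(pieces: &[&str]) -> String {
    let joined: String = pieces.concat();
    let restored = joined.replace(SPACE_SYMBOL, " ");
    restored
        .strip_prefix(' ')
        .unwrap_or(restored.as_str())
        .to_string()
}
```

Because the marker is an ordinary visible character, word boundaries survive being split into pieces: concatenating the pieces and mapping the marker back to a space recovers the original spacing.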
Type Aliases§
- Result
- Crate-wide result type.