pub struct TokenizerState {
pub vocab: HashMap<u32, String>,
pub merges: Vec<(u32, u32, u32)>,
pub special_tokens: HashMap<String, u32>,
}
A serializable snapshot of a trained tokenizer.
Fields
vocab: HashMap<u32, String>
id → token string
merges: Vec<(u32, u32, u32)>
(left_id, right_id, merged_id)
special_tokens: HashMap<String, u32>
special token name → id (e.g. "<BOS>" → 1)
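To make the `(left_id, right_id, merged_id)` triples concrete, here is a minimal sketch of how such merge rules are typically applied to a sequence of ids in BPE-style tokenizers. The function `apply_merges` is hypothetical and not part of this crate, and the sketch assumes that `merges` is stored in application order; the page above does not confirm either point.

```rust
/// Hypothetical helper (not part of this crate): applies each merge rule in
/// order, rewriting every adjacent `(left, right)` pair into `merged`.
/// Assumes the rules are stored in the order they should be applied.
fn apply_merges(ids: &[u32], merges: &[(u32, u32, u32)]) -> Vec<u32> {
    let mut seq = ids.to_vec();
    for &(left, right, merged) in merges {
        let mut out = Vec::with_capacity(seq.len());
        let mut i = 0;
        while i < seq.len() {
            if i + 1 < seq.len() && seq[i] == left && seq[i + 1] == right {
                // Matching pair: emit the merged id and skip both inputs.
                out.push(merged);
                i += 2;
            } else {
                out.push(seq[i]);
                i += 1;
            }
        }
        seq = out;
    }
    seq
}
```

With the rules `[(0, 1, 3), (3, 2, 4)]`, the input `[0, 1, 2, 0, 1]` first becomes `[3, 2, 3]` and then `[4, 3]`.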
Implementations
impl TokenizerState
pub fn from_trained(trained: &TrainedTokenizer) -> Self
Build a TokenizerState from a crate::trainer::TrainedTokenizer.
pub fn vocab_size(&self) -> usize
Number of vocabulary entries.
pub fn save_to<W: Write>(&self, writer: &mut W) -> Result<(), SerializationError>
Save to a writer.
The format is deterministic: vocab entries are written sorted by id, merges in their original order, special tokens sorted by name.
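The ordering guarantee matters because `HashMap` iteration order is unspecified, so writing the maps directly could produce different bytes for the same state. The sketch below illustrates the sorting step only. The function `write_sorted` and the tab-separated `V`/`M`/`S` line layout are assumptions made for illustration; the actual format written by `save_to` is not described on this page.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical writer (not this crate's format): emits vocab entries sorted
/// by id, merges in their stored order, and special tokens sorted by name.
fn write_sorted<W: Write>(
    writer: &mut W,
    vocab: &HashMap<u32, String>,
    merges: &[(u32, u32, u32)],
    special_tokens: &HashMap<String, u32>,
) -> io::Result<()> {
    // Collect and sort so the output does not depend on hash iteration order.
    let mut vocab_entries: Vec<(&u32, &String)> = vocab.iter().collect();
    vocab_entries.sort_by_key(|(id, _)| **id);
    for (id, token) in vocab_entries {
        writeln!(writer, "V\t{}\t{}", id, token)?;
    }

    // Merges are a Vec, so their order is already stable.
    for (left, right, merged) in merges {
        writeln!(writer, "M\t{}\t{}\t{}", left, right, merged)?;
    }

    let mut specials: Vec<(&String, &u32)> = special_tokens.iter().collect();
    specials.sort_by(|a, b| a.0.cmp(b.0));
    for (name, id) in specials {
        writeln!(writer, "S\t{}\t{}", name, id)?;
    }
    Ok(())
}
```

A real format would also need to escape tokens that contain tabs or newlines, which this sketch does not handle.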
pub fn load_from<R: BufRead>(reader: &mut R) -> Result<Self, SerializationError>
Load from a reader.
pub fn load(path: &Path) -> Result<Self, SerializationError>
Load from a file path.
pub fn to_oxi_tokenizer(&self) -> OxiTokenizer
Convert to a crate::OxiTokenizer (char-level fallback using this state's vocab).
Trait Implementations
impl Debug for TokenizerState
Auto Trait Implementations
impl Freeze for TokenizerState
impl RefUnwindSafe for TokenizerState
impl Send for TokenizerState
impl Sync for TokenizerState
impl Unpin for TokenizerState
impl UnsafeUnpin for TokenizerState
impl UnwindSafe for TokenizerState
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.