pub struct ZeroCopyTokenizer { /* private fields */ }
Zero-copy tokenizer implementation
Implementations
impl ZeroCopyTokenizer
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self>
Load a tokenizer from a zero-copy file
pub fn header(&self) -> &ZeroCopyHeader
Get the header information
pub fn vocab_size(&self) -> usize
Get vocabulary size
pub fn get_token_unchecked(&self, id: u32) -> Option<&str>
Get a token by ID without copying
pub fn get_id_unchecked(&self, token: &str) -> Option<u32>
Get token ID without copying
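The "without copying" wording on `get_token_unchecked` and `get_id_unchecked` suggests that lookups hand back data borrowed from the tokenizer's own storage rather than freshly allocated strings. The sketch below illustrates that idea only. `BorrowedVocab` and its layout are hypothetical stand-ins; the real fields of `ZeroCopyTokenizer` are private and not documented here.

```rust
/// Hypothetical stand-in for a zero-copy vocabulary (not the crate's type):
/// all token text lives in one contiguous buffer, and each token is
/// described by an (offset, length) span into that buffer.
pub struct BorrowedVocab {
    data: String,
    /// One (start, len) span per token; the position in this Vec is the token ID.
    spans: Vec<(usize, usize)>,
}

impl BorrowedVocab {
    /// Packs the tokens into a single buffer, assigning IDs in input order.
    pub fn new(tokens: &[&str]) -> Self {
        let mut data = String::new();
        let mut spans = Vec::with_capacity(tokens.len());
        for tok in tokens {
            spans.push((data.len(), tok.len()));
            data.push_str(tok);
        }
        Self { data, spans }
    }

    /// Returns a slice borrowed from `data`; no new `String` is allocated.
    pub fn get_token(&self, id: u32) -> Option<&str> {
        let &(start, len) = self.spans.get(id as usize)?;
        self.data.get(start..start + len)
    }

    /// Linear scan kept for brevity; an on-disk format would more likely
    /// carry a hash table or sorted index (assumption).
    pub fn get_id(&self, token: &str) -> Option<u32> {
        self.spans
            .iter()
            .position(|&(start, len)| &self.data[start..start + len] == token)
            .map(|idx| idx as u32)
    }
}
```

With this stand-in, `BorrowedVocab::new(&["[PAD]", "hello"]).get_token(1)` yields `Some("hello")` as a borrowed slice. A caller that needs an owned value, as the `Tokenizer` trait's `id_to_token` signature requires, could convert with `.map(str::to_owned)`.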
pub fn get_vocab_entry(&self, index: usize) -> Option<&ZeroCopyVocabEntry>
Get vocabulary entry by index
pub fn vocab_entries(&self) -> impl Iterator<Item = &ZeroCopyVocabEntry>
Iterate over all vocabulary entries
pub fn metadata_bytes(&self) -> &[u8]
Get metadata section as bytes
pub fn special_tokens_bytes(&self) -> &[u8]
Get special tokens section as bytes
pub fn memory_stats(&self) -> ZeroCopyMemoryStats
Get memory usage statistics
pub fn verify_integrity(&self) -> Result<bool>
Verify file integrity using checksum
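This page does not say which checksum `verify_integrity` uses or where the expected value is stored. As a rough sketch of the general technique, the code below recomputes a hash over a payload and compares it with a stored value. The choice of FNV-1a and the names `fnv1a64` and `verify_payload` are assumptions made for illustration, not the crate's implementation.

```rust
/// 64-bit FNV-1a hash. Chosen here only because it is short and
/// dependency-free; the real file format may use a different checksum.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical helper: returns true when the checksum recomputed over
/// `payload` matches the value recorded at write time.
pub fn verify_payload(payload: &[u8], stored: u64) -> bool {
    fnv1a64(payload) == stored
}
```

A writer would record `fnv1a64(payload)` alongside the data, and a reader would call `verify_payload` after loading. The documented method returns `Result<bool>`, which leaves room to report I/O or format errors separately from a plain mismatch.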
Trait Implementations
impl Tokenizer for ZeroCopyTokenizer
fn encode(&self, text: &str) -> Result<TokenizedInput>
Encodes a single text string into tokens.
fn decode(&self, token_ids: &[u32]) -> Result<String>
Decodes token IDs back into text.
fn get_vocab(&self) -> HashMap<String, u32>
Returns a copy of the vocabulary as a mapping from tokens to IDs.
fn token_to_id(&self, token: &str) -> Option<u32>
Converts a token string to its corresponding ID.
fn id_to_token(&self, id: u32) -> Option<String>
Converts a token ID to its corresponding token string.
fn encode_pair(&self, text_a: &str, text_b: &str) -> Result<TokenizedInput>
Encodes a pair of texts for sequence-pair tasks.
fn vocab_size(&self) -> usize
Returns the size of the tokenizer’s vocabulary.
Auto Trait Implementations
impl Freeze for ZeroCopyTokenizer
impl RefUnwindSafe for ZeroCopyTokenizer
impl Send for ZeroCopyTokenizer
impl Sync for ZeroCopyTokenizer
impl Unpin for ZeroCopyTokenizer
impl UnsafeUnpin for ZeroCopyTokenizer
impl UnwindSafe for ZeroCopyTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.