pub struct HuggingFaceTokenizer { /* private fields */ }
HuggingFace tokenizer wrapper
Implementations§
Trait Implementations§
impl IncrementalTokenizer for HuggingFaceTokenizer
type State = IncrementalState
Tokenizer state for incremental operations
fn create_state(&self) -> Self::State
Create initial state for incremental decoding
fn decode_incremental_with_state(
    &self,
    state: &mut Self::State,
    token: TokenId,
) -> Result<String>
Add token to state and get incremental text
fn reset_state(&self, state: &mut Self::State)
Reset state to initial condition
fn get_decoded_text(&self, state: &Self::State) -> String
Get all decoded text from current state
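The four methods above suggest a create / feed / read loop. The sketch below mirrors their signatures on a hypothetical stand-in trait and a toy tokenizer, since the real `IncrementalState`, `TokenId`, and `Result` types are not shown on this page. `ToyTokenizer`, `ToyState`, and `stream` are illustrative names, not part of the crate.

```rust
// Hypothetical stand-ins: the crate's real TokenId and Result are not shown here.
type TokenId = u32;
type Result<T> = std::result::Result<T, String>;

// Mirrors the method signatures listed above; not the crate's actual trait.
trait IncrementalTokenizer {
    type State;
    fn create_state(&self) -> Self::State;
    fn decode_incremental_with_state(
        &self,
        state: &mut Self::State,
        token: TokenId,
    ) -> Result<String>;
    fn reset_state(&self, state: &mut Self::State);
    fn get_decoded_text(&self, state: &Self::State) -> String;
}

/// Toy tokenizer: each token ID indexes a fixed list of string pieces.
struct ToyTokenizer {
    vocab: Vec<&'static str>,
}

/// Toy state: just the text decoded so far.
#[derive(Default)]
struct ToyState {
    text: String,
}

impl IncrementalTokenizer for ToyTokenizer {
    type State = ToyState;

    fn create_state(&self) -> ToyState {
        ToyState::default()
    }

    fn decode_incremental_with_state(
        &self,
        state: &mut ToyState,
        token: TokenId,
    ) -> Result<String> {
        let piece = self
            .vocab
            .get(token as usize)
            .copied()
            .ok_or_else(|| format!("unknown token {token}"))?;
        state.text.push_str(piece);
        Ok(piece.to_string())
    }

    fn reset_state(&self, state: &mut ToyState) {
        state.text.clear();
    }

    fn get_decoded_text(&self, state: &ToyState) -> String {
        state.text.clone()
    }
}

/// Feeds tokens one at a time; returns the per-token chunks and the full text.
fn stream<T: IncrementalTokenizer>(tok: &T, ids: &[TokenId]) -> Result<(Vec<String>, String)> {
    let mut state = tok.create_state();
    let mut chunks = Vec::new();
    for &id in ids {
        chunks.push(tok.decode_incremental_with_state(&mut state, id)?);
    }
    Ok((chunks, tok.get_decoded_text(&state)))
}

fn main() {
    let tok = ToyTokenizer { vocab: vec!["Hel", "lo", ", ", "world"] };
    let (chunks, full) = stream(&tok, &[0, 1, 2, 3]).unwrap();
    println!("{chunks:?} -> {full}");
}
```

In this sketch the state is owned by the caller, so a single tokenizer can serve several independent streams, each with its own state value.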
impl Tokenizer for HuggingFaceTokenizer
fn encode(&self, text: &str, add_special: bool) -> Result<Vec<TokenId>>
Encode text to token IDs
fn decode(&self, tokens: &[TokenId], skip_special: bool) -> Result<String>
Decode token IDs to text
fn decode_incremental(&self, prev: &[TokenId], next: TokenId) -> Result<String>
Incremental decode: given the previous tokens and a new token, return only the newly produced text.
This is crucial for streaming applications, which would otherwise have to re-decode every token on each step.
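One way to read this contract: the returned string is whatever decoding `prev` followed by `next` adds beyond decoding `prev` alone. The naive reference below spells that out by decoding twice and taking the difference. It is only a sketch of the expected result, using a hypothetical `toy_decode` function and a made-up vocabulary; the crate's own implementation is not shown on this page and presumably avoids the double decode.

```rust
// Hypothetical stand-in for the crate's TokenId type.
type TokenId = u32;

/// Toy decoder: maps each ID to a fixed piece. Panics on out-of-range IDs.
fn toy_decode(tokens: &[TokenId]) -> String {
    const VOCAB: [&str; 4] = ["Hel", "lo", ", ", "world"];
    tokens.iter().map(|&t| VOCAB[t as usize]).collect()
}

/// Naive reference: decode with and without `next`, then return the difference.
fn naive_decode_incremental(prev: &[TokenId], next: TokenId) -> String {
    let before = toy_decode(prev);

    let mut all = prev.to_vec();
    all.push(next);
    let after = toy_decode(&all);

    // If `before` is not a prefix of `after`, fall back to the full text.
    after
        .strip_prefix(before.as_str())
        .unwrap_or(after.as_str())
        .to_string()
}

fn main() {
    let mut seen: Vec<TokenId> = Vec::new();
    for id in [0, 1, 2, 3] {
        let chunk = naive_decode_incremental(&seen, id);
        print!("{chunk}");
        seen.push(id);
    }
    println!();
}
```

The cost of this naive version grows with the length of `prev` on every call, which is the re-decoding overhead the method description refers to.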
fn vocab_size(&self) -> usize
Get vocabulary size
fn special_tokens(&self) -> &SpecialTokens
Get special tokens configuration
fn token_id(&self, text: &str) -> Option<TokenId>
Get token ID for a specific text (if exists in vocabulary)
fn apply_chat_template(&self, messages: &[ChatMessage]) -> Result<String>
Apply chat template if supported
fn info(&self) -> TokenizerInfo
Get tokenizer information
fn is_special_token(&self, token_id: TokenId) -> bool
Check if token is a special token
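The `skip_special` flag on `decode` and the `is_special_token` check are plausibly related: skipping special tokens amounts to filtering out the IDs for which the check returns true. The sketch below illustrates that reading on a toy type. The `ToyTokenizer` struct, its vocabulary, and the filtering behaviour are assumptions for illustration, not the crate's documented semantics.

```rust
// Hypothetical stand-in for the crate's TokenId type.
type TokenId = u32;

/// Toy tokenizer with a fixed vocabulary and a list of special token IDs.
struct ToyTokenizer {
    vocab: Vec<&'static str>,
    special: Vec<TokenId>,
}

impl ToyTokenizer {
    fn is_special_token(&self, token_id: TokenId) -> bool {
        self.special.contains(&token_id)
    }

    /// Assumed behaviour: with `skip_special`, special tokens are dropped.
    /// Panics on out-of-range IDs to keep the sketch short.
    fn decode(&self, tokens: &[TokenId], skip_special: bool) -> String {
        tokens
            .iter()
            .filter(|&&t| !(skip_special && self.is_special_token(t)))
            .map(|&t| self.vocab[t as usize])
            .collect()
    }
}

fn main() {
    let tok = ToyTokenizer {
        vocab: vec!["<s>", "Hi", "!", "</s>"],
        special: vec![0, 3],
    };
    println!("{}", tok.decode(&[0, 1, 2, 3], true));
    println!("{}", tok.decode(&[0, 1, 2, 3], false));
}
```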
Auto Trait Implementations§
impl !Freeze for HuggingFaceTokenizer
impl !RefUnwindSafe for HuggingFaceTokenizer
impl Send for HuggingFaceTokenizer
impl Sync for HuggingFaceTokenizer
impl Unpin for HuggingFaceTokenizer
impl UnsafeUnpin for HuggingFaceTokenizer
impl UnwindSafe for HuggingFaceTokenizer
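Because the type is both `Send` and `Sync`, one instance can be shared across threads behind an `Arc` without extra locking. The sketch below shows the pattern with a hypothetical `StandInTokenizer`, since constructing the real `HuggingFaceTokenizer` is not covered on this page. The helper names are illustrative only.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in; the real type's constructor is not shown here.
struct StandInTokenizer;

impl StandInTokenizer {
    fn vocab_size(&self) -> usize {
        4
    }
}

/// Compiles only if `T` is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Shares one tokenizer across `n` threads and collects each thread's result.
fn sizes_from_threads(n: usize) -> Vec<usize> {
    let tok = Arc::new(StandInTokenizer);
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let tok = Arc::clone(&tok);
            thread::spawn(move || tok.vocab_size())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    assert_send_sync::<StandInTokenizer>();
    println!("{:?}", sizes_from_threads(3));
}
```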
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.