pub struct MathTokenizer { /* private fields */ }
Mathematical formula tokenizer
Implementations
impl MathTokenizer
pub fn with_config(config: MathTokenizerConfig) -> Result<Self>
Create a new math tokenizer with custom configuration
pub fn tokenize_math(&mut self, text: &str) -> Result<Vec<MathToken>>
Tokenize mathematical text into MathTokens
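This page does not show the tokenization rules or the fields of `MathToken`, so the following is only a minimal, self-contained sketch of what splitting a formula into typed tokens can look like. `ToyMathToken` and `toy_tokenize_math` are hypothetical stand-ins invented for illustration; they are not part of this crate, and the real `tokenize_math` may classify input quite differently.

```rust
/// Hypothetical stand-in for `MathToken`; the real type's layout is not shown on this page.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyMathToken {
    Number(String),
    Identifier(String),
    Operator(char),
}

/// Splits a formula into numbers, identifiers, and single-character operators.
/// Whitespace is skipped; any other character is reported as an error.
pub fn toy_tokenize_math(text: &str) -> Result<Vec<ToyMathToken>, String> {
    let mut tokens = Vec::new();
    let mut chars = text.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Collect a run of digits and decimal points as one number token.
            let mut num = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() || d == '.' {
                    num.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(ToyMathToken::Number(num));
        } else if c.is_alphabetic() {
            // Collect letters, digits, and underscores as one identifier token.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(ToyMathToken::Identifier(ident));
        } else if "+-*/^=()".contains(c) {
            tokens.push(ToyMathToken::Operator(c));
            chars.next();
        } else {
            return Err(format!("unexpected character: {}", c));
        }
    }
    Ok(tokens)
}
```

Returning a `Result` mirrors the shape of the documented signature: malformed input surfaces as an error value rather than a panic.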
pub fn math_tokens_to_input(&mut self, tokens: Vec<MathToken>) -> TokenizedInput
Convert MathTokens to standard TokenizedInput
pub fn analyze_math(&self, tokens: &[MathToken]) -> MathAnalysis
Get mathematical analysis of tokenized text
pub fn vocab_stats(&self) -> HashMap<String, usize>
Get vocabulary statistics
Trait Implementations
impl Clone for MathTokenizer
impl Default for MathTokenizer
impl Tokenizer for MathTokenizer
fn encode(&self, text: &str) -> Result<TokenizedInput>
Encodes a single text string into tokens.
fn decode(&self, token_ids: &[u32]) -> Result<String>
Decodes token IDs back into text.
fn get_vocab(&self) -> HashMap<String, u32>
Returns a copy of the vocabulary as a mapping from tokens to IDs.
fn token_to_id(&self, token: &str) -> Option<u32>
Converts a token string to its corresponding ID.
fn id_to_token(&self, id: u32) -> Option<String>
Converts a token ID to its corresponding token string.
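The lookup methods above form a two-way mapping between token strings and IDs, with `None` signalling an unknown token or ID. As a rough illustration of that contract, here is a self-contained sketch built on a hypothetical `ToyVocab` type. It is an assumption made for this example, not the crate's internal representation, and ID assignment by insertion order is likewise assumed.

```rust
use std::collections::HashMap;

/// Hypothetical toy vocabulary; not the crate's internal representation.
pub struct ToyVocab {
    token_to_id: HashMap<String, u32>,
    id_to_token: Vec<String>,
}

impl ToyVocab {
    /// Assigns IDs in insertion order, ignoring repeated tokens.
    pub fn new(tokens: &[&str]) -> Self {
        let mut token_to_id = HashMap::new();
        let mut id_to_token = Vec::new();
        for &tok in tokens {
            if !token_to_id.contains_key(tok) {
                token_to_id.insert(tok.to_string(), id_to_token.len() as u32);
                id_to_token.push(tok.to_string());
            }
        }
        Self { token_to_id, id_to_token }
    }

    /// Returns the ID for `token`, or `None` if it is not in the vocabulary.
    pub fn token_to_id(&self, token: &str) -> Option<u32> {
        self.token_to_id.get(token).copied()
    }

    /// Returns the token for `id`, or `None` if the ID is out of range.
    pub fn id_to_token(&self, id: u32) -> Option<String> {
        self.id_to_token.get(id as usize).cloned()
    }

    /// Number of distinct tokens stored.
    pub fn vocab_size(&self) -> usize {
        self.id_to_token.len()
    }
}
```

With this layout, `id_to_token(token_to_id(t)?)` returns `t` for every stored token, which is the round-trip property one would normally expect from the documented pair of methods.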
fn encode_pair(&self, text_a: &str, text_b: &str) -> Result<TokenizedInput>
Encodes a pair of texts for sequence-pair tasks.
fn vocab_size(&self) -> usize
Returns the size of the tokenizer’s vocabulary.
Auto Trait Implementations
impl Freeze for MathTokenizer
impl RefUnwindSafe for MathTokenizer
impl Send for MathTokenizer
impl Sync for MathTokenizer
impl Unpin for MathTokenizer
impl UnsafeUnpin for MathTokenizer
impl UnwindSafe for MathTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
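The blanket `BorrowMut<T> for T` impl comes from the standard library and applies to every type, not just `MathTokenizer`. A small sketch of why it is useful: a generic function bounded on `BorrowMut` can accept either an owned value or a mutable reference. The helper name `push_token_id` is hypothetical and exists only for this example.

```rust
use std::borrow::BorrowMut;

/// Appends `id` to any value that can be mutably borrowed as a `Vec<u32>`:
/// an owned vector (via the blanket impl) or a `&mut Vec<u32>`.
pub fn push_token_id<B: BorrowMut<Vec<u32>>>(mut ids: B, id: u32) -> B {
    let buf: &mut Vec<u32> = ids.borrow_mut();
    buf.push(id);
    ids
}
```

Calling `push_token_id(vec![1, 2], 3)` consumes and returns the owned vector, while `push_token_id(&mut buf, 8)` mutates `buf` in place through the reference.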
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.