pub struct TextProcessor { /* private fields */ }
Text preprocessor and tokenizer.
Implementations
impl TextProcessor
pub fn new(config: TextConfig) -> Self
Creates a new text processor with the given configuration.
pub fn load_tokenizer<P: AsRef<Path>>(&mut self, model_dir: P) -> Result<()>
Loads a tokenizer from the given model directory.
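load_tokenizer is generic over P: AsRef<Path>, so callers can pass a &str, String, &Path, or PathBuf without converting first. The sketch below illustrates that pattern with a hypothetical helper, tokenizer_file, which is not part of TextProcessor's API. The file name "tokenizer.json" is likewise an assumption (a common convention for Hugging Face tokenizers); this page does not say which files load_tokenizer reads.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of TextProcessor's API): resolves the
/// tokenizer file inside a model directory. The file name "tokenizer.json"
/// is an assumption made for illustration only.
fn tokenizer_file<P: AsRef<Path>>(model_dir: P) -> PathBuf {
    // as_ref() borrows any path-like argument as &Path without copying it.
    model_dir.as_ref().join("tokenizer.json")
}
```

All four argument types in the usage below resolve to the same path, which is the convenience the AsRef<Path> bound buys.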
pub fn preprocess_text(&self, text: &str) -> String
Preprocesses text (normalization, cleanup, etc.) and returns the result as a new String.
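This page does not list the exact steps preprocess_text performs. As an illustration only, here is a minimal sketch of one typical cleanup step, assuming the method trims the input and collapses runs of whitespace into single spaces. The function name preprocess_text_sketch is hypothetical, and the real method may do more or different work (for example Unicode normalization or lowercasing driven by TextConfig).

```rust
/// Hypothetical stand-in for preprocess_text: trims leading/trailing
/// whitespace and collapses internal whitespace runs (spaces, tabs,
/// newlines) into a single ASCII space. Assumed behavior, not the crate's.
fn preprocess_text_sketch(text: &str) -> String {
    // split_whitespace skips empty pieces, so leading, trailing, and
    // repeated whitespace all disappear before the words are rejoined.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Returning an owned String, as the documented signature does, lets the output differ in length and content from the borrowed input.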
pub fn tokenize(&self, text: &str) -> Result<TokenizedText>
Tokenizes text for model inference.
pub fn tokenize_batch(&self, texts: &[String]) -> Result<Vec<TokenizedText>>
Tokenizes multiple texts in a single batch call.
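Because tokenize_batch returns a single Result wrapping a Vec, one plausible implementation maps tokenize over the slice and collects into Result<Vec<_>, _>, which stops at the first error. The sketch below assumes that all-or-nothing behavior; the page does not confirm it. TokenizedText here is a hypothetical stand-in with only an ids field, the error type is a plain String, and the toy tokenizer uses each word's byte length as its "id".

```rust
/// Hypothetical stand-in for the crate's TokenizedText (real fields unknown).
#[derive(Debug, PartialEq)]
struct TokenizedText {
    ids: Vec<u32>,
}

/// Hypothetical single-text tokenizer: emits each whitespace-separated
/// word's byte length as a toy "id" and rejects blank input.
fn tokenize(text: &str) -> Result<TokenizedText, String> {
    if text.trim().is_empty() {
        return Err("empty input".to_string());
    }
    let ids = text.split_whitespace().map(|w| w.len() as u32).collect();
    Ok(TokenizedText { ids })
}

/// Batch tokenization: collecting an iterator of Result values into
/// Result<Vec<_>, _> yields the first Err encountered, or Ok with every output.
fn tokenize_batch(texts: &[String]) -> Result<Vec<TokenizedText>, String> {
    texts.iter().map(|t| tokenize(t)).collect()
}
```

If partial results are preferable, collecting into Vec<Result<TokenizedText, _>> instead keeps per-text errors; which behavior the real method uses is not stated here.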
pub fn vocab_size(&self) -> Option<usize>
Returns the tokenizer's vocabulary size, if available.
pub fn config(&self) -> &TextConfig
Returns a reference to the processor's configuration.
pub fn has_tokenizer(&self) -> bool
Returns true if a real tokenizer has been loaded.
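vocab_size returns Option<usize>, and has_tokenizer reports whether a real tokenizer is loaded. Together these suggest that the processor keeps its tokenizer in an optional slot that load_tokenizer fills. The sketch below shows that pattern using hypothetical types (Processor, Vocab) and a hypothetical load method that builds a vocabulary from a word list instead of reading files. It is an assumption about the design, not the crate's actual internals.

```rust
use std::collections::HashMap;

/// Hypothetical loaded tokenizer: just a token-to-id vocabulary.
struct Vocab {
    ids: HashMap<String, u32>,
}

/// Hypothetical processor holding an optional tokenizer.
struct Processor {
    tokenizer: Option<Vocab>,
}

impl Processor {
    fn new() -> Self {
        // No tokenizer until one is explicitly loaded.
        Processor { tokenizer: None }
    }

    /// Stand-in for load_tokenizer: assigns sequential ids to the given words.
    fn load(&mut self, words: &[&str]) {
        let ids = words
            .iter()
            .enumerate()
            .map(|(i, w)| (w.to_string(), i as u32))
            .collect();
        self.tokenizer = Some(Vocab { ids });
    }

    fn has_tokenizer(&self) -> bool {
        self.tokenizer.is_some()
    }

    fn vocab_size(&self) -> Option<usize> {
        // None when nothing is loaded; otherwise the vocabulary length.
        self.tokenizer.as_ref().map(|v| v.ids.len())
    }
}
```

Under this design, callers can branch on has_tokenizer() before relying on vocab_size(), or simply match on the returned Option.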
Auto Trait Implementations
impl !Freeze for TextProcessor
impl RefUnwindSafe for TextProcessor
impl Send for TextProcessor
impl Sync for TextProcessor
impl Unpin for TextProcessor
impl UnsafeUnpin for TextProcessor
impl UnwindSafe for TextProcessor
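Because TextProcessor is Send and Sync, and tokenize takes &self, one processor can be shared across worker threads behind an Arc without a Mutex. The sketch below demonstrates that pattern with a hypothetical stand-in type, SharedProcessor, whose toy tokenize method splits on whitespace. With the real type, you would wrap the constructed TextProcessor in an Arc the same way. The assert_send_sync helper is a common compile-time check idiom: it compiles only when the type satisfies both bounds.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for TextProcessor: read-only after construction.
struct SharedProcessor {
    lowercase: bool,
}

impl SharedProcessor {
    /// Toy "tokenizer": splits on whitespace, optionally lowercasing.
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace()
            .map(|w| if self.lowercase { w.to_lowercase() } else { w.to_string() })
            .collect()
    }
}

/// Compiles only if T is Send + Sync; calling it is a compile-time check.
fn assert_send_sync<T: Send + Sync>() {}

/// Tokenizes each input on its own thread, sharing one processor via Arc.
fn tokenize_in_parallel(processor: Arc<SharedProcessor>, inputs: Vec<String>) -> Vec<Vec<String>> {
    assert_send_sync::<SharedProcessor>();
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|text| {
            // Each thread gets its own Arc handle to the same processor.
            let p = Arc::clone(&processor);
            thread::spawn(move || p.tokenize(&text))
        })
        .collect();
    // Joining the handles in spawn order keeps outputs aligned with inputs.
    handles.into_iter().map(|h| h.join().expect("worker panicked")).collect()
}
```

An Arc alone is enough here because every method used takes &self. A Mutex would only be needed for &mut self methods such as load_tokenizer.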
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.