pub struct Worker<'a> { /* private fields */ }
Provider of a routine for tokenization.
It holds the internal data structures used in tokenization, which can be reused to avoid unnecessary memory reallocation.
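The reuse pattern described above can be sketched with a toy stand-in. This is not the library's implementation: `ToyWorker`, its fields, and the whitespace-splitting rule are hypothetical, chosen only to show how clearing buffers (rather than dropping them) lets one worker serve many sentences without reallocating each time.

```rust
/// Hypothetical stand-in for a reusable tokenization worker.
/// It splits on whitespace; the real tokenizer works differently.
pub struct ToyWorker {
    sentence: String,           // input buffer, reused across sentences
    spans: Vec<(usize, usize)>, // byte ranges of tokens, reused across runs
}

impl ToyWorker {
    pub fn new() -> Self {
        Self { sentence: String::new(), spans: Vec::new() }
    }

    /// Replaces the input. `clear` keeps the allocated capacity.
    pub fn reset_sentence<S: AsRef<str>>(&mut self, input: S) {
        self.sentence.clear();
        self.sentence.push_str(input.as_ref());
        self.spans.clear();
    }

    /// Fills `spans` with the ranges of whitespace-separated tokens.
    pub fn tokenize(&mut self) {
        self.spans.clear();
        let mut start = None;
        for (i, c) in self.sentence.char_indices() {
            if c.is_whitespace() {
                if let Some(s) = start.take() {
                    self.spans.push((s, i));
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        if let Some(s) = start {
            self.spans.push((s, self.sentence.len()));
        }
    }

    pub fn num_tokens(&self) -> usize {
        self.spans.len()
    }

    /// Iterates over the tokens produced by the last `tokenize` call.
    pub fn token_iter(&self) -> impl Iterator<Item = &str> + '_ {
        self.spans.iter().map(move |&(s, e)| &self.sentence[s..e])
    }
}
```

A caller would create one `ToyWorker` and then repeat `reset_sentence`, `tokenize`, and `token_iter` for each input, which mirrors the call order of the methods documented below.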
Implementations
impl<'a> Worker<'a>
pub fn reset_sentence<S>(&mut self, input: S) -> Result<()>
where
    S: AsRef<str>,
Resets the input sentence to be tokenized.
Errors
Returns an error when the input sentence contains more than MAX_SENTENCE_LENGTH characters.
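A length check of this kind might look like the sketch below. The limit value, the error type `SentenceTooLong`, and the function name are all hypothetical; the actual value of MAX_SENTENCE_LENGTH and the library's error type are not shown on this page. The sketch counts characters rather than bytes, following the wording above.

```rust
use std::fmt;

/// Hypothetical limit for illustration; not the library's real value.
pub const TOY_MAX_SENTENCE_LENGTH: usize = 8;

/// Hypothetical error type reporting the observed and allowed lengths.
#[derive(Debug, PartialEq, Eq)]
pub struct SentenceTooLong {
    pub len: usize,
    pub max: usize,
}

impl fmt::Display for SentenceTooLong {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "sentence has {} characters, limit is {}", self.len, self.max)
    }
}

impl std::error::Error for SentenceTooLong {}

/// Accepts the input only if its character count is within the limit.
pub fn check_sentence_length(input: &str) -> Result<(), SentenceTooLong> {
    // `chars().count()` counts Unicode scalar values, not bytes.
    let len = input.chars().count();
    if len > TOY_MAX_SENTENCE_LENGTH {
        Err(SentenceTooLong { len, max: TOY_MAX_SENTENCE_LENGTH })
    } else {
        Ok(())
    }
}
```

Callers of the real method should handle the returned `Result` rather than unwrap it when the input length is not under their control.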
pub fn tokenize(&mut self)
Tokenizes the input sentence set by Self::reset_sentence(), storing the result in the worker.
pub fn num_tokens(&self) -> usize
Gets the number of resultant tokens.
pub const fn token_iter(&'a self) -> TokenIter<'a>
Creates an iterator of resultant tokens. The iterator yields Token<'a> items.
pub fn init_connid_counter(&mut self)
Initializes a counter to compute occurrence probabilities of connection ids.
pub fn update_connid_counts(&mut self)
Updates the frequencies of connection ids using the result of the last tokenization.
Panics
Panics if Self::init_connid_counter() has never been called.
pub fn compute_connid_probs(&self) -> (Vec<(usize, f64)>, Vec<(usize, f64)>)
Computes the occurrence probabilities of connection ids, returning a pair of lists: one for left ids and one for right ids.
Panics
Panics if Self::init_connid_counter() has never been called.
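The counter lifecycle (initialize, update after each tokenization, then compute) can be illustrated with a toy model. Everything here is an assumption made for illustration: `CountingWorker`, `set_last_ids` (a stand-in for a tokenization result), the relative-frequency formula, and the id-ordered output are not taken from the library.

```rust
/// Hypothetical per-id occurrence counters (not the library's type).
struct ConnIdCounter {
    left: Vec<usize>,
    right: Vec<usize>,
}

/// Hypothetical worker that only models the counting methods.
pub struct CountingWorker {
    num_left: usize,
    num_right: usize,
    last_ids: Vec<(usize, usize)>, // (left id, right id) per token
    counter: Option<ConnIdCounter>,
}

impl CountingWorker {
    pub fn new(num_left: usize, num_right: usize) -> Self {
        Self { num_left, num_right, last_ids: Vec::new(), counter: None }
    }

    /// Stand-in for a tokenization: records the ids of the last result.
    pub fn set_last_ids(&mut self, ids: &[(usize, usize)]) {
        self.last_ids.clear();
        self.last_ids.extend_from_slice(ids);
    }

    pub fn init_connid_counter(&mut self) {
        self.counter = Some(ConnIdCounter {
            left: vec![0; self.num_left],
            right: vec![0; self.num_right],
        });
    }

    /// Panics if the counter was never initialized.
    pub fn update_connid_counts(&mut self) {
        let counter = self
            .counter
            .as_mut()
            .expect("init_connid_counter() must be called first");
        for &(l, r) in &self.last_ids {
            counter.left[l] += 1;
            counter.right[r] += 1;
        }
    }

    /// Panics if the counter was never initialized.
    pub fn compute_connid_probs(&self) -> (Vec<(usize, f64)>, Vec<(usize, f64)>) {
        let counter = self
            .counter
            .as_ref()
            .expect("init_connid_counter() must be called first");
        (to_probs(&counter.left), to_probs(&counter.right))
    }
}

/// Relative frequency per id, listed in id order (an assumed convention).
fn to_probs(counts: &[usize]) -> Vec<(usize, f64)> {
    let total: usize = counts.iter().sum();
    counts
        .iter()
        .enumerate()
        .map(|(id, &c)| {
            let p = if total == 0 { 0.0 } else { c as f64 / total as f64 };
            (id, p)
        })
        .collect()
}
```

Storing the counter in an `Option` is one simple way to obtain the documented panic: both `update_connid_counts` and `compute_connid_probs` fail fast when initialization was skipped.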