pub trait Tokenizer {
    // Required method
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>;

    // Provided method
    fn token_count(&self, text: &str) -> usize { ... }
}
Tokenizer over text slices.
Implementations are expected to be cheap to construct — ideally zero-size —
and stateless. Methods take &self to allow future implementations that
carry configuration (e.g. vocabulary, normalisation flags).
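As an illustration of the zero-size, stateless shape described above, here is a minimal sketch of an implementation. The trait is redeclared locally so the snippet compiles on its own. `WhitespaceTokenizer` is a hypothetical type, not part of this crate, and the default body of `token_count` is an assumption, since the documentation elides it as `{ ... }`.

```rust
// Local copy of the trait so this sketch is self-contained.
pub trait Tokenizer {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>;

    // ASSUMPTION: the real default body is elided in the docs; collecting
    // and taking the length is one plausible default.
    fn token_count(&self, text: &str) -> usize {
        self.tokenize(text).len()
    }
}

/// Hypothetical unit struct: no fields, so it occupies zero bytes and
/// costs nothing to construct.
#[derive(Debug, Clone, Copy, Default)]
pub struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    // The returned slices borrow from `text`, not from `self`, which is
    // what the `'a` lifetime in the signature expresses.
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}
```

Because the tokens borrow only from the input, a tokenizer value can be created, used, and dropped freely while the returned `Vec<&str>` stays valid for as long as the input string does.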
Required Methods
fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>
Provided Methods
fn token_count(&self, text: &str) -> usize
Count the number of tokens in text.
Implementations should override this when a direct count is cheaper
than collecting tokens into a Vec.
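A sketch of such an override follows, under the same assumptions as any standalone example: the trait is redeclared locally, its default `token_count` body is a guess at the elided `{ ... }`, and `WhitespaceCounter` is a hypothetical type used only for illustration.

```rust
// Local copy of the trait so this sketch is self-contained.
pub trait Tokenizer {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>;

    // ASSUMPTION: stand-in for the elided default body.
    fn token_count(&self, text: &str) -> usize {
        self.tokenize(text).len()
    }
}

/// Hypothetical whitespace tokenizer that counts without allocating.
pub struct WhitespaceCounter;

impl Tokenizer for WhitespaceCounter {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }

    // Override: walk the iterator and count, skipping the intermediate Vec.
    fn token_count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}
```

An override like this should return the same value as `tokenize(text).len()`; only the cost differs.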