pub struct WhitespaceTokenizer;
Unicode-scalar whitespace tokenizer.
Splits on any sequence of Unicode whitespace via str::split_whitespace
and discards empty spans. Zero-size; free to copy.
Token counts produced by this type are a structural estimate — see the module documentation for how they relate to subword model tokenizers.
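The splitting behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not this crate's implementation: `split_tokens`, `UnitTokenizer`, and `unit_size` are hypothetical names introduced here, and the only real API used is `str::split_whitespace` and `std::mem::size_of`.

```rust
/// Hypothetical free function mirroring the splitting behaviour described
/// above; it is not part of this crate's API.
fn split_tokens(text: &str) -> Vec<&str> {
    // `str::split_whitespace` treats any run of Unicode `White_Space`
    // characters as a single separator and never yields an empty slice,
    // so leading, trailing, and repeated whitespace produce no tokens.
    text.split_whitespace().collect()
}

/// Stand-in unit struct (an assumption for illustration only) showing why a
/// field-less tokenizer type is zero-sized and trivially `Copy`.
#[derive(Clone, Copy, Debug, Default)]
struct UnitTokenizer;

/// Size in bytes of the stand-in type; a unit struct occupies no memory.
fn unit_size() -> usize {
    std::mem::size_of::<UnitTokenizer>()
}
```

Under these assumptions, `split_tokens("  hello\t\nworld\u{3000}again  ")` yields three slices borrowed from the input, and an input made only of whitespace yields an empty `Vec`.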
Performance
Both tokenize and token_count
are O(n) single-pass scans with no internal allocation beyond the returned
Vec. An LRU cache would add memory pressure and synchronisation overhead
that outweighs any benefit at these text sizes.
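As a rough sketch of the single-pass claim, counting can be done by consuming the `split_whitespace` iterator directly, so nothing is allocated; only collecting the tokens allocates, and then only the returned `Vec`. The function names `count_tokens` and `collect_tokens` are assumptions for illustration and may not match how `token_count` and `tokenize` are actually written.

```rust
/// Hypothetical counting helper: walks the input once and allocates nothing,
/// because the iterator is consumed by `count` rather than collected.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical collecting helper: the same single pass, but the borrowed
/// slices are gathered into a `Vec`, which is the only allocation made.
fn collect_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}
```

Because both helpers are linear in the input length and hold no state between calls, a cache in front of them would have to hash the whole input anyway, which is itself an O(n) pass.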
Trait Implementations
impl Clone for WhitespaceTokenizer
fn clone(&self) -> WhitespaceTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self) (since 1.0.0, const: unstable)
Performs copy-assignment from source.
impl Debug for WhitespaceTokenizer
impl Default for WhitespaceTokenizer
fn default() -> WhitespaceTokenizer
Returns the “default value” for a type.
impl Tokenizer for WhitespaceTokenizer
impl Copy for WhitespaceTokenizer
Auto Trait Implementations
impl Freeze for WhitespaceTokenizer
impl RefUnwindSafe for WhitespaceTokenizer
impl Send for WhitespaceTokenizer
impl Sync for WhitespaceTokenizer
impl Unpin for WhitespaceTokenizer
impl UnsafeUnpin for WhitespaceTokenizer
impl UnwindSafe for WhitespaceTokenizer
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T where T: Clone
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.