pub struct SimpleWhitespaceTokenizer { /* private fields */ }
A vocabulary-aware whitespace tokenizer.
Splits text on whitespace and maps each word to an integer token ID. Unknown words map to a reserved UNK ID.
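The description above can be illustrated with a small std-only sketch. It is not the scirs2_text implementation, whose fields are private. The type `WhitespaceVocab`, the choice of ID 0 for unknown words, the `"<unk>"` spelling, and assigning IDs in order of first appearance are all assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical reserved ID and token for unknown words (assumed, not the crate's values).
const UNK_ID: u32 = 0;
const UNK_TOKEN: &str = "<unk>";

/// Hypothetical stand-in for a vocabulary-aware whitespace tokenizer.
struct WhitespaceVocab {
    word_to_id: HashMap<String, u32>,
    id_to_word: Vec<String>,
}

impl WhitespaceVocab {
    /// Assigns IDs to words in order of first appearance; ID 0 is kept for UNK.
    fn from_corpus(texts: &[&str]) -> Self {
        let mut word_to_id = HashMap::new();
        let mut id_to_word = vec![UNK_TOKEN.to_string()];
        for text in texts {
            for word in text.split_whitespace() {
                if !word_to_id.contains_key(word) {
                    word_to_id.insert(word.to_string(), id_to_word.len() as u32);
                    id_to_word.push(word.to_string());
                }
            }
        }
        Self { word_to_id, id_to_word }
    }

    /// Splits on whitespace; words missing from the vocabulary become UNK_ID.
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace()
            .map(|w| *self.word_to_id.get(w).unwrap_or(&UNK_ID))
            .collect()
    }

    /// Maps IDs back to words (out-of-range IDs become the UNK token) and joins with single spaces.
    fn decode(&self, ids: &[u32]) -> String {
        ids.iter()
            .map(|&id| {
                self.id_to_word
                    .get(id as usize)
                    .map(String::as_str)
                    .unwrap_or(UNK_TOKEN)
            })
            .collect::<Vec<_>>()
            .join(" ")
    }
}

fn main() {
    let vocab = WhitespaceVocab::from_corpus(&["hello world", "hello there"]);
    let ids = vocab.encode("hello unseen world");
    println!("{:?}", ids); // [1, 0, 2]
    println!("{}", vocab.decode(&ids)); // hello <unk> world
}
```

Because `split_whitespace` collapses runs of spaces, tabs, and newlines, a round trip through this sketch normalizes spacing and replaces unknown words with the UNK token; only text made of known, single-space-separated words decodes back unchanged.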
Example
```rust
use scirs2_text::tokenizer::{SimpleWhitespaceTokenizer, TransformerTokenizer};

// Build a tokenizer and its vocabulary from a small corpus.
let texts = &["hello world", "hello there"];
let tok = SimpleWhitespaceTokenizer::build(texts, 100);

// Encode text into token IDs, then decode the IDs back into text.
let ids = tok.encode("hello world");
let decoded = tok.decode(&ids);
assert_eq!(decoded, "hello world");
```
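The example passes `100` as the second argument to `build`, but this page does not say what it controls. If it is a cap on vocabulary size (an assumption), one common way to honor such a cap is to keep the most frequent words. The sketch below shows that approach with a hypothetical helper, `build_capped_vocab`; it is not the crate's code, and reserving index 0 for `"<unk>"` is likewise assumed.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds an ID-to-word table with at most `max_vocab`
/// entries, reserving index 0 for the unknown token.
fn build_capped_vocab(texts: &[&str], max_vocab: usize) -> Vec<String> {
    // Count how often each whitespace-separated word occurs.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for text in texts {
        for word in text.split_whitespace() {
            *counts.entry(word).or_insert(0) += 1;
        }
    }

    // Most frequent first; ties broken alphabetically so the result is deterministic.
    let mut ranked: Vec<(&str, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));

    let mut vocab = vec!["<unk>".to_string()];
    vocab.extend(
        ranked
            .into_iter()
            .take(max_vocab.saturating_sub(1)) // one slot is taken by <unk>
            .map(|(w, _)| w.to_string()),
    );
    vocab
}

fn main() {
    let vocab = build_capped_vocab(&["hello world", "hello there"], 3);
    println!("{:?}", vocab); // ["<unk>", "hello", "there"]
}
```

The alphabetical tie-break matters because `HashMap` iteration order is randomized per run; without it, which of two equally frequent words survives the cap could change between runs.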
Trait Implementations
impl Clone for SimpleWhitespaceTokenizer
fn clone(&self) -> SimpleWhitespaceTokenizer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self) (since Rust 1.0.0)
Performs copy-assignment from source.
impl Debug for SimpleWhitespaceTokenizer
Auto Trait Implementations
impl Freeze for SimpleWhitespaceTokenizer
impl RefUnwindSafe for SimpleWhitespaceTokenizer
impl Send for SimpleWhitespaceTokenizer
impl Sync for SimpleWhitespaceTokenizer
impl Unpin for SimpleWhitespaceTokenizer
impl UnsafeUnpin for SimpleWhitespaceTokenizer
impl UnwindSafe for SimpleWhitespaceTokenizer
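Since the auto trait list above includes `Send` and `Sync`, a single tokenizer can be borrowed by several threads at once, for example to encode a batch in parallel. The sketch below shows the pattern with scoped threads. Because the real struct's fields are private, it uses a hypothetical stand-in, `SharedTokenizer`, holding a word-to-ID map with unknown words mapped to 0 (assumed); with the real type you would call its `encode` the same way.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in: a word-to-ID map; unknown words map to 0 (assumed UNK ID).
struct SharedTokenizer {
    word_to_id: HashMap<String, u32>,
}

impl SharedTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace()
            .map(|w| self.word_to_id.get(w).copied().unwrap_or(0))
            .collect()
    }
}

// Compile-time check: this call only type-checks if T is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<SharedTokenizer>();

    let word_to_id = HashMap::from([("hello".to_string(), 1), ("world".to_string(), 2)]);
    let tok = SharedTokenizer { word_to_id };
    let tok_ref = &tok;
    let texts = ["hello world", "world hello", "hello there"];

    // Each scoped thread borrows the same tokenizer; sharing &SharedTokenizer
    // across threads compiles only because SharedTokenizer is Sync.
    let results: Vec<Vec<u32>> = thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|text| s.spawn(move || tok_ref.encode(text)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    println!("{:?}", results); // [[1, 2], [2, 1], [1, 0]]
}
```

Scoped threads (`std::thread::scope`, stable since Rust 1.63) let the threads borrow the tokenizer directly; with `std::thread::spawn` you would instead wrap it in an `Arc` because spawned threads require `'static` data.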
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.