pub struct UnicodeTokenizerConfig {
    pub lowercase: bool,
    pub strip_accents: bool,
    pub split_on_punctuation: bool,
    pub split_on_whitespace: bool,
    pub max_token_length: Option<usize>,
}
Configuration for UnicodeTokenizer.
Fields
lowercase: bool
Convert characters to lowercase before tokenizing. Default: true.
strip_accents: bool
Strip combining accent marks (NFD decomposition, then removal of characters matched by an approximation of the Mn category). Default: true.
split_on_punctuation: bool
Split on Unicode punctuation characters. Default: true.
split_on_whitespace: bool
Split on ASCII whitespace. Default: true.
max_token_length: Option<usize>
Maximum token length in characters. None = unlimited. Default: None.
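Because every field is public, a configuration can be built with a plain struct literal. The sketch below is illustrative only: it redeclares the struct locally so that it compiles on its own, and the helpers `documented_defaults` and `within_limit` are hypothetical names, not part of the documented API. It also assumes that "characters" means Unicode scalar values (Rust `char`), which the field documentation does not confirm.

```rust
// Local mirror of the documented struct so the sketch compiles on its own;
// in real code, import UnicodeTokenizerConfig from its crate instead.
#[derive(Clone, Debug)]
pub struct UnicodeTokenizerConfig {
    pub lowercase: bool,
    pub strip_accents: bool,
    pub split_on_punctuation: bool,
    pub split_on_whitespace: bool,
    pub max_token_length: Option<usize>,
}

// Hypothetical helper (not part of the documented API): builds a value whose
// fields match the defaults listed in the field documentation.
fn documented_defaults() -> UnicodeTokenizerConfig {
    UnicodeTokenizerConfig {
        lowercase: true,
        strip_accents: true,
        split_on_punctuation: true,
        split_on_whitespace: true,
        max_token_length: None,
    }
}

// Hypothetical helper: checks a token against max_token_length, counting
// chars (Unicode scalar values) rather than bytes. What the tokenizer does
// with an over-long token is not stated in the documentation.
fn within_limit(config: &UnicodeTokenizerConfig, token: &str) -> bool {
    match config.max_token_length {
        None => true, // None = unlimited
        Some(max) => token.chars().count() <= max,
    }
}

fn main() {
    // Struct update syntax overrides selected fields and keeps the rest.
    let config = UnicodeTokenizerConfig {
        strip_accents: false,
        max_token_length: Some(4),
        ..documented_defaults()
    };
    println!("{:?}", config);

    // "caf\u{e9}" is 4 chars but 5 bytes in UTF-8, so it fits a limit of 4
    // only when the length is counted in chars.
    println!("{}", within_limit(&config, "caf\u{e9}"));
}
```

Counting with `chars().count()` instead of `len()` matters for non-ASCII input, where the byte length of a token exceeds its character count.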
Trait Implementations
impl Clone for UnicodeTokenizerConfig
fn clone(&self) -> UnicodeTokenizerConfig
Returns a duplicate of the value.
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for UnicodeTokenizerConfig
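The Clone and Debug implementations make it straightforward to derive a variant from a base configuration and to inspect both. The following is a minimal sketch under stated assumptions: the struct is redeclared locally with derived `Clone` and `Debug` (the real crate may implement them differently, so its Debug output may not match), and `base_config` and `case_sensitive_variant` are hypothetical helpers.

```rust
// Local mirror of the documented struct; Clone and Debug are derived here,
// which is an assumption about how the real crate implements them.
#[derive(Clone, Debug)]
pub struct UnicodeTokenizerConfig {
    pub lowercase: bool,
    pub strip_accents: bool,
    pub split_on_punctuation: bool,
    pub split_on_whitespace: bool,
    pub max_token_length: Option<usize>,
}

// Hypothetical helper: a base configuration using the documented defaults.
fn base_config() -> UnicodeTokenizerConfig {
    UnicodeTokenizerConfig {
        lowercase: true,
        strip_accents: true,
        split_on_punctuation: true,
        split_on_whitespace: true,
        max_token_length: None,
    }
}

// Hypothetical helper: returns a case-preserving copy and leaves the
// original untouched.
fn case_sensitive_variant(base: &UnicodeTokenizerConfig) -> UnicodeTokenizerConfig {
    let mut variant = base.clone();
    variant.lowercase = false;
    variant
}

fn main() {
    let base = base_config();
    let variant = case_sensitive_variant(&base);

    // {:?} prints on one line, {:#?} pretty-prints one field per line.
    println!("{:?}", base);
    println!("{:#?}", variant);
}
```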
Auto Trait Implementations
impl Freeze for UnicodeTokenizerConfig
impl RefUnwindSafe for UnicodeTokenizerConfig
impl Send for UnicodeTokenizerConfig
impl Sync for UnicodeTokenizerConfig
impl Unpin for UnicodeTokenizerConfig
impl UnsafeUnpin for UnicodeTokenizerConfig
impl UnwindSafe for UnicodeTokenizerConfig
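Since the type is both Send and Sync, one configuration can be shared across threads without locking. The sketch below illustrates this with a locally redeclared struct; `assert_send_sync` and `count_lowercase_readers` are hypothetical helpers introduced for the example, not part of the documented API.

```rust
use std::sync::Arc;
use std::thread;

// Local mirror of the documented struct so the sketch compiles on its own.
#[derive(Clone, Debug)]
pub struct UnicodeTokenizerConfig {
    pub lowercase: bool,
    pub strip_accents: bool,
    pub split_on_punctuation: bool,
    pub split_on_whitespace: bool,
    pub max_token_length: Option<usize>,
}

// Compiles only if T is Send + Sync; a compile-time check, no runtime cost.
fn assert_send_sync<T: Send + Sync>() {}

// Hypothetical helper: reads one shared configuration from several threads
// and returns how many of them saw lowercase enabled.
fn count_lowercase_readers(config: UnicodeTokenizerConfig, workers: usize) -> usize {
    let shared = Arc::new(config);
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let cfg = Arc::clone(&shared);
            thread::spawn(move || cfg.lowercase)
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .filter(|&seen| seen)
        .count()
}

fn main() {
    assert_send_sync::<UnicodeTokenizerConfig>();

    let config = UnicodeTokenizerConfig {
        lowercase: true,
        strip_accents: true,
        split_on_punctuation: true,
        split_on_whitespace: true,
        max_token_length: None,
    };
    println!("{}", count_lowercase_readers(config, 4));
}
```

An `Arc` is enough here because the workers only read the configuration; mutation from several threads would need a `Mutex` or a per-thread clone.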
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.