pub struct AugmentationClassify {}
An augmenter that rewrites the SegmentedToken::kind field to match the token's actual content.
It does so by reading the token text (preferring the normalized text) and applying heuristics based on the Unicode GeneralCategoryGroup of the characters it contains.
The following heuristics are applied in the given order:
- If it contains Letters or Numbers -> SegmentedTokenKind::AlphaNumeric
- If it contains Symbols or Other -> SegmentedTokenKind::Symbol
- If it contains Punctuation or Separators -> SegmentedTokenKind::Separator
Exceptions to the usual Unicode classification: \n and \0 are separators.
The Mark category is ignored. If none of the heuristics apply, the token kind is reset to None.
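The rule ordering above can be sketched in Rust as follows. This is an illustrative sketch, not the crate's implementation: Group, Kind, classify and ascii_group are hypothetical names, and ascii_group is a stand-in category lookup that is only correct for ASCII input (a real implementation would consult a full Unicode general-category table).

```rust
/// Hypothetical stand-in for a Unicode GeneralCategoryGroup value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Group {
    Letter,
    Mark,
    Number,
    Punctuation,
    Symbol,
    Separator,
    Other,
}

/// Hypothetical mirror of the SegmentedTokenKind variants named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    AlphaNumeric,
    Symbol,
    Separator,
}

/// Applies the documented heuristics in order; `group_of` supplies the
/// category group of each character.
fn classify(text: &str, group_of: impl Fn(char) -> Group) -> Option<Kind> {
    // '\n' and '\0' are Control (Other) in Unicode but count as separators here.
    let groups: Vec<Group> = text
        .chars()
        .map(|c| if c == '\n' || c == '\0' { Group::Separator } else { group_of(c) })
        .collect();

    if groups.iter().any(|g| matches!(g, Group::Letter | Group::Number)) {
        Some(Kind::AlphaNumeric)
    } else if groups.iter().any(|g| matches!(g, Group::Symbol | Group::Other)) {
        Some(Kind::Symbol)
    } else if groups.iter().any(|g| matches!(g, Group::Punctuation | Group::Separator)) {
        Some(Kind::Separator)
    } else {
        // Only Marks (which are ignored) or an empty string.
        None
    }
}

/// Hypothetical category lookup that is only correct for ASCII;
/// every non-ASCII character is lumped into Other.
fn ascii_group(c: char) -> Group {
    match c {
        c if c.is_ascii_alphabetic() => Group::Letter,
        c if c.is_ascii_digit() => Group::Number,
        ' ' => Group::Separator,
        '$' | '+' | '<' | '=' | '>' | '^' | '`' | '|' | '~' => Group::Symbol,
        c if c.is_ascii_punctuation() => Group::Punctuation,
        _ => Group::Other, // ASCII control characters, plus all non-ASCII input
    }
}
```

The order matters: "a+" is AlphaNumeric because the letter rule is checked first, and "\t" ends up as Symbol because, unlike \n, it is a Control character with no exception.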
Trait Implementations
impl Augmenter for AugmentationClassify
fn augment<'a>(&self, token: SegmentedToken<'a>) -> SegmentedToken<'a>
Applies the augmentation function to the given token and returns it.
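A minimal sketch of what augment could look like for this type. It assumes a simplified SegmentedToken with hypothetical text, normalized_text and kind fields (the real struct may differ), and replaces the full Unicode lookup with a rough, std-only rough_kind helper that is only reliable for ASCII.

```rust
/// Hypothetical mirror of SegmentedTokenKind (only the variants named in the docs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SegmentedTokenKind {
    AlphaNumeric,
    Symbol,
    Separator,
}

/// Hypothetical, simplified token; the real SegmentedToken's fields may differ.
#[derive(Debug)]
struct SegmentedToken<'a> {
    text: &'a str,
    normalized_text: Option<String>,
    kind: Option<SegmentedTokenKind>,
}

/// Mirror of the Augmenter signature shown above.
trait Augmenter {
    fn augment<'a>(&self, token: SegmentedToken<'a>) -> SegmentedToken<'a>;
}

#[derive(Clone, Debug, Default)]
struct AugmentationClassify {}

/// Rough std-only approximation of the documented heuristics.
/// Only reliable for ASCII; non-ASCII symbols and punctuation are not handled.
fn rough_kind(text: &str) -> Option<SegmentedTokenKind> {
    let is_exception = |c: char| c == '\n' || c == '\0';
    if text.chars().any(char::is_alphanumeric) {
        Some(SegmentedTokenKind::AlphaNumeric)
    } else if text
        .chars()
        .any(|c| (c.is_control() && !is_exception(c)) || "$+<=>^`|~".contains(c))
    {
        Some(SegmentedTokenKind::Symbol)
    } else if text
        .chars()
        .any(|c| is_exception(c) || c.is_whitespace() || c.is_ascii_punctuation())
    {
        Some(SegmentedTokenKind::Separator)
    } else {
        None
    }
}

impl Augmenter for AugmentationClassify {
    fn augment<'a>(&self, mut token: SegmentedToken<'a>) -> SegmentedToken<'a> {
        // Prefer the normalized text; fall back to the raw text when there is none.
        let source = token.normalized_text.as_deref().unwrap_or(token.text);
        // Overwrite (or reset to None) whatever kind the token carried before.
        token.kind = rough_kind(source);
        token
    }
}
```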
impl Clone for AugmentationClassify
fn clone(&self) -> AugmentationClassify
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self) (since 1.0.0)
Performs copy-assignment from source.
impl Debug for AugmentationClassify
impl Default for AugmentationClassify
fn default() -> AugmentationClassify
Returns the “default value” for a type.
Auto Trait Implementations
impl Freeze for AugmentationClassify
impl RefUnwindSafe for AugmentationClassify
impl Send for AugmentationClassify
impl Sync for AugmentationClassify
impl Unpin for AugmentationClassify
impl UnwindSafe for AugmentationClassify
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Segmenter for T
where
    T: Augmenter,
type SubdivisionIter<'a> = IntoIter<SegmentedToken<'a>>
The iterator type returned by the subdivide function if it has multiple results.
fn subdivide<'a>(
    &self,
    token: SegmentedToken<'a>,
) -> UseOrSubdivide<SegmentedToken<'a>, <T as Segmenter>::SubdivisionIter<'a>>
A method that should split the given token into zero, one, or more subtokens.
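Because of the blanket impl above, every Augmenter is also a Segmenter. The sketch below shows one plausible shape for that impl, assuming an augmenter never splits, so subdivide wraps the augmented token as a single result. SegmentedToken, UseOrSubdivide (with hypothetical Use and Subdivide variants), the std::vec::IntoIter choice for IntoIter, and the Trim augmenter are simplified stand-ins, not the crate's definitions.

```rust
/// Hypothetical, simplified token; the real SegmentedToken carries more data.
#[derive(Clone, Debug, PartialEq)]
struct SegmentedToken<'a> {
    text: &'a str,
}

/// Hypothetical shape of UseOrSubdivide: keep one token, or yield several.
enum UseOrSubdivide<T, I> {
    Use(T),
    Subdivide(I),
}

trait Augmenter {
    fn augment<'a>(&self, token: SegmentedToken<'a>) -> SegmentedToken<'a>;
}

trait Segmenter {
    type SubdivisionIter<'a>: Iterator<Item = SegmentedToken<'a>>;

    fn subdivide<'a>(
        &self,
        token: SegmentedToken<'a>,
    ) -> UseOrSubdivide<SegmentedToken<'a>, Self::SubdivisionIter<'a>>;
}

// Every Augmenter becomes a Segmenter that never splits its input.
impl<T> Segmenter for T
where
    T: Augmenter,
{
    // Assumed to be std::vec::IntoIter; never produced here because we always return Use.
    type SubdivisionIter<'a> = std::vec::IntoIter<SegmentedToken<'a>>;

    fn subdivide<'a>(
        &self,
        token: SegmentedToken<'a>,
    ) -> UseOrSubdivide<SegmentedToken<'a>, Self::SubdivisionIter<'a>> {
        UseOrSubdivide::Use(self.augment(token))
    }
}

/// Hypothetical example augmenter that trims surrounding whitespace.
struct Trim;

impl Augmenter for Trim {
    fn augment<'a>(&self, token: SegmentedToken<'a>) -> SegmentedToken<'a> {
        SegmentedToken { text: token.text.trim() }
    }
}
```

With this shape, an augmenter such as AugmentationClassify can be dropped into any pipeline that expects a Segmenter without writing a separate impl.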