pub struct NormalizationRustStemmers {
    pub anyway_above_confidence: f64,
    pub process_already_normalized: bool,
}
Runs stemming using the language tagged onto the token, if a stemming algorithm is available for that language.
This uses the rust_stemmers crate under the hood.
It is recommended to run this after an AugmentationDetectLanguage has been applied; it does nothing if no language metadata is available.
If you need lowercase normalization, do it before this normalizer and set process_already_normalized to true. This is necessary because some normalizers can't handle SCREAMING CASE.
Tokens will be ignored if:
- they are known not to be a SegmentedTokenKind::AlphaNumeric, or
- they already have normalized_text set while process_already_normalized is false (the default).
If the token's normalization_language is already set to a Some value, that language is used and the detected language is ignored.
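The skip rules and language precedence above can be sketched in plain Rust. This is a minimal model, not the crate's implementation: the Token struct, the Kind enum and the should_process / pick_language helpers are hypothetical, and a token whose kind is unknown is assumed to be processed, since the description only skips tokens known not to be alphanumeric.

```rust
// Hypothetical, simplified model of the token fields this description mentions.
// The real SegmentedToken type in the crate is shaped differently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    AlphaNumeric,
    Symbol,
}

pub struct Token {
    pub kind: Option<Kind>, // None: kind is not known
    pub normalized_text: Option<String>,
    pub normalization_language: Option<&'static str>,
    pub detected_language: Option<&'static str>,
}

/// Hypothetical helper: should the stemmer touch this token at all?
pub fn should_process(token: &Token, process_already_normalized: bool) -> bool {
    // Skip tokens that are known not to be alphanumeric; unknown kinds pass.
    if matches!(token.kind, Some(k) if k != Kind::AlphaNumeric) {
        return false;
    }
    // Skip tokens that already carry normalized text, unless explicitly allowed.
    if token.normalized_text.is_some() && !process_already_normalized {
        return false;
    }
    true
}

/// Hypothetical helper: an explicitly set normalization language wins over
/// the detected one; None means there is no language to stem with.
pub fn pick_language(token: &Token) -> Option<&'static str> {
    token.normalization_language.or(token.detected_language)
}
```

With this model, a lowercased token (normalized_text already set) is only stemmed when process_already_normalized is true, and pick_language returning None corresponds to the "does nothing without language metadata" case.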
Fields
anyway_above_confidence: f64
Confidence threshold above which the language detector's reliability flag is ignored, so the detected language is used for normalization even if the detection did not mark itself as reliable. Setting this can help with shorter texts.
A value of 1.0 means the flag is never ignored; 0.0 means it is always ignored.
Default is 0.4 as that is usually “good enough” for correct stemming.
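One way to read anyway_above_confidence, sketched as a plain function. The function name use_detected_language and the strict greater-than comparison are assumptions; the description only says "above", and the crate may compare differently.

```rust
/// Hypothetical gate: a detection flagged as reliable is always used; an
/// unreliable one is used only if its confidence lies strictly above the
/// threshold (assumed comparison).
pub fn use_detected_language(is_reliable: bool, confidence: f64, anyway_above_confidence: f64) -> bool {
    is_reliable || confidence > anyway_above_confidence
}
```

Under this reading, a threshold of 1.0 can never be exceeded by a confidence in [0, 1], so the reliability flag always decides, while 0.0 accepts any non-zero confidence.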
process_already_normalized: bool
Whether to process tokens that are already normalized. Enable this if your pipeline does generic preprocessing such as lowercasing.
Default is false for backwards compatibility.
Implementations
impl NormalizationRustStemmers
pub fn new() -> Self
Create a new NormalizationRustStemmers instance with the default settings.
pub fn set_anyway_above_confidence(self, anyway_above_confidence: f64) -> Self
Sets anyway_above_confidence, builder style.
pub fn set_process_already_normalized(self, process_already_normalized: bool) -> Self
Sets process_already_normalized, builder style.
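The builder-style setters consume self and return it, so configuration can be chained after new(). The sketch below is a self-contained stand-in that mirrors only the documented fields and signatures, with defaults taken from the field docs (0.4 and false); it is not the crate's source and does no stemming.

```rust
/// Stand-in mirroring the documented fields and builder methods only.
#[derive(Clone, Debug)]
pub struct NormalizationRustStemmers {
    pub anyway_above_confidence: f64,
    pub process_already_normalized: bool,
}

impl NormalizationRustStemmers {
    /// Defaults as stated in the field documentation.
    pub fn new() -> Self {
        Self { anyway_above_confidence: 0.4, process_already_normalized: false }
    }

    /// Takes ownership, updates one field, and hands the value back for chaining.
    pub fn set_anyway_above_confidence(mut self, anyway_above_confidence: f64) -> Self {
        self.anyway_above_confidence = anyway_above_confidence;
        self
    }

    pub fn set_process_already_normalized(mut self, process_already_normalized: bool) -> Self {
        self.process_already_normalized = process_already_normalized;
        self
    }
}
```

A typical configuration for a pipeline that lowercases first would then read NormalizationRustStemmers::new().set_process_already_normalized(true).set_anyway_above_confidence(0.3), with 0.3 being an arbitrary illustrative value.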
Trait Implementations
impl Augmenter for NormalizationRustStemmers
fn augment<'a>(&self, token: SegmentedToken<'a>) -> SegmentedToken<'a>
impl Clone for NormalizationRustStemmers
fn clone(&self) -> NormalizationRustStemmers
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for NormalizationRustStemmers
Auto Trait Implementations
impl Freeze for NormalizationRustStemmers
impl RefUnwindSafe for NormalizationRustStemmers
impl Send for NormalizationRustStemmers
impl Sync for NormalizationRustStemmers
impl Unpin for NormalizationRustStemmers
impl UnwindSafe for NormalizationRustStemmers
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Segmenter for T
where
    T: Augmenter,
type SubdivisionIter<'a> = IntoIter<SegmentedToken<'a>>
subdivide function if it has multiple results.
fn subdivide<'a>(&self, token: SegmentedToken<'a>) -> UseOrSubdivide<SegmentedToken<'a>, <T as Segmenter>::SubdivisionIter<'a>>
token into zero, one or more subtokens.
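The blanket impl of Segmenter for every T: Augmenter is what lets this normalizer be used wherever a Segmenter is expected. The sketch below shows the general pattern with simplified, hypothetical stand-ins (no lifetimes, a plain Token, a two-variant UseOrSubdivide, a toy Lowercase augmenter). It assumes an augmenter's subdivide simply returns the augmented token as a single Use result; that is a guess, not the crate's documented behavior.

```rust
// Hypothetical, simplified stand-ins for the crate's types and traits.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
}

/// Either keep a single token or replace it with an iterator of sub-tokens.
pub enum UseOrSubdivide<T, I> {
    Use(T),
    Subdivide(I),
}

pub trait Augmenter {
    fn augment(&self, token: Token) -> Token;
}

pub trait Segmenter {
    type SubdivisionIter: Iterator<Item = Token>;
    fn subdivide(&self, token: Token) -> UseOrSubdivide<Token, Self::SubdivisionIter>;
}

// Blanket impl: every augmenter is also a segmenter that (in this sketch)
// never splits, it just hands back the augmented token.
impl<T: Augmenter> Segmenter for T {
    type SubdivisionIter = std::vec::IntoIter<Token>;

    fn subdivide(&self, token: Token) -> UseOrSubdivide<Token, Self::SubdivisionIter> {
        UseOrSubdivide::Use(self.augment(token))
    }
}

/// Toy augmenter used to exercise the blanket impl.
struct Lowercase;

impl Augmenter for Lowercase {
    fn augment(&self, token: Token) -> Token {
        Token { text: token.text.to_lowercase() }
    }
}
```

Because of the blanket impl, Lowercase.subdivide(...) compiles without Lowercase implementing Segmenter itself; the same mechanism is what gives NormalizationRustStemmers its Segmenter methods.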