pub struct Tokenizer<'al, 'sw, A> { /* private fields */ }
Structure used to tokenize a text with custom configurations.
See TokenizerBuilder to know how to build a Tokenizer.
Implementations
impl<'al, 'sw, A: AsRef<[u8]>> Tokenizer<'al, 'sw, A>
pub fn tokenize<'o>(
    &self,
    original: &'o str
) -> NormalizedTokenIter<'o, 'al, 'sw, A>
Creates an Iterator over Tokens.
The provided text is first segmented into tokens; each token is then normalized and classified.
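The three stages named above (segmentation, then normalization and classification) can be illustrated with a small standard-library sketch. This is not the crate's implementation: `SketchKind`, `SketchToken`, `sketch_split` and `sketch_tokenize` are hypothetical names, segmentation is assumed to be a simple split on alphanumeric boundaries, and normalization is assumed to be lowercasing only.

```rust
/// Hypothetical token kinds used only by this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchKind {
    Word,
    Separator,
}

/// Hypothetical token: a normalized lemma plus its classification.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchToken {
    pub lemma: String,
    pub kind: SketchKind,
}

/// Stage 1 (assumed rule): split the text into maximal runs of
/// alphanumeric or non-alphanumeric characters.
pub fn sketch_split(text: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut start = 0;
    let mut prev: Option<bool> = None;
    for (idx, ch) in text.char_indices() {
        let is_word = ch.is_alphanumeric();
        if let Some(p) = prev {
            if p != is_word {
                pieces.push(&text[start..idx]);
                start = idx;
            }
        }
        prev = Some(is_word);
    }
    if start < text.len() {
        pieces.push(&text[start..]);
    }
    pieces
}

/// Stages 2 and 3: normalize each piece (lowercase) and classify it
/// (word if it starts with an alphanumeric character, separator otherwise).
pub fn sketch_tokenize(text: &str) -> Vec<SketchToken> {
    sketch_split(text)
        .into_iter()
        .map(|piece| {
            let starts_with_word_char = piece
                .chars()
                .next()
                .map_or(false, |c| c.is_alphanumeric());
            SketchToken {
                lemma: piece.to_lowercase(),
                kind: if starts_with_word_char {
                    SketchKind::Word
                } else {
                    SketchKind::Separator
                },
            }
        })
        .collect()
}
```

Under these assumptions, `sketch_tokenize("Hello, World!")` produces four tokens: two words with lowercased lemmas and two separators.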
pub fn reconstruct<'o>(
    &self,
    original: &'o str
) -> ReconstructedTokenIter<'o, 'al, 'sw, A>
Same as tokenize, but attaches each Token to its corresponding portion of the original text.
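Pairing each token with its source slice is useful because normalization may change the text, while the original slices still concatenate back to the input. The following standard-library sketch shows that idea only; `sketch_reconstruct` is a hypothetical helper, and both the splitting rule (alphanumeric boundaries) and the normalization (lowercasing) are assumptions rather than the crate's behavior.

```rust
/// Returns `(original_slice, normalized_text)` pairs.
/// Splitting on alphanumeric boundaries and lowercasing are assumptions
/// made for this sketch only.
pub fn sketch_reconstruct(text: &str) -> Vec<(&str, String)> {
    let mut pairs = Vec::new();
    let mut start = 0;
    let mut prev: Option<bool> = None;
    for (idx, ch) in text.char_indices() {
        let is_word = ch.is_alphanumeric();
        if prev.is_some() && prev != Some(is_word) {
            let original = &text[start..idx];
            pairs.push((original, original.to_lowercase()));
            start = idx;
        }
        prev = Some(is_word);
    }
    if start < text.len() {
        let original = &text[start..];
        pairs.push((original, original.to_lowercase()));
    }
    pairs
}

/// Rebuilds the input by concatenating the original slices.
pub fn sketch_rebuild(pairs: &[(&str, String)]) -> String {
    pairs.iter().map(|(original, _)| *original).collect()
}
```

Because every pair borrows directly from the input, a caller can highlight or crop the original text without tracking offsets separately.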
pub fn segment<'o>(&self, original: &'o str) -> SegmentedTokenIter<'o, 'al>
Segments the provided text, creating an Iterator over Tokens.
pub fn segment_str<'o>(&self, original: &'o str) -> SegmentedStrIter<'o, 'al>
Segments the provided text, creating an Iterator over &str.
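Both segmentation methods return iterators rather than collections, so segments can be produced lazily as the caller consumes them. The sketch below illustrates that shape with the standard library only: a `&str` iterator, and a wrapper that adds byte offsets to each segment. `SketchStrIter`, `SketchSegment`, `sketch_segment_str` and `sketch_segment` are hypothetical names, and the alphanumeric-boundary splitting rule is an assumption, not the crate's segmentation logic.

```rust
/// Hypothetical lazy iterator over string segments.
pub struct SketchStrIter<'o> {
    rest: &'o str,
}

impl<'o> Iterator for SketchStrIter<'o> {
    type Item = &'o str;

    fn next(&mut self) -> Option<&'o str> {
        let rest = self.rest;
        // Stop once the input is exhausted.
        let first = rest.chars().next()?;
        let is_word = first.is_alphanumeric();
        // The segment ends where the character class changes (assumed rule).
        let end = rest
            .char_indices()
            .find(|&(_, c)| c.is_alphanumeric() != is_word)
            .map_or(rest.len(), |(idx, _)| idx);
        let (piece, remaining) = rest.split_at(end);
        self.rest = remaining;
        Some(piece)
    }
}

/// Analogue of an iterator over `&str` segments.
pub fn sketch_segment_str<'o>(original: &'o str) -> SketchStrIter<'o> {
    SketchStrIter { rest: original }
}

/// Hypothetical segment carrying its byte range in the original text.
#[derive(Debug, PartialEq)]
pub struct SketchSegment<'o> {
    pub text: &'o str,
    pub byte_start: usize,
    pub byte_end: usize,
}

/// Analogue of an iterator over richer segment values, built on top of
/// the `&str` iterator by tracking a running byte offset.
pub fn sketch_segment<'o>(original: &'o str) -> impl Iterator<Item = SketchSegment<'o>> + 'o {
    let mut offset = 0;
    sketch_segment_str(original).map(move |text| {
        let byte_start = offset;
        offset += text.len();
        SketchSegment { text, byte_start, byte_end: offset }
    })
}
```

The `&str` variant is the lighter choice when only the text of each segment is needed; the richer variant is a better fit when positions in the original input matter.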