Things that split tokens into one or more subtokens.
Structs§
- DecompositionAhoCorasick - Decompose compound words into their parts found in a given dictionary. Useful for small dictionaries or ones generated on the fly.
- DecompositionFst - Decompose compound words into their parts found in a given dictionary. Useful for compressed, memory-mapped dictionaries.
- NaiveWordSplitter - Naive word splitting that cuts tokens wherever the character type changes between alphanumerics, whitespace and symbols.
- UnicodeSentenceSplitter - Split large chunks of text into sentences according to the Unicode definition of a sentence.
- UnicodeWordSplitter - Split text into words according to the Unicode definition of what a word is. While not perfect, it should work well enough as an easy starting point.
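To make the idea behind the dictionary-based decomposition structs above concrete, here is a minimal standard-library sketch. It is not this crate's implementation: it swaps in a plain `HashSet` lookup with greedy longest-match instead of an Aho-Corasick automaton or an FST, and the function name `decompose` is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of this crate): split `word` into parts
/// that all occur in `dict`, always taking the longest match first.
/// Returns `None` if greedy matching cannot cover the whole word.
fn decompose<'a>(word: &'a str, dict: &HashSet<&str>) -> Option<Vec<&'a str>> {
    let mut parts = Vec::new();
    let mut start = 0;
    while start < word.len() {
        // Try every end offset on a char boundary, longest candidate first.
        let end = word[start..]
            .char_indices()
            .map(|(i, c)| start + i + c.len_utf8())
            .rev()
            .find(|&end| dict.contains(&word[start..end]))?;
        parts.push(&word[start..end]);
        start = end;
    }
    Some(parts)
}
```

With a dictionary containing `dampf`, `schiff` and `fahrt`, the word `dampfschifffahrt` decomposes into those three parts. Because the sketch never backtracks, a greedy first choice can block a valid split that a real automaton-based search might still find.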
Traits§
- Segmenter
- Allows the segmenter to be part of a SubdivisionMap iterator and be chained using the chain_segmenter methods.
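The naive word splitting described in the struct list can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's code: the three-way classification, the names `CharClass`, `classify` and `naive_split`, and the choice to keep whitespace runs as their own parts are all hypothetical.

```rust
/// Coarse character classes used to decide where to cut (assumed grouping).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CharClass {
    Alphanumeric,
    Whitespace,
    Symbol,
}

fn classify(c: char) -> CharClass {
    if c.is_alphanumeric() {
        CharClass::Alphanumeric
    } else if c.is_whitespace() {
        CharClass::Whitespace
    } else {
        CharClass::Symbol
    }
}

/// Hypothetical helper: cut `text` wherever the character class changes.
fn naive_split(text: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut start = 0;
    let mut prev: Option<CharClass> = None;
    for (idx, c) in text.char_indices() {
        let class = classify(c);
        if let Some(p) = prev {
            if p != class {
                // Class changed: close the current run and start a new one.
                parts.push(&text[start..idx]);
                start = idx;
            }
        }
        prev = Some(class);
    }
    // Push the final run, if any.
    if start < text.len() {
        parts.push(&text[start..]);
    }
    parts
}
```

Under these assumptions, `foo-bar 42!` is cut into `foo`, `-`, `bar`, a single space, `42` and `!`. A real splitter may well drop the whitespace parts or treat some symbols differently.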