pub struct DecompositionFst<D: AsRef<[u8]>> { /* private fields */ }
Decompose compound words into their parts found in a given dictionary. Useful for compressed, memory-mapped dictionaries.

This implementation is based on the fst crate.

Decomposition is performed only if a word can be fully resolved using the dictionary; any text containing unknown “words” is passed through as-is.

This is an adaptation of the FstSegmenter from charabia for this crate. Unlike the charabia crate, this one does not split into unknown segments using heuristics or character splitting.
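To illustrate the full-resolution rule, here is a self-contained sketch built directly on the fst crate, not on this type's API: a split is accepted only if it consumes the entire word, otherwise the word is left alone. The decompose helper and the backtracking strategy are illustrative assumptions, not this type's actual algorithm.

```rust
use fst::Set;

/// Try to split `word` entirely into dictionary entries. `None` means the
/// word could not be fully resolved and should pass through as-is.
fn decompose<'a>(set: &Set<Vec<u8>>, word: &'a str) -> Option<Vec<&'a str>> {
    if word.is_empty() {
        return Some(Vec::new());
    }
    // Try longer prefixes first and backtrack on failure.
    for end in (1..=word.len()).rev() {
        if !word.is_char_boundary(end) {
            continue;
        }
        let (head, tail) = word.split_at(end);
        if set.contains(head) {
            if let Some(mut rest) = decompose(set, tail) {
                rest.insert(0, head);
                return Some(rest);
            }
        }
    }
    None
}

fn main() -> Result<(), fst::Error> {
    // fst requires lexicographically ordered input.
    let set = Set::from_iter(["bahn", "eisen", "hof"])?;
    assert_eq!(decompose(&set, "eisenbahn"), Some(vec!["eisen", "bahn"]));
    assert_eq!(decompose(&set, "eisenxyz"), None); // unknown: pass through as-is
    Ok(())
}
```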
Implementations§
impl DecompositionFst<Vec<u8>>
pub fn from_dictionary<I, P>(dict: I) -> Result<Self, Error>
Convenience constructor for DecompositionFst; takes a list of lexicographically ordered words to recognize as valid parts for decomposition.
If you’re using this constructor outside of testing and development, please have a look at the DecompositionAhoCorasick implementation, as it is more likely to fit your use case of an in-memory automaton matcher.
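A likely invocation, assuming the generic bounds mirror fst::Set::from_iter (I: IntoIterator&lt;Item = P&gt;, P: AsRef&lt;[u8]&gt;); the exact bounds are not shown on this page:

```rust
let mut words = vec!["eisen", "bahn", "hof"];
words.sort_unstable(); // fst construction requires lexicographic order
let fst = DecompositionFst::from_dictionary(words)?;
```

Unsorted input should surface as an Err from the underlying fst builder rather than a panic.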
Trait Implementations§
impl<D: Clone + AsRef<[u8]>> Clone for DecompositionFst<D>
fn clone(&self) -> DecompositionFst<D>
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.

impl<D> Segmenter for DecompositionFst<D>
type SubdivisionIter<'a> = IntoIter<SegmentedToken<'a>>
Iterator returned by the subdivide function if it has multiple results.

fn subdivide<'a>(
    &self,
    token: SegmentedToken<'a>,
) -> UseOrSubdivide<SegmentedToken<'a>, IntoIter<SegmentedToken<'a>>>
Subdivides the given token into zero, one or more subtokens.
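A sketch of consuming the result; the UseOrSubdivide variant names (Use / Subdivide) and the handle helper are assumptions, as this page shows only the type, not its variants:

```rust
// `fst` is a DecompositionFst, `token` a SegmentedToken; variant names assumed.
match fst.subdivide(token) {
    UseOrSubdivide::Use(original) => {
        // The word could not be fully resolved: keep the token as-is.
        handle(original);
    }
    UseOrSubdivide::Subdivide(parts) => {
        // Fully resolved: one SegmentedToken per matched dictionary entry.
        for part in parts {
            handle(part);
        }
    }
}
```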