Struct DecompositionFst

pub struct DecompositionFst<D: AsRef<[u8]>> { /* private fields */ }

Decomposes compound words into their parts found in a given dictionary. Useful for compressed, memory-mapped dictionaries.

This implementation is based on the fst crate.

A word is decomposed only if it can be fully resolved using the dictionary; any text containing unknown “words” is passed through as-is.

This is an adaptation of the FstSegmenter from charabia for this crate. Unlike the charabia crate, it does not split text into unknown segments using heuristics or character splitting.
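The all-or-nothing behaviour described above can be illustrated with a small sketch that uses only the standard library. It is not this crate's implementation: a `BTreeSet` stands in for the FST, the helper names `longest_prefix` and `decompose` are hypothetical, and greedy longest-prefix matching is an assumption about how the parts are chosen.

```rust
use std::collections::BTreeSet;

/// Returns the longest dictionary entry that is a prefix of `rest`, if any.
/// Hypothetical helper: the `BTreeSet` stands in for the FST lookup.
fn longest_prefix<'a>(dict: &BTreeSet<&str>, rest: &'a str) -> Option<&'a str> {
    rest.char_indices()
        .map(|(start, ch)| start + ch.len_utf8()) // end offset of each char
        .rev() // try the longest candidate first
        .map(|end| &rest[..end])
        .find(|candidate| dict.contains(*candidate))
}

/// Splits `word` into dictionary parts, or returns it unchanged as a single
/// element when any remainder cannot be resolved (all-or-nothing).
fn decompose<'a>(word: &'a str, dict: &BTreeSet<&str>) -> Vec<&'a str> {
    let mut parts = Vec::new();
    let mut rest = word;
    while !rest.is_empty() {
        match longest_prefix(dict, rest) {
            Some(part) => {
                parts.push(part);
                rest = &rest[part.len()..];
            }
            // Unknown remainder: discard partial results, pass through as-is.
            None => return vec![word],
        }
    }
    // An empty input has no parts; pass it through unchanged as well.
    if parts.is_empty() { vec![word] } else { parts }
}
```

With a dictionary containing `ball` and `foot`, `decompose("football", &dict)` yields `["foot", "ball"]`, while `decompose("footballx", &dict)` yields `["footballx"]` because the trailing `x` cannot be resolved. Note that a purely greedy strategy like this one can miss a valid split that requires a shorter first part; whether the crate backtracks in that case is not stated here.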

Implementations§

impl DecompositionFst<Vec<u8>>

pub fn from_dictionary<I, P>(dict: I) -> Result<Self, Error>
where I: IntoIterator<Item = P>, P: AsRef<[u8]>,

Convenience constructor for DecompositionFst. Takes a list of lexicographically ordered words to recognize as valid parts for decomposition.

If you’re using this constructor outside of testing and development, please have a look at the DecompositionAhoCorasick implementation, as it is more likely to fit your use case of an in-memory automaton matcher.
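Because the word list has to be lexicographically ordered, it may help to normalise it before calling from_dictionary. The following is a minimal sketch using only the standard library; the helper name `prepare_dictionary` is hypothetical, and removing duplicates is a defensive assumption rather than a documented requirement.

```rust
/// Sorts the words into byte-wise lexicographic order and drops duplicates.
/// Hypothetical helper, not part of this crate.
fn prepare_dictionary<'a>(mut words: Vec<&'a str>) -> Vec<&'a str> {
    // `str` ordering compares the underlying UTF-8 bytes, which is the
    // ordering the fst crate expects for its keys.
    words.sort_unstable();
    // After sorting, duplicates are adjacent and can be removed in one pass.
    words.dedup();
    words
}
```

For example, `prepare_dictionary(vec!["foot", "ball", "Zebra", "ball"])` returns `["Zebra", "ball", "foot"]`: the order is by byte value, so uppercase ASCII letters sort before lowercase ones. Since `Vec<&str>` is an `IntoIterator` whose items implement `AsRef<[u8]>`, the result matches the bounds shown in the signature above.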

impl<D> DecompositionFst<D>
where D: AsRef<[u8]>,

pub fn from_fst(dictionary: Fst<D>) -> Self

Construct this decomposer from an existing Fst struct.

This is the recommended way, as it allows you to choose the datastore that best fits your use case.

Trait Implementations§

impl<D: Clone + AsRef<[u8]>> Clone for DecompositionFst<D>

fn clone(&self) -> DecompositionFst<D>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<D> Segmenter for DecompositionFst<D>
where D: AsRef<[u8]>,

type SubdivisionIter<'a> = IntoIter<SegmentedToken<'a>>

The iterator type returned by the subdivide function if it has multiple results.

fn subdivide<'a>(&self, token: SegmentedToken<'a>) -> UseOrSubdivide<SegmentedToken<'a>, IntoIter<SegmentedToken<'a>>>

A method that should split the given token into zero, one or more subtokens.

Auto Trait Implementations§

impl<D> Freeze for DecompositionFst<D>
where D: Freeze,

impl<D> RefUnwindSafe for DecompositionFst<D>
where D: RefUnwindSafe,

impl<D> Send for DecompositionFst<D>
where D: Send,

impl<D> Sync for DecompositionFst<D>
where D: Sync,

impl<D> Unpin for DecompositionFst<D>
where D: Unpin,

impl<D> UnwindSafe for DecompositionFst<D>
where D: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.