Module analiticcl::search

Structs§

Context
Refers to a match and its unigram context
ContextRule
Match
Represents a match between the input text and the lexicon.
Offset
Byte Offset
OutputSymbol
Intermediate data structure tied to the Finite State Transducer used in most_likely_sequence(). Holds the output symbol for each FST state and allows relating output symbols back to the input structures.
PatternMatchResult
Sequence
A complete sequence of output symbols with associated emission and language model (log) probabilities.

Functions§

classify_boundaries
Classify the token boundaries detected by find_boundaries as weak, normal, or hard. This classification determines how eager the system is to split on a given boundary.
find_boundaries
Given a text string, identify at what points token boundaries occur, for instance between alphabetic characters and punctuation. The text string always ends with a boundary (but it may be a dummy one that covers no length).
find_match_ngrams
Find all ngrams in the text of the specified order, respecting the boundaries. This will return a vector of Match instances, referring to the precise (untokenised) text.
redundant_match
A redundant match is a higher order match which already scores a perfect distance score when its unigram components are considered separately.
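
The boundary and n-gram descriptions above can be illustrated with a minimal, self-contained sketch. It does not use the analiticcl API: `Span`, `sketch_find_boundaries` and `sketch_ngrams` are hypothetical names introduced here, and the rule that any run of non-alphanumeric characters forms one boundary is an assumption made for illustration, not necessarily the crate's actual behaviour. The sketch only shows the two ideas stated in the descriptions: the text always ends with a (possibly zero-length) boundary, and n-grams are slices of the original, untokenised text.

```rust
/// Hypothetical byte-offset span (NOT the crate's `Offset` type).
#[derive(Debug, Clone, PartialEq)]
struct Span {
    begin: usize,
    end: usize,
}

/// Assumption for illustration: a boundary is a maximal run of
/// non-alphanumeric characters. A final boundary is always appended;
/// it has zero length when the text ends in a token character.
fn sketch_find_boundaries(text: &str) -> Vec<Span> {
    let mut boundaries = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() {
            // A token character closes any boundary that is currently open.
            if let Some(b) = start.take() {
                boundaries.push(Span { begin: b, end: i });
            }
        } else if start.is_none() {
            // First non-alphanumeric character opens a new boundary.
            start = Some(i);
        }
    }
    // Closing boundary: either the open trailing run or a dummy empty span.
    let begin = start.unwrap_or(text.len());
    boundaries.push(Span { begin, end: text.len() });
    boundaries
}

/// Return all n-grams of the given order as slices of the original text.
/// Tokens are the non-empty gaps between consecutive boundaries; an n-gram
/// runs from the start of its first token to the end of its last token,
/// so it includes the boundary characters in between.
fn sketch_ngrams<'a>(text: &'a str, boundaries: &[Span], order: usize) -> Vec<&'a str> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    for b in boundaries {
        if b.begin > pos {
            tokens.push(Span { begin: pos, end: b.begin });
        }
        pos = b.end;
    }
    // `windows(0)` would panic, so reject order 0 explicitly.
    if order == 0 || tokens.len() < order {
        return Vec::new();
    }
    tokens
        .windows(order)
        .map(|w| &text[w[0].begin..w[order - 1].end])
        .collect()
}
```

Under these assumptions, the input `hello, world` yields one boundary covering bytes 5..7 (the comma and the space) followed by a dummy boundary at 12..12. Its unigrams are `hello` and `world`, and its single bigram is the full slice `hello, world`, punctuation included.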