Struct analiticcl::VariantModel

impl VariantModel

pub fn new(alphabet_file: &str, weights: Weights, debug: u8) -> VariantModel

Instantiate a new variant model

pub fn new_with_alphabet( alphabet: Alphabet, weights: Weights, debug: u8 ) -> VariantModel

Instantiate a new variant model, explicitly passing an alphabet rather than loading one from file.

pub fn set_confusables_before_pruning(&mut self)

Configure the model to match against known confusables prior to pruning on maximum weight. This may lead to better results but may have a significant performance impact.

pub fn alphabet_size(&self) -> CharIndexType

Returns the size of the alphabet. This is typically one larger than the number of entries in the alphabet file, as it includes the UNKNOWN symbol.

pub fn get_or_create_index<'a, 'b>( &'a mut self, anahash: &'b AnaValue ) -> &'a mut AnaIndexNode

Get an item from the index or insert it if it doesn’t exist yet

pub fn build(&mut self)

Build the anagram index (and secondary index) so the model is ready for variant matching

pub fn contains_key(&self, key: &AnaValue) -> bool

Tests if the anagram value exists in the index

pub fn get_anagram_instances(&self, text: &str) -> Vec<&VocabValue>

Get all anagram instances for a specific entry

pub fn get(&self, text: &str) -> Option<&VocabValue>

Get an exact item in the lexicon (if it exists)

pub fn has(&self, text: &str) -> bool

Tests if the lexicon has a specific entry, by text

pub fn get_vocab(&self, vocab_id: VocabId) -> Option<&VocabValue>

Resolves a vocabulary ID

pub fn decompose_anavalue(&self, av: &AnaValue) -> Vec<&str>

Decomposes and decodes an anagram value into the characters that make it up. Mostly intended for debugging purposes.
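To illustrate what decomposing an anagram value can involve, the sketch below assumes that an anagram value is the product of one prime per alphabet entry, so that any ordering of the same characters yields the same value. The six-letter alphabet, the `u128` representation and the function names `anavalue` and `decompose` are hypothetical stand-ins, not the crate's `AnaValue` type or its implementation.

```rust
// Hypothetical toy alphabet: ALPHABET[i] is assigned the prime PRIMES[i].
const ALPHABET: [char; 6] = ['a', 'b', 'c', 'e', 'r', 't'];
const PRIMES: [u128; 6] = [2, 3, 5, 7, 11, 13];

/// Compute a toy anagram value: the product of the primes of all characters.
/// Returns None for characters outside the alphabet or on overflow.
fn anavalue(text: &str) -> Option<u128> {
    let mut value: u128 = 1;
    for ch in text.chars() {
        let idx = ALPHABET.iter().position(|&c| c == ch)?;
        value = value.checked_mul(PRIMES[idx])?;
    }
    Some(value)
}

/// Decompose a toy anagram value back into its characters (in alphabet
/// order, since the original character order is not stored in the value).
fn decompose(mut value: u128) -> Vec<char> {
    let mut chars = Vec::new();
    for (idx, &prime) in PRIMES.iter().enumerate() {
        while value > 1 && value % prime == 0 {
            chars.push(ALPHABET[idx]);
            value /= prime;
        }
    }
    chars
}
```

Under these assumptions, `anavalue("trace")` and `anavalue("react")` give the same value, and `decompose` recovers the characters but not their original order.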

pub fn read_alphabet(&mut self, filename: &str) -> Result<(), Error>

Read the alphabet from a TSV file. The file contains one alphabet entry per line, but a line may consist of multiple tab-separated alphabet entries, which will be treated as identical. Alphabet entries are not limited to single characters but may be longer strings. A greedy matching approach is used, so order matters (but only for this).
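The effect of entry order on greedy matching can be sketched as follows. This is a simplified illustration, not the crate's code: it assumes that the first entry in file order that matches at the current position wins, and that unmatched characters map to an index one past the last entry (standing in for the UNKNOWN symbol). The function names `parse_alphabet` and `encode` are hypothetical.

```rust
/// Parse alphabet TSV content: one entry per line; tab-separated
/// alternatives on the same line share the same index.
fn parse_alphabet(content: &str) -> Vec<Vec<String>> {
    content
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split('\t').map(str::to_string).collect())
        .collect()
}

/// Greedily encode text as alphabet indices. The first entry (in file
/// order) that matches at the current position wins. Unmatched characters
/// get the index alphabet.len(), standing in for the UNKNOWN symbol.
fn encode(text: &str, alphabet: &[Vec<String>]) -> Vec<usize> {
    let unknown = alphabet.len();
    let mut indices = Vec::new();
    let mut rest = text;
    'outer: while let Some(ch) = rest.chars().next() {
        for (idx, alternatives) in alphabet.iter().enumerate() {
            for alt in alternatives {
                if !alt.is_empty() && rest.starts_with(alt.as_str()) {
                    indices.push(idx);
                    rest = &rest[alt.len()..];
                    continue 'outer;
                }
            }
        }
        // No entry matched: consume one character as UNKNOWN.
        indices.push(unknown);
        rest = &rest[ch.len_utf8()..];
    }
    indices
}
```

With the multi-character entry `ij` listed before `i` and `j`, the input `ij` is encoded as a single index; with `i` and `j` listed first, the `ij` entry is never reached.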

pub fn read_confusablelist(&mut self, filename: &str) -> Result<(), Error>

Read a confusable list from a TSV file. The file contains edit scripts in the first column (formatted in sesdiff style) and optionally a weight in the second column. Favourable confusables have a weight > 1.0, unfavourable ones a weight < 1.0 (penalties). Weight values should be relatively close to 1.0, as they are applied to the entire score.
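A minimal sketch of parsing one line of such a file is given below. It treats the edit script as an opaque string and assumes that a missing weight column means a neutral weight of 1.0; that default and the function name `parse_confusable_line` are assumptions for illustration, not confirmed behaviour of read_confusablelist().

```rust
/// Parse one line of a confusable list: an edit script, optionally followed
/// by a tab and a weight. The edit script is kept as an opaque string here.
fn parse_confusable_line(line: &str) -> Result<(String, f64), String> {
    let mut fields = line.split('\t');
    let script = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or_else(|| "missing edit script".to_string())?;
    let weight = match fields.next() {
        // Assumption: a missing weight column means a neutral weight of 1.0.
        None => 1.0,
        Some(raw) => raw
            .trim()
            .parse::<f64>()
            .map_err(|e| format!("invalid weight {:?}: {}", raw, e))?,
    };
    Ok((script.to_string(), weight))
}
```

For example, an illustrative line consisting of the script `-[y]+[i]`, a tab and `1.1` would parse to that script with a weight of 1.1, marking the substitution as favourable.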

pub fn add_to_confusables( &mut self, editscript: &str, weight: f64 ) -> Result<(), Error>

Add a confusable

pub fn add_variant( &mut self, ref_id: VocabId, variant: &str, score: f64, freq: Option<u32>, params: &VocabParams ) -> bool

Add a (weighted) variant to the model, referring to a reference that already exists in the model. Variants will be added to the lexicon automatically when necessary. Set VocabType::TRANSPARENT if you want variants to only be used as an intermediate towards items that have already been added previously through a more authoritative lexicon.

pub fn add_variant_by_id( &mut self, ref_id: VocabId, variantid: VocabId, score: f64 ) -> bool

Add a (weighted) variant to the model, referring to a reference that already exists in the model. Variants will be added to the lexicon automatically when necessary. Set VocabType::TRANSPARENT if you want variants to only be used as an intermediate towards items that have already been added previously through a more authoritative lexicon.

pub fn read_vocabulary( &mut self, filename: &str, params: &VocabParams ) -> Result<(), Error>

Read vocabulary (a lexicon or corpus-derived lexicon) from a TSV file. May contain frequency information. The parameters define what value can be read from what column.
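How column parameters can drive the parsing of a vocabulary line is sketched below. `ColumnParams` and `parse_vocab_line` are hypothetical names for this illustration and do not reflect the actual fields of `VocabParams`. Treating a missing or unparsable frequency as absent is likewise an assumption.

```rust
/// Hypothetical column configuration (not the crate's VocabParams).
struct ColumnParams {
    /// Zero-based column holding the text of the entry.
    text_column: usize,
    /// Zero-based column holding the frequency, if any.
    freq_column: Option<usize>,
}

/// Parse one TSV line into (text, optional frequency).
/// Returns None if the text column is missing or empty.
fn parse_vocab_line(line: &str, params: &ColumnParams) -> Option<(String, Option<u32>)> {
    let fields: Vec<&str> = line.split('\t').collect();
    let text = fields.get(params.text_column)?;
    if text.is_empty() {
        return None;
    }
    // Assumption: a missing or unparsable frequency is treated as absent.
    let freq = params
        .freq_column
        .and_then(|col| fields.get(col))
        .and_then(|raw| raw.trim().parse::<u32>().ok());
    Some((text.to_string(), freq))
}
```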

pub fn read_contextrules(&mut self, filename: &str) -> Result<(), Error>

pub fn add_contextrule( &mut self, pattern: &str, score: f32, tag: Vec<&str>, tagoffset: Vec<&str> ) -> Result<(), Error>

pub fn read_variants( &mut self, filename: &str, params: Option<&VocabParams>, transparent: bool ) -> Result<(), Error>

Read a weighted variant list from a TSV file. Contains a canonical/reference form in the first column, and variants with score (two columns) in the following columns. May also contain frequency information (auto detected), in which case the first column has the canonical/reference form, the second column the frequency, and all further columns hold variants, their score and their frequency (three columns). Consumes much more memory than equally weighted variants.
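The column layout without frequency information (reference form first, then variant and score pairs) can be sketched as below. `WeightedVariant` and `parse_variant_line` are hypothetical names, and the sketch does not handle the auto-detected frequency layout.

```rust
/// A variant with its score, as read from one (text, score) column pair.
#[derive(Debug, PartialEq)]
struct WeightedVariant {
    text: String,
    score: f64,
}

/// Parse one line: the reference form, followed by (variant, score) pairs.
fn parse_variant_line(line: &str) -> Result<(String, Vec<WeightedVariant>), String> {
    let fields: Vec<&str> = line.split('\t').collect();
    let (reference, rest) = fields
        .split_first()
        .ok_or_else(|| "empty line".to_string())?;
    if reference.is_empty() {
        return Err("missing reference form".to_string());
    }
    if rest.len() % 2 != 0 {
        return Err("variants must come in (text, score) pairs".to_string());
    }
    let mut variants = Vec::new();
    for pair in rest.chunks(2) {
        let score = pair[1]
            .trim()
            .parse::<f64>()
            .map_err(|e| format!("invalid score {:?}: {}", pair[1], e))?;
        variants.push(WeightedVariant { text: pair[0].to_string(), score });
    }
    Ok((reference.to_string(), variants))
}
```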

pub fn add_to_vocabulary( &mut self, text: &str, frequency: Option<u32>, params: &VocabParams ) -> VocabId

Adds an entry in the vocabulary

pub fn find_variants( &self, input: &str, params: &SearchParameters ) -> Vec<VariantResult>

Find variants in the vocabulary for a given string (in its totality). Returns a vector of results, each carrying a vocabulary ID, a distance score and a frequency score. The resulting vocabulary IDs can be resolved through get_vocab().

pub fn learn_variants<'a, I>( &mut self, input: I, params: &SearchParameters, strict: bool, auto_build: bool ) -> usize
where I: IntoParallelIterator<Item = &'a String> + IntoIterator<Item = &'a String>,

Processes input and finds variants (like find_variants()), but all variants that are found (and meet the set thresholds) will be stored in the model rather than returned. Unlike find_variants(), this is invoked with an iterator over multiple inputs and does not return the variants themselves. It automatically applies parallelisation.

pub fn rescore_confusables(&self, results: &mut Vec<VariantResult>, input: &str)

Rescore results according to confusables

pub fn rank_results(&self, results: &mut Vec<VariantResult>, freq_weight: f32)

Sorts a result vector of (VocabId, distance_score, freq_score) in decreasing order (best result first)
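A ranking step of this kind can be sketched as follows. How the two scores are combined is an assumption here: the sketch uses a weighted average in which `freq_weight` controls the influence of the frequency score. The tuple type `ScoredResult`, the use of `f64` throughout, and the function names are hypothetical and do not reflect `VariantResult` or the crate's actual formula.

```rust
/// (vocab_id, distance_score, freq_score), mirroring the tuple described above.
type ScoredResult = (u64, f64, f64);

/// Assumption: combine both scores as a weighted average, where a
/// freq_weight of 0.0 ranks purely on the distance score.
fn combined_score(result: &ScoredResult, freq_weight: f64) -> f64 {
    (result.1 + freq_weight * result.2) / (1.0 + freq_weight)
}

/// Sort results in decreasing order of combined score (best result first).
/// The sort is stable, so ties keep their original relative order.
fn rank(results: &mut Vec<ScoredResult>, freq_weight: f64) {
    results.sort_by(|a, b| {
        combined_score(b, freq_weight).total_cmp(&combined_score(a, freq_weight))
    });
}
```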

pub fn expand_variants(&self, results: Vec<VariantResult>) -> Vec<VariantResult>

Expand variants, adding all references for variants. In case variants are ‘transparent’, only the references will be retained as results. The results list does not need to be sorted yet. This function may yield duplicates. For performance, call this only when you know there are variants that may be expanded.

pub fn compute_confusable_weight(&self, input: &str, candidate: VocabId) -> f64

Computes the weight over known confusables. Should return 1.0 when there are no known confusables, < 1.0 when there are unfavourable confusables, and > 1.0 when there are favourable confusables.
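Because a confusable weight is applied to the entire score, a rescoring step amounts to a multiplication in which 1.0 is neutral. The sketch below shows only that step. The `Scored` tuple and the `weight_for` lookup closure are hypothetical stand-ins, not the crate's types or its matching logic.

```rust
/// Toy result: (vocab_id, score).
type Scored = (u64, f64);

/// Multiply each score by the confusable weight of its candidate.
/// `weight_for` is a hypothetical stand-in for a per-candidate weight lookup;
/// it should return 1.0 when no known confusable applies.
fn rescore<F: Fn(u64) -> f64>(results: &mut [Scored], weight_for: F) {
    for (vocab_id, score) in results.iter_mut() {
        *score *= weight_for(*vocab_id);
    }
}
```

Two candidates with equal scores are separated as soon as one of them receives a weight slightly above or below 1.0, which is why weights far from 1.0 would dominate the original score.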

pub fn add_to_reverse_index( &self, reverseindex: &mut ReverseIndex, input: &str, matched_vocab_id: VocabId, score: f64 )

Adds the input item to the reverse index, as instantiation of the given vocabulary id

pub fn find_all_matches<'a>( &self, text: &'a str, params: &SearchParameters ) -> Vec<Match<'a>>

Searches a text and returns all highest-ranking variants found in the text

pub fn test_context_rules<'a>( &self, sequence: &Sequence ) -> (f64, Vec<Vec<PatternMatchResult>>)

Favours or penalizes certain combinations of lexicon matches. Matching words X and Y respectively with lexicons A and B might be favoured over other combinations. This returns either a bonus or penalty score (a number slightly above or below 1.0) for the sequence as a whole.

pub fn lm_score<'a>( &self, sequence: &Sequence, boundaries: &[Match<'a>] ) -> (f32, f64)

Computes the logprob and perplexity for a given sequence as produced in most_likely_sequence()

pub fn lm_score_tokens<'a>(&self, tokens: &Vec<Option<VocabId>>) -> (f32, f64)

Computes the logprob and perplexity for a given sequence of tokens. The tokens are either in the vocabulary or are None if out-of-vocabulary.
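The relation between the two returned values can be sketched with a toy unigram model. The assumptions here are: natural logarithms, a fixed probability `oov_prob` for out-of-vocabulary tokens, and perplexity defined as exp(-logprob / N) for N tokens. The function `score_tokens` and its parameters are hypothetical and do not reflect the crate's language model.

```rust
/// Compute (logprob, perplexity) for a token sequence under a toy unigram
/// model. `probs[id]` is the probability of token `id`; tokens that are None
/// (or have no entry in `probs`) get `oov_prob`.
fn score_tokens(tokens: &[Option<usize>], probs: &[f64], oov_prob: f64) -> (f64, f64) {
    let mut logprob = 0.0;
    for token in tokens {
        let p = match token {
            Some(id) => probs.get(*id).copied().unwrap_or(oov_prob),
            None => oov_prob,
        };
        logprob += p.ln();
    }
    // Guard against division by zero for an empty sequence.
    let n = tokens.len().max(1) as f64;
    let perplexity = (-logprob / n).exp();
    (logprob, perplexity)
}
```

With every token at probability 0.5, the perplexity comes out at 2, matching the intuition of a choice between two equally likely options per token.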

pub fn add_ngram(&mut self, ngram: NGram, frequency: u32)

Add an ngram for language modelling

pub fn match_to_str<'a>(&'a self, m: &Match<'a>) -> &'a str

Gives the text representation for this match, always uses the solution (if any) and falls back to the input text only when no solution was found.

pub fn match_to_vocabvalue<'a>( &'a self, m: &Match<'a> ) -> Option<&'a VocabValue>

Gives the vocabitem for this match, always uses the solution (if any) and falls back to the input text only when no solution was found.

pub fn ngram_to_str(&self, ngram: &NGram) -> String

Turns the ngram into a tokenised string; the tokens in the ngram will be separated by a space.

pub fn match_to_ngram<'a>( &'a self, m: &Match<'a>, boundaries: &[Match<'a>] ) -> Result<NGram, String>

Converts a match to an NGram representation. This only works if all tokens in the ngram are in the vocabulary.

Auto Trait Implementations

impl UnwindSafe for VariantModel

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,