Crate ngrammatic
This crate provides fuzzy search/string matching using N-grams.
This implementation is character-based rather than word-based, matching solely on string similarity. It is modelled somewhat after the Python ngram module, with some inspiration from chappers’ blog post on fuzzy matching with ngrams.
The crate is implemented in three parts: the Corpus, which is an index connecting strings (words, symbols, whatever) to their Ngrams; the Ngrams themselves; and SearchResults, each of which holds a fuzzy match result, consisting of the matched word and a similarity measure in the range of 0.0 to 1.0.
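To make the idea of character n-grams and a 0.0 to 1.0 similarity concrete, here is a minimal standalone sketch. It does not use the crate: `char_ngrams` and `similarity` are hypothetical helpers, and the measure shown (shared grams divided by the union of grams) is an assumed, simplified stand-in, not necessarily the formula the crate applies.

```rust
use std::collections::HashMap;

/// Count the character n-grams of `text`.
/// Hypothetical helper for illustration; not part of the crate's API.
fn char_ngrams(text: &str, n: usize) -> HashMap<String, usize> {
    let chars: Vec<char> = text.chars().collect();
    let mut counts = HashMap::new();
    // `windows` panics on a size of 0, so guard against it.
    if n == 0 {
        return counts;
    }
    for window in chars.windows(n) {
        let gram: String = window.iter().collect();
        *counts.entry(gram).or_insert(0) += 1;
    }
    counts
}

/// Similarity in the range 0.0 to 1.0: grams shared by both strings
/// divided by the union of their grams (counting repeats).
/// This measure is an assumption chosen to keep the sketch short.
fn similarity(a: &str, b: &str, n: usize) -> f32 {
    let grams_a = char_ngrams(a, n);
    let grams_b = char_ngrams(b, n);
    let total_a: usize = grams_a.values().sum();
    let total_b: usize = grams_b.values().sum();
    let shared: usize = grams_a
        .iter()
        .map(|(gram, count_a)| {
            grams_b
                .get(gram)
                .map_or(0, |count_b| (*count_a).min(*count_b))
        })
        .sum();
    let union = total_a + total_b - shared;
    if union == 0 {
        0.0
    } else {
        shared as f32 / union as f32
    }
}
```

Under this assumed measure, `similarity("tomacco", "tomato", 2)` shares the bigrams "to", "om" and "ma" out of 8 in the union, giving 0.375, while identical strings score 1.0. The scores the crate reports may differ, since padding and its own formula also play a part.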
The general usage pattern is to construct a Corpus, .add() your list of valid symbols to it, and then perform .search()es of valid, unknown, misspelled, etc. symbols on the Corpus. The results come back as a vector of up to 10 results, sorted from highest similarity to lowest.
Examples
use ngrammatic::{CorpusBuilder, Pad};

let mut corpus = CorpusBuilder::new()
    .arity(2)
    .pad_full(Pad::Auto)
    .finish();

// Build up the list of known words
corpus.add_text("pie");
corpus.add_text("animal");
corpus.add_text("tomato");
corpus.add_text("seven");
corpus.add_text("carbon");

// Now we can try an unknown/misspelled word, and find a similar match
// in the corpus
let results = corpus.search("tomacco", 0.25);
let top_match = results.first();

assert!(top_match.is_some());
assert!(top_match.unwrap().similarity > 0.5);
assert_eq!(top_match.unwrap().text, String::from("tomato"));
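The ordering and size limit of the returned vector can be pictured with a small standalone sketch. It assumes the second argument to the search call acts as a minimum-similarity threshold, which the text above does not state explicitly. `Scored` and `rank` are hypothetical names introduced here, not the crate's types.

```rust
/// A candidate word with its similarity to the query.
/// Hypothetical stand-in for a search result; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    text: String,
    similarity: f32,
}

/// Keep candidates at or above `threshold` (assumed meaning of the
/// second search argument), sort from highest similarity to lowest,
/// and return at most `limit` of them.
fn rank(mut candidates: Vec<Scored>, threshold: f32, limit: usize) -> Vec<Scored> {
    candidates.retain(|c| c.similarity >= threshold);
    // Descending order: compare b against a.
    candidates.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
    candidates.truncate(limit);
    candidates
}
```

Calling `rank(candidates, 0.25, 10)` would mirror the behaviour described earlier: weak matches are dropped, the best match comes first, and no more than 10 results remain.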
Structs
Corpus
Holds a corpus of words and their ngrams, allowing fuzzy matches of candidate strings against known strings in the corpus.
CorpusBuilder
Build an Ngram Corpus, one setting at a time.
Ngram
Stores a “word”, with all its n-grams. The “arity” member determines the value of “n” used in generating the n-grams.
NgramBuilder
Build an Ngram, one setting at a time.
SearchResult
Holds a fuzzy match search result string, and its associated similarity to the query text.
Enums
Pad
Determines how strings are padded before calculating the grams. Having some sort of padding is especially important for small words. Auto pad pre/appends arity-1 space chars. Read more about the effect of ngram padding.
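The effect of padding on short words can be sketched without the crate. The helpers below are hypothetical: `auto_pad` mimics the described behaviour of adding arity-1 spaces to each end, and `grams` splits a string into character n-grams. This illustrates the idea and is not the crate's implementation.

```rust
/// Surround `text` with `arity - 1` spaces on each side.
/// Hypothetical helper that mimics the described Auto padding.
fn auto_pad(text: &str, arity: usize) -> String {
    let pad = " ".repeat(arity.saturating_sub(1));
    format!("{}{}{}", pad, text, pad)
}

/// Split `text` into overlapping character n-grams of length `arity`.
/// Hypothetical helper; returns an empty vector when no gram fits.
fn grams(text: &str, arity: usize) -> Vec<String> {
    // `windows` panics on a size of 0, so guard against it.
    if arity == 0 {
        return Vec::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(arity)
        .map(|window| window.iter().collect())
        .collect()
}
```

With an arity of 2, the one-letter word "a" yields no grams at all when unpadded, so it could never match anything. After `auto_pad("a", 2)` it becomes " a " and yields the two grams " a" and "a ". Likewise "pie" at arity 3 goes from one gram to five once padded.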