
This crate provides fuzzy search/string matching using N-grams.

This implementation is character-based rather than word-based, so matching depends solely on string similarity. It is modelled somewhat after the Python ngram module, with some inspiration from chappers’ blog post on fuzzy matching with ngrams.

The crate is implemented in three parts: the Corpus, which is an index connecting strings (words, symbols, whatever) to their Ngrams; the Ngram, which stores a string together with its n-grams; and the SearchResult, which holds a fuzzy match result, consisting of the matched text and a similarity measure in the range of 0.0 to 1.0.

The general usage pattern is to construct a Corpus, .add() your list of valid symbols to it, and then .search() the Corpus for valid, unknown, or misspelled symbols. The results come back as a vector of up to 10 results, sorted from highest similarity to lowest.

Examples

use ngrammatic::{CorpusBuilder, Pad};

let mut corpus = CorpusBuilder::new()
    .arity(2)
    .pad_full(Pad::Auto)
    .finish();

// Build up the list of known words
corpus.add_text("pie");
corpus.add_text("animal");
corpus.add_text("tomato");
corpus.add_text("seven");
corpus.add_text("carbon");

// Now we can try an unknown/misspelled word, and find a similar match
// in the corpus
let results = corpus.search("tomacco", 0.25);
let top_match = results.first();

assert!(top_match.is_some());
assert!(top_match.unwrap().similarity > 0.5);
assert_eq!(top_match.unwrap().text, String::from("tomato"));
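
For intuition about what a similarity score between 0.0 and 1.0 can mean, the following is a minimal, std-only sketch that compares two words by the overlap of their space-padded character bigrams (a Jaccard-style ratio). It is an illustration under stated assumptions, not the crate's implementation: the helper names `bigram_set` and `jaccard_similarity` are hypothetical, and the crate's own measure may weight or count grams differently.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the set of character bigrams of `word`,
/// after padding it with one space on each side.
fn bigram_set(word: &str) -> HashSet<String> {
    let chars: Vec<char> = format!(" {} ", word).chars().collect();
    chars
        .windows(2)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: shared bigrams divided by all distinct bigrams.
/// Returns 1.0 for identical words and 0.0 when no bigram is shared.
/// This is an assumed stand-in, not the crate's similarity formula.
fn jaccard_similarity(a: &str, b: &str) -> f32 {
    let (grams_a, grams_b) = (bigram_set(a), bigram_set(b));
    let shared = grams_a.intersection(&grams_b).count() as f32;
    let total = grams_a.union(&grams_b).count() as f32;
    if total == 0.0 {
        0.0
    } else {
        shared / total
    }
}
```

Under this sketch, "tomacco" and "tomato" share five of nine distinct bigrams, while "pie" and "carbon" share none, which is the kind of ordering a fuzzy search relies on.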

Structs

Corpus
Holds a corpus of words and their ngrams, allowing fuzzy matches of candidate strings against known strings in the corpus.

CorpusBuilder
Build an Ngram Corpus, one setting at a time.

Ngram
Stores a “word”, with all its n-grams. The “arity” member determines the value of “n” used in generating the n-grams.

NgramBuilder
Build an Ngram, one setting at a time.

SearchResult
Holds a fuzzy match search result string, and its associated similarity to the query text.

Enums

Pad
Determines how strings are padded before calculating the grams. Having some sort of padding is especially important for small words. Auto pad prepends and appends arity-1 space chars. Read more about the effect of ngram padding.
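
To make the effect of auto padding concrete, here is a small std-only sketch that pads a word with arity-1 spaces on each side and then slides a window of `arity` characters across it. The function name `padded_ngrams` is hypothetical and chosen for illustration; the crate's internals may differ, for example in how they normalise text or count repeated grams.

```rust
/// Hypothetical helper: pad `word` with `arity - 1` spaces on both sides,
/// then return every run of `arity` consecutive characters, in order.
fn padded_ngrams(word: &str, arity: usize) -> Vec<String> {
    assert!(arity >= 1, "arity must be at least 1");
    let pad = " ".repeat(arity - 1);
    // Work on chars rather than bytes so multi-byte characters stay intact.
    let padded: Vec<char> = format!("{}{}{}", pad, word, pad).chars().collect();
    padded
        .windows(arity)
        .map(|w| w.iter().collect::<String>())
        .collect()
}
```

With an arity of 2, "pie" yields four grams (" p", "pi", "ie", "e ") instead of the two it would yield unpadded, so the start and end of a short word also contribute to matching.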