Levenshtein

Struct Levenshtein 

Source
pub struct Levenshtein { /* private fields */ }
Expand description

A Unicode aware Levenshtein automaton for running efficient fuzzy queries.

This is only defined when the levenshtein crate feature is enabled.

A Levenshtein automata is one way to search any finite state transducer for keys that approximately match a given query. A Levenshtein automaton approximates this by returning all keys within a certain edit distance of the query. The edit distance is defined by the number of insertions, deletions and substitutions required to turn the query into the key. Insertions, deletions and substitutions are based on Unicode characters (where each character is a single Unicode scalar value).

§Example

This example shows how to find all keys within an edit distance of 1 from foo.

use fst_no_std::automaton::Levenshtein;
use fst_no_std::{IntoStreamer, Streamer, Set};

let keys = vec!["fa", "fo", "fob", "focus", "foo", "food", "foul"];
let set = Set::from_iter(keys).unwrap();

let lev = Levenshtein::new("foo", 1).unwrap();
let mut stream = set.search(&lev).into_stream();

let mut keys = vec![];
while let Some(key) = stream.next() {
    keys.push(key.to_vec());
}

assert_eq!(keys, vec![
   "fo".as_bytes(),   // 1 deletion
    "fob".as_bytes(),  // 1 substitution
    "foo".as_bytes(),  // 0 insertions/deletions/substitutions
    "food".as_bytes(), // 1 insertion
]);

This example only uses ASCII characters, but it will work equally well on Unicode characters.

§Warning: experimental

While executing this Levenshtein automaton against a finite state transducer will be very fast, constructing an automaton may not be. Namely, this implementation is a proof of concept. While I believe the algorithmic complexity is not exponential, the implementation is not speedy and it can use enormous amounts of memory (tens of MB before a hard-coded limit will cause an error to be returned).

This is important functionality, so one should count on this implementation being vastly improved in the future.

Implementations§

Source§

impl Levenshtein

Source

pub fn new(query: &str, distance: u32) -> Result<Levenshtein, LevenshteinError>

Create a new Levenshtein query.

The query finds all matching terms that are at most distance edit operations from query. (An edit operation may be an insertion, a deletion or a substitution.)

If the underlying automaton becomes too big, then an error is returned. Use new_with_limit to raise the limit dynamically.

A Levenshtein value satisfies the Automaton trait, which means it can be used with the search method of any finite state transducer.

Source

pub fn new_with_limit( query: &str, distance: u32, state_limit: usize, ) -> Result<Levenshtein, LevenshteinError>

Create a new Levenshtein query, but pass the state limit yourself.

The query finds all matching terms that are at most distance edit operations from query. (An edit operation may be an insertion, a deletion or a substitution.)

If the underlying automaton becomes too big, then an error is returned. This limit can be configured with state_limit.

A Levenshtein value satisfies the Automaton trait, which means it can be used with the search method of any finite state transducer.

Trait Implementations§

Source§

impl Automaton for Levenshtein

Available on crate feature alloc only.
Source§

type State = Option<usize>

The type of the state used in the automaton.
Source§

fn start(&self) -> Option<usize>

Returns a single start state for this automaton. Read more
Source§

fn is_match(&self, state: &Option<usize>) -> bool

Returns true if and only if state is a match state.
Source§

fn can_match(&self, state: &Option<usize>) -> bool

Returns true if and only if state can lead to a match in zero or more steps. Read more
Source§

fn accept(&self, state: &Option<usize>, byte: u8) -> Option<usize>

Return the next state given state and an input.
Source§

fn will_always_match(&self, _state: &Self::State) -> bool

Returns true if and only if state matches and must match no matter what steps are taken. Read more
Source§

fn accept_eof(&self, _: &Self::State) -> Option<Self::State>

If applicable, return the next state when the end of a key is seen.
Source§

fn starts_with(self) -> StartsWith<Self>
where Self: Sized,

Returns an automaton that matches the strings that start with something this automaton matches.
Source§

fn union<Rhs: Automaton>(self, rhs: Rhs) -> Union<Self, Rhs>
where Self: Sized,

Returns an automaton that matches the strings matched by either this or the other automaton.
Source§

fn intersection<Rhs: Automaton>(self, rhs: Rhs) -> Intersection<Self, Rhs>
where Self: Sized,

Returns an automaton that matches the strings matched by both this and the other automaton.
Source§

fn complement(self) -> Complement<Self>
where Self: Sized,

Returns an automaton that matches the strings not matched by this automaton.
Source§

impl Debug for Levenshtein

Available on crate feature alloc only.
Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.