pub struct DoubleArrayAhoCorasick<V> { /* private fields */ }
Expand description

A fast multiple pattern match automaton implemented with the Aho-Corasick algorithm and compact double-array data structure.

DoubleArrayAhoCorasick implements a pattern match automaton based on the Aho-Corasick algorithm, supporting linear-time pattern matching. The internal data structure employs the compact double-array structure, the fastest trie representation technique. It supports constant-time state-to-state traversal, allowing for very fast pattern matching. Moreover, each state is represented in the space of only 12 bytes.

Build instructions

DoubleArrayAhoCorasick supports the following two types of input data:

Limitations

The maximum number of patterns is limited to 2^24-1. If a larger number of patterns is given, DaachorseError will be reported.

Implementations

Creates a new DoubleArrayAhoCorasick from input patterns. The value i is automatically associated with patterns[i].

Arguments
  • patterns - List of patterns.
Errors

DaachorseError is returned when

  • patterns is empty,
  • patterns contains entries of length zero,
  • patterns contains duplicate entries,
  • the conversion from the index i to the specified type V fails,
  • the scale of patterns exceeds the expected one, or
  • the scale of the resulting automaton exceeds the expected one.
Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Creates a new DoubleArrayAhoCorasick from input pattern-value pairs.

Arguments
  • patvals - List of pattern-value pairs.
Errors

DaachorseError is returned when

  • patvals is empty,
  • patvals contains patterns of length zero,
  • patvals contains duplicate patterns,
  • the scale of patvals exceeds the expected one, or
  • the scale of the resulting automaton exceeds the expected one.
Examples
use daachorse::DoubleArrayAhoCorasick;

let patvals = vec![("bcd", 0), ("ab", 1), ("a", 2), ("e", 1)];
let pma = DoubleArrayAhoCorasick::with_values(patvals).unwrap();

let mut it = pma.find_iter("abcde");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((4, 5, 1), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of non-overlapping matches in the given haystack.

Arguments
  • haystack - String to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of non-overlapping matches in the given haystack iterator.

Arguments
  • haystack - u8 iterator to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of overlapping matches in the given haystack.

Arguments
  • haystack - String to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_overlapping_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of overlapping matches in the given haystack iterator.

Arguments
  • haystack - u8 iterator to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_overlapping_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of overlapping matches without suffixes in the given haystack iterator.

The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.

This iterator returns the first match on each report.

Arguments
  • haystack - String to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_overlapping_no_suffix_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of overlapping matches without suffixes in the given haystack iterator.

The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.

This iterator returns the first match on each report.

Arguments
  • haystack - u8 to search for.
Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_overlapping_no_suffix_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns an iterator of leftmost matches in the given haystack.

The leftmost match greedily searches the longest possible match at each iteration, and the match results do not overlap positionally such as DoubleArrayAhoCorasick::find_iter().

According to the MatchKind option you specified in the construction, the behavior is changed for multiple possible matches, as follows.

Arguments
  • haystack - String to search for.
Panics

If you do not specify MatchKind::LeftmostFirst or MatchKind::LeftmostLongest in the construction, the iterator is not supported and the function will panic.

Examples
LeftmostLongest
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};

let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostLongest)
    .build(&patterns)
    .unwrap();

let mut it = pma.leftmost_find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 4, 2), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
LeftmostFirst
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};

let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&patterns)
    .unwrap();

let mut it = pma.leftmost_find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 2, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Returns the total amount of heap used by this automaton in bytes.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();

assert_eq!(3108, pma.heap_bytes());

Returns the total number of states this automaton has.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<usize>::new(patterns).unwrap();

assert_eq!(pma.num_states(), 6);

Serializes the automaton into a Vec.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();

Deserializes the automaton from a given slice.

Arguments
  • source - A source slice.
Returns

A tuple of the automaton and the slice not used for the deserialization.

Safety

The given data must be a correct automaton exported by DoubleArrayAhoCorasick::serialize() function.

Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();

let (pma, _) = unsafe { DoubleArrayAhoCorasick::<u32>::deserialize_unchecked(&bytes) };

let mut it = pma.find_overlapping_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Trait Implementations

Returns a copy of the value. Read more

Performs copy-assignment from source. Read more

Feeds this value into the given Hasher. Read more

Feeds a slice of this type into the given Hasher. Read more

This method tests for self and other values to be equal, and is used by ==. Read more

This method tests for !=.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more

Immutably borrows from an owned value. Read more

Mutably borrows from an owned value. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning. Read more

Uses borrowed data to replace owned data, usually by cloning. Read more

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.