Skip to main content

DoubleArrayAhoCorasick

Struct DoubleArrayAhoCorasick 

Source
pub struct DoubleArrayAhoCorasick<V> { /* private fields */ }
Expand description

A fast multiple pattern match automaton implemented with the Aho-Corasick algorithm and compact double-array data structure.

DoubleArrayAhoCorasick implements a pattern match automaton based on the Aho-Corasick algorithm, supporting linear-time pattern matching. The internal data structure employs the compact double-array structure, the fastest trie representation technique. It supports constant-time state-to-state traversal, allowing for very fast pattern matching. Moreover, each state is represented in the space of only 12 bytes.

§Build instructions

DoubleArrayAhoCorasick supports the following two types of input data:

§Limitations

The maximum number of patterns is limited to 2^24-1. If a larger number of patterns is given, DaachorseError will be reported.

Implementations§

Source§

impl<V> DoubleArrayAhoCorasick<V>

Source

pub fn new<I, P>(patterns: I) -> Result<Self>
where I: IntoIterator<Item = P>, P: AsRef<[u8]>, V: Copy + TryFrom<usize>,

Creates a new DoubleArrayAhoCorasick from input patterns. The value i is automatically associated with patterns[i].

§Arguments
  • patterns - List of patterns.
§Errors

DaachorseError is returned when

  • patterns is empty,
  • patterns contains entries of length zero,
  • patterns contains duplicate entries,
  • the conversion from the index i to the specified type V fails,
  • the scale of patterns exceeds the expected one, or
  • the scale of the resulting automaton exceeds the expected one.
§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn with_values<I, P>(patvals: I) -> Result<Self>
where I: IntoIterator<Item = (P, V)>, P: AsRef<[u8]>, V: Copy,

Creates a new DoubleArrayAhoCorasick from input pattern-value pairs.

§Arguments
  • patvals - List of pattern-value pairs.
§Errors

DaachorseError is returned when

  • patvals is empty,
  • patvals contains patterns of length zero,
  • patvals contains duplicate patterns,
  • the scale of patvals exceeds the expected one, or
  • the scale of the resulting automaton exceeds the expected one.
§Examples
use daachorse::DoubleArrayAhoCorasick;

let patvals = vec![("bcd", 0), ("ab", 1), ("a", 2), ("e", 1)];
let pma = DoubleArrayAhoCorasick::with_values(patvals).unwrap();

let mut it = pma.find_iter("abcde");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((4, 5, 1), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_iter<P>( &self, haystack: P, ) -> FindIterator<'_, U8SliceIterator<P>, V>
where P: AsRef<[u8]>,

Returns an iterator of non-overlapping matches in the given haystack.

§Arguments
  • haystack - String to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_iter_from_iter<P>(&self, haystack: P) -> FindIterator<'_, P, V>
where P: Iterator<Item = u8>,

Returns an iterator of non-overlapping matches in the given haystack iterator.

§Arguments
  • haystack - u8 iterator to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_overlapping_iter<P>( &self, haystack: P, ) -> FindOverlappingIterator<'_, U8SliceIterator<P>, V>
where P: AsRef<[u8]>,

Returns an iterator of overlapping matches in the given haystack.

§Arguments
  • haystack - String to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_overlapping_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_overlapping_iter_from_iter<P>( &self, haystack: P, ) -> FindOverlappingIterator<'_, P, V>
where P: Iterator<Item = u8>,

Returns an iterator of overlapping matches in the given haystack iterator.

§Arguments
  • haystack - u8 iterator to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_overlapping_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_overlapping_no_suffix_iter<P>( &self, haystack: P, ) -> FindOverlappingNoSuffixIterator<'_, U8SliceIterator<P>, V>
where P: AsRef<[u8]>,

Returns an iterator of overlapping matches without suffixes in the given haystack iterator.

The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.

This iterator returns the first match on each report.

§Arguments
  • haystack - String to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut it = pma.find_overlapping_no_suffix_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_overlapping_no_suffix_iter_from_iter<P>( &self, haystack: P, ) -> FindOverlappingNoSuffixIterator<'_, P, V>
where P: Iterator<Item = u8>,

Returns an iterator of overlapping matches without suffixes in the given haystack iterator.

The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.

This iterator returns the first match on each report.

§Arguments
  • haystack - u8 to search for.
§Panics

If you do not specify MatchKind::Standard in the construction, the iterator is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();

let mut it = pma.find_overlapping_no_suffix_iter_from_iter(haystack);

let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn leftmost_find_iter<P>( &self, haystack: P, ) -> LeftmostFindIterator<'_, P, V>
where P: AsRef<[u8]>,

Returns an iterator of leftmost matches in the given haystack.

The leftmost match greedily searches the longest possible match at each iteration, and the match results do not overlap positionally such as DoubleArrayAhoCorasick::find_iter().

According to the MatchKind option you specified in the construction, the behavior is changed for multiple possible matches, as follows.

§Arguments
  • haystack - String to search for.
§Panics

If you do not specify MatchKind::LeftmostFirst or MatchKind::LeftmostLongest in the construction, the iterator is not supported and the function will panic.

§Examples
§LeftmostLongest
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};

let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostLongest)
    .build(&patterns)
    .unwrap();

let mut it = pma.leftmost_find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 4, 2), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
§LeftmostFirst
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};

let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&patterns)
    .unwrap();

let mut it = pma.leftmost_find_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 2, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub fn find_stepper(&self) -> FindStepper<'_, V>

Returns a stepper of non-overlapping matches that consumes bytes one by one.

§Panics

If you do not specify MatchKind::Standard in the construction, the stepper is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut stepper = pma.find_stepper();

let m = stepper.consume(b'a');
let m = m.unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value())); // a

let m = stepper.consume(b'b');
assert_eq!(None, m);

let m = stepper.consume(b'c');
assert_eq!(None, m);

let m = stepper.consume(b'd');
let m = m.unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value())); // bcd
Source

pub fn find_overlapping_stepper(&self) -> FindOverlappingStepper<'_, V>

Returns a stepper of overlapping matches that consumes bytes one by one.

§Panics

If you do not specify MatchKind::Standard in the construction, the stepper is not supported and the function will panic.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();

let mut stepper = pma.find_overlapping_stepper();

let mut it = stepper.consume(b'a');
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value())); // a
assert_eq!(None, it.next());

let mut it = stepper.consume(b'b');
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value())); // ab
assert_eq!(None, it.next());

let mut it = stepper.consume(b'c');
assert_eq!(None, it.next());

let mut it = stepper.consume(b'd');
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value())); // bcd
assert_eq!(None, it.next());
Source

pub fn heap_bytes(&self) -> usize

Returns the total amount of heap used by this automaton in bytes.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();

assert_eq!(3108, pma.heap_bytes());
Source

pub fn num_states(&self) -> usize

Returns the total number of states this automaton has.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<usize>::new(patterns).unwrap();

assert_eq!(pma.num_states(), 6);
Source

pub fn serialize(&self) -> Vec<u8>
where V: Serializable,

Serializes the automaton into a Vec.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();
Source

pub fn deserialize(source: &[u8]) -> Result<(Self, &[u8])>
where V: Serializable,

Deserializes the automaton from the given slice.

§Warning

This function verifies that the input automaton data will not cause out-of-bounds memory access within this crate; however, it does not verify that the data is a valid Aho-Corasick automaton. Consequently, if malformed data is provided, it may lead to infinite loops or cause Match to return inaccurate ranges. Use this function only if you can tolerate such errors.

§Arguments
  • source - A source slice.
§Returns

A tuple of the automaton and the slice not used for the deserialization.

§Errors

DaachorseError is returned when the given data is an invalid automaton.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();

let (pma, _) = DoubleArrayAhoCorasick::<u32>::deserialize(&bytes).unwrap();

let mut it = pma.find_overlapping_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());
Source

pub unsafe fn deserialize_unchecked(source: &[u8]) -> (Self, &[u8])
where V: Serializable,

Deserializes the automaton from the given slice without performing any validation.

This function does not perform any validation on the input data. If processing speed is not critical, consider using DoubleArrayAhoCorasick::deserialize().

§Arguments
  • source - A source slice.
§Returns

A tuple of the automaton and the slice not used for the deserialization.

§Safety

The given data must be a correct automaton exported by DoubleArrayAhoCorasick::serialize() function.

§Examples
use daachorse::DoubleArrayAhoCorasick;

let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();

let (pma, _) = unsafe { DoubleArrayAhoCorasick::<u32>::deserialize_unchecked(&bytes) };

let mut it = pma.find_overlapping_iter("abcd");

let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));

let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));

assert_eq!(None, it.next());

Trait Implementations§

Source§

impl<V: Clone> Clone for DoubleArrayAhoCorasick<V>

Source§

fn clone(&self) -> DoubleArrayAhoCorasick<V>

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<V: Hash> Hash for DoubleArrayAhoCorasick<V>

Source§

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher. Read more
1.3.0 · Source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
Source§

impl<V: PartialEq> PartialEq for DoubleArrayAhoCorasick<V>

Source§

fn eq(&self, other: &DoubleArrayAhoCorasick<V>) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl<V: Eq> Eq for DoubleArrayAhoCorasick<V>

Source§

impl<V> StructuralPartialEq for DoubleArrayAhoCorasick<V>

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.