pub struct DoubleArrayAhoCorasick<V> { /* private fields */ }Expand description
A fast multiple pattern match automaton implemented with the Aho-Corasick algorithm and compact double-array data structure.
DoubleArrayAhoCorasick implements a pattern match automaton based on the
Aho-Corasick algorithm, supporting linear-time
pattern matching. The internal data structure employs the
compact double-array structure, the fastest
trie representation technique. It supports constant-time state-to-state traversal, allowing for
very fast pattern matching. Moreover, each state is represented in the space of only 12 bytes.
§Build instructions
DoubleArrayAhoCorasick supports the following two types of input data:
-
DoubleArrayAhoCorasick::newbuilds an automaton from a set of byte strings while assigning unique identifiers in the input order. -
DoubleArrayAhoCorasick::with_valuesbuilds an automaton from a set of pairs of a byte string and a user-defined value.
§Limitations
The maximum number of patterns is limited to 2^24-1. If a larger number of patterns is given,
DaachorseError will be reported.
Implementations§
Source§impl<V> DoubleArrayAhoCorasick<V>
impl<V> DoubleArrayAhoCorasick<V>
Sourcepub fn new<I, P>(patterns: I) -> Result<Self>
pub fn new<I, P>(patterns: I) -> Result<Self>
Creates a new DoubleArrayAhoCorasick from input patterns. The value i is
automatically associated with patterns[i].
§Arguments
patterns- List of patterns.
§Errors
DaachorseError is returned when
patternsis empty,patternscontains entries of length zero,patternscontains duplicate entries,- the conversion from the index
ito the specified typeVfails, - the scale of
patternsexceeds the expected one, or - the scale of the resulting automaton exceeds the expected one.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn with_values<I, P>(patvals: I) -> Result<Self>
pub fn with_values<I, P>(patvals: I) -> Result<Self>
Creates a new DoubleArrayAhoCorasick from input pattern-value pairs.
§Arguments
patvals- List of pattern-value pairs.
§Errors
DaachorseError is returned when
patvalsis empty,patvalscontains patterns of length zero,patvalscontains duplicate patterns,- the scale of
patvalsexceeds the expected one, or - the scale of the resulting automaton exceeds the expected one.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patvals = vec![("bcd", 0), ("ab", 1), ("a", 2), ("e", 1)];
let pma = DoubleArrayAhoCorasick::with_values(patvals).unwrap();
let mut it = pma.find_iter("abcde");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((4, 5, 1), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_iter<P>(
&self,
haystack: P,
) -> FindIterator<'_, U8SliceIterator<P>, V> ⓘ
pub fn find_iter<P>( &self, haystack: P, ) -> FindIterator<'_, U8SliceIterator<P>, V> ⓘ
Returns an iterator of non-overlapping matches in the given haystack.
§Arguments
haystack- String to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_iter_from_iter<P>(&self, haystack: P) -> FindIterator<'_, P, V> ⓘ
pub fn find_iter_from_iter<P>(&self, haystack: P) -> FindIterator<'_, P, V> ⓘ
Returns an iterator of non-overlapping matches in the given haystack iterator.
§Arguments
haystack-u8iterator to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_overlapping_iter<P>(
&self,
haystack: P,
) -> FindOverlappingIterator<'_, U8SliceIterator<P>, V> ⓘ
pub fn find_overlapping_iter<P>( &self, haystack: P, ) -> FindOverlappingIterator<'_, U8SliceIterator<P>, V> ⓘ
Returns an iterator of overlapping matches in the given haystack.
§Arguments
haystack- String to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_overlapping_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_overlapping_iter_from_iter<P>(
&self,
haystack: P,
) -> FindOverlappingIterator<'_, P, V> ⓘ
pub fn find_overlapping_iter_from_iter<P>( &self, haystack: P, ) -> FindOverlappingIterator<'_, P, V> ⓘ
Returns an iterator of overlapping matches in the given haystack iterator.
§Arguments
haystack-u8iterator to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_overlapping_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_overlapping_no_suffix_iter<P>(
&self,
haystack: P,
) -> FindOverlappingNoSuffixIterator<'_, U8SliceIterator<P>, V> ⓘ
pub fn find_overlapping_no_suffix_iter<P>( &self, haystack: P, ) -> FindOverlappingNoSuffixIterator<'_, U8SliceIterator<P>, V> ⓘ
Returns an iterator of overlapping matches without suffixes in the given haystack iterator.
The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.
This iterator returns the first match on each report.
§Arguments
haystack- String to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_overlapping_no_suffix_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_overlapping_no_suffix_iter_from_iter<P>(
&self,
haystack: P,
) -> FindOverlappingNoSuffixIterator<'_, P, V> ⓘ
pub fn find_overlapping_no_suffix_iter_from_iter<P>( &self, haystack: P, ) -> FindOverlappingNoSuffixIterator<'_, P, V> ⓘ
Returns an iterator of overlapping matches without suffixes in the given haystack iterator.
The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.
This iterator returns the first match on each report.
§Arguments
haystack-u8to search for.
§Panics
If you do not specify MatchKind::Standard in the construction, the iterator is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_overlapping_no_suffix_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn leftmost_find_iter<P>(
&self,
haystack: P,
) -> LeftmostFindIterator<'_, P, V> ⓘ
pub fn leftmost_find_iter<P>( &self, haystack: P, ) -> LeftmostFindIterator<'_, P, V> ⓘ
Returns an iterator of leftmost matches in the given haystack.
The leftmost match greedily searches the longest possible match at each iteration, and
the match results do not overlap positionally such as
DoubleArrayAhoCorasick::find_iter().
According to the MatchKind option you specified in the construction, the behavior is
changed for multiple possible matches, as follows.
-
If you set
MatchKind::LeftmostLongest, it reports the match corresponding to the longest pattern. -
If you set
MatchKind::LeftmostFirst, it reports the match corresponding to the pattern earlier registered to the automaton.
§Arguments
haystack- String to search for.
§Panics
If you do not specify MatchKind::LeftmostFirst or MatchKind::LeftmostLongest in
the construction, the iterator is not supported and the function will panic.
§Examples
§LeftmostLongest
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};
let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostLongest)
.build(&patterns)
.unwrap();
let mut it = pma.leftmost_find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 4, 2), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());§LeftmostFirst
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};
let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(&patterns)
.unwrap();
let mut it = pma.leftmost_find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 2, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub fn find_stepper(&self) -> FindStepper<'_, V>
pub fn find_stepper(&self) -> FindStepper<'_, V>
Returns a stepper of non-overlapping matches that consumes bytes one by one.
§Panics
If you do not specify MatchKind::Standard in the construction, the stepper is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut stepper = pma.find_stepper();
let m = stepper.consume(b'a');
let m = m.unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value())); // a
let m = stepper.consume(b'b');
assert_eq!(None, m);
let m = stepper.consume(b'c');
assert_eq!(None, m);
let m = stepper.consume(b'd');
let m = m.unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value())); // bcdSourcepub fn find_overlapping_stepper(&self) -> FindOverlappingStepper<'_, V>
pub fn find_overlapping_stepper(&self) -> FindOverlappingStepper<'_, V>
Returns a stepper of overlapping matches that consumes bytes one by one.
§Panics
If you do not specify MatchKind::Standard in the construction, the stepper is not
supported and the function will panic.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut stepper = pma.find_overlapping_stepper();
let mut it = stepper.consume(b'a');
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value())); // a
assert_eq!(None, it.next());
let mut it = stepper.consume(b'b');
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value())); // ab
assert_eq!(None, it.next());
let mut it = stepper.consume(b'c');
assert_eq!(None, it.next());
let mut it = stepper.consume(b'd');
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value())); // bcd
assert_eq!(None, it.next());Sourcepub fn heap_bytes(&self) -> usize
pub fn heap_bytes(&self) -> usize
Returns the total amount of heap used by this automaton in bytes.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
assert_eq!(3108, pma.heap_bytes());Sourcepub fn num_states(&self) -> usize
pub fn num_states(&self) -> usize
Returns the total number of states this automaton has.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<usize>::new(patterns).unwrap();
assert_eq!(pma.num_states(), 6);Sourcepub fn serialize(&self) -> Vec<u8>where
V: Serializable,
pub fn serialize(&self) -> Vec<u8>where
V: Serializable,
Sourcepub fn deserialize(source: &[u8]) -> Result<(Self, &[u8])>where
V: Serializable,
pub fn deserialize(source: &[u8]) -> Result<(Self, &[u8])>where
V: Serializable,
Deserializes the automaton from the given slice.
§Warning
This function verifies that the input automaton data will not cause out-of-bounds memory
access within this crate; however, it does not verify that the data is a valid
Aho-Corasick automaton. Consequently, if malformed data is provided, it may lead to
infinite loops or cause Match to return inaccurate ranges. Use this
function only if you can tolerate such errors.
§Arguments
source- A source slice.
§Returns
A tuple of the automaton and the slice not used for the deserialization.
§Errors
DaachorseError is returned when the given data is an invalid automaton.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();
let (pma, _) = DoubleArrayAhoCorasick::<u32>::deserialize(&bytes).unwrap();
let mut it = pma.find_overlapping_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Sourcepub unsafe fn deserialize_unchecked(source: &[u8]) -> (Self, &[u8])where
V: Serializable,
pub unsafe fn deserialize_unchecked(source: &[u8]) -> (Self, &[u8])where
V: Serializable,
Deserializes the automaton from the given slice without performing any validation.
This function does not perform any validation on the input data. If processing speed is not
critical, consider using DoubleArrayAhoCorasick::deserialize().
§Arguments
source- A source slice.
§Returns
A tuple of the automaton and the slice not used for the deserialization.
§Safety
The given data must be a correct automaton exported by
DoubleArrayAhoCorasick::serialize() function.
§Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();
let (pma, _) = unsafe { DoubleArrayAhoCorasick::<u32>::deserialize_unchecked(&bytes) };
let mut it = pma.find_overlapping_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());Trait Implementations§
Source§impl<V: Clone> Clone for DoubleArrayAhoCorasick<V>
impl<V: Clone> Clone for DoubleArrayAhoCorasick<V>
Source§fn clone(&self) -> DoubleArrayAhoCorasick<V>
fn clone(&self) -> DoubleArrayAhoCorasick<V>
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read more