Struct daachorse::bytewise::DoubleArrayAhoCorasick
source · [−]pub struct DoubleArrayAhoCorasick<V> { /* private fields */ }
Expand description
A fast multiple pattern match automaton implemented with the Aho-Corasick algorithm and compact double-array data structure.
DoubleArrayAhoCorasick
implements a pattern match automaton based on the
Aho-Corasick algorithm, supporting linear-time
pattern matching. The internal data structure employs the
compact double-array structure, the fastest
trie representation technique. It supports constant-time state-to-state traversal, allowing for
very fast pattern matching. Moreover, each state is represented in the space of only 12 bytes.
Build instructions
DoubleArrayAhoCorasick
supports the following two types of input data:
-
DoubleArrayAhoCorasick::new
builds an automaton from a set of byte strings while assigning unique identifiers in the input order. -
DoubleArrayAhoCorasick::with_values
builds an automaton from a set of pairs of a byte string and a user-defined value.
Limitations
The maximum number of patterns is limited to 2^24-1. If a larger number of patterns is given,
DaachorseError
will be reported.
Implementations
sourceimpl<V> DoubleArrayAhoCorasick<V>
impl<V> DoubleArrayAhoCorasick<V>
sourcepub fn new<I, P>(patterns: I) -> Result<Self> where
I: IntoIterator<Item = P>,
P: AsRef<[u8]>,
V: Copy + TryFrom<usize>,
pub fn new<I, P>(patterns: I) -> Result<Self> where
I: IntoIterator<Item = P>,
P: AsRef<[u8]>,
V: Copy + TryFrom<usize>,
Creates a new DoubleArrayAhoCorasick
from input patterns. The value i
is
automatically associated with patterns[i]
.
Arguments
patterns
- List of patterns.
Errors
DaachorseError
is returned when
patterns
is empty,patterns
contains entries of length zero,patterns
contains duplicate entries,- the conversion from the index
i
to the specified typeV
fails, - the scale of
patterns
exceeds the expected one, or - the scale of the resulting automaton exceeds the expected one.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn with_values<I, P>(patvals: I) -> Result<Self> where
I: IntoIterator<Item = (P, V)>,
P: AsRef<[u8]>,
V: Copy,
pub fn with_values<I, P>(patvals: I) -> Result<Self> where
I: IntoIterator<Item = (P, V)>,
P: AsRef<[u8]>,
V: Copy,
Creates a new DoubleArrayAhoCorasick
from input pattern-value pairs.
Arguments
patvals
- List of pattern-value pairs.
Errors
DaachorseError
is returned when
patvals
is empty,patvals
contains patterns of length zero,patvals
contains duplicate patterns,- the scale of
patvals
exceeds the expected one, or - the scale of the resulting automaton exceeds the expected one.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patvals = vec![("bcd", 0), ("ab", 1), ("a", 2), ("e", 1)];
let pma = DoubleArrayAhoCorasick::with_values(patvals).unwrap();
let mut it = pma.find_iter("abcde");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((4, 5, 1), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_iter<P>(
&self,
haystack: P
) -> FindIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindIterator<'a, P, V>impl<'a, P, V> Iterator for FindIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
pub fn find_iter<P>(
&self,
haystack: P
) -> FindIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindIterator<'a, P, V>impl<'a, P, V> Iterator for FindIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of non-overlapping matches in the given haystack.
Arguments
haystack
- String to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_iter_from_iter<P>(&self, haystack: P) -> FindIterator<'_, P, V>ⓘNotable traits for FindIterator<'a, P, V>impl<'a, P, V> Iterator for FindIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
pub fn find_iter_from_iter<P>(&self, haystack: P) -> FindIterator<'_, P, V>ⓘNotable traits for FindIterator<'a, P, V>impl<'a, P, V> Iterator for FindIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of non-overlapping matches in the given haystack iterator.
Arguments
haystack
-u8
iterator to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_overlapping_iter<P>(
&self,
haystack: P
) -> FindOverlappingIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindOverlappingIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
pub fn find_overlapping_iter<P>(
&self,
haystack: P
) -> FindOverlappingIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindOverlappingIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of overlapping matches in the given haystack.
Arguments
haystack
- String to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_overlapping_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_overlapping_iter_from_iter<P>(
&self,
haystack: P
) -> FindOverlappingIterator<'_, P, V>ⓘNotable traits for FindOverlappingIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
pub fn find_overlapping_iter_from_iter<P>(
&self,
haystack: P
) -> FindOverlappingIterator<'_, P, V>ⓘNotable traits for FindOverlappingIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of overlapping matches in the given haystack iterator.
Arguments
haystack
-u8
iterator to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_overlapping_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_overlapping_no_suffix_iter<P>(
&self,
haystack: P
) -> FindOverlappingNoSuffixIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindOverlappingNoSuffixIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingNoSuffixIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
pub fn find_overlapping_no_suffix_iter<P>(
&self,
haystack: P
) -> FindOverlappingNoSuffixIterator<'_, U8SliceIterator<P>, V>ⓘNotable traits for FindOverlappingNoSuffixIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingNoSuffixIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of overlapping matches without suffixes in the given haystack iterator.
The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.
This iterator returns the first match on each report.
Arguments
haystack
- String to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let mut it = pma.find_overlapping_no_suffix_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn find_overlapping_no_suffix_iter_from_iter<P>(
&self,
haystack: P
) -> FindOverlappingNoSuffixIterator<'_, P, V>ⓘNotable traits for FindOverlappingNoSuffixIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingNoSuffixIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
pub fn find_overlapping_no_suffix_iter_from_iter<P>(
&self,
haystack: P
) -> FindOverlappingNoSuffixIterator<'_, P, V>ⓘNotable traits for FindOverlappingNoSuffixIterator<'a, P, V>impl<'a, P, V> Iterator for FindOverlappingNoSuffixIterator<'a, P, V> where
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
where
P: Iterator<Item = u8>,
P: Iterator<Item = u8>,
V: Copy, type Item = Match<V>;
Returns an iterator of overlapping matches without suffixes in the given haystack iterator.
The Aho-Corasick algorithm reads through the haystack from left to right and reports matches when it reaches the end of each pattern. In the overlapping match, more than one pattern can be returned per report.
This iterator returns the first match on each report.
Arguments
haystack
-u8
to search for.
Panics
If you do not specify MatchKind::Standard
in the construction, the iterator is not
supported and the function will panic.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "cd", "abc"];
let pma = DoubleArrayAhoCorasick::new(patterns).unwrap();
let haystack = "ab".as_bytes().iter().chain("cd".as_bytes()).copied();
let mut it = pma.find_overlapping_no_suffix_iter_from_iter(haystack);
let m = it.next().unwrap();
assert_eq!((0, 3, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn leftmost_find_iter<P>(
&self,
haystack: P
) -> LestmostFindIterator<'_, P, V>ⓘNotable traits for LestmostFindIterator<'a, P, V>impl<'a, P, V> Iterator for LestmostFindIterator<'a, P, V> where
P: AsRef<[u8]>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
pub fn leftmost_find_iter<P>(
&self,
haystack: P
) -> LestmostFindIterator<'_, P, V>ⓘNotable traits for LestmostFindIterator<'a, P, V>impl<'a, P, V> Iterator for LestmostFindIterator<'a, P, V> where
P: AsRef<[u8]>,
V: Copy, type Item = Match<V>;
where
P: AsRef<[u8]>,
P: AsRef<[u8]>,
V: Copy, type Item = Match<V>;
Returns an iterator of leftmost matches in the given haystack.
The leftmost match greedily searches the longest possible match at each iteration, and
the match results do not overlap positionally such as
DoubleArrayAhoCorasick::find_iter()
.
According to the MatchKind
option you specified in the construction, the behavior is
changed for multiple possible matches, as follows.
-
If you set
MatchKind::LeftmostLongest
, it reports the match corresponding to the longest pattern. -
If you set
MatchKind::LeftmostFirst
, it reports the match corresponding to the pattern earlier registered to the automaton.
Arguments
haystack
- String to search for.
Panics
If you do not specify MatchKind::LeftmostFirst
or MatchKind::LeftmostLongest
in
the construction, the iterator is not supported and the function will panic.
Examples
LeftmostLongest
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};
let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostLongest)
.build(&patterns)
.unwrap();
let mut it = pma.leftmost_find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 4, 2), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
LeftmostFirst
use daachorse::{DoubleArrayAhoCorasickBuilder, MatchKind};
let patterns = vec!["ab", "a", "abcd"];
let pma = DoubleArrayAhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(&patterns)
.unwrap();
let mut it = pma.leftmost_find_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 2, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
sourcepub fn heap_bytes(&self) -> usize
pub fn heap_bytes(&self) -> usize
Returns the total amount of heap used by this automaton in bytes.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
assert_eq!(3108, pma.heap_bytes());
sourcepub fn num_states(&self) -> usize
pub fn num_states(&self) -> usize
Returns the total number of states this automaton has.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<usize>::new(patterns).unwrap();
assert_eq!(pma.num_states(), 6);
sourcepub fn serialize(&self) -> Vec<u8> where
V: Serializable,
pub fn serialize(&self) -> Vec<u8> where
V: Serializable,
sourcepub unsafe fn deserialize_unchecked(source: &[u8]) -> (Self, &[u8]) where
V: Serializable,
pub unsafe fn deserialize_unchecked(source: &[u8]) -> (Self, &[u8]) where
V: Serializable,
Deserializes the automaton from a given slice.
Arguments
source
- A source slice.
Returns
A tuple of the automaton and the slice not used for the deserialization.
Safety
The given data must be a correct automaton exported by
DoubleArrayAhoCorasick::serialize()
function.
Examples
use daachorse::DoubleArrayAhoCorasick;
let patterns = vec!["bcd", "ab", "a"];
let pma = DoubleArrayAhoCorasick::<u32>::new(patterns).unwrap();
let bytes = pma.serialize();
let (pma, _) = unsafe { DoubleArrayAhoCorasick::<u32>::deserialize_unchecked(&bytes) };
let mut it = pma.find_overlapping_iter("abcd");
let m = it.next().unwrap();
assert_eq!((0, 1, 2), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((0, 2, 1), (m.start(), m.end(), m.value()));
let m = it.next().unwrap();
assert_eq!((1, 4, 0), (m.start(), m.end(), m.value()));
assert_eq!(None, it.next());
Trait Implementations
sourceimpl<V: Clone> Clone for DoubleArrayAhoCorasick<V>
impl<V: Clone> Clone for DoubleArrayAhoCorasick<V>
sourcefn clone(&self) -> DoubleArrayAhoCorasick<V>
fn clone(&self) -> DoubleArrayAhoCorasick<V>
Returns a copy of the value. Read more
1.0.0 · sourcefn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source
. Read more
sourceimpl<V: Hash> Hash for DoubleArrayAhoCorasick<V>
impl<V: Hash> Hash for DoubleArrayAhoCorasick<V>
sourceimpl<V: PartialEq> PartialEq<DoubleArrayAhoCorasick<V>> for DoubleArrayAhoCorasick<V>
impl<V: PartialEq> PartialEq<DoubleArrayAhoCorasick<V>> for DoubleArrayAhoCorasick<V>
sourcefn eq(&self, other: &DoubleArrayAhoCorasick<V>) -> bool
fn eq(&self, other: &DoubleArrayAhoCorasick<V>) -> bool
This method tests for self
and other
values to be equal, and is used
by ==
. Read more
sourcefn ne(&self, other: &DoubleArrayAhoCorasick<V>) -> bool
fn ne(&self, other: &DoubleArrayAhoCorasick<V>) -> bool
This method tests for !=
.
impl<V: Eq> Eq for DoubleArrayAhoCorasick<V>
impl<V> StructuralEq for DoubleArrayAhoCorasick<V>
impl<V> StructuralPartialEq for DoubleArrayAhoCorasick<V>
Auto Trait Implementations
impl<V> RefUnwindSafe for DoubleArrayAhoCorasick<V> where
V: RefUnwindSafe,
impl<V> Send for DoubleArrayAhoCorasick<V> where
V: Send,
impl<V> Sync for DoubleArrayAhoCorasick<V> where
V: Sync,
impl<V> Unpin for DoubleArrayAhoCorasick<V> where
V: Unpin,
impl<V> UnwindSafe for DoubleArrayAhoCorasick<V> where
V: UnwindSafe,
Blanket Implementations
sourceimpl<T> BorrowMut<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
const: unstable · sourcefn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more