Enum regex_automata::DenseDFA

pub enum DenseDFA<T: AsRef<[S]>, S: StateID> {
    Standard(Standard<T, S>),
    ByteClass(ByteClass<T, S>),
    Premultiplied(Premultiplied<T, S>),
    PremultipliedByteClass(PremultipliedByteClass<T, S>),
    // some variants omitted
}

A dense table-based deterministic finite automaton (DFA).

A dense DFA represents the core matching primitive in this crate. That is, logically, all DFAs have a single start state, one or more match states and a transition table that maps the current state and the current byte of input to the next state. A DFA can use this information to implement fast searching. In particular, the use of a dense DFA generally makes the trade-off that match speed is the most valuable characteristic, even if building the regex may take significant time and space. As such, each byte of input is processed with a small, constant number of operations that does not vary with the pattern, its size or the size of the alphabet. If your needs don’t line up with this trade-off, then a dense DFA may not be an adequate solution to your problem.
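A minimal sketch of the idea, assuming a hand-built, anchored DFA for `foo[0-9]+` with one table column per byte value (no byte classes or premultiplication). `ToyDense` and `foo_digits` are hypothetical names for illustration; this is not how the crate builds or stores its DFAs. Every byte costs the same table lookup regardless of the pattern.

```rust
/// One column per possible byte value (no byte classes in this toy).
const ALPHABET: usize = 256;
/// State 0 is the dead state: once entered, no match is possible.
const DEAD: usize = 0;

/// A toy dense DFA: `trans[state * ALPHABET + byte]` is the next state.
struct ToyDense {
    trans: Vec<usize>,
    start: usize,
    is_match: Vec<bool>,
}

impl ToyDense {
    /// Hand-built, anchored DFA for `foo[0-9]+`; state 5 is the match state.
    fn foo_digits() -> ToyDense {
        let nstates = 6;
        let mut trans = vec![DEAD; nstates * ALPHABET];
        let mut set = |from: usize, byte: u8, to: usize| {
            trans[from * ALPHABET + byte as usize] = to;
        };
        set(1, b'f', 2);
        set(2, b'o', 3);
        set(3, b'o', 4);
        for d in b'0'..=b'9' {
            set(4, d, 5);
            set(5, d, 5);
        }
        ToyDense {
            trans,
            start: 1,
            is_match: vec![false, false, false, false, false, true],
        }
    }

    /// End of the leftmost-first (greedy) match beginning at offset 0.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        let mut state = self.start;
        let mut last_match = None;
        for (i, &byte) in haystack.iter().enumerate() {
            // Fixed work per byte: a multiply, an add and a load,
            // followed by the dead/match checks.
            state = self.trans[state * ALPHABET + byte as usize];
            if state == DEAD {
                break;
            }
            if self.is_match[state] {
                last_match = Some(i + 1);
            }
        }
        last_match
    }
}

fn main() {
    let dfa = ToyDense::foo_digits();
    assert_eq!(dfa.find(b"foo12345"), Some(8));
    assert_eq!(dfa.find(b"foo1x"), Some(4));
    assert_eq!(dfa.find(b"foo"), None);
}
```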

In contrast, a sparse DFA makes the opposite trade-off: it uses less space but executes a variable number of instructions per byte at match time, which makes it slower for matching.
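For contrast, a hypothetical sparse layout of the same toy DFA might store only the non-dead transitions of each state as byte ranges and scan them for every byte. `ToySparse` is an illustrative name, and the crate's actual sparse encoding is not shown here.

```rust
/// State 0 is the dead state.
const DEAD: usize = 0;

/// A toy sparse DFA: each state stores only its non-dead transitions, as
/// inclusive byte ranges `(lo, hi, next)`.
struct ToySparse {
    states: Vec<Vec<(u8, u8, usize)>>,
    start: usize,
    match_state: usize,
}

impl ToySparse {
    /// Hand-built, anchored DFA for `foo[0-9]+`.
    fn foo_digits() -> ToySparse {
        ToySparse {
            states: vec![
                vec![],                // 0: dead
                vec![(b'f', b'f', 2)], // 1: start
                vec![(b'o', b'o', 3)], // 2: saw "f"
                vec![(b'o', b'o', 4)], // 3: saw "fo"
                vec![(b'0', b'9', 5)], // 4: saw "foo"
                vec![(b'0', b'9', 5)], // 5: match
            ],
            start: 1,
            match_state: 5,
        }
    }

    /// Per-byte cost varies: a linear scan over this state's ranges.
    fn next_state(&self, state: usize, byte: u8) -> usize {
        self.states[state]
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map_or(DEAD, |&(_, _, next)| next)
    }

    /// Total number of stored transitions (a rough proxy for memory use).
    fn num_transitions(&self) -> usize {
        self.states.iter().map(Vec::len).sum()
    }

    /// End of the leftmost-first match beginning at offset 0.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        let mut state = self.start;
        let mut last_match = None;
        for (i, &byte) in haystack.iter().enumerate() {
            state = self.next_state(state, byte);
            if state == DEAD {
                break;
            }
            if state == self.match_state {
                last_match = Some(i + 1);
            }
        }
        last_match
    }
}

fn main() {
    let dfa = ToySparse::foo_digits();
    assert_eq!(dfa.find(b"foo12345"), Some(8));
    // 5 stored ranges, versus 6 * 256 = 1536 entries in a dense table.
    assert_eq!(dfa.num_transitions(), 5);
}
```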

A DFA can be built using the default configuration via the DenseDFA::new constructor. Otherwise, one can configure various aspects via the dense::Builder.

A single DFA fundamentally supports the following operations:

1. Detection of a match.
2. Location of the end of the first possible match.
3. Location of the end of the leftmost-first match.
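The difference between these three operations can be sketched with a hypothetical, hand-written transition function for an anchored `foo[0-9]+` DFA (`next_state`, `shortest_match`, `find` and the state numbers are illustrative assumptions, not the crate's code). Stopping at the first match state gives the end of the first possible match, while continuing until the dead state gives the greedy, leftmost-first end.

```rust
/// State 0 is dead, 1 is the start state and 5 is the only match state.
const DEAD: u8 = 0;
const START: u8 = 1;
const MATCH: u8 = 5;

/// Hypothetical hand-written transitions of an anchored DFA for `foo[0-9]+`.
fn next_state(state: u8, byte: u8) -> u8 {
    match (state, byte) {
        (1, b'f') => 2,
        (2, b'o') => 3,
        (3, b'o') => 4,
        (4, b'0'..=b'9') | (5, b'0'..=b'9') => MATCH,
        _ => DEAD,
    }
}

/// End of the first possible match: stop at the first match state.
fn shortest_match(haystack: &[u8]) -> Option<usize> {
    let mut state = START;
    for (i, &byte) in haystack.iter().enumerate() {
        state = next_state(state, byte);
        if state == MATCH {
            return Some(i + 1);
        }
        if state == DEAD {
            return None;
        }
    }
    None
}

/// Detection only needs to know whether any match exists.
fn is_match(haystack: &[u8]) -> bool {
    shortest_match(haystack).is_some()
}

/// End of the leftmost-first match: keep going until the dead state and
/// report the last match seen (greedy `+`).
fn find(haystack: &[u8]) -> Option<usize> {
    let mut state = START;
    let mut last_match = None;
    for (i, &byte) in haystack.iter().enumerate() {
        state = next_state(state, byte);
        if state == DEAD {
            break;
        }
        if state == MATCH {
            last_match = Some(i + 1);
        }
    }
    last_match
}

fn main() {
    assert!(is_match(b"foo12345"));
    assert_eq!(shortest_match(b"foo12345"), Some(4));
    assert_eq!(find(b"foo12345"), Some(8));
}
```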

A notable absence from the above list of capabilities is the location of the start of a match. In order to provide both the start and end of a match, two DFAs are required. This functionality is provided by a Regex, which can be built with its basic constructor, Regex::new, or with a RegexBuilder.
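One way a second DFA can recover the start: once a forward search has reported where a match ends, a DFA for the reversed pattern walks backwards from that end. The sketch below assumes the end offset is already known and hand-writes the reverse transitions for `foo[0-9]+`; `rev_next` and `match_start` are hypothetical names, and this is not the crate's `Regex` implementation.

```rust
const DEAD: u8 = 0;
const MATCH: u8 = 5;

/// Hypothetical reverse DFA: transitions for the reversed pattern
/// `[0-9]+oof`, consumed from right to left. The start state is 1.
fn rev_next(state: u8, byte: u8) -> u8 {
    match (state, byte) {
        (1, b'0'..=b'9') | (2, b'0'..=b'9') => 2, // one or more digits
        (2, b'o') => 3,                           // "o" before the digits
        (3, b'o') => 4,                           // "oo"
        (4, b'f') => MATCH,                       // "foo" complete
        _ => DEAD,
    }
}

/// Given the end offset of a match (found by a forward DFA), walk backwards
/// and return the leftmost offset at which the reverse DFA reaches a match.
fn match_start(haystack: &[u8], end: usize) -> Option<usize> {
    let mut state = 1;
    let mut start = None;
    for at in (0..end).rev() {
        state = rev_next(state, haystack[at]);
        if state == DEAD {
            break;
        }
        if state == MATCH {
            start = Some(at);
        }
    }
    start
}

fn main() {
    let haystack = b"xxfoo123yy";
    // Assume a forward search already reported that a match ends at offset 8.
    let end = 8;
    assert_eq!(match_start(haystack, end), Some(2));
    assert_eq!(&haystack[2..end], b"foo123");
}
```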

State size

A DenseDFA has two type parameters, T and S. T corresponds to the type of the DFA’s transition table while S corresponds to the representation used for the DFA’s state identifiers as described by the StateID trait. S is typically usize, but other valid choices provided by this crate include u8, u16, u32 and u64. The primary reason for choosing a different state identifier representation than the default is to reduce the amount of memory used by a DFA. Note, though, that if the chosen representation cannot accommodate the size of your DFA, then building the DFA will fail and return an error.
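The memory arithmetic behind this choice can be sketched as follows, assuming a table of `nstates * alphabet_len` entries with no other overhead. `table_bytes` and `fits` are hypothetical helpers; `fits` roughly models the kind of check that makes building fail when the representation is too small, and it ignores premultiplication, which raises the largest stored ID.

```rust
use std::mem::size_of;

/// Heap bytes used by a dense transition table of `nstates` rows and
/// `alphabet_len` columns whose entries are state IDs of type `S`.
fn table_bytes<S>(nstates: usize, alphabet_len: usize) -> usize {
    nstates * alphabet_len * size_of::<S>()
}

/// Hypothetical check: can IDs `0..nstates` all be represented when the
/// largest value of the ID type is `max_id`? (Premultiplication ignored.)
fn fits(nstates: usize, max_id: u64) -> bool {
    nstates == 0 || (nstates as u64 - 1) <= max_id
}

fn main() {
    let (nstates, alphabet_len) = (200, 20);
    assert_eq!(table_bytes::<u8>(nstates, alphabet_len), 4_000);
    assert_eq!(table_bytes::<u32>(nstates, alphabet_len), 16_000);
    assert_eq!(table_bytes::<u64>(nstates, alphabet_len), 32_000);
    assert!(fits(nstates, u8::MAX as u64));
    // 300 states need IDs up to 299, which a u8 cannot hold.
    assert!(!fits(300, u8::MAX as u64));
    assert!(fits(300, u16::MAX as u64));
}
```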

While the reduction in heap memory used by a DFA is one reason for choosing a smaller state identifier representation, another possible reason is for decreasing the serialization size of a DFA, as returned by to_bytes_little_endian, to_bytes_big_endian or to_bytes_native_endian.
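To see why a smaller S shrinks serialized output, here is a sketch that encodes a table of u16 or u32 state IDs with the standard library's `to_le_bytes`/`from_le_bytes`. The functions are hypothetical and do not reproduce the crate's serialization format, which this section does not describe.

```rust
/// Hypothetical serializer: each u16 state ID becomes 2 little-endian bytes.
fn table_to_le_bytes(table: &[u16]) -> Vec<u8> {
    table.iter().flat_map(|id| id.to_le_bytes()).collect()
}

/// The same idea with u32 IDs needs 4 bytes per entry.
fn table32_to_le_bytes(table: &[u32]) -> Vec<u8> {
    table.iter().flat_map(|id| id.to_le_bytes()).collect()
}

/// Inverse of `table_to_le_bytes`.
fn table_from_le_bytes(buf: &[u8]) -> Vec<u16> {
    buf.chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

fn main() {
    let table: [u16; 3] = [0, 258, 513];
    let bytes = table_to_le_bytes(&table);
    // 258 = 0x0102 is written low byte first: [0x02, 0x01].
    assert_eq!(bytes, vec![0, 0, 2, 1, 1, 2]);
    assert_eq!(table_from_le_bytes(&bytes), table.to_vec());
    // The same IDs stored as u32 take twice the serialized size.
    assert_eq!(table32_to_le_bytes(&[0, 258, 513]).len(), 2 * bytes.len());
}
```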

The type of the transition table is typically either Vec<S> or &[S], depending on where the transition table is stored.
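A sketch of how one type can accept either kind of storage through an `AsRef<[S]>` bound. `Table` is a hypothetical stand-in, not `DenseDFA` itself.

```rust
use std::marker::PhantomData;

/// Hypothetical table type generic over its storage `T`, in the spirit of
/// the `T: AsRef<[S]>` parameter on `DenseDFA<T, S>`.
struct Table<T, S> {
    trans: T,
    _ids: PhantomData<S>,
}

impl<T: AsRef<[S]>, S: Copy> Table<T, S> {
    fn new(trans: T) -> Self {
        Table { trans, _ids: PhantomData }
    }

    /// Reads go through `as_ref`, so owned and borrowed storage behave alike.
    fn get(&self, index: usize) -> S {
        self.trans.as_ref()[index]
    }

    fn len(&self) -> usize {
        self.trans.as_ref().len()
    }
}

fn main() {
    // Owned: the transitions live in a heap-allocated Vec.
    let owned: Table<Vec<u16>, u16> = Table::new(vec![0, 1, 2]);
    // Borrowed: the transitions live elsewhere (here a static) and are
    // only referenced.
    static TRANS: [u16; 3] = [0, 1, 2];
    let borrowed: Table<&[u16], u16> = Table::new(&TRANS[..]);
    assert_eq!(owned.len(), borrowed.len());
    assert_eq!(owned.get(2), borrowed.get(2));
}
```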

Variants

This DFA is defined as a non-exhaustive enumeration of different types of dense DFAs. All of these dense DFAs use the same internal representation for the transition table, but they vary in how the transition table is read. A DFA’s specific variant depends on the configuration options set via dense::Builder. The default variant is PremultipliedByteClass.

The `DFA` trait

This type implements the DFA trait, which means it can be used for searching. For example:

```rust
use regex_automata::{DFA, DenseDFA};

fn main() -> Result<(), regex_automata::Error> {
    let dfa = DenseDFA::new("foo[0-9]+")?;
    assert_eq!(Some(8), dfa.find(b"foo12345"));
    Ok(())
}
```

The DFA trait also provides an assortment of other lower level methods for DFAs, such as start_state and next_state. While these are correctly implemented, it is an anti-pattern to use them in performance sensitive code on the DenseDFA type directly. Namely, each implementation requires a branch to determine which type of dense DFA is being used. Instead, this branch should be pushed up a layer in the code since walking the transitions of a DFA is usually a hot path. If you do need to use these lower level methods in performance critical code, then you should match on the variants of this DFA and use each variant’s implementation of the DFA trait directly.
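The advice to branch once per search rather than once per byte can be sketched with a hypothetical two-variant enum and a generic search loop. `Automaton`, `Plain`, `Premul`, `AnyDfa` and `build` are illustrative names (the trait is not the crate's `DFA` trait); the generic function is monomorphized for each variant, so the per-byte loop contains no variant check.

```rust
/// State 0 is the dead state in both toy representations below.
const DEAD: usize = 0;

/// Hypothetical minimal DFA interface (not the crate's `DFA` trait).
trait Automaton {
    fn start(&self) -> usize;
    fn next(&self, state: usize, byte: u8) -> usize;
    fn is_match(&self, state: usize) -> bool;
}

/// Toy table for anchored `a+`; state IDs are plain row indices.
struct Plain(Vec<usize>);
/// The same DFA with premultiplied IDs (row index * 256).
struct Premul(Vec<usize>);

impl Automaton for Plain {
    fn start(&self) -> usize {
        1
    }
    fn next(&self, state: usize, byte: u8) -> usize {
        self.0[state * 256 + byte as usize]
    }
    fn is_match(&self, state: usize) -> bool {
        state == 2
    }
}

impl Automaton for Premul {
    fn start(&self) -> usize {
        256
    }
    fn next(&self, state: usize, byte: u8) -> usize {
        self.0[state + byte as usize]
    }
    fn is_match(&self, state: usize) -> bool {
        state == 512
    }
}

/// Wrapper enum standing in for a multi-variant DFA type.
enum AnyDfa {
    Plain(Plain),
    Premul(Premul),
}

/// Generic search loop: compiled separately for each variant.
fn is_match_generic<A: Automaton>(dfa: &A, haystack: &[u8]) -> bool {
    let mut state = dfa.start();
    for &byte in haystack {
        state = dfa.next(state, byte);
        if dfa.is_match(state) {
            return true;
        }
        if state == DEAD {
            return false;
        }
    }
    false
}

/// Branch on the variant once, outside the hot loop.
fn is_match(dfa: &AnyDfa, haystack: &[u8]) -> bool {
    match dfa {
        AnyDfa::Plain(d) => is_match_generic(d, haystack),
        AnyDfa::Premul(d) => is_match_generic(d, haystack),
    }
}

fn build() -> (AnyDfa, AnyDfa) {
    let a = b'a' as usize;
    let mut plain = vec![DEAD; 3 * 256];
    plain[256 + a] = 2; // start --a--> match
    plain[2 * 256 + a] = 2; // match --a--> match
    let mut premul = vec![DEAD; 3 * 256];
    premul[256 + a] = 512;
    premul[512 + a] = 512;
    (AnyDfa::Plain(Plain(plain)), AnyDfa::Premul(Premul(premul)))
}

fn main() {
    let (plain, premul) = build();
    for dfa in [&plain, &premul] {
        assert!(is_match(dfa, b"aaa"));
        assert!(!is_match(dfa, b"ba"));
    }
}
```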

Variants

Standard(Standard<T, S>)

A standard DFA that does not use premultiplication or byte classes.

ByteClass(ByteClass<T, S>)

A DFA that shrinks its alphabet to a set of equivalence classes instead of using all possible byte values. Any two bytes belong to the same equivalence class if and only if they can be used interchangeably anywhere in the DFA while never discriminating between a match and a non-match.

This type of DFA can result in significant space reduction with a very small match time performance penalty.
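A hypothetical sketch of computing byte classes from the byte ranges a pattern distinguishes, by marking where each range starts and ends. Because it only merges contiguous runs of bytes, it can produce more classes than the strict "if and only if" definition above (for example, bytes below `0` and bytes after `o` end up in different classes), but bytes within one class are still interchangeable.

```rust
/// Hypothetical sketch: derive byte classes from the inclusive byte ranges
/// that a pattern's transitions distinguish. Returns a byte-to-class map.
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // boundary[b] is true when a new class must start at byte b.
    let mut boundary = [false; 256];
    for &(lo, hi) in ranges {
        boundary[lo as usize] = true;
        if hi < u8::MAX {
            boundary[hi as usize + 1] = true;
        }
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 1..256 {
        if boundary[b] {
            class += 1;
        }
        classes[b] = class;
    }
    classes
}

fn main() {
    // The ranges that `foo[0-9]+` cares about: 'f', 'o' and '0'..='9'.
    let classes = byte_classes(&[(b'f', b'f'), (b'o', b'o'), (b'0', b'9')]);
    let alphabet_len = classes[255] as usize + 1;
    // Each row of the transition table shrinks from 256 columns to 7.
    assert_eq!(alphabet_len, 7);
    assert_eq!(classes[b'0' as usize], classes[b'9' as usize]);
    assert_ne!(classes[b'f' as usize], classes[b'o' as usize]);
}
```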

Premultiplied(Premultiplied<T, S>)

A DFA that premultiplies all of its state identifiers in its transition table. This saves an instruction per byte at match time, which improves search performance.

The only downside of premultiplication is that it may prevent one from using a smaller state identifier representation than you otherwise could.
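A sketch of both sides of this trade-off, assuming a row-major table with `stride` columns per state (`index_plain`, `index_premultiplied` and `max_stored_id` are hypothetical helpers). Premultiplied IDs remove the multiply from each lookup, but the largest stored ID grows by a factor of the stride.

```rust
/// Table index of a transition from a plain state index: needs a multiply.
fn index_plain(state: usize, class: usize, stride: usize) -> usize {
    state * stride + class
}

/// Table index from a premultiplied state ID (already `state * stride`).
fn index_premultiplied(premultiplied_state: usize, class: usize) -> usize {
    premultiplied_state + class
}

/// Largest state ID stored in the table for `nstates` states.
fn max_stored_id(nstates: usize, stride: usize, premultiplied: bool) -> usize {
    let last = nstates - 1;
    if premultiplied {
        last * stride
    } else {
        last
    }
}

fn main() {
    // Both forms address the same slot.
    assert_eq!(index_plain(7, 3, 256), index_premultiplied(7 * 256, 3));

    let max_u8 = u8::MAX as usize;
    // 10 states, no byte classes (stride 256): plain IDs fit in a u8...
    assert!(max_stored_id(10, 256, false) <= max_u8);
    // ...but premultiplied IDs reach 9 * 256 = 2304, which needs a u16.
    assert_eq!(max_stored_id(10, 256, true), 2304);
    assert!(max_stored_id(10, 256, true) > max_u8);
    // With, say, 7 byte classes the premultiplied IDs stay small (9 * 7 = 63).
    assert!(max_stored_id(10, 7, true) <= max_u8);
}
```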

PremultipliedByteClass(PremultipliedByteClass<T, S>)

The default configuration of a DFA, which uses byte classes and premultiplies its state identifiers.

Implementations

impl DenseDFA<Vec<usize>, usize>

pub fn new(pattern: &str) -> Result<DenseDFA<Vec<usize>, usize>, Error>

impl<S: StateID> DenseDFA<Vec<S>, S>

pub fn empty() -> DenseDFA<Vec<S>, S>

impl<T: AsRef<[S]>, S: StateID> DenseDFA<T, S>

pub fn as_ref<'a>(&'a self) -> DenseDFA<&'a [S], S>

pub fn to_owned(&self) -> DenseDFA<Vec<S>, S>

pub fn memory_usage(&self) -> usize

impl<T: AsRef<[S]>, S: StateID> DenseDFA<T, S>

pub fn to_sparse(&self) -> Result<SparseDFA<Vec<u8>, S>, Error>

pub fn to_sparse_sized<A: StateID>(&self) -> Result<SparseDFA<Vec<u8>, A>, Error>

pub fn to_u8(&self) -> Result<DenseDFA<Vec<u8>, u8>, Error>

pub fn to_u16(&self) -> Result<DenseDFA<Vec<u16>, u16>, Error>

pub fn to_u32(&self) -> Result<DenseDFA<Vec<u32>, u32>, Error>

pub fn to_u64(&self) -> Result<DenseDFA<Vec<u64>, u64>, Error>

pub fn to_sized<A: StateID>(&self) -> Result<DenseDFA<Vec<A>, A>, Error>

pub fn to_bytes_little_endian(&self) -> Result<Vec<u8>, Error>

pub fn to_bytes_big_endian(&self) -> Result<Vec<u8>, Error>

pub fn to_bytes_native_endian(&self) -> Result<Vec<u8>, Error>

impl<'a, S: StateID> DenseDFA<&'a [S], S>

pub unsafe fn from_bytes(buf: &'a [u8]) -> DenseDFA<&'a [S], S>

Trait Implementations

impl<T: Clone + AsRef<[S]>, S: Clone + StateID> Clone for DenseDFA<T, S>

fn clone(&self) -> DenseDFA<T, S>

fn clone_from(&mut self, source: &Self)

impl<T: AsRef<[S]>, S: StateID> DFA for DenseDFA<T, S>

type ID = S

fn start_state(&self) -> S

fn is_match_state(&self, id: S) -> bool

fn is_dead_state(&self, id: S) -> bool

fn is_match_or_dead_state(&self, id: S) -> bool

fn is_anchored(&self) -> bool

fn next_state(&self, current: S, input: u8) -> S

unsafe fn next_state_unchecked(&self, current: S, input: u8) -> S

fn is_match_at(&self, bytes: &[u8], start: usize) -> bool

fn shortest_match_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn find_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn rfind_at(&self, bytes: &[u8], start: usize) -> Option<usize>

fn is_match(&self, bytes: &[u8]) -> bool

fn shortest_match(&self, bytes: &[u8]) -> Option<usize>

fn find(&self, bytes: &[u8]) -> Option<usize>

fn rfind(&self, bytes: &[u8]) -> Option<usize>

impl<T: Debug + AsRef<[S]>, S: Debug + StateID> Debug for DenseDFA<T, S>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl<T, S> RefUnwindSafe for DenseDFA<T, S> where S: RefUnwindSafe, T: RefUnwindSafe,

impl<T, S> Send for DenseDFA<T, S> where S: Send, T: Send,

impl<T, S> Sync for DenseDFA<T, S> where S: Sync, T: Sync,

impl<T, S> Unpin for DenseDFA<T, S> where S: Unpin, T: Unpin,

impl<T, S> UnwindSafe for DenseDFA<T, S> where S: UnwindSafe, T: UnwindSafe,

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
