pub struct DFA<T> { /* private fields */ }
A dense table-based deterministic finite automaton (DFA).

All dense DFAs have one or more start states, zero or more match states and a transition table that maps the current state and the current byte of input to the next state. A DFA can use this information to implement fast searching. In particular, the use of a dense DFA generally makes the trade off that match speed is the most valuable characteristic, even if building the DFA may take significant time and space. (More concretely, building a DFA takes time and space that is exponential in the size of the pattern in the worst case.) As such, the processing of every byte of input is done with a small constant number of operations that does not vary with the pattern, its size or the size of the alphabet. If your needs don’t line up with this trade off, then a dense DFA may not be an adequate solution to your problem.

In contrast, a sparse::DFA makes the opposite trade off: it uses less space but will execute a variable number of instructions per byte at match time, which makes it slower for matching. (Note that space usage is still exponential in the size of the pattern in the worst case.)
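The dense layout described above can be illustrated with a minimal, hand-built sketch. This is a toy, not the crate's actual representation: the `ToyDense` type, its fields and the `a_plus_b` constructor are hypothetical names, and the search is anchored and stops at the first match. It only shows why each byte costs a fixed number of lookups regardless of the pattern.

```rust
/// The dead state: every transition out of it leads back to it.
const DEAD: usize = 0;

/// Toy dense DFA (hypothetical, not the crate's real layout): one row of
/// `stride` entries per state, indexed by the byte's equivalence class.
struct ToyDense {
    classes: [u8; 256],
    stride: usize,
    table: Vec<usize>,
    start: usize,
    match_state: usize,
}

impl ToyDense {
    /// Hand-built automaton for the pattern `a+b`.
    fn a_plus_b() -> ToyDense {
        // Classes: 0 = any other byte, 1 = b'a', 2 = b'b'.
        let mut classes = [0u8; 256];
        classes[b'a' as usize] = 1;
        classes[b'b' as usize] = 2;
        // States: 0 = dead, 1 = start, 2 = saw `a+`, 3 = match.
        // Three classes are padded to a stride of 4; the unused
        // column maps to the dead state.
        let table = vec![
            0, 0, 0, 0, // dead
            0, 2, 0, 0, // start
            0, 2, 3, 0, // saw `a+`
            0, 0, 0, 0, // match
        ];
        ToyDense { classes, stride: 4, table, start: 1, match_state: 3 }
    }

    /// Anchored search returning the end offset of the match, if any.
    fn find_end(&self, haystack: &[u8]) -> Option<usize> {
        let mut state = self.start;
        for (i, &byte) in haystack.iter().enumerate() {
            let class = self.classes[byte as usize] as usize;
            // The per-byte work: one class lookup and one table lookup.
            state = self.table[state * self.stride + class];
            if state == self.match_state {
                return Some(i + 1);
            }
            if state == DEAD {
                return None;
            }
        }
        None
    }
}
```

Calling `ToyDense::a_plus_b().find_end(b"aaab")` walks four transitions and reports the end offset of the match, which mirrors the "end of match only" capability of a single DFA.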

A DFA can be built using the default configuration via the DFA::new constructor. Otherwise, one can configure various aspects via dense::Builder.

A single DFA fundamentally supports the following operations:

  1. Detection of a match.
  2. Location of the end of a match.
  3. Identification of which pattern matched, in the case of a DFA with multiple patterns.

A notable absence from the above list of capabilities is the location of the start of a match. In order to provide both the start and end of a match, two DFAs are required. This functionality is provided by a Regex.

Type parameters

A DFA has one type parameter, T, which is used to represent state IDs, pattern IDs and accelerators. T is typically a Vec<u32> or a &[u32].
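As a rough illustration of why one type parameter is enough to cover both owned and borrowed storage, here is a small sketch. The `Table` type and its methods are hypothetical stand-ins written for this example, not the crate's internals; they only show the `AsRef<[u32]>` pattern that the signatures on this page rely on.

```rust
/// Hypothetical stand-in for a DFA that is generic over its storage.
struct Table<T> {
    data: T,
}

impl<T: AsRef<[u32]>> Table<T> {
    /// Read one entry, whatever the backing storage is.
    fn get(&self, i: usize) -> u32 {
        self.data.as_ref()[i]
    }

    /// Cheap: borrows the existing storage without copying.
    fn as_ref(&self) -> Table<&[u32]> {
        Table { data: self.data.as_ref() }
    }

    /// Copies the storage into a new heap allocation.
    fn to_owned(&self) -> Table<Vec<u32>> {
        Table { data: self.data.as_ref().to_vec() }
    }
}
```

The same `impl` block serves `Table<Vec<u32>>` and `Table<&[u32]>`, which is the design choice that lets a deserialized, borrowed DFA expose the same search API as a freshly built one.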

The Automaton trait

This type implements the Automaton trait, which means it can be used for searching. For example:

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

let dfa = DFA::new("foo[0-9]+")?;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.try_search_fwd(&Input::new("foo12345"))?);



impl DFA<Vec<u32>>


pub fn new(pattern: &str) -> Result<DFA<Vec<u32>>, BuildError>

Parse the given regular expression using a default configuration and return the corresponding DFA.

If you want a non-default configuration, then use the dense::Builder to set your own configuration.

use regex_automata::{dfa::{Automaton, dense}, HalfMatch, Input};

let dfa = dense::DFA::new("foo[0-9]+bar")?;
let expected = Some(HalfMatch::must(0, 11));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345bar"))?);

pub fn new_many<P: AsRef<str>>( patterns: &[P] ) -> Result<DFA<Vec<u32>>, BuildError>

Parse the given regular expressions using a default configuration and return the corresponding multi-DFA.

If you want a non-default configuration, then use the dense::Builder to set your own configuration.

use regex_automata::{dfa::{Automaton, dense}, HalfMatch, Input};

let dfa = dense::DFA::new_many(&["[0-9]+", "[a-z]+"])?;
let expected = Some(HalfMatch::must(1, 3));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345bar"))?);

impl DFA<Vec<u32>>


pub fn always_match() -> Result<DFA<Vec<u32>>, BuildError>

Create a new DFA that matches every input.

use regex_automata::{dfa::{Automaton, dense}, HalfMatch, Input};

let dfa = dense::DFA::always_match()?;

let expected = Some(HalfMatch::must(0, 0));
assert_eq!(expected, dfa.try_search_fwd(&Input::new(""))?);
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo"))?);

pub fn never_match() -> Result<DFA<Vec<u32>>, BuildError>

Create a new DFA that never matches any input.

use regex_automata::{dfa::{Automaton, dense}, Input};

let dfa = dense::DFA::never_match()?;
assert_eq!(None, dfa.try_search_fwd(&Input::new(""))?);
assert_eq!(None, dfa.try_search_fwd(&Input::new("foo"))?);

impl DFA<&[u32]>


pub fn config() -> Config

Return a new default dense DFA compiler configuration.

This is a convenience routine to avoid needing to import the Config type when customizing the construction of a dense DFA.


pub fn builder() -> Builder

Create a new dense DFA builder with the default configuration.

This is a convenience routine to avoid needing to import the Builder type in common cases.


impl<T: AsRef<[u32]>> DFA<T>


pub fn as_ref(&self) -> DFA<&[u32]>

Cheaply return a borrowed version of this dense DFA. Specifically, the DFA returned always uses &[u32] for its transition table.


pub fn to_owned(&self) -> DFA<Vec<u32>>

Return an owned version of this dense DFA. Specifically, the DFA returned always uses Vec<u32> for its transition table.

Effectively, this returns a dense DFA whose transition table lives on the heap.


pub fn start_kind(&self) -> StartKind

Returns the starting state configuration for this DFA.

The default is StartKind::Both, which means the DFA supports both unanchored and anchored searches. However, this can generally lead to bigger DFAs. Therefore, a DFA might be compiled with support for just unanchored or anchored searches. In that case, running a search with an unsupported configuration will panic.


pub fn starts_for_each_pattern(&self) -> bool

Returns true only if this DFA has starting states for each pattern.

When a DFA has starting states for each pattern, then a search with the DFA can be configured to only look for anchored matches of a specific pattern. Specifically, APIs like Automaton::try_search_fwd can accept a non-None pattern_id if and only if this method returns true. Otherwise, calling try_search_fwd will panic.

Note that if the DFA has no patterns, this always returns false.


pub fn byte_classes(&self) -> &ByteClasses

Returns the equivalence classes that make up the alphabet for this DFA.

Unless Config::byte_classes was disabled, it is possible that multiple distinct bytes are grouped into the same equivalence class if it is impossible for them to discriminate between a match and a non-match. This has the effect of reducing the overall alphabet size and in turn potentially substantially reducing the size of the DFA’s transition table.

The downside of using equivalence classes like this is that every state transition will automatically use this map to convert an arbitrary byte to its corresponding equivalence class. In practice this has a negligible impact on performance.
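To make the idea of equivalence classes concrete, the following is a simplified sketch that derives classes from a set of byte ranges. The function names are hypothetical and the algorithm is an assumption chosen for illustration (it splits the byte space at every range boundary); the crate's own computation may differ in detail.

```rust
/// Map every byte to a class such that bytes in the same class fall
/// inside exactly the same set of the given inclusive ranges.
/// (Illustrative sketch; `byte_classes` is not a crate function.)
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // boundary[b] == true means a class ends at byte b.
    let mut boundary = [false; 256];
    for &(start, end) in ranges {
        if start > 0 {
            boundary[start as usize - 1] = true;
        }
        boundary[end as usize] = true;
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256usize {
        classes[b] = class;
        // Start a new class after each boundary, except past the last byte.
        if boundary[b] && b < 255 {
            class += 1;
        }
    }
    classes
}

/// Number of distinct classes (the special EOI transition is not counted).
fn class_count(classes: &[u8; 256]) -> usize {
    classes[255] as usize + 1
}
```

For a pattern such as `[a-z]+`, this yields three classes (bytes below `a`, bytes in `a-z`, bytes above `z`) instead of 256 distinct columns per state.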


pub fn alphabet_len(&self) -> usize

Returns the total number of elements in the alphabet for this DFA.

That is, this returns the total number of transitions that each state in this DFA must have. Typically, a normal byte oriented DFA would always have an alphabet size of 256, corresponding to the number of unique values in a single byte. However, this implementation has two peculiarities that impact the alphabet length:

  • Every state has a special “EOI” transition that is only followed after the end of some haystack is reached. This EOI transition is necessary to account for one byte of look-ahead when implementing things like \b and $.
  • Bytes are grouped into equivalence classes such that no two bytes in the same class can distinguish a match from a non-match. For example, in the regex ^[a-z]+$, the ASCII bytes a-z could all be in the same equivalence class. This leads to a massive space savings.

Note though that the alphabet length does not necessarily equal the total stride space taken up by a single DFA state in the transition table. Namely, for performance reasons, the stride is always the smallest power of two that is greater than or equal to the alphabet length. For this reason, DFA::stride or DFA::stride2 are often more useful. The alphabet length is typically useful only for informational purposes.


pub fn stride2(&self) -> usize

Returns the total stride for every state in this DFA, expressed as the exponent of a power of 2. The stride is the amount of space each state takes up in the transition table, expressed as a number of transitions. (Unused transitions map to dead states.)

The stride of a DFA is always equivalent to the smallest power of 2 that is greater than or equal to the DFA’s alphabet length. This definition uses extra space, but permits faster translation between premultiplied state identifiers and contiguous indices (by using shifts instead of relying on integer division).

For example, if the DFA’s stride is 16 transitions, then its stride2 is 4 since 2^4 = 16.

The minimum stride2 value is 1 (corresponding to a stride of 2) while the maximum stride2 value is 9 (corresponding to a stride of 512). The maximum is not 8 since the maximum alphabet size is 257 when accounting for the special EOI transition. However, an alphabet length of that size is exceptionally rare since the alphabet is shrunk into equivalence classes.
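The relationship between alphabet length, stride and stride2 can be sketched with plain integer arithmetic. The helper names below are hypothetical and exist only for this example; they assume an alphabet length of at least 2, as stated above.

```rust
/// Exponent of the smallest power of two >= `alphabet_len`.
/// (Hypothetical helper; assumes `alphabet_len >= 2`.)
fn stride2_for(alphabet_len: usize) -> u32 {
    alphabet_len.next_power_of_two().trailing_zeros()
}

/// The stride itself, as a number of transitions per state.
fn stride_for(alphabet_len: usize) -> usize {
    1 << stride2_for(alphabet_len)
}

/// Convert a premultiplied state ID to a contiguous index with a shift
/// instead of a division.
fn to_index(premultiplied_id: usize, stride2: u32) -> usize {
    premultiplied_id >> stride2
}

/// Convert a contiguous index back to a premultiplied state ID.
fn to_premultiplied(index: usize, stride2: u32) -> usize {
    index << stride2
}
```

With an alphabet length of 257, `stride2_for` gives 9 and `stride_for` gives 512, matching the maximum described above.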


pub fn stride(&self) -> usize

Returns the total stride for every state in this DFA. This corresponds to the total number of transitions used by each state in this DFA’s transition table.

Please see DFA::stride2 for more information. In particular, this returns the stride as the number of transitions, whereas stride2 returns it as the exponent of a power of 2.


pub fn memory_usage(&self) -> usize

Returns the memory usage, in bytes, of this DFA.

The memory usage is computed based on the number of bytes used to represent this DFA.

This does not include the stack size used up by this DFA. To compute that, use std::mem::size_of::<dense::DFA>().


impl<T: AsRef<[u32]>> DFA<T>

Routines for converting a dense DFA to other representations, such as sparse DFAs or raw bytes suitable for persistent storage.


pub fn to_sparse(&self) -> Result<DFA<Vec<u8>>, BuildError>

Convert this dense DFA to a sparse DFA.

If a StateID is too small to represent all states in the sparse DFA, then this returns an error. In most cases, if a dense DFA is constructable with StateID then a sparse DFA will be as well. However, it is not guaranteed.

use regex_automata::{dfa::{Automaton, dense}, HalfMatch, Input};

let dense = dense::DFA::new("foo[0-9]+")?;
let sparse = dense.to_sparse()?;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, sparse.try_search_fwd(&Input::new("foo12345"))?);

pub fn to_bytes_little_endian(&self) -> (Vec<u8>, usize)

Serialize this DFA as raw bytes to a Vec<u8> in little endian format. Upon success, the Vec<u8> and the initial padding length are returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

The padding returned is non-zero if the returned Vec<u8> starts at an address that does not have the same alignment as u32. The padding corresponds to the number of leading bytes written to the returned Vec<u8>.
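As a sketch of how such a padding length could be derived, the snippet below computes the number of bytes needed to reach the next u32-aligned address. The function names are hypothetical and this is not the crate's implementation; it only illustrates the arithmetic behind the returned padding.

```rust
/// Number of bytes to skip from `addr` to reach an address aligned
/// like `u32`. (Hypothetical helper for illustration.)
fn padding_len(addr: usize) -> usize {
    let align = std::mem::align_of::<u32>();
    (align - (addr % align)) % align
}

/// Padding needed at the start of an existing byte buffer.
fn padding_for(buf: &[u8]) -> usize {
    padding_len(buf.as_ptr() as usize)
}
```

Skipping `padding_for(&buf)` bytes from the start of a buffer always lands on an address suitable for `&[u32]`, which is the property deserialization depends on.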


This example shows how to serialize and deserialize a DFA:

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

// N.B. We use native endianness here to make the example work, but
// using to_bytes_little_endian would work on a little endian target.
let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn to_bytes_big_endian(&self) -> (Vec<u8>, usize)

Serialize this DFA as raw bytes to a Vec<u8> in big endian format. Upon success, the Vec<u8> and the initial padding length are returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

The padding returned is non-zero if the returned Vec<u8> starts at an address that does not have the same alignment as u32. The padding corresponds to the number of leading bytes written to the returned Vec<u8>.


This example shows how to serialize and deserialize a DFA:

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

// N.B. We use native endianness here to make the example work, but
// using to_bytes_big_endian would work on a big endian target.
let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn to_bytes_native_endian(&self) -> (Vec<u8>, usize)

Serialize this DFA as raw bytes to a Vec<u8> in native endian format. Upon success, the Vec<u8> and the initial padding length are returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

The padding returned is non-zero if the returned Vec<u8> starts at an address that does not have the same alignment as u32. The padding corresponds to the number of leading bytes written to the returned Vec<u8>.

Generally speaking, native endian format should only be used when you know that the target you’re compiling the DFA for matches the endianness of the target on which you’re compiling the DFA. For example, if serialization and deserialization happen in the same process or on the same machine. Otherwise, when serializing a DFA for use in a portable environment, you’ll almost certainly want to serialize both a little endian and a big endian version and then load the correct one based on the target’s configuration.
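The reason endianness matters can be seen with a few `u32` values encoded by hand. This sketch uses only the standard library; the function names are hypothetical and the flat list of words is a stand-in for a serialized transition table, not the crate's wire format.

```rust
/// Encode each word in little endian byte order.
fn encode_le(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 4);
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

/// Encode each word in big endian byte order.
fn encode_be(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 4);
    for w in words {
        out.extend_from_slice(&w.to_be_bytes());
    }
    out
}

/// Decode using the byte order of the machine running this code,
/// which is what a zero-copy reinterpretation effectively does.
fn decode_native(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Pick the encoding that matches the current target.
fn encode_for_this_target(words: &[u32]) -> Vec<u8> {
    if cfg!(target_endian = "little") {
        encode_le(words)
    } else {
        encode_be(words)
    }
}
```

Decoding bytes produced for the other byte order yields different numbers, which is why shipping both a little endian and a big endian file and selecting one per target is the portable option.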


This example shows how to serialize and deserialize a DFA:

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn write_to_little_endian( &self, dst: &mut [u8] ) -> Result<usize, SerializeError>

Serialize this DFA as raw bytes to the given slice, in little endian format. Upon success, the total number of bytes written to dst is returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

Note that unlike the various to_bytes_* routines, this does not write any padding. Callers are responsible for handling alignment correctly.


This returns an error if the given destination slice is not big enough to contain the full serialized DFA. If an error occurs, then nothing is written to dst.


This example shows how to serialize and deserialize a DFA without dynamic memory allocation.

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

// Create a 4KB buffer on the stack to store our serialized DFA. We
// need to use a special type to force the alignment of our [u8; N]
// array to be aligned to a 4 byte boundary. Otherwise, deserializing
// the DFA may fail because of an alignment mismatch.
struct Aligned<B: ?Sized> {
    _align: [u32; 0],
    bytes: B,
}
let mut buf = Aligned { _align: [], bytes: [0u8; 4 * (1<<10)] };
// N.B. We use native endianness here to make the example work, but
// using write_to_little_endian would work on a little endian target.
let written = original_dfa.write_to_native_endian(&mut buf.bytes)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf.bytes[..written])?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn write_to_big_endian( &self, dst: &mut [u8] ) -> Result<usize, SerializeError>

Serialize this DFA as raw bytes to the given slice, in big endian format. Upon success, the total number of bytes written to dst is returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

Note that unlike the various to_bytes_* routines, this does not write any padding. Callers are responsible for handling alignment correctly.


This returns an error if the given destination slice is not big enough to contain the full serialized DFA. If an error occurs, then nothing is written to dst.


This example shows how to serialize and deserialize a DFA without dynamic memory allocation.

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

// Create a 4KB buffer on the stack to store our serialized DFA. We
// need to use a special type to force the alignment of our [u8; N]
// array to be aligned to a 4 byte boundary. Otherwise, deserializing
// the DFA may fail because of an alignment mismatch.
struct Aligned<B: ?Sized> {
    _align: [u32; 0],
    bytes: B,
}
let mut buf = Aligned { _align: [], bytes: [0u8; 4 * (1<<10)] };
// N.B. We use native endianness here to make the example work, but
// using write_to_big_endian would work on a big endian target.
let written = original_dfa.write_to_native_endian(&mut buf.bytes)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf.bytes[..written])?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn write_to_native_endian( &self, dst: &mut [u8] ) -> Result<usize, SerializeError>

Serialize this DFA as raw bytes to the given slice, in native endian format. Upon success, the total number of bytes written to dst is returned.

The written bytes are guaranteed to be deserialized correctly and without errors in a semver compatible release of this crate by a DFA’s deserialization APIs (assuming all other criteria for the deserialization APIs have been satisfied):

  • DFA::from_bytes
  • DFA::from_bytes_unchecked

Generally speaking, native endian format should only be used when you know that the target you’re compiling the DFA for matches the endianness of the target on which you’re compiling the DFA. For example, if serialization and deserialization happen in the same process or on the same machine. Otherwise, when serializing a DFA for use in a portable environment, you’ll almost certainly want to serialize both a little endian and a big endian version and then load the correct one based on the target’s configuration.

Note that unlike the various to_bytes_* routines, this does not write any padding. Callers are responsible for handling alignment correctly.


This returns an error if the given destination slice is not big enough to contain the full serialized DFA. If an error occurs, then nothing is written to dst.


This example shows how to serialize and deserialize a DFA without dynamic memory allocation.

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;

// Create a 4KB buffer on the stack to store our serialized DFA. We
// need to use a special type to force the alignment of our [u8; N]
// array to be aligned to a 4 byte boundary. Otherwise, deserializing
// the DFA may fail because of an alignment mismatch.
struct Aligned<B: ?Sized> {
    _align: [u32; 0],
    bytes: B,
}
let mut buf = Aligned { _align: [], bytes: [0u8; 4 * (1<<10)] };
let written = original_dfa.write_to_native_endian(&mut buf.bytes)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf.bytes[..written])?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

pub fn write_to_len(&self) -> usize

Return the total number of bytes required to serialize this DFA.

This is useful for determining the size of the buffer required to pass to one of the serialization routines:

  • DFA::write_to_little_endian
  • DFA::write_to_big_endian
  • DFA::write_to_native_endian

Passing a buffer smaller than the size returned by this method will result in a serialization error. Serialization routines are guaranteed to succeed when the buffer is big enough.
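The contract described here (ask for the required length, then write into a caller-provided slice, failing without side effects if it is too small) can be sketched with the standard library alone. The free functions below are hypothetical analogues operating on a plain `&[u32]`; they are not the crate's methods and use a `String` error purely for brevity.

```rust
/// Bytes needed to serialize `words`. (Hypothetical analogue of
/// `write_to_len` for a plain slice of words.)
fn write_to_len(words: &[u32]) -> usize {
    words.len() * std::mem::size_of::<u32>()
}

/// Write `words` into `dst` in native endian order. Returns the number
/// of bytes written, or an error (leaving `dst` untouched) if `dst`
/// is too small.
fn write_to(words: &[u32], dst: &mut [u8]) -> Result<usize, String> {
    let needed = write_to_len(words);
    // Check the size up front so that a failure writes nothing.
    if dst.len() < needed {
        return Err(format!(
            "buffer too small: need {} bytes, got {}",
            needed,
            dst.len()
        ));
    }
    for (chunk, w) in dst.chunks_exact_mut(4).zip(words) {
        chunk.copy_from_slice(&w.to_ne_bytes());
    }
    Ok(needed)
}
```

Sizing the buffer with `write_to_len` first means the only failure mode of `write_to` cannot occur, which is the same guarantee stated above.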


This example shows how to dynamically allocate enough room to serialize a DFA.

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

let original_dfa = DFA::new("foo[0-9]+")?;

let mut buf = vec![0; original_dfa.write_to_len()];
// This is guaranteed to succeed, because the only serialization error
// that can occur is when the provided buffer is too small. But
// write_to_len guarantees a correct size.
let written = original_dfa.write_to_native_endian(&mut buf).unwrap();
// But this is not guaranteed to succeed! In particular,
// deserialization requires proper alignment for &[u32], but our buffer
// was allocated as a &[u8] whose required alignment is smaller than
// &[u32]. However, it's likely to work in practice because of how most
// allocators work. So if you write code like this, make sure to either
// handle the error correctly and/or run it under Miri since Miri will
// likely provoke the error by returning Vec<u8> buffers with alignment
// less than &[u32].
let dfa: DFA<&[u32]> = match DFA::from_bytes(&buf[..written]) {
    // As mentioned above, it is legal for an error to be returned
    // here. It is quite difficult to get a Vec<u8> with a guaranteed
    // alignment equivalent to Vec<u32>.
    Err(_) => return Ok(()),
    Ok((dfa, _)) => dfa,
};

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

Note that this example isn’t actually guaranteed to work! In particular, if buf is not aligned to a 4-byte boundary, then the DFA::from_bytes call will fail. If you need this to work, then you either need to deal with adding some initial padding yourself, or use one of the to_bytes methods, which will do it for you.


impl<'a> DFA<&'a [u32]>


pub fn from_bytes( slice: &'a [u8] ) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>

Safely deserialize a DFA with a specific state identifier representation. Upon success, this returns both the deserialized DFA and the number of bytes read from the given slice. Namely, the contents of the slice beyond the DFA are not read.

Deserializing a DFA using this routine will never allocate heap memory. For safety purposes, the DFA’s transition table will be verified such that every transition points to a valid state. If this verification is too costly, then a DFA::from_bytes_unchecked API is provided, which will always execute in constant time.

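The bytes given must be generated by one of the serialization APIs of a DFA using a semver compatible release of this crate. Those include:

  • DFA::to_bytes_little_endian
  • DFA::to_bytes_big_endian
  • DFA::to_bytes_native_endian
  • DFA::write_to_little_endian
  • DFA::write_to_big_endian
  • DFA::write_to_native_endian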

The to_bytes methods allocate and return a Vec<u8> for you, along with handling alignment correctly. The write_to methods do not allocate and write to an existing slice (which may be on the stack). Since deserialization always uses the native endianness of the target platform, the serialization API you use should match the endianness of the target platform. (It’s often a good idea to generate serialized DFAs for both forms of endianness and then load the correct one based on endianness.)


Generally speaking, it’s easier to state the conditions in which an error is not returned. All of the following must be true:

  • The bytes given must be produced by one of the serialization APIs on this DFA, as mentioned above.
  • The endianness of the target platform matches the endianness used to serialize the provided DFA.
  • The slice given must have the same alignment as u32.

If any of the above are not true, then an error will be returned.
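The alignment condition in the list above can be checked by looking at the address of the slice. The sketch below is an illustration only: `check_u32_aligned` is a hypothetical helper, and the `Aligned` wrapper is a variant of the zero-length `[u32; 0]` trick with `#[repr(C)]` added here (an assumption of this example) so that the field layout is fixed.

```rust
/// Wrapper whose zero-length `[u32; 0]` field forces the whole struct,
/// and therefore `bytes`, onto a `u32`-aligned address.
#[repr(C)]
struct Aligned<B: ?Sized> {
    _align: [u32; 0],
    bytes: B,
}

/// Report whether `slice` starts at an address aligned like `u32`.
/// (Hypothetical helper for illustration.)
fn check_u32_aligned(slice: &[u8]) -> Result<(), String> {
    let align = std::mem::align_of::<u32>();
    let addr = slice.as_ptr() as usize;
    if addr % align == 0 {
        Ok(())
    } else {
        Err(format!("address {:#x} is not {}-byte aligned", addr, align))
    }
}
```

A buffer created as `Aligned { _align: [], bytes: [0u8; 16] }` passes the check from its first byte, while a sub-slice starting one byte later fails on targets where `u32` has an alignment greater than 1.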


This routine will never panic for any input.


This example shows how to serialize a DFA to raw bytes, deserialize it and then use it for searching.

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

let initial = DFA::new("foo[0-9]+")?;
let (bytes, _) = initial.to_bytes_native_endian();
let dfa: DFA<&[u32]> = DFA::from_bytes(&bytes)?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);
Example: dealing with alignment and padding

In the above example, we used the to_bytes_native_endian method to serialize a DFA, but we ignored part of its return value corresponding to padding added to the beginning of the serialized DFA. This is OK because deserialization will skip this initial padding. What matters is that the address immediately following the padding has an alignment that matches u32. That is, the following is an equivalent but alternative way to write the above example:

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

let initial = DFA::new("foo[0-9]+")?;
// Serialization returns the number of leading padding bytes added to
// the returned Vec<u8>.
let (bytes, pad) = initial.to_bytes_native_endian();
let dfa: DFA<&[u32]> = DFA::from_bytes(&bytes[pad..])?.0;

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

This padding is necessary because Rust’s standard library does not expose any safe and robust way of creating a Vec<u8> with a guaranteed alignment other than 1. Now, in practice, the underlying allocator is likely to provide a Vec<u8> that meets our alignment requirements, which means pad is zero in practice most of the time.

The purpose of exposing the padding like this is flexibility for the caller. For example, if one wants to embed a serialized DFA into a compiled program, then it’s important to guarantee that it starts at a u32-aligned address. The simplest way to do this is to discard the padding bytes and set it up so that the serialized DFA itself begins at a properly aligned address. We can show this in two parts. The first part is serializing the DFA to a file:

use regex_automata::dfa::dense::DFA;

let dfa = DFA::new("foo[0-9]+")?;

let (bytes, pad) = dfa.to_bytes_big_endian();
// Write the contents of the DFA *without* the initial padding.
std::fs::write("foo.bigendian.dfa", &bytes[pad..])?;

// Do it again, but this time for little endian.
let (bytes, pad) = dfa.to_bytes_little_endian();
std::fs::write("foo.littleendian.dfa", &bytes[pad..])?;

And now the second part is embedding the DFA into the compiled program and deserializing it at runtime on first use. We use conditional compilation to choose the correct endianness.

use regex_automata::{
    dfa::{Automaton, dense::DFA},
    util::{lazy::Lazy, wire::AlignAs},
    HalfMatch, Input,
};

// This crate provides its own "lazy" type, kind of like
// lazy_static! or once_cell::sync::Lazy. But it works in no-alloc
// no-std environments and lets us write this using completely
// safe code.
static RE: Lazy<DFA<&'static [u32]>> = Lazy::new(|| {
    // This assignment is made possible (implicitly) via the
    // CoerceUnsized trait. This is what guarantees that our
    // bytes are stored in memory on a 4 byte boundary. You
    // *must* do this or something equivalent for correct
    // deserialization.
    static ALIGNED: &AlignAs<[u8], u32> = &AlignAs {
        _align: [],
        #[cfg(target_endian = "big")]
        bytes: *include_bytes!("foo.bigendian.dfa"),
        #[cfg(target_endian = "little")]
        bytes: *include_bytes!("foo.littleendian.dfa"),
    };

    let (dfa, _) = DFA::from_bytes(&ALIGNED.bytes)
        .expect("serialized DFA should be valid");
    dfa
});

let expected = Ok(Some(HalfMatch::must(0, 8)));
assert_eq!(expected, RE.try_search_fwd(&Input::new("foo12345")));

An alternative to util::lazy::Lazy is lazy_static or once_cell, which provide stronger guarantees (like the initialization function only being executed once). And once_cell in particular provides a more expressive API. But a Lazy value from this crate is likely just fine in most circumstances.

Note that regardless of which initialization method you use, you will still need to use the AlignAs trick above to force correct alignment, but this is safe to do and from_bytes will return an error if you get it wrong.


pub unsafe fn from_bytes_unchecked( slice: &'a [u8] ) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>

Deserialize a DFA with a specific state identifier representation in constant time by omitting the verification of the validity of the transition table and other data inside the DFA.

This is just like DFA::from_bytes, except it can potentially return a DFA that exhibits undefined behavior if its transition table contains invalid state identifiers.

This routine is useful if you need to deserialize a DFA cheaply and cannot afford the transition table validation performed by from_bytes.
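To give a sense of what the skipped validation involves, here is a simplified sketch of checking a transition table of premultiplied state IDs. The `validate` function and the flat `&[u32]` table are assumptions made for this example; the crate's actual checks cover more than this.

```rust
/// Check that every transition in `table` points at the start of a
/// valid state row. State IDs are assumed to be premultiplied, i.e.
/// multiples of the stride. (Illustrative sketch only.)
fn validate(table: &[u32], stride2: u32) -> Result<(), String> {
    let stride = 1usize << stride2;
    if table.len() % stride != 0 {
        return Err("table length is not a multiple of the stride".to_string());
    }
    // This loop visits every transition, so the cost grows with the
    // size of the table. Skipping it is what makes the unchecked
    // variant constant time.
    for (i, &id) in table.iter().enumerate() {
        let id = id as usize;
        if id % stride != 0 || id >= table.len() {
            return Err(format!(
                "transition {} points to invalid state ID {}",
                i, id
            ));
        }
    }
    Ok(())
}
```

If a table that would fail such a check is used anyway, a later lookup can index outside the table, which is the kind of hazard the `unsafe` marker on this routine signals.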

use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch, Input};

let initial = DFA::new("foo[0-9]+")?;
let (bytes, _) = initial.to_bytes_native_endian();
// SAFETY: This is guaranteed to be safe since the bytes given come
// directly from a compatible serialization routine.
let dfa: DFA<&[u32]> = unsafe { DFA::from_bytes_unchecked(&bytes)?.0 };

let expected = Some(HalfMatch::must(0, 8));
assert_eq!(expected, dfa.try_search_fwd(&Input::new("foo12345"))?);

impl<T: AsRef<[u32]>> Automaton for DFA<T>


fn is_special_state(&self, id: StateID) -> bool

Returns true if and only if the given identifier corresponds to a “special” state. A special state is one or more of the following: a dead state, a quit state, a match state, a start state or an accelerated state.

fn is_dead_state(&self, id: StateID) -> bool

Returns true if and only if the given identifier corresponds to a dead state. When a DFA enters a dead state, it is impossible to leave. That is, every transition on a dead state by definition leads back to the same dead state.

fn is_quit_state(&self, id: StateID) -> bool

Returns true if and only if the given identifier corresponds to a quit state. A quit state is like a dead state (it has no transitions other than to itself), except it indicates that the DFA failed to complete the search. When this occurs, callers can neither accept nor reject that a match occurred.

fn is_match_state(&self, id: StateID) -> bool

Returns true if and only if the given identifier corresponds to a match state. A match state is also referred to as a “final” state and indicates that a match has been found.

fn is_start_state(&self, id: StateID) -> bool

Returns true only if the given identifier corresponds to a start state.

fn is_accel_state(&self, id: StateID) -> bool

Returns true if and only if the given identifier corresponds to an accelerated state.

fn next_state(&self, current: StateID, input: u8) -> StateID

Transitions from the current state to the next state, given the next byte of input.

unsafe fn next_state_unchecked(&self, current: StateID, byte: u8) -> StateID

Transitions from the current state to the next state, given the next byte of input.

fn next_eoi_state(&self, current: StateID) -> StateID

Transitions from the current state to the next state for the special EOI symbol.

fn pattern_len(&self) -> usize

Returns the total number of patterns compiled into this DFA.

fn match_len(&self, id: StateID) -> usize

Returns the total number of patterns that match in this state.

fn match_pattern(&self, id: StateID, match_index: usize) -> PatternID

Returns the pattern ID corresponding to the given match index in the given state.

fn has_empty(&self) -> bool

Returns true if and only if this automaton can match the empty string. When it returns false, all possible matches are guaranteed to have a non-zero length.

fn is_utf8(&self) -> bool

Whether UTF-8 mode is enabled for this DFA or not.

fn is_always_start_anchored(&self) -> bool

Returns true if and only if this DFA is limited to returning matches whose start position is 0.

fn start_state_forward(&self, input: &Input<'_>) -> Result<StateID, MatchError>

Return the ID of the start state for this DFA when executing a forward search.

fn start_state_reverse(&self, input: &Input<'_>) -> Result<StateID, MatchError>

Return the ID of the start state for this DFA when executing a reverse search.

fn universal_start_state(&self, mode: Anchored) -> Option<StateID>

If this DFA has a universal starting state for the given anchor mode and the DFA supports universal starting states, then this returns that state’s identifier.

fn accelerator(&self, id: StateID) -> &[u8]

Return a slice of bytes to accelerate for the given state, if possible.
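
Acceleration rests on the idea that, while the DFA sits in an accelerated state, only a small set of bytes can move it to a different state, so a search can jump straight to the next such byte instead of stepping through the transition table. The sketch below illustrates that skip with a plain linear scan; `accel_skip` is a hypothetical helper written for this page, and treating an empty slice as "no acceleration" is an assumption of the sketch. A real implementation would likely use an optimized byte search rather than this loop.

```rust
/// Returns the first offset at or after `at` whose byte is one of `needles`,
/// or `haystack.len()` if no such byte exists. With an empty `needles`
/// slice (no acceleration in this toy), `at` is returned unchanged.
/// Assumes `at <= haystack.len()`.
fn accel_skip(haystack: &[u8], at: usize, needles: &[u8]) -> usize {
    if needles.is_empty() {
        return at;
    }
    haystack[at..]
        .iter()
        .position(|b| needles.contains(b))
        .map_or(haystack.len(), |i| at + i)
}
```

A search loop would call such a helper only when the current state is accelerated, then resume normal byte-at-a-time transitions from the returned offset.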

fn get_prefilter(&self) -> Option<&Prefilter>

Returns the prefilter associated with a DFA, if one exists.

fn try_search_fwd( &self, input: &Input<'_> ) -> Result<Option<HalfMatch>, MatchError>

Executes a forward search and returns the end position of the leftmost match that is found. If no match exists, then None is returned.

fn try_search_rev( &self, input: &Input<'_> ) -> Result<Option<HalfMatch>, MatchError>

Executes a reverse search and returns the start position of the leftmost match that is found. If no match exists, then None is returned.
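
Because a forward search reports only where a match ends and a reverse search reports only where it starts, the two are typically combined to recover a full span. The std-only sketch below illustrates that division of labor for a stand-in pattern (a run of ASCII digits) using two hand-written scanners instead of real DFAs; `digits_end_fwd`, `digits_start_rev` and `find_digits` are hypothetical names, and nothing here calls this crate's API.

```rust
/// Forward scan: returns the end offset of the leftmost run of ASCII
/// digits. Like a forward DFA, it tracks only its current state (here a
/// single flag) and never records where the run started.
fn digits_end_fwd(haystack: &[u8]) -> Option<usize> {
    let mut in_run = false;
    for (i, b) in haystack.iter().enumerate() {
        if b.is_ascii_digit() {
            in_run = true;
        } else if in_run {
            return Some(i);
        }
    }
    if in_run { Some(haystack.len()) } else { None }
}

/// Reverse scan anchored at `end`: walks backwards over digits and returns
/// the start offset of the run that ends at `end`.
fn digits_start_rev(haystack: &[u8], end: usize) -> usize {
    let len = haystack[..end]
        .iter()
        .rev()
        .take_while(|b| b.is_ascii_digit())
        .count();
    end - len
}

/// Combines both half-results into a `(start, end)` span.
fn find_digits(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = digits_end_fwd(haystack)?;
    Some((digits_start_rev(haystack, end), end))
}
```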

fn try_search_overlapping_fwd( &self, input: &Input<'_>, state: &mut OverlappingState ) -> Result<(), MatchError>

Executes an overlapping forward search. A match, if one exists, can be obtained via the OverlappingState::get_match method.

fn try_search_overlapping_rev( &self, input: &Input<'_>, state: &mut OverlappingState ) -> Result<(), MatchError>

Executes a reverse overlapping search. A match, if one exists, can be obtained via the OverlappingState::get_match method.

fn try_which_overlapping_matches( &self, input: &Input<'_>, patset: &mut PatternSet ) -> Result<(), MatchError>

Writes the set of patterns that match anywhere in the given search configuration to patset. If multiple patterns match at the same position and the underlying DFA supports overlapping matches, then all matching patterns are written to the given set.

impl<T: Clone> Clone for DFA<T>


fn clone(&self) -> DFA<T>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<T: AsRef<[u32]>> Debug for DFA<T>


fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<T> RefUnwindSafe for DFA<T>
where T: RefUnwindSafe,


impl<T> Send for DFA<T>
where T: Send,


impl<T> Sync for DFA<T>
where T: Sync,


impl<T> Unpin for DFA<T>
where T: Unpin,


impl<T> UnwindSafe for DFA<T>
where T: UnwindSafe,

impl<T> Any for T
where T: 'static + ?Sized,


fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,


fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,


fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T


fn from(t: T) -> T

Returns the argument unchanged.


impl<T, U> Into<U> for T
where U: From<T>,


fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.


impl<T> ToOwned for T
where T: Clone,


type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,


type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,


type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.