Struct regex_automata::dfa::regex::Regex

pub struct Regex<A = DFA<Vec<u32>>> { /* private fields */ }

A regular expression that uses deterministic finite automata for fast searching.

A regular expression is comprised of two DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.
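The two-pass strategy can be sketched with hand-written toy automata for `foo[0-9]+`. This is only an illustration under stated assumptions: `fwd_step`, `rev_step`, `find` and the state numbering are hypothetical names invented here, not this crate's DFAs or API.

```rust
/// Sentinel state: no match can be extended from here.
const DEAD: u8 = u8::MAX;
/// In both toy automata below, state 4 is the match state.
const MATCH: u8 = 4;

/// Toy forward DFA for an unanchored search of `foo[0-9]+`.
/// States 0..=3 count how much of `foo` has been seen so far.
fn fwd_step(state: u8, byte: u8) -> u8 {
    match (state, byte) {
        (4, b'0'..=b'9') => 4,
        (4, _) => DEAD, // the greedy match cannot be extended any further
        (3, b'0'..=b'9') => 4,
        (1, b'o') => 2,
        (2, b'o') => 3,
        (_, b'f') => 1, // an `f` always starts a fresh attempt
        _ => 0,
    }
}

/// Toy reverse DFA for `[0-9]+oof`, anchored at the end of the match.
fn rev_step(state: u8, byte: u8) -> u8 {
    match (state, byte) {
        (0, b'0'..=b'9') | (1, b'0'..=b'9') => 1,
        (1, b'o') => 2,
        (2, b'o') => 3,
        (3, b'f') => 4,
        _ => DEAD,
    }
}

/// Forward pass finds the end of the match, reverse pass finds its start.
fn find(haystack: &[u8]) -> Option<(usize, usize)> {
    let mut state = 0;
    let mut end = None;
    for (i, &b) in haystack.iter().enumerate() {
        state = fwd_step(state, b);
        if state == DEAD {
            break;
        }
        if state == MATCH {
            end = Some(i + 1);
        }
    }
    let end = end?;

    // Walk backwards from the end offset reported by the forward pass.
    let mut state = 0;
    let mut start = None;
    for i in (0..end).rev() {
        state = rev_step(state, haystack[i]);
        if state == DEAD {
            break;
        }
        if state == MATCH {
            start = Some(i);
        }
    }
    start.map(|s| (s, end))
}

fn main() {
    assert_eq!(find(b"zzzfoo12345zzz"), Some((3, 11)));
    assert_eq!(find(b"foobar"), None);
}
```

The reverse pass only ever runs on a haystack prefix that the forward pass has already accepted, which is why it is expected to succeed.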

The type of the DFA used by a Regex corresponds to the A type parameter, which must satisfy the Automaton trait. Typically, A is either a dense::DFA or a sparse::DFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.
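To make the memory versus speed trade-off concrete, here is a rough sketch of the two layouts. The `Dense` and `Sparse` types and the `to_sparse` helper below are simplified assumptions made for illustration; they do not reflect the actual in-memory representation used by dense::DFA or sparse::DFA.

```rust
/// Dense layout: one next-state entry for every (state, byte) pair.
struct Dense {
    table: Vec<u32>, // length = number of states * 256
}

impl Dense {
    /// Constant-time lookup by direct indexing.
    fn next(&self, state: u32, byte: u8) -> u32 {
        self.table[state as usize * 256 + byte as usize]
    }
}

/// Sparse layout: per state, only the inclusive byte ranges that lead to a
/// live state. State 0 is the dead state and is the implicit default.
struct Sparse {
    states: Vec<Vec<(u8, u8, u32)>>,
}

impl Sparse {
    /// Lookup scans the ranges of the current state, so it is slower.
    fn next(&self, state: u32, byte: u8) -> u32 {
        self.states[state as usize]
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map_or(0, |&(_, _, next)| next)
    }
}

/// Toy dense DFA with two states: 0 (dead) and 1 (loops on ASCII digits).
fn digits_dense() -> Dense {
    let mut table = vec![0u32; 2 * 256];
    for b in b'0'..=b'9' {
        table[256 + b as usize] = 1;
    }
    Dense { table }
}

/// Convert by collapsing runs of adjacent bytes that share a live target.
fn to_sparse(dense: &Dense) -> Sparse {
    let mut states = Vec::new();
    for row in dense.table.chunks(256) {
        let mut ranges: Vec<(u8, u8, u32)> = Vec::new();
        for (i, &next) in row.iter().enumerate() {
            if next == 0 {
                continue; // dead transitions are not stored at all
            }
            let byte = i as u8;
            let extends = matches!(
                ranges.last(),
                Some(&(_, hi, n)) if n == next && hi as usize + 1 == i
            );
            if extends {
                ranges.last_mut().unwrap().1 = byte;
            } else {
                ranges.push((byte, byte, next));
            }
        }
        states.push(ranges);
    }
    Sparse { states }
}

fn main() {
    let dense = digits_dense();
    let sparse = to_sparse(&dense);
    // 512 table entries in the dense form collapse to a single range.
    assert_eq!(dense.table.len(), 512);
    assert_eq!(sparse.states[1], vec![(b'0', b'9', 1)]);
    for b in 0..=255u8 {
        assert_eq!(dense.next(1, b), sparse.next(1, b));
    }
}
```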

Crate features

Note that despite what the documentation auto-generates, the only crate feature needed to use this type is dfa-search. You do not need to enable the alloc feature.

By default, a regex’s automaton type parameter is set to dense::DFA<Vec<u32>> when the alloc feature is enabled. For most in-memory workloads, this is the most convenient type that gives the best search performance. When the alloc feature is disabled, no default type is used.

When should I use this?

Generally speaking, if you can afford the overhead of building a full DFA for your regex, and you don’t need things like capturing groups, then this is a good choice if you’re looking to optimize for matching speed. Note, however, that its speed may be worse than a general purpose regex engine if you don’t provide a dense::Config::prefilter to the underlying DFA.

Sparse DFAs

Since a Regex is generic over the Automaton trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:

use regex_automata::dfa::regex::Regex;

// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;

// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;

// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::builder().build_from_dfas(fwd, rev);

// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert_eq!(true, sparse_re.is_match(b"foo123"));

Alternatively, one can use a Builder to construct a sparse DFA more succinctly. (Note though that dense DFAs are still constructed first internally, and then converted to sparse DFAs, as in the example above.)

use regex_automata::dfa::regex::Regex;

let sparse_re = Regex::builder().build_sparse(r"foo[0-9]+")?;
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert!(sparse_re.is_match(b"foo123"));

Fallibility

Most of the search routines defined on this type will panic when the underlying search fails. This can happen when the DFA gives up after seeing a quit byte, whether configured explicitly or via heuristic Unicode word boundary support, although neither is enabled by default. It can also happen when an invalid Input configuration is given, for example, one with an unsupported Anchored mode.

If you need to handle these error cases instead of allowing them to trigger a panic, then the lower level Regex::try_search provides a fallible API that never panics.
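The relationship between the panicking routines and the fallible one can be pictured as a thin wrapper. The sketch below uses a toy digit scanner rather than a real DFA; `QuitError`, `try_find_digits` and `find_digits` are hypothetical names chosen for illustration and are not part of this crate.

```rust
/// Error reported when the scan sees the configured quit byte.
#[derive(Debug, PartialEq)]
struct QuitError {
    byte: u8,
    offset: usize,
}

/// Fallible search for the first run of ASCII digits. The search gives up
/// with an error as soon as it observes the quit byte, if one is set.
fn try_find_digits(
    haystack: &[u8],
    quit: Option<u8>,
) -> Result<Option<(usize, usize)>, QuitError> {
    let mut start = None;
    for (i, &b) in haystack.iter().enumerate() {
        if Some(b) == quit {
            return Err(QuitError { byte: b, offset: i });
        }
        match (start, b.is_ascii_digit()) {
            (None, true) => start = Some(i),
            (Some(s), false) => return Ok(Some((s, i))),
            _ => {}
        }
    }
    Ok(start.map(|s| (s, haystack.len())))
}

/// Convenience wrapper: same search, but an error becomes a panic.
fn find_digits(haystack: &[u8], quit: Option<u8>) -> Option<(usize, usize)> {
    try_find_digits(haystack, quit).expect("search could not complete")
}

fn main() {
    // The quit byte is seen before any answer is known, so this is an error.
    let err = try_find_digits(b"ab\n12", Some(b'\n')).unwrap_err();
    assert_eq!(err, QuitError { byte: b'\n', offset: 2 });

    // Without a quit byte in the haystack, both routines agree.
    assert_eq!(try_find_digits(b"ab 12 c", Some(b'\n')), Ok(Some((3, 5))));
    assert_eq!(find_digits(b"ab 12", None), Some((3, 5)));
}
```

Callers who can tolerate a panic use the wrapper; callers who cannot should stay on the fallible routine and decide what an error means for them.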

Example

This example shows how to cause a search to terminate if it sees a \n byte, and handle the error returned. This could be useful if, for example, you wanted to prevent a user supplied pattern from matching across a line boundary.

use regex_automata::{dfa::{self, regex::Regex}, Input, MatchError};

let re = Regex::builder()
    .dense(dfa::dense::Config::new().quit(b'\n', true))
    .build(r"foo\p{any}+bar")?;

let input = Input::new("foo\nbar");
// Normally this would produce a match, since \p{any} contains '\n'.
// But since we instructed the automaton to enter a quit state if a
// '\n' is observed, this produces a match error instead.
let expected = MatchError::quit(b'\n', 3);
let got = re.try_search(&input).unwrap_err();
assert_eq!(expected, got);

Implementations

impl Regex

pub fn new(pattern: &str) -> Result<Regex, BuildError>

Parse the given regular expression using the default configuration and return the corresponding regex.

If you want a non-default configuration, then use the Builder to set your own configuration.

Example

use regex_automata::{Match, dfa::regex::Regex};

let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(
    Some(Match::must(0, 3..14)),
    re.find(b"zzzfoo12345barzzz"),
);

pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, BuildError>

Like new, but parses multiple patterns into a single “regex set.” This similarly uses the default regex configuration.

Example

use regex_automata::{Match, dfa::regex::Regex};

let re = Regex::new_many(&["[a-z]+", "[0-9]+"])?;

let mut it = re.find_iter(b"abc 1 foo 4567 0 quux");
assert_eq!(Some(Match::must(0, 0..3)), it.next());
assert_eq!(Some(Match::must(1, 4..5)), it.next());
assert_eq!(Some(Match::must(0, 6..9)), it.next());
assert_eq!(Some(Match::must(1, 10..14)), it.next());
assert_eq!(Some(Match::must(1, 15..16)), it.next());
assert_eq!(Some(Match::must(0, 17..21)), it.next());
assert_eq!(None, it.next());

impl Regex<DFA<Vec<u8>>>

pub fn new_sparse(pattern: &str) -> Result<Regex<DFA<Vec<u8>>>, BuildError>

Parse the given regular expression using the default configuration, except using sparse DFAs, and return the corresponding regex.

If you want a non-default configuration, then use the Builder to set your own configuration.

Example

use regex_automata::{Match, dfa::regex::Regex};

let re = Regex::new_sparse("foo[0-9]+bar")?;
assert_eq!(
    Some(Match::must(0, 3..14)),
    re.find(b"zzzfoo12345barzzz"),
);

pub fn new_many_sparse<P: AsRef<str>>( patterns: &[P] ) -> Result<Regex<DFA<Vec<u8>>>, BuildError>

Like new, but parses multiple patterns into a single “regex set” using sparse DFAs. Otherwise, this uses the default regex configuration.

Example

use regex_automata::{Match, dfa::regex::Regex};

let re = Regex::new_many_sparse(&["[a-z]+", "[0-9]+"])?;

let mut it = re.find_iter(b"abc 1 foo 4567 0 quux");
assert_eq!(Some(Match::must(0, 0..3)), it.next());
assert_eq!(Some(Match::must(1, 4..5)), it.next());
assert_eq!(Some(Match::must(0, 6..9)), it.next());
assert_eq!(Some(Match::must(1, 10..14)), it.next());
assert_eq!(Some(Match::must(1, 15..16)), it.next());
assert_eq!(Some(Match::must(0, 17..21)), it.next());
assert_eq!(None, it.next());

impl Regex<DFA<&'static [u32]>>

Convenience routines for regex construction.

pub fn builder() -> Builder

Return a builder for configuring the construction of a Regex.

This is a convenience routine to avoid needing to import the Builder type in common cases.

Example

This example shows how to use the builder to disable UTF-8 mode everywhere.

use regex_automata::{
    dfa::regex::Regex, nfa::thompson, util::syntax, Match,
};

let re = Regex::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
let got = re.find(haystack);
assert_eq!(expected, got);

impl<A: Automaton> Regex<A>

Standard search routines for finding and iterating over matches.

pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool

Returns true if and only if this regex matches the given haystack.

This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
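The short-circuit behavior can be illustrated with a toy anchored automaton for `foo[0-9]` that also reports how many bytes it inspected. The `step` and `is_match` functions below are hypothetical stand-ins written for this sketch, not this crate's implementation.

```rust
const MATCH: u8 = 4;
const DEAD: u8 = u8::MAX;

/// Toy anchored DFA for `foo[0-9]`. Any unexpected byte leads to DEAD.
fn step(state: u8, byte: u8) -> u8 {
    match (state, byte) {
        (0, b'f') => 1,
        (1, b'o') | (2, b'o') => state + 1,
        (3, b'0'..=b'9') => MATCH,
        _ => DEAD,
    }
}

/// Returns `(is_match, bytes_inspected)`. The scan stops as soon as the
/// answer can no longer change: on a match state or on a dead state.
fn is_match(haystack: &[u8]) -> (bool, usize) {
    let mut state = 0;
    for (i, &b) in haystack.iter().enumerate() {
        state = step(state, b);
        if state == MATCH {
            return (true, i + 1);
        }
        if state == DEAD {
            return (false, i + 1);
        }
    }
    (false, haystack.len())
}

fn main() {
    // Match state reached after 4 bytes; the trailing `z`s are never read.
    assert_eq!(is_match(b"foo1zzzzzz"), (true, 4));
    // Dead state reached on the very first byte.
    assert_eq!(is_match(b"xfoo1"), (false, 1));
    // Neither state reached: the whole haystack is inspected.
    assert_eq!(is_match(b"foo"), (false, 3));
}
```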

Panics

This routine panics if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search panics, callers cannot know whether a match exists or not.

Use Regex::try_search if you want to handle these error conditions.

Example

use regex_automata::dfa::regex::Regex;

let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(true, re.is_match("foo12345bar"));
assert_eq!(false, re.is_match("foobar"));

pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>

Returns the start and end offset of the leftmost match. If no match exists, then None is returned.

Panics

This routine panics if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search panics, callers cannot know whether a match exists or not.

Use Regex::try_search if you want to handle these error conditions.

Example

use regex_automata::{Match, dfa::regex::Regex};

// Greediness is applied appropriately.
let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(Match::must(0, 3..11)), re.find("zzzfoo12345zzz"));

// Even though a match is found after reading the first byte (`a`),
// the default leftmost-first match semantics demand that we find the
// earliest match that prefers earlier parts of the pattern over later
// parts.
let re = Regex::new("abc|a")?;
assert_eq!(Some(Match::must(0, 0..3)), re.find("abc"));

pub fn find_iter<'r, 'h, I: Into<Input<'h>>>( &'r self, input: I ) -> FindMatches<'r, 'h, A>

Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.

This corresponds to the “standard” regex search iterator.
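Conceptually, a non-overlapping iterator repeats a single-match search, restarting each time at the end of the previous match. The sketch below shows that loop with a toy digit-run finder; `find_at` and `find_all` are hypothetical helpers, and the extra care that empty matches require is deliberately left out.

```rust
/// Toy single-match search: the first run of ASCII digits at or after `at`.
fn find_at(haystack: &[u8], at: usize) -> Option<(usize, usize)> {
    let rest = haystack.get(at..)?;
    let start = at + rest.iter().position(|b| b.is_ascii_digit())?;
    let len = haystack[start..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    Some((start, start + len))
}

/// Collect all non-overlapping leftmost matches by resuming each search
/// where the previous match ended. This toy never yields an empty match,
/// so the loop always makes progress.
fn find_all(haystack: &[u8]) -> Vec<(usize, usize)> {
    let mut matches = Vec::new();
    let mut at = 0;
    while let Some((start, end)) = find_at(haystack, at) {
        matches.push((start, end));
        at = end;
    }
    matches
}

fn main() {
    assert_eq!(
        find_all(b"foo1 foo12 foo123"),
        vec![(3, 4), (8, 10), (14, 17)],
    );
    assert!(find_all(b"no digits here").is_empty());
}
```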

Panics

If the search returns an error during iteration, then iteration panics. See Regex::find for the panic conditions.

Use Regex::try_search with util::iter::Searcher if you want to handle these error conditions.

Example

use regex_automata::{Match, dfa::regex::Regex};

let re = Regex::new("foo[0-9]+")?;
let text = "foo1 foo12 foo123";
let matches: Vec<Match> = re.find_iter(text).collect();
assert_eq!(matches, vec![
    Match::must(0, 0..4),
    Match::must(0, 5..10),
    Match::must(0, 11..17),
]);

impl<A: Automaton> Regex<A>

Lower level fallible search routines that permit controlling where the search starts and ends in a particular sequence.

pub fn try_search(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>

Returns the start and end offset of the leftmost match. If no match exists, then None is returned.

This is like Regex::find but with two differences:

  1. It is not generic over Into<Input> and instead accepts a &Input. This permits reusing the same Input for multiple searches without needing to create a new one. This may help with latency.
  2. It returns an error if the search could not complete, whereas Regex::find will panic.

Errors

This routine errors if the search could not complete. This can occur in the following circumstances:

  • The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search returns an error, callers cannot know whether a match exists or not.

impl<A: Automaton> Regex<A>

Non-search APIs for querying information about the regex and setting a prefilter.

pub fn forward(&self) -> &A

Return the underlying DFA responsible for forward matching.

This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.

pub fn reverse(&self) -> &A

Return the underlying DFA responsible for reverse matching.

This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.

pub fn pattern_len(&self) -> usize

Returns the total number of patterns matched by this regex.

Example

use regex_automata::dfa::regex::Regex;

let re = Regex::new_many(&[r"[a-z]+", r"[0-9]+", r"\w+"])?;
assert_eq!(3, re.pattern_len());

Trait Implementations

impl<A: Clone> Clone for Regex<A>

fn clone(&self) -> Regex<A>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<A: Debug> Debug for Regex<A>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl<A> RefUnwindSafe for Regex<A>
where A: RefUnwindSafe,

impl<A> Send for Regex<A>
where A: Send,

impl<A> Sync for Regex<A>
where A: Sync,

impl<A> Unpin for Regex<A>
where A: Unpin,

impl<A> UnwindSafe for Regex<A>
where A: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.