Crate brzozowski_regex

Source
Expand description

A regular expression library over sequences of arbitrary symbols, based on the regular expression derivatives of Brzozowski (1964). The implementation is is heavily based on the description by Owens et al (2009).

Regular expressions can be defined over any alphabet type that implements the traits Clone, Eq, Hash, and Ord. The following constructs are supported:

  • The empty set
  • The empty string ε
  • A symbol s
  • Concatenation R S
  • Closure R*
  • Disjunction R | S
  • Conjunction R & R
  • Complement ¬R

§Constructing regular expressions

Regular expressions can be constructed using builder methods or, more conveniently, using short-hands.

An example using builder methods:

use brzozowski_regex::Regex;

let r = Regex::and(
    Regex::complement(Regex::empty_set()),
    Regex::concat(
        Regex::concat(
          Regex::closure(Regex::symbol(5)),
          Regex::symbol(7),
        ),
        Regex::or(Regex::empty_string(), Regex::symbol(11)),
    ),
);

Constructing regular expression susing these builder methods can be quite verbose. A set of short-hands is available to write regular expression literals more compactly:

  • EXPR.s() builds a symbol from the expressions value.
  • ().r() builds an empty set.
  • [ REGEX* ].r() builds a concatenation from the given regular expressions. If the array is empty, it builds the empty string.
  • REGEX.c() build the closure of the given regular expression.

Additionally, the following operations are overloaded for regular expressions:

  • REGEX + REGEX builds the concatenation of the two given regular expressions.
  • REGEX | REGEX builds the logical or of the two given regular expressions.
  • REGEX & REGEX builds the logical and of the two given regular expressions.
  • !REGEX builds the complement of the given regular expression.

The same example using short-hands and operators:

use brzozowski_regex::syntax::*; // imports short-hands
use brzozowski_regex::Regex;

let r = !().r() & [ 5.s().c(), 7.s(), ([].r() | 11.s()) ].r();

§Matching with automata

The recommended way to match regular expressions is to turn them into an automaton. One can either match a whole sequence of symbols at once against the automaton, or create a matcher for incremental matching.

Matching a whole sequence at once works as follows:

use brzozowski_regex::syntax::*; // import short-hands
use brzozowski_regex::Regex;

let r = !().r() & [ 5.s().c(), 7.s(), ([].r() | 11.s()) ].r();
let a = r.to_automaton();
assert!(a.matches([5, 5, 7, 11]));

Using a matcher allows for incremental matching. After matching a symbol or sequence of symbols the matcher returns one of the following results:

  • accept: the sequence until now matches the regular expression
  • may-accept: the sequence is not a match, but adding more symbols may lead to an accept.
  • reject: the sequence is not a match, and adding more symbols will not change that.

The matcher can also return the current residual regular expression.

A matcher is used as follows:

use brzozowski_regex::syntax::*; // import short-hands
use brzozowski_regex::Acceptance;
use brzozowski_regex::Regex;

let r = !().r() & [ 5.s().c(), 7.s(), ([].r() | 11.s()) ].r();
let a = r.to_automaton();
let mut m = a.to_matcher();
assert_eq!(Acceptance::MayAccept, m.next(&5));
assert_eq!(Acceptance::Accept, m.next_iter([5, 7]));
assert_eq!(&(11.s() | [].r()), m.regex());
assert_eq!(Acceptance::Reject, m.next(&13));

§Directly working with derivatives

It is also possible to run operations on regular expressions directly, such as testing for nullability, computing the derivative w.r.t. a symbol or a sequence of symbols, and matching a sequence of symbols. Note that matching directly against regular expression value is very inefficient compared to constructing an automaton first!

Examples of operation on regular expression values:

use brzozowski_regex::syntax::*; // import short-hands
use brzozowski_regex::Regex;

let r = [7.s(), 11.s().c()].r();
assert!(r.matches([7, 11]));

let s = r.derive(&7);
assert!(s.is_nullable());

let s = s.derive_iter([11, 13]);
assert!(!s.is_nullable());

§References

Modules§

builder
Regular expression data type and builders.
syntax
Short-hand syntax for writing regular expression literals.

Structs§

FiniteAutomaton
Finite automaton over the given alphabet.
Matcher
Type implementing incremental matching against a finite automaton.

Enums§

Acceptance
Type describing acceptance of a matched sequence of symbols.

Traits§

Alphabet
Trait describing types that can be used as an alphabet for regular expression queries.

Type Aliases§

Regex
Regular expressions over symbols of the given alphabet.