Crate eregex

eregex — an advanced regular expression engine for Rust.

The full overview, quick-start guide, feature matrix and roadmap live in the README, which is also rendered as this crate’s documentation landing page (see below). For navigating the API, start with Regex for matching and Match for results, then see PartialMatch for end-anchored partial matching and flags for compile-time flags.

§eregex

An advanced regular expression engine for Rust, inspired by mrab-regex (the Python regex module).

eregex aims to bring the richer feature set of mrab-regex to Rust:

  • Named groups, duplicate group names, repeated captures
  • Greedy / lazy / possessive quantifiers
  • Atomic groups (?>...)
  • Variable-length lookbehind
  • Nested character-class set operations [a-z&&[^aeiou]] (planned)
  • Inline, scoped flags (?flags-flags:...)
  • Backreferences \1, \g<name>, (?P=name)
  • Unicode properties \p{L}, \P{^N} (subset)
  • Fuzzy / approximate matching (?:foo){e<=1} (planned)
  • Recursive patterns (?R), (?(DEFINE)...) (planned)
  • Partial matches (implemented); POSIX matching and reverse search (planned)

The crate currently implements the core of this feature set; items marked (planned) above are not yet available. See Feature status below for what is ready today and what is on the roadmap.

§Quick start

use eregex::{Regex, flags};

let re = Regex::new(r"(\w+)\s+(\w+)").unwrap();
let m = re.find("hello world").unwrap();
assert_eq!(m.group(1), Some("hello"));
assert_eq!(m.group(2), Some("world"));

// Case-insensitive matching via a compile-time flag; the inline form (?i) would also work.
let re = Regex::new_with_flags(r"hello", flags::IGNORECASE).unwrap();
assert!(re.is_match("HELLO, World"));

// Repeated captures (signature mrab-regex feature)
let re = Regex::new(r"(\w)+").unwrap();
let m = re.find("abc").unwrap();
assert_eq!(m.captures(1), vec![Some("a"), Some("b"), Some("c")]);

// Partial matching: is the input a prefix of some full match?
let re = Regex::new(r"token=([a-z]+)([0-9]+)").unwrap();
// "token=abc" is incomplete — more input could turn it into a full match.
let p = re.find_partial("xxx token=abc").unwrap();
assert!(p.is_partial());
assert_eq!(p.matched, "token=abc");
// Group 1 fully matched, group 2 is still empty/partial.
assert_eq!(p.group(1), Some("abc"));
assert_eq!(p.group(2), Some(""));
// A wrong character rules out any continuation -> no match at all.
assert!(re.find_partial("xxx token=abc!").is_none());
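
The three outcomes shown above (full match, partial match, no match) can be illustrated without the crate for the simplest possible case: a literal pattern anchored at the start of the input. This is only a minimal std-only sketch of the idea. `Status` and `classify` are hypothetical names invented here, not part of eregex's API, and the real engine handles arbitrary patterns and unanchored search.

```rust
/// Hypothetical outcome type for this sketch (not eregex's `MatchStatus`).
#[derive(Debug, PartialEq)]
enum Status {
    Full,
    Partial,
    NoMatch,
}

/// Classify `input` against a literal `pattern` anchored at the start of `input`.
fn classify(pattern: &str, input: &str) -> Status {
    if input.starts_with(pattern) {
        // The whole literal is present: a full match.
        Status::Full
    } else if pattern.starts_with(input) {
        // The input ran out while still agreeing with the literal:
        // more input could complete the match.
        Status::Partial
    } else {
        // A mismatching character rules out any continuation.
        Status::NoMatch
    }
}

fn main() {
    assert_eq!(classify("token=", "token=abc"), Status::Full);
    assert_eq!(classify("token=", "tok"), Status::Partial);
    assert_eq!(classify("token=", "tok!"), Status::NoMatch);
}
```

The same distinction is what makes partial matching useful for streamed or interactively typed input: a partial result means "keep reading", while no match means the input can be rejected immediately.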

§Feature status

§Implemented

  • Literals, ., anchors ^ $ \A \z \b \B
  • Predefined classes \d \D \w \W \s \S (ASCII + Unicode via std)
  • Character classes [...] with ranges, negation, escapes
  • Alternation a|b|c
  • Quantifiers * + ? {m} {m,} {m,n} in greedy, lazy (? suffix) and possessive (+ suffix) forms
  • Capturing / non-capturing / named groups (...) (?:...) (?P<n>...) (?<n>...)
  • Atomic groups (?>...)
  • Backreferences \1 \g<n> \g<name> (?P=name)
  • Lookahead / lookbehind (?=...) (?!...) (?<=...) (?<!...) (variable length)
  • Partial (end-anchored) matching via find_partial
  • Inline scoped flags (?i) (?i:...) (?i-m:...)
  • Inline comments (?#...) and free-spacing (VERBOSE)
  • Named & Unicode properties \p{...} (a curated subset)
  • Repeated captures (captures, captures_iter)
  • is_match, find, find_at, find_iter, find_partial, captures, captures_iter
  • replace, replace_all with $1 / ${name} / $$ templates
  • split, split_iter
  • escape
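
The replacement template syntax listed above ($1, ${name}, $$) can be sketched as a small standalone expander. This is an illustration only, not eregex's implementation: `expand` and `lookup` are hypothetical helpers, and the handling of edge cases (unknown or unmatched groups expand to the empty string, a lone `$` stays literal, `$10` means group 10) is an assumption of this sketch rather than documented crate behavior.

```rust
use std::collections::HashMap;

/// Expand `$1`, `${name}` and `$$` in `template`.
/// `groups[i]` holds the text of group i (index 0 is the whole match);
/// `names` maps group names to group indices.
fn expand(template: &str, groups: &[Option<&str>], names: &HashMap<&str, usize>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos + 1..];
        if let Some(tail) = rest.strip_prefix('$') {
            // `$$` is an escaped dollar sign.
            out.push('$');
            rest = tail;
        } else if let Some(tail) = rest.strip_prefix('{') {
            if let Some(end) = tail.find('}') {
                // `${...}` accepts either a group number or a group name.
                let key = &tail[..end];
                let index = key.parse::<usize>().ok().or_else(|| names.get(key).copied());
                out.push_str(lookup(groups, index));
                rest = &tail[end + 1..];
            } else {
                // No closing brace: keep the text literally (assumption).
                out.push_str("${");
                rest = tail;
            }
        } else {
            // Count the ASCII digits that directly follow the `$`.
            let digits = rest.len() - rest.trim_start_matches(|c: char| c.is_ascii_digit()).len();
            if digits == 0 {
                out.push('$');
            } else {
                out.push_str(lookup(groups, rest[..digits].parse::<usize>().ok()));
                rest = &rest[digits..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Text of the group at `index`, or "" if the group is unknown or did not participate.
fn lookup<'a>(groups: &[Option<&'a str>], index: Option<usize>) -> &'a str {
    index.and_then(|i| groups.get(i).copied().flatten()).unwrap_or("")
}

fn main() {
    let groups = [Some("hello world"), Some("hello"), Some("world")];
    let names = HashMap::from([("second", 2usize)]);
    let out = expand("$2 ${second} $1 costs $$5", &groups, &names);
    assert_eq!(out, "world world hello costs $5");
}
```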

§Roadmap (signature mrab-regex features)

  • Fuzzy / approximate matching {e<=2}
  • Recursive patterns & subexpression calls (?R) (?1) (?&name) (?(DEFINE)...)
  • Branch reset (?|...|...)
  • Nested set operations [a&&b] [a--b] [a||b] [a~~b]
  • Full Unicode case folding (ß ↔ ss); only simple case folding is implemented today
  • \K, (*PRUNE), (*SKIP), (*FAIL), \G semantics
  • POSIX (leftmost-longest) and reverse ((?r)) matching modes
  • Concurrent/GIL-free operation, timeouts
  • \L<name> named lists

§Core concepts

§Error handling

All fallible operations return Result<T, Error>. Error carries an ErrorKind (syntax error, bad escape, bad quantifier, unknown group, …) plus the byte offset in the pattern where the problem was detected, when known.

use eregex::Regex;

let err = Regex::new(r"(").unwrap_err();
println!("{}", err); // e.g. "eregex error at position 1: unclosed group"
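
The "kind plus byte offset" shape described above is a common pattern for parser errors. The following standalone sketch shows one way such an error could be modelled and produced. `Kind`, `PatternError` and `check_groups` are hypothetical names invented for illustration; they are not eregex's `ErrorKind` or `Error`, whose exact variants and accessors are not shown in this document.

```rust
use std::fmt;

/// Hypothetical error kinds for this sketch (not eregex's `ErrorKind`).
#[derive(Debug, PartialEq)]
enum Kind {
    UnclosedGroup,
    UnmatchedClose,
}

/// Hypothetical error type: a kind plus the byte offset where it was detected, when known.
#[derive(Debug, PartialEq)]
struct PatternError {
    kind: Kind,
    offset: Option<usize>,
}

impl fmt::Display for PatternError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let what = match self.kind {
            Kind::UnclosedGroup => "unclosed group",
            Kind::UnmatchedClose => "unmatched ')'",
        };
        match self.offset {
            Some(pos) => write!(f, "error at position {pos}: {what}"),
            None => write!(f, "error: {what}"),
        }
    }
}

/// Check that parentheses balance. Escapes and character classes are ignored for brevity.
fn check_groups(pattern: &str) -> Result<(), PatternError> {
    let mut depth = 0usize;
    for (pos, byte) in pattern.bytes().enumerate() {
        match byte {
            b'(' => depth += 1,
            b')' if depth == 0 => {
                return Err(PatternError { kind: Kind::UnmatchedClose, offset: Some(pos) });
            }
            b')' => depth -= 1,
            _ => {}
        }
    }
    if depth > 0 {
        // Only detectable at the end of the pattern, so that offset is reported.
        return Err(PatternError { kind: Kind::UnclosedGroup, offset: Some(pattern.len()) });
    }
    Ok(())
}

fn main() {
    let err = check_groups("(").unwrap_err();
    assert_eq!(err.kind, Kind::UnclosedGroup);
    assert_eq!(err.to_string(), "error at position 1: unclosed group");
    assert!(check_groups("(a)(b)").is_ok());
}
```

Keeping the kind as an enum lets callers branch on the failure programmatically, while the optional offset supports caret-style diagnostics pointing into the pattern.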

§Examples

The examples/ directory contains runnable programs:

  • demo.rs — a tour of the core API.
  • gap_match.rs — gap-tolerant (“fuzzy”) matching built on find_at + find_partial, for inputs where the target is split by noise (a workaround while in-pattern fuzzy matching is on the roadmap).

Run them with cargo run --example demo or cargo run --example gap_match.

§Language bindings

The Rust core is wrapped by three companion crates, each under crates/:

package             technology                  install
eregex (npm)        napi-rs (native addon)      npm i eregex
eregex-wasm (npm)   wasm-bindgen / wasm-pack    npm i eregex-wasm
eregex (PyPI)       pyo3 / maturin              pip install eregex

The Node and WASM packages expose the same JavaScript API (Regex, Match, PartialMatch, the flag constants, parseFlags, …) and the same null-on-absent semantics, so they are interchangeable: pick the native build for raw speed, or the WASM build for a single portable binary that can also be rebuilt for bundlers / browsers (wasm-pack build --target web).

§Development

A shared pre-commit hook runs cargo fmt --all --check and cargo test --workspace before each commit. Enable it once per clone:

git config core.hooksPath .githooks

Bypass it for a single commit with git commit --no-verify.

§Compatibility

  • MSRV: 1.85 (uses the 2024 edition).
  • License: Apache-2.0.
  • #![forbid(unsafe_code)] is enforced crate-wide.

§License

Apache-2.0, matching the upstream mrab-regex project.

Re-exports§

pub use error::Error;
pub use error::Result;
pub use escape::escape;
pub use escape::escape_literal_spaces;
pub use escape::escape_special_only;
pub use flags::Flags;

Modules§

charset
A compact representation of character sets as sorted, disjoint codepoint ranges.
error
Error types for pattern parsing and matching.
escape
Pattern / literal escaping (mrab-regex’s regex.escape).
flags
Pattern flags.
matcher
The backtracking matching engine.
unicode
Character-property helpers built on top of std’s built-in Unicode tables.
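
The charset module above is described as storing character sets as sorted, disjoint codepoint ranges. A minimal sketch of that representation follows, to show why it is compact and fast to query. `RangeSet` is a hypothetical type written for this illustration and is not the module's actual API. The sketch assumes every input range has start <= end.

```rust
use std::cmp::Ordering;

/// Hypothetical range set for this sketch (not eregex's `charset` type).
#[derive(Debug, PartialEq)]
struct RangeSet {
    /// Inclusive (start, end) codepoint ranges, sorted, non-overlapping and non-adjacent.
    ranges: Vec<(u32, u32)>,
}

impl RangeSet {
    /// Build a normalized set from arbitrary inclusive ranges (each with start <= end).
    fn new(mut input: Vec<(u32, u32)>) -> Self {
        input.sort_unstable();
        let mut ranges: Vec<(u32, u32)> = Vec::new();
        for (start, end) in input {
            if let Some(last) = ranges.last_mut() {
                // Overlapping or directly adjacent: extend the previous range.
                if start <= last.1.saturating_add(1) {
                    last.1 = last.1.max(end);
                    continue;
                }
            }
            ranges.push((start, end));
        }
        RangeSet { ranges }
    }

    /// Membership test by binary search over the sorted ranges.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        self.ranges
            .binary_search_by(|&(start, end)| {
                if end < cp {
                    Ordering::Less
                } else if start > cp {
                    Ordering::Greater
                } else {
                    Ordering::Equal
                }
            })
            .is_ok()
    }
}

fn main() {
    // Roughly the class [a-fd-kl-p0-9], given in arbitrary order.
    let set = RangeSet::new(vec![
        ('a' as u32, 'f' as u32),
        ('d' as u32, 'k' as u32),
        ('l' as u32, 'p' as u32),
        ('0' as u32, '9' as u32),
    ]);
    // The three letter ranges collapse into a single a-p range.
    assert_eq!(set.ranges, vec![('0' as u32, '9' as u32), ('a' as u32, 'p' as u32)]);
    assert!(set.contains('e'));
    assert!(!set.contains('q'));
}
```

Because the ranges are disjoint and sorted, a lookup costs O(log n) in the number of ranges rather than in the number of codepoints, which matters for large Unicode classes.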

Structs§

FindIter
Iterator over non-overlapping matches of a Regex.
Match
A successful match, carrying the full capture state.
PartialMatch
A partial (or full) match produced by Regex::find_partial.
Regex
A compiled regular expression.

Enums§

GroupMatch
The state of a single group within a PartialMatch.
MatchStatus
The outcome kind of a Regex::find_partial attempt. NoMatch is represented by Option::<PartialMatch>::None.

Functions§

find
Search for the first match of pattern in haystack.
find_all
Collect every non-overlapping match of pattern in haystack.
is_match
Returns true if pattern matches anywhere in haystack.
new
Compile a pattern with the default flags.
new_with_flags
Compile a pattern with the given flags.
replace
Replace the first match of pattern in haystack using the template repl ($1, ${name}, $$).
replace_all
Replace all non-overlapping matches of pattern in haystack.
split
Split haystack by pattern, returning the parts.

Type Aliases§

CaptureMatches
Iterator that yields Match objects with full capture state (an alias of FindIter in this implementation, since matches always carry captures).