eregex
An advanced regular expression engine for Rust, inspired by
mrab-regex (the Python `regex` module).
eregex aims to bring the richer feature set of mrab-regex to Rust:
- Named groups, duplicate group names, repeated captures
- Greedy / lazy / possessive quantifiers
- Atomic groups `(?>...)`
- Variable-length lookbehind
- Nested character-class set operations `[a-z&&[^aeiou]]` (planned)
- Inline, scoped flags `(?flags-flags:...)`
- Backreferences `\1`, `\g<name>`, `(?P=name)`
- Unicode properties `\p{L}`, `\P{^N}` (subset)
- Fuzzy / approximate matching `(?:foo){e<=1}` (planned)
- Recursive patterns `(?R)`, `(?(DEFINE)...)` (planned)
- Partial matches; POSIX matching and reverse search (planned)
This crate currently implements a subset of that feature set. See Feature status below for what is ready today and what is still on the roadmap.
Quick start
```rust
use eregex::{…};

let re = Regex::new(…).unwrap();
let m = re.find(…).unwrap();
assert_eq!(…);
assert_eq!(…);

let re = Regex::new_with_flags(…).unwrap();
assert!(…);

// Repeated captures (signature mrab-regex feature)
let re = Regex::new(…).unwrap();
let m = re.find(…).unwrap();
assert_eq!(…);

// Partial matching: is the input a prefix of some full match?
let re = Regex::new(…).unwrap();
// "token=abc" is incomplete: more input could turn it into a full match.
let p = re.find_partial(…).unwrap();
assert!(…);
assert_eq!(…);
// Group 1 fully matched, group 2 is still empty/partial.
assert_eq!(…);
assert_eq!(…);
// A wrong character rules out any continuation -> no match at all.
assert!(…);
```
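To make the partial-matching semantics concrete without depending on eregex itself, here is a std-only sketch that hand-codes one hypothetical end-anchored pattern, equivalent to `token=[a-z]+;`, and classifies an input as a full match, a partial match (a prefix that more input could complete), or no match. The `Status` enum and `classify` function are illustrative names only, not eregex's API; the crate's `find_partial` works on arbitrary compiled patterns rather than one hard-wired shape.

```rust
/// Sketch-only result type (eregex's own `MatchStatus` may differ).
#[derive(Debug, PartialEq)]
enum Status {
    Full,
    Partial,
    NoMatch,
}

/// End-anchored classification against the hypothetical pattern `token=[a-z]+;`.
fn classify(input: &str) -> Status {
    const PREFIX: &str = "token=";
    // Still inside the literal prefix: any prefix of "token=" can be completed.
    if input.len() <= PREFIX.len() {
        return if PREFIX.starts_with(input) {
            Status::Partial
        } else {
            Status::NoMatch
        };
    }
    let Some(rest) = input.strip_prefix(PREFIX) else {
        return Status::NoMatch;
    };
    // `[a-z]+`: consume lowercase ASCII letters (ASCII, so slicing stays valid).
    let letters = rest.bytes().take_while(|b| b.is_ascii_lowercase()).count();
    let tail = &rest[letters..];
    if tail.is_empty() {
        // Input ended before `;` (or before the first letter): more could follow.
        Status::Partial
    } else if letters > 0 && tail == ";" {
        Status::Full
    } else {
        // A wrong character, or trailing text after `;`.
        Status::NoMatch
    }
}

fn main() {
    for input in ["token=abc;", "token=abc", "tok", "token=ab1"] {
        println!("{input:?} -> {:?}", classify(input));
    }
}
```

The key property is that `Partial` is only reported while every character seen so far is still consistent with some full match; one wrong character turns the answer into `NoMatch`.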
Feature status
Implemented
- Literals, `.`, anchors `^ $ \A \z \b \B`
- Predefined classes `\d \D \w \W \s \S` (ASCII + Unicode via `std`)
- Character classes `[...]` with ranges, negation, escapes
- Alternation `a|b|c`
- Quantifiers `* + ? {m} {m,} {m,n}` with greedy, `?`-lazy and `+`-possessive variants
- Capturing / non-capturing / named groups `(...) (?:...) (?P<n>...) (?<n>...)`
- Atomic groups `(?>...)`
- Backreferences `\1 \g<n> \g<name> (?P=name)`
- Lookahead / lookbehind `(?=...) (?!...) (?<=...) (?<!...)` (variable length)
- Partial (end-anchored) matching via `find_partial`
- Inline scoped flags `(?i) (?i:...) (?i-m:...)`
- Inline comments `(?#...)` and free-spacing (`VERBOSE`) mode
- Named & Unicode properties `\p{...}` (a curated subset)
- Repeated captures (`captures`, `captures_iter`)
- `is_match`, `find`, `find_at`, `find_iter`, `find_partial`, `captures`, `captures_iter`
- `replace`, `replace_all` with `$1` / `${name}` / `$$` templates
- `split`, `split_iter`
- `escape`
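The `$1` / `${name}` / `$$` replacement templates can be illustrated with a small std-only expander. The names `expand` and `lookup` are hypothetical, and the edge-case rules are assumptions for this sketch: group numbers after `$` are read greedily (`$12` means group 12), unmatched or unknown groups expand to the empty string, and a `$` that starts no valid reference is kept literally. eregex's own `replace` may resolve these cases differently.

```rust
/// Expand `$N`, `${name}` / `${N}` and `$$` in `template`.
/// `groups[i]` is the text of group i (0 = whole match, None = did not participate);
/// `names` maps group names to indices.
fn expand(template: &str, groups: &[Option<&str>], names: &[(&str, usize)]) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            // `$$` is an escaped literal dollar sign.
            out.push('$');
            rest = tail;
        } else if let Some(body) = after.strip_prefix('{') {
            match body.find('}') {
                Some(end) => {
                    // `${...}` holds either a group number or a group name.
                    let key = &body[..end];
                    let index = key
                        .parse::<usize>()
                        .ok()
                        .or_else(|| names.iter().find(|(n, _)| *n == key).map(|&(_, i)| i));
                    out.push_str(lookup(groups, index));
                    rest = &body[end + 1..];
                }
                None => {
                    // Unterminated `${`: keep the `$` literally and continue.
                    out.push('$');
                    rest = after;
                }
            }
        } else {
            // `$N`: read decimal digits greedily.
            let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
            if digits == 0 {
                out.push('$'); // lone `$`: keep literally
            } else {
                out.push_str(lookup(groups, after[..digits].parse().ok()));
            }
            rest = &after[digits..];
        }
    }
    out.push_str(rest);
    out
}

/// Text of group `index`, or "" if it is out of range or did not participate.
fn lookup<'a>(groups: &[Option<&'a str>], index: Option<usize>) -> &'a str {
    index.and_then(|i| groups.get(i).copied().flatten()).unwrap_or("")
}

fn main() {
    // Group 0 is the whole match; group 3 did not participate.
    let groups = [Some("key=val"), Some("key"), Some("val"), None];
    let names: [(&str, usize); 2] = [("k", 1), ("v", 2)];
    println!("{}", expand("${v}=${k} ($$1, [$3])", &groups, &names)); // prints: val=key ($1, [])
}
```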
Roadmap (signature mrab-regex features)
- Fuzzy / approximate matching `{e<=2}`
- Recursive patterns & subexpression calls `(?R) (?1) (?&name) (?(DEFINE)...)`
- Branch reset `(?|...|...)`
- Nested set operations `[a&&b] [a--b] [a||b] [a~~b]`
- Full Unicode case folding (ß ↔ ss); currently simple case folding only
- `\K`, `(*PRUNE)`, `(*SKIP)`, `(*FAIL)`, `\G` semantics
- POSIX (leftmost-longest) and reverse (`(?r)`) matching modes
- Concurrent/GIL-free operation, timeouts
- `\L<name>` named lists
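In mrab-regex, a constraint such as `{e<=2}` allows up to two errors, where an error is an insertion, a deletion or a substitution. For a whole string compared against a literal, that check reduces to a bounded Levenshtein distance, sketched below with std only. This is an illustration of the error budget, not of how a fuzzy regex engine works internally: a real engine spends the budget inside the matcher while searching, which this sketch does not attempt. `edit_distance` and `fuzzy_equals` are hypothetical helpers.

```rust
/// Levenshtein distance (insertions, deletions, substitutions), two-row DP.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let substitute = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let delete = prev[j] + 1;
            let insert = cur[j - 1] + 1;
            cur[j] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Whole-string analogue of `(?:literal){e<=max_errors}`.
fn fuzzy_equals(literal: &str, candidate: &str, max_errors: usize) -> bool {
    edit_distance(literal, candidate) <= max_errors
}

fn main() {
    assert_eq!(edit_distance("kitten", "sitting"), 3);
    assert!(fuzzy_equals("foo", "fo", 1)); // one deletion
    assert!(!fuzzy_equals("foo", "bar", 2)); // needs three substitutions
}
```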
Core concepts
- `Regex`: a compiled pattern. Compile once with `Regex::new` (or `Regex::new_with_flags`), then search many inputs.
- `Match`: a successful full match, with group lookup by index or name and the full repeated-capture history.
- `PartialMatch`: the result of `Regex::find_partial`, carrying a `MatchStatus` of `Full` or `Partial` and per-group `GroupMatch` state.
- `Flags` and the `flags` module: compile-time flags (`IGNORECASE`, `MULTILINE`, `DOTALL`, …) and their inline `(?im)` syntax.
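As an illustration of how inline flag syntax such as `(?im)` or `(?i-m:...)` can map onto compile-time flags, the sketch below parses the flag letters into on/off bitsets and applies them to the flags in effect outside a scoped group. The bit values, the four-letter set (`i`, `m`, `s`, `x`, the letters mrab-regex uses for IGNORECASE, MULTILINE, DOTALL and VERBOSE) and both function names are assumptions made for this sketch; they are not eregex's `flags` module.

```rust
// Hypothetical bit values; eregex's real flag constants may differ.
const IGNORECASE: u32 = 1 << 0;
const MULTILINE: u32 = 1 << 1;
const DOTALL: u32 = 1 << 2;
const VERBOSE: u32 = 1 << 3;

/// Parse the text between `(?` and `)` / `:`, e.g. "im" or "i-m",
/// into (flags to turn on, flags to turn off). Unknown letters are an error.
fn parse_inline_flags(spec: &str) -> Result<(u32, u32), char> {
    let (on_part, off_part) = spec.split_once('-').unwrap_or((spec, ""));
    let to_bits = |letters: &str| -> Result<u32, char> {
        letters.chars().try_fold(0u32, |acc, c| {
            let bit = match c {
                'i' => IGNORECASE,
                'm' => MULTILINE,
                's' => DOTALL,
                'x' => VERBOSE,
                other => return Err(other),
            };
            Ok(acc | bit)
        })
    };
    Ok((to_bits(on_part)?, to_bits(off_part)?))
}

/// Flags in effect inside a scoped group such as `(?i-m:...)`.
fn scoped(outer: u32, on: u32, off: u32) -> u32 {
    (outer | on) & !off
}

fn main() {
    let (on, off) = parse_inline_flags("i-m").unwrap();
    // Outside the group MULTILINE and DOTALL are on; inside `(?i-m:...)`
    // IGNORECASE is added and MULTILINE removed.
    assert_eq!(scoped(MULTILINE | DOTALL, on, off), IGNORECASE | DOTALL);
    assert_eq!(parse_inline_flags("iq"), Err('q'));
}
```

Keeping "on" and "off" as separate sets makes scoping cheap: the outer flags are untouched, so they can simply be restored when the group closes.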
Error handling
All fallible operations return `Result<T, Error>`.
`Error` carries an `ErrorKind` (syntax error, bad escape, bad quantifier,
unknown group, …) plus the byte offset in the pattern where the problem was
detected, when known.
```rust
use eregex::Regex;

let err = Regex::new(…).unwrap_err();
println!("{err}"); // e.g. "eregex error at position 1: unclosed group"
```
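The general shape of an error that carries a kind plus an optional byte offset can be sketched with the standard library alone. Everything below is hypothetical (the `ErrorKind` variants, the field names, and a toy `check_groups` validator that only checks parenthesis balance and ignores escapes and classes); the `Display` format simply mirrors the sample message above.

```rust
use std::fmt;

/// Sketch-only error kinds (not eregex's actual definition).
#[derive(Debug, Clone, PartialEq)]
enum ErrorKind {
    UnclosedGroup,
    UnbalancedParenthesis,
}

#[derive(Debug, Clone, PartialEq)]
struct Error {
    kind: ErrorKind,
    /// Byte offset into the pattern, when known.
    offset: Option<usize>,
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            ErrorKind::UnclosedGroup => "unclosed group",
            ErrorKind::UnbalancedParenthesis => "unbalanced parenthesis",
        })
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.offset {
            Some(pos) => write!(f, "eregex error at position {pos}: {}", self.kind),
            None => write!(f, "eregex error: {}", self.kind),
        }
    }
}

impl std::error::Error for Error {}

/// Toy validator: reports the byte offset of a stray `)` or of the last unclosed `(`.
fn check_groups(pattern: &str) -> Result<(), Error> {
    let mut open = Vec::new();
    for (pos, c) in pattern.char_indices() {
        match c {
            '(' => open.push(pos),
            ')' if open.pop().is_none() => {
                return Err(Error { kind: ErrorKind::UnbalancedParenthesis, offset: Some(pos) });
            }
            _ => {}
        }
    }
    match open.pop() {
        Some(pos) => Err(Error { kind: ErrorKind::UnclosedGroup, offset: Some(pos) }),
        None => Ok(()),
    }
}

fn main() {
    match check_groups("a(b") {
        Ok(()) => println!("pattern looks balanced"),
        Err(err) => println!("{err}"), // prints: eregex error at position 1: unclosed group
    }
}
```

Storing the offset as `Option<usize>` lets errors that have no meaningful position still be reported through the same type.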
Examples
The `examples/` directory contains runnable programs:
- `demo.rs`: a tour of the core API.
- `gap_match.rs`: gap-tolerant ("fuzzy") matching built on `find_at` + `find_partial`, for inputs where the target is split by noise (a workaround until in-pattern fuzzy matching, which is on the roadmap, is available).

Run them with `cargo run --example demo` / `cargo run --example gap_match`.
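The idea behind gap-tolerant matching can be shown in a self-contained, std-only sketch. Note that it swaps in a different technique from the `gap_match.rs` example (which builds on `find_at` and `find_partial`): a depth-first search that looks for the target's characters in order, allowing at most `max_gap` noise characters between consecutive ones and backtracking when an early choice leads nowhere. `gap_find` and `extend` are hypothetical names, spans are reported as char indices rather than byte offsets, and an empty target returns `None`.

```rust
/// Find `target` in `haystack`, allowing up to `max_gap` noise chars between
/// consecutive target chars. Returns the char-index span (start, end_exclusive).
fn gap_find(haystack: &str, target: &str, max_gap: usize) -> Option<(usize, usize)> {
    let hay: Vec<char> = haystack.chars().collect();
    let pat: Vec<char> = target.chars().collect();
    let (&first, rest) = pat.split_first()?; // empty target: None
    (0..hay.len())
        .filter(|&start| hay[start] == first)
        .find_map(|start| extend(&hay, rest, start, max_gap).map(|last| (start, last + 1)))
}

/// Place the remaining target chars after index `prev`, trying every candidate
/// inside the gap window and backtracking on failure. Returns the last index used.
fn extend(hay: &[char], rest: &[char], prev: usize, max_gap: usize) -> Option<usize> {
    let Some((&want, tail)) = rest.split_first() else {
        return Some(prev); // every target char has been placed
    };
    // The next char may sit anywhere in prev+1 ..= prev+1+max_gap.
    let window_end = (prev + max_gap + 2).min(hay.len());
    (prev + 1..window_end)
        .filter(|&i| hay[i] == want)
        .find_map(|i| extend(hay, tail, i, max_gap))
}

fn main() {
    // Single noise characters ('-') between the letters of "hello".
    assert_eq!(gap_find("xxh-e-l-l-oyy", "hello", 1), Some((2, 11)));
    // Two noise characters between 'h' and 'e' exceed the budget.
    assert_eq!(gap_find("h--ello", "hello", 1), None);
}
```

Backtracking matters here: with `max_gap = 1`, finding `abc` in `abbxc` fails if the search commits to the first `b`, but succeeds through the second one.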
Language bindings
The Rust core is wrapped by three companion crates, each under `crates/`:

| package | technology | install |
|---|---|---|
| eregex (npm) | napi-rs (native addon) | `npm i eregex` |
| eregex-wasm (npm) | wasm-bindgen / wasm-pack | `npm i eregex-wasm` |
| eregex (PyPI) | pyo3 / maturin | `pip install eregex` |
The Node and WASM packages expose the same JavaScript API (`Regex`,
`Match`, `PartialMatch`, the flag constants, `parseFlags`, …) and the same
null-on-absent semantics, so they are interchangeable: pick the native
build for raw speed, or the WASM build for a single portable binary that can
also be rebuilt for bundlers / browsers (`wasm-pack build --target web`).
Development
A shared pre-commit hook runs `cargo fmt --all --check` and
`cargo test --workspace` before each commit. Enable it once per clone:
Bypass it for a single commit with `git commit --no-verify`.
Compatibility
- MSRV: 1.85 (uses the 2024 edition).
- License: Apache-2.0.
- `#![forbid(unsafe_code)]` is enforced crate-wide.
License
Apache-2.0, matching the upstream mrab-regex project.