eregex — an advanced regular expression engine for Rust.
The full overview, quick-start guide, feature matrix and roadmap live in
the README, which is
also rendered as this crate’s documentation landing page (see below). For
navigating the API, start with Regex for matching and Match for
results, then see PartialMatch for end-anchored partial matching and
flags for compile-time flags.
§eregex
An advanced regular expression engine for Rust, inspired by
mrab-regex (the Python regex
module).
eregex aims to bring the richer feature set of mrab-regex to Rust:
- Named groups, duplicate group names, repeated captures
- Greedy / lazy / possessive quantifiers
- Atomic groups `(?>...)`
- Variable-length lookbehind
- Nested character-class set operations `[a-z&&[^aeiou]]` (planned)
- Inline, scoped flags `(?flags-flags:...)`
- Backreferences `\1`, `\g<name>`, `(?P=name)`
- Unicode properties `\p{L}`, `\P{^N}` (subset)
- Fuzzy / approximate matching `(?:foo){e<=1}` (planned)
- Recursive patterns `(?R)`, `(?(DEFINE)...)` (planned)
- Partial matches, POSIX matching, reverse search (planned)
This crate currently implements a subset of these features. See Feature status below for what is ready today and what is on the roadmap.
§Quick start
use eregex::{Regex, flags};
let re = Regex::new(r"(\w+)\s+(\w+)").unwrap();
let m = re.find("hello world").unwrap();
assert_eq!(m.group(1), Some("hello"));
assert_eq!(m.group(2), Some("world"));
let re = Regex::new_with_flags(r"(?i)hello", flags::IGNORECASE).unwrap();
assert!(re.is_match("HELLO, World"));
// Repeated captures (signature mrab-regex feature)
let re = Regex::new(r"(\w)+").unwrap();
let m = re.find("abc").unwrap();
assert_eq!(m.captures(1), vec![Some("a"), Some("b"), Some("c")]);
// Partial matching: is the input a prefix of some full match?
let re = Regex::new(r"token=([a-z]+)([0-9]+)").unwrap();
// "token=abc" is incomplete — more input could turn it into a full match.
let p = re.find_partial("xxx token=abc").unwrap();
assert!(p.is_partial());
assert_eq!(p.matched, "token=abc");
// Group 1 fully matched, group 2 is still empty/partial.
assert_eq!(p.group(1), Some("abc"));
assert_eq!(p.group(2), Some(""));
// A wrong character rules out any continuation -> no match at all.
assert!(re.find_partial("xxx token=abc!").is_none());
§Feature status
§Implemented
- Literals, `.`, anchors `^ $ \A \z \b \B`
- Predefined classes `\d \D \w \W \s \S` (ASCII + Unicode via `std`)
- Character classes `[...]` with ranges, negation, escapes
- Alternation `a|b|c`
- Quantifiers `* + ? {m} {m,} {m,n}` with greedy, `?`-lazy and `+`-possessive forms
- Capturing / non-capturing / named groups `(...) (?:...) (?P<n>...) (?<n>...)`
- Atomic groups `(?>...)`
- Backreferences `\1 \g<n> \g<name> (?P=name)`
- Lookahead / lookbehind `(?=...) (?!...) (?<=...) (?<!...)` (variable length)
- Partial (end-anchored) matching via `find_partial`
- Inline scoped flags `(?i) (?i:...) (?i-m:...)`
- Inline comments `(?#...)` and free-spacing (VERBOSE)
- Named & Unicode properties `\p{...}` (a curated subset)
- Repeated captures (`captures`, `captures_iter`)
- `is_match`, `find`, `find_at`, `find_iter`, `find_partial`, `captures`, `captures_iter`
- `replace`, `replace_all` with `$1` / `${name}` / `$$` templates
- `split`, `split_iter`
- `escape`
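The `$1` / `${name}` / `$$` template syntax listed above can be illustrated with a standalone expander. This is a minimal sketch using only `std`, not eregex's implementation: `expand_template` and its two lookup callbacks are hypothetical names, and the handling of edge cases (unknown groups expand to the empty string, a malformed `$` is copied literally, `$10` is read as group 10) is an assumption rather than documented eregex behavior.

```rust
/// Expand a replacement template: `$N` (decimal group index), `${name}`,
/// and `$$` (a literal dollar sign).
/// `by_index` / `by_name` are hypothetical callbacks standing in for a match object.
fn expand_template<'a>(
    template: &str,
    by_index: impl Fn(usize) -> Option<&'a str>,
    by_name: impl Fn(&str) -> Option<&'a str>,
) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the `$` unchanged.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            // `$$` -> literal `$`
            out.push('$');
            rest = tail;
        } else if let Some(body) = after.strip_prefix('{') {
            match body.find('}') {
                Some(end) => {
                    // `${name}` -> named group (assumption: unknown names expand to "")
                    out.push_str(by_name(&body[..end]).unwrap_or(""));
                    rest = &body[end + 1..];
                }
                None => {
                    // Unterminated `${`: keep the `$` literally and continue after it.
                    out.push('$');
                    rest = after;
                }
            }
        } else {
            let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
            if digits == 0 {
                // A `$` not followed by a digit, `{` or `$` is copied as-is.
                out.push('$');
                rest = after;
            } else {
                // `$N` -> numbered group (assumption: unknown indices expand to "")
                let idx: usize = after[..digits].parse().unwrap_or(usize::MAX);
                out.push_str(by_index(idx).unwrap_or(""));
                rest = &after[digits..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

With groups `["2024-05", "2024", "05"]`, the template `$2/$1 costs $$5` expands to `05/2024 costs $5` under these assumptions.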
§Roadmap (signature mrab-regex features)
- Fuzzy / approximate matching `{e<=2}`
- Recursive patterns & subexpression calls `(?R) (?1) (?&name) (?(DEFINE)...)`
- Branch reset `(?|...|...)`
- Nested set operations `[a&&b] [a--b] [a||b] [a~~b]`
- Full Unicode case-folding (ß ↔ ss); currently simple casefolding
- `\K`, `(*PRUNE)`, `(*SKIP)`, `(*FAIL)`, `\G` semantics
- POSIX (leftmost-longest) and reverse (`(?r)`) matching modes
- Concurrent/GIL-free operation, timeouts
- `\L<name>` named lists
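The difference between simple and full case-folding (the ß ↔ ss item above) can be seen with `std` alone. The sketch below only approximates the two notions and says nothing about how eregex folds case internally: `simple_caseless_eq` and `full_caseless_eq` are hypothetical helpers, the first folding one character to one character, the second relying on `str::to_uppercase`, which applies multi-character mappings such as ß → SS.

```rust
/// Approximate "simple" caseless comparison: fold each char to a single
/// lowercase char, ignoring mappings that expand to several chars.
fn simple_caseless_eq(a: &str, b: &str) -> bool {
    let fold = |c: char| {
        let mut lower = c.to_lowercase();
        match (lower.next(), lower.next()) {
            (Some(l), None) => l, // one-to-one mapping
            _ => c,               // multi-char mapping: leave the char unchanged
        }
    };
    a.chars().map(fold).eq(b.chars().map(fold))
}

/// Approximate "full" caseless comparison by uppercasing whole strings,
/// which lets one char expand to several (ß becomes SS).
fn full_caseless_eq(a: &str, b: &str) -> bool {
    a.to_uppercase() == b.to_uppercase()
}
```

Under one-to-one folding, `straße` and `STRASSE` have different lengths and never compare equal; the whole-string comparison treats them as equal.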
§Core concepts
- `Regex`: a compiled pattern. Compile once with `Regex::new` (or `Regex::new_with_flags`), then search many inputs.
- `Match`: a successful full match, with group lookup by index or name and full repeated-capture history.
- `PartialMatch`: the result of `Regex::find_partial`, carrying a `MatchStatus` of `Full` or `Partial` and per-group `GroupMatch` state.
- `Flags` and the `flags` module: compile-time flags (`IGNORECASE`, `MULTILINE`, `DOTALL`, …) and their inline `(?im)` syntax.
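The full / partial / no-match distinction can be made concrete with a hand-written classifier for the quick-start pattern `token=[a-z]+[0-9]+`. This is a sketch under stated assumptions, not the engine's algorithm: `SketchStatus` and `classify` are hypothetical names, and unlike `find_partial` the function is anchored at the start of the input and does not search.

```rust
/// Hypothetical stand-in for a full / partial outcome.
#[derive(Debug, PartialEq)]
enum SketchStatus {
    Full,
    Partial,
}

/// Classify `input` against the fixed pattern `token=[a-z]+[0-9]+`,
/// anchored at the start of `input`.
/// Returns `None` when no continuation of the input could ever match.
fn classify(input: &str) -> Option<SketchStatus> {
    const PREFIX: &str = "token=";
    let bytes = input.as_bytes();

    // Literal part: count how many leading bytes agree with the prefix.
    let common = bytes
        .iter()
        .zip(PREFIX.as_bytes())
        .take_while(|(a, b)| a == b)
        .count();
    if common < PREFIX.len() {
        // Partial only if the input ended before any byte disagreed.
        return if common == bytes.len() {
            Some(SketchStatus::Partial)
        } else {
            None
        };
    }

    // `[a-z]+`: at least one letter is required.
    let rest = &bytes[PREFIX.len()..];
    let letters = rest.iter().take_while(|b| b.is_ascii_lowercase()).count();
    if letters == 0 {
        return if rest.is_empty() {
            Some(SketchStatus::Partial)
        } else {
            None
        };
    }

    // `[0-9]+`: one digit completes the match; running out of input is partial.
    let after_letters = &rest[letters..];
    let digits = after_letters.iter().take_while(|b| b.is_ascii_digit()).count();
    if digits >= 1 {
        Some(SketchStatus::Full)
    } else if after_letters.is_empty() {
        Some(SketchStatus::Partial)
    } else {
        None
    }
}
```

Here `token=abc` is classified as partial, `token=abc42` as full, and `token=abc!` as no match, because the `!` rules out any continuation.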
§Error handling
All fallible operations return Result<T, Error>.
Error carries
an ErrorKind
(syntax error, bad escape, bad quantifier, unknown group, …) plus the byte
offset in the pattern where the problem was detected, when known.
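To illustrate the "kind plus optional byte offset" shape described above, here is a small standalone sketch. It is not eregex's `Error` type: `SketchError`, `SketchKind`, and `check_groups` are hypothetical, the message format is invented for the example, and the checker only balances parentheses (it skips backslash escapes but ignores character classes).

```rust
use std::fmt;

/// Hypothetical error categories for the sketch.
#[derive(Debug, PartialEq)]
enum SketchKind {
    UnclosedGroup,
    UnmatchedCloseParen,
}

/// Hypothetical error: a kind plus the byte offset where it was detected, if known.
#[derive(Debug, PartialEq)]
struct SketchError {
    kind: SketchKind,
    position: Option<usize>,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let what = match self.kind {
            SketchKind::UnclosedGroup => "unclosed group",
            SketchKind::UnmatchedCloseParen => "unmatched ')'",
        };
        match self.position {
            Some(p) => write!(f, "error at position {p}: {what}"),
            None => write!(f, "error: {what}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Check that parentheses in `pattern` are balanced, reporting byte offsets.
fn check_groups(pattern: &str) -> Result<(), SketchError> {
    let mut depth = 0usize;
    let mut bytes = pattern.bytes().enumerate();
    while let Some((i, b)) = bytes.next() {
        match b {
            // Skip the byte after a backslash so `\(` is not counted.
            b'\\' => {
                bytes.next();
            }
            b'(' => depth += 1,
            b')' => {
                if depth == 0 {
                    return Err(SketchError {
                        kind: SketchKind::UnmatchedCloseParen,
                        position: Some(i),
                    });
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    if depth > 0 {
        // Detected only at the end of the pattern, so report that offset.
        return Err(SketchError {
            kind: SketchKind::UnclosedGroup,
            position: Some(pattern.len()),
        });
    }
    Ok(())
}
```

Keeping the position as an `Option<usize>` lets callers print a precise location when one exists and fall back to the bare message otherwise.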
use eregex::Regex;
let err = Regex::new(r"(").unwrap_err();
println!("{}", err); // e.g. "eregex error at position 1: unclosed group"
§Examples
The examples/ directory contains runnable programs:
- `demo.rs`: a tour of the core API.
- `gap_match.rs`: gap-tolerant (“fuzzy”) matching built on `find_at` + `find_partial`, for inputs where the target is split by noise (a workaround while in-pattern fuzzy matching is on the roadmap).
Run them with cargo run --example demo / cargo run --example gap_match.
§Language bindings
The Rust core is wrapped by three companion crates, each under crates/:
| package | technology | install |
|---|---|---|
| eregex (npm) | napi-rs (native addon) | npm i eregex |
| eregex-wasm (npm) | wasm-bindgen / wasm-pack | npm i eregex-wasm |
| eregex (PyPI) | pyo3 / maturin | pip install eregex |
The Node and WASM packages expose the same JavaScript API (Regex,
Match, PartialMatch, the flag constants, parseFlags, …) and the same
null-on-absent semantics, so they are interchangeable: pick the native
build for raw speed, or the WASM build for a single portable binary that can
also be rebuilt for bundlers / browsers (wasm-pack build --target web).
§Development
A shared pre-commit hook runs cargo fmt --all --check and
cargo test --workspace before each commit. Enable it once per clone:
git config core.hooksPath .githooks
Bypass it for a single commit with git commit --no-verify.
§Compatibility
- MSRV: 1.85 (uses the 2024 edition).
- License: Apache-2.0.
- `#![forbid(unsafe_code)]` is enforced crate-wide.
§License
Apache-2.0, matching the upstream mrab-regex project.
Re-exports§
pub use error::Error;
pub use error::Result;
pub use escape::escape;
pub use escape::escape_literal_spaces;
pub use escape::escape_special_only;
pub use flags::Flags;
Modules§
- charset: A compact representation of character sets as sorted, disjoint codepoint ranges.
- error: Error types for pattern parsing and matching.
- escape: Pattern / literal escaping (mrab-regex’s `regex.escape`).
- flags: Pattern flags.
- matcher: The backtracking matching engine.
- unicode: Character-property helpers built on top of `std`’s built-in Unicode tables.
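The "sorted, disjoint codepoint ranges" representation mentioned for `charset` can be sketched as follows. This is an illustration of the general technique, not the module's code: `RangeSet`, `from_ranges`, and `contains` are hypothetical names, and merging adjacent ranges is an assumption of this sketch.

```rust
use std::cmp::Ordering;

/// Hypothetical range set: sorted, disjoint, non-adjacent inclusive codepoint ranges.
#[derive(Debug, PartialEq)]
struct RangeSet {
    ranges: Vec<(u32, u32)>,
}

impl RangeSet {
    /// Build a normalized set from arbitrary (unsorted, possibly overlapping)
    /// inclusive ranges.
    fn from_ranges(mut input: Vec<(u32, u32)>) -> Self {
        input.retain(|&(lo, hi)| lo <= hi); // drop empty ranges
        input.sort_unstable();
        let mut ranges: Vec<(u32, u32)> = Vec::with_capacity(input.len());
        for (lo, hi) in input {
            if let Some(last) = ranges.last_mut() {
                // Overlapping or directly adjacent: extend the previous range.
                if lo <= last.1.saturating_add(1) {
                    last.1 = last.1.max(hi);
                    continue;
                }
            }
            ranges.push((lo, hi));
        }
        RangeSet { ranges }
    }

    /// Membership test by binary search over the sorted ranges.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        self.ranges
            .binary_search_by(|&(lo, hi)| {
                if hi < cp {
                    Ordering::Less
                } else if lo > cp {
                    Ordering::Greater
                } else {
                    Ordering::Equal
                }
            })
            .is_ok()
    }
}
```

Because the ranges are kept sorted and disjoint, a lookup costs a binary search over the number of ranges rather than a scan over every member of the class.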
Structs§
- FindIter: Iterator over non-overlapping matches of a `Regex`.
- Match: A successful match, carrying the full capture state.
- PartialMatch: A partial (or full) match produced by `Regex::find_partial`.
- Regex: A compiled regular expression.
Enums§
- GroupMatch: The state of a single group within a `PartialMatch`.
- MatchStatus: The outcome kind of a `Regex::find_partial` attempt. `NoMatch` is represented by `Option::<PartialMatch>::None`.
Functions§
- find: Search for the first match of `pattern` in `haystack`.
- find_all: Collect every non-overlapping match of `pattern` in `haystack`.
- is_match: Returns `true` if `pattern` matches anywhere in `haystack`.
- new: Compile a pattern with the default flags.
- new_with_flags: Compile a pattern with the given flags.
- replace: Replace the first match of `pattern` in `haystack` using the template `repl` (`$1`, `${name}`, `$$`).
- replace_all: Replace all non-overlapping matches of `pattern` in `haystack`.
- split: Split `haystack` by `pattern`, returning the parts.
Type Aliases§
- CaptureMatches: Iterator that yields `Match` objects with full capture state (an alias of `FindIter` in this implementation, since matches always carry captures).