eregex 0.1.3

An advanced regular expression engine for Rust, inspired by mrab-regex
Documentation
  • Coverage
  • 100%
    119 out of 119 items documented6 out of 57 items with examples
  • Size
  • Source code size: 270.32 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 2.46 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 4s Average build duration of successful builds.
  • all releases: 5s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • a5i/eregex
    1 0 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • a5i

eregex

An advanced regular expression engine for Rust, inspired by mrab-regex (the Python regex module).

eregex aims to bring the richer feature set of mrab-regex to Rust:

  • Named groups, duplicate group names, repeated captures
  • Greedy / lazy / possessive quantifiers
  • Atomic groups (?>...)
  • Variable-length lookbehind
  • Nested character-class set operations [a-z&&[^aeiou]] (planned)
  • Inline, scoped flags (?flags-flags:...)
  • Backreferences \1, \g<name>, (?P=name)
  • Unicode properties \p{L}, \P{^N} (subset)
  • Fuzzy / approximate matching (?:foo){e<=1} (planned)
  • Recursive patterns (?R), (?(DEFINE)...) (planned)
  • Partial matches, POSIX matching, reverse search (planned)

This crate currently implements a strong foundation. See Feature status below for what is ready today and what is on the roadmap.

Quick start

use eregex::{Regex, flags};

let re = Regex::new(r"(\w+)\s+(\w+)").unwrap();
let m = re.find("hello world").unwrap();
assert_eq!(m.group(1), Some("hello"));
assert_eq!(m.group(2), Some("world"));

let re = Regex::new_with_flags(r"(?i)hello", flags::IGNORECASE).unwrap();
assert!(re.is_match("HELLO, World"));

// Repeated captures (signature mrab-regex feature)
let re = Regex::new(r"(\w)+").unwrap();
let m = re.find("abc").unwrap();
assert_eq!(m.captures(1), vec![Some("a"), Some("b"), Some("c")]);

// Partial matching: is the input a prefix of some full match?
let re = Regex::new(r"token=([a-z]+)([0-9]+)").unwrap();
// "token=abc" is incomplete — more input could turn it into a full match.
let p = re.find_partial("xxx token=abc").unwrap();
assert!(p.is_partial());
assert_eq!(p.matched, "token=abc");
// Group 1 fully matched, group 2 is still empty/partial.
assert_eq!(p.group(1), Some("abc"));
assert_eq!(p.group(2), Some(""));
// A wrong character rules out any continuation -> no match at all.
assert!(re.find_partial("xxx token=abc!").is_none());

Feature status

Implemented

  • Literals, ., anchors ^ $ \A \z \b \B
  • Predefined classes \d \D \w \W \s \S (ASCII + Unicode via std)
  • Character classes [...] with ranges, negation, escapes
  • Alternation a|b|c
  • Quantifiers * + ? {m} {m,} {m,n} with greedy ?-lazy and +-possessive
  • Capturing / non-capturing / named groups (...) (?:...) (?P<n>...) (?<n>...)
  • Atomic groups (?>...)
  • Backreferences \1 \g<n> \g<name> (?P=name)
  • Lookahead / lookbehind (?=...) (?!...) (?<=...) (?<!...) (variable length)
  • Partial (end-anchored) matching via find_partial
  • Inline scoped flags (?i) (?i:...) (?i-m:...)
  • Inline comments (?#...) and free-spacing (VERBOSE)
  • Named & unicode properties \p{...} (a curated subset)
  • Repeated captures (captures, captures_iter)
  • is_match, find, find_at, find_iter, find_partial, captures, captures_iter
  • replace, replace_all with $1 / ${name} / $$ templates
  • split, split_iter
  • escape

Roadmap (signature mrab-regex features)

  • Fuzzy / approximate matching {e<=2}
  • Recursive patterns & subexpression calls (?R) (?1) (?&name) (?(DEFINE)...)
  • Branch reset (?|...|...)
  • Nested set operations [a&&b] [a--b] [a||b] [a~~b]
  • Full Unicode case-folding (ß ↔ ss); currently simple casefolding
  • \K, (*PRUNE), (*SKIP), (*FAIL), \G semantics
  • POSIX (leftmost-longest) and reverse ((?r)) matching modes
  • Concurrent/GIL-free operation, timeouts
  • \L<name> named lists

Core concepts

Error handling

All fallible operations return Result<T, Error>. Error carries an ErrorKind (syntax error, bad escape, bad quantifier, unknown group, …) plus the byte offset in the pattern where the problem was detected, when known.

use eregex::Regex;

let err = Regex::new(r"(").unwrap_err();
println!("{}", err); // e.g. "eregex error at position 1: unclosed group"

Examples

The examples/ directory contains runnable programs:

  • demo.rs — a tour of the core API.
  • gap_match.rs — gap-tolerant ("fuzzy") matching built on find_at + find_partial, for inputs where the target is split by noise (a workaround while in-pattern fuzzy matching is on the roadmap).

Run them with cargo run --example demo / cargo run --example gap_match.

Language bindings

The Rust core is wrapped by three companion crates, each under crates/:

package technology install
eregex (npm) napi-rs (native addon) npm i eregex
eregex-wasm (npm) wasm-bindgen / wasm-pack npm i eregex-wasm
eregex (PyPI) pyo3 / maturin pip install eregex

The Node and WASM packages expose the same JavaScript API (Regex, Match, PartialMatch, the flag constants, parseFlags, …) and the same null-on-absent semantics, so they are interchangeable: pick the native build for raw speed, or the WASM build for a single portable binary that can also be rebuilt for bundlers / browsers (wasm-pack build --target web).

Development

A shared pre-commit hook runs cargo fmt --all --check and cargo test --workspace before each commit. Enable it once per clone:

git config core.hooksPath .githooks

Bypass it for a single commit with git commit --no-verify.

Compatibility

  • MSRV: 1.85 (uses the 2024 edition).
  • License: Apache-2.0.
  • #![forbid(unsafe_code)] is enforced crate-wide.

License

Apache-2.0, matching the upstream mrab-regex project.