RE#
A high-performance, automata-based regex engine with first-class support for intersection (`&`) and complement (`~`). It is non-backtracking and matches in linear time. Built for complex patterns (large alternations, lookarounds, boolean combinations) that make traditional engines degrade or fall back to slower paths.
paper | blog post | syntax docs | dotnet version and web playground
Quick start
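```rust
use resharp::Regex; // crate path and type name assumed; see the crate docs

// 8+ alphanumeric & contains digit & contains uppercase
let re = Regex::new(r"[a-zA-Z0-9]{8,}&_*[0-9]_*&_*[A-Z]_*").unwrap();
let found = re.is_match(b"hunter2Secret").unwrap();
let matches = re.find_all(b"password abc123XYZ").unwrap();
```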
When to use RE# over regex
RE# operates on `&[u8]` / UTF-8 input and aims to match the `regex` crate's throughput on standard patterns. Use RE# when you need:
- intersection (`&`), complement (`~`), or lookarounds
- large alternations with high throughput (at the cost of memory)
- fail-loud behavior: capacity / lookahead overflow returns `Err` instead of silently degrading
RE# is designed around `is_match` and `find_all`. It doesn't provide `find` or captures, but for simple cases you can often substitute `find_anchored`, or emulate a capture group with lookarounds. For example, the group in `a(b)c` can be matched directly as `(?<=a)b(?=c)`. For anything more involved, use the `regex` crate instead.
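For literal pieces, the lookaround form reports exactly the span the capture group would have held. Below is a std-only Rust sketch of what `(?<=a)b(?=c)` returns, assuming all three parts are plain literals; `lookaround_spans` is a hypothetical helper written for illustration, not an RE# API.

```rust
/// Hypothetical helper (not part of RE#): returns the (start, end) byte spans
/// of every occurrence of `mid` that is immediately preceded by `pre` and
/// immediately followed by `post`, i.e. what `(?<=pre)mid(?=post)` reports
/// when all three pieces are plain literals.
fn lookaround_spans(hay: &[u8], pre: &[u8], mid: &[u8], post: &[u8]) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    if mid.is_empty() {
        return spans; // keep the sketch simple: no empty matches
    }
    let mut i = 0;
    while i + mid.len() <= hay.len() {
        let end = i + mid.len();
        let is_mid = &hay[i..end] == mid;
        // lookbehind: `pre` must end exactly where `mid` starts
        let behind = i >= pre.len() && &hay[i - pre.len()..i] == pre;
        // lookahead: `post` must start exactly where `mid` ends
        let ahead = end + post.len() <= hay.len() && &hay[end..end + post.len()] == post;
        if is_mid && behind && ahead {
            spans.push((i, end));
            i = end; // reported matches do not overlap
        } else {
            i += 1;
        }
    }
    spans
}
```

On `xabcyabd`, only the first `b` qualifies (the second is followed by `d`), so the result is the single span `(2, 3)`, the same bytes group 1 of `a(b)c` would capture.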
Syntax extensions
RE# supports standard regex syntax plus three extensions: `_` (any byte), `&` (intersection), and `~(...)` (complement). `_*` means "any string".
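| Pattern | Meaning |
|---|---|
| `_*` | any string |
| `a_*` | any string that starts with `a` |
| `_*a` | any string that ends with `a` |
| `_*a_*` | any string that contains `a` |
| `~(_*a_*)` | any string that does NOT contain `a` |
| `(_*a_*)&~(_*b_*)` | contains `a` AND does not contain `b` |
| `(?<=b)_*&_*(?=a)` | preceded by `b` AND followed by `a` |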
Combine these with `&` to build more complex patterns. RE# also supports lookarounds (`(?=...)`, `(?<=...)`, `(?!...)`, `(?<!...)`), which are compiled directly into the automaton with no backtracking.
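Read as sets of strings, `&` is intersection (AND) and `~` is complement (NOT), so each row of the table above is a boolean combination of simpler patterns. The std-only Rust sketch below models a pattern as a predicate on a whole byte string to illustrate those semantics. `contains`, `intersect`, and `complement` are hypothetical helpers; RE# itself compiles patterns into an automaton rather than combining closures.

```rust
/// A pattern modelled as a predicate over a whole byte string,
/// i.e. the set of strings it matches.
type Pat = Box<dyn Fn(&[u8]) -> bool>;

/// `_*c_*`: every string that contains the byte `c`.
fn contains(c: u8) -> Pat {
    Box::new(move |s: &[u8]| s.contains(&c))
}

/// `A&B` (intersection): strings matched by both `a` and `b`.
fn intersect(a: Pat, b: Pat) -> Pat {
    Box::new(move |s: &[u8]| a(s) && b(s))
}

/// `~(A)` (complement): strings not matched by `a`.
fn complement(a: Pat) -> Pat {
    Box::new(move |s: &[u8]| !a(s))
}
```

Building `intersect(contains(b'a'), complement(contains(b'b')))` mirrors `(_*a_*)&~(_*b_*)`: it accepts `cat` and rejects both `cab` and `dog`.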
Differences from PCRE / regex
- Leftmost-longest, not leftmost-greedy: `y|yes` on `"yes"` matches `yes`. Branch order is irrelevant.
- Multiline on by default: `^`/`$` match start/end of line; disable with `(?-m)`. `\A`/`\z` always anchor to the start/end of the input.
- `\w` defaults to 2-byte UTF-8. See `UnicodeMode`.
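To make the leftmost-longest rule concrete, here is a std-only Rust sketch for an alternation of literal branches (no regex engine involved). It finds the leftmost position where any branch matches, then picks either the longest branch there (leftmost-longest, as described above for RE#) or the first listed branch (the leftmost-first semantics that `regex` and PCRE use). `find_alt` is a hypothetical helper that only handles literals.

```rust
/// Hypothetical helper (not an RE# API): matches an alternation of literal
/// branches against `hay`. It finds the leftmost start position where any
/// branch matches, then picks the span by policy:
/// `longest = true`  -> leftmost-longest (the longest branch there wins),
/// `longest = false` -> leftmost-first (the first listed branch wins).
fn find_alt(hay: &[u8], alts: &[&[u8]], longest: bool) -> Option<(usize, usize)> {
    for start in 0..=hay.len() {
        let rest = &hay[start..];
        let mut best: Option<usize> = None; // length of the longest branch seen here
        for &alt in alts {
            if rest.starts_with(alt) {
                if !longest {
                    // leftmost-first: branch order decides
                    return Some((start, start + alt.len()));
                }
                best = Some(best.map_or(alt.len(), |b| b.max(alt.len())));
            }
        }
        if let Some(len) = best {
            return Some((start, start + len));
        }
    }
    None
}
```

With branches `[y, yes]` on `yes`, the leftmost-longest policy returns `(0, 3)` regardless of branch order, while leftmost-first returns `(0, 1)`.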
Lazy quantifiers (`*?`, `+?`, ...) are parse errors; rewrite them with complement where possible: `<div>.*?</div>` becomes `<div>~(_*</div>_*)</div>`. See syntax.md for the rest.
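The rewrite works because the body `~(_*</div>_*)` may not contain `</div>`, so a match has to end at the first `</div>` after the opening tag, which is the span the lazy `.*?` would pick. A std-only Rust sketch of that span logic follows; `div_spans` is a hypothetical helper, not an RE# API, and it lets the body span newlines because `_` matches any byte (unlike `.` in most engines).

```rust
/// Hypothetical helper: returns the spans that `<div>~(_*</div>_*)</div>`
/// describes: `<div>`, then a body that does not contain `</div>`, then
/// `</div>`. Since the body cannot contain the closing tag, each match ends
/// at the first `</div>` after its opening tag.
fn div_spans(hay: &[u8]) -> Vec<(usize, usize)> {
    const OPEN: &[u8] = b"<div>";
    const CLOSE: &[u8] = b"</div>";
    // first occurrence of `needle` in `hay` at or after `from`
    let find = |from: usize, needle: &[u8]| -> Option<usize> {
        hay[from..]
            .windows(needle.len())
            .position(|w| w == needle)
            .map(|p| from + p)
    };
    let mut spans = Vec::new();
    let mut pos = 0;
    while let Some(open) = find(pos, OPEN) {
        let body_start = open + OPEN.len();
        match find(body_start, CLOSE) {
            Some(close) => {
                let end = close + CLOSE.len();
                spans.push((open, end));
                pos = end; // continue after this match (non-overlapping)
            }
            None => break, // no closing tag: no further matches
        }
    }
    spans
}
```

On `<div>a</div><div>b</div>` this yields two spans, `(0, 12)` and `(12, 24)`, rather than one span covering both elements.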
Configuration
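```rust
let opts = RegexOptions { /* ... */ }; // fields elided here; see the crate docs
let re = Regex::with_options(pattern, opts).unwrap(); // argument order assumed
```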
Benchmarks
RE# compared against regex, fancy-regex, and PCRE2 on a few popular patterns taken from crates.io. Regenerate with:
resharp runs with `UnicodeMode::Full` and `multiline(false)` to match the other engines' semantics. Ratios are relative to the fastest engine in each row (1.00x = fastest).
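The ratio columns can be recomputed from the raw numbers, up to rounding of the printed values. A minimal Rust sketch, assuming (as the tables below bear out) that throughput ratios are fastest / own, latency ratios are own / fastest, and GiB/s is converted to MiB/s (x1024) before comparing; the function and constant names are hypothetical.

```rust
/// 1 GiB/s = 1024 MiB/s; normalize units before comparing throughputs.
const MIB_PER_GIB: f64 = 1024.0;

/// Throughput (higher is better): ratio = fastest / own, so the fastest is 1.00x.
fn throughput_ratios(mib_per_s: &[f64]) -> Vec<f64> {
    let fastest = mib_per_s.iter().copied().fold(f64::MIN, f64::max);
    mib_per_s.iter().map(|&t| fastest / t).collect()
}

/// Latency (lower is better): ratio = own / fastest, so the fastest is 1.00x.
fn latency_ratios(ns: &[f64]) -> Vec<f64> {
    let fastest = ns.iter().copied().fold(f64::MAX, f64::min);
    ns.iter().map(|&t| t / fastest).collect()
}

/// Round to two decimals (as printed in the tables), returned as hundredths.
fn hundredths(x: f64) -> i64 {
    (x * 100.0).round() as i64
}
```

For the `\d+` row, `throughput_ratios(&[1012.4, 503.52, 304.87, 362.47])` rounds to 1.00, 2.01, 3.32, and 2.79, matching the table.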
Scan: `find_all` over a 1 MiB haystack (throughput, higher is better)

| Pattern | resharp | regex | fancy-regex | pcre2 |
|---|---|---|---|---|
| `\s+` | 414.94 MiB/s (1.00x) | 391.82 MiB/s (1.06x) | 155.91 MiB/s (2.66x) | 184.44 MiB/s (2.25x) |
| `\d+` | 1012.4 MiB/s (1.00x) | 503.52 MiB/s (2.01x) | 304.87 MiB/s (3.32x) | 362.47 MiB/s (2.79x) |
| `.*` | 2.42 GiB/s (1.00x) | 326.02 MiB/s (7.60x) | 166.82 MiB/s (14.86x) | 303.4 MiB/s (8.17x) |
| `[0-9a-f]{64}` | 1.3 GiB/s (1.00x) | 718 MiB/s (1.86x) | 597.23 MiB/s (2.23x) | 180.28 MiB/s (7.39x) |
| `https?://\S+` | 4.58 GiB/s (1.00x) | 2.35 GiB/s (1.95x) | 1.34 GiB/s (3.41x) | 1.81 GiB/s (2.53x) |
| `Version/([.0-9]+)` | 7.09 GiB/s (1.04x) | 7.38 GiB/s (1.00x) | 3.68 GiB/s (2.01x) | 3.96 GiB/s (1.86x) |
| `\n{3,}` | 11.66 GiB/s (1.00x) | 11.24 GiB/s (1.04x) | 5.15 GiB/s (2.27x) | 1.79 GiB/s (6.53x) |
| `[-_.]+` | 1.74 GiB/s (1.00x) | 1008.6 MiB/s (1.77x) | 481.64 MiB/s (3.71x) | 480.85 MiB/s (3.71x) |
Validate: `is_match` on a single value (latency, lower is better)

| Pattern | resharp | regex | fancy-regex | pcre2 |
|---|---|---|---|---|
| `^\d{4}-\d{2}-\d{2}$` | 23.42 ns (1.05x) | 24.32 ns (1.09x) | 22.3 ns (1.00x) | 59.97 ns (2.69x) |
| `^([a-zA-Z][a-zA-Z0-9_-]+)$` | 34.62 ns (1.05x) | 34.84 ns (1.06x) | 32.86 ns (1.00x) | 77.11 ns (2.35x) |
| `^[0-9]+$` | 24.53 ns (1.25x) | 22.86 ns (1.16x) | 19.64 ns (1.00x) | 56.37 ns (2.87x) |