RE#
A high-performance, automata-based regex engine with first-class support for intersection and complement operations. RE#'s main strength is complex patterns - large lists of alternatives, lookarounds, and boolean combinations - where traditional engines degrade or fall back to slower paths.
RE# compiles patterns into deterministic automata. All matching is non-backtracking with guaranteed linear-time execution. RE# extends standard regex syntax with intersection (&), complement (~), and a universal wildcard (_), enabling patterns that are impossible or impractical to express with standard regex.
paper | blog post | syntax docs | dotnet version and web playground
Install
Usage
let re = new.unwrap;
let matches = re.find_all;
let found = re.is_match;
Syntax extensions
RE# supports standard regex syntax plus three extensions: _ (universal wildcard), & (intersection), and ~(...) (complement). _ matches any character including newlines, so _* means "any string".
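| Pattern | Matches |
|---|---|
| _* | any string |
| a_* | any string that starts with a |
| _*a | any string that ends with a |
| _*a_* | any string that contains a |
| ~(_*a_*) | any string that does NOT contain a |
| (_*a_*)&~(_*b_*) | contains a AND does not contain b |
| (?<=b)_*&_*(?=a) | preceded by b AND followed by a |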
You can combine all of these with & to build more complex patterns. RE# also supports lookarounds ((?=...), (?<=...), (?!...), (?<!...)), compiled directly into the automaton with no backtracking.
RE# is not compatible with some regex crate features, e.g. lazy quantifiers (.*?). See the full syntax reference for details, and features for untrusted mode and other advanced options.
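To make the semantics of & and ~ concrete, here is a toy whole-string matcher built on Brzozowski derivatives. It is only a sketch under stated assumptions: the `Re` enum, `deriv`, `full_match`, and `contains` are hypothetical names invented for this illustration, not RE#'s API or internals, and the code does no simplification or state caching, so it is only suitable for short inputs.

```rust
use std::rc::Rc;

/// Toy regex AST with the three extensions; NOT RE#'s internal representation.
#[derive(Debug)]
enum Re {
    Empty,               // matches nothing
    Eps,                 // matches only the empty string
    Byte(u8),            // a single literal byte
    Any,                 // `_`: any single byte, newlines included
    Cat(Rc<Re>, Rc<Re>), // concatenation
    Alt(Rc<Re>, Rc<Re>), // union `|`
    And(Rc<Re>, Rc<Re>), // intersection `&`
    Not(Rc<Re>),         // complement `~(...)`
    Star(Rc<Re>),        // Kleene star
}
use Re::*;

/// Does `r` accept the empty string?
fn nullable(r: &Re) -> bool {
    match r {
        Empty | Byte(_) | Any => false,
        Eps | Star(_) => true,
        Cat(a, b) | And(a, b) => nullable(a) && nullable(b),
        Alt(a, b) => nullable(a) || nullable(b),
        Not(a) => !nullable(a),
    }
}

/// Brzozowski derivative: the regex for what may follow the byte `c`.
fn deriv(r: &Rc<Re>, c: u8) -> Rc<Re> {
    match &**r {
        Empty | Eps => Rc::new(Empty),
        Byte(b) => Rc::new(if *b == c { Eps } else { Empty }),
        Any => Rc::new(Eps),
        Cat(a, b) => {
            let left = Rc::new(Cat(deriv(a, c), b.clone()));
            if nullable(a) { Rc::new(Alt(left, deriv(b, c))) } else { left }
        }
        Alt(a, b) => Rc::new(Alt(deriv(a, c), deriv(b, c))),
        // Intersection and complement commute with derivatives,
        // which is why they need no backtracking.
        And(a, b) => Rc::new(And(deriv(a, c), deriv(b, c))),
        Not(a) => Rc::new(Not(deriv(a, c))),
        Star(a) => Rc::new(Cat(deriv(a, c), r.clone())),
    }
}

/// Whole-string match: derive by each byte, then test nullability.
fn full_match(r: &Rc<Re>, input: &[u8]) -> bool {
    let end = input.iter().fold(r.clone(), |state, &c| deriv(&state, c));
    nullable(&end)
}

/// `_*x_*`: any string containing the byte `x`.
fn contains(x: u8) -> Rc<Re> {
    let any_star = Rc::new(Star(Rc::new(Any)));
    Rc::new(Cat(any_star.clone(), Rc::new(Cat(Rc::new(Byte(x)), any_star))))
}

fn main() {
    // (_*a_*)&~(_*b_*): contains `a` AND does not contain `b`.
    let re = Rc::new(And(contains(b'a'), Rc::new(Not(contains(b'b')))));
    for s in ["cat", "cab", "dog"] {
        println!("{s:?} -> {}", full_match(&re, s.as_bytes()));
    }
}
```

Each input byte is consumed exactly once, and both operands of & advance in lockstep. A production engine would additionally normalize and cache the derived terms so that they act as DFA states.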
When to use RE# over regex
This is a from-scratch Rust implementation operating on &[u8] / UTF-8 (the dotnet version uses UTF-16), with regex-syntax as a parser base. RE# aims to match regex crate performance on standard patterns, with trade-offs on either side. Reasons to reach for RE#:
- intersection, complement, or lookarounds
- large alternatives with high performance (at the expense of memory)
- leftmost longest matches rather than leftmost-greedy (PCRE)
- find_anchored and find_all (no find or captures)
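The difference between leftmost-longest and leftmost-greedy (leftmost-first) matching shows up as soon as one alternative is a prefix of another. The sketch below works on plain literal alternatives only; `leftmost_first` and `leftmost_longest` are hypothetical helper names for this illustration and are not part of RE# or the regex crate.

```rust
/// Leftmost-first (PCRE-style): at the earliest start position, take the
/// first alternative that matches, in pattern order.
fn leftmost_first<'a>(alts: &[&str], hay: &'a str) -> Option<&'a str> {
    for (start, _) in hay.char_indices() {
        for alt in alts {
            if hay[start..].starts_with(*alt) {
                return Some(&hay[start..start + alt.len()]);
            }
        }
    }
    None
}

/// Leftmost-longest: at the earliest start position, take the longest
/// alternative that matches, regardless of pattern order.
fn leftmost_longest<'a>(alts: &[&str], hay: &'a str) -> Option<&'a str> {
    for (start, _) in hay.char_indices() {
        let best = alts
            .iter()
            .filter(|alt| hay[start..].starts_with(**alt))
            .map(|alt| alt.len())
            .max();
        if let Some(len) = best {
            return Some(&hay[start..start + len]);
        }
    }
    None
}

fn main() {
    // Think of the pattern `Sher|Sherlock`.
    let alts = ["Sher", "Sherlock"];
    let hay = "Mr. Sherlock Holmes";
    println!("leftmost-first:   {:?}", leftmost_first(&alts, hay));
    println!("leftmost-longest: {:?}", leftmost_longest(&alts, hay));
}
```

Both strategies agree on where the match starts; they only disagree on where it ends, so results can differ whenever alternatives overlap.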
Matching returns Result<Vec<Match>, Error> - capacity or lookahead overflow will fail outright rather than silently degrade. EngineOptions controls precompilation threshold, capacity, and lookahead context:
let opts = EngineOptions ;
let re = with_options.unwrap;
Benchmarks
Throughput comparison with regex and fancy-regex, compiled with --release. Compile time is excluded; only matching is measured. RE# uses SIMD intrinsics (AVX2, NEON), with possibly more backends in the near future. Run with cargo bench -- 'readme/' --list.
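For readers unfamiliar with the unit, the sketch below shows how a MiB/s figure can be derived when only the matching phase is timed. It is not the benchmark harness used here: `mib_per_sec` and `count_matches` are hypothetical stand-ins, and the haystack is synthetic.

```rust
use std::time::{Duration, Instant};

/// Throughput in MiB/s for scanning `bytes` bytes in `elapsed` time.
fn mib_per_sec(bytes: usize, elapsed: Duration) -> f64 {
    (bytes as f64 / (1024.0 * 1024.0)) / elapsed.as_secs_f64()
}

/// Stand-in for a compiled matcher: counts occurrences of one byte.
fn count_matches(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn main() {
    // Synthetic 900 KiB input (an assumption, not the benchmark corpus).
    let haystack = vec![b'x'; 900 * 1024];

    // Any pattern compilation would happen here, outside the timed region.
    let start = Instant::now();
    let n = count_matches(&haystack, b'x');
    let elapsed = start.elapsed();

    println!("{n} matches, {:.1} MiB/s", mib_per_sec(haystack.len(), elapsed));
}
```

A single timed pass like this is noisy; a real harness repeats the measurement many times and reports a statistic over the runs.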
AMD Ryzen 7 5800X (105W TDP)
| Benchmark | resharp | regex | fancy-regex |
|---|---|---|---|
| dictionary 2663 words (900KB, ~15 matches) | 633 MiB/s | 541 MiB/s | 531 MiB/s |
| dictionary 2663 words (944KB, ~2678 matches) | 535 MiB/s | 58 MiB/s | 20 MiB/s |
| dictionary (?i) 2663 words (900KB) | 632 MiB/s | 0.03 MiB/s | 0.03 MiB/s |
| lookaround (?<=\s)[A-Z][a-z]+(?=\s) (900KB) | 460 MiB/s | -- | 25 MiB/s |
| literal alternatives (900KB) | 12.0 GiB/s | 11.2 GiB/s | 10.1 GiB/s |
| literal "Sherlock Holmes" (900KB) | 33.2 GiB/s | 34.0 GiB/s | 30.3 GiB/s |
Rockchip RK3588 ARM (5-10W TDP)
| Benchmark | resharp | regex | fancy-regex |
|---|---|---|---|
| dictionary 2663 words (900KB, ~15 matches) | 271 MiB/s | 315 MiB/s | 317 MiB/s |
| dictionary 2663 words (944KB, ~2678 matches) | 214 MiB/s | 25 MiB/s | 9 MiB/s |
| dictionary (?i) 2663 words (900KB) | 271 MiB/s | 0.01 MiB/s | 0.01 MiB/s |
| lookaround (?<=\s)[A-Z][a-z]+(?=\s) (900KB) | 198 MiB/s | -- | 10 MiB/s |
| literal alternatives (900KB) | 1.73 GiB/s | 2.00 GiB/s | 1.95 GiB/s |
| literal "Sherlock Holmes" (900KB) | 6.74 GiB/s | 7.05 GiB/s | 6.78 GiB/s |
(crazy how close a board smaller than a phone gets to desktop throughput these days. what a time to be alive)
Notes on the results:
- The first dictionary row is roughly tied - the prose haystack only contains ~15 matches, so the lazy DFA barely explores any states. RE#'s advantage is that its full DFA is smaller, but this isn't visible when most states are never materialized.
- On longer inputs or denser matches, the other engines will degrade - take lazy-dfa benchmarks with a grain of salt, you will not be matching the exact same string over and over in the real world. The seeded dictionary row confirms this: with ~2678 matches, RE# holds at 535 MiB/s vs 58 MiB/s for regex on x86.
- The (?i) row shows what happens when the pattern forces regex to fall back from its DFA to an NFA: throughput drops to 0.03 MiB/s. RE# handles case folding in the DFA and maintains full speed. You can increase regex's DFA threshold to avoid this fallback, but only up to a point.
- RE# compiles lookarounds directly into the automaton - no back-and-forth between forward and backward passes. regex doesn't support lookarounds except for anchors; fancy-regex handles them via backtracking, which is occasionally much slower.
- The same patterns that win on x86 also win on ARM - the full DFA approach scales down well.
- If you encounter a bug or a pattern where RE# is >5x slower than regex or fancy-regex, please open an issue - it would help improve the library. Note that regex returns leftmost-greedy (PCRE) matches while RE# returns leftmost-longest, so match results may differ. The performance profile also differs - RE# works right to left while regex works left to right.
- Also see the rebar comparison to regex - despite its own bias disclaimer, the fairest and most varied regex benchmark suite out there, where most others pick patterns with rare bytes to skip to or run entirely different configurations. Rebar targets leftmost-first engines, so RE#'s leftmost-longest semantics do some extra work. Be wary of the throughput numbers on short inputs though - they let regex build a tiny purpose-built automaton for matching the exact same string repeatedly, so the reported MiB/s doesn't reflect real-world scanning speed. On longer inputs the gap shifts further in RE#'s favor. As it says, all models are wrong but some are useful.
Crate structure
| Crate | Description |
|---|---|
| resharp | engine and public API (resharp-engine) |
| resharp-algebra | algebraic regex tree, constraint solver, nullability analysis |
| resharp-parser | pattern string to AST, extends regex-syntax with RE# operators |
And most importantly, have fun! :)