RE#
A high-performance, automata-based regex engine with first-class support for intersection and complement operations. RE#'s main strength is complex patterns - large lists of alternatives, lookarounds, and boolean combinations - where traditional engines degrade or fall back to slower paths.
RE# compiles patterns into deterministic automata. All matching is non-backtracking with guaranteed linear-time execution. RE# extends standard regex syntax with intersection (&), complement (~), and a universal wildcard (_), enabling patterns that are impossible or impractical to express with standard regex.
paper | blog post | syntax docs | dotnet version and web playground
Install
cargo add resharp
Usage
let re = new.unwrap;
let matches = re.find_all;
let found = re.is_match;
Syntax extensions
RE# supports standard regex syntax plus three extensions: _ (universal wildcard), & (intersection), and ~(...) (complement). _ matches any character including newlines, so _* means "any string".
_* any string
a_* any string that starts with 'a'
_*a any string that ends with 'a'
_*a_* any string that contains 'a'
~(_*a_*) any string that does NOT contain 'a'
(_*a_*)&~(_*b_*) contains 'a' AND does not contain 'b'
(?<=b)_*&_*(?=a) preceded by 'b' AND followed by 'a'
You combine all of these with & to get more complex patterns. RE# also supports lookarounds ((?=...), (?<=...), (?!...), (?<!...)), compiled directly into the automaton with no backtracking.
NOTE: RE# is not compatible with some regex crate features, eg. lazy quantifiers (.*?). See the full syntax reference for details.
Differences from resharp-dotnet and rust regex
Written from scratch in rust - operates on &[u8] / UTF-8 rather than UTF-16, uses regex-syntax as a parser base. When to use this over the regex crate:
- intersection, complement, or lookarounds
- large alternatives with high performance (at the expense of memory)
- leftmost longest matches rather than leftmost first
find_anchoredandfind_all(nofindorcaptures)
Matching returns Result<Vec<Match>, Error> - capacity or lookahead overflow will fail outright rather than silently degrade. EngineOptions controls precompilation threshold, capacity, and lookahead context:
let opts = EngineOptions ;
let re = with_options.unwrap;
Benchmarks
Throughput comparison with regex and fancy-regex, compiled with --release. Compile time is excluded; only matching is measured. Uses SIMD intrinsics (AVX2, NEON) with possibly more backends in the near future. Run with cargo bench -- 'readme/' --list.
AMD Ryzen 7 5800X (105W TDP)
| Benchmark | resharp | regex | fancy-regex |
|---|---|---|---|
| dictionary 2663 words (900KB, ~15 matches) | 500 MiB/s | 552 MiB/s | 545 MiB/s |
| dictionary 2663 words (944KB, ~2678 matches) | 449 MiB/s | 58 MiB/s | 20 MiB/s |
dictionary (?i) 2663 words (900KB) |
503 MiB/s | 0.03 MiB/s | 0.03 MiB/s |
lookaround (?<=\s)[A-Z][a-z]+(?=\s) (900KB) |
386 MiB/s | -- | 25 MiB/s |
| literal alternation (900KB) | 12.1 GiB/s | 11.4 GiB/s | 10.2 GiB/s |
literal "Sherlock Holmes" (900KB) |
33.9 GiB/s | 38.7 GiB/s | 33.7 GiB/s |
Rockchip RK3588 ARM (5-10W TDP)
| Benchmark | resharp | regex | fancy-regex |
|---|---|---|---|
| dictionary 2663 words (900KB, ~15 matches) | 287 MiB/s | 313 MiB/s | 312 MiB/s |
| dictionary seeded (944KB, ~2678 matches) | 229 MiB/s | 25 MiB/s | 8 MiB/s |
dictionary (?i) 2663 words (900KB) |
288 MiB/s | 0.01 MiB/s | 0.01 MiB/s |
lookaround (?<=\s)[A-Z][a-z]+(?=\s) (900KB) |
204 MiB/s | -- | 9 MiB/s |
| literal alternation (900KB) | 1.70 GiB/s | 1.99 GiB/s | 1.94 GiB/s |
literal "Sherlock Holmes" (900KB) |
6.44 GiB/s | 7.11 GiB/s | 6.85 GiB/s |
Notes on the results:
- The first dictionary row is roughly tied - the prose haystack only contains ~15 matches, so the lazy DFA barely explores any states. RE#'s advantage is that its full DFA is smaller, but this isn't visible when most states are never materialized.
- On longer inputs or denser matches, the other engines will degrade - take lazy-dfa benchmarks with a grain of salt, you will not be matching the exact same string over and over in the real world. The seeded dictionary row confirms this: with ~2678 matches, RE# holds at 449 MiB/s vs 58 MiB/s for
regexon x86, and 229 MiB/s vs 25 MiB/s on ARM. - The
(?i)row shows what happens when the pattern forcesregexto fall back from its DFA to an NFA: throughput drops to 0.03 MiB/s (x86) / 0.01 MiB/s (ARM). RE# handles case folding in the DFA and maintains full speed. You can increaseregex's DFA threshold to avoid this fallback, but only up to a point. - RE# compiles lookarounds directly into the automaton - no back-and-forth between forward and backward passes.
regexdoesn't support lookarounds except for anchors;fancy-regexhandles them via backtracking, which is occasionally much slower. - The same patterns that win on x86 also win on ARM - the full DFA approach scales down well.
- If you encounter a bug or a pattern where RE# is >5x slower than
regexorfancy-regex, please open an issue - it would help improve the library. Note thatregexreturns leftmost-first matches while RE# returns leftmost-longest, so match results may differ. The performance profile also differs - RE# works right to left whileregexworks left to right.
Crate structure
| Crate | Description |
|---|---|
resharp |
engine and public API (resharp-engine) |
resharp-algebra |
algebraic regex tree, constraint solver, nullability analysis |
resharp-parser |
pattern string to AST, extends regex-syntax with RE# operators |
And most importantly, have fun! :)