An implementation of regexes, supporting a relatively rich set of features, including backreferences and lookaround. Aims to be compatible with Oniguruma syntax when the relevant flag is set.
It builds on top of the excellent regex crate. If you are not familiar with it, read its documentation first; you may not need fancy-regex at all.
If your regex, or parts of it, do not use any special features, matching is delegated to the regex crate, which runs in linear time. If you use “fancy” features such as backreferences or look-around, a backtracking engine has to be used instead. In that case, depending on the regex and the input, matching can be slow and may take exponential time because of what is called “catastrophic backtracking”.
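To get a feel for why backtracking can blow up, here is a minimal std-only sketch. It is not how fancy-regex is implemented; it only counts how many ways a naive backtracker could carve a run of n `a` characters into groups while trying a pattern shaped like `(a+)+b` against input that contains no `b`. The function name `count_dead_ends` is hypothetical.

```rust
/// Counts the distinct ways to split a run of `n` 'a' characters into
/// non-empty groups. A naive backtracker trying `(a+)+b` on input that has
/// no 'b' may have to visit every one of these splits before giving up.
fn count_dead_ends(n: usize) -> u64 {
    if n == 0 {
        // The whole run has been grouped; the missing 'b' makes this a dead end.
        return 1;
    }
    // Try every possible length for the next `a+` group, longest first,
    // then recurse on what is left of the run.
    (1..=n).rev().map(|len| count_dead_ends(n - len)).sum()
}
```

Each extra character doubles the count: `count_dead_ends(10)` is 512 and `count_dead_ends(20)` is 524288.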
§Usage
The API should feel very similar to the regex crate, and involves compiling a regex and then using it to find matches in text.
§Example: Matching text
An example with backreferences to check if a text consists of two identical words:
use fancy_regex::Regex;
let re = Regex::new(r"^(\w+) (\1)$").unwrap();
let result = re.is_match("foo foo");
assert!(result.is_ok());
let did_match = result.unwrap();
assert!(did_match);
Note that like in the regex crate, the regex needs anchors like ^ and $ to match against the entire input text.
§Example: Finding the position of matches
use fancy_regex::Regex;
let re = Regex::new(r"(\d)\1").unwrap();
let result = re.find("foo 22");
assert!(result.is_ok(), "execution was successful");
let match_option = result.unwrap();
assert!(match_option.is_some(), "found a match");
let m = match_option.unwrap();
assert_eq!(m.start(), 4);
assert_eq!(m.end(), 6);
assert_eq!(m.as_str(), "22");
§Example: Capturing groups
use fancy_regex::Regex;
let re = Regex::new(r"(?<!AU)\$(\d+)").unwrap();
let result = re.captures("AU$10, $20");
let captures = result.expect("Error running regex").expect("No match found");
let group = captures.get(1).expect("No group");
assert_eq!(group.as_str(), "20");
§Example: Splitting text
use fancy_regex::Regex;
let re = Regex::new(r"[ \t]+").unwrap();
let target = "a b \t c\td e";
let fields: Vec<&str> = re.split(target).map(|x| x.unwrap()).collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);
let fields: Vec<&str> = re.splitn(target, 3).map(|x| x.unwrap()).collect();
assert_eq!(fields, vec!["a", "b", "c\td e"]);
§Features
This crate supports several optional features that can be enabled or disabled:
- std (enabled by default): Enables standard library support. Disable for no_std environments.
- unicode (enabled by default): Enables Unicode support for character classes and word boundaries.
- perf (enabled by default): Enables performance optimizations in the underlying regex engine.
- variable-lookbehinds (enabled by default): Enables support for variable-length lookbehind assertions (e.g., (?<=a+)). Without this feature, only constant-length lookbehinds are supported. This feature uses reverse DFA matching from the regex-automata crate to efficiently handle variable-length patterns that don’t use backreferences or other fancy features.
§Syntax
The regex syntax is based on the regex crate’s and on Oniguruma, with some additional supported syntax.
Where the two conflict, there is a flag to prefer Oniguruma parsing rules. (By default regex crate compatible parsing is used.)
Escapes:
\h
: hex digit ([0-9A-Fa-f])
\H
: not hex digit ([^0-9A-Fa-f])
\e
: escape control character (\x1B)
\K
: keep text matched so far out of the overall match (docs)
\G
: anchor to where the previous match ended (docs)
\Z
: anchor to the end of the text before any trailing newlines
\O
: any character including newline
\N
: any character except newline
\R
: general newline - matches all common line break characters: \n, \v, \f, \r, treating \r\n as an atomic unit
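As an illustration of the `\R` entry, the following std-only sketch recognises a general newline at the start of a string. It models only the four characters listed above and treats `\r\n` as one unit; whether the crate accepts further line break characters is not covered here. The function name `general_newline_len` is hypothetical.

```rust
/// Returns the byte length of a general newline at the start of `s`,
/// or `None` if `s` does not start with one.
fn general_newline_len(s: &str) -> Option<usize> {
    if s.starts_with("\r\n") {
        // "\r\n" is consumed as a single unit, never as a lone '\r'.
        return Some(2);
    }
    match s.chars().next() {
        // \n, \v (vertical tab), \f (form feed), \r
        Some('\n') | Some('\u{0B}') | Some('\u{0C}') | Some('\r') => Some(1),
        _ => None,
    }
}
```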
Backreferences:
\1
: match the exact string that the first capture group matched
\2
: backref to the second capture group, etc.
\k<name>
: match the exact string that the capture group named name matched
(?P=name)
: same as \k<name> for compatibility with Python, etc.
\g<name>
: call the subroutine defined in capture group named name
\g<1>
: call the subroutine defined in capture group 1. Subroutines can be recursive up to 20 levels deep.
Named capture groups:
(?<name>exp)
: match exp, creating capture group named name
(?P<name>exp)
: same as (?<name>exp) for compatibility with Python, etc.
Look-around assertions for matching without changing the current position:
(?=exp)
: look-ahead, succeeds if exp matches to the right of the current position
(?!exp)
: negative look-ahead, succeeds if exp doesn’t match to the right
(?<=exp)
: look-behind, succeeds if exp matches to the left of the current position
(?<!exp)
: negative look-behind, succeeds if exp doesn’t match to the left
Note: Look-behind assertions with variable length (e.g., (?<=a+)) are supported with the
variable-lookbehinds feature (enabled by default). Without this feature, only constant-length
look-behinds are supported. Variable-length look-behinds can include word boundaries and other
zero-width assertions (e.g., (?<=\ba+)) as long as the rest of the pattern doesn’t use
backreferences or other “fancy” features that require backtracking within the lookbehind.
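To make "matching without changing the current position" concrete, here is a hand-written, std-only sketch of what a pattern like `(?<!AU)\$(\d+)` checks. It is an illustration rather than the engine's algorithm, it only handles ASCII digits, and the function name `first_non_au_amount` is hypothetical.

```rust
/// Finds the first `$<digits>` that is not immediately preceded by "AU"
/// and returns the digits, mimicking `(?<!AU)\$(\d+)`.
fn first_non_au_amount(text: &str) -> Option<&str> {
    for (i, _) in text.match_indices('$') {
        // Negative look-behind: inspect the text to the left of the
        // current position without consuming anything.
        if text[..i].ends_with("AU") {
            continue;
        }
        let rest = &text[i + 1..];
        // `\d+`: take the leading run of ASCII digits (one byte each).
        let len = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if len > 0 {
            return Some(&rest[..len]);
        }
    }
    None
}
```

For the input `AU$10, $20` the first dollar sign is skipped because of the look-behind, so the function returns `20`.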
Atomic groups using (?>exp) to prevent backtracking within exp, e.g.:
let re = Regex::new(r"^a(?>bc|b)c$").unwrap();
assert!(re.is_match("abcc").unwrap());
// Doesn't match because `|b` is never tried because of the atomic group
assert!(!re.is_match("abc").unwrap());
Conditionals - if/then/else:
(?(1))
: continue only if first capture group matched
(?(<name>)) or (?('name'))
: continue only if capture group named name matched
(?(1)true_branch|false_branch)
: if the first capture group matched then execute the true_branch regex expression, else execute false_branch (docs)
(?(condition)true_branch|false_branch)
: if the condition matches then execute the true_branch regex expression, else execute false_branch from the point just before the condition was evaluated
(?(DEFINE)(capture group)(?<named_group>another))
: define capture groups for later use in subroutine calls
Backtracking control verbs:
(*FAIL)
: fail the current backtracking branch
Absent repeater:
(?~abc)
: match anything until abc would match or until the end of the haystack if no match
§Subroutines: reusable patterns with stable meaning
§What is a subroutine
Subroutines in fancy-regex are compiled pattern definitions that can be invoked safely and predictably. Any capture group can become a subroutine - it just needs to be “called”.
(?<num>\d*\.\d+|\d+) x \g<num>
In the above example, a capture group called num is defined, to match numbers with or without decimal places. \g<num> executes the capture group again, without the author having to re-type the pattern inside it.
The above pattern would match text like 5.2 x 6 for instance.
Think of a subroutine as:
- defined by a capture group
- executed exactly the way the capture group was originally defined
- reusable from multiple places
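One way to picture this is to write the group body as an ordinary function and call it twice. The sketch below is std-only and hand-rolled for the `num` pattern; it is not how fancy-regex compiles subroutines, it handles ASCII digits only, and the names `num` and `product_len` are hypothetical.

```rust
/// The body of the `num` group, `\d*\.\d+|\d+`, as a function.
/// Returns how many bytes of `s` it consumes, or `None` on failure.
fn num(s: &str) -> Option<usize> {
    let digits = |t: &str| t.bytes().take_while(|b| b.is_ascii_digit()).count();
    let int_len = digits(s);
    // First alternative: optional digits, a dot, then at least one digit.
    if s[int_len..].starts_with('.') {
        let frac_len = digits(&s[int_len + 1..]);
        if frac_len > 0 {
            return Some(int_len + 1 + frac_len);
        }
    }
    // Second alternative: one or more digits.
    if int_len > 0 { Some(int_len) } else { None }
}

/// `(?<num>...) x \g<num>` anchored at the start of `s`: the group body
/// runs once where it is defined and once more where it is called.
fn product_len(s: &str) -> Option<usize> {
    let first = num(s)?;
    let rest = s[first..].strip_prefix(" x ")?;
    let second = num(rest)?;
    Some(first + 3 + second)
}
```

With the input `5.2 x 6`, `product_len` consumes all 7 bytes: `num` matches `5.2` at the definition and `6` at the call.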
§Side effects
A subroutine call has one side-effect - it updates the capture group position, which affects backref matching etc.
§Example
Let’s imagine a pattern which will match a digit and capture it into group 1. Then it will call that capture group as a subroutine. Then it will do a backref to group 1.
This will match three consecutive digits. The 2nd and 3rd digits must be identical.
use fancy_regex::{Error, Regex};
fn main() -> Result<(), Error> {
    let re = Regex::new(r"(\d)\g<1>\1")?;
    let result = re.captures("foo 711")?;
    let captures = result.unwrap();
    let m = captures.get(0).unwrap();
    assert_eq!(m.start(), 4);
    assert_eq!(m.end(), 7);
    assert_eq!(m.as_str(), "711");
    // Group 1 holds the digit matched by the subroutine call, not the first digit
    let group = captures.get(1).unwrap();
    assert_eq!(group.as_str(), "1");
    assert!(!re.is_match("foo 717")?);
    Ok(())
}
In the above example, 7 was stored in capture group 1. Then it was replaced with 1 by the subroutine call. Then the backreference to group 1 can only match the literal 1.
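The same sequence of events can be traced with a small std-only simulation. This is a sketch of the documented behavior for this one pattern, not the engine itself; the names `run_group1`, `match_at` and `find_digit_call_backref` are hypothetical.

```rust
/// Runs the body of group 1, `\d`, at byte offset `pos` and records the capture.
fn run_group1(s: &str, pos: usize, group1: &mut Option<(usize, usize)>) -> Option<usize> {
    if !s.as_bytes().get(pos)?.is_ascii_digit() {
        return None;
    }
    *group1 = Some((pos, pos + 1));
    Some(pos + 1)
}

/// Simulates `(\d)\g<1>\1` starting at byte offset `start`.
/// Returns the matched text and the final contents of group 1.
fn match_at(s: &str, start: usize) -> Option<(&str, &str)> {
    let mut group1 = None;
    let pos = run_group1(s, start, &mut group1)?; // `(\d)`: the definition
    let pos = run_group1(s, pos, &mut group1)?; // `\g<1>`: the call overwrites group 1
    let (cap_start, cap_end) = group1?;
    let captured = &s[cap_start..cap_end];
    // `\1` must repeat whatever group 1 holds now.
    if s[pos..].starts_with(captured) {
        Some((&s[start..pos + captured.len()], captured))
    } else {
        None
    }
}

/// Tries every starting position, like an unanchored search.
fn find_digit_call_backref(s: &str) -> Option<(&str, &str)> {
    (0..s.len())
        .filter(|&i| s.is_char_boundary(i))
        .find_map(|i| match_at(s, i))
}
```

For `foo 711` this yields the match `711` with group 1 equal to `1`; for `foo 717` it yields nothing.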
§Side effect edge cases
In a lookbehind, a subroutine call does not update the capture group position when the position currently stored for that capture group is further to the right in the haystack. In other words, right-most captures take precedence.
§Flags and the common misconception
When subroutines are first introduced, a very common assumption is:
“If I apply flags at the call site, they should affect the subroutine.”
This assumption is reasonable — many regex engines either behave this way or do not specify the behavior clearly. fancy-regex does not do this, and the reason is central to its design.
Let’s look at a concrete example.
§The pattern
\A(?<word>[a-z]+)\s+(?i:\g<word>)
§The input
hello Mr
At first glance, this pattern appears to say:
- At the beginning of the input string
- Match a word of lowercase letters and capture it as word
- Match some whitespace
- Call the subroutine word, but case-insensitively
Many users therefore expect this pattern to match the input above.
It does not.
§What actually happens
The key to understanding this behavior is that flags belong to the subroutine definition, not the call site.
Let’s walk through the execution step by step.
§Execution trace
§Step 1: Assert position at the beginning of the input string
- Pattern: \A
- Active flags: none
- Input position: 0 (start of "hello Mr")
§Step 2: Enter subroutine definition word
- Pattern: [a-z]+
- Active flags: none
- Input position: 0 (start of "hello Mr")
The engine greedily matches [a-z]+ against "hello".
The range 0 - 5 is stored in capture group 1, whose name is word.
§Step 3: Exit subroutine definition word and continue matching
- Pattern: \s+
- Active flags: none
- Input position: 5 (at the space after hello)
The engine matches \s+ against " ".
§Step 4: Call subroutine word
- Pattern: \g<word> -> (?<word>[a-z]+)
- Active flags: none - because the capture group definition had no flags active
- Input position: 6 (after " " at the 'M' of "Mr")
This fails immediately, because:
- The pattern is case-sensitive
- The first character is 'M', which does not fall in the range a-z.
§Step 5: No alternatives available
- There are no alternations inside word
- There are no backtracking points before the failure
- The anchor prevents us from trying other starting positions in the input string
The match fails.
§Why the i flag did not apply
The i flag appears only at the call site:
(?i:\g<word>)
However, calling a subroutine does not re-evaluate or modify its definition.
The subroutine word was compiled once, with these properties:
- Pattern: [a-z]+
- Flags: none
- Capture group number: 1
When the subroutine is called, the engine:
- Enters the already-compiled definition
- Executes it exactly as defined
- Ignores any flags applied at the call site
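The rule can be pictured as a compiled object whose flags are fixed when it is built. The std-only sketch below models the example pattern by hand under that assumption. It is ASCII-only, it is not the crate's internal representation, and the names `Word`, `call` and `word_space_word` are hypothetical.

```rust
/// A compiled subroutine body for `[a-z]+`; its flag is fixed at definition time.
struct Word {
    case_insensitive: bool,
}

impl Word {
    /// Runs the body at the start of `s` and returns the number of bytes consumed.
    /// `_call_site_case_insensitive` is accepted only to show that it is ignored.
    fn call(&self, s: &str, _call_site_case_insensitive: bool) -> Option<usize> {
        let len = s
            .bytes()
            .take_while(|b| {
                b.is_ascii_lowercase() || (self.case_insensitive && b.is_ascii_uppercase())
            })
            .count();
        if len > 0 { Some(len) } else { None }
    }
}

/// Mimics `\A(?<word>[a-z]+)\s+(?i:\g<word>)`, with the definition's own
/// flag passed in as `definition_case_insensitive`.
fn word_space_word(input: &str, definition_case_insensitive: bool) -> bool {
    let word = Word { case_insensitive: definition_case_insensitive };
    // Definition site: the group body runs with its own flags.
    let first = match word.call(input, false) {
        Some(len) => len,
        None => return false,
    };
    let rest = &input[first..];
    // `\s+`: at least one whitespace character must follow.
    let after_space = rest.trim_start();
    if after_space.len() == rest.len() {
        return false;
    }
    // Call site inside `(?i:...)`: the flag is passed along but changes nothing.
    word.call(after_space, true).is_some()
}
```

`word_space_word("hello Mr", false)` is false, mirroring the trace above, while building the definition with the flag set (`word_space_word("hello Mr", true)`) succeeds.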
§This is a feature, not a limitation
fancy-regex deliberately enforces this rule to guarantee that:
- A subroutine behaves the same everywhere it is used
- Flags cannot silently change the meaning of a reused pattern
- There is no “action at a distance” from call sites
If call-site flags were allowed to affect subroutines, the same subroutine could behave differently depending on where it was called - making patterns harder to reason about and easier to misuse.
§Expressing the intended behavior
If the intent is for word to be matched case-insensitively, the flag must be applied at the definition:
(?i:(?<word>[a-z]+))\s+\g<word>
or
(?<word>(?i:[a-z]+))\s+\g<word>
Now the subroutine is compiled with the i flag, and every call to it behaves consistently.
§Key takeaway
Subroutines in fancy-regex are compiled once, with fixed flags. Call sites cannot change their behavior.
This rule enables safe reuse, predictable execution, and clear reasoning - especially in larger and more complex patterns. It also matches Oniguruma behavior, so if you plan to use fancy-regex as a memory-safe alternative, you can!
§Compile-time rejection of left-recursive patterns
fancy-regex’s support of subroutines unlocks powerful features such as recursion. With that power comes the risk of defining patterns that can recurse forever.
To guarantee termination and predictable behavior, fancy-regex rejects left-recursive patterns at compile time.
This check is conservative: if a pattern could recurse without consuming input, it is rejected - even if a particular input would not trigger that behavior.
§What is left recursion?
A pattern is left-recursive if it can re-enter itself without consuming any input.
In other words, the engine can make recursive calls while staying at the same input position.
A simplified example looks like this:
(?<expr>\g<expr>a|a)
Here, the subroutine expr can immediately call itself before matching anything. No matter what the input is, this definition allows infinite recursion.
§Why fancy-regex rejects these patterns
Left recursion is problematic because:
- It can cause infinite recursion or unbounded backtracking
- It cannot be made safe by input inspection alone
Even if a specific input would not trigger the recursion, the pattern itself is unsafe.
Rather than attempting to detect or recover from such cases at runtime, fancy-regex enforces a stronger rule:
Every recursive call must consume input before it can recurse again.
This guarantees that evaluation always makes progress.
§Conservative by design
The left-recursion check is intentionally conservative.
Consider the following pattern:
(?<expr>ab|\g<expr>a)
For the input “ab”, this pattern would terminate successfully. However, fancy-regex still rejects it.
Why?
Because the second alternative allows recursion before any input is consumed. The engine cannot rely on runtime input to guarantee termination.
This is a deliberate design choice:
fancy-regex validates the structure of the pattern, not the behavior of a particular input.
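A structural check of this kind can be sketched on a toy pattern tree. The code below is an assumption-laden illustration, not fancy-regex's actual analysis: it handles a single self-recursive group, and the `Node` type and the functions `nullable` and `left_recursive` are hypothetical names.

```rust
/// A toy pattern tree for a single self-recursive group.
enum Node {
    /// A literal string; the empty string consumes nothing.
    Lit(&'static str),
    /// A call back into the enclosing group, like `\g<expr>`.
    Call,
    /// Sub-patterns matched one after another.
    Concat(Vec<Node>),
    /// Alternatives, like `a|b`.
    Alt(Vec<Node>),
    /// An optional sub-pattern, like `x?`.
    Opt(Box<Node>),
}

/// Can `node` match without consuming input? Calls are assumed to be
/// nullable, which keeps the check conservative.
fn nullable(node: &Node) -> bool {
    match node {
        Node::Lit(s) => s.is_empty(),
        Node::Call | Node::Opt(_) => true,
        Node::Concat(items) => items.iter().all(nullable),
        Node::Alt(items) => items.iter().any(nullable),
    }
}

/// Can a `Call` be reached before any input has been consumed?
fn left_recursive(node: &Node) -> bool {
    match node {
        Node::Lit(_) => false,
        Node::Call => true,
        Node::Opt(inner) => left_recursive(inner),
        Node::Alt(items) => items.iter().any(left_recursive),
        Node::Concat(items) => {
            for item in items {
                if left_recursive(item) {
                    return true;
                }
                // Once an item must consume input, later calls are safe.
                if !nullable(item) {
                    return false;
                }
            }
            false
        }
    }
}
```

Under this model, `Alt[Lit("ab"), Concat[Call, Lit("a")]]` (the pattern above) is flagged because one alternative reaches the call first, regardless of any input, while `Concat[Lit("a"), Opt(Call)]` is accepted.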
§What is allowed
Recursive patterns are allowed as long as they consume input before recursing.
For example:
(?<paren>\((?:[^()]*|\g<paren>)\))
Here:
- Each recursive call to paren is preceded by a literal ‘(’
- Input is always consumed before recursion
- Termination is guaranteed
This kind of recursion is safe and fully supported.
§How to restructure left-recursive patterns
Left-recursive definitions can often be rewritten in a right-recursive or iterative form.
For example, instead of:
(?<expr>\g<expr>a|a)
You can write:
a+
Or, when recursion is genuinely required:
(?<expr>a\g<expr>?)
In this version, input is consumed before the recursive call, satisfying fancy-regex’s safety rules.
§Recursion
Recursion is when a subroutine calls itself, directly or indirectly.
§Depth limit
fancy-regex supports recursion up to 20 levels deep.
Let’s look at a simple example to prove this:
(a\g<1>?)
Here we have a pattern which defines capture group 1 as consuming the literal a, followed by calling itself between 0 and 1 times greedily.
With 22 a characters as input, only 20 are matched:
use fancy_regex::{Error, Regex};
fn main() -> Result<(), Error> {
    let pattern = r"(a\g<1>?)";
    let re = Regex::new(pattern)?;
    let haystack = "aaaaaaaaaaaaaaaaaaaaaa"; // 22 a's
    let result = re.find(haystack)?;
    let found = result.unwrap();
    // match is limited to 20 characters due to recursion depth limit
    assert_eq!(found.as_str().len(), 20);
    Ok(())
}
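The effect of the limit can be reproduced with a hand-written recursive function. This std-only sketch assumes that a call beyond the limit simply fails and that the root activation counts as the first level; both are assumptions chosen to reproduce the documented result of 20, not a description of the crate's internals. `MAX_DEPTH` and `group1` are hypothetical names.

```rust
/// Assumed limit, taken from the documented figure of 20 levels.
const MAX_DEPTH: usize = 20;

/// Mimics `(a\g<1>?)` at the start of `s`. `depth` counts how many
/// activations of the group are already on the stack.
fn group1(s: &str, depth: usize) -> Option<usize> {
    if depth >= MAX_DEPTH {
        // Too deep: this call fails, and the `?` in the caller lets it carry on.
        return None;
    }
    let rest = s.strip_prefix('a')?;
    // `\g<1>?`: try the recursive call first, else match nothing.
    let inner = group1(rest, depth + 1).unwrap_or(0);
    Some(1 + inner)
}
```

Calling `group1` at depth 0 on 22 `a` characters consumes 20 of them, while shorter inputs are consumed in full.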
§Unbounded recursion
fancy-regex will return a compile error for patterns which recurse indefinitely.
Let’s look at a simple example to prove this:
(a\g<1>)
Here, capture group 1 consumes the literal a, then calls itself unconditionally. After recursion level 20 is reached, there is not a single path which would return a match.
§Side effects
You may remember that it was stated earlier that the side effect of a subroutine call is that the capture group will be updated. It would be more accurate to say that the capture group is updated for non-recursive subroutine calls only.
Why?
Imagine a pattern like:
(?<foo>a|\(\g<foo>\))
It will match the literal a, or a surrounded by any number of balanced parentheses.
If the recursive subroutine call updated the capture group start position, the opening parenthesis would not be included in the capture group.
If the recursive subroutine call also updated the capture group end position, you’d get the innermost subroutine call’s start position and the outermost subroutine call’s end position, which would then be overridden anyway when the capture group at the root level is exited.
This would produce exceptionally odd and confusing behavior.
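The intended outcome, where only the root-level activation records the capture, can be sketched with a recursive-descent function. This is a std-only illustration of the behavior described here, not the engine's code; `foo_body` and `capture_foo` are hypothetical names.

```rust
/// Mimics the body of `(?<foo>a|\(\g<foo>\))` at byte offset `pos`,
/// returning the offset just past the match.
fn foo_body(s: &str, pos: usize) -> Option<usize> {
    let rest = &s[pos..];
    if rest.starts_with('a') {
        return Some(pos + 1);
    }
    if rest.starts_with('(') {
        // Recursive call: it reports where it ended but records no capture.
        let inner_end = foo_body(s, pos + 1)?;
        if s[inner_end..].starts_with(')') {
            return Some(inner_end + 1);
        }
    }
    None
}

/// Runs the group at the root level, anchored at the start of `s`.
/// Only this outermost activation records the span of `foo`.
fn capture_foo(s: &str) -> Option<&str> {
    let end = foo_body(s, 0)?;
    Some(&s[..end])
}
```

For `((a))` the capture is the whole string, opening parentheses included, which is the result the rule above is designed to give.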
§Backreferences
fancy-regex does not yet support relative recursion level backreferences, and attempting to backreference a capture group which is currently being recursed is at present a compile error.
Example (adapting the previous pattern):
(?<foo>a|\(\g<foo>\)\k<foo>?)
With an input like:
(((a)(a)))
Oniguruma would give you two matches - the two (a)s.
If the compile error were removed and no other changes made, fancy-regex would give a single match of (a)(a, which is clearly not what anyone would expect.
fancy-regex prefers correctness and rejects such patterns rather than exhibiting undefined behavior.
§Absent Operators
The absent operators are worth talking about because they are quite uncommon.
§Absent Repeater
An absent repeater node is defined by the syntax (?~inner_pattern), and it will match any text where the inner pattern does not match (i.e. is absent), including across newlines.
This does not add any new abilities to the engine; it just clarifies intent and can be optimized more easily under the hood. fancy-regex mainly implements this for Oniguruma compatibility.
It works best, or is at least easiest to understand, when the inner pattern is a literal.
§Example
Let’s imagine you have some Markdown, containing some code fences.
It might look something like this:
# Some Heading
Given a todo list like this:
**Input:**
```json
{
"todos": [
{
"content": "Create `some_helper_func` helper in some_file.rs that takes a closure to check the error",
"status": "complete",
"priority": "high"
},
{
"content": "Update error-asserting tests in some_file.rs to use `some_helper_func`",
"status": "complete",
"priority": "high"
},
{
"content": "Run `cargo fmt` and `cargo test`",
"status": "pending",
"priority": "medium"
}
]
}
```
You might expect this output:
**Output:**
```text
High priority tasks have now been completed.
```
Some more text.
Let’s say you want to match all input and output codeblocks.
Typically you could do it like:
[*]{2}(?:In|Out)put:[*]{2}\n```(?:[^`]+|`(?!``))+```
This would match everything inside the codeblock which is not a backtick, or backticks which are not followed by another 2 backticks, until it reaches the 3 backticks marking the end of the codeblock. Generally this type of construct can be quite hard to follow and reason about, to be sure it won’t suffer from catastrophic backtracking.
With the absent repeater, the intention becomes a lot easier to understand - match anything that isn’t 3 backticks, followed by 3 backticks.
[*]{2}(?:In|Out)put:[*]{2}\n```(?~```)```
Where it really shines is in more complicated expressions, such as matching a variable number of backticks at the code fence boundaries. There it is a lot easier to read than an expanded alternative which would avoid catastrophic backtracking.
use fancy_regex::{Error, Regex};
fn main() -> Result<(), Error> {
    // Match a code fence: opening backticks (3+), content (absent the same backticks), closing backticks
    let re = Regex::new(r"(?<!`)(`{3,}(?!`))\w*\n(?~\1)\n(\1)")?;
    // A code fence with 4 backticks, where the inner code contains 3 backticks
    let input = "````text\nsome code with ``` backticks\n````";
    let captures = re.captures(input)?.expect("should match");
    // The overall match spans the entire input
    let m = captures.get(0).unwrap();
    assert_eq!(m.start(), 0);
    assert_eq!(m.end(), input.len());
    assert_eq!(m.as_str(), input);
    // Group 1: the opening 4 backticks
    let open = captures.get(1).unwrap();
    assert_eq!(open.as_str(), "````");
    assert_eq!(open.start(), 0);
    assert_eq!(open.end(), 4);
    // Group 2: the closing 4 backticks
    let close = captures.get(2).unwrap();
    assert_eq!(close.as_str(), "````");
    assert_eq!(close.start(), input.len() - 4);
    assert_eq!(close.end(), input.len());
    Ok(())
}
It also allows the engine to optimize it accordingly.
§Other ways of looking at it
The absent repeater can be considered shorthand for this:
(?((?!absent))\O|)*
Essentially a conditional, which says when the absent expression doesn’t match, match a single character including newlines. When the absent expression does match, match nothing. Repeat greedily.
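For a literal inner pattern, the shorthand above translates almost word for word into a loop. The following std-only sketch follows that reading; it ignores backtracking into the repetition and is not the crate's implementation. The names `absent_repeater_end` and `find_delimited` are hypothetical, and the delimiter is passed as a parameter so that any fence string can be used.

```rust
/// Greedy absent repeater over a literal: while `absent` does not match at
/// the current position, consume one character (newlines included).
/// Returns the byte offset where the repetition stops.
fn absent_repeater_end(haystack: &str, start: usize, absent: &str) -> usize {
    let mut pos = start;
    while let Some(ch) = haystack[pos..].chars().next() {
        if haystack[pos..].starts_with(absent) {
            // The absent expression matches here, so match nothing and stop.
            break;
        }
        pos += ch.len_utf8();
    }
    pos
}

/// Mimics `delim(?~delim)delim` for the first occurrence of `delim` in `text`.
fn find_delimited<'a>(text: &'a str, delim: &str) -> Option<&'a str> {
    let open = text.find(delim)?;
    let body_start = open + delim.len();
    let body_end = absent_repeater_end(text, body_start, delim);
    if text[body_end..].starts_with(delim) {
        Some(&text[open..body_end + delim.len()])
    } else {
        // The repetition ran to the end of the text without finding a closer.
        None
    }
}
```

With three backticks as the delimiter, `find_delimited` returns the first fenced span including both fences, or `None` when the fence is never closed.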
Structs§
- CaptureMatches
- An iterator that yields all non-overlapping capture groups matching a particular regular expression.
- CaptureNames
- An iterator over capture names in a Regex. The iterator returns the name of each group, or None if the group has no name. Because capture group 0 cannot have a name, the first item returned is always None.
- Captures
- A set of capture groups found for a regex.
- Expander
- A set of options for expanding a template string using the contents of capture groups.
- Match
- A single match of a regex or group in an input text
- Matches
- An iterator over all non-overlapping matches for a particular string.
- NoExpand
- NoExpand indicates literal string replacement.
- Regex
- A compiled regular expression.
- RegexBuilder
- A builder for a Regex to allow configuring options.
- RegexOptionsBuilder
- A builder for a Regex to allow configuring options.
- ReplacerRef
- By-reference adaptor for a Replacer
- Split
- An iterator over all substrings delimited by a regex.
- SplitN
- An iterator over at most N substrings delimited by a regex.
- SubCaptureMatches
- Iterator for captured groups in order in which they appear in the regex.
Enums§
- Absent
- Type of absent operator as used for Oniguruma’s absent functionality.
- Assertion
- Type of assertions
- AstNode
- Abstract Syntax Tree node - will be resolved into an Expr before analysis
- BacktrackingControlVerb
- Type of backtracking control verb which affects how backtracking will behave. See https://www.regular-expressions.info/verb.html
- CaptureGroupTarget
- Target of a backreference or subroutine call
- CompileError
- An error as the result of compiling a regex.
- Error
- An error as the result of parsing, compiling or running a regex.
- Expr
- Regular expression AST. This is public for now but may change.
- ExprChildrenIter
- An iterator over the immediate children of an Expr.
- ExprChildrenIterMut
- An iterator over the immediate children of an Expr for mutable access.
- LookAround
- Type of look-around assertion as used for a look-around expression.
- ParseError
- An error for the result of parsing a regex pattern.
- RuntimeError
- An error as the result of executing a regex.
Traits§
- Replacer
- Replacer describes types that can be used to replace matches in a string.
Functions§
- escape
- Escapes special characters in text with ‘\’. Returns a string which, when interpreted as a regex, matches exactly text.
Type Aliases§
- Result
- Result type for this crate with specific error enum.