Struct regex_lite::RegexBuilder

pub struct RegexBuilder { /* private fields */ }

A configurable builder for a Regex.

This builder can be used to programmatically set flags such as i (case insensitive) and x (for verbose mode). This builder can also be used to configure things like a size limit on the compiled regular expression.

Implementations

impl RegexBuilder

pub fn new(pattern: &str) -> RegexBuilder

Create a new builder with a default configuration for the given pattern.

If the pattern is invalid or exceeds the configured size limits, then an error will be returned when RegexBuilder::build is called.

pub fn build(&self) -> Result<Regex, Error>

Compiles the pattern given to RegexBuilder::new with the configuration set on this builder.

If the pattern isn’t a valid regex or if a configured size limit was exceeded, then an error is returned.

pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder

This configures whether to enable ASCII case insensitive matching for the entire pattern.

This setting can also be configured using the inline flag i in the pattern. For example, (?i:foo) matches foo case insensitively while (?-i:foo) matches foo case sensitively.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo(?-i:bar)quux")
    .case_insensitive(true)
    .build()
    .unwrap();
assert!(re.is_match("FoObarQuUx"));
// Even though case insensitive matching is enabled in the builder,
// it can be locally disabled within the pattern. In this case,
// `bar` is matched case sensitively.
assert!(!re.is_match("fooBARquux"));
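
As a rough illustration of what "ASCII case insensitive" means, the sketch below uses the standard library's `str::eq_ignore_ascii_case`. It is only an analogy for ASCII-only case folding, not the crate's matching code, and the helper name `ascii_case_insensitive_eq` is hypothetical.

```rust
/// Compares two strings while ignoring case for ASCII letters only.
/// This is a stand-in for the idea of ASCII case folding; it does not
/// use or reproduce regex_lite's matcher.
fn ascii_case_insensitive_eq(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}
```

Under this model, `ascii_case_insensitive_eq("FoObarQuUx", "foobarquux")` is true, while the non-ASCII pair `"\u{c9}"` and `"\u{e9}"` (upper and lower case E with acute accent) compares as different because only ASCII letters are folded.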

pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures multi-line mode for the entire pattern.

Enabling multi-line mode changes the behavior of the ^ and $ anchor assertions. Instead of only matching at the beginning and end of a haystack, respectively, multi-line mode causes them to match at the beginning and end of a line in addition to the beginning and end of a haystack. More precisely, ^ will match at the position immediately following a \n and $ will match at the position immediately preceding a \n.

The behavior of this option is impacted by the RegexBuilder::crlf setting. Namely, CRLF mode treats both \r and \n as line terminators, but ^ and $ never match at the position between a \r and a \n.

This setting can also be configured using the inline flag m in the pattern.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .build()
    .unwrap();
assert_eq!(Some(1..4), re.find("\nfoo\n").map(|m| m.range()));
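
The anchor rules described above can be modeled without the regex engine. The following is a minimal sketch, assuming LF-only line terminators (CRLF mode disabled); the helper names `line_starts` and `line_ends` are hypothetical and are not part of regex_lite.

```rust
/// Byte offsets where `^` could match in multi-line mode:
/// the start of the haystack and every position right after a `\n`.
fn line_starts(hay: &str) -> Vec<usize> {
    let mut out = vec![0];
    for (i, b) in hay.bytes().enumerate() {
        if b == b'\n' {
            out.push(i + 1);
        }
    }
    out
}

/// Byte offsets where `$` could match in multi-line mode:
/// every position right before a `\n` and the end of the haystack.
fn line_ends(hay: &str) -> Vec<usize> {
    let mut out: Vec<usize> = hay
        .bytes()
        .enumerate()
        .filter(|&(_, b)| b == b'\n')
        .map(|(i, _)| i)
        .collect();
    out.push(hay.len());
    out
}
```

For the haystack `"\nfoo\n"`, this model gives start offsets 0, 1 and 5 and end offsets 0, 4 and 5, which is consistent with `^foo$` matching at `1..4` in the example above.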

pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder

This configures dot-matches-new-line mode for the entire pattern.

Perhaps surprisingly, the default behavior for . is not to match any character, but rather, to match any character except for the line terminator (which is \n by default). When this mode is enabled, the behavior changes such that . truly matches any character.

This setting can also be configured using the inline flag s in the pattern.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo.bar")
    .dot_matches_new_line(true)
    .build()
    .unwrap();
let hay = "foo\nbar";
assert_eq!(Some("foo\nbar"), re.find(hay).map(|m| m.as_str()));
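
The rule for `.` can be written down as a small predicate. This is a sketch of the documented behavior only, not the crate's implementation; the function name `dot_matches` and its parameters are hypothetical, and the `crlf` parameter reflects the interaction with RegexBuilder::crlf described in that method's documentation.

```rust
/// Models whether `.` would match the character `c` under the given settings.
/// - `dot_matches_new_line`: the `s` flag described above.
/// - `crlf`: when true, `\r` also counts as a line terminator.
fn dot_matches(c: char, dot_matches_new_line: bool, crlf: bool) -> bool {
    if dot_matches_new_line {
        // With the `s` flag, `.` matches any character.
        return true;
    }
    let is_terminator = c == '\n' || (crlf && c == '\r');
    !is_terminator
}
```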

pub fn crlf(&mut self, yes: bool) -> &mut RegexBuilder

This configures CRLF mode for the entire pattern.

When CRLF mode is enabled, both \r (“carriage return” or CR for short) and \n (“line feed” or LF for short) are treated as line terminators. This results in the following:

Unless dot-matches-new-line mode is enabled, . will now match any character except for \n and \r.
When multi-line mode is enabled, ^ will match immediately following a \n or a \r. Similarly, $ will match immediately preceding a \n or a \r. Neither ^ nor $ will ever match between \r and \n.

This setting can also be configured using the inline flag R in the pattern.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\nfoo\r\n";
// If CRLF mode weren't enabled here, then '$' wouldn't match
// immediately after 'foo', and thus no match would be found.
assert_eq!(Some("foo"), re.find(hay).map(|m| m.as_str()));

This example demonstrates that ^ will never match at a position between \r and \n. ($ will similarly not match between a \r and a \n.)

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\n\r\n";
let ranges: Vec<_> = re.find_iter(hay).map(|m| m.range()).collect();
assert_eq!(ranges, vec![0..0, 2..2, 4..4]);
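
The same result can be reproduced with a small standalone model of where `^` may match when both multi-line and CRLF modes are enabled. This is a sketch of the rule as documented here, not the crate's code, and `crlf_line_starts` is a hypothetical helper name.

```rust
/// Byte offsets where `^` could match with multi-line and CRLF modes on:
/// offset 0, any offset right after `\n`, and any offset right after a
/// `\r` that is not immediately followed by `\n`.
fn crlf_line_starts(hay: &str) -> Vec<usize> {
    let bytes = hay.as_bytes();
    let mut out = vec![0];
    for i in 1..=bytes.len() {
        let prev = bytes[i - 1];
        let next = bytes.get(i).copied();
        let after_lf = prev == b'\n';
        // Never split a `\r\n` pair: a `\r` only ends a line on its own.
        let after_lone_cr = prev == b'\r' && next != Some(b'\n');
        if after_lf || after_lone_cr {
            out.push(i);
        }
    }
    out
}
```

For `"\r\n\r\n"` the model yields offsets 0, 2 and 4, matching the ranges in the example above. A lone `\r`, as in `"a\rb"`, still starts a new line at offset 2.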

pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder

This configures swap-greed mode for the entire pattern.

When swap-greed mode is enabled, patterns like a+ will become non-greedy and patterns like a+? will become greedy. In other words, the meanings of a+ and a+? are switched.

This setting can also be configured using the inline flag U in the pattern.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"a+")
    .swap_greed(true)
    .build()
    .unwrap();
assert_eq!(Some("a"), re.find("aaa").map(|m| m.as_str()));
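
To make the switch concrete, here is a toy model of matching `a+` or `a+?` at the start of a haystack. It is deliberately simplified (anchored at offset 0, a single repetition, no searching) and is not how the crate works internally; `match_a_plus` and its parameters are hypothetical.

```rust
/// Matches a run of `a` at the start of `hay`.
/// - `lazy`: true if the pattern was written as `a+?`, false for `a+`.
/// - `swap_greed`: models the `U` flag, which inverts the written greediness.
fn match_a_plus(hay: &str, lazy: bool, swap_greed: bool) -> Option<&str> {
    let run = hay.bytes().take_while(|&b| b == b'a').count();
    if run == 0 {
        return None;
    }
    // Swap-greed flips the meaning: XOR of the two booleans.
    let effective_lazy = lazy != swap_greed;
    // A lazy `+` stops after one repetition; a greedy one takes the whole run.
    let len = if effective_lazy { 1 } else { run };
    Some(&hay[..len])
}
```

With swap-greed on, the model returns `"a"` for `a+` and `"aaa"` for `a+?` on the haystack `"aaa"`, mirroring the example above.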

pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder

This configures verbose mode for the entire pattern.

When enabled, whitespace will be treated as insignificant in the pattern and # can be used to start a comment until the next new line.

Normally, in most places in a pattern, whitespace is treated literally. For example, a space followed by + will match one or more literal space characters.

When verbose mode is enabled, \# can be used to match a literal # and \ followed by a space can be used to match a literal ASCII space character.

Verbose mode is useful for permitting regexes to be formatted and broken up more nicely. This may make them more easily readable.

This setting can also be configured using the inline flag x in the pattern.

The default for this is false.

Example

use regex_lite::RegexBuilder;

let pat = r"
    \b
    (?<first>[A-Z]\w*)  # always start with uppercase letter
    \s+                 # whitespace should separate names
    (?: # middle name can be an initial!
        (?:(?<initial>[A-Z])\.|(?<middle>[A-Z]\w*))
        \s+
    )?
    (?<last>[A-Z]\w*)
    \b
";
let re = RegexBuilder::new(pat)
    .ignore_whitespace(true)
    .build()
    .unwrap();

let caps = re.captures("Harry Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry J. Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(Some("J"), caps.name("initial").map(|m| m.as_str()));
assert_eq!(None, caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry James Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(None, caps.name("initial").map(|m| m.as_str()));
assert_eq!(Some("James"), caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);
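
For intuition about what verbose mode discards, the sketch below strips insignificant whitespace and comments from a pattern string. It is a simplified model under stated assumptions: it treats every backslash escape as a two-character unit and does not special-case character classes or any other syntax, so it should not be taken as the crate's parser. The name `strip_verbose` is hypothetical.

```rust
/// Removes unescaped whitespace and `#` comments from a verbose pattern.
/// Escapes such as `\#` and `\ ` are kept intact.
fn strip_verbose(pattern: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // Keep the backslash and whatever character it escapes.
            '\\' => {
                out.push('\\');
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            // Skip a comment up to and including the next new line.
            '#' => {
                for c2 in chars.by_ref() {
                    if c2 == '\n' {
                        break;
                    }
                }
            }
            // Unescaped whitespace is insignificant.
            c if c.is_whitespace() => {}
            c => out.push(c),
        }
    }
    out
}
```

For instance, the input `"a  b # comment\n c"` reduces to `"abc"`, while the escapes in `r"\# \ x"` survive as `r"\#\ x"`.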

pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder

Sets the approximate size limit, in bytes, of the compiled regex.

This roughly corresponds to the amount of heap memory, in bytes, occupied by a single regex. If the regex would otherwise approximately exceed this limit, then compiling that regex will fail.

The main utility of a method like this is to avoid compiling regexes that use an unexpected amount of resources, such as time and memory. Even if the memory usage of a large regex is acceptable, its search time may not be. Namely, worst case time complexity for search is O(m * n), where m ~ len(pattern) and n ~ len(haystack). That is, search time depends, in part, on the size of the compiled regex. This means that putting a limit on the size of the regex limits how much a regex can impact search time.
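
As a back-of-the-envelope illustration of the bound, the sketch below multiplies a pattern size by a haystack size. The function name `worst_case_steps` and the sizes used are hypothetical, and the result is only an order-of-magnitude estimate of work, not a measurement of the crate.

```rust
/// Rough upper bound on search work: proportional to m * n, where
/// `m` approximates the pattern size and `n` the haystack length.
/// Saturating multiplication avoids overflow for very large inputs.
fn worst_case_steps(m: usize, n: usize) -> usize {
    m.saturating_mul(n)
}
```

Under this estimate, a pattern of size 1,000 searched against a 1,000,000 byte haystack is bounded by about 1,000,000,000 steps, and doubling the pattern size doubles the bound. That is why capping the compiled size also caps the worst-case search cost per haystack byte.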

The default for this is some reasonable number that permits most patterns to compile successfully.

Example

use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"\w").size_limit(100).build().is_err());

pub fn nest_limit(&mut self, limit: u32) -> &mut RegexBuilder

Set the nesting limit for this parser.

The nesting limit controls how deep the abstract syntax tree is allowed to be. If the AST exceeds the given limit (e.g., with too many nested groups), then an error is returned by the parser.

The purpose of this limit is to act as a heuristic to prevent stack overflow for consumers that do structural induction on an AST using explicit recursion. While this crate never does this (instead using constant stack space and moving the call stack to the heap), other crates may.

This limit is not checked until the entire AST is parsed. Therefore, if callers want to put a limit on the amount of heap space used, then they should impose a limit on the length, in bytes, of the concrete pattern string. In particular, this is viable since this parser implementation will limit itself to heap space proportional to the length of the pattern string. See also the untrusted inputs section in the top-level crate documentation for more information about this.
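
A length check of the kind suggested above might look like the following sketch, which would run before the pattern is handed to the builder. The constant `MAX_PATTERN_LEN`, its value, and the helper `check_pattern_len` are assumptions chosen for illustration; an appropriate limit depends on the application.

```rust
/// Hypothetical cap on the pattern length, in bytes.
const MAX_PATTERN_LEN: usize = 1_000;

/// Rejects patterns whose byte length exceeds the cap, so that an
/// untrusted pattern cannot make the parser allocate unbounded heap space.
fn check_pattern_len(pattern: &str) -> Result<&str, String> {
    if pattern.len() > MAX_PATTERN_LEN {
        Err(format!(
            "pattern is {} bytes, limit is {} bytes",
            pattern.len(),
            MAX_PATTERN_LEN
        ))
    } else {
        Ok(pattern)
    }
}
```

A caller would pass only the `Ok` value on to RegexBuilder::new and report the error otherwise.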

Note that a nest limit of 0 will return a nest limit error for most patterns but not all. For example, a nest limit of 0 permits a but not ab, since ab requires an explicit concatenation, which results in a nest depth of 1. In general, a nest limit does not manifest in an obvious way in the concrete syntax, so it should not be used in a granular way.

Example

use regex_lite::RegexBuilder;

assert!(RegexBuilder::new(r"").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"a").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"(a)").nest_limit(0).build().is_err());
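
The notion of nest depth can be sketched with a toy syntax tree. The `Ast` type below is hypothetical and far simpler than any real regex AST; it only illustrates how a literal has depth 0 while a concatenation or a group adds one level, as described above. It also uses explicit recursion, which is exactly the kind of consumer the limit is meant to protect.

```rust
/// A toy AST with just enough structure to talk about nesting.
/// This is not the representation used by regex_lite.
enum Ast {
    Literal(char),
    Concat(Vec<Ast>),
    Group(Box<Ast>),
}

/// Computes the nest depth by explicit recursion.
fn depth(ast: &Ast) -> u32 {
    match ast {
        Ast::Literal(_) => 0,
        Ast::Concat(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        Ast::Group(inner) => 1 + depth(inner),
    }
}

/// Returns true if the tree fits within the given nest limit.
fn within_nest_limit(ast: &Ast, limit: u32) -> bool {
    depth(ast) <= limit
}
```

In this model, a lone literal fits within a limit of 0, while a two-literal concatenation or a group around a literal needs a limit of at least 1.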