Struct RegexOptionsBuilder

pub struct RegexOptionsBuilder { /* private fields */ }

A builder for configuring options before constructing a Regex.

Implementations

impl RegexOptionsBuilder

pub fn new() -> Self

Create a new regex options builder.

pub fn build(&self, pattern: String) -> Result<Regex>

Build a Regex from the given pattern.

Returns an Error if the pattern could not be parsed.

pub fn case_insensitive(&mut self, yes: bool) -> &mut Self

Enable or disable case-insensitive matching. This lets you control case sensitivity through the builder instead of through a flag embedded in the raw pattern string.

Default is false.

pub fn multi_line(&mut self, yes: bool) -> &mut Self

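Enable or disable multi-line mode. When enabled, ^ and $ match at the start and end of each line rather than only at the start and end of the input.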

pub fn ignore_whitespace(&mut self, yes: bool) -> &mut Self

Enable or disable ignore-whitespace mode, in which insignificant whitespace in the pattern is ignored. See verbose_mode.

pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut Self

Enable or disable the “dot matches any character” flag. When this is enabled, . matches any character. When it is disabled, . matches any character except a newline.

pub fn crlf(&mut self, yes: bool) -> &mut Self

Enable or disable the CRLF mode flag (R).

When enabled, \r\n is treated as a single line ending for the purposes of ^ and $ in multi-line mode, instead of treating \r and \n as separate line endings.

By default, this is disabled. It may be selectively enabled in the regular expression by using the R flag, e.g. (?mR) or (?Rm).

pub fn verbose_mode(&mut self, yes: bool) -> &mut Self

Enable verbose mode in the regular expression.

This is the same as ignore_whitespace.

When enabled, verbose mode permits insignificant whitespace in many places in the regular expression, as well as comments. Comments start with # and continue until the end of the line.

By default, this is disabled. It may be selectively enabled in the regular expression by using the x flag regardless of this setting.

pub fn unicode_mode(&mut self, yes: bool) -> &mut Self

Enable or disable the Unicode flag (u) by default.

By default this is enabled. It may alternatively be selectively disabled in the regular expression itself via the u flag.

Note that unless “allow invalid UTF-8” is enabled (it’s disabled by default), a regular expression will fail to parse if Unicode mode is disabled and a sub-expression could possibly match invalid UTF-8.

WARNING: Unicode mode can greatly increase the size of the compiled DFA, which can noticeably impact both memory usage and compilation time. This is especially noticeable if your regex contains character classes like \w that are impacted by whether Unicode is enabled or not. If Unicode is not necessary, you are encouraged to disable it.

pub fn backtrack_limit(&mut self, limit: usize) -> &mut Self

Limit how many times backtracking may be attempted for fancy regexes (where backtracking is used). If this limit is exceeded, execution returns an error with Error::BacktrackLimitExceeded. This prevents a regex with catastrophic backtracking from running for too long.

Default is 1_000_000 (1 million).

pub fn delegate_size_limit(&mut self, limit: usize) -> &mut Self

Set the approximate size limit of the compiled regular expression.

This option is forwarded to the wrapped regex crate. Note that, depending on the regex features used, multiple delegated sub-regexes may be fed to the regex crate. As such, the actual limit is closer to <number of delegated regexes> * delegate_size_limit.
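As a worked illustration of that multiplication, here is a small sketch using only the standard library. The helper name, the count of three delegated regexes, and the 10 MiB figure are all hypothetical, and the limit is assumed to be expressed in bytes as in the regex crate.

```rust
// Hypothetical helper: rough worst-case total implied by the note above,
// given how many sub-regexes end up delegated to the regex crate.
fn approx_total_limit(delegated_regexes: usize, per_regex_limit: usize) -> usize {
    // Saturate instead of overflowing for very large limits.
    delegated_regexes.saturating_mul(per_regex_limit)
}

fn main() {
    let per_regex_limit = 10 * (1 << 20); // hypothetical 10 MiB limit
    let total = approx_total_limit(3, per_regex_limit);
    assert_eq!(total, 31_457_280); // 30 MiB across three delegated regexes
}
```

The same estimate applies to delegate_dfa_size_limit, which is multiplied in the same way.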

pub fn delegate_dfa_size_limit(&mut self, limit: usize) -> &mut Self

Set the approximate size of the cache used by the DFA.

This option is forwarded to the wrapped regex crate. Note that, depending on the regex features used, multiple delegated sub-regexes may be fed to the regex crate. As such, the actual limit is closer to <number of delegated regexes> * delegate_dfa_size_limit.

pub fn find_not_empty(&mut self, yes: bool) -> &mut Self

Require that matches are non-empty (i.e. match at least one character).

When this is enabled, any match attempt that would result in a zero-length match is rejected.

Default is false.

N.B. When find_not_empty is set and analysis determines the pattern will only ever produce an empty match, compiling the regex will return CompileError::PatternCanNeverMatch instead of silently constructing a regex that can never return a result. This catches the user error at compile time rather than allowing the combination to execute pointlessly at runtime.

pub fn ignore_numbered_groups_when_named_groups_exist(&mut self, yes: bool) -> &mut Self

Treat unnamed capture groups as non-capturing when named groups exist. Prevents accessing capture groups by number from within the pattern (backrefs, subroutine calls) when named groups are present.

pub fn oniguruma_mode(&mut self, yes: bool) -> &mut Self

Attempts to better match Oniguruma’s default behavior.

Currently this changes behavior in the following areas:

§Left and right word bounds

fancy-regex follows the default of other regex engines, such as the regex crate itself, where \< and \> correspond to a left and right word boundary respectively. This differs from Oniguruma’s defaults, which treat them as matching the literals < and >. When this option is set, \< and \> in the pattern match the literals < and > instead of word boundaries.

§Repetition/Quantifiers on empty groups

fancy-regex would normally reject patterns like (?:)+ because the + has nothing to target. In Oniguruma mode, the empty repeat is silently dropped at parse time.

§Example

use fancy_regex::{Regex, RegexBuilder};

let haystack = "turbo::<Fish>";
let regex = r"\<\w*\>";

// By default `\<` and `\>` will match the start and end of a word boundary
let word_bounds_regex = Regex::new(regex).unwrap();
let word_bounds = word_bounds_regex.find(haystack).unwrap().unwrap();
assert_eq!(word_bounds.as_str(), "turbo");

// With the option set they instead match the literal `<` and `>` characters
let literals_regex = RegexBuilder::new(regex).oniguruma_mode(true).build().unwrap();
let literals = literals_regex.find(haystack).unwrap().unwrap();
assert_eq!(literals.as_str(), "<Fish>");