Struct regex_automata::meta::Config  
pub struct Config { /* private fields */ }
An object describing the configuration of a Regex.
This configuration only includes options for the
non-syntax behavior of a Regex, and can be applied via the
Builder::configure method. For configuring the syntax options, see
util::syntax::Config.
Example: lower the NFA size limit
In some cases, the default size limit might be too big. The size limit can be lowered, which will prevent large regex patterns from compiling.
use regex_automata::meta::Regex;
let result = Regex::builder()
    .configure(Regex::config().nfa_size_limit(Some(20 * (1<<10))))
    // Not even 20KB is enough to build a single large Unicode class!
    .build(r"\pL");
assert!(result.is_err());
Implementations§
impl Config
pub fn match_kind(self, kind: MatchKind) -> Config
Set the match semantics for a Regex.
The default value is MatchKind::LeftmostFirst.
Example
use regex_automata::{meta::Regex, Match, MatchKind};
// By default, leftmost-first semantics are used, which
// disambiguates matches at the same position by selecting
// the one that corresponds earlier in the pattern.
let re = Regex::new("sam|samwise")?;
assert_eq!(Some(Match::must(0, 0..3)), re.find("samwise"));
// But with 'all' semantics, match priority is ignored
// and all match states are included. When coupled with
// a leftmost search, the search will report the last
// possible match.
let re = Regex::builder()
    .configure(Regex::config().match_kind(MatchKind::All))
    .build("sam|samwise")?;
assert_eq!(Some(Match::must(0, 0..7)), re.find("samwise"));
// Beware that this can lead to skipping matches!
// Usually 'all' is used for anchored reverse searches
// only, or for overlapping searches.
assert_eq!(Some(Match::must(0, 4..11)), re.find("sam samwise"));
Ok::<(), Box<dyn std::error::Error>>(())
pub fn utf8_empty(self, yes: bool) -> Config
Toggles whether empty matches are permitted to occur between the code units of a UTF-8 encoded codepoint.
This should generally be enabled when searching a &str or anything that
you otherwise know is valid UTF-8. It should be disabled in all other
cases. Namely, if the haystack is not valid UTF-8 and this is enabled,
then behavior is unspecified.
By default, this is enabled.
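To make the boundary rule concrete, the following sketch uses only the standard library (not regex_automata) to list the byte offsets of a haystack that fall on UTF-8 code point boundaries. The helper name char_boundaries is hypothetical and exists only for illustration. When this option is enabled, these are the only offsets at which an empty match is reported.

```rust
/// Returns every byte offset in `hay` (including `hay.len()`) that falls on
/// a UTF-8 code point boundary. Hypothetical helper, std only.
fn char_boundaries(hay: &str) -> Vec<usize> {
    (0..=hay.len())
        .filter(|&i| hay.is_char_boundary(i))
        .collect()
}
```

For the snowman haystack "☃", which is encoded as three bytes, char_boundaries returns only the offsets 0 and 3. Offsets 1 and 2 would split the code point.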
Example
use regex_automata::{meta::Regex, Match};
let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);
let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);
Ok::<(), Box<dyn std::error::Error>>(())
pub fn auto_prefilter(self, yes: bool) -> Config
Toggles whether automatic prefilter support is enabled.
If this is disabled and Config::prefilter is not set, then the
meta regex engine will not use any prefilters. This can sometimes
be beneficial in cases where you know (or have measured) that the
prefilter leads to overall worse search performance.
By default, this is enabled.
Example
use regex_automata::{meta::Regex, Match};
let re = Regex::builder()
    .configure(Regex::config().auto_prefilter(false))
    .build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
pub fn prefilter(self, pre: Option<Prefilter>) -> Config
Overrides and sets the prefilter to use inside a Regex.
This permits one to forcefully set a prefilter in cases where the caller knows better than whatever the automatic prefilter logic is capable of.
By default, this is set to None and an automatic prefilter will be
used if one could be built. (Assuming Config::auto_prefilter is
enabled, which it is by default.)
Example
This example shows how to set your own prefilter. For a pattern like
Bruce \w+, the automatic prefilter is likely to be constructed so that
it looks for occurrences of "Bruce " (including the trailing space).
In most cases, this is the best choice. But in some cases, running
memchr on B may be faster. One can achieve that behavior by overriding
the automatic prefilter logic and providing a prefilter that just
matches B.
use regex_automata::{
    meta::Regex,
    util::prefilter::Prefilter,
    Match, MatchKind,
};
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["B"])
    .expect("a prefilter");
let re = Regex::builder()
    .configure(Regex::config().prefilter(Some(pre)))
    .build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
Example: incorrect prefilters can lead to incorrect results!
Be warned that setting an incorrect prefilter can lead to missed matches. So if you use this option, ensure your prefilter can never report false negatives. (A false positive is, on the other hand, quite okay and generally unavoidable.)
use regex_automata::{
    meta::Regex,
    util::prefilter::Prefilter,
    Match, MatchKind,
};
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["Z"])
    .expect("a prefilter");
let re = Regex::builder()
    .configure(Regex::config().prefilter(Some(pre)))
    .build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
// Oops! No match found, but there should be one!
assert_eq!(None, re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
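The asymmetry between false positives and false negatives can be seen in a toy model that uses only the standard library. This is a sketch under simplifying assumptions, not how the meta regex engine is implemented: the "regex" is a plain literal, the prefilter is another literal, and the names find_bytes and find_with_prefilter are hypothetical.

```rust
/// Finds the first occurrence of `needle` in `hay` at or after `from`.
fn find_bytes(hay: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if needle.is_empty() || from > hay.len() {
        return None;
    }
    hay[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| from + i)
}

/// Toy search: `prefilter` proposes candidate start offsets and `full`
/// (standing in for the real regex) confirms or rejects each one.
fn find_with_prefilter(hay: &[u8], prefilter: &[u8], full: &[u8]) -> Option<usize> {
    let mut at = 0;
    while let Some(start) = find_bytes(hay, prefilter, at) {
        if hay[start..].starts_with(full) {
            return Some(start); // candidate confirmed
        }
        // False positive: harmless, just keep scanning.
        at = start + 1;
    }
    // Offsets the prefilter never proposed are never checked, so a
    // prefilter with false negatives makes real matches disappear.
    None
}
```

In this model, a prefilter for B on the haystack "Bob and Bruce Wayne" first proposes offset 0, which is rejected, and then offset 8, which is confirmed. A prefilter for Z proposes nothing, so the match is missed.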
pub fn nfa_size_limit(self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, to enforce on the construction of every NFA built by the meta regex engine.
Setting it to None disables the limit. This is not recommended if
you’re compiling untrusted patterns.
Note that this limit is applied to each NFA built, and if any of them exceeds the limit, then construction will fail. This limit does not correspond to the total memory used by all NFAs in the meta regex engine.
This defaults to some reasonable number that permits most reasonable patterns.
Example
use regex_automata::meta::Regex;
let result = Regex::builder()
    .configure(Regex::config().nfa_size_limit(Some(20 * (1<<10))))
    // Not even 20KB is enough to build a single large Unicode class!
    .build(r"\pL");
assert!(result.is_err());
// But notice that building such a regex with the exact same limit
// can succeed depending on other aspects of the configuration. For
// example, a single *forward* NFA will (at time of writing) fit into
// the 20KB limit, but a *reverse* NFA of the same pattern will not.
// So if one configures a meta regex such that a reverse NFA is never
// needed and thus never built, then the 20KB limit will be enough for
// a pattern like \pL!
let result = Regex::builder()
    .configure(Regex::config()
        .nfa_size_limit(Some(20 * (1<<10)))
        // The DFAs are the only thing that (currently) need a reverse
        // NFA. So if both are disabled, the meta regex engine will
        // skip building the reverse NFA. Note that this isn't an API
        // guarantee. A future semver compatible version may introduce
        // new use cases for a reverse NFA.
        .hybrid(false)
        .dfa(false)
    )
    // With both DFAs disabled, no reverse NFA is built, so 20KB suffices.
    .build(r"\pL");
assert!(result.is_ok());
pub fn onepass_size_limit(self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, for the one-pass DFA.
Setting it to None disables the limit. Disabling the limit is
strongly discouraged when compiling untrusted patterns. Even if the
patterns are trusted, it still may not be a good idea, since a one-pass
DFA can use a lot of memory. With that said, as the size of a regex
increases, the likelihood of it being one-pass tends to decrease.
This defaults to some reasonable number that permits most reasonable one-pass patterns.
Example
This shows how to set the one-pass DFA size limit. Note that since
a one-pass DFA is an optional component of the meta regex engine,
this size limit only impacts what is built internally and will never
determine whether a Regex itself fails to build.
use regex_automata::meta::Regex;
let result = Regex::builder()
    .configure(Regex::config().onepass_size_limit(Some(2 * (1<<20))))
    .build(r"\pL{5}");
assert!(result.is_ok());
pub fn hybrid_cache_capacity(self, limit: usize) -> Config
Set the cache capacity, in bytes, for the lazy DFA.
The cache capacity of the lazy DFA determines approximately how much heap memory it is allowed to use to store its state transitions. The state transitions are computed at search time, and if the cache fills up, it is cleared. At this point, any previously generated state transitions are lost and are re-generated if they’re needed again.
This sort of cache filling and clearing works quite well so long as cache clearing happens infrequently. If it happens too often, then the meta regex engine will stop using the lazy DFA and switch over to a different regex engine.
In cases where the cache is cleared too often, it may be possible to give the cache more space and reduce (or eliminate) how often it is cleared. Similarly, sometimes a regex is so big that the lazy DFA isn’t used at all if its cache capacity isn’t big enough.
The capacity set here is a limit on how much memory is used. The actual memory used is only allocated as it’s needed.
Determining the right value for this is a little tricky and will likely
require some profiling. Enabling the logging feature and setting the
log level to trace will also tell you how often the cache is being
cleared.
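The fill-and-clear behavior described above can be modeled with a small cache that is wiped whenever it reaches capacity. This is a toy sketch using only the standard library. It makes no claim about the lazy DFA's actual data structures, and ClearingCache and count_clears are hypothetical names. Capacity is counted in entries here rather than bytes.

```rust
use std::collections::HashMap;

/// Toy cache that is cleared entirely once it reaches capacity.
struct ClearingCache {
    capacity: usize,
    entries: HashMap<u32, u32>,
    clear_count: usize,
}

impl ClearingCache {
    fn new(capacity: usize) -> Self {
        ClearingCache { capacity, entries: HashMap::new(), clear_count: 0 }
    }

    /// Returns the cached value for `key`, computing and storing it on a miss.
    fn get_or_compute(&mut self, key: u32, compute: impl Fn(u32) -> u32) -> u32 {
        if let Some(&v) = self.entries.get(&key) {
            return v;
        }
        if self.entries.len() >= self.capacity {
            // Everything computed so far is lost and must be redone later.
            self.entries.clear();
            self.clear_count += 1;
        }
        let v = compute(key);
        self.entries.insert(key, v);
        v
    }
}

/// Replays `keys` against a cache of the given capacity and returns how
/// many times the cache had to be cleared.
fn count_clears(capacity: usize, keys: &[u32]) -> usize {
    let mut cache = ClearingCache::new(capacity);
    for &k in keys {
        cache.get_or_compute(k, |x| x * 2);
    }
    cache.clear_count
}
```

Replaying the keys 0, 1, 2, 3, 0, 1, 2, 3 with a capacity of 2 clears the cache three times, while a capacity of 4 never clears it. Giving the lazy DFA more cache space aims for the same effect.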
Example
use regex_automata::meta::Regex;
let result = Regex::builder()
    .configure(Regex::config().hybrid_cache_capacity(20 * (1<<20)))
    .build(r"\pL{5}");
assert!(result.is_ok());
pub fn dfa_size_limit(self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, for heap memory used for a fully compiled DFA.
NOTE: If you increase this, you’ll likely also need to increase
Config::dfa_state_limit.
In contrast to the lazy DFA, building a full DFA requires computing
all of its state transitions up front. This can be a very expensive
process, and runs in worst case 2^n time and space (where n is
proportional to the size of the regex). However, a full DFA unlocks
some additional optimization opportunities.
Because full DFAs can be so expensive, the default limits for them are
incredibly small. Generally speaking, if your regex is moderately big
or if you’re using Unicode features (\w is Unicode-aware by default
for example), then you can expect that the meta regex engine won’t even
attempt to build a DFA for it.
If this and Config::dfa_state_limit are set to None, then the
meta regex will not use any sort of limits when deciding whether to
build a DFA. This in turn makes construction of a Regex take
worst case exponential time and space. Even short patterns can result
in huge space blow ups. So it is strongly recommended to keep some kind
of limit set!
The default is set to a small number that permits some simple regexes to get compiled into DFAs in reasonable time.
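The worst case exponential growth can be illustrated with a classic textbook pattern, (a|b)*a(a|b){n}, which matches when the symbol n positions before the last one is a. The sketch below runs subset construction over a hand-built NFA for that pattern and counts the resulting DFA states. It uses only the standard library and is independent of how regex_automata builds its DFAs. The name dfa_state_count is hypothetical.

```rust
use std::collections::HashSet;

/// Counts the DFA states produced by subset construction for the pattern
/// (a|b)*a(a|b){n}. NFA states 0..=n+1 are stored as bits of a u64.
fn dfa_state_count(n: u32) -> usize {
    assert!(n <= 62, "NFA state sets are stored in a u64");
    // State 0 loops on both symbols and also moves to state 1 on 'a'.
    // States 1..=n advance on either symbol. State n+1 is the match state.
    let step = |set: u64, is_a: bool| -> u64 {
        let mut next = 0u64;
        for s in 0..=(n + 1) {
            if (set & (1 << s)) == 0 {
                continue;
            }
            if s == 0 {
                next |= 1;
                if is_a {
                    next |= 1 << 1;
                }
            } else if s <= n {
                next |= 1 << (s + 1);
            }
        }
        next
    };
    // Explore every reachable set of NFA states. Each one is a DFA state.
    let mut seen: HashSet<u64> = HashSet::new();
    let mut stack = vec![1u64];
    seen.insert(1);
    while let Some(set) = stack.pop() {
        for is_a in [true, false] {
            let next = step(set, is_a);
            if seen.insert(next) {
                stack.push(next);
            }
        }
    }
    seen.len()
}
```

The NFA has only n + 2 states, but dfa_state_count(n) evaluates to 2^(n+1): 2 states for n = 0, 16 for n = 3, and 1024 for n = 9. Growth of this kind is what the size and state limits guard against.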
Example
use regex_automata::meta::Regex;
let result = Regex::builder()
    // 100MB is much bigger than the default.
    .configure(Regex::config()
        .dfa_size_limit(Some(100 * (1<<20)))
        // We don't care about size too much here, so just
        // remove the NFA state limit altogether.
        .dfa_state_limit(None))
    .build(r"\pL{5}");
assert!(result.is_ok());
pub fn dfa_state_limit(self, limit: Option<usize>) -> Config
Sets a limit on the total number of NFA states, beyond which no attempt is made to compile a full DFA.
This limit works in concert with Config::dfa_size_limit. Namely,
whereas Config::dfa_size_limit is applied by attempting to construct
a DFA, this limit is used to avoid the attempt in the first place. This
is useful to avoid hefty initialization costs associated with building
a DFA for cases where it is obvious the DFA will ultimately be too big.
By default, this is set to a very small number.
Example
use regex_automata::meta::Regex;
let result = Regex::builder()
    .configure(Regex::config()
        // Sometimes the default state limit rejects DFAs even
        // if they would fit in the size limit. Here, we disable
        // the check on the number of NFA states and just rely on
        // the size limit.
        .dfa_state_limit(None))
    .build(r"(?-u)\w{30}");
assert!(result.is_ok());
pub fn byte_classes(self, yes: bool) -> Config
Toggles whether to attempt to shrink the alphabet for the regex pattern. When enabled, the alphabet is shrunk into a set of equivalence classes, where no two bytes in the same equivalence class can discriminate between a match and a non-match.
WARNING: This is only useful for debugging DFAs. Disabling this does not yield any speed advantages. Indeed, disabling it can result in much higher memory usage. Disabling byte classes is useful for debugging the actual generated transitions because it lets one see the transitions defined on actual bytes instead of the equivalence classes.
This option is enabled by default and should never be disabled unless one is debugging the meta regex engine’s internals.
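As an illustration of what an alphabet of equivalence classes looks like, the following sketch partitions the 256 byte values given the byte ranges that a pattern distinguishes. It is a simplified model using only the standard library, not the engine's implementation, and the function name byte_classes is hypothetical.

```rust
/// Assigns each of the 256 byte values to an equivalence class, given the
/// inclusive byte ranges that a pattern treats specially.
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // boundary[b] is true when byte b must begin a new class.
    let mut boundary = [false; 256];
    for &(start, end) in ranges {
        boundary[start as usize] = true;
        if end < u8::MAX {
            boundary[end as usize + 1] = true;
        }
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256 {
        if b > 0 && boundary[b] {
            class += 1;
        }
        classes[b] = class;
    }
    classes
}
```

For a pattern like [a-z]+, calling byte_classes(&[(b'a', b'z')]) yields three classes: bytes below a, the bytes a through z, and bytes above z. In this model a transition table needs 3 columns per state instead of 256, which is why disabling byte classes can increase memory usage.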
Example
use regex_automata::{meta::Regex, Match};
let re = Regex::builder()
    .configure(Regex::config().byte_classes(false))
    .build(r"[a-z]+")?;
let hay = "!!quux!!";
assert_eq!(Some(Match::must(0, 2..6)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
pub fn line_terminator(self, byte: u8) -> Config
Set the line terminator to be used by the ^ and $ anchors in
multi-line mode.
This option has no effect when CRLF mode is enabled. That is,
regardless of this setting, (?Rm:^) and (?Rm:$) will always treat
\r and \n as line terminators (and will never match between a \r
and a \n).
By default, \n is the line terminator.
Warning: This does not change the behavior of the . metacharacter. To
do that, you’ll need to configure the syntax option
syntax::Config::line_terminator
in addition to this. Otherwise, . will continue to match any
character other than \n.
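The effect of the line terminator on multi-line anchors can be sketched without the regex engine. The model below assumes CRLF mode is disabled, and the helper names line_starts and line_ends are hypothetical. It lists the offsets at which ^ and $ may match for a given terminator byte.

```rust
/// Offsets where a multi-line `^` may match: the start of the haystack and
/// the position immediately after each terminator byte.
fn line_starts(hay: &[u8], term: u8) -> Vec<usize> {
    let mut out = vec![0];
    out.extend(
        hay.iter()
            .enumerate()
            .filter(|&(_, &b)| b == term)
            .map(|(i, _)| i + 1),
    );
    out
}

/// Offsets where a multi-line `$` may match: the position of each
/// terminator byte and the end of the haystack.
fn line_ends(hay: &[u8], term: u8) -> Vec<usize> {
    let mut out: Vec<usize> = hay
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == term)
        .map(|(i, _)| i)
        .collect();
    out.push(hay.len());
    out
}
```

For the haystack "\x00foo\x00" with a NUL terminator, ^ may match at offsets 0, 1 and 5, and $ at offsets 0, 4 and 5, so ^foo$ can match at 1..4. With the default \n terminator, the only candidates are 0 for ^ and 5 for $.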
Example
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
pub fn hybrid(self, yes: bool) -> Config
Toggle whether the hybrid NFA/DFA (also known as the “lazy DFA”) should be available for use by the meta regex engine.
Enabling this does not necessarily mean that the lazy DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the hybrid crate feature is enabled, then this is enabled by
default. Otherwise, if the crate feature is disabled, then this is
always disabled, regardless of its setting by the caller.
pub fn dfa(self, yes: bool) -> Config
Toggle whether a fully compiled DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the dfa-build crate feature is enabled, then this is enabled by
default. Otherwise, if the crate feature is disabled, then this is
always disabled, regardless of its setting by the caller.
pub fn onepass(self, yes: bool) -> Config
Toggle whether a one-pass DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a one-pass DFA will
definitely be used. It just means that it will be available for
use if the meta regex engine thinks it will be useful. (Indeed, a
one-pass DFA can only be used when the regex is one-pass. See the
dfa::onepass module for more details.)
When the dfa-onepass crate feature is enabled, then this is enabled
by default. Otherwise, if the crate feature is disabled, then this is
always disabled, regardless of its setting by the caller.
pub fn backtrack(self, yes: bool) -> Config
Toggle whether a bounded backtracking regex engine should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a bounded backtracker will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the nfa-backtrack crate feature is enabled, then this is enabled
by default. Otherwise, if the crate feature is disabled, then this is
always disabled, regardless of its setting by the caller.
pub fn get_match_kind(&self) -> MatchKind
Returns the match kind on this configuration, as set by
Config::match_kind.
If it was not explicitly set, then a default value is returned.
pub fn get_utf8_empty(&self) -> bool
Returns whether empty matches must fall on valid UTF-8 boundaries, as
set by Config::utf8_empty.
If it was not explicitly set, then a default value is returned.
pub fn get_auto_prefilter(&self) -> bool
Returns whether automatic prefilters are enabled, as set by
Config::auto_prefilter.
If it was not explicitly set, then a default value is returned.
pub fn get_prefilter(&self) -> Option<&Prefilter>
Returns a manually set prefilter, if one was set by
Config::prefilter.
If it was not explicitly set, then None is returned.
pub fn get_nfa_size_limit(&self) -> Option<usize>
Returns the NFA size limit, as set by Config::nfa_size_limit.
If it was not explicitly set, then a default value is returned.
pub fn get_onepass_size_limit(&self) -> Option<usize>
Returns the one-pass DFA size limit, as set by
Config::onepass_size_limit.
If it was not explicitly set, then a default value is returned.
pub fn get_hybrid_cache_capacity(&self) -> usize
Returns the hybrid NFA/DFA cache capacity, as set by
Config::hybrid_cache_capacity.
If it was not explicitly set, then a default value is returned.
pub fn get_dfa_size_limit(&self) -> Option<usize>
Returns the DFA size limit, as set by Config::dfa_size_limit.
If it was not explicitly set, then a default value is returned.
pub fn get_dfa_state_limit(&self) -> Option<usize>
Returns the DFA state limit, expressed as a number of NFA states, as
set by Config::dfa_state_limit.
If it was not explicitly set, then a default value is returned.
pub fn get_byte_classes(&self) -> bool
Returns whether byte classes are enabled, as set by
Config::byte_classes.
If it was not explicitly set, then a default value is returned.
pub fn get_line_terminator(&self) -> u8
Returns the line terminator for this configuration, as set by
Config::line_terminator.
If it was not explicitly set, then a default value is returned.
pub fn get_hybrid(&self) -> bool
Returns whether the hybrid NFA/DFA regex engine may be used, as set by
Config::hybrid.
If it was not explicitly set, then a default value is returned.
pub fn get_dfa(&self) -> bool
Returns whether the DFA regex engine may be used, as set by
Config::dfa.
If it was not explicitly set, then a default value is returned.
pub fn get_onepass(&self) -> bool
Returns whether the one-pass DFA regex engine may be used, as set by
Config::onepass.
If it was not explicitly set, then a default value is returned.
pub fn get_backtrack(&self) -> bool
Returns whether the bounded backtracking regex engine may be used, as
set by Config::backtrack.
If it was not explicitly set, then a default value is returned.