pub struct Regex<'a> { /* private fields */ }
Available on crate features regex-automata and regex-cp only.
A compiled regular expression for searching Unicode haystacks.
A Regex can be used to search haystacks, split haystacks into substrings, or replace substrings in a haystack with a different substring. All searching is done with an implicit (?s:.)*? at the beginning and end of a pattern. To force an expression to match the whole string (or a prefix or a suffix), you can use an anchored search or an anchor like ^ or $ (or \A and \z).
§Overview
The most important methods are as follows:
- Regex::new compiles a regex using the default configuration. A Builder permits setting a non-default configuration. (For example, case insensitive matching, verbose mode and others.)
- Regex::is_match reports whether a match exists in a particular haystack.
- Regex::find reports the byte offsets of a match in a haystack, if one exists. Regex::find_iter returns an iterator over all such matches.
- Regex::captures returns a Captures, which reports both the byte offsets of a match in a haystack and the byte offsets of each matching capture group from the regex in the haystack. Regex::captures_iter returns an iterator over all such matches.
§Example
use ib_matcher::regex::cp::Regex;
let re = Regex::new(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")?;
assert!(re.is_match("2010-03-14"));
With IbMatcher’s Chinese pinyin and Japanese romaji matching:
// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
regex::{cp::Regex, Match},
};
let config = MatchConfig::builder()
.pinyin(PinyinMatchConfig::default())
.romaji(RomajiMatchConfig::default())
.build();
let re = Regex::builder()
.ib(config.shallow_clone())
.build("raki.suta")
.unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));
let re = Regex::builder()
.ib(config.shallow_clone())
.build("pysou.*?(any|every)thing")
.unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));
let config = MatchConfig::builder()
.pinyin(PinyinMatchConfig::default())
.romaji(RomajiMatchConfig::default())
.mix_lang(true)
.build();
let re = Regex::builder()
.ib(config.shallow_clone())
.build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
.unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));
For more examples and the syntax, see crate::regex.
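The spans asserted in the examples above are byte offsets into the UTF-8 haystack, not character counts. The following std-only sketch does not use ib_matcher at all; the helper name byte_span_of_chars is made up for illustration. It shows why the five characters らき☆すた occupy bytes 3..18:

```rust
use std::ops::Range;

/// Hypothetical helper (not part of ib_matcher): the byte range covered by
/// the characters `first..last` (character indices, end-exclusive) of `s`.
/// Panics if either index is larger than the number of characters.
fn byte_span_of_chars(s: &str, first: usize, last: usize) -> Range<usize> {
    // Byte offset at which each character starts, plus the end of the string.
    let mut starts: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    starts.push(s.len());
    starts[first]..starts[last]
}

/// True if every character of `s` takes exactly `n` bytes in UTF-8.
fn all_chars_have_len(s: &str, n: usize) -> bool {
    s.chars().all(|c| c.len_utf8() == n)
}
```

Every character of 「らき☆すた」 is encoded as 3 bytes, so byte_span_of_chars("「らき☆すた」", 1, 6) is 3..18. Likewise, "拼音搜索Everything" is 4 × 3 + 10 = 22 bytes long, which is why the second example reports 0..22.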
§Case insensitivity
To enable case insensitivity:
use ib_matcher::{matcher::{PinyinMatchConfig, PlainMatchConfig, MatchConfig}, regex::cp::Regex};
let re = Regex::builder().ib(MatchConfig::default()).build("foo").unwrap();
assert!(re.is_match("FOO"));
// Alternatively, with `case_insensitive()`:
let re = Regex::builder()
.ib(MatchConfig::builder()
.case_insensitive(true)
.pinyin(PinyinMatchConfig::default())
.build())
.build("pyss")
.unwrap();
assert!(re.is_match("PY搜索"));
Note that enabling syntax.case_insensitive currently prevents ib (i.e. pinyin and romaji matching) from working. For now, set only MatchConfigBuilder::case_insensitive (PlainMatchConfigBuilder::case_insensitive).
If you need case-insensitive character classes, write (?i:[a-z]) instead for now.
§Custom matching callbacks
Custom matching callbacks can be used to implement ad hoc look-around, backreferences, balancing groups/recursion/subroutines, combining domain-specific parsers, etc.
Basic usage:
// cargo add ib-matcher --features regex,regex-callback
use ib_matcher::regex::cp::Regex;
let re = Regex::builder()
.callback("ascii", |input, at, push| {
let haystack = &input.haystack()[at..];
if haystack.len() > 0 && haystack[0].is_ascii() {
push(1);
}
})
.build(r"(ascii)+\d(ascii)+")
.unwrap();
let hay = "that4U this4me";
assert_eq!(&hay[re.find(hay).unwrap().span()], " this4me");
§Look-around
use ib_matcher::regex::cp::Regex;
let re = Regex::builder()
.callback("lookahead_is_ascii", |input, at, push| {
let haystack = &input.haystack()[at..];
if haystack.len() > 0 && haystack[0].is_ascii() {
push(0);
}
})
.build(r"[\x00-\x7f]+?\d(lookahead_is_ascii)")
.unwrap();
let hay = "that4U,this4me1plz";
assert_eq!(
re.find_iter(hay).map(|m| &hay[m.span()]).collect::<Vec<_>>(),
vec![",this4", "me1"]
);
§Balancing groups
use std::{cell::RefCell, rc::Rc};
use ib_matcher::regex::cp::Regex;
let count = Rc::new(RefCell::new(0));
let re = Regex::builder()
.callback("open_quote", {
let count = count.clone();
move |input, at, push| {
if at < 2 || input.haystack()[at - 2] != b'\\' {
let mut count = count.borrow_mut();
*count += 1;
push(0);
}
}
})
.callback("close_quote", move |input, at, push| {
if at < 2 || input.haystack()[at - 2] != b'\\' {
let mut count = count.borrow_mut();
if *count > 0 {
push(0);
}
*count -= 1;
}
})
.build(r"'(open_quote).*?'(close_quote)")
.unwrap();
let hay = r"'one' 'two\'three' 'four'";
assert_eq!(
re.find_iter(hay).map(|m| &hay[m.span()]).collect::<Vec<_>>(),
vec!["'one'", r"'two\'three'", "'four'"]
);
(In this simple example, just using '([^'\\]+?|\\')*' is actually enough, but there are more complex cases where balancing groups (or recursion/subroutines) are necessary.)
§Synchronization and cloning
In order to make the Regex API convenient, most of the routines hide the fact that a Cache is needed at all. To achieve this, a memory pool is used internally to retrieve Cache values in a thread safe way that also permits reuse. This in turn implies that every such search call requires some form of synchronization. Usually this synchronization is fast enough to not notice, but in some cases, it can be a bottleneck. This typically occurs when all of the following are true:
- The same Regex is shared across multiple threads simultaneously, usually via a util::lazy::Lazy or something similar from the once_cell or lazy_static crates.
- The primary unit of work in each thread is a regex search.
- Searches are run on very short haystacks.
This particular case can lead to high contention on the pool used by a Regex internally, which can in turn increase latency to a noticeable effect. This cost can be mitigated in one of the following ways:
- Use a distinct copy of a Regex in each thread, usually by cloning it. Cloning a Regex does not do a deep copy of its read-only component. But it does lead to each Regex having its own memory pool, which in turn eliminates the problem of contention. In general, this technique should not result in any additional memory usage when compared to sharing the same Regex across multiple threads simultaneously.
- Use lower level APIs, like Regex::try_find, which permit passing a Cache explicitly. In this case, it is up to you to determine how best to provide a Cache. For example, you might put a Cache in thread-local storage if your use case allows for it.
Overall, this is an issue that happens rarely in practice, but it can happen.
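The thread-local option mentioned in the list above can be sketched with the standard library alone. Everything in this block is an assumption made for illustration: Cache is a stand-in struct rather than the crate's real type, and with_thread_cache and count_search are hypothetical names. In real code the closure would call one of the try_* routines with the borrowed cache.

```rust
use std::cell::RefCell;

/// Stand-in for the crate's `Cache` type (hypothetical; a real cache would
/// come from `create_cache()` and hold the engine's scratch memory).
#[derive(Default)]
struct Cache {
    searches: usize,
}

thread_local! {
    // One cache per thread, created lazily on first use. Because no other
    // thread can reach it, no lock or pool is needed to access it.
    static CACHE: RefCell<Cache> = RefCell::new(Cache::default());
}

/// Runs `search` with exclusive access to the current thread's cache.
/// Calling this again from inside `search` would panic on the `RefCell`.
fn with_thread_cache<R>(search: impl FnOnce(&mut Cache) -> R) -> R {
    CACHE.with(|cell| {
        let mut guard = cell.borrow_mut();
        search(&mut *guard)
    })
}

/// Hypothetical "search" that only records that the cache was used.
fn count_search(cache: &mut Cache) -> usize {
    cache.searches += 1;
    cache.searches
}
```

Each thread sees its own counter: two calls on one thread return 1 and then 2, while the first call on a freshly spawned thread returns 1 again.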
§Warning: spin-locks may be used in alloc-only mode
When this crate is built without the std feature and the high level APIs on a Regex are used, then a spin-lock will be used to synchronize access to an internal pool of Cache values. This may be undesirable because a spin-lock is effectively impossible to implement correctly in user space. That is, more concretely, the spin-lock could result in a deadlock.
If you want to avoid the use of spin-locks when the std feature is disabled, then you must use APIs that accept a Cache value explicitly. For example, Regex::try_find.
Implementations§
impl<'a> Regex<'a>
pub fn new(pattern: &str) -> Result<Self, BuildError>
pub fn config() -> Config
pub fn builder<'f1>() -> Builder<'a, 'f1>
Return a builder for configuring the construction of a Regex.
This is a convenience routine to avoid needing to import the Builder type in common cases.
§Example: change the line terminator
This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:
use ib_matcher::regex::{cp::Regex, util::{syntax, look::LookMatcher}, Match};
let mut lookm = LookMatcher::new();
lookm.set_line_terminator(b'\x00');
let re = Regex::builder()
.syntax(syntax::Config::new().multi_line(true))
.configure(Regex::config().look_matcher(lookm))
.build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
impl<'a> Regex<'a>
High level convenience routines for using a regex to search a haystack.
pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. (Consider how this might make a difference given the regex a+ on the haystack aaaaaaaaaaaaaaa. This routine may stop after it sees the first a, but routines like find need to continue searching because + is greedy by default.)
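To make the difference concrete, here is a conceptual, hand-written sketch for the single pattern a+. It is not how any engine in this crate is implemented, and both function names are hypothetical. Each function also reports how many bytes it had to inspect:

```rust
use std::ops::Range;

/// Sketch of an `is_match`-style check for the fixed pattern `a+`: it can
/// return as soon as one `a` is seen. Returns (matched, bytes inspected).
fn is_match_a_plus(haystack: &str) -> (bool, usize) {
    for (i, b) in haystack.bytes().enumerate() {
        if b == b'a' {
            return (true, i + 1);
        }
    }
    (false, haystack.len())
}

/// Sketch of a `find`-style search for `a+`: after the first `a` it must
/// keep scanning to the end of the run, because `+` is greedy.
/// Returns the match span and the number of bytes inspected.
fn find_a_plus(haystack: &str) -> Option<(Range<usize>, usize)> {
    let bytes = haystack.as_bytes();
    let start = bytes.iter().position(|&b| b == b'a')?;
    let mut end = start;
    while end < bytes.len() && bytes[end] == b'a' {
        end += 1;
    }
    // Everything up to the end of the run was read, plus the one non-`a`
    // byte that terminated the run, if there was one.
    let inspected = (end + 1).min(bytes.len());
    Some((start..end, inspected))
}
```

On a haystack of fifteen a characters, is_match_a_plus inspects 1 byte, while find_a_plus inspects all 15 before it can report the span 0..15.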
§Example
use ib_matcher::regex::cp::Regex;
let re = Regex::new("foo[0-9]+bar")?;
assert!(re.is_match("foo12345bar"));
assert!(!re.is_match("foobar"));
§Example: consistency with search APIs
is_match is guaranteed to return true whenever find returns a match. This includes searches that are executed entirely within a codepoint:
use ib_matcher::regex::{cp::Regex, Input};
let re = Regex::new("a*")?;
// This doesn't match because the default configuration bans empty
// matches from splitting a codepoint.
assert!(!re.is_match(Input::new("☃").span(1..2)));
assert_eq!(None, re.find(Input::new("☃").span(1..2)));
Notice that when UTF-8 mode is disabled, then the above reports a match because the restriction against zero-width matches that split a codepoint has been lifted:
use ib_matcher::regex::{cp::Regex, Input, Match};
let re = Regex::builder()
.configure(Regex::config().utf8(false))
.build("a*")?;
assert!(re.is_match(Input::new("☃").span(1..2)));
assert_eq!(
Some(Match::must(0, 1..1)),
re.find(Input::new("☃").span(1..2)),
);
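The span 1..2 used above lies strictly inside the snowman's three-byte UTF-8 encoding, so both candidate positions for an empty match (offsets 1 and 2) fall in the middle of a codepoint. A std-only sketch of that check, using a hypothetical helper name:

```rust
/// Returns true if an empty match at byte offset `at` would split a
/// UTF-8 encoded codepoint in `haystack`. Offsets past the end of the
/// haystack are reported as `false` since no match can occur there.
fn splits_codepoint(haystack: &str, at: usize) -> bool {
    at <= haystack.len() && !haystack.is_char_boundary(at)
}
```

For "☃" (3 bytes), offsets 0 and 3 are character boundaries, while offsets 1 and 2 are not. In UTF-8 mode an empty match is therefore not reported anywhere within 1..2.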
A similar idea applies when using line anchors with CRLF mode enabled, which prevents them from matching between a \r and a \n.
use ib_matcher::regex::{cp::Regex, Input, Match};
let re = Regex::new(r"(?Rm:$)")?;
assert!(!re.is_match(Input::new("\r\n").span(1..1)));
// A regular line anchor, which only considers \n as a
// line terminator, will match.
let re = Regex::new(r"(?m:$)")?;
assert!(re.is_match(Input::new("\r\n").span(1..1)));
pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>
Executes a leftmost search and returns the first match that is found, if one exists.
§Example
use ib_matcher::regex::{cp::Regex, Match};
let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(Match::must(0, 0..8)), re.find("foo12345"));
pub fn captures<'h, I: Into<Input<'h>>>(&self, input: I, caps: &mut Captures) -> Result<(), MatchError>
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.
§Example
use ib_matcher::regex::{cp::Regex, Span};
let re = Regex::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let mut caps = re.create_captures();
re.captures("2010-03-14", &mut caps)?;
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
pub fn find_iter<'h, I: Into<Input<'h>>>(&'h self, input: I) -> impl Iterator<Item = Match> + 'h
Returns an iterator over all non-overlapping leftmost matches in the given haystack. If no match exists, then the iterator yields no elements.
§Example
use ib_matcher::regex::{cp::Regex, Match};
let re = Regex::new("foo[0-9]+")?;
let haystack = "foo1 foo12 foo123";
let matches: Vec<Match> = re.find_iter(haystack).collect();
assert_eq!(matches, vec![
Match::must(0, 0..4),
Match::must(0, 5..10),
Match::must(0, 11..17),
]);
pub fn captures_iter<'h, I: Into<Input<'h>>>(&'h self, input: I) -> impl Iterator<Item = Captures> + 'h
Returns an iterator over all non-overlapping Captures values. If no match exists, then the iterator yields no elements.
This yields the same matches as Regex::find_iter, but it includes the spans of all capturing groups that participate in each match.
Tip: See util::iter::Searcher for how to correctly iterate over all matches in a haystack while avoiding the creation of a new Captures value for every match. (Which you are forced to do with an Iterator.)
§Example
use ib_matcher::regex::{cp::Regex, Span};
let re = Regex::new("foo(?P<numbers>[0-9]+)")?;
let haystack = "foo1 foo12 foo123";
let matches: Vec<Span> = re
.captures_iter(haystack)
// The unwrap is OK since 'numbers' matches if the pattern matches.
.map(|caps| caps.get_group_by_name("numbers").unwrap())
.collect();
assert_eq!(matches, vec![
Span::from(3..4),
Span::from(8..10),
Span::from(14..17),
]);
Methods from Deref<Target = BoundedBacktracker>§
pub fn create_cache(&self) -> Cache
Available on crate feature regex-nfa only.
Create a new cache for this regex.
The cache returned should only be used for searches for this regex. If you want to reuse the cache for another regex, then you must call Cache::reset with that regex (or, equivalently, BoundedBacktracker::reset_cache).
pub fn create_captures(&self) -> Captures
Available on crate feature regex-nfa only.
Create a new empty set of capturing groups that is guaranteed to be valid for the search APIs on this BoundedBacktracker.
A Captures value created for a specific BoundedBacktracker cannot be used with any other BoundedBacktracker.
This is a convenience function for Captures::all. See the Captures documentation for an explanation of its alternative constructors that permit the BoundedBacktracker to do less work during a search, and thus might make it faster.
pub fn reset_cache(&self, cache: &mut Cache)
Available on crate feature regex-nfa only.
Reset the given cache such that it can be used for searching with this BoundedBacktracker (and only this BoundedBacktracker).
A cache reset permits reusing memory already allocated in this cache with a different BoundedBacktracker.
§Example
This shows how to re-purpose a cache for use with a different BoundedBacktracker.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re1 = BoundedBacktracker::new(r"\w")?;
let re2 = BoundedBacktracker::new(r"\W")?;
let mut cache = re1.create_cache();
assert_eq!(
Some(Ok(Match::must(0, 0..2))),
re1.try_find_iter(&mut cache, "Δ").next(),
);
// Using 'cache' with re2 is not allowed. It may result in panics or
// incorrect results. In order to re-purpose the cache, we must reset
// it with the BoundedBacktracker we'd like to use it with.
//
// Similarly, after this reset, using the cache with 're1' is also not
// allowed.
cache.reset(&re2);
assert_eq!(
Some(Ok(Match::must(0, 0..3))),
re2.try_find_iter(&mut cache, "☃").next(),
);
pub fn pattern_len(&self) -> usize
Available on crate feature regex-nfa only.
Returns the total number of patterns compiled into this BoundedBacktracker.
In the case of a BoundedBacktracker that contains no patterns, this returns 0.
§Example
This example shows the pattern length for a BoundedBacktracker that never matches:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::never_match()?;
assert_eq!(re.pattern_len(), 0);
And another example for a BoundedBacktracker that matches at every position:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::always_match()?;
assert_eq!(re.pattern_len(), 1);
And finally, a BoundedBacktracker that was constructed from multiple patterns:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::new_many(&["[0-9]+", "[a-z]+", "[A-Z]+"])?;
assert_eq!(re.pattern_len(), 3);
pub fn get_config(&self) -> &Config
Available on crate feature regex-nfa only.
Return the config for this BoundedBacktracker.
pub fn get_nfa(&self) -> &NFA
Available on crate feature regex-nfa only.
Returns a reference to the underlying NFA.
pub fn max_haystack_len(&self) -> usize
Available on crate feature regex-nfa only.
Returns the maximum haystack length supported by this backtracker.
This routine is a function of both Config::visited_capacity and the internal size of the backtracker’s NFA.
§Example
This example shows how the maximum haystack length can vary depending on the size of the regex itself. Note though that the specific maximum values here are not an API guarantee. The default visited capacity is subject to change and not covered by semver.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, MatchError,
};
// If you're only using ASCII, you get a big budget.
let re = BoundedBacktracker::new(r"(?-u)\w+")?;
let mut cache = re.create_cache();
assert_eq!(re.max_haystack_len(), 299_592);
// Things work up to the max.
let mut haystack = "a".repeat(299_592);
let expected = Some(Ok(Match::must(0, 0..299_592)));
assert_eq!(expected, re.try_find_iter(&mut cache, &haystack).next());
// But you'll get an error if you provide a haystack that's too big.
// Notice that we use the 'try_find_iter' routine instead, which
// yields Result<Match, MatchError> instead of Match.
haystack.push('a');
let expected = Some(Err(MatchError::haystack_too_long(299_593)));
assert_eq!(expected, re.try_find_iter(&mut cache, &haystack).next());
// Unicode inflates the size of the underlying NFA quite a bit, and
// thus means that the backtracker can only handle smaller haystacks,
// assuming that the visited capacity remains unchanged.
let re = BoundedBacktracker::new(r"\w+")?;
assert!(re.max_haystack_len() <= 7_000);
// But we can increase the visited capacity to handle bigger haystacks!
let re = BoundedBacktracker::builder()
.configure(BoundedBacktracker::config().visited_capacity(1<<20))
.build(r"\w+")?;
assert!(re.max_haystack_len() >= 25_000);
assert!(re.max_haystack_len() <= 28_000);
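The relationship between visited capacity, NFA size and maximum haystack length can be approximated with simple arithmetic. The model below is an assumption made for illustration, not the crate's actual code. It supposes the backtracker keeps one bit per (NFA state, haystack position) pair and ignores any internal rounding. The function name is hypothetical.

```rust
/// Rough model (an assumption, not the crate's implementation): the visited
/// set holds `8 * visited_capacity_bytes` bits, one bit is needed per
/// (NFA state, haystack position) pair, and positions run from 0 through
/// the haystack length inclusive.
fn approx_max_haystack_len(visited_capacity_bytes: usize, nfa_states: usize) -> usize {
    let bits = visited_capacity_bytes.saturating_mul(8);
    // Guard against division by zero for a degenerate, empty NFA.
    (bits / nfa_states.max(1)).saturating_sub(1)
}
```

With an assumed capacity of 256 KiB and a hypothetical 7-state NFA, the model gives 299_592, which agrees with the figure in the example above. Both inputs are guesses chosen for illustration. The model still shows the two trends the example demonstrates: a larger NFA lowers the limit, and a larger visited capacity raises it.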
pub fn try_is_match<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I) -> Result<bool, MatchError>
Available on crate feature regex-nfa only.
Returns true if and only if this regex matches the given haystack.
In the case of a backtracking regex engine, and unlike most other regex engines in this crate, short circuiting isn’t practical. However, this routine may still be faster because it instructs backtracking to not keep track of any capturing groups.
§Errors
This routine only errors if the search could not complete. For this backtracking regex engine, this only occurs when the haystack length exceeds BoundedBacktracker::max_haystack_len.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();
assert!(re.try_is_match(&mut cache, "foo12345bar")?);
assert!(!re.try_is_match(&mut cache, "foobar")?);
§Example: consistency with search APIs
is_match is guaranteed to return true whenever find returns a match. This includes searches that are executed entirely within a codepoint:
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Input,
};
let re = BoundedBacktracker::new("a*")?;
let mut cache = re.create_cache();
assert!(!re.try_is_match(&mut cache, Input::new("☃").span(1..2))?);
Notice that when UTF-8 mode is disabled, then the above reports a match because the restriction against zero-width matches that split a codepoint has been lifted:
use regex_automata::{
nfa::thompson::{backtrack::BoundedBacktracker, NFA},
Input,
};
let re = BoundedBacktracker::builder()
.thompson(NFA::config().utf8(false))
.build("a*")?;
let mut cache = re.create_cache();
assert!(re.try_is_match(&mut cache, Input::new("☃").span(1..2))?);
pub fn try_find<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I) -> Result<Option<Match>, MatchError>
Available on crate feature regex-nfa only.
Executes a leftmost forward search and returns a Match if one exists.
This routine only includes the overall match span. To get access to the individual spans of each capturing group, use BoundedBacktracker::try_captures.
§Errors
This routine only errors if the search could not complete. For this backtracking regex engine, this only occurs when the haystack length exceeds BoundedBacktracker::max_haystack_len.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re = BoundedBacktracker::new("foo[0-9]+")?;
let mut cache = re.create_cache();
let expected = Match::must(0, 0..8);
assert_eq!(Some(expected), re.try_find(&mut cache, "foo12345")?);
pub fn try_captures<'h, I: Into<Input<'h>>>(&self, cache: &mut Cache, input: I, caps: &mut Captures) -> Result<(), MatchError>
Available on crate feature regex-nfa only.
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.
§Errors
This routine only errors if the search could not complete. For this backtracking regex engine, this only occurs when the haystack length exceeds BoundedBacktracker::max_haystack_len.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Span,
};
let re = BoundedBacktracker::new(
r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.try_captures(&mut cache, "2010-03-14", &mut caps)?;
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
pub fn try_find_iter<'r, 'c, 'h, I: Into<Input<'h>>>(&'r self, cache: &'c mut Cache, input: I) -> TryFindMatches<'r, 'c, 'h>
Available on crate feature regex-nfa only.
Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.
If the regex engine returns an error at any point, then the iterator will yield that error.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, MatchError,
};
let re = BoundedBacktracker::new("foo[0-9]+")?;
let mut cache = re.create_cache();
let text = "foo1 foo12 foo123";
let result: Result<Vec<Match>, MatchError> = re
.try_find_iter(&mut cache, text)
.collect();
let matches = result?;
assert_eq!(matches, vec![
Match::must(0, 0..4),
Match::must(0, 5..10),
Match::must(0, 11..17),
]);
pub fn try_captures_iter<'r, 'c, 'h, I: Into<Input<'h>>>(&'r self, cache: &'c mut Cache, input: I) -> TryCapturesMatches<'r, 'c, 'h>
Available on crate feature regex-nfa only.
Returns an iterator over all non-overlapping Captures values. If no match exists, then the iterator yields no elements.
This yields the same matches as BoundedBacktracker::try_find_iter, but it includes the spans of all capturing groups that participate in each match.
If the regex engine returns an error at any point, then the iterator will yield that error.
Tip: See util::iter::Searcher for how to correctly iterate over all matches in a haystack while avoiding the creation of a new Captures value for every match. (Which you are forced to do with an Iterator.)
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Span,
};
let re = BoundedBacktracker::new("foo(?P<numbers>[0-9]+)")?;
let mut cache = re.create_cache();
let text = "foo1 foo12 foo123";
let mut spans = vec![];
for result in re.try_captures_iter(&mut cache, text) {
let caps = result?;
// The unwrap is OK since 'numbers' matches if the pattern matches.
spans.push(caps.get_group_by_name("numbers").unwrap());
}
assert_eq!(spans, vec![
Span::from(3..4),
Span::from(8..10),
Span::from(14..17),
]);
pub fn try_search(&self, cache: &mut Cache, input: &Input<'_>, caps: &mut Captures) -> Result<(), MatchError>
Available on crate feature regex-nfa only.
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.
This is like BoundedBacktracker::try_captures, but it accepts a concrete &Input instead of an Into<Input>.
§Errors
This routine only errors if the search could not complete. For this backtracking regex engine, this only occurs when the haystack length exceeds BoundedBacktracker::max_haystack_len.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example: specific pattern search
This example shows how to build a multi bounded backtracker that permits searching for specific patterns.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Anchored, Input, Match, PatternID,
};
let re = BoundedBacktracker::new_many(&[
"[a-z0-9]{6}",
"[a-z][a-z0-9]{5}",
])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123";
// Since we are using the default leftmost-first match and both
// patterns match at the same starting position, only the first pattern
// will be returned in this case when doing a search for any of the
// patterns.
let expected = Some(Match::must(0, 0..6));
re.try_search(&mut cache, &Input::new(haystack), &mut caps)?;
assert_eq!(expected, caps.get_match());
// But if we want to check whether some other pattern matches, then we
// can provide its pattern ID.
let expected = Some(Match::must(1, 0..6));
let input = Input::new(haystack)
.anchored(Anchored::Pattern(PatternID::must(1)));
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(expected, caps.get_match());
§Example: specifying the bounds of a search
This example shows how providing the bounds of a search can produce different results than simply sub-slicing the haystack.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, Input,
};
let re = BoundedBacktracker::new(r"\b[0-9]{3}\b")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123bar";
// Since we sub-slice the haystack, the search doesn't know about
// the larger context and assumes that `123` is surrounded by word
// boundaries. And of course, the match position is reported relative
// to the sub-slice as well, which means we get `0..3` instead of
// `3..6`.
let expected = Some(Match::must(0, 0..3));
re.try_search(&mut cache, &Input::new(&haystack[3..6]), &mut caps)?;
assert_eq!(expected, caps.get_match());
// But if we provide the bounds of the search within the context of the
// entire haystack, then the search can take the surrounding context
// into account. (And if we did find a match, it would be reported
// as a valid offset into `haystack` instead of its sub-slice.)
let expected = None;
re.try_search(
&mut cache, &Input::new(haystack).range(3..6), &mut caps,
)?;
assert_eq!(expected, caps.get_match());
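The reason the two searches above disagree is that a word-boundary assertion inspects the bytes on both sides of a position, and sub-slicing hides those bytes from the search. The following std-only sketch uses an ASCII-only simplification of \b. It is not the crate's look-around implementation, and the function names are hypothetical.

```rust
/// ASCII-only word-character test (a simplification of Unicode `\w`).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns true if position `at` is a word boundary in `haystack`, judged
/// from the bytes on both sides of `at` in the slice it was given.
/// Positions outside the slice count as "not a word character".
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0
        && haystack.get(at - 1).map_or(false, |&b| is_word_byte(b));
    let after = haystack.get(at).map_or(false, |&b| is_word_byte(b));
    before != after
}
```

Given the whole haystack "foo123bar", positions 3 and 6 are not word boundaries because letters sit on the other side of the digits. Given only the sub-slice "123", positions 0 and 3 are boundaries because nothing is visible beyond the edges of the slice.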
pub fn try_search_slots(&self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>]) -> Result<Option<PatternID>, MatchError>
Available on crate feature regex-nfa only.
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided slots, and returns the matching pattern ID. The contents of the slots for patterns other than the matching pattern are unspecified. If no match was found, then None is returned and the contents of all slots are unspecified.
This is like BoundedBacktracker::try_search, but it accepts a raw slots slice instead of a Captures value. This is useful in contexts where you don’t want or need to allocate a Captures.
It is legal to pass any number of slots to this routine. If the regex engine would otherwise write a slot offset that doesn’t fit in the provided slice, then it is simply skipped. In general though, there are usually three slice lengths you might want to use:
- An empty slice, if you only care about which pattern matched.
- A slice with pattern_len() * 2 slots, if you only care about the overall match spans for each matching pattern.
- A slice with slot_len() slots, which permits recording match offsets for every capturing group in every pattern.
§Errors
This routine only errors if the search could not complete. For this backtracking regex engine, this only occurs when the haystack length exceeds BoundedBacktracker::max_haystack_len.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
This example shows how to find the overall match offsets in a multi-pattern search without allocating a Captures value. Indeed, we can put our slots right on the stack.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
PatternID, Input,
};
let re = BoundedBacktracker::new_many(&[
r"\pL+",
r"\d+",
])?;
let mut cache = re.create_cache();
let input = Input::new("!@#123");
// We only care about the overall match offsets here, so we just
// allocate two slots for each pattern. Each slot records the start
// and end of the match.
let mut slots = [None; 4];
let pid = re.try_search_slots(&mut cache, &input, &mut slots)?;
assert_eq!(Some(PatternID::must(1)), pid);
// The overall match offsets are always at 'pid * 2' and 'pid * 2 + 1'.
// See 'GroupInfo' for more details on the mapping between groups and
// slot indices.
let slot_start = pid.unwrap().as_usize() * 2;
let slot_end = slot_start + 1;
assert_eq!(Some(3), slots[slot_start].map(|s| s.get()));
assert_eq!(Some(6), slots[slot_end].map(|s| s.get()));
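The index arithmetic used above can be isolated into a small helper. This is a std-only sketch: Option<usize> stands in for the crate's Option<NonMaxUsize>, and the function names are hypothetical.

```rust
use std::ops::Range;

/// Number of slots needed to record only the overall span of each pattern
/// (two slots per pattern: start and end).
fn implicit_slot_len(pattern_len: usize) -> usize {
    pattern_len * 2
}

/// Reads the overall match span of pattern `pid` from a slot slice in which
/// slots `pid * 2` and `pid * 2 + 1` hold the start and end offsets.
/// Returns `None` if either slot is missing from the slice or unset.
fn overall_span(slots: &[Option<usize>], pid: usize) -> Option<Range<usize>> {
    let start = (*slots.get(pid * 2)?)?;
    let end = (*slots.get(pid * 2 + 1)?)?;
    Some(start..end)
}
```

With two patterns, four slots are needed. If the slots are [None, None, Some(3), Some(6)] as in the example above, then overall_span(&slots, 1) is Some(3..6) and overall_span(&slots, 0) is None.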
Trait Implementations§
Auto Trait Implementations§
impl<'a> Freeze for Regex<'a>
impl<'a> !RefUnwindSafe for Regex<'a>
impl<'a> !Send for Regex<'a>
impl<'a> !Sync for Regex<'a>
impl<'a> Unpin for Regex<'a>
impl<'a> !UnwindSafe for Regex<'a>
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.