Struct regex_cursor::Input
pub struct Input<C: Cursor> { /* private fields */ }
Implementations§
impl<C: Cursor> Input<C>
pub fn new<T: IntoCursor<Cursor = C>>(cursor: T) -> Self
Create a new search configuration for the given cursor.
pub fn chunk(&self) -> &[u8]
Return a borrow of the current underlying chunk as a slice of bytes.
§Example
use regex_cursor::Input;
let input = Input::new("foobar".into());
assert_eq!(b"foobar", input.chunk());
pub fn chunk_offset(&self) -> usize
Return the offset of the current underlying chunk, that is, the byte position in the haystack at which the chunk returned by Input::chunk begins.
§Example
use regex_cursor::Input;
let input = Input::new("foobar".into());
assert_eq!(0, input.chunk_offset());
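To picture how a chunk relates to its offset when a haystack is stored in several pieces, here is a minimal stand-alone sketch. ToyCursor and all of its methods are hypothetical names made up for illustration; this is not the crate's Cursor trait, only a model of the idea that each chunk starts at some byte offset of the whole haystack.

```rust
/// Hypothetical stand-in for a chunked haystack (not the crate's `Cursor`).
struct ToyCursor<'a> {
    chunks: Vec<&'a [u8]>,
    index: usize,  // which chunk is current
    offset: usize, // byte offset of the current chunk in the whole haystack
}

impl<'a> ToyCursor<'a> {
    fn new(chunks: Vec<&'a [u8]>) -> Self {
        assert!(!chunks.is_empty(), "need at least one chunk");
        ToyCursor { chunks, index: 0, offset: 0 }
    }

    /// The bytes of the current chunk.
    fn chunk(&self) -> &'a [u8] {
        self.chunks[self.index]
    }

    /// Where the current chunk starts within the whole haystack.
    fn chunk_offset(&self) -> usize {
        self.offset
    }

    /// Move to the next chunk; returns false if already at the last one.
    fn advance(&mut self) -> bool {
        if self.index + 1 >= self.chunks.len() {
            return false;
        }
        self.offset += self.chunks[self.index].len();
        self.index += 1;
        true
    }
}
```

With the haystack split as "foo" and "bar", the first chunk sits at offset 0 and, after one call to advance, the second chunk sits at offset 3.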
pub fn start(&self) -> usize
Return the start position of this search.
This is a convenience routine for search.get_span().start().
When Input::is_done is false, this is guaranteed to return an offset that is less than or equal to Input::end. Otherwise, the offset is one greater than Input::end.
§Example
use regex_automata::Input;
let input = Input::new("foobar");
assert_eq!(0, input.start());
let input = Input::new("foobar").span(2..4);
assert_eq!(2, input.start());
pub fn clear_look_behind(&mut self)
pub fn end(&self) -> usize
Return the end position of this search.
This is a convenience routine for search.get_span().end().
This is guaranteed to return an offset that is a valid exclusive end bound for this input’s haystack.
§Example
use regex_automata::Input;
let input = Input::new("foobar");
assert_eq!(6, input.end());
let input = Input::new("foobar").span(2..4);
assert_eq!(4, input.end());
pub fn get_chunk_end(&self) -> usize
pub fn get_span(&self) -> Span
Return the span for this search configuration.
If one was not explicitly set, then the span corresponds to the entire range of the haystack.
When Input::is_done is false, the span returned is guaranteed to correspond to valid bounds for this input’s haystack.
§Example
use regex_automata::{Input, Span};
let input = Input::new("foobar");
assert_eq!(Span { start: 0, end: 6 }, input.get_span());
pub fn look_around(&mut self) -> (&[u8], usize)
pub fn anchored(&mut self, mode: Anchored) -> &mut Self
Sets the anchor mode of a search.
When a search is anchored (so that’s Anchored::Yes or Anchored::Pattern), a match must begin at the start of a search. When a search is not anchored (that’s Anchored::No), regex engines will behave as if the pattern started with a (?s-u:.)*?. This prefix permits a match to appear anywhere.
By default, the anchored mode is Anchored::No.
WARNING: this is subtly different than using a ^ at the start of your regex. A ^ forces a regex to match exclusively at the start of a chunk, regardless of where you begin your search. In contrast, anchoring a search will allow your regex to match anywhere in your chunk, but the match must start at the beginning of a search.
For example, consider the chunk aba and the following searches:
- The regex ^a is compiled with Anchored::No and searches aba starting at position 2. Since ^ requires the match to start at the beginning of the chunk and 2 > 0, no match is found.
- The regex a is compiled with Anchored::Yes and searches aba starting at position 2. This reports a match at [2, 3] since the match starts where the search started. Since there is no ^, there is no requirement for the match to start at the beginning of the chunk.
- The regex a is compiled with Anchored::Yes and searches aba starting at position 1. Since b corresponds to position 1 and since the search is anchored, it finds no match. While the regex matches at other positions, configuring the search to be anchored requires that it only report a match that begins at the same offset as the beginning of the search.
- The regex a is compiled with Anchored::No and searches aba starting at position 1. Since the search is not anchored and the regex does not start with ^, the search executes as if there is a (?s:.)*? prefix that permits it to match anywhere. Thus, it reports a match at [2, 3].
Note that the Anchored::Pattern mode is like Anchored::Yes, except it only reports matches for a particular pattern.
§Example
This demonstrates the differences between an anchored search and a pattern that begins with ^ (as described in the above warning message).
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Anchored, Match, Input,
};
let chunk = "aba";
let re = PikeVM::new(r"^a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(2..3).anchored(Anchored::No);
re.search(&mut cache, &input, &mut caps);
// No match is found because 2 is not the beginning of the chunk,
// which is what ^ requires.
assert_eq!(None, caps.get_match());
let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(2..3).anchored(Anchored::Yes);
re.search(&mut cache, &input, &mut caps);
// An anchored search can still match anywhere in the chunk, it just
// must begin at the start of the search which is '2' in this case.
assert_eq!(Some(Match::must(0, 2..3)), caps.get_match());
let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(1..3).anchored(Anchored::Yes);
re.search(&mut cache, &input, &mut caps);
// No match is found since we start searching at offset 1 which
// corresponds to 'b'. Since there is no '(?s:.)*?' prefix, no match
// is found.
assert_eq!(None, caps.get_match());
let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(1..3).anchored(Anchored::No);
re.search(&mut cache, &input, &mut caps);
// Since anchored=no, an implicit '(?s:.)*?' prefix was added to the
// pattern. Even though the search starts at 'b', the 'match anything'
// prefix allows the search to match 'a'.
let expected = Some(Match::must(0, 2..3));
assert_eq!(expected, caps.get_match());
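The four cases above can also be reproduced with a small stand-alone model that needs no regex engine. This is only a sketch of the rules, not how any regex engine implements anchoring: toy_find and its parameters are hypothetical names, and the pattern is reduced to a single byte so that only the anchoring logic remains. The caret flag models a leading ^ and the anchored flag models Anchored::Yes.

```rust
/// Toy search for the one-byte pattern `needle` within haystack[start..end].
///
/// * `anchored`: the match must begin exactly at `start` (like Anchored::Yes).
/// * `caret`: models a leading `^`, so the match must begin at offset 0 of
///   the haystack, no matter where the search starts.
///
/// Returns the (start, end) offsets of the match in the original haystack.
fn toy_find(
    haystack: &[u8],
    start: usize,
    end: usize,
    needle: u8,
    anchored: bool,
    caret: bool,
) -> Option<(usize, usize)> {
    // An anchored search only ever tries one candidate position.
    let last = if anchored { (start + 1).min(end) } else { end };
    (start..last)
        .filter(|&i| !caret || i == 0)
        .find(|&i| haystack[i] == needle)
        .map(|i| (i, i + 1))
}
```

Calling toy_find(b"aba", 1, 3, b'a', true, false) finds nothing because the only candidate is the b at offset 1, while the same call with anchored set to false scans forward and reports (2, 3).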
pub fn earliest(&mut self, yes: bool) -> &mut Self
Whether to execute an “earliest” search or not.
When running a non-overlapping search, an “earliest” search will return the match location as early as possible. For example, given a pattern of foo[0-9]+ and a chunk of foo12345, a normal leftmost search will return foo12345 as a match. But an “earliest” search for regex engines that support “earliest” semantics will return foo1 as a match, since as soon as the first digit following foo is seen, it is known to have found a match.
Note that “earliest” semantics generally depend on the regex engine. Different regex engines may determine there is a match at different points. So there is no guarantee that “earliest” matches will always return the same offsets for all regex engines. The “earliest” notion is really about when the particular regex engine determines there is a match rather than a consistent semantic unto itself. This is often useful for implementing “did a match occur or not” predicates, but sometimes the offset is useful as well.
This is disabled by default.
§Example
This example shows the difference between “earliest” searching and normal searching.
use regex_automata::{nfa::thompson::pikevm::PikeVM, Match, Input};
let re = PikeVM::new(r"foo[0-9]+")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
// A normal search implements greediness like you expect.
let input = Input::new("foo12345");
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..8)), caps.get_match());
// When 'earliest' is enabled and the regex engine supports
// it, the search will bail once it knows a match has been
// found.
let input = Input::new("foo12345").earliest(true);
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..4)), caps.get_match());
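To make the difference concrete without a regex engine, the following sketch hand-codes a matcher for foo[0-9]+ that only tries a match at offset 0. The function toy_match_end is a hypothetical name, and stopping after exactly one digit is an assumption that mirrors the PikeVM result shown above; a real engine may decide that a match exists at a different point.

```rust
/// Toy matcher for `foo[0-9]+`, tried only at the start of `haystack`.
/// Returns the end offset of the match, if any.
fn toy_match_end(haystack: &[u8], earliest: bool) -> Option<usize> {
    const PREFIX: &[u8] = b"foo";
    if !haystack.starts_with(PREFIX) {
        return None;
    }
    // Count the run of ASCII digits that follows the literal prefix.
    let digits = haystack[PREFIX.len()..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digits == 0 {
        return None;
    }
    if earliest {
        // One digit is enough to know that the pattern matches.
        Some(PREFIX.len() + 1)
    } else {
        // Leftmost greedy semantics consume every digit.
        Some(PREFIX.len() + digits)
    }
}
```

For foo12345 this yields an end offset of 8 for the normal search and 4 for the earliest search, the same spans as the example above.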
pub fn set_anchored(&mut self, mode: Anchored)
Set the anchor mode of a search.
This is like Input::anchored, except it mutates the search configuration in place.
§Example
use regex_automata::{Anchored, Input, PatternID};
let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());
let pid = PatternID::must(5);
input.set_anchored(Anchored::Pattern(pid));
assert_eq!(Anchored::Pattern(pid), input.get_anchored());
pub fn set_earliest(&mut self, yes: bool)
Set whether the search should execute in “earliest” mode or not.
This is like Input::earliest, except it mutates the search configuration in place.
§Example
use regex_automata::Input;
let mut input = Input::new("foobar");
assert!(!input.get_earliest());
input.set_earliest(true);
assert!(input.get_earliest());
pub fn span<S: Into<Span>>(&mut self, span: S) -> &mut Input<C>
Set the span for this search.
This routine does not panic if the span given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.
This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>. To provide anything supported by range syntax, use the Input::range method.
The default span is the entire haystack.
Note that Input::range overrides this method and vice versa.
§Panics
This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.
§Example
This example shows how the span of the search can impact whether a match is reported or not. This is particularly relevant for look-around operators, which might take things outside of the span into account when determining whether they match.
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match, Input,
};
// Look for 'at', but as a distinct word.
let re = PikeVM::new(r"\bat\b")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
// Our haystack contains 'at', but not as a distinct word.
let haystack = "batter";
// A standard search finds nothing, as expected.
let input = Input::new(haystack);
re.search(&mut cache, &input, &mut caps);
assert_eq!(None, caps.get_match());
// But if we wanted to search starting at position '1', we might
// slice the haystack. If we do this, it's impossible for the \b
// anchors to take the surrounding context into account! And thus,
// a match is produced.
let input = Input::new(&haystack[1..3]);
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..2)), caps.get_match());
// But if we specify the span of the search instead of slicing the
// haystack, then the regex engine can "see" outside of the span
// and resolve the anchors correctly.
let input = Input::new(haystack).span(1..3);
re.search(&mut cache, &input, &mut caps);
assert_eq!(None, caps.get_match());
This may seem a little ham-fisted, but this scenario tends to come up if some other regex engine found the match span and now you need to re-process that span to look for capturing groups. (e.g., Run a faster DFA first, find a match, then run the PikeVM on just the match span to resolve capturing groups.) In order to implement that sort of logic correctly, you need to set the span on the search instead of slicing the haystack directly.
The other advantage of using this routine to specify the bounds of the search is that the match offsets are still reported in terms of the original haystack. For example, the second search in the example above reported a match at position 0, even though at starts at offset 1 because we sliced the haystack.
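The effect of slicing versus setting a span can be modelled in a few lines of plain Rust. The sketch below assumes ASCII-only word boundaries and a literal word instead of a real regex; is_word_boundary and find_whole_word are hypothetical helper names. The point it illustrates is that the boundary checks read bytes outside the searched span, which is only possible when the whole haystack is still available.

```rust
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if there is an ASCII word boundary at offset `at` of `haystack`.
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Toy search for `\bword\b` restricted to haystack[start..end].
/// The boundary checks look at the whole haystack, not just the span.
fn find_whole_word(
    haystack: &[u8],
    start: usize,
    end: usize,
    word: &[u8],
) -> Option<(usize, usize)> {
    if word.is_empty() || end > haystack.len() {
        return None;
    }
    let mut i = start;
    while i + word.len() <= end {
        let j = i + word.len();
        if &haystack[i..j] == word
            && is_word_boundary(haystack, i)
            && is_word_boundary(haystack, j)
        {
            return Some((i, j));
        }
        i += 1;
    }
    None
}
```

Searching the slice &b"batter"[1..3] from 0 to 2 reports (0, 2), because the slice has no neighbours to inspect. Searching b"batter" with the span 1 to 3 reports nothing, because the b before and the t after remain visible.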
pub fn set_start(&mut self, start: usize)
Set the starting offset for the span for this search configuration.
This is a convenience routine for only mutating the start of a span without having to set the entire span.
§Panics
This panics if the span resulting from the new start position does not correspond to valid bounds in the haystack or the termination of a search.
pub fn set_end(&mut self, end: usize)
Set the ending offset for the span for this search configuration.
This is a convenience routine for only mutating the end of a span without having to set the entire span.
§Panics
This panics if the span resulting from the new end position does not correspond to valid bounds in the haystack or the termination of a search.
pub fn range<R: RangeBounds<usize>>(self, range: R) -> Input<C>
Like Input::span, but accepts any range instead.
This routine does not panic if the range given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.
The default range is the entire haystack.
Note that Input::span overrides this method and vice versa.
§Panics
This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.
This also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.
§Example
use regex_automata::Input;
let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
let input = Input::new("foobar").range(2..=4);
assert_eq!(2..5, input.get_range());
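The conversion from an arbitrary range to a half-open interval can be sketched with the standard library alone. The helper to_half_open is a hypothetical name and is not part of this crate; unlike the routine documented above, it returns None instead of panicking when a bound cannot be represented, and it takes the haystack length as an explicit argument for resolving an unbounded end.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Convert any range over `usize` into a half-open `Range<usize>`.
/// `len` is the haystack length, used when the end is unbounded.
/// Returns None if a bound overflows `usize`.
fn to_half_open<R: RangeBounds<usize>>(range: R, len: usize) -> Option<Range<usize>> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.checked_add(1)?,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        // An inclusive end needs +1, which overflows for usize::MAX.
        Bound::Included(&e) => e.checked_add(1)?,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    Some(start..end)
}
```

Here 2..=4 becomes 2..5, matching the example above, and 0..=usize::MAX yields None because its exclusive end does not fit in a usize.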
pub fn set_span<S: Into<Span>>(&mut self, span: S)
Set the span for this search configuration.
This is like the Input::span method, except this mutates the span in place.
This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>.
§Panics
This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.
§Example
use regex_automata::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_span(2..4);
assert_eq!(2..4, input.get_range());
pub fn set_range<R: RangeBounds<usize>>(&mut self, range: R)
Set the span for this search configuration given any range.
This is like the Input::range method, except this mutates the span in place.
This routine does not panic if the range given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.
§Panics
This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.
This also panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.
§Example
use regex_automata::Input;
let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_range(2..=4);
assert_eq!(2..5, input.get_range());
pub fn get_anchored(&self) -> Anchored
Return the anchored mode for this search configuration.
If no anchored mode was set, then it defaults to Anchored::No.
§Example
use regex_automata::{Anchored, Input, PatternID};
let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());
let pid = PatternID::must(5);
input.set_anchored(Anchored::Pattern(pid));
assert_eq!(Anchored::Pattern(pid), input.get_anchored());
pub fn get_earliest(&self) -> bool
Return whether this search should execute in “earliest” mode.
§Example
use regex_automata::Input;
let input = Input::new("foobar");
assert!(!input.get_earliest());
pub fn is_done(&self) -> bool
Return true if and only if this search can never return any other matches.
This occurs when the start position of this search is greater than the end position of the search.
§Example
use regex_automata::Input;
let mut input = Input::new("foobar");
assert!(!input.is_done());
input.set_start(6);
assert!(!input.is_done());
input.set_start(7);
assert!(input.is_done());
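The reason a start equal to the end is not yet "done" is that an empty match can still occur at that position. A minimal model of this, using a hypothetical ToySpan type in place of the crate's Span and assuming a pattern that matches the empty string everywhere:

```rust
/// Hypothetical stand-in for a search span with `start` and `end` offsets.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToySpan {
    start: usize,
    end: usize,
}

impl ToySpan {
    /// A search is finished only once its start has moved past its end.
    fn is_done(&self) -> bool {
        self.start > self.end
    }
}

/// Counts the positions at which an always-matching empty pattern matches.
fn count_empty_matches(mut span: ToySpan) -> usize {
    let mut count = 0;
    while !span.is_done() {
        count += 1; // an empty match at span.start
        span.start += 1; // step past it so the loop makes progress
    }
    count
}
```

For a six-byte haystack the loop visits offsets 0 through 6, so it counts seven empty matches; the last one sits exactly at start == end.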
pub fn is_char_boundary(&mut self) -> bool
Returns true if and only if the given offset in this search’s chunk falls on a valid UTF-8 encoded codepoint boundary.
If the chunk is not valid UTF-8, then the behavior of this routine is unspecified.
§Example
This shows where codepoint boundaries do and don’t exist in valid UTF-8.
use regex_automata::Input;
let input = Input::new("☃");
assert!(input.is_char_boundary(0));
assert!(!input.is_char_boundary(1));
assert!(!input.is_char_boundary(2));
assert!(input.is_char_boundary(3));
assert!(!input.is_char_boundary(4));
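The results above follow from the UTF-8 encoding itself: every continuation byte has the bit pattern 10xxxxxx, and any other byte starts a codepoint. The sketch below checks this directly on a byte slice. The name toy_is_char_boundary is hypothetical, the haystack is assumed to be valid UTF-8, and this is not presented as the crate's implementation.

```rust
/// True if `at` falls on a codepoint boundary of `haystack`,
/// which is assumed to be valid UTF-8.
fn toy_is_char_boundary(haystack: &[u8], at: usize) -> bool {
    match haystack.get(at) {
        // The end of the haystack is a boundary; anything past it is not.
        None => at == haystack.len(),
        // Continuation bytes look like 0b10xx_xxxx; all other bytes
        // begin a new codepoint.
        Some(&b) => (b & 0b1100_0000) != 0b1000_0000,
    }
}
```

The snowman is encoded as the three bytes E2 98 83, so offsets 0 and 3 are boundaries while 1 and 2 point at continuation bytes. The standard library's str::is_char_boundary gives the same answers for these offsets.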