Struct Input

pub struct Input<C: Cursor> { /* private fields */ }

Implementations§

impl<C: Cursor> Input<C>

pub fn new<T: IntoCursor<Cursor = C>>(cursor: T) -> Self

Create a new search configuration for the given cursor.

pub fn chunk(&self) -> &[u8]

Return a borrow of the current underlying chunk as a slice of bytes.

§Example

use regex_cursor::Input;

let input = Input::new("foobar");
assert_eq!(b"foobar", input.chunk());

pub fn chunk_offset(&self) -> usize

Return the byte offset of the current underlying chunk from the start of the haystack.

§Example

use regex_cursor::Input;

let input = Input::new("foobar");
assert_eq!(0, input.chunk_offset());

pub fn start(&self) -> usize

Return the start position of this search.

This is a convenience routine for search.get_span().start.

When Input::is_done is false, this is guaranteed to return an offset that is less than or equal to Input::end. Otherwise, the offset is one greater than Input::end.

§Example

use regex_automata::Input;

let input = Input::new("foobar");
assert_eq!(0, input.start());

let input = Input::new("foobar").span(2..4);
assert_eq!(2, input.start());

pub fn clear_look_behind(&mut self)

pub fn end(&self) -> usize

Return the end position of this search.

This is a convenience routine for search.get_span().end.

This is guaranteed to return an offset that is a valid exclusive end bound for this input’s haystack.

§Example

use regex_automata::Input;

let input = Input::new("foobar");
assert_eq!(6, input.end());

let input = Input::new("foobar").span(2..4);
assert_eq!(4, input.end());

pub fn get_chunk_end(&self) -> usize

pub fn get_span(&self) -> Span

Return the span for this search configuration.

If one was not explicitly set, then the span corresponds to the entire range of the haystack.

When Input::is_done is false, the span returned is guaranteed to correspond to valid bounds for this input’s haystack.

§Example

use regex_automata::{Input, Span};

let input = Input::new("foobar");
assert_eq!(Span { start: 0, end: 6 }, input.get_span());

pub fn look_around(&mut self) -> (&[u8], usize)

pub fn anchored(&mut self, mode: Anchored) -> &mut Self

Sets the anchor mode of a search.

When a search is anchored (that is, Anchored::Yes or Anchored::Pattern), a match must begin at the start of the search. When a search is not anchored (Anchored::No), regex engines behave as if the pattern started with (?s-u:.)*?. This prefix permits a match to appear anywhere.

By default, the anchored mode is Anchored::No.

WARNING: this is subtly different from using a ^ at the start of your regex. A ^ forces a regex to match exclusively at the start of a chunk, regardless of where you begin your search. In contrast, anchoring a search will allow your regex to match anywhere in your chunk, but the match must start at the beginning of a search.

For example, consider the chunk aba and the following searches:

The regex ^a is compiled with Anchored::No and searches aba starting at position 2. Since ^ requires the match to start at the beginning of the chunk and 2 > 0, no match is found.
The regex a is compiled with Anchored::Yes and searches aba starting at position 2. This reports a match at [2, 3] since the match starts where the search started. Since there is no ^, there is no requirement for the match to start at the beginning of the chunk.
The regex a is compiled with Anchored::Yes and searches aba starting at position 1. Since b corresponds to position 1 and since the search is anchored, it finds no match. While the regex matches at other positions, configuring the search to be anchored requires that it only report a match that begins at the same offset as the beginning of the search.
The regex a is compiled with Anchored::No and searches aba starting at position 1. Since the search is not anchored and the regex does not start with ^, the search executes as if there is a (?s:.)*? prefix that permits it to match anywhere. Thus, it reports a match at [2, 3].

Note that the Anchored::Pattern mode is like Anchored::Yes, except it only reports matches for a particular pattern.

§Example

This demonstrates the differences between an anchored search and a pattern that begins with ^ (as described in the above warning message).

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Anchored, Match, Input,
};

let chunk = "aba";

let re = PikeVM::new(r"^a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(2..3).anchored(Anchored::No);
re.search(&mut cache, &input, &mut caps);
// No match is found because 2 is not the beginning of the chunk,
// which is what ^ requires.
assert_eq!(None, caps.get_match());

let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(2..3).anchored(Anchored::Yes);
re.search(&mut cache, &input, &mut caps);
// An anchored search can still match anywhere in the chunk, it just
// must begin at the start of the search which is '2' in this case.
assert_eq!(Some(Match::must(0, 2..3)), caps.get_match());

let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(1..3).anchored(Anchored::Yes);
re.search(&mut cache, &input, &mut caps);
// No match is found since we start searching at offset 1 which
// corresponds to 'b'. Since there is no '(?s:.)*?' prefix, no match
// is found.
assert_eq!(None, caps.get_match());

let re = PikeVM::new(r"a")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let input = Input::new(chunk).span(1..3).anchored(Anchored::No);
re.search(&mut cache, &input, &mut caps);
// Since anchored=no, an implicit '(?s:.)*?' prefix was added to the
// pattern. Even though the search starts at 'b', the 'match anything'
// prefix allows the search to match 'a'.
let expected = Some(Match::must(0, 2..3));
assert_eq!(expected, caps.get_match());
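
The anchored and unanchored cases above can also be reproduced without a regex engine. The following is a minimal sketch, not part of this crate: find_literal is a hypothetical helper that searches a span of a byte slice for a literal, and its anchored flag stands in for Anchored::Yes versus Anchored::No. It does not model ^ or Anchored::Pattern.

```rust
/// Hypothetical helper (not a regex_cursor API): look for `needle` inside
/// `haystack[start..end]` and return the match as absolute offsets.
fn find_literal(
    haystack: &[u8],
    needle: &[u8],
    start: usize,
    end: usize,
    anchored: bool,
) -> Option<(usize, usize)> {
    if start > end || end > haystack.len() {
        return None;
    }
    // An anchored search may only try the first position of the span.
    // An unanchored search may try every position up to the span's end,
    // which is what the implicit "match anything" prefix permits.
    let last_start = if anchored { start } else { end };
    let mut at = start;
    while at <= last_start {
        let stop = at.checked_add(needle.len())?;
        if stop > end {
            // The needle no longer fits inside the span.
            return None;
        }
        if &haystack[at..stop] == needle {
            return Some((at, stop));
        }
        at += 1;
    }
    None
}
```

With the chunk aba and the literal a, an anchored search over 2..3 reports (2, 3), an anchored search over 1..3 reports nothing, and an unanchored search over 1..3 reports (2, 3).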

pub fn earliest(&mut self, yes: bool) -> &mut Self

Whether to execute an “earliest” search or not.

When running a non-overlapping search, an “earliest” search will return the match location as early as possible. For example, given a pattern of foo[0-9]+ and a chunk of foo12345, a normal leftmost search will return foo12345 as a match. But an “earliest” search for regex engines that support “earliest” semantics will return foo1 as a match, since as soon as the first digit following foo is seen, it is known to have found a match.

Note that “earliest” semantics generally depend on the regex engine. Different regex engines may determine there is a match at different points. So there is no guarantee that “earliest” matches will always return the same offsets for all regex engines. The “earliest” notion is really about when the particular regex engine determines there is a match rather than a consistent semantic unto itself. This is often useful for implementing “did a match occur or not” predicates, but sometimes the offset is useful as well.

This is disabled by default.

§Example

This example shows the difference between “earliest” searching and normal searching.

use regex_automata::{nfa::thompson::pikevm::PikeVM, Match, Input};

let re = PikeVM::new(r"foo[0-9]+")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

// A normal search implements greediness like you expect.
let input = Input::new("foo12345");
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..8)), caps.get_match());

// When 'earliest' is enabled and the regex engine supports
// it, the search will bail once it knows a match has been
// found.
let input = Input::new("foo12345").earliest(true);
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..4)), caps.get_match());
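
To make the difference concrete without depending on a particular engine, here is a small hand-written matcher for foo[0-9]+. It is only a sketch: find_foo_digits is a hypothetical function, and the point at which it stops in "earliest" mode is a choice made for this illustration, since real engines may stop at different offsets.

```rust
/// Hypothetical hand-rolled matcher for the pattern `foo[0-9]+`.
/// Returns the leftmost match as (start, end) byte offsets.
fn find_foo_digits(haystack: &[u8], earliest: bool) -> Option<(usize, usize)> {
    for start in 0..haystack.len() {
        let rest = &haystack[start..];
        if !rest.starts_with(b"foo") {
            continue;
        }
        let after = &rest[3..];
        // At least one digit must follow "foo".
        if !after.first().map_or(false, |b| b.is_ascii_digit()) {
            continue;
        }
        if earliest {
            // A match is already known after the first digit: stop here.
            return Some((start, start + 4));
        }
        // Normal (greedy) search: consume every following digit.
        let digits = after.iter().take_while(|b| b.is_ascii_digit()).count();
        return Some((start, start + 3 + digits));
    }
    None
}
```

For foo12345 this yields (0, 8) in normal mode and (0, 4) in "earliest" mode, mirroring the offsets in the example above.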

pub fn set_anchored(&mut self, mode: Anchored)

Set the anchor mode of a search.

This is like Input::anchored, except it mutates the search configuration in place.

§Example

use regex_automata::{Anchored, Input, PatternID};

let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());

let pid = PatternID::must(5);
input.set_anchored(Anchored::Pattern(pid));
assert_eq!(Anchored::Pattern(pid), input.get_anchored());

pub fn set_earliest(&mut self, yes: bool)

Set whether the search should execute in “earliest” mode or not.

This is like Input::earliest, except it mutates the search configuration in place.

§Example

use regex_automata::Input;

let mut input = Input::new("foobar");
assert!(!input.get_earliest());
input.set_earliest(true);
assert!(input.get_earliest());

pub fn span<S: Into<Span>>(&mut self, span: S) -> &mut Input<C>

Set the span for this search.

This routine does not panic if the span given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.

This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>. To provide anything supported by range syntax, use the Input::range method.

The default span is the entire haystack.

Note that Input::range overrides this method and vice versa.

§Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

§Example

This example shows how the span of the search can impact whether a match is reported or not. This is particularly relevant for look-around operators, which might take things outside of the span into account when determining whether they match.

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match, Input,
};

// Look for 'at', but as a distinct word.
let re = PikeVM::new(r"\bat\b")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

// Our haystack contains 'at', but not as a distinct word.
let haystack = "batter";

// A standard search finds nothing, as expected.
let input = Input::new(haystack);
re.search(&mut cache, &input, &mut caps);
assert_eq!(None, caps.get_match());

// But if we wanted to search starting at position '1', we might
// slice the haystack. If we do this, it's impossible for the \b
// anchors to take the surrounding context into account! And thus,
// a match is produced.
let input = Input::new(&haystack[1..3]);
re.search(&mut cache, &input, &mut caps);
assert_eq!(Some(Match::must(0, 0..2)), caps.get_match());

// But if we specify the span of the search instead of slicing the
// haystack, then the regex engine can "see" outside of the span
// and resolve the anchors correctly.
let input = Input::new(haystack).span(1..3);
re.search(&mut cache, &input, &mut caps);
assert_eq!(None, caps.get_match());

This may seem a little ham-fisted, but this scenario tends to come up if some other regex engine found the match span and now you need to re-process that span to look for capturing groups (e.g., run a faster DFA first, find a match, then run the PikeVM on just the match span to resolve capturing groups). In order to implement that sort of logic correctly, you need to set the span on the search instead of slicing the haystack directly.

The other advantage of using this routine to specify the bounds of the search is that the match offsets are still reported in terms of the original haystack. For example, the second search in the example above reported a match at position 0, even though 'at' starts at offset 1 in the original haystack, because we sliced the haystack.
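
The effect of slicing versus setting a span can be sketched with a hand-written word-boundary check. This is an illustration only: is_word_boundary and find_word_at are hypothetical helpers, they treat only ASCII letters, digits and underscore as word bytes, and they assume the span's end does not exceed the haystack length.

```rust
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if exactly one side of `at` is a word byte. Bytes outside the
/// searched span are still visible here, as long as they are in `haystack`.
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Hypothetical stand-in for searching `\bat\b` within `start..end`.
/// Offsets in the result are relative to `haystack`, not to the span.
fn find_word_at(haystack: &[u8], start: usize, end: usize) -> Option<(usize, usize)> {
    let needle: &[u8] = b"at";
    let mut pos = start;
    while pos + needle.len() <= end {
        let stop = pos + needle.len();
        if &haystack[pos..stop] == needle
            && is_word_boundary(haystack, pos)
            && is_word_boundary(haystack, stop)
        {
            return Some((pos, stop));
        }
        pos += 1;
    }
    None
}
```

Searching batter with the span 1..3 finds nothing because the check can see the b and the t around the span, whereas searching the slice &b"batter"[1..3] reports (0, 2) because that context is gone.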

pub fn set_start(&mut self, start: usize)

Set the starting offset for the span for this search configuration.

This is a convenience routine for only mutating the start of a span without having to set the entire span.

§Panics

This panics if the span resulting from the new start position does not correspond to valid bounds in the haystack or the termination of a search.

pub fn set_end(&mut self, end: usize)

Set the ending offset for the span for this search configuration.

This is a convenience routine for only mutating the end of a span without having to set the entire span.

§Panics

This panics if the span resulting from the new end position does not correspond to valid bounds in the haystack or the termination of a search.

pub fn range<R: RangeBounds<usize>>(self, range: R) -> Input<C>

Like Input::span, but accepts any range instead.

This routine does not panic if the range given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.

The default range is the entire haystack.

Note that Input::span overrides this method and vice versa.

§Panics

This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.

This also panics if the given range does not correspond to valid bounds in the haystack or the termination of a search.

§Example

use regex_automata::Input;

let input = Input::new("foobar");
assert_eq!(0..6, input.get_range());

let input = Input::new("foobar").range(2..=4);
assert_eq!(2..5, input.get_range());
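
The conversion from arbitrary range syntax to a half-open interval can be sketched with the standard RangeBounds trait. This is an assumption about how such a conversion might look, not this crate's implementation: to_half_open is a hypothetical helper, and it returns None in the overflow case where the documented routine would panic.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical helper: turn any range over `usize` into half-open
/// (start, end) offsets for a haystack of length `haystack_len`.
/// Returns None if an inclusive bound cannot be expressed as a
/// half-open bound (for example `0..=usize::MAX`).
fn to_half_open<R: RangeBounds<usize>>(
    range: R,
    haystack_len: usize,
) -> Option<(usize, usize)> {
    let start = match range.start_bound() {
        Bound::Included(&i) => i,
        Bound::Excluded(&i) => i.checked_add(1)?,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        // An inclusive end needs +1, which overflows at usize::MAX.
        Bound::Included(&i) => i.checked_add(1)?,
        Bound::Excluded(&i) => i,
        Bound::Unbounded => haystack_len,
    };
    Some((start, end))
}
```

For a six-byte haystack, 2..=4 becomes (2, 5) and .. becomes (0, 6), matching the ranges shown in the example above.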

pub fn set_span<S: Into<Span>>(&mut self, span: S)

Set the span for this search configuration.

This is like the Input::span method, except this mutates the span in place.

This routine is generic over how a span is provided. While a Span may be given directly, one may also provide a std::ops::Range<usize>.

§Panics

This panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

§Example

use regex_automata::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_span(2..4);
assert_eq!(2..4, input.get_range());

pub fn slice_span<S: Into<Span>>(&mut self, span: S) -> &mut Input<C>

pub fn slice<R: RangeBounds<usize>>(&mut self, range: R) -> &mut Input<C>

pub fn set_range<R: RangeBounds<usize>>(&mut self, range: R)

Set the span for this search configuration given any range.

This is like the Input::range method, except this mutates the span in place.

This routine does not panic if the range given is not a valid range for this search’s haystack. If this search is run with an invalid range, then the most likely outcome is that the actual search execution will panic.

§Panics

This routine will panic if the given range could not be converted to a valid Range. For example, this would panic when given 0..=usize::MAX since it cannot be represented using a half-open interval in terms of usize.

This also panics if the given span does not correspond to valid bounds in the haystack or the termination of a search.

§Example

use regex_automata::Input;

let mut input = Input::new("foobar");
assert_eq!(0..6, input.get_range());
input.set_range(2..=4);
assert_eq!(2..5, input.get_range());

pub fn get_anchored(&self) -> Anchored

Return the anchored mode for this search configuration.

If no anchored mode was set, then it defaults to Anchored::No.

§Example

use regex_automata::{Anchored, Input, PatternID};

let mut input = Input::new("foobar");
assert_eq!(Anchored::No, input.get_anchored());

let pid = PatternID::must(5);
input.set_anchored(Anchored::Pattern(pid));
assert_eq!(Anchored::Pattern(pid), input.get_anchored());

pub fn get_earliest(&self) -> bool

Return whether this search should execute in “earliest” mode.

§Example

use regex_automata::Input;

let input = Input::new("foobar");
assert!(!input.get_earliest());

pub fn is_done(&self) -> bool

Return true if and only if this search can never return any other matches.

This occurs when the start position of this search is greater than the end position of the search.

§Example

use regex_automata::Input;

let mut input = Input::new("foobar");
assert!(!input.is_done());
input.set_start(6);
assert!(!input.is_done());
input.set_start(7);
assert!(input.is_done());
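
The reason start == end does not count as done is that an empty match can still be reported at that position. The sketch below models this with a hypothetical SearchSpan type (not the crate's Span) and a helper that lists every position where an empty match could begin.

```rust
/// Hypothetical stand-in for a search span; not the crate's `Span` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SearchSpan {
    start: usize,
    end: usize,
}

impl SearchSpan {
    /// Mirrors the documented rule: done only when start > end.
    fn is_done(&self) -> bool {
        self.start > self.end
    }
}

/// Collect every offset at which an empty match could start, advancing
/// the span's start by one until the search is done.
fn empty_match_positions(mut span: SearchSpan) -> Vec<usize> {
    let mut positions = Vec::new();
    while !span.is_done() {
        positions.push(span.start);
        span.start += 1;
    }
    positions
}
```

For the span 4..6 this visits offsets 4, 5 and 6, so the position equal to the end of the span is still searched before the loop stops.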

pub fn is_char_boundary(&mut self) -> bool

Returns true if and only if the given offset in this search’s chunk falls on a valid UTF-8 encoded codepoint boundary.

If the chunk is not valid UTF-8, then the behavior of this routine is unspecified.

§Example

This shows where codepoint boundaries do and don’t exist in valid UTF-8.

use regex_automata::Input;

let input = Input::new("☃");
assert!(input.is_char_boundary(0));
assert!(!input.is_char_boundary(1));
assert!(!input.is_char_boundary(2));
assert!(input.is_char_boundary(3));
assert!(!input.is_char_boundary(4));
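
A boundary check of this kind can be written directly against the bytes, since UTF-8 continuation bytes always have the bit pattern 10xxxxxx. The function below is a sketch over a plain byte slice, not this crate's implementation, and like the routine above it assumes the bytes are valid UTF-8.

```rust
/// Sketch of a codepoint-boundary test over a byte slice that is assumed
/// to hold valid UTF-8. An offset is a boundary if it is the end of the
/// slice or if the byte at that offset is not a continuation byte.
fn is_char_boundary(bytes: &[u8], offset: usize) -> bool {
    match bytes.get(offset) {
        // One past the last byte is a boundary; anything further is not.
        None => offset == bytes.len(),
        // Continuation bytes look like 0b10xx_xxxx.
        Some(&b) => (b & 0b1100_0000) != 0b1000_0000,
    }
}
```

The snowman ☃ is encoded as three bytes, so offsets 0 and 3 are boundaries while 1, 2 and 4 are not.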