Trait grep_matcher::Matcher

pub trait Matcher {
    type Captures: Captures;
    type Error: Display;
    fn find_at(
        &self,
        haystack: &[u8],
        at: usize
    ) -> Result<Option<Match>, Self::Error>;
    fn new_captures(&self) -> Result<Self::Captures, Self::Error>;

    fn capture_count(&self) -> usize { ... }
fn capture_index(&self, _name: &str) -> Option<usize> { ... }
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> { ... }
fn find_iter<F>(
        &self,
        haystack: &[u8],
        matched: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(Match) -> bool
, { ... }
fn find_iter_at<F>(
        &self,
        haystack: &[u8],
        at: usize,
        matched: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(Match) -> bool
, { ... }
fn try_find_iter<F, E>(
        &self,
        haystack: &[u8],
        matched: F
    ) -> Result<Result<(), E>, Self::Error>
    where
        F: FnMut(Match) -> Result<bool, E>
, { ... }
fn try_find_iter_at<F, E>(
        &self,
        haystack: &[u8],
        at: usize,
        matched: F
    ) -> Result<Result<(), E>, Self::Error>
    where
        F: FnMut(Match) -> Result<bool, E>
, { ... }
fn captures(
        &self,
        haystack: &[u8],
        caps: &mut Self::Captures
    ) -> Result<bool, Self::Error> { ... }
fn captures_iter<F>(
        &self,
        haystack: &[u8],
        caps: &mut Self::Captures,
        matched: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(&Self::Captures) -> bool
, { ... }
fn captures_iter_at<F>(
        &self,
        haystack: &[u8],
        at: usize,
        caps: &mut Self::Captures,
        matched: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(&Self::Captures) -> bool
, { ... }
fn try_captures_iter<F, E>(
        &self,
        haystack: &[u8],
        caps: &mut Self::Captures,
        matched: F
    ) -> Result<Result<(), E>, Self::Error>
    where
        F: FnMut(&Self::Captures) -> Result<bool, E>
, { ... }
fn try_captures_iter_at<F, E>(
        &self,
        haystack: &[u8],
        at: usize,
        caps: &mut Self::Captures,
        matched: F
    ) -> Result<Result<(), E>, Self::Error>
    where
        F: FnMut(&Self::Captures) -> Result<bool, E>
, { ... }
fn captures_at(
        &self,
        _haystack: &[u8],
        _at: usize,
        _caps: &mut Self::Captures
    ) -> Result<bool, Self::Error> { ... }
fn replace<F>(
        &self,
        haystack: &[u8],
        dst: &mut Vec<u8>,
        append: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(Match, &mut Vec<u8>) -> bool
, { ... }
fn replace_with_captures<F>(
        &self,
        haystack: &[u8],
        caps: &mut Self::Captures,
        dst: &mut Vec<u8>,
        append: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool
, { ... }
fn replace_with_captures_at<F>(
        &self,
        haystack: &[u8],
        at: usize,
        caps: &mut Self::Captures,
        dst: &mut Vec<u8>,
        append: F
    ) -> Result<(), Self::Error>
    where
        F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool
, { ... }
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> { ... }
fn is_match_at(
        &self,
        haystack: &[u8],
        at: usize
    ) -> Result<bool, Self::Error> { ... }
fn shortest_match(
        &self,
        haystack: &[u8]
    ) -> Result<Option<usize>, Self::Error> { ... }
fn shortest_match_at(
        &self,
        haystack: &[u8],
        at: usize
    ) -> Result<Option<usize>, Self::Error> { ... }
fn non_matching_bytes(&self) -> Option<&ByteSet> { ... }
fn line_terminator(&self) -> Option<LineTerminator> { ... }
fn find_candidate_line(
        &self,
        haystack: &[u8]
    ) -> Result<Option<LineMatchKind>, Self::Error> { ... }
}

A matcher defines an interface for regular expression implementations.

While this trait is large, there are only two required methods that implementors must provide: find_at and new_captures. If captures aren’t supported by your implementation, then new_captures can be implemented with NoCaptures. If your implementation does support capture groups, then you should also implement the other capture related methods, as dictated by the documentation. Crucially, this includes captures_at.

The rest of the methods on this trait provide default implementations on top of find_at and new_captures. It is not uncommon for implementations to be able to provide faster variants of some methods; in those cases, simply override the default implementation.
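
The split between required and provided methods can be sketched with a reduced, self-contained trait. Everything named here (Span, MiniMatcher, ByteMatcher) is a hypothetical stand-in written for illustration and is not part of this crate; the sketch only assumes that provided methods are layered on top of find_at as described above.

```rust
/// Stand-in for the crate's `Match`: a half-open byte range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical, reduced version of the trait: one required method and
/// two provided methods built on top of it.
pub trait MiniMatcher {
    type Error: std::fmt::Display;

    /// Required: first match at or after `at`, offsets relative to `haystack`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Span>, Self::Error>;

    /// Provided: search from the beginning of the haystack.
    fn find(&self, haystack: &[u8]) -> Result<Option<Span>, Self::Error> {
        self.find_at(haystack, 0)
    }

    /// Provided: an implementor may override this with something faster.
    fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
        Ok(self.find(haystack)?.is_some())
    }
}

/// A matcher for a single byte. It can never fail, so the error type
/// is one that has no values.
pub struct ByteMatcher(pub u8);

impl MiniMatcher for ByteMatcher {
    type Error = std::convert::Infallible;

    fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Span>, Self::Error> {
        if at > haystack.len() {
            return Ok(None);
        }
        Ok(haystack[at..]
            .iter()
            .position(|&b| b == self.0)
            .map(|i| Span { start: at + i, end: at + i + 1 }))
    }
}
```

With only find_at written by hand, ByteMatcher(b'o') already answers find and is_match through the provided methods.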

Associated Types

The concrete type of capturing groups used for this matcher.

If this implementation does not support capturing groups, then set this to NoCaptures.

The error type used by this matcher.

Matchers that can never fail are encouraged to use the NoError type in this crate. In the future, when the “never” (spelled !) type is stabilized, then it should probably be used instead.

Required methods

Returns the start and end byte range of the first match in haystack after at, where the byte offsets are relative to the start of haystack (and not at). If no match exists, then None is returned.

The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
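
The two points above (offsets relative to the haystack, and at carrying context) can be illustrated with a hand-written literal search. This is a sketch under stated assumptions: the free function find_at and its anchored_start flag are hypothetical and only imitate a pattern with or without a leading \A.

```rust
/// Hypothetical literal search with an optional `\A`-style anchor.
/// Returns a half-open `(start, end)` range relative to `haystack`, not `at`.
pub fn find_at(
    haystack: &[u8],
    needle: &[u8],
    anchored_start: bool,
    at: usize,
) -> Option<(usize, usize)> {
    if needle.is_empty() || at > haystack.len() {
        return None;
    }
    if anchored_start {
        // `\A` asserts the true start of the haystack, so only `at == 0` can match.
        return if at == 0 && haystack.starts_with(needle) {
            Some((0, needle.len()))
        } else {
            None
        };
    }
    haystack[at..]
        .windows(needle.len())
        .position(|w| w == needle)
        // Shift the offset found in the sub-slice back into haystack coordinates.
        .map(|i| (at + i, at + i + needle.len()))
}
```

Searching "foo foo" for foo from at = 1 reports the range (4, 7) in haystack coordinates. With the anchor enabled and at = 4 there is no match, even though slicing the haystack at 4 and searching the remainder would have found one.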

Creates an empty group of captures suitable for use with the capturing APIs of this trait.

Implementations that don’t support capturing groups should use the NoCaptures type and implement this method by calling NoCaptures::new().

Provided methods

Returns the total number of capturing groups in this matcher.

If a matcher supports capturing groups, then this value must always be at least 1, where the first capturing group always corresponds to the overall match.

If a matcher does not support capturing groups, then this should always return 0.

By default, capturing groups are not supported, so this always returns 0.

Maps the given capture group name to its corresponding capture group index, if one exists. If one does not exist, then None is returned.

If the given capture group name maps to multiple indices, then it is not specified which one is returned. However, it is guaranteed that one of them is returned.

By default, capturing groups are not supported, so this always returns None.

Returns the start and end byte range of the first match in haystack. If no match exists, then None is returned.

The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.

Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.

Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
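
A minimal sketch of how such an iteration can be driven by a find_at-style search, assuming matches are plain (start, end) tuples. The free function find_iter_at below and its find_at parameter are hypothetical; the empty-match handling is one reasonable way to guarantee progress and is not claimed to be the crate's exact logic.

```rust
/// Hypothetical driver: calls `matched` for each non-overlapping match
/// produced by `find_at`, stopping early when `matched` returns false.
pub fn find_iter_at<S, F>(haystack: &[u8], at: usize, find_at: S, mut matched: F)
where
    S: Fn(&[u8], usize) -> Option<(usize, usize)>,
    F: FnMut((usize, usize)) -> bool,
{
    let mut last_end = at;
    let mut last_match: Option<usize> = None;
    while last_end <= haystack.len() {
        let (start, end) = match find_at(haystack, last_end) {
            Some(m) => m,
            None => return,
        };
        if start == end {
            // Empty match: step forward one byte so the loop makes progress,
            // and skip an empty match that touches the previous match's end.
            last_end = end + 1;
            if last_match == Some(end) {
                continue;
            }
        } else {
            last_end = end;
        }
        last_match = Some(end);
        if !matched((start, end)) {
            return;
        }
    }
}
```

Each search resumes at the end of the previous match while still passing the full haystack, which is what keeps the reported matches non-overlapping and the offsets absolute.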

Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error, then iteration stops and the error is yielded in the inner Result. If an error occurs while executing the search, then it is returned as Self::Error in the outer Result.

Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error, then iteration stops and the error is yielded in the inner Result. If an error occurs while executing the search, then it is returned as Self::Error in the outer Result.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
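
Going by the signature, the return type is a nested Result that keeps two failure sources apart. The sketch below illustrates that shape with a hypothetical free function try_find_iter whose find_at parameter stands in for the matcher; SE plays the role of Self::Error, and the empty-match handling is deliberately simplified.

```rust
/// Hypothetical driver mirroring the nested return type:
/// outer `Err` = the search failed, inner `Err` = the callback failed.
pub fn try_find_iter<S, SE, F, E>(
    haystack: &[u8],
    mut find_at: S,
    mut matched: F,
) -> Result<Result<(), E>, SE>
where
    S: FnMut(&[u8], usize) -> Result<Option<(usize, usize)>, SE>,
    F: FnMut((usize, usize)) -> Result<bool, E>,
{
    let mut at = 0;
    while at <= haystack.len() {
        // `?` propagates a search error as the outer `Err`.
        let (start, end) = match find_at(haystack, at)? {
            Some(m) => m,
            None => break,
        };
        // Always advance, even past an empty match.
        at = if end > start { end } else { end + 1 };
        match matched((start, end)) {
            Ok(true) => continue,
            Ok(false) => break,
            // A callback error stops iteration and becomes the inner `Err`.
            Err(err) => return Ok(Err(err)),
        }
    }
    Ok(Ok(()))
}
```

A caller can therefore tell Ok(Err(e)), where its own callback failed, apart from Err(e), where the search itself failed.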

Populates the first set of capture group matches from haystack into caps. If no match exists, then false is returned.

The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.

Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops.

Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error, then iteration stops and the error is yielded in the inner Result. If an error occurs while executing the search, then it is returned as Self::Error in the outer Result.

Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error, then iteration stops and the error is yielded in the inner Result. If an error occurs while executing the search, then it is returned as Self::Error in the outer Result.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

Populates the first set of capture group matches from haystack into caps after at, where the byte offsets in each capturing group are relative to the start of haystack (and not at). If no match exists, then false is returned and the contents of the given capturing groups are unspecified.

The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

By default, capturing groups aren’t supported, and this implementation will always behave as if a match were impossible.

Implementors that provide support for capturing groups must guarantee that when a match occurs, the first capture match (at index 0) is always set to the overall match offsets.

Note that if implementors seek to support capturing groups, then they should implement this method. Other methods that match based on captures will then work automatically.
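
The guarantee about index 0 can be illustrated with a hand-written matcher for the pattern (\w+)=(\w+), restricted to ASCII. KvCaptures and the free function captures_at are hypothetical stand-ins, not this crate's types; the sketch only assumes the contract stated above.

```rust
/// Stand-in for a `Captures` value: slot 0 is the overall match,
/// slots 1 and 2 are the key and value of a `key=value` pair.
#[derive(Debug, Default)]
pub struct KvCaptures {
    pub slots: [Option<(usize, usize)>; 3],
}

/// Hypothetical `captures_at` for the pattern `(\w+)=(\w+)`, ASCII only.
/// Offsets are relative to the start of `haystack`, not to `at`.
pub fn captures_at(haystack: &[u8], at: usize, caps: &mut KvCaptures) -> bool {
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    for eq in at..haystack.len() {
        if haystack[eq] != b'=' {
            continue;
        }
        // Extend the key backwards, but never before the starting point.
        let mut start = eq;
        while start > at && is_word(haystack[start - 1]) {
            start -= 1;
        }
        // Extend the value forwards.
        let mut end = eq + 1;
        while end < haystack.len() && is_word(haystack[end]) {
            end += 1;
        }
        if start < eq && end > eq + 1 {
            // Slot 0 always holds the overall match offsets.
            caps.slots = [Some((start, end)), Some((start, eq)), Some((eq + 1, end))];
            return true;
        }
    }
    false
}
```

On the haystack "x a=1 bb=22" with at = 0 this fills the slots with (2, 5), (2, 3) and (4, 5); calling it again with at = 5 reports the second pair with an overall match of (6, 11).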

Replaces every match in the given haystack with the result of calling append. append is given the start and end of a match, along with a handle to the dst buffer provided.

If the given append function returns false, then replacement stops.
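
A minimal sketch of the replacement loop, assuming the matches are already known as ascending, non-overlapping (start, end) ranges. The free function replace and its matches parameter are hypothetical simplifications; in the trait, the ranges would come from iterating over the matcher's results.

```rust
/// Hypothetical replace driver. `matches` stands in for the matcher's
/// non-overlapping match ranges, in ascending order.
pub fn replace<F>(
    haystack: &[u8],
    matches: &[(usize, usize)],
    dst: &mut Vec<u8>,
    mut append: F,
) where
    F: FnMut((usize, usize), &mut Vec<u8>) -> bool,
{
    let mut last = 0;
    for &(start, end) in matches {
        // Copy the unmatched bytes before this match, then let `append`
        // write the replacement for the match itself.
        dst.extend_from_slice(&haystack[last..start]);
        last = end;
        if !append((start, end), dst) {
            break;
        }
    }
    // Copy whatever follows the last replaced match.
    dst.extend_from_slice(&haystack[last..]);
}
```

In this sketch, when append returns false the match it was just given has already been replaced, and the remainder of the haystack is copied through unchanged.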

Replaces every match in the given haystack with the result of calling append with the matching capture groups.

If the given append function returns false, then replacement stops.

Replaces every match in the given haystack with the result of calling append with the matching capture groups.

If the given append function returns false, then replacement stops.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

Returns true if and only if the matcher matches the given haystack.

By default, this method is implemented by calling shortest_match.

Returns true if and only if the matcher matches the given haystack starting at the given position.

By default, this method is implemented by calling shortest_match_at.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

Returns an end location of the first match in haystack. If no match exists, then None is returned.

Note that the end location reported by this method may be less than the end location reported by find for the same match. For example, running find with the pattern a+ on the haystack aaa should report a range of [0, 3), but shortest_match may report 1 as the ending location since that is the place at which a match is guaranteed to occur.

This method should never report false positives or false negatives. The point of this method is that some implementors may be able to provide a faster implementation of this than what find does.

By default, this method is implemented by calling find.
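
The a+ example can be written out by hand to show the difference. Both functions below are hypothetical illustrations for that single pattern and do not use any regex engine.

```rust
/// Hypothetical `find` for the pattern `a+`: reports the full greedy run
/// of `a` bytes as a half-open `(start, end)` range.
pub fn find_a_plus(haystack: &[u8]) -> Option<(usize, usize)> {
    let start = haystack.iter().position(|&b| b == b'a')?;
    let len = haystack[start..].iter().take_while(|&&b| b == b'a').count();
    Some((start, start + len))
}

/// Hypothetical `shortest_match` for `a+`: one `a` is enough to know a
/// match exists, so it reports the offset just past the first `a`.
pub fn shortest_a_plus(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'a').map(|i| i + 1)
}
```

The two functions agree on whether a match exists, which is the no-false-positives and no-false-negatives requirement; they differ only in the end offset they report.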

Returns an end location of the first match in haystack starting at the given position. If no match exists, then None is returned.

Note that the end location reported by this method may be less than the end location reported by find for the same match. For example, running find with the pattern a+ on the haystack aaa should report a range of [0, 3), but shortest_match may report 1 as the ending location since that is the place at which a match is guaranteed to occur.

This method should never report false positives or false negatives. The point of this method is that some implementors may be able to provide a faster implementation of this than what find does.

By default, this method is implemented by calling find_at.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.

If available, return a set of bytes that will never appear in a match produced by an implementation.

Specifically, if such a set can be determined, then it’s possible for callers to perform additional operations on the basis that certain bytes may never match.

For example, if a search is configured to possibly produce results that span multiple lines but a caller provided pattern can never match across multiple lines, then it may make sense to divert to more optimized line oriented routines that don’t need to handle the multi-line match case.

Implementations that produce this set must never report false positives, but may produce false negatives. That is, if a byte is in this set then it must be guaranteed that it is never in a match. But, if a byte is not in this set, then callers cannot assume that a match exists with that byte.

By default, this returns None.
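
A sketch of how a caller might act on such a set, following the multi-line example above. NonMatching is a hypothetical stand-in for ByteSet (its constructor and contains method are assumptions made for this illustration), and can_search_line_by_line is an invented helper.

```rust
/// Stand-in for the crate's `ByteSet`: one flag per possible byte value.
pub struct NonMatching([bool; 256]);

impl NonMatching {
    pub fn new(bytes: &[u8]) -> NonMatching {
        let mut set = [false; 256];
        for &b in bytes {
            set[b as usize] = true;
        }
        NonMatching(set)
    }

    pub fn contains(&self, b: u8) -> bool {
        self.0[b as usize]
    }
}

/// A caller-side decision: if `\n` can never be part of a match, then no
/// match can span lines, so line oriented routines are safe to use.
/// `None` means nothing is known, so the general routine must be used.
pub fn can_search_line_by_line(non_matching: Option<&NonMatching>) -> bool {
    non_matching.map_or(false, |set| set.contains(b'\n'))
}
```

Because the set may have false negatives, a missing \n only means the optimization is not proven safe; it does not mean that some match contains a line terminator.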

If this matcher was compiled as a line oriented matcher, then this method returns the line terminator if and only if the line terminator never appears in any match produced by this matcher. If this wasn’t compiled as a line oriented matcher, or if the aforementioned guarantee cannot be made, then this must return None, which is the default. It is never wrong to return None, but returning a line terminator when it can appear in a match results in unspecified behavior.

The line terminator is typically b'\n', but can be any single byte or CRLF.

By default, this returns None.

Return one of the following: a confirmed line match, a candidate line match (which may be a false positive) or no match at all (which must not be a false negative). When reporting a confirmed or candidate match, the position returned can be any position in the line.

By default, this never returns a candidate match, and always either returns a confirmed match or no match at all.

When a matcher can produce matches that span multiple lines, the behavior of this method is unspecified. Namely, use of this method only makes sense in a context where the caller is looking for the next matching line. That is, callers should only use this method when line_terminator does not return None.

Design rationale

A line matcher is, fundamentally, a normal matcher with the addition of one optional method: finding a line. By default, this routine is implemented via the matcher’s shortest_match method, which always yields either no match or a LineMatchKind::Confirmed. However, implementors may provide a routine for this that can return candidate lines that need subsequent verification to be confirmed as a match. This can be useful in cases where it may be quicker to find candidate lines via some other means instead of relying on the more general implementations for find and shortest_match.

For example, consider the regex \w+foo\s+. Both find and shortest_match must consider the entire regex, including the \w+ and \s+, while searching. However, this method could look for lines containing foo and return them as candidates. Finding foo might be implemented as a highly optimized substring search routine (like memmem), which is likely to be faster than whatever more generalized routine is required for resolving \w+foo\s+. The caller is then responsible for confirming whether a match exists or not.

Note that while this method may report false positives, it must never report false negatives. That is, it can never skip over lines that contain a match.
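
The \w+foo\s+ example can be sketched as a cheap prefilter plus a caller-side check. LineKind is a hypothetical stand-in for LineMatchKind, and confirm_line only approximates the regex with ASCII word and whitespace classes; a naive window scan is used here in place of an optimized substring search such as memmem.

```rust
/// Stand-in for the crate's `LineMatchKind`; the payload is a position
/// somewhere inside the reported line.
#[derive(Debug, PartialEq, Eq)]
pub enum LineKind {
    Confirmed(usize),
    Candidate(usize),
}

/// Cheap prefilter for `\w+foo\s+`: any match must contain the literal
/// `foo`, so a line containing `foo` is a candidate. This can be a false
/// positive, but it never skips a line that matches.
pub fn find_candidate_line(haystack: &[u8]) -> Option<LineKind> {
    haystack
        .windows(3)
        .position(|w| w == b"foo")
        .map(LineKind::Candidate)
}

/// Caller-side verification of one line, approximating `\w+foo\s+` with
/// ASCII classes: a word byte before `foo` and a whitespace byte after it.
pub fn confirm_line(line: &[u8]) -> bool {
    line.windows(3).enumerate().any(|(i, w)| {
        w == b"foo"
            && i > 0
            && (line[i - 1].is_ascii_alphanumeric() || line[i - 1] == b'_')
            && line.get(i + 3).map_or(false, |b| b.is_ascii_whitespace())
    })
}
```

For the line "a foo b" the prefilter reports a candidate that verification then rejects, while "xfoo b" is reported and confirmed. A line without foo is never reported, so no matching line is skipped.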
