regex_automata/util/iter.rs
1/*!
2Generic helpers for iteration of matches from a regex engine in a haystack.
3
4The principle type in this module is a [`Searcher`]. A `Searcher` provides
5its own lower level iterator-like API in addition to methods for constructing
6types that implement `Iterator`. The documentation for `Searcher` explains a
7bit more about why these different APIs exist.
8
9Currently, this module supports iteration over any regex engine that works
10with the [`HalfMatch`], [`Match`] or [`Captures`] types.
11*/
12
13#[cfg(feature = "alloc")]
14use crate::util::captures::Captures;
15use crate::util::search::{HalfMatch, Input, Match, MatchError};
16
17/// A searcher for creating iterators and performing lower level iteration.
18///
19/// This searcher encapsulates the logic required for finding all successive
20/// non-overlapping matches in a haystack. In theory, iteration would look
21/// something like this:
22///
23/// 1. Setting the start position to `0`.
24/// 2. Execute a regex search. If no match, end iteration.
25/// 3. Report the match and set the start position to the end of the match.
26/// 4. Go back to (2).
27///
28/// And if this were indeed the case, it's likely that `Searcher` wouldn't
29/// exist. Unfortunately, because a regex may match the empty string, the above
30/// logic won't work for all possible regexes. Namely, if an empty match is
31/// found, then step (3) would set the start position of the search to the
32/// position it was at. Thus, iteration would never end.
33///
34/// Instead, a `Searcher` knows how to detect these cases and forcefully
35/// advance iteration in the case of an empty match that overlaps with a
36/// previous match.
37///
38/// If you know that your regex cannot match any empty string, then the simple
39/// algorithm described above will work correctly.
40///
41/// When possible, prefer the iterators defined on the regex engine you're
42/// using. This tries to abstract over the regex engine and is thus a bit more
43/// unwieldy to use.
44///
45/// In particular, a `Searcher` is not itself an iterator. Instead, it provides
46/// `advance` routines that permit moving the search along explicitly. It also
47/// provides various routines, like [`Searcher::into_matches_iter`], that
48/// accept a closure (representing how a regex engine executes a search) and
49/// returns a conventional iterator.
50///
51/// The lifetime parameters come from the [`Input`] type passed to
52/// [`Searcher::new`]:
53///
54/// * `'h` is the lifetime of the underlying haystack.
55///
56/// # Searcher vs Iterator
57///
58/// Why does a search type with "advance" APIs exist at all when we also have
59/// iterators? Unfortunately, the reasoning behind this split is a complex
60/// combination of the following things:
61///
62/// 1. While many of the regex engines expose their own iterators, it is also
63/// nice to expose this lower level iteration helper because it permits callers
64/// to provide their own `Input` configuration. Moreover, a `Searcher` can work
65/// with _any_ regex engine instead of only the ones defined in this crate.
66/// This way, everyone benefits from a shared iteration implementation.
67/// 2. There are many different regex engines that, while they have the same
68/// match semantics, they have slightly different APIs. Iteration is just
69/// complex enough to want to share code, and so we need a way of abstracting
70/// over those different regex engines. While we could define a new trait that
71/// describes any regex engine search API, it would wind up looking very close
72/// to a closure. While there may still be reasons for the more generic trait
73/// to exist, for now and for the purposes of iteration, we use a closure.
74/// Closures also provide a lot of easy flexibility at the call site, in that
75/// they permit the caller to borrow any kind of state they want for use during
76/// each search call.
77/// 3. As a result of using closures, and because closures are anonymous types
78/// that cannot be named, it is difficult to encapsulate them without both
79/// costs to speed and added complexity to the public API. For example, in
80/// defining an iterator type like
81/// [`dfa::regex::FindMatches`](crate::dfa::regex::FindMatches),
82/// if we use a closure internally, it's not possible to name this type in the
83/// return type of the iterator constructor. Thus, the only way around it is
84/// to erase the type by boxing it and turning it into a `Box<dyn FnMut ...>`.
85/// This boxed closure is unlikely to be inlined _and_ it infects the public
86/// API in subtle ways. Namely, unless you declare the closure as implementing
87/// `Send` and `Sync`, then the resulting iterator type won't implement it
88/// either. But there are practical issues with requiring the closure to
89/// implement `Send` and `Sync` that result in other API complexities that
90/// are beyond the scope of this already long exposition.
91/// 4. Some regex engines expose more complex match information than just
92/// "which pattern matched" and "at what offsets." For example, the PikeVM
93/// exposes match spans for each capturing group that participated in the
94/// match. In such cases, it can be quite beneficial to reuse the capturing
95/// group allocation on subsequent searches. A proper iterator doesn't permit
96/// this API due to its interface, so it's useful to have something a bit lower
97/// level that permits callers to amortize allocations while also reusing a
98/// shared implementation of iteration. (See the documentation for
99/// [`Searcher::advance`] for an example of using the "advance" API with the
100/// PikeVM.)
101///
102/// What this boils down to is that there are "advance" APIs which require
103/// handing a closure to it for every call, and there are also APIs to create
104/// iterators from a closure. The former are useful for _implementing_
105/// iterators or when you need more flexibility, while the latter are useful
106/// for conveniently writing custom iterators on-the-fly.
107///
108/// # Example: iterating with captures
109///
110/// Several regex engines in this crate over convenient iterator APIs over
111/// [`Captures`] values. To do so, this requires allocating a new `Captures`
112/// value for each iteration step. This can perhaps be more costly than you
113/// might want. Instead of implementing your own iterator to avoid that
114/// cost (which can be a little subtle if you want to handle empty matches
115/// correctly), you can use this `Searcher` to do it for you:
116///
117/// ```
118/// use regex_automata::{
119/// nfa::thompson::pikevm::PikeVM,
120/// util::iter::Searcher,
121/// Input, Span,
122/// };
123///
124/// let re = PikeVM::new("foo(?P<numbers>[0-9]+)")?;
125/// let haystack = "foo1 foo12 foo123";
126///
127/// let mut caps = re.create_captures();
128/// let mut cache = re.create_cache();
129/// let mut matches = vec![];
130/// let mut searcher = Searcher::new(Input::new(haystack));
131/// while let Some(_) = searcher.advance(|input| {
132/// re.search(&mut cache, input, &mut caps);
133/// Ok(caps.get_match())
134/// }) {
135/// // The unwrap is OK since 'numbers' matches if the pattern matches.
136/// matches.push(caps.get_group_by_name("numbers").unwrap());
137/// }
138/// assert_eq!(matches, vec![
139/// Span::from(3..4),
140/// Span::from(8..10),
141/// Span::from(14..17),
142/// ]);
143///
144/// # Ok::<(), Box<dyn std::error::Error>>(())
145/// ```
146#[derive(Clone, Debug)]
147pub struct Searcher<'h> {
148 /// The input parameters to give to each regex engine call.
149 ///
150 /// The start position of the search is mutated during iteration.
151 input: Input<'h>,
152 /// Records the end offset of the most recent match. This is necessary to
153 /// handle a corner case for preventing empty matches from overlapping with
154 /// the ending bounds of a prior match.
155 last_match_end: Option<usize>,
156}
157
158impl<'h> Searcher<'h> {
159 /// Create a new fallible non-overlapping matches iterator.
160 ///
161 /// The given `input` provides the parameters (including the haystack),
162 /// while the `finder` represents a closure that calls the underlying regex
163 /// engine. The closure may borrow any additional state that is needed,
164 /// such as a prefilter scanner.
165 pub fn new(input: Input<'h>) -> Searcher<'h> {
166 Searcher { input, last_match_end: None }
167 }
168
169 /// Returns the current `Input` used by this searcher.
170 ///
171 /// The `Input` returned is generally equivalent to the one given to
172 /// [`Searcher::new`], but its start position may be different to reflect
173 /// the start of the next search to be executed.
174 pub fn input<'s>(&'s self) -> &'s Input<'h> {
175 &self.input
176 }
177
178 /// Return the next half match for an infallible search if one exists, and
179 /// advance to the next position.
180 ///
181 /// This is like `try_advance_half`, except errors are converted into
182 /// panics.
183 ///
184 /// # Panics
185 ///
186 /// If the given closure returns an error, then this panics. This is useful
187 /// when you know your underlying regex engine has been configured to not
188 /// return an error.
189 ///
190 /// # Example
191 ///
192 /// This example shows how to use a `Searcher` to iterate over all matches
193 /// when using a DFA, which only provides "half" matches.
194 ///
195 /// ```
196 /// use regex_automata::{
197 /// hybrid::dfa::DFA,
198 /// util::iter::Searcher,
199 /// HalfMatch, Input,
200 /// };
201 ///
202 /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
203 /// let mut cache = re.create_cache();
204 ///
205 /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
206 /// let mut it = Searcher::new(input);
207 ///
208 /// let expected = Some(HalfMatch::must(0, 10));
209 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
210 /// assert_eq!(expected, got);
211 ///
212 /// let expected = Some(HalfMatch::must(0, 21));
213 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
214 /// assert_eq!(expected, got);
215 ///
216 /// let expected = Some(HalfMatch::must(0, 32));
217 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
218 /// assert_eq!(expected, got);
219 ///
220 /// let expected = None;
221 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
222 /// assert_eq!(expected, got);
223 ///
224 /// # Ok::<(), Box<dyn std::error::Error>>(())
225 /// ```
226 ///
227 /// This correctly moves iteration forward even when an empty match occurs:
228 ///
229 /// ```
230 /// use regex_automata::{
231 /// hybrid::dfa::DFA,
232 /// util::iter::Searcher,
233 /// HalfMatch, Input,
234 /// };
235 ///
236 /// let re = DFA::new(r"a|")?;
237 /// let mut cache = re.create_cache();
238 ///
239 /// let input = Input::new("abba");
240 /// let mut it = Searcher::new(input);
241 ///
242 /// let expected = Some(HalfMatch::must(0, 1));
243 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
244 /// assert_eq!(expected, got);
245 ///
246 /// let expected = Some(HalfMatch::must(0, 2));
247 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
248 /// assert_eq!(expected, got);
249 ///
250 /// let expected = Some(HalfMatch::must(0, 4));
251 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
252 /// assert_eq!(expected, got);
253 ///
254 /// let expected = None;
255 /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
256 /// assert_eq!(expected, got);
257 ///
258 /// # Ok::<(), Box<dyn std::error::Error>>(())
259 /// ```
260 #[inline]
261 pub fn advance_half<F>(&mut self, finder: F) -> Option<HalfMatch>
262 where
263 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
264 {
265 match self.try_advance_half(finder) {
266 Ok(m) => m,
267 Err(err) => panic!(
268 "unexpected regex half find error: {err}\n\
269 to handle find errors, use 'try' or 'search' methods",
270 ),
271 }
272 }
273
274 /// Return the next match for an infallible search if one exists, and
275 /// advance to the next position.
276 ///
277 /// The search is advanced even in the presence of empty matches by
278 /// forbidding empty matches from overlapping with any other match.
279 ///
280 /// This is like `try_advance`, except errors are converted into panics.
281 ///
282 /// # Panics
283 ///
284 /// If the given closure returns an error, then this panics. This is useful
285 /// when you know your underlying regex engine has been configured to not
286 /// return an error.
287 ///
288 /// # Example
289 ///
290 /// This example shows how to use a `Searcher` to iterate over all matches
291 /// when using a regex based on lazy DFAs:
292 ///
293 /// ```
294 /// use regex_automata::{
295 /// hybrid::regex::Regex,
296 /// util::iter::Searcher,
297 /// Match, Input,
298 /// };
299 ///
300 /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
301 /// let mut cache = re.create_cache();
302 ///
303 /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
304 /// let mut it = Searcher::new(input);
305 ///
306 /// let expected = Some(Match::must(0, 0..10));
307 /// let got = it.advance(|input| re.try_search(&mut cache, input));
308 /// assert_eq!(expected, got);
309 ///
310 /// let expected = Some(Match::must(0, 11..21));
311 /// let got = it.advance(|input| re.try_search(&mut cache, input));
312 /// assert_eq!(expected, got);
313 ///
314 /// let expected = Some(Match::must(0, 22..32));
315 /// let got = it.advance(|input| re.try_search(&mut cache, input));
316 /// assert_eq!(expected, got);
317 ///
318 /// let expected = None;
319 /// let got = it.advance(|input| re.try_search(&mut cache, input));
320 /// assert_eq!(expected, got);
321 ///
322 /// # Ok::<(), Box<dyn std::error::Error>>(())
323 /// ```
324 ///
325 /// This example shows the same as above, but with the PikeVM. This example
326 /// is useful because it shows how to use this API even when the regex
327 /// engine doesn't directly return a `Match`.
328 ///
329 /// ```
330 /// use regex_automata::{
331 /// nfa::thompson::pikevm::PikeVM,
332 /// util::iter::Searcher,
333 /// Match, Input,
334 /// };
335 ///
336 /// let re = PikeVM::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
337 /// let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
338 ///
339 /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
340 /// let mut it = Searcher::new(input);
341 ///
342 /// let expected = Some(Match::must(0, 0..10));
343 /// let got = it.advance(|input| {
344 /// re.search(&mut cache, input, &mut caps);
345 /// Ok(caps.get_match())
346 /// });
347 /// // Note that if we wanted to extract capturing group spans, we could
348 /// // do that here with 'caps'.
349 /// assert_eq!(expected, got);
350 ///
351 /// let expected = Some(Match::must(0, 11..21));
352 /// let got = it.advance(|input| {
353 /// re.search(&mut cache, input, &mut caps);
354 /// Ok(caps.get_match())
355 /// });
356 /// assert_eq!(expected, got);
357 ///
358 /// let expected = Some(Match::must(0, 22..32));
359 /// let got = it.advance(|input| {
360 /// re.search(&mut cache, input, &mut caps);
361 /// Ok(caps.get_match())
362 /// });
363 /// assert_eq!(expected, got);
364 ///
365 /// let expected = None;
366 /// let got = it.advance(|input| {
367 /// re.search(&mut cache, input, &mut caps);
368 /// Ok(caps.get_match())
369 /// });
370 /// assert_eq!(expected, got);
371 ///
372 /// # Ok::<(), Box<dyn std::error::Error>>(())
373 /// ```
374 #[inline]
375 pub fn advance<F>(&mut self, finder: F) -> Option<Match>
376 where
377 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
378 {
379 match self.try_advance(finder) {
380 Ok(m) => m,
381 Err(err) => panic!(
382 "unexpected regex find error: {err}\n\
383 to handle find errors, use 'try' or 'search' methods",
384 ),
385 }
386 }
387
388 /// Return the next half match for a fallible search if one exists, and
389 /// advance to the next position.
390 ///
391 /// This is like `advance_half`, except it permits callers to handle errors
392 /// during iteration.
393 #[inline]
394 pub fn try_advance_half<F>(
395 &mut self,
396 mut finder: F,
397 ) -> Result<Option<HalfMatch>, MatchError>
398 where
399 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
400 {
401 let mut m = match finder(&self.input)? {
402 None => return Ok(None),
403 Some(m) => m,
404 };
405 if Some(m.offset()) == self.last_match_end {
406 m = match self.handle_overlapping_empty_half_match(m, finder)? {
407 None => return Ok(None),
408 Some(m) => m,
409 };
410 }
411 self.input.set_start(m.offset());
412 self.last_match_end = Some(m.offset());
413 Ok(Some(m))
414 }
415
416 /// Return the next match for a fallible search if one exists, and advance
417 /// to the next position.
418 ///
419 /// This is like `advance`, except it permits callers to handle errors
420 /// during iteration.
421 #[inline]
422 pub fn try_advance<F>(
423 &mut self,
424 mut finder: F,
425 ) -> Result<Option<Match>, MatchError>
426 where
427 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
428 {
429 let mut m = match finder(&self.input)? {
430 None => return Ok(None),
431 Some(m) => m,
432 };
433 if m.is_empty() && Some(m.end()) == self.last_match_end {
434 m = match self.handle_overlapping_empty_match(m, finder)? {
435 None => return Ok(None),
436 Some(m) => m,
437 };
438 }
439 self.input.set_start(m.end());
440 self.last_match_end = Some(m.end());
441 Ok(Some(m))
442 }
443
444 /// Given a closure that executes a single search, return an iterator over
445 /// all successive non-overlapping half matches.
446 ///
447 /// The iterator returned yields result values. If the underlying regex
448 /// engine is configured to never return an error, consider calling
449 /// [`TryHalfMatchesIter::infallible`] to convert errors into panics.
450 ///
451 /// # Example
452 ///
453 /// This example shows how to use a `Searcher` to create a proper
454 /// iterator over half matches.
455 ///
456 /// ```
457 /// use regex_automata::{
458 /// hybrid::dfa::DFA,
459 /// util::iter::Searcher,
460 /// HalfMatch, Input,
461 /// };
462 ///
463 /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
464 /// let mut cache = re.create_cache();
465 ///
466 /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
467 /// let mut it = Searcher::new(input).into_half_matches_iter(|input| {
468 /// re.try_search_fwd(&mut cache, input)
469 /// });
470 ///
471 /// let expected = Some(Ok(HalfMatch::must(0, 10)));
472 /// assert_eq!(expected, it.next());
473 ///
474 /// let expected = Some(Ok(HalfMatch::must(0, 21)));
475 /// assert_eq!(expected, it.next());
476 ///
477 /// let expected = Some(Ok(HalfMatch::must(0, 32)));
478 /// assert_eq!(expected, it.next());
479 ///
480 /// let expected = None;
481 /// assert_eq!(expected, it.next());
482 ///
483 /// # Ok::<(), Box<dyn std::error::Error>>(())
484 /// ```
485 #[inline]
486 pub fn into_half_matches_iter<F>(
487 self,
488 finder: F,
489 ) -> TryHalfMatchesIter<'h, F>
490 where
491 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
492 {
493 TryHalfMatchesIter { it: self, finder }
494 }
495
496 /// Given a closure that executes a single search, return an iterator over
497 /// all successive non-overlapping matches.
498 ///
499 /// The iterator returned yields result values. If the underlying regex
500 /// engine is configured to never return an error, consider calling
501 /// [`TryMatchesIter::infallible`] to convert errors into panics.
502 ///
503 /// # Example
504 ///
505 /// This example shows how to use a `Searcher` to create a proper
506 /// iterator over matches.
507 ///
508 /// ```
509 /// use regex_automata::{
510 /// hybrid::regex::Regex,
511 /// util::iter::Searcher,
512 /// Match, Input,
513 /// };
514 ///
515 /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
516 /// let mut cache = re.create_cache();
517 ///
518 /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
519 /// let mut it = Searcher::new(input).into_matches_iter(|input| {
520 /// re.try_search(&mut cache, input)
521 /// });
522 ///
523 /// let expected = Some(Ok(Match::must(0, 0..10)));
524 /// assert_eq!(expected, it.next());
525 ///
526 /// let expected = Some(Ok(Match::must(0, 11..21)));
527 /// assert_eq!(expected, it.next());
528 ///
529 /// let expected = Some(Ok(Match::must(0, 22..32)));
530 /// assert_eq!(expected, it.next());
531 ///
532 /// let expected = None;
533 /// assert_eq!(expected, it.next());
534 ///
535 /// # Ok::<(), Box<dyn std::error::Error>>(())
536 /// ```
537 #[inline]
538 pub fn into_matches_iter<F>(self, finder: F) -> TryMatchesIter<'h, F>
539 where
540 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
541 {
542 TryMatchesIter { it: self, finder }
543 }
544
545 /// Given a closure that executes a single search, return an iterator over
546 /// all successive non-overlapping `Captures` values.
547 ///
548 /// The iterator returned yields result values. If the underlying regex
549 /// engine is configured to never return an error, consider calling
550 /// [`TryCapturesIter::infallible`] to convert errors into panics.
551 ///
552 /// Unlike the other iterator constructors, this accepts an initial
553 /// `Captures` value. This `Captures` value is reused for each search, and
554 /// the iterator implementation clones it before returning it. The caller
555 /// must provide this value because the iterator is purposely ignorant
556 /// of the underlying regex engine and thus doesn't know how to create
557 /// one itself. More to the point, a `Captures` value itself has a few
558 /// different constructors, which change which kind of information is
559 /// available to query in exchange for search performance.
560 ///
561 /// # Example
562 ///
563 /// This example shows how to use a `Searcher` to create a proper iterator
564 /// over `Captures` values, which provides access to all capturing group
565 /// spans for each match.
566 ///
567 /// ```
568 /// use regex_automata::{
569 /// nfa::thompson::pikevm::PikeVM,
570 /// util::iter::Searcher,
571 /// Input,
572 /// };
573 ///
574 /// let re = PikeVM::new(
575 /// r"(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})",
576 /// )?;
577 /// let (mut cache, caps) = (re.create_cache(), re.create_captures());
578 ///
579 /// let haystack = "2010-03-14 2016-10-08 2020-10-22";
580 /// let input = Input::new(haystack);
581 /// let mut it = Searcher::new(input)
582 /// .into_captures_iter(caps, |input, caps| {
583 /// re.search(&mut cache, input, caps);
584 /// Ok(())
585 /// });
586 ///
587 /// let got = it.next().expect("first date")?;
588 /// let year = got.get_group_by_name("y").expect("must match");
589 /// assert_eq!("2010", &haystack[year]);
590 ///
591 /// let got = it.next().expect("second date")?;
592 /// let month = got.get_group_by_name("m").expect("must match");
593 /// assert_eq!("10", &haystack[month]);
594 ///
595 /// let got = it.next().expect("third date")?;
596 /// let day = got.get_group_by_name("d").expect("must match");
597 /// assert_eq!("22", &haystack[day]);
598 ///
599 /// assert!(it.next().is_none());
600 ///
601 /// # Ok::<(), Box<dyn std::error::Error>>(())
602 /// ```
603 #[cfg(feature = "alloc")]
604 #[inline]
605 pub fn into_captures_iter<F>(
606 self,
607 caps: Captures,
608 finder: F,
609 ) -> TryCapturesIter<'h, F>
610 where
611 F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
612 {
613 TryCapturesIter { it: self, caps, finder }
614 }
615
616 /// Handles the special case of a match that begins where the previous
617 /// match ended. Without this special handling, it'd be possible to get
618 /// stuck where an empty match never results in forward progress. This
619 /// also makes it more consistent with how presiding general purpose regex
620 /// engines work.
621 #[cold]
622 #[inline(never)]
623 fn handle_overlapping_empty_half_match<F>(
624 &mut self,
625 _: HalfMatch,
626 mut finder: F,
627 ) -> Result<Option<HalfMatch>, MatchError>
628 where
629 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
630 {
631 // Since we are only here when 'm.offset()' matches the offset of the
632 // last match, it follows that this must have been an empty match.
633 // Since we both need to make progress *and* prevent overlapping
634 // matches, we discard this match and advance the search by 1.
635 //
636 // Note that this may start a search in the middle of a codepoint. The
637 // regex engines themselves are expected to deal with that and not
638 // report any matches within a codepoint if they are configured in
639 // UTF-8 mode.
640 self.input.set_start(self.input.start().checked_add(1).unwrap());
641 finder(&self.input)
642 }
643
644 /// Handles the special case of an empty match by ensuring that 1) the
645 /// iterator always advances and 2) empty matches never overlap with other
646 /// matches.
647 ///
648 /// (1) is necessary because we principally make progress by setting the
649 /// starting location of the next search to the ending location of the last
650 /// match. But if a match is empty, then this results in a search that does
651 /// not advance and thus does not terminate.
652 ///
653 /// (2) is not strictly necessary, but makes intuitive sense and matches
654 /// the presiding behavior of most general purpose regex engines. The
655 /// "intuitive sense" here is that we want to report NON-overlapping
656 /// matches. So for example, given the regex 'a|(?:)' against the haystack
657 /// 'a', without the special handling, you'd get the matches [0, 1) and [1,
658 /// 1), where the latter overlaps with the end bounds of the former.
659 ///
660 /// Note that we mark this cold and forcefully prevent inlining because
661 /// handling empty matches like this is extremely rare and does require
662 /// quite a bit of code, comparatively. Keeping this code out of the main
663 /// iterator function keeps it smaller and more amenable to inlining
664 /// itself.
665 #[cold]
666 #[inline(never)]
667 fn handle_overlapping_empty_match<F>(
668 &mut self,
669 m: Match,
670 mut finder: F,
671 ) -> Result<Option<Match>, MatchError>
672 where
673 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
674 {
675 assert!(m.is_empty());
676 self.input.set_start(self.input.start().checked_add(1).unwrap());
677 finder(&self.input)
678 }
679}
680
681/// An iterator over all non-overlapping half matches for a fallible search.
682///
683/// The iterator yields a `Result<HalfMatch, MatchError>` value until no more
684/// matches could be found.
685///
686/// The type parameters are as follows:
687///
688/// * `F` represents the type of a closure that executes the search.
689///
690/// The lifetime parameters come from the [`Input`] type:
691///
692/// * `'h` is the lifetime of the underlying haystack.
693///
694/// When possible, prefer the iterators defined on the regex engine you're
695/// using. This tries to abstract over the regex engine and is thus a bit more
696/// unwieldy to use.
697///
698/// This iterator is created by [`Searcher::into_half_matches_iter`].
699pub struct TryHalfMatchesIter<'h, F> {
700 it: Searcher<'h>,
701 finder: F,
702}
703
704impl<'h, F> TryHalfMatchesIter<'h, F> {
705 /// Return an infallible version of this iterator.
706 ///
707 /// Any item yielded that corresponds to an error results in a panic. This
708 /// is useful if your underlying regex engine is configured in a way that
709 /// it is guaranteed to never return an error.
710 pub fn infallible(self) -> HalfMatchesIter<'h, F> {
711 HalfMatchesIter(self)
712 }
713
714 /// Returns the current `Input` used by this iterator.
715 ///
716 /// The `Input` returned is generally equivalent to the one used to
717 /// construct this iterator, but its start position may be different to
718 /// reflect the start of the next search to be executed.
719 pub fn input<'i>(&'i self) -> &'i Input<'h> {
720 self.it.input()
721 }
722}
723
724impl<'h, F> Iterator for TryHalfMatchesIter<'h, F>
725where
726 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
727{
728 type Item = Result<HalfMatch, MatchError>;
729
730 #[inline]
731 fn next(&mut self) -> Option<Result<HalfMatch, MatchError>> {
732 self.it.try_advance_half(&mut self.finder).transpose()
733 }
734}
735
736impl<'h, F> core::fmt::Debug for TryHalfMatchesIter<'h, F> {
737 fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
738 f.debug_struct("TryHalfMatchesIter")
739 .field("it", &self.it)
740 .field("finder", &"<closure>")
741 .finish()
742 }
743}
744
745/// An iterator over all non-overlapping half matches for an infallible search.
746///
747/// The iterator yields a [`HalfMatch`] value until no more matches could be
748/// found.
749///
750/// The type parameters are as follows:
751///
752/// * `F` represents the type of a closure that executes the search.
753///
754/// The lifetime parameters come from the [`Input`] type:
755///
756/// * `'h` is the lifetime of the underlying haystack.
757///
758/// When possible, prefer the iterators defined on the regex engine you're
759/// using. This tries to abstract over the regex engine and is thus a bit more
760/// unwieldy to use.
761///
762/// This iterator is created by [`Searcher::into_half_matches_iter`] and
763/// then calling [`TryHalfMatchesIter::infallible`].
764#[derive(Debug)]
765pub struct HalfMatchesIter<'h, F>(TryHalfMatchesIter<'h, F>);
766
767impl<'h, F> HalfMatchesIter<'h, F> {
768 /// Returns the current `Input` used by this iterator.
769 ///
770 /// The `Input` returned is generally equivalent to the one used to
771 /// construct this iterator, but its start position may be different to
772 /// reflect the start of the next search to be executed.
773 pub fn input<'i>(&'i self) -> &'i Input<'h> {
774 self.0.it.input()
775 }
776}
777
778impl<'h, F> Iterator for HalfMatchesIter<'h, F>
779where
780 F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
781{
782 type Item = HalfMatch;
783
784 #[inline]
785 fn next(&mut self) -> Option<HalfMatch> {
786 match self.0.next()? {
787 Ok(m) => Some(m),
788 Err(err) => panic!(
789 "unexpected regex half find error: {err}\n\
790 to handle find errors, use 'try' or 'search' methods",
791 ),
792 }
793 }
794}
795
796/// An iterator over all non-overlapping matches for a fallible search.
797///
798/// The iterator yields a `Result<Match, MatchError>` value until no more
799/// matches could be found.
800///
801/// The type parameters are as follows:
802///
803/// * `F` represents the type of a closure that executes the search.
804///
805/// The lifetime parameters come from the [`Input`] type:
806///
807/// * `'h` is the lifetime of the underlying haystack.
808///
809/// When possible, prefer the iterators defined on the regex engine you're
810/// using. This tries to abstract over the regex engine and is thus a bit more
811/// unwieldy to use.
812///
813/// This iterator is created by [`Searcher::into_matches_iter`].
814pub struct TryMatchesIter<'h, F> {
815 it: Searcher<'h>,
816 finder: F,
817}
818
819impl<'h, F> TryMatchesIter<'h, F> {
820 /// Return an infallible version of this iterator.
821 ///
822 /// Any item yielded that corresponds to an error results in a panic. This
823 /// is useful if your underlying regex engine is configured in a way that
824 /// it is guaranteed to never return an error.
825 pub fn infallible(self) -> MatchesIter<'h, F> {
826 MatchesIter(self)
827 }
828
829 /// Returns the current `Input` used by this iterator.
830 ///
831 /// The `Input` returned is generally equivalent to the one used to
832 /// construct this iterator, but its start position may be different to
833 /// reflect the start of the next search to be executed.
834 pub fn input<'i>(&'i self) -> &'i Input<'h> {
835 self.it.input()
836 }
837}
838
839impl<'h, F> Iterator for TryMatchesIter<'h, F>
840where
841 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
842{
843 type Item = Result<Match, MatchError>;
844
845 #[inline]
846 fn next(&mut self) -> Option<Result<Match, MatchError>> {
847 self.it.try_advance(&mut self.finder).transpose()
848 }
849}
850
851impl<'h, F> core::fmt::Debug for TryMatchesIter<'h, F> {
852 fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
853 f.debug_struct("TryMatchesIter")
854 .field("it", &self.it)
855 .field("finder", &"<closure>")
856 .finish()
857 }
858}
859
860/// An iterator over all non-overlapping matches for an infallible search.
861///
862/// The iterator yields a [`Match`] value until no more matches could be found.
863///
864/// The type parameters are as follows:
865///
866/// * `F` represents the type of a closure that executes the search.
867///
868/// The lifetime parameters come from the [`Input`] type:
869///
870/// * `'h` is the lifetime of the underlying haystack.
871///
872/// When possible, prefer the iterators defined on the regex engine you're
873/// using. This tries to abstract over the regex engine and is thus a bit more
874/// unwieldy to use.
875///
876/// This iterator is created by [`Searcher::into_matches_iter`] and
877/// then calling [`TryMatchesIter::infallible`].
878#[derive(Debug)]
879pub struct MatchesIter<'h, F>(TryMatchesIter<'h, F>);
880
881impl<'h, F> MatchesIter<'h, F> {
882 /// Returns the current `Input` used by this iterator.
883 ///
884 /// The `Input` returned is generally equivalent to the one used to
885 /// construct this iterator, but its start position may be different to
886 /// reflect the start of the next search to be executed.
887 pub fn input<'i>(&'i self) -> &'i Input<'h> {
888 self.0.it.input()
889 }
890}
891
892impl<'h, F> Iterator for MatchesIter<'h, F>
893where
894 F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
895{
896 type Item = Match;
897
898 #[inline]
899 fn next(&mut self) -> Option<Match> {
900 match self.0.next()? {
901 Ok(m) => Some(m),
902 Err(err) => panic!(
903 "unexpected regex find error: {err}\n\
904 to handle find errors, use 'try' or 'search' methods",
905 ),
906 }
907 }
908}
909
910/// An iterator over all non-overlapping captures for a fallible search.
911///
912/// The iterator yields a `Result<Captures, MatchError>` value until no more
913/// matches could be found.
914///
915/// The type parameters are as follows:
916///
917/// * `F` represents the type of a closure that executes the search.
918///
919/// The lifetime parameters come from the [`Input`] type:
920///
921/// * `'h` is the lifetime of the underlying haystack.
922///
923/// When possible, prefer the iterators defined on the regex engine you're
924/// using. This tries to abstract over the regex engine and is thus a bit more
925/// unwieldy to use.
926///
927/// This iterator is created by [`Searcher::into_captures_iter`].
928#[cfg(feature = "alloc")]
929pub struct TryCapturesIter<'h, F> {
930 it: Searcher<'h>,
931 caps: Captures,
932 finder: F,
933}
934
935#[cfg(feature = "alloc")]
936impl<'h, F> TryCapturesIter<'h, F> {
937 /// Return an infallible version of this iterator.
938 ///
939 /// Any item yielded that corresponds to an error results in a panic. This
940 /// is useful if your underlying regex engine is configured in a way that
941 /// it is guaranteed to never return an error.
942 pub fn infallible(self) -> CapturesIter<'h, F> {
943 CapturesIter(self)
944 }
945}
946
947#[cfg(feature = "alloc")]
948impl<'h, F> Iterator for TryCapturesIter<'h, F>
949where
950 F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
951{
952 type Item = Result<Captures, MatchError>;
953
954 #[inline]
955 fn next(&mut self) -> Option<Result<Captures, MatchError>> {
956 let TryCapturesIter { ref mut it, ref mut caps, ref mut finder } =
957 *self;
958 let result = it
959 .try_advance(|input| {
960 (finder)(input, caps)?;
961 Ok(caps.get_match())
962 })
963 .transpose()?;
964 match result {
965 Ok(_) => Some(Ok(caps.clone())),
966 Err(err) => Some(Err(err)),
967 }
968 }
969}
970
971#[cfg(feature = "alloc")]
972impl<'h, F> core::fmt::Debug for TryCapturesIter<'h, F> {
973 fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
974 f.debug_struct("TryCapturesIter")
975 .field("it", &self.it)
976 .field("caps", &self.caps)
977 .field("finder", &"<closure>")
978 .finish()
979 }
980}
981
982/// An iterator over all non-overlapping captures for an infallible search.
983///
984/// The iterator yields a [`Captures`] value until no more matches could be
985/// found.
986///
987/// The type parameters are as follows:
988///
989/// * `F` represents the type of a closure that executes the search.
990///
991/// The lifetime parameters come from the [`Input`] type:
992///
993/// * `'h` is the lifetime of the underlying haystack.
994///
995/// When possible, prefer the iterators defined on the regex engine you're
996/// using. This tries to abstract over the regex engine and is thus a bit more
997/// unwieldy to use.
998///
999/// This iterator is created by [`Searcher::into_captures_iter`] and then
1000/// calling [`TryCapturesIter::infallible`].
1001#[cfg(feature = "alloc")]
1002#[derive(Debug)]
1003pub struct CapturesIter<'h, F>(TryCapturesIter<'h, F>);
1004
1005#[cfg(feature = "alloc")]
1006impl<'h, F> Iterator for CapturesIter<'h, F>
1007where
1008 F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
1009{
1010 type Item = Captures;
1011
1012 #[inline]
1013 fn next(&mut self) -> Option<Captures> {
1014 match self.0.next()? {
1015 Ok(m) => Some(m),
1016 Err(err) => panic!(
1017 "unexpected regex captures error: {err}\n\
1018 to handle find errors, use 'try' or 'search' methods",
1019 ),
1020 }
1021 }
1022}