// regex_cursor/lib.rs
/*!
This crate provides routines for searching **discontiguous strings** for matches
of a regular expression (aka "regex"). It is based on `regex-automata`, and
most of the code is adapted from the various crates in the
[regex](https://github.com/rust-lang/regex) repository.

It is intended as a prototype for upstream support for "streaming regex". The
cursor-based API in this crate is very similar to the API already exposed by
`regex`/`regex-automata`. To that end, a generic `Cursor` trait is provided that
chunked collections (such as ropes) can implement.

A sketch of the cursor API is shown below. The string is yielded as a sequence
of byte chunks. Calling `advance` moves the cursor to the next chunk, and calling
`backtrack` moves it back one chunk. This crate requires backtracking, which
makes it unsuitable for searching fully unbuffered streams such as bytes sent
over a TCP connection.

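```rust,ignore
pub trait Cursor {
    /// Returns the chunk the cursor currently points at.
    fn chunk(&self) -> &[u8];
    /// Moves to the next chunk; returns `false` if there is none.
    fn advance(&mut self) -> bool;
    /// Moves back to the previous chunk; returns `false` if there is none.
    fn backtrack(&mut self) -> bool;
}
```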
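
As an illustration of how a collection might back this interface, here is a minimal sketch that implements the three methods above for a string stored as a `Vec` of byte chunks. `ChunkCursor`, `VecChunks` and `collect_all` are hypothetical names used only for this example. The trait is a local copy of the sketch, not the crate's actual `Cursor` trait, which may require additional methods. The sketch assumes that `advance` and `backtrack` return `false` and leave the cursor unchanged when there is no chunk in that direction.

```rust
/// Local copy of the three-method cursor sketch (hypothetical, not the crate's trait).
trait ChunkCursor {
    fn chunk(&self) -> &[u8];
    fn advance(&mut self) -> bool;
    fn backtrack(&mut self) -> bool;
}

/// Hypothetical cursor over a string stored as a list of byte chunks.
struct VecChunks {
    chunks: Vec<Vec<u8>>,
    /// Index of the chunk the cursor currently points at.
    pos: usize,
}

impl ChunkCursor for VecChunks {
    fn chunk(&self) -> &[u8] {
        // An empty collection behaves like a single empty chunk.
        self.chunks.get(self.pos).map(Vec::as_slice).unwrap_or(&[])
    }

    fn advance(&mut self) -> bool {
        // Move forward only if another chunk exists; otherwise stay put.
        if self.pos + 1 < self.chunks.len() {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    fn backtrack(&mut self) -> bool {
        // Move back only if an earlier chunk exists; otherwise stay put.
        if self.pos > 0 {
            self.pos -= 1;
            true
        } else {
            false
        }
    }
}

/// Reassembles the whole string: rewind to the first chunk, then read forward.
fn collect_all(cursor: &mut impl ChunkCursor) -> Vec<u8> {
    while cursor.backtrack() {}
    let mut out = cursor.chunk().to_vec();
    while cursor.advance() {
        out.extend_from_slice(cursor.chunk());
    }
    out
}
```

For example, a `VecChunks` holding the chunks `"hel"`, `"lo w"` and `"orld"` yields `"hello world"` from `collect_all`. Afterwards the cursor sits on the last chunk, so a further `advance` returns `false`.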

Working on this crate showed me that regex backtracks a lot more than expected,
with most functionality fundamentally requiring backtracking. For network use
cases that do not buffer their input, the primary need would likely be detecting
whether a match exists (without necessarily reporting the matched byte range).
Such use cases can be covered by manually feeding bytes into the hybrid and DFA
engines from the `regex-automata` crate. This approach has the added advantage
that the caller drives the search instead of the engine, so the caller can pause
the search (for example in async code) while waiting for more data.
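
A toy version of this caller-driven style is sketched below. It deliberately does not use `regex-automata`. `SubstringDfa` and `StreamSearch` are hypothetical types for a hand-built dense DFA (the classic Knuth-Morris-Pratt construction) that only answers whether a fixed byte string occurs somewhere in the stream. The entire search state is one integer owned by the caller, so feeding can stop after any chunk (for example while an async task waits for more data) and resume later without revisiting earlier bytes.

```rust
/// Hypothetical dense DFA that detects whether `needle` occurs in the input
/// (Knuth-Morris-Pratt construction). A state `k` below `needle.len()` means
/// the longest suffix of the bytes read that is a prefix of the needle has length `k`.
struct SubstringDfa {
    table: Vec<[usize; 256]>,
    accept: usize,
}

impl SubstringDfa {
    fn new(needle: &[u8]) -> Self {
        assert!(!needle.is_empty(), "needle must not be empty");
        let m = needle.len();
        let mut table = vec![[0usize; 256]; m + 1];
        table[0][needle[0] as usize] = 1;
        // `x` is the state reached after reading needle[1..j]; used for mismatches.
        let mut x = 0;
        for j in 1..m {
            table[j] = table[x]; // on a mismatch, behave like state `x`
            table[j][needle[j] as usize] = j + 1; // on a match, extend the prefix
            x = table[x][needle[j] as usize];
        }
        // Once the needle has been seen, stay in the accepting state.
        table[m] = [m; 256];
        SubstringDfa { table, accept: m }
    }
}

/// Caller-owned search state: feed chunks whenever they arrive.
struct StreamSearch<'a> {
    dfa: &'a SubstringDfa,
    state: usize,
}

impl<'a> StreamSearch<'a> {
    fn new(dfa: &'a SubstringDfa) -> Self {
        StreamSearch { dfa, state: 0 }
    }

    /// Feeds one chunk; returns `true` once a match has been seen so far.
    fn feed(&mut self, chunk: &[u8]) -> bool {
        for &byte in chunk {
            self.state = self.dfa.table[self.state][byte as usize];
            if self.state == self.dfa.accept {
                // Detection only: the rest of the chunk need not be read.
                return true;
            }
        }
        self.state == self.dfa.accept
    }
}
```

Feeding `"xxab"` and then `"cyy"` for the needle `"abc"` reports no match after the first chunk and a match after the second, even though the occurrence straddles the chunk boundary. A real regex DFA has far more states but is conceptually driven the same way, one transition per byte.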

The only part of this crate that could be applied to the fully streaming case is
the streaming PikeVM implementation. However, there are some limitations:
* Only a single search can be run, since the PikeVM may look ahead multiple
  bytes to disambiguate alternative matches.
* Prefilters longer than one byte cannot work.
* UTF-8 mode cannot be supported (empty matches may occur inside a multi-byte
  codepoint rather than on a codepoint boundary).
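
The first limitation can be seen even without a PikeVM. The sketch below is a toy, not the crate's implementation. `anchored_leftmost_first` is a hypothetical helper that matches a list of literal alternatives at the start of the input under leftmost-first semantics (earlier alternatives win, as in `regex`), reading one byte at a time. Besides the match end, it reports how many bytes had to be read before the result was final.

```rust
/// Anchored, leftmost-first match of literal alternatives, reading one byte at a time.
/// Returns `(match_end, bytes_read)`; `match_end` is `None` if no alternative matched.
fn anchored_leftmost_first(alternatives: &[&[u8]], input: &[u8]) -> (Option<usize>, usize) {
    // Alternatives still consistent with the bytes read so far, in pattern
    // order (a lower index means a higher priority).
    let mut alive: Vec<usize> = (0..alternatives.len()).collect();
    // Best completed match so far as (priority, end offset).
    let mut best: Option<(usize, usize)> = None;
    let mut read = 0;
    loop {
        // Record alternatives that end exactly here, then drop them.
        for &i in &alive {
            if alternatives[i].len() == read && best.map_or(true, |(p, _)| i < p) {
                best = Some((i, read));
            }
        }
        alive.retain(|&i| alternatives[i].len() > read);
        // Lower-priority alternatives can never beat the recorded match.
        if let Some((p, _)) = best {
            alive.retain(|&i| i < p);
        }
        // The result is final only once no higher-priority alternative is alive.
        if alive.is_empty() || read == input.len() {
            return (best.map(|(_, end)| end), read);
        }
        let byte = input[read];
        read += 1;
        alive.retain(|&i| alternatives[i][read - 1] == byte);
    }
}
```

With the alternatives `abc|a` and the input `abd`, the match ends at offset 1, but three bytes are read before `abc` can be ruled out. A second search starting at offset 1 would need bytes 1 and 2 again, which a fully unbuffered stream can no longer provide. Swapping the order to `a|abc` lets the match be decided after a single byte.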

Currently, the PikeVM implementation is not written with this use case in mind
and may call `backtrack` unnecessarily. That could be addressed in the future,
but the first limitation in particular is very restrictive. The PikeVM also does
not let the user drive the search, so it would, for example, block on network
calls (no async support).
*/

#[cfg(feature = "ropey")]
pub use cursor::RopeyCursor;
pub use cursor::{Cursor, IntoCursor};
pub use input::Input;
pub use regex_automata;

mod cursor;
pub mod engines;
mod input;
mod literal;
mod util;

#[cfg(test)]
mod test_rope;
#[cfg(test)]
mod tests;