Reggy
Friendly regular expressions for text analytics. Reggy removes or adjusts some typical regex features to make natural-language queries easier to write, and it can match streaming text incrementally.
API Usage
// use the high-level Pattern API for simple use cases
let mut p = new.unwrap;
assert_eq!
// transpile to normal regex (https://docs.rs/regex/) syntax
let ast = parse.unwrap;
assert_eq!;
// perform an incremental search with several patterns at once
let money = parse.unwrap;
let people = parse.unwrap;
let mut search = new;
// call step() to begin searching a stream
let jane_match = Match ;
assert_eq!;
// call step() again to continue with the same search state
// note "John Doe" matches across the step boundary
let john_match = Match ;
let money_match_1 = Match ;
assert_eq!;
// call finish() to retrieve any pending matches once the stream is done
let money_match_2 = Match ;
assert_eq!;
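The step-boundary behaviour mentioned in the comments above can be illustrated without Reggy itself. The following is a minimal, std-only sketch that assumes a single literal needle. `LiteralStream` and its fields are hypothetical names, not part of Reggy's API, and the sketch does no pattern parsing, case folding, or word-boundary handling. It only shows how carrying a short tail between calls lets a match span two chunks.

```rust
/// Hypothetical streaming searcher for one literal needle.
/// Not Reggy's implementation: it only illustrates keeping state between steps.
struct LiteralStream {
    needle: Vec<u8>,
    /// Bytes from earlier chunks that could still begin a match.
    tail: Vec<u8>,
    /// Absolute stream position of `tail[0]`.
    offset: usize,
}

impl LiteralStream {
    fn new(needle: &str) -> Self {
        assert!(!needle.is_empty(), "needle must not be empty");
        Self {
            needle: needle.as_bytes().to_vec(),
            tail: Vec::new(),
            offset: 0,
        }
    }

    /// Feed the next chunk and return the absolute (start, end) byte spans
    /// of every non-overlapping match completed by this chunk.
    fn step(&mut self, chunk: &str) -> Vec<(usize, usize)> {
        self.tail.extend_from_slice(chunk.as_bytes());
        let n = self.needle.len();
        let mut found = Vec::new();
        let mut i = 0;
        while i + n <= self.tail.len() {
            if self.tail[i..i + n] == self.needle[..] {
                found.push((self.offset + i, self.offset + i + n));
                i += n; // skip past the match
            } else {
                i += 1;
            }
        }
        // Everything before `i` has been fully examined; keep the rest
        // (at most n - 1 bytes) so a later chunk can complete a match.
        self.tail.drain(..i);
        self.offset += i;
        found
    }
}
```

Feeding "Hello John" and then " Doe and John Doe" to a `LiteralStream` built for "John Doe" reports nothing on the first step and two spans on the second, the first of which starts in the earlier chunk. A literal needle never leaves an ambiguous partial match at the end of the stream, so this sketch has no counterpart to finish().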
Pattern Language
Reggy is case-insensitive by default. A space matches one or more whitespace characters (i.e. \s+). Each of the reserved characters described below (\, (, ), ?, |, *, +, and !) can be escaped with a backslash to match it literally. Every pattern is surrounded by implicit Unicode word boundaries (i.e. \b).
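To make these rules concrete, here is a rough sketch of how they could be mapped onto conventional regex syntax. `sketch_to_regex` and `push_literal` are hypothetical helpers written for this illustration, not Reggy's real transpiler. The sketch assumes that each space becomes \s+, that (! opens a group with case-insensitivity switched off, and that any non-reserved character is matched literally. It does not validate the pattern (for example, unbalanced parentheses pass straight through).

```rust
/// Hypothetical illustration of the rules above; NOT Reggy's actual transpiler.
fn sketch_to_regex(pattern: &str) -> String {
    // Implicit word boundary and case-insensitive group around the pattern.
    let mut out = String::from(r"\b(?i:");
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `\d` stays a digit class; any other escaped character is a literal.
            '\\' => match chars.next() {
                Some('d') => out.push_str(r"\d"),
                Some(next) => push_literal(&mut out, next),
                None => out.push_str(r"\\"),
            },
            // Assumption: each space maps to one or more whitespace characters.
            ' ' => out.push_str(r"\s+"),
            // `(!` opens a case-sensitive group, a bare `(` an ordinary group.
            '(' => {
                if chars.peek() == Some(&'!') {
                    chars.next();
                    out.push_str("(?-i:");
                } else {
                    out.push_str("(?:");
                }
            }
            // The remaining reserved characters keep their usual regex meaning.
            ')' | '?' | '|' | '*' | '+' => out.push(c),
            // Everything else (including `.` and `$`) is matched literally.
            other => push_literal(&mut out, other),
        }
    }
    out.push_str(r")\b");
    out
}

/// Append `c`, backslash-escaping it if it is a regex metacharacter.
fn push_literal(out: &mut String, c: char) {
    if r"\.+*?()|[]{}^$".contains(c) {
        out.push('\\');
    }
    out.push(c);
}
```

Under these assumptions, dogs? becomes \b(?i:dogs?)\b and \d.\d\d becomes \b(?i:\d\.\d\d)\b. Reggy's actual output may differ, for instance in how it places word boundaries around patterns that begin with punctuation such as $.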
Examples
Make a letter optional with ?
dogs? matches dog and dogs
Create two or more options with |
dog|cat matches dog and cat
Perform operations on groups of characters with (...)
the qualit(y|ies) required matches the quality required and the qualities required
the only( one)? around matches the only around and the only one around
Create a case-sensitive group with (!...)
United States of America|(!USA) matches United States of America in any letter case, and USA but not usa
Match digits with \d
\d.\d\d matches 3.14
Repeat the preceding character or group zero or more times with *, or one or more times with +
$(\d?\d?\d,)*\d?\d?\d.\d\d matches $20.66 and $4,670,055.32
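To see what the last pattern accepts, it can help to spell it out by hand. The sketch below is a std-only check written for illustration; `is_money` is a hypothetical name, not part of Reggy. It assumes ASCII digits and tests a whole string, whereas Reggy searches within larger text and applies its implicit word boundaries.

```rust
/// Hypothetical hand-written equivalent of `$(\d?\d?\d,)*\d?\d?\d.\d\d`,
/// applied to an entire string. Assumes ASCII digits only.
fn is_money(s: &str) -> bool {
    // Literal `$` at the start.
    let Some(rest) = s.strip_prefix('$') else { return false };
    // Literal `.` separating the whole part from the cents.
    let Some((whole, cents)) = rest.rsplit_once('.') else { return false };
    // True when `p` consists of between `min` and `max` ASCII digits.
    let digits = |p: &str, min: usize, max: usize| {
        (min..=max).contains(&p.len()) && p.bytes().all(|b| b.is_ascii_digit())
    };
    // `\d\d` after the dot; before it, comma-separated groups of `\d?\d?\d`.
    digits(cents, 2, 2) && whole.split(',').all(|g| digits(g, 1, 3))
}
```

Each comma-separated group corresponds to one repetition of (\d?\d?\d,), which is why $4,670,055.32 is accepted while a string with four digits in a row before the dot, such as $1234.00, is rejected.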