Crate grex

1. What does this tool do?

grex is a library as well as a command-line utility that is meant to simplify the often complicated and tedious task of creating regular expressions. It does so by automatically generating regular expressions from user-provided test cases.

This project started as a Rust port of the JavaScript tool regexgen written by Devon Govett. Although many useful features could still be added to regexgen, its development apparently ceased several years ago. The plan is now to add these new features to grex, as Rust really shines when it comes to command-line tools. grex offers all features that regexgen provides, and more.

The philosophy of this project is to generate, by default, the most specific regular expression possible: one that matches exactly the given input and nothing else. With the use of command-line flags (in the CLI tool) or preprocessing methods (in the library), more generalized expressions can be created.

2. Current features

  • literals
  • character classes
  • detection of common prefixes and suffixes
  • detection of repeated substrings and conversion to {min,max} quantifier notation
  • alternation using | operator
  • optionality using ? quantifier
  • escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
  • concatenation of all of the former
  • reading input strings from the command-line or from a file
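To make the idea of common prefix detection concrete, here is a minimal sketch in plain Rust. It is an illustration only and is not taken from grex's source; the helper name `common_prefix` is hypothetical, and grex's actual algorithm may work quite differently.

```rust
/// Hypothetical helper (not part of grex): returns the longest prefix
/// shared by all input strings, compared character by character.
fn common_prefix(inputs: &[&str]) -> String {
    let mut iter = inputs.iter();
    // With no inputs there is nothing to share.
    let first = match iter.next() {
        Some(s) => *s,
        None => return String::new(),
    };
    // Start with the whole first string and shrink it as needed.
    let mut prefix: Vec<char> = first.chars().collect();
    for s in iter {
        // Count how many leading characters still agree with this input.
        let shared = prefix
            .iter()
            .zip(s.chars())
            .take_while(|(a, b)| **a == *b)
            .count();
        prefix.truncate(shared);
    }
    prefix.into_iter().collect()
}
```

For the inputs "a", "aa" and "aaa" this sketch yields the prefix "a", which is the literal that the generated expression in section 3.1 starts with.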

3. How to use?

The code snippets below show how to use the public API.

For more detailed examples, please take a look at the project's readme file on GitHub.

3.1 Default settings

let regexp = grex::RegExpBuilder::from(&["a", "aa", "aaa"]).build();
assert_eq!(regexp, "^a(aa?)?$");

3.2 Convert repeated substrings

let regexp = grex::RegExpBuilder::from(&["a", "aa", "aaa"])
    .with_converted_repetitions()
    .build();
assert_eq!(regexp, "^a{1,3}$");

3.3 Escape non-ASCII characters

let regexp = grex::RegExpBuilder::from(&["You smell like 💩."])
    .with_escaped_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");

3.4 Escape astral code points using surrogate pairs

Old versions of JavaScript do not support Unicode escape sequences for the astral code planes (range U+010000 to U+10FFFF). To support these symbols in JavaScript regular expressions, they must be converted to surrogate pairs.

let regexp = grex::RegExpBuilder::from(&["You smell like 💩."])
    .with_escaped_non_ascii_chars(true)
    .build();
assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
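The surrogate pair in the expected output above follows from the standard UTF-16 encoding rule. The sketch below works through that arithmetic by hand; the function `surrogate_pair` is a hypothetical helper written for illustration and is not part of grex's API.

```rust
/// Hypothetical helper (not part of grex): splits an astral code point
/// into its UTF-16 high and low surrogates. Returns None for code points
/// in the Basic Multilingual Plane, which need no surrogate pair.
fn surrogate_pair(c: char) -> Option<(u16, u16)> {
    let code_point = c as u32;
    if code_point < 0x10000 {
        return None;
    }
    // Subtracting 0x10000 leaves a 20-bit value.
    let offset = code_point - 0x10000;
    // The upper 10 bits go into the high surrogate (0xD800..=0xDBFF).
    let high = 0xD800 + (offset >> 10) as u16;
    // The lower 10 bits go into the low surrogate (0xDC00..=0xDFFF).
    let low = 0xDC00 + (offset & 0x3FF) as u16;
    Some((high, low))
}
```

For U+1F4A9 the offset is 0xF4A9, whose upper ten bits are 0x3D and whose lower ten bits are 0xA9, giving the pair d83d and dca9 shown in the assertion. The standard library's `char::encode_utf16` performs the same conversion.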

3.5 Combine multiple features

let regexp = grex::RegExpBuilder::from(&["You smell like 💩💩💩."])
    .with_converted_repetitions()
    .with_escaped_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smel{2} like \\u{1f4a9}{3}\\.$");

Structs

RegExpBuilder

This struct builds regular expressions from user-provided test cases.