human_regex 0.1.3

A regex library for humans
Documentation
⚠️ This package is under active development which will include breaking changes. ⚠️

Regex for Humans

The goal of this crate is simple: give everybody the power of regular expressions without having to learn the complicated syntax. It is inspired by ReadableRegex.jl. This crate is a wrapper around the core Rust regex library.

Example usage

Matching a date

If you want to match a date of the format 2021-10-30, you would use the following code to generate a regex:

use human_regex::{begin, digit, end, exactly, text};

fn main() {
    let regex_string = begin()
        + exactly(4, digit())
        + text("-")
        + exactly(2, digit())
        + text("-")
        + exactly(2, digit())
        + end();
    println!("{}", regex_string.to_regex().is_match("2014-01-01"))
}

Roadmap

The eventual goal of this crate is to support all the syntax in the core Rust regex library through a human-readable API. Here is where we currently stand:

Character Classes

Single Character

Implemented? Expression Description
any() . any character except new line (includes new line with s flag)
digit() \d digit (\p{Nd})
non_digit() \D not digit
\pN One-letter name Unicode character class
\p{Greek} Unicode character class (general category or script)
\PN Negated one-letter name Unicode character class
\P{Greek} negated Unicode character class (general category or script)

Perl Character Classes

Implemented? Expression Description
digit() \d digit (\p{Nd})
non_digit() \D not digit
whitespace() \s whitespace (\p{White_Space})
non_whitespace() \S not whitespace
word() \w word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control})
non_word() \W not word character

ASCII Character Classes

Implemented? Expression Description
[[:alnum:]] alphanumeric ([0-9A-Za-z])
[[:alpha:]] alphabetic ([A-Za-z])
[[:ascii:]] ASCII ([\x00-\x7F])
[[:blank:]] blank ([\t ])
[[:cntrl:]] control ([\x00-\x1F\x7F])
digit() [[:digit:]] digits ([0-9])
[[:graph:]] graphical ([!-~])
[[:lower:]] lower case ([a-z])
[[:print:]] printable ([ -~])
[[:punct:]] punctuation ([!-/:-@[-`{-~])
[[:space:]] whitespace ([\t\n\v\f\r ])
[[:upper:]] upper case ([A-Z])
word() [[:word:]] word characters ([0-9A-Za-z_])
[[:xdigit:]] hex digit ([0-9A-Fa-f])

Repetitions

Implemented? Expression Description
zero_or_more(x) x* zero or more of x (greedy)
one_or_more(x) x+ one or more of x (greedy)
zero_or_one(x) x? zero or one of x (greedy)
zero_or_more(x) x*? zero or more of x (ungreedy/lazy)
one_or_more(x).lazy() x+? one or more of x (ungreedy/lazy)
zero_or_more(x).lazy() x?? zero or one of x (ungreedy/lazy)
between(n, m, x) x{n,m} at least n x and at most m x (greedy)
at_least(n, x) x{n,} at least n x (greedy)
exactly(n, x) x{n} exactly n x
between(n, m, x).lazy() x{n,m}? at least n x and at most m x (ungreedy/lazy)
at_least(n, x).lazy() x{n,}? at least n x (ungreedy/lazy)

Composites

Implemented? Expression Description
+ xy concatenation (x followed by y)
or() x|y alternation (x or y, prefer x)

Empty matches

Implemented? Expression Description
begin() ^ the beginning of text (or start-of-line with multi-line mode)
end() $ the end of text (or end-of-line with multi-line mode)
\A only the beginning of text (even with multi-line mode enabled)
\z only the end of text (even with multi-line mode enabled)
word_boundary() \b a Unicode word boundary (\w on one side and \W, \A, or \z on other)
non_word_boundary() \B not a Unicode word boundary

Groupings and Flags

Implemented? Expression Description
(exp) numbered capture group (indexed by opening parenthesis)
(?P<name>exp) named (also numbered) capture group
Handled implicitly through functional composition (?:exp) non-capturing group
(?flags) set flags within current group
(?flags:exp) set flags for exp (non-capturing)
Implemented? Expression Description
i case-insensitive: letters match both upper and lower case
m multi-line mode: ^ and $ match begin/end of line
s allow . to match \n
U swap the meaning of x* and x*?
u Unicode support (enabled by default)
x ignore whitespace and allow line comments (starting with #)