
§Regex for Humans
The goal of this crate is simple: give everybody the power of regular expressions without having
to learn the complicated syntax. It is inspired by ReadableRegex.jl.
This crate is a wrapper around the core Rust regex library.
§Example usage
If you want to match a date of the format 2021-10-30, you could use the following code to generate a regex:
use human_regex::{beginning, digit, exactly, text, end};
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
The to_regex() method returns a standard Rust regex. We can do this another way with slightly less repetition though!
use human_regex::{beginning, digit, exactly, text, end};
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
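To make the composition above more concrete, here is a minimal sketch, using only the standard library, of how a builder of this kind might turn the `+` chain into a pattern string. The `Pattern` type, its helper functions, and the choice to wrap repeated operands in a non-capturing group are assumptions made for illustration; they are not the crate's actual implementation, and the string the crate really emits may differ.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a human-readable pattern (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Add for Pattern {
    type Output = Pattern;
    // `x + y` concatenates the two patterns, giving `xy`.
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn beginning() -> Pattern {
    Pattern("^".to_string())
}

fn end() -> Pattern {
    Pattern("$".to_string())
}

fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Literal text. This sketch escapes only a small set of metacharacters.
fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

/// Exactly `n` repetitions. The operand is wrapped in a non-capturing
/// group so that a multi-character operand repeats as a unit.
fn exactly(n: u32, p: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", p.0, n))
}

/// The date pattern from the second example, built with the sketch helpers.
fn date_pattern() -> Pattern {
    let dash_and_two_digits = text("-") + exactly(2, digit());
    beginning() + exactly(4, digit()) + exactly(2, dash_and_two_digits) + end()
}

fn main() {
    println!("{}", date_pattern().0);
}
```

Under these assumptions the program prints `^(?:\d){4}(?:-(?:\d){2}){2}$`, which matches the same dates as the hand-written `^\d{4}-\d{2}-\d{2}$`.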
For a more extensive set of examples, please see The Cookbook.
§Features
This crate currently supports the vast majority of syntax available in the core Rust regex library through a human-readable API.
§Single Character
Implemented? | Expression | Description |
any() | . | any character except new line (includes new line with s flag) |
digit() | \d | digit (\p{Nd} ) |
non_digit() | \D | not digit |
unicode_category(UnicodeCategory) | \p{L} | Unicode non-script category |
unicode_script(UnicodeScript) | \p{Greek} | Unicode script category |
non_unicode_category(UnicodeCategory) | \P{L} | Negated one-letter name Unicode character class |
non_unicode_script(UnicodeScript) | \P{Greek} | negated Unicode character class (general category or script) |
§Character Classes
Implemented? | Expression | Description |
or(&['x', 'y', 'z']) | [xyz] | A character class matching either x, y or z (union). |
nor(&['x', 'y', 'z']) | [^xyz] | A character class matching any character except x, y and z. |
within('a'..='z') | [a-z] | A character class matching any character in range a-z. |
without('a'..='z') | [^a-z] | A character class matching any character outside range a-z. |
See below | [[:alpha:]] | ASCII character class ([A-Za-z] ) |
non_alphabetic() | [[:^alpha:]] | Negated ASCII character class ([^A-Za-z] ) |
or() | [x[^xyz]] | Nested/grouping character class (matching any character except y and z) |
and(&[]) /& | [a-y&&xyz] | Intersection (a-y AND xyz = xy) |
within('0'..='9') & nor(&['4']) | [0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4) |
subtract(&[],&[]) | [0-9--4] | Direct subtraction (matching 0-9 except 4). Use .collect::<Vec<char>>() to use ranges. |
xor(&[],&[]) | [a-g~~b-h] | Symmetric difference (matching a and h only). Requires .collect() for ranges. |
or(&escape_all(&['[',']'])) | [\[\]] | Escaping in character classes (matching [ or ] ) |
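The set semantics in the table can be checked without any regex engine. The sketch below uses only the standard library's `BTreeSet` to reproduce the intersection, subtraction, and symmetric-difference rows; the helper names are hypothetical and nothing here calls the crate itself. It also shows the kind of `.collect()` step needed to turn a character range into an explicit collection.

```rust
use std::collections::BTreeSet;
use std::ops::RangeInclusive;

/// Expand a character range into an explicit, ordered set.
fn chars_in(range: RangeInclusive<char>) -> BTreeSet<char> {
    range.collect()
}

/// Mirrors `[a-y&&xyz]`: characters in both sets.
fn intersection_demo() -> String {
    let a_to_y = chars_in('a'..='y');
    let xyz: BTreeSet<char> = "xyz".chars().collect();
    let result: String = a_to_y.intersection(&xyz).collect();
    result
}

/// Mirrors `[0-9--4]`: characters in the first set but not the second.
fn subtraction_demo() -> String {
    let digits = chars_in('0'..='9');
    let four: BTreeSet<char> = "4".chars().collect();
    let result: String = digits.difference(&four).collect();
    result
}

/// Mirrors `[a-g~~b-h]`: characters in exactly one of the two sets.
fn symmetric_difference_demo() -> String {
    let a_to_g = chars_in('a'..='g');
    let b_to_h = chars_in('b'..='h');
    let result: String = a_to_g.symmetric_difference(&b_to_h).collect();
    result
}

fn main() {
    println!("intersection:         {}", intersection_demo());
    println!("subtraction:          {}", subtraction_demo());
    println!("symmetric difference: {}", symmetric_difference_demo());
}
```

The three results are `xy`, `012356789`, and `ah`, which agree with the descriptions in the table.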
§Perl Character Classes
Implemented? | Expression | Description |
digit() | \d | digit (\p{Nd} ) |
non_digit() | \D | not digit |
whitespace() | \s | whitespace (\p{White_Space} ) |
non_whitespace() | \S | not whitespace |
word() | \w | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control} ) |
non_word() | \W | not word character |
§ASCII Character Classes
Implemented? | Expression | Description |
alphanumeric() | [[:alnum:]] | alphanumeric ([0-9A-Za-z] ) |
alphabetic() | [[:alpha:]] | alphabetic ([A-Za-z] ) |
ascii() | [[:ascii:]] | ASCII ([\x00-\x7F] ) |
blank() | [[:blank:]] | blank ([\t ] ) |
control() | [[:cntrl:]] | control ([\x00-\x1F\x7F] ) |
digit() | [[:digit:]] | digits ([0-9] ) |
graphical() | [[:graph:]] | graphical ([!-~] ) |
lowercase() | [[:lower:]] | lower case ([a-z] ) |
printable() | [[:print:]] | printable ([ -~] ) |
punctuation() | [[:punct:]] | punctuation ([!-/:-@\[-`{-~] ) |
whitespace() | [[:space:]] | whitespace ([\t\n\v\f\r ] ) |
uppercase() | [[:upper:]] | upper case ([A-Z] ) |
word() | [[:word:]] | word characters ([0-9A-Za-z_] ) |
hexdigit() | [[:xdigit:]] | hex digit ([0-9A-Fa-f] ) |
§Repetitions
Implemented? | Expression | Description |
zero_or_more(x) | x* | zero or more of x (greedy) |
one_or_more(x) | x+ | one or more of x (greedy) |
zero_or_one(x) | x? | zero or one of x (greedy) |
zero_or_more(x).lazy() | x*? | zero or more of x (ungreedy/lazy) |
one_or_more(x).lazy() | x+? | one or more of x (ungreedy/lazy) |
zero_or_one(x).lazy() | x?? | zero or one of x (ungreedy/lazy) |
between(n, m, x) | x{n,m} | at least n x and at most m x (greedy) |
at_least(n, x) | x{n,} | at least n x (greedy) |
exactly(n, x) | x{n} | exactly n x |
between(n, m, x).lazy() | x{n,m}? | at least n x and at most m x (ungreedy/lazy) |
at_least(n, x).lazy() | x{n,}? | at least n x (ungreedy/lazy) |
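The relationship between a greedy quantifier and its lazy form is purely syntactic: the lazy variant is the greedy one with a trailing `?`. The following standard-library sketch illustrates that mapping. The `Repeated` type and the functions around it are hypothetical stand-ins named after the table entries, not the crate's implementation, and the non-capturing group wrapper is an assumption.

```rust
/// Hypothetical quantified pattern holding its rendered regex string.
struct Repeated(String);

impl Repeated {
    /// Appending `?` to a quantifier turns greedy matching into lazy matching.
    fn lazy(self) -> Repeated {
        Repeated(self.0 + "?")
    }
}

fn zero_or_more(x: &str) -> Repeated {
    Repeated(format!("(?:{})*", x))
}

fn one_or_more(x: &str) -> Repeated {
    Repeated(format!("(?:{})+", x))
}

fn zero_or_one(x: &str) -> Repeated {
    Repeated(format!("(?:{})?", x))
}

/// At least `n` and at most `m` repetitions: `x{n,m}`.
fn between(n: u32, m: u32, x: &str) -> Repeated {
    Repeated(format!("(?:{}){{{},{}}}", x, n, m))
}

/// At least `n` repetitions: `x{n,}`.
fn at_least(n: u32, x: &str) -> Repeated {
    Repeated(format!("(?:{}){{{},}}", x, n))
}

fn main() {
    println!("{}", zero_or_more("a").0);
    println!("{}", zero_or_more("a").lazy().0);
    println!("{}", one_or_more("a").lazy().0);
    println!("{}", zero_or_one("a").lazy().0);
    println!("{}", between(2, 4, "a").lazy().0);
    println!("{}", at_least(3, "ab").0);
}
```

Keeping `lazy()` as a method on the quantified value means it can only be applied where a quantifier already exists, which is one plausible reason for the chained style shown in the table.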
§Composites
Implemented? | Expression | Description |
+ | xy | concatenation (x followed by y) |
or() | x|y | alternation (x or y, prefer x) |
§Empty matches
Implemented? | Expression | Description |
beginning() | ^ | the beginning of text (or start-of-line with multi-line mode) |
end() | $ | the end of text (or end-of-line with multi-line mode) |
beginning_of_text() | \A | only the beginning of text (even with multi-line mode enabled) |
end_of_text() | \z | only the end of text (even with multi-line mode enabled) |
word_boundary() | \b | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
non_word_boundary() | \B | not a Unicode word boundary |
§Groupings
Implemented? | Expression | Description |
capture(exp) | (exp) | numbered capture group (indexed by opening parenthesis) |
named_capture(exp, name) | (?P<name>exp) | named (also numbered) capture group |
Handled implicitly through functional composition | (?:exp) | non-capturing group |
See below | (?flags) | set flags within current group |
See below | (?flags:exp) | set flags for exp (non-capturing) |
§Flags
Implemented? | Expression | Description |
case_insensitive(exp) | i | case-insensitive: letters match both upper and lower case |
multi_line_mode(exp) | m | multi-line mode: ^ and $ match begin/end of line |
dot_matches_newline_too(exp) | s | allow . to match \n |
will not be implemented [1] | U | swap the meaning of x* and x*? |
disable_unicode(exp) | u | Unicode support (enabled by default) |
will not be implemented [2] | x | ignore whitespace and allow line comments (starting with # ) |
1. With the declarative nature of this library, use of this flag would just obfuscate meaning.
2. When using human_regex, comments should be added in source code rather than in the regex string.
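Because each flag function takes an expression, a flag can be scoped to just that expression using the `(?flags:exp)` form listed under Groupings. The sketch below shows one way such wrappers could render their output with only the standard library. The function bodies and the `with_flag` helper are assumptions for illustration; the crate may generate different strings.

```rust
/// Hypothetical helper: scope `flag` to `exp` with a non-capturing flag group.
fn with_flag(flag: &str, exp: &str) -> String {
    format!("(?{}:{})", flag, exp)
}

fn case_insensitive(exp: &str) -> String {
    with_flag("i", exp)
}

fn multi_line_mode(exp: &str) -> String {
    with_flag("m", exp)
}

fn dot_matches_newline_too(exp: &str) -> String {
    with_flag("s", exp)
}

/// Unicode support is on by default, so disabling it clears the `u` flag.
fn disable_unicode(exp: &str) -> String {
    with_flag("-u", exp)
}

fn main() {
    // Flags nest naturally because each wrapper returns a plain expression.
    let nested = case_insensitive(&multi_line_mode("^abc$"));
    println!("{}", nested);
    println!("{}", dot_matches_newline_too("a.b"));
    println!("{}", disable_unicode(r"\w+"));
}
```

Scoping flags this way keeps their effect local to the wrapped expression, so composing it with other patterns does not change how those patterns match.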