⚠️ This crate is under active development, and upcoming releases will include breaking changes. ⚠️

Regex for Humans

The goal of this crate is simple: give everybody the power of regular expressions without having to learn their complicated syntax. It is inspired by ReadableRegex.jl and is a wrapper around the core Rust `regex` crate.

Example usage

Matching a date

To match a date in the format 2021-10-30 (YYYY-MM-DD), use the following code to generate a regex:

```rust
use human_regex::{begin, digit, end, exactly, text};

fn main() {
    // Equivalent to the pattern ^\d{4}-\d{2}-\d{2}$
    let regex_string = begin()
        + exactly(4, digit())
        + text("-")
        + exactly(2, digit())
        + text("-")
        + exactly(2, digit())
        + end();
    // Compile to a regex and test a sample date (prints "true")
    println!("{}", regex_string.to_regex().is_match("2014-01-01"));
}
```
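To show what this kind of composition does under the hood, here is a minimal std-only sketch that rebuilds the date pattern as a plain string. The `Pattern` type and the builder bodies are hypothetical stand-ins written for illustration, not human_regex's actual implementation, and the exact string the real crate produces may differ.

```rust
use std::ops::Add;

// Hypothetical stand-in for the crate's regex type: it only carries the
// pattern string built so far (the real crate's internals may differ).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

// `+` concatenates two patterns, as in the date example.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn begin() -> Pattern {
    Pattern("^".to_string())
}

fn end() -> Pattern {
    Pattern("$".to_string())
}

fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

// Escape regex metacharacters so the input is matched literally.
// (`-` is only special inside a character class, so it is left alone here.)
fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

// Repeat `x` exactly `n` times; the non-capturing group keeps the
// quantifier applied to the whole of `x`, even if it spans several characters.
fn exactly(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", x.0, n))
}

fn date_pattern() -> Pattern {
    begin()
        + exactly(4, digit())
        + text("-")
        + exactly(2, digit())
        + text("-")
        + exactly(2, digit())
        + end()
}

fn main() {
    println!("{}", date_pattern().0); // ^(?:\d){4}-(?:\d){2}-(?:\d){2}$
}
```

Wrapping each repeated operand in `(?:...)` is what lets `exactly(2, text("ab"))` mean "ab twice" rather than "a followed by b twice".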

Roadmap

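The eventual goal of this crate is to support all the syntax in the core Rust regex library through a human-readable API. Here is where we currently stand. In the tables below, the first column gives the human_regex function for each expression; a blank cell means that expression is not yet supported.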

Character Classes

Single Character

| Implemented as | Expression | Description |
| --- | --- | --- |
| `any()` | `.` | any character except new line (includes new line with `s` flag) |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| | `\pN` | one-letter name Unicode character class |
| | `\p{Greek}` | Unicode character class (general category or script) |
| | `\PN` | negated one-letter name Unicode character class |
| | `\P{Greek}` | negated Unicode character class (general category or script) |
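The rows without a function differ only by a small syntactic rule: one-letter class names use the short form, longer names need braces, and an uppercase `P` negates the class. A hypothetical helper (the name `unicode_class` is illustrative, not part of the crate) could emit them like this:

```rust
// Hypothetical helper (not part of human_regex 0.1.3) that emits the
// Unicode class syntax for the unimplemented rows above.
// One-letter names use the short form (`\pN`); longer names need braces
// (`\p{Greek}`). An uppercase `P` negates the class.
fn unicode_class(name: &str, negated: bool) -> String {
    let p = if negated { 'P' } else { 'p' };
    if name.chars().count() == 1 {
        format!(r"\{}{}", p, name)
    } else {
        format!(r"\{}{{{}}}", p, name)
    }
}

fn main() {
    println!("{}", unicode_class("N", false)); // \pN
    println!("{}", unicode_class("Greek", false)); // \p{Greek}
    println!("{}", unicode_class("N", true)); // \PN
    println!("{}", unicode_class("Greek", true)); // \P{Greek}
}
```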

Perl Character Classes

| Implemented as | Expression | Description |
| --- | --- | --- |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| `whitespace()` | `\s` | whitespace (`\p{White_Space}`) |
| `non_whitespace()` | `\S` | not whitespace |
| `word()` | `\w` | word character (`\p{Alphabetic}` + `\p{M}` + `\d` + `\p{Pc}` + `\p{Join_Control}`) |
| `non_word()` | `\W` | not word character |
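Because `\s` is defined as `\p{White_Space}` and Rust's `char::is_whitespace` tests that same Unicode property, a short std-only sketch can show what `whitespace()` and `non_whitespace()` accept without involving the regex crate. The helper names below are hypothetical.

```rust
// `\s` is defined as \p{White_Space}; `char::is_whitespace` checks the same
// Unicode property, so it can stand in for `\s` outside of a regex.
fn matches_whitespace(c: char) -> bool {
    c.is_whitespace()
}

// `\S` is the complement of `\s`.
fn matches_non_whitespace(c: char) -> bool {
    !matches_whitespace(c)
}

fn main() {
    // U+00A0 NO-BREAK SPACE has the White_Space property, so `\s` matches it,
    // unlike the ASCII-only class [[:space:]].
    let nbsp = '\u{00A0}';
    println!("{} {}", matches_whitespace(nbsp), matches_non_whitespace(nbsp)); // true false
}
```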

ASCII Character Classes

| Implemented as | Expression | Description |
| --- | --- | --- |
| | `[[:alnum:]]` | alphanumeric (`[0-9A-Za-z]`) |
| | `[[:alpha:]]` | alphabetic (`[A-Za-z]`) |
| | `[[:ascii:]]` | ASCII (`[\x00-\x7F]`) |
| | `[[:blank:]]` | blank (`[\t ]`) |
| | `[[:cntrl:]]` | control (`[\x00-\x1F\x7F]`) |
| `digit()` | `[[:digit:]]` | digits (`[0-9]`) |
| | `[[:graph:]]` | graphical (`[!-~]`) |
| | `[[:lower:]]` | lower case (`[a-z]`) |
| | `[[:print:]]` | printable (`[ -~]`) |
| | `[[:punct:]]` | punctuation (`` [!-/:-@\[-`{-~] ``) |
| | `[[:space:]]` | whitespace (`[\t\n\v\f\r ]`) |
| | `[[:upper:]]` | upper case (`[A-Z]`) |
| `word()` | `[[:word:]]` | word characters (`[0-9A-Za-z_]`) |
| | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`) |
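`digit()` and `word()` appear both here and in the Perl table. If they emit `\d` and `\w` as the Perl table indicates, they are Unicode-aware and can match more than the ASCII classes `[0-9]` and `[0-9A-Za-z_]`. The std-only sketch below illustrates the gap for digits; `char::is_numeric` is only an approximation of `\p{Nd}`, and the helper names are hypothetical.

```rust
// Exact equivalent of [[:digit:]], i.e. [0-9].
fn ascii_digit(c: char) -> bool {
    c.is_ascii_digit()
}

// Rough stand-in for `\d` (\p{Nd}). `char::is_numeric` also accepts the
// Nl and No categories (e.g. '½'), so it is broader than `\d`, but it is
// enough to show that non-ASCII decimal digits are included.
fn roughly_unicode_digit(c: char) -> bool {
    c.is_numeric()
}

fn main() {
    let arabic_indic_three = '\u{0663}'; // ARABIC-INDIC DIGIT THREE, category Nd
    println!(
        "{} {}",
        roughly_unicode_digit(arabic_indic_three),
        ascii_digit(arabic_indic_three)
    ); // true false
}
```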

Repetitions

| Implemented as | Expression | Description |
| --- | --- | --- |
| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
| `one_or_more(x)` | `x+` | one or more of x (greedy) |
| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
| `zero_or_more(x).lazy()` | `x*?` | zero or more of x (ungreedy/lazy) |
| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
| `zero_or_one(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
| `between(n, m, x)` | `x{n,m}` | at least n x and at most m x (greedy) |
| `at_least(n, x)` | `x{n,}` | at least n x (greedy) |
| `exactly(n, x)` | `x{n}` | exactly n x |
| `between(n, m, x).lazy()` | `x{n,m}?` | at least n x and at most m x (ungreedy/lazy) |
| `at_least(n, x).lazy()` | `x{n,}?` | at least n x (ungreedy/lazy) |
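Every row in this table follows the same rule: a quantifier is appended to the operand, and `.lazy()` appends one more `?`. The sketch below builds those strings with a hypothetical `Pattern` type. It assumes operands are wrapped in a non-capturing group so the quantifier covers the whole operand; that is one plausible reading of grouping being handled implicitly through composition, not a description of the crate's code.

```rust
// Hypothetical stand-in for the crate's pattern type (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Pattern {
    // Appending `?` to a quantifier makes it lazy: `x*` becomes `x*?`.
    fn lazy(self) -> Pattern {
        Pattern(self.0 + "?")
    }
}

// Wrap the operand so the quantifier applies to all of it,
// not just to its last character.
fn group(x: &Pattern) -> String {
    format!("(?:{})", x.0)
}

fn zero_or_more(x: Pattern) -> Pattern {
    Pattern(group(&x) + "*")
}

fn one_or_more(x: Pattern) -> Pattern {
    Pattern(group(&x) + "+")
}

fn zero_or_one(x: Pattern) -> Pattern {
    Pattern(group(&x) + "?")
}

fn between(n: u32, m: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{},{}}}", group(&x), n, m))
}

fn at_least(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{},}}", group(&x), n))
}

fn exactly(n: u32, x: Pattern) -> Pattern {
    Pattern(format!("{}{{{}}}", group(&x), n))
}

fn main() {
    let ab = Pattern("ab".to_string());
    println!("{}", between(2, 3, ab.clone()).0); // (?:ab){2,3}
    println!("{}", one_or_more(ab.clone()).lazy().0); // (?:ab)+?
    println!("{}", exactly(2, ab).0); // (?:ab){2}
}
```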

Composites

| Implemented as | Expression | Description |
| --- | --- | --- |
| `+` | `xy` | concatenation (x followed by y) |
| `or()` | `x\|y` | alternation (x or y, prefer x) |
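Alternation needs grouping more than concatenation does: naively joining `gr`, `a|e`, and `y` gives `gra|ey`, which means "gra" or "ey". The sketch below uses a hypothetical `or` (the crate's real signature may differ) and a stand-in `Pattern` type that groups the alternatives to avoid this:

```rust
use std::ops::Add;

// Hypothetical stand-in for the crate's pattern type (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

// `x + y` produces `xy`: plain concatenation.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

// Hypothetical `or`: joins the alternatives with `|` inside a non-capturing
// group, so the alternation does not swallow what is concatenated around it.
// As with `x|y`, the leftmost alternative is preferred.
fn or(alternatives: &[Pattern]) -> Pattern {
    let parts: Vec<&str> = alternatives.iter().map(|p| p.0.as_str()).collect();
    Pattern(format!("(?:{})", parts.join("|")))
}

fn main() {
    let p = Pattern("gr".to_string())
        + or(&[Pattern("a".to_string()), Pattern("e".to_string())])
        + Pattern("y".to_string());
    println!("{}", p.0); // gr(?:a|e)y
}
```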

Empty matches

| Implemented as | Expression | Description |
| --- | --- | --- |
| `begin()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
| | `\A` | only the beginning of text (even with multi-line mode enabled) |
| | `\z` | only the end of text (even with multi-line mode enabled) |
| `word_boundary()` | `\b` | a Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other) |
| `non_word_boundary()` | `\B` | not a Unicode word boundary |
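To make the definition of `\b` concrete, the following std-only sketch lists the byte offsets where a word boundary would match. It approximates `\w` with `char::is_alphanumeric` plus `_`, which is close to but not identical to the Unicode definition in the Perl table, and it treats the start and end of text as non-word positions, matching the `\A`/`\z` wording above. `word_boundaries` is a hypothetical helper name.

```rust
// Approximation of `\w`: alphanumeric or '_'. The real `\w` is defined via
// \p{Alphabetic}, \p{M}, \d, \p{Pc} and \p{Join_Control}.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Byte offsets where `\b` would match: a word character on exactly one side.
fn word_boundaries(text: &str) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev_is_word = false; // start of text counts as non-word (\A)
    for (i, c) in text.char_indices() {
        let cur = is_word_char(c);
        if cur != prev_is_word {
            out.push(i);
        }
        prev_is_word = cur;
    }
    if prev_is_word {
        out.push(text.len()); // end of text counts as non-word (\z)
    }
    out
}

fn main() {
    println!("{:?}", word_boundaries("hi there")); // [0, 2, 3, 8]
}
```

`\B` (`non_word_boundary()`) matches at every other character boundary from offset 0 to the text length.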

Groupings and Flags

| Implemented as | Expression | Description |
| --- | --- | --- |
| | `(exp)` | numbered capture group (indexed by opening parenthesis) |
| | `(?P<name>exp)` | named (also numbered) capture group |
| Handled implicitly through functional composition | `(?:exp)` | non-capturing group |
| | `(?flags)` | set flags within current group |
| | `(?flags:exp)` | set flags for exp (non-capturing) |
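The capture-group rows are not implemented yet. A sketch of what builders for them might emit, using hypothetical names (`capture`, `named_capture`, `non_capturing`) and a stand-in `Pattern` type, none of which come from the crate:

```rust
// Hypothetical stand-in for the crate's pattern type (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

// `(exp)`: numbered capture group.
fn capture(x: Pattern) -> Pattern {
    Pattern(format!("({})", x.0))
}

// `(?P<name>exp)`: named capture group (also numbered).
fn named_capture(name: &str, x: Pattern) -> Pattern {
    Pattern(format!("(?P<{}>{})", name, x.0))
}

// `(?:exp)`: groups without capturing.
fn non_capturing(x: Pattern) -> Pattern {
    Pattern(format!("(?:{})", x.0))
}

fn main() {
    let year = named_capture("year", Pattern(r"\d{4}".to_string()));
    println!("{}", year.0); // (?P<year>\d{4})
}
```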

Flags

| Implemented as | Flag | Description |
| --- | --- | --- |
| | `i` | case-insensitive: letters match both upper and lower case |
| | `m` | multi-line mode: `^` and `$` match begin/end of line |
| | `s` | allow `.` to match `\n` |
| | `U` | swap the meaning of `x*` and `x*?` |
| | `u` | Unicode support (enabled by default) |
| | `x` | ignore whitespace and allow line comments (starting with `#`) |
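A flag group could be built the same way as the other groups. The sketch below is a hypothetical `with_flags` helper (not part of the crate) that wraps a pattern in `(?flags:exp)` and rejects any letter not listed in the flags table:

```rust
// Hypothetical stand-in for the crate's pattern type (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

// Hypothetical builder for `(?flags:exp)`: the flags apply only to `x`, and
// the group does not capture. Returns None for an empty or unknown flag set.
fn with_flags(flags: &str, x: Pattern) -> Option<Pattern> {
    if flags.is_empty() || !flags.chars().all(|c| "imsUux".contains(c)) {
        return None;
    }
    Some(Pattern(format!("(?{}:{})", flags, x.0)))
}

fn main() {
    let p = with_flags("i", Pattern("hello".to_string())).unwrap();
    println!("{}", p.0); // (?i:hello)
}
```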

human_regex 0.1.3