Crate human_regex


§Regex for Humans

The goal of this crate is simple: give everybody the power of regular expressions without having to learn the complicated syntax. It is inspired by ReadableRegex.jl. This crate is a wrapper around the core Rust regex library.

§Example usage

If you want to match a date of the format 2021-10-30, you could use the following code to generate a regex:

```rust
use human_regex::{beginning, digit, exactly, text, end};
// Anchored pattern: four digits, hyphen, two digits, hyphen, two digits.
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
```

The to_regex() method returns a standard Rust regex. We can do this another way with slightly less repetition though!

```rust
use human_regex::{beginning, digit, exactly, text, end};
// The "-DD" chunk is built once and then repeated twice.
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
```
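Why can `exactly(2, first_regex_string)` repeat a multi-part expression? The std-only sketch below is a hypothetical toy: its `Pattern` type and its `beginning`, `end`, `digit`, `text`, and `exactly` functions are stand-ins, not human_regex's code. It assumes that a quantifier wraps its target in a non-capturing group, as the "handled implicitly" entry in the Groupings table suggests. Without that group, `{2}` would bind only to the trailing digits and the hyphen would not be repeated.

```rust
// Hypothetical toy stand-ins for the crate's builders; this is not human_regex's code.
use std::ops::Add;

/// In this sketch a pattern is nothing more than its regex source text.
#[derive(Clone, Debug, PartialEq)]
struct Pattern(String);

impl Add for Pattern {
    type Output = Pattern;

    /// Concatenation: `x + y` renders as `xy`.
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

fn beginning() -> Pattern {
    Pattern("^".to_string())
}

fn end() -> Pattern {
    Pattern("$".to_string())
}

fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Toy `text`: unlike the crate's version, it performs no escaping at all.
fn text(s: &str) -> Pattern {
    Pattern(s.to_string())
}

/// Toy `exactly`: the non-capturing group makes `{n}` apply to the whole target.
fn exactly(n: u32, target: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", target.0, n))
}

fn main() {
    let first = text("-") + exactly(2, digit());
    let second = beginning() + exactly(4, digit()) + exactly(2, first) + end();
    println!("{}", second.0);
}
```

For the second example this toy produces `^(?:\d){4}(?:-(?:\d){2}){2}$`; the real crate may format its string differently (for example, by escaping the hyphen).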

For a more extensive set of examples, please see The Cookbook.

§Features

This crate currently supports the vast majority of syntax available in the core Rust regex library through a human-readable API.

§Single Character

human_regex	Expression	Description
`any()`	`.`	any character except new line (includes new line with s flag)
`digit()`	`\d`	digit (`\p{Nd}`)
`non_digit()`	`\D`	not digit
`unicode_category(UnicodeCategory)`	`\p{L}`	Unicode non-script category
`unicode_script(UnicodeScript)`	`\p{Greek}`	Unicode script category
`non_unicode_category(UnicodeCategory)`	`\P{L}`	Negated one-letter name Unicode character class
`non_unicode_script(UnicodeScript)`	`\P{Greek}`	negated Unicode script category

§Character Classes

human_regex	Expression	Description
`or(&['x', 'y', 'z'])`	`[xyz]`	A character class matching either x, y or z (union).
`nor(&['x', 'y', 'z'])`	`[^xyz]`	A character class matching any character except x, y and z.
`within('a'..='z')`	`[a-z]`	A character class matching any character in range a-z.
`without('a'..='z')`	`[^a-z]`	A character class matching any character outside range a-z.
`alphabetic()`	`[[:alpha:]]`	ASCII character class (`[A-Za-z]`)
`non_alphabetic()`	`[[:^alpha:]]`	Negated ASCII character class (`[^A-Za-z]`)
`or()`	`[x[^xyz]]`	Nested/grouping character class (matching any character except y and z)
`and(&[])`/`&`	`[a-y&&xyz]`	Intersection (a-y AND xyz = xy)
`within('0'..='9') & nor(&['4'])`	`[0-9&&[^4]]`	Subtraction using intersection and negation (matching 0-9 except 4)
`subtract(&[], &[])`	`[0-9--4]`	Direct subtraction (matching 0-9 except 4). Use `.collect::<Vec<_>>()` to pass ranges.
`xor(&[], &[])`	`[a-g~~b-h]`	Symmetric difference (matching `a` and `h` only). Use `.collect::<Vec<_>>()` to pass ranges.
`or(&escape_all(&['[',']']))`	`[\[\]]`	Escaping in character classes (matching `[` or `]`)
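The intersection, subtraction, and symmetric-difference rows are ordinary set operations on the characters each class matches. This std-only sketch checks the results the table claims using `BTreeSet`; the helper names `set`, `set_and`, `set_minus`, and `set_xor` are hypothetical and not part of human_regex. A negated class such as `[^4]` is infinite over Unicode, so the sketch approximates it with the ASCII range only.

```rust
use std::collections::BTreeSet;

/// Builds a set of chars from any iterator (a char range, an array, ...).
fn set<I: IntoIterator<Item = char>>(chars: I) -> BTreeSet<char> {
    chars.into_iter().collect()
}

/// `[A&&B]`: characters that are in both classes.
fn set_and(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.intersection(b).copied().collect()
}

/// `[A--B]`: characters in A that are not in B.
fn set_minus(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.difference(b).copied().collect()
}

/// `[A~~B]`: characters in exactly one of the two classes.
fn set_xor(a: &BTreeSet<char>, b: &BTreeSet<char>) -> BTreeSet<char> {
    a.symmetric_difference(b).copied().collect()
}

fn main() {
    // [a-y&&xyz] keeps only x and y.
    println!("{:?}", set_and(&set('a'..='y'), &set(['x', 'y', 'z'])));

    // [0-9&&[^4]]: [^4] is approximated here by every ASCII char except '4'.
    let not_four = set((0u8..=0x7F).map(char::from).filter(|&c| c != '4'));
    println!("{:?}", set_and(&set('0'..='9'), &not_four));

    // [0-9--4] yields the same nine digits as the intersection above.
    println!("{:?}", set_minus(&set('0'..='9'), &set(['4'])));

    // [a-g~~b-h] keeps only a and h.
    println!("{:?}", set_xor(&set('a'..='g'), &set('b'..='h')));
}
```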

§Perl Character Classes

human_regex	Expression	Description
`digit()`	`\d`	digit (`\p{Nd}`)
`non_digit()`	`\D`	not digit
`whitespace()`	`\s`	whitespace (`\p{White_Space}`)
`non_whitespace()`	`\S`	not whitespace
`word()`	`\w`	word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`)
`non_word()`	`\W`	not word character

§ASCII Character Classes

human_regex	Expression	Description
`alphanumeric()`	`[[:alnum:]]`	alphanumeric (`[0-9A-Za-z]`)
`alphabetic()`	`[[:alpha:]]`	alphabetic (`[A-Za-z]`)
`ascii()`	`[[:ascii:]]`	ASCII (`[\x00-\x7F]`)
`blank()`	`[[:blank:]]`	blank (`[\t ]`)
`control()`	`[[:cntrl:]]`	control (`[\x00-\x1F\x7F]`)
`digit()`	`[[:digit:]]`	digits (`[0-9]`)
`graphical()`	`[[:graph:]]`	graphical (`[!-~]`)
`lowercase()`	`[[:lower:]]`	lower case (`[a-z]`)
`printable()`	`[[:print:]]`	printable (`[ -~]`)
`punctuation()`	`[[:punct:]]`	punctuation (``[!-/:-@\[-`{-~]``)
`whitespace()`	`[[:space:]]`	whitespace (`[\t\n\v\f\r ]`)
`uppercase()`	`[[:upper:]]`	upper case (`[A-Z]`)
`word()`	`[[:word:]]`	word characters (`[0-9A-Za-z_]`)
`hexdigit()`	`[[:xdigit:]]`	hex digit (`[0-9A-Fa-f]`)
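The ranges in this table can be cross-checked against Rust's standard `char::is_ascii_*` predicates. In the sketch below the helpers `ascii_matching` and `ascii_in_ranges` are hypothetical. It also surfaces one difference worth knowing: std's `is_ascii_whitespace` omits the vertical tab (`\x0B`), which `[[:space:]]` includes.

```rust
/// Every ASCII char accepted by `pred` (std's `char::is_ascii_*` methods take `&char`).
fn ascii_matching<F: Fn(&char) -> bool>(pred: F) -> Vec<char> {
    (0u8..=0x7F).map(char::from).filter(|c| pred(c)).collect()
}

/// Every ASCII char inside one of the inclusive `(low, high)` ranges from the table.
fn ascii_in_ranges(ranges: &[(char, char)]) -> Vec<char> {
    (0u8..=0x7F)
        .map(char::from)
        .filter(|&c| ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi))
        .collect()
}

fn main() {
    // [[:punct:]] = [!-/:-@\[-`{-~]
    let punct = ascii_in_ranges(&[('!', '/'), (':', '@'), ('[', '`'), ('{', '~')]);
    println!("punct agrees with std: {}", punct == ascii_matching(char::is_ascii_punctuation));

    // [[:cntrl:]] = [\x00-\x1F\x7F]
    let cntrl = ascii_in_ranges(&[('\x00', '\x1F'), ('\x7F', '\x7F')]);
    println!("cntrl agrees with std: {}", cntrl == ascii_matching(char::is_ascii_control));

    // [[:xdigit:]] = [0-9A-Fa-f]
    let xdigit = ascii_in_ranges(&[('0', '9'), ('A', 'F'), ('a', 'f')]);
    println!("xdigit agrees with std: {}", xdigit == ascii_matching(char::is_ascii_hexdigit));

    // [[:space:]] = [\t\n\v\f\r ]; std's is_ascii_whitespace leaves out \v.
    let space = ascii_in_ranges(&[('\t', '\r'), (' ', ' ')]);
    let std_ws = ascii_matching(char::is_ascii_whitespace);
    let only_in_space: Vec<char> = space.into_iter().filter(|c| !std_ws.contains(c)).collect();
    // Leaves just the vertical tab.
    println!("only in [[:space:]]: {:?}", only_in_space);
}
```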

§Repetitions

human_regex	Expression	Description
`zero_or_more(x)`	`x*`	zero or more of x (greedy)
`one_or_more(x)`	`x+`	one or more of x (greedy)
`zero_or_one(x)`	`x?`	zero or one of x (greedy)
`zero_or_more(x).lazy()`	`x*?`	zero or more of x (ungreedy/lazy)
`one_or_more(x).lazy()`	`x+?`	one or more of x (ungreedy/lazy)
`zero_or_one(x).lazy()`	`x??`	zero or one of x (ungreedy/lazy)
`between(n, m, x)`	`x{n,m}`	at least n x and at most m x (greedy)
`at_least(n, x)`	`x{n,}`	at least n x (greedy)
`exactly(n, x)`	`x{n}`	exactly n x
`between(n, m, x).lazy()`	`x{n,m}?`	at least n x and at most m x (ungreedy/lazy)
`at_least(n, x).lazy()`	`x{n,}?`	at least n x (ungreedy/lazy)
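To see what greedy versus lazy means in practice, consider `<.*>` and `<.*?>`, which per the table would use `zero_or_more(any())` and `zero_or_more(any()).lazy()` between the two angle brackets. The sketch below hand-codes the leftmost match each pattern finds, without any regex engine. `greedy_tag` and `lazy_tag` are hypothetical helpers, and both assume the input contains no newlines (without the `s` flag, `.` would not cross one).

```rust
/// Leftmost match of `<.*>` (greedy): `.*` runs as far as it can, so the match
/// ends at the last '>' in the haystack. Assumes the haystack has no '\n'.
fn greedy_tag(haystack: &str) -> Option<&str> {
    let start = haystack.find('<')?;
    let end = haystack.rfind('>')?;
    if end > start {
        Some(&haystack[start..=end])
    } else {
        None
    }
}

/// Leftmost match of `<.*?>` (lazy): `.*?` stops as early as it can, so the match
/// ends at the first '>' after the opening '<'. Assumes the haystack has no '\n'.
fn lazy_tag(haystack: &str) -> Option<&str> {
    let start = haystack.find('<')?;
    let offset = haystack[start..].find('>')?;
    Some(&haystack[start..=start + offset])
}

fn main() {
    let html = "<a><b>";
    println!("greedy: {:?}", greedy_tag(html));
    println!("lazy:   {:?}", lazy_tag(html));
}
```

Both versions start at the same place; only the end of the match differs, which is the whole effect of `.lazy()`.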

§Composites

human_regex	Expression	Description
`+`	`xy`	concatenation (x followed by y)
`or()`	`x\|y`	alternation (x or y, prefer x)
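"Prefer x" refers to leftmost-first semantics: the match that starts earliest wins, and among alternatives starting at that position, the one listed first wins even if a later one is longer. The sketch models this for literal alternatives only; `leftmost_first` is a hypothetical helper, not a human_regex function.

```rust
/// Leftmost-first matching for literal alternatives `a1|a2|...`: scan start
/// positions left to right; at each one, try the alternatives in the order they
/// were written and return the first one that matches there.
fn leftmost_first<'a>(haystack: &str, alternatives: &[&'a str]) -> Option<(usize, &'a str)> {
    for start in (0..=haystack.len()).filter(|&i| haystack.is_char_boundary(i)) {
        for &alt in alternatives {
            if haystack[start..].starts_with(alt) {
                return Some((start, alt));
            }
        }
    }
    None
}

fn main() {
    // `a|ab` on "ab": the earlier alternative wins, even though `ab` is longer.
    println!("{:?}", leftmost_first("ab", &["a", "ab"]));
    // `ab|a` on "ab": the longer alternative is listed first, so it wins.
    println!("{:?}", leftmost_first("ab", &["ab", "a"]));
}
```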

§Empty matches

human_regex	Expression	Description
`beginning()`	`^`	the beginning of text (or start-of-line with multi-line mode)
`end()`	`$`	the end of text (or end-of-line with multi-line mode)
`beginning_of_text()`	`\A`	only the beginning of text (even with multi-line mode enabled)
`end_of_text()`	`\z`	only the end of text (even with multi-line mode enabled)
`word_boundary()`	`\b`	a Unicode word boundary (\w on one side and \W, \A, or \z on other)
`non_word_boundary()`	`\B`	not a Unicode word boundary
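Empty matches consume no characters; they only test a position. This sketch lists the byte offsets where `\b` can match, using an ASCII approximation of `\w` (the real `\w` is Unicode-aware). `word_boundaries` and `is_word` are hypothetical helpers.

```rust
/// ASCII approximation of `\w`; the regex crate's `\w` is Unicode-aware by default.
fn is_word(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Byte offsets where `\b` can match: exactly one side of the position is a word
/// character, with the start and end of text counting as non-word. `\B` matches at
/// every other character-boundary offset.
fn word_boundaries(s: &str) -> Vec<usize> {
    let chars: Vec<(usize, char)> = s.char_indices().collect();
    let mut out = Vec::new();
    for i in 0..=chars.len() {
        let before = i > 0 && is_word(chars[i - 1].1);
        let after = i < chars.len() && is_word(chars[i].1);
        if before != after {
            // Offset of the char right after the boundary, or the end of the text.
            out.push(chars.get(i).map_or(s.len(), |&(pos, _)| pos));
        }
    }
    out
}

fn main() {
    println!("{:?}", word_boundaries("hi there"));
}
```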

§Groupings

human_regex	Expression	Description
`capture(exp)`	`(exp)`	numbered capture group (indexed by opening parenthesis)
`named_capture(exp, name)`	`(?P<name>exp)`	named (also numbered) capture group
Handled implicitly through functional composition	`(?:exp)`	non-capturing group
See Flags below	`(?flags)`	set flags within current group
See Flags below	`(?flags:exp)`	set flags for exp (non-capturing)
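Numbering by opening parenthesis means nested groups are counted outside-in, and group 0 (the whole match) is implicit. The simplified scanner below is a hypothetical helper, `capture_groups`, that only understands escapes, bracket classes, `(?...)` groups, and `(?P<name>...)`; it shows which parentheses create numbered groups and in what order.

```rust
/// Lists capture groups in the order the regex syntax numbers them: by the position
/// of each opening parenthesis. Simplified parser: it understands `\` escapes,
/// `[...]` classes (including nested ones), non-capturing `(?...)` forms, and
/// `(?P<name>...)`, and does not handle every corner of the syntax.
fn capture_groups(pattern: &str) -> Vec<(usize, Option<String>)> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut groups: Vec<(usize, Option<String>)> = Vec::new();
    let mut class_depth = 0usize;
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Skip the escaped character so `\(` or `\[` is not treated as syntax.
            '\\' => i += 1,
            '[' => class_depth += 1,
            ']' if class_depth > 0 => class_depth -= 1,
            '(' if class_depth == 0 => {
                if chars.get(i + 1) != Some(&'?') {
                    // Plain `(exp)`: numbered capture group.
                    groups.push((groups.len() + 1, None));
                } else if chars.get(i + 2) == Some(&'P') && chars.get(i + 3) == Some(&'<') {
                    // `(?P<name>exp)`: named, and also numbered.
                    let name: String = chars[i + 4..].iter().take_while(|&&c| c != '>').collect();
                    groups.push((groups.len() + 1, Some(name)));
                }
                // Any other `(?...` form, e.g. `(?:exp)` or `(?i:exp)`, does not capture.
            }
            _ => {}
        }
        i += 1;
    }
    groups
}

fn main() {
    for (index, name) in capture_groups(r"(?P<year>\d{4})-((\d{2})-(\d{2}))") {
        println!("group {}: {:?}", index, name);
    }
}
```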

§Flags

human_regex	Expression	Description
`case_insensitive(exp)`	`i`	case-insensitive: letters match both upper and lower case
`multi_line_mode(exp)`	`m`	multi-line mode: `^` and `$` match begin/end of line
`dot_matches_newline_too(exp)`	`s`	allow `.` to match `\n`
will not be implemented¹	`U`	swap the meaning of `x*` and `x*?`
`disable_unicode(exp)`	`u`	Unicode support (enabled by default)
will not be implemented²	`x`	ignore whitespace and allow line comments (starting with `#`)

¹ With the declarative nature of this library, use of this flag would just obfuscate meaning.
² When using human_regex, comments should be added in source code rather than in the regex string.
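The `multi_line_mode` flag changes where `beginning()` and `end()` (that is, `^` and `$`) may match, while `beginning_of_text()` and `end_of_text()` are unaffected. This sketch computes those positions for `\n`-terminated lines, following the Empty matches table; `caret_positions` and `dollar_positions` are hypothetical helpers, and CRLF handling is ignored.

```rust
/// Byte offsets where `^` can match. Without multi-line mode it behaves like `\A`;
/// with it, `^` also matches right after every '\n'.
fn caret_positions(haystack: &str, multi_line: bool) -> Vec<usize> {
    let mut out = vec![0];
    if multi_line {
        out.extend(haystack.match_indices('\n').map(|(i, _)| i + 1));
    }
    out
}

/// Byte offsets where `$` can match. Without multi-line mode it behaves like `\z`;
/// with it, `$` also matches right before every '\n'.
fn dollar_positions(haystack: &str, multi_line: bool) -> Vec<usize> {
    let mut out = Vec::new();
    if multi_line {
        out.extend(haystack.match_indices('\n').map(|(i, _)| i));
    }
    out.push(haystack.len());
    out
}

fn main() {
    let text = "a\nb\n";
    println!("^ default: {:?}", caret_positions(text, false));
    println!("^ (?m):    {:?}", caret_positions(text, true));
    println!("$ default: {:?}", dollar_positions(text, false));
    println!("$ (?m):    {:?}", dollar_positions(text, true));
}
```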

Modules§

ascii: Functions for ASCII character classes
capturing: Functions for capturing matches
cookbook: A Cookbook of Common Tasks
direct: Functions for directly matching text or adding known regex strings
emptymatches: Functions for the empty matches
flags: Functions for adding flags
logical: Functions for performing logical operations
repetitions: Functions for matching repetitions
shorthand: Functions for general purpose matches

Structs§

HumanRegex: The HumanRegex struct which maintains and updates the regex string. For most use cases it will never be necessary to instantiate this directly.

Enums§

UnicodeCategory: An enum covering all Unicode character categories
UnicodeScript: An enum covering all Unicode script categories

Functions§

alphabetic: A function to match any alphabetic character ([A-Za-z])
alphanumeric: A function to match any alphanumeric character ([0-9A-Za-z])
and: A function for establishing an AND relationship between two or more possible matches
any: A function for matching any character (except for \n)
ascii: A function to match any ASCII character ([\x00-\x7F])
at_least: Match at least n of a certain target
beginning: A function to match the beginning of text (or start-of-line with multi-line mode)
beginning_of_text: A function to match the beginning of text (even with multi-line mode enabled)
between: Match at least n and at most m of a certain target
blank: A function to match blank characters ([\t ])
capture: Add a numbered capturing group around an expression
case_insensitive: Makes all matches case insensitive, matching both upper and lowercase letters.
control: A function to match control characters ([\x00-\x1F\x7F])
digit: A function for the digit character class (i.e., any Unicode decimal digit, \p{Nd}, including 0 through 9)
disable_unicode: A function to disable unicode support
dot_matches_newline_too: A function that will allow . to match newlines (\n)
end: A function to match the end of text (or end-of-line with multi-line mode)
end_of_text: A function to match the end of text (even with multi-line mode enabled)
escape_all: Escapes an entire list for use in something like an or() or an and() expression.
exactly: Match exactly n of a certain target
graphical: A function to match graphical characters ([!-~])
hexdigit: A function to match any character that would appear in a hexadecimal number ([A-Fa-f0-9])
lowercase: A function to match any lowercase character ([a-z])
multi_line_mode: Enables multiline mode, which will allow beginning() and end() to match the beginning and end of lines
named_capture: Add a named capturing group around an expression
non_alphabetic: A function to match any non-alphabetic character ([^A-Za-z])
non_alphanumeric: A function to match any non-alphanumeric character ([^0-9A-Za-z])
non_ascii: A function to match any non-ASCII character ([^\x00-\x7F])
non_blank: A function to match non-blank characters ([^\t ])
non_control: A function to match non-control characters ([^\x00-\x1F\x7F])
non_digit: A function for the non-digit character class (i.e., everything BUT the digits 0-9)
non_graphical: A function to match non-graphical characters ([^!-~])
non_hexdigit: A function to match any character that wouldn’t appear in a hexadecimal number ([^A-Fa-f0-9])
non_lowercase: A function to match any non-lowercase character ([^a-z])
non_printable: A function to match unprintable characters ([^ -~])
non_punctuation: A function to match non-punctuation ([^!-/:-@\[-`{-~])
non_unicode_category: A function for not matching Unicode character categories. For matching script categories see non_unicode_script.
non_unicode_script: A function for matching Unicode characters not belonging to a certain script category. For matching other categories see non_unicode_category.
non_uppercase: A function to match any non-uppercase character ([^A-Z])
non_whitespace: A function for the non-whitespace character class (i.e., everything BUT whitespace)
non_word: A function for the non-word character class (i.e., everything BUT the alphanumeric characters plus underscore)
non_word_boundary: A function to match anything BUT a word boundary
nonescaped_text: Adds text to the regex string without escaping it. You can use it, for instance, to add a raw regex string directly to the object.
nor: Negated or relationship between two or more possible matches
one_or_more: Match one or more of a certain target
or: A function for establishing an OR relationship between two or more possible matches
printable: A function to match printable characters ([ -~])
punctuation: A function to match punctuation ([!-/:-@\[-`{-~])
subtract: Subtracts the second argument from the first
text: Add matching text to the regex string. Text that is added through this function is automatically escaped.
unicode_category: A function for matching Unicode character categories. For matching script categories see unicode_script.
unicode_script: A function for matching Unicode characters belonging to a certain script category. For matching other categories see unicode_category.
uppercase: A function to match any uppercase character ([A-Z])
whitespace: A function for the whitespace character class (\s, i.e., Unicode whitespace such as space, tab, and newline)
within: Matches anything within a range of characters
without: Matches anything outside of a range of characters
word: A function for the word character class (i.e., all alphanumeric characters plus underscore)
word_boundary: A function to match a word boundary
xor: Xor on two bracketed expressions, also known as symmetric difference.
zero_or_more: Match zero or more of a certain target
zero_or_one: Match zero or one of a certain target
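`text()` escapes its input, while `nonescaped_text()` passes it through unchanged. A minimal sketch of that escaping step follows. It assumes the escaped set is the metacharacter list used by `regex::escape`; the list below is believed to match that function but is an assumption, and the `META` constant and `escape` function are hypothetical rather than taken from human_regex.

```rust
/// Characters treated as regex metacharacters in this sketch (assumed to be the
/// same set that `regex::escape` backslash-escapes).
const META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$', '#', '&', '-', '~',
];

/// Escapes `text` so that every character matches literally. A `nonescaped_text()`
/// style call would skip this step and insert the string as-is.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if META.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    println!("{}", escape("2+2=4?"));
    println!("{}", escape("[a-b]"));
}
```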
