Crate rules

Rules

Rules performs pattern matching with regular expressions whose syntax is based on the Perl 6 regex grammar. That grammar was heavily revised from Perl 5 and should not be equated with it, so a pattern may look nothing like any regex you have seen before.

Note

Currently, the only method that is actually available is is_match().

Syntax

Currently, this is designed for ASCII and may not behave properly with Unicode.

Whitespace is generally ignored so that a regex can be more readable and less dense.

r"fred"    // Normal way
r"f r e d" // Completely equivalent
// Will match `apples_oranges` or any other delimiter
r"apples . oranges"
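The whitespace rule can be illustrated with a small standalone sketch. It does not use this crate: `matches_simple` is a hypothetical helper that handles only alphanumerics, underscores, and `.`, and it assumes the pattern must cover the whole input.

```rust
/// Hypothetical helper, not part of the crate: checks whether `input`
/// matches `pattern` in full. The pattern may hold only alphanumerics,
/// underscores, and `.` (any single character).
fn matches_simple(pattern: &str, input: &str) -> bool {
    // Whitespace in the pattern is not significant, so drop it first.
    let atoms: Vec<char> = pattern.chars().filter(|c| !c.is_whitespace()).collect();
    let chars: Vec<char> = input.chars().collect();
    if atoms.len() != chars.len() {
        return false;
    }
    // `.` accepts any character; every other atom must match itself.
    atoms
        .iter()
        .zip(chars.iter())
        .all(|(atom, ch)| *atom == '.' || atom == ch)
}
```

With this sketch, `matches_simple("f r e d", "fred")` and `matches_simple("apples . oranges", "apples_oranges")` both return true, while `matches_simple("apples . oranges", "applesoranges")` returns false because `.` must consume one character.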

Literals

Alphanumerics, underscores (_), and everything enclosed within quotes (") and ticks (') are the only literals.

hello_world   // Matches `hello_world`.
"carrot cake" // Matches `carrot cake`.
'apple pie'   // Matches `apple pie`.

Everything else must be escaped with a backslash (for example, \*) to be matched literally.

it\'s\ my\ birthday // Matches `it's my birthday`.
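As a rough illustration of the literal rules, the sketch below computes the text that a literal-only pattern would match. `literal_text` is a hypothetical function, not the crate's parser, and it assumes that a backslash followed by an alphanumeric character is a character class rather than a literal.

```rust
/// Hypothetical helper: returns the text a literal-only pattern matches,
/// or `None` if the pattern is malformed or contains a metacharacter.
fn literal_text(pattern: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // Bare alphanumerics and underscores match themselves.
            c if c.is_ascii_alphanumeric() || c == '_' => out.push(c),
            // Whitespace outside quotes is ignored.
            c if c.is_whitespace() => {}
            // A backslash makes the next non-alphanumeric character literal.
            '\\' => {
                let next = chars.next()?;
                if next.is_ascii_alphanumeric() {
                    return None; // Assumed to be a class such as `\d`.
                }
                out.push(next);
            }
            // Quoted text is copied verbatim up to the matching quote.
            '"' | '\'' => {
                let mut closed = false;
                for q in chars.by_ref() {
                    if q == c {
                        closed = true;
                        break;
                    }
                    out.push(q);
                }
                if !closed {
                    return None;
                }
            }
            // Anything else is a metacharacter in this sketch.
            _ => return None,
        }
    }
    Some(out)
}
```

For instance, `literal_text(r"it\'s\ my\ birthday")` yields `Some("it's my birthday")`, and `literal_text("a . b")` yields `None` because `.` is not a literal.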

Chevrons: <>

Chevrons act as a metacharacter grouping operator whose behavior depends on the first character found inside. The behavior for each first character is:

First character   Example          Result
Whitespace        < big small >    Alternative quotes; matches [ 'big' | 'small' ]
Alphabetic        <alpha>          Named character class, which captures
?                 <?before foo>    A positive zero-width assertion
!                 <!before foo>    A negative zero-width assertion
[                 <[ ab ]>         A character class; matches [ 'a' | 'b' ]
-                 <-[a] + [b]>     Negated character class: [ab] negated
+                 <+ [a] >         Doesn't modify the class

Lookaround

  • lookahead - foo <?after bar> matches foo in foobar
  • negative lookahead - foo <!after bar> matches foo in foobaz
  • lookbehind - <?before foo> bar matches bar in foobar
  • negative lookbehind - <!before foo> bar matches bar in sushibar

An example with both: <?before foo> bar <?after baz> matches bar in foobarbaz
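The cases above all condition a match on the text next to it without consuming that text. A minimal standalone sketch of that idea follows; `Context` and `find_with_context` are hypothetical names that do not come from this crate, and the sketch works on plain substrings rather than on regex syntax.

```rust
/// A zero-width condition on the text next to a match (hypothetical type).
enum Context<'a> {
    Any,
    Is(&'a str),
    IsNot(&'a str),
}

impl<'a> Context<'a> {
    /// Evaluates the condition; `test` reports whether the neighbouring
    /// text has the given string at the boundary of the match.
    fn holds(&self, test: impl Fn(&str) -> bool) -> bool {
        match self {
            Context::Any => true,
            Context::Is(s) => test(*s),
            Context::IsNot(s) => !test(*s),
        }
    }
}

/// Returns the byte offset of the first `needle` in `haystack` whose
/// preceding text satisfies `behind` and whose following text satisfies
/// `ahead`. Neither neighbour is part of the match itself.
fn find_with_context(
    haystack: &str,
    behind: Context,
    needle: &str,
    ahead: Context,
) -> Option<usize> {
    haystack.match_indices(needle).map(|(i, _)| i).find(|&i| {
        let left = &haystack[..i];
        let right = &haystack[i + needle.len()..];
        behind.holds(|s| left.ends_with(s)) && ahead.holds(|s| right.starts_with(s))
    })
}
```

For the combined example, `find_with_context("foobarbaz", Context::Is("foo"), "bar", Context::Is("baz"))` returns `Some(3)`, the offset of `bar`; the surrounding `foo` and `baz` are checked but not consumed.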

Set operators

These operators can be applied to groups which will be analyzed later:

+       Union                // [123] + [345] = [12345]
|       Union                // Same
&       Intersection         // [123] & [345] = [3]
-       Difference           // [123] - [345] = [12]
^       Symmetric difference // [123] ^ [345] = [1245]
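The results in the comments above can be reproduced with ordinary set arithmetic. The sketch below uses `std::collections::BTreeSet` from the standard library; `set` and `apply` are hypothetical helpers for illustration and say nothing about how the crate evaluates these operators internally.

```rust
use std::collections::BTreeSet;

/// Builds a character set from the characters of a string.
fn set(chars: &str) -> BTreeSet<char> {
    chars.chars().collect()
}

/// Applies one of the set operators to two character sets.
/// Returns `None` for an operator this sketch does not know.
fn apply(op: char, a: &BTreeSet<char>, b: &BTreeSet<char>) -> Option<BTreeSet<char>> {
    let result: BTreeSet<char> = match op {
        '+' | '|' => a.union(b).copied().collect(),
        '&' => a.intersection(b).copied().collect(),
        '-' => a.difference(b).copied().collect(),
        '^' => a.symmetric_difference(b).copied().collect(),
        _ => return None,
    };
    Some(result)
}
```

For example, `apply('^', &set("123"), &set("345"))` gives the set `{1, 2, 4, 5}`, matching the symmetric difference row.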

Character classes

Default character classes

Character   Matches                 Inverse
.           Any character           N/A
\d          Digit                   \D
\h          Horizontal whitespace   \H
\n          Newline                 \N
\s          Any whitespace          \S
\t          Tab                     \T
\w          Alphanumeric or _       \W
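One way to read the table is as a set of ASCII predicates in which the uppercase letter negates the lowercase one. The sketch below is an assumption-laden illustration, not the crate's implementation: `class_matches` is a hypothetical function, horizontal whitespace is assumed to mean space or tab, and "any whitespace" is assumed to be what `char::is_ascii_whitespace` accepts.

```rust
/// Hypothetical helper: tests `c` against the class named by the letter
/// after the backslash (`'d'` for `\d`, `'D'` for `\D`, and so on).
/// Returns `None` if the letter is not one of the default classes.
fn class_matches(class: char, c: char) -> Option<bool> {
    let positive = match class.to_ascii_lowercase() {
        'd' => c.is_ascii_digit(),
        'h' => c == ' ' || c == '\t', // Assumption: space and tab only.
        'n' => c == '\n',
        's' => c.is_ascii_whitespace(),
        't' => c == '\t',
        'w' => c.is_ascii_alphanumeric() || c == '_',
        _ => return None,
    };
    // An uppercase letter selects the inverse class.
    Some(if class.is_ascii_uppercase() { !positive } else { positive })
}
```

Here `class_matches('d', '7')` is `Some(true)` and `class_matches('D', '7')` is `Some(false)`.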

Custom character classes

Characters inside a set of <[ ]> form a custom character class:

// Matches `a` or `b` or `c`
<[ a b c ]>

// `..` expresses a range so this matches
// from `a` to `g` or a digit
<[ a .. g \d ]>

// The `[]` bind the sets together into (non-capturing)
// groups so set operators can be used.
<[0..9] - [13579]> // Matches an even digit
<\d - [13579]>     // Same
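To make the class syntax concrete, the following sketch expands the body of a `<[ ]>` class into an explicit set of ASCII characters, using the `..` range notation described above. `expand_class` is a hypothetical helper that supports only single characters, `x .. y` ranges, and `\d`; it is not taken from the crate.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: expands the body of a `<[ ... ]>` class into the
/// set of ASCII characters it names, or `None` if it cannot be parsed.
fn expand_class(body: &str) -> Option<BTreeSet<char>> {
    // Whitespace inside the class is not significant.
    let atoms: Vec<char> = body.chars().filter(|c| !c.is_whitespace()).collect();
    let mut set = BTreeSet::new();
    let mut i = 0;
    while i < atoms.len() {
        match atoms[i] {
            // `\d` adds every ASCII digit.
            '\\' if atoms.get(i + 1) == Some(&'d') => {
                set.extend('0'..='9');
                i += 2;
            }
            // `x..y` adds the inclusive range from `x` to `y`.
            lo if atoms.get(i + 1) == Some(&'.') && atoms.get(i + 2) == Some(&'.') => {
                let hi = *atoms.get(i + 3)?;
                if !lo.is_ascii_alphanumeric() || !hi.is_ascii_alphanumeric() || lo > hi {
                    return None;
                }
                set.extend(lo..=hi);
                i += 4;
            }
            // A bare alphanumeric or underscore adds itself.
            c if c.is_ascii_alphanumeric() || c == '_' => {
                set.insert(c);
                i += 1;
            }
            _ => return None,
        }
    }
    Some(set)
}
```

Subtracting `expand_class("13579")` from `expand_class("0..9")` with `BTreeSet::difference` leaves the characters `0`, `2`, `4`, `6`, and `8`, which is the even-digit class from the example.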

Comments

Comments are allowed inside a regex.

// This matches `myregex`
r"my // This is a comment which goes to the end of the line
regex"
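A simple way to picture comment handling is as a pre-pass that cuts each line at `//`. The sketch below is only an illustration: `strip_comments` is a hypothetical helper, and it assumes `//` never appears inside quotes or after an escape.

```rust
/// Hypothetical helper: removes everything from `//` to the end of each
/// line. Assumes `//` does not occur inside quotes or after a backslash.
fn strip_comments(pattern: &str) -> String {
    pattern
        .lines()
        .map(|line| match line.find("//") {
            Some(idx) => &line[..idx],
            None => line,
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Applied to the two-line pattern above, this leaves `my` and `regex` separated only by whitespace, which the whitespace rule then ignores, so the pattern matches `myregex`.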

Modules

re