§Rules
Rules performs pattern matching with regular expressions whose syntax is based on the Perl 6 regex grammar. That grammar was heavily revised from Perl 5 and should not be equated with it, so these patterns may look nothing like any regex you have seen before.
§Note
Currently, the only method that is really available is [is_match()](re/struct.Regex.html#method.is_match).
§Syntax
Currently, this is designed for ASCII and may not behave properly with Unicode.
Whitespace is generally ignored so that a regex can be more readable and less dense.
r"fred" // Normal way
r"f r e d" // Completely equivalent
// Will match `apples_oranges` or any other delimiter
r"apples . oranges"
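The whitespace rule can be pictured as a normalization pass that runs before matching. The sketch below is only an illustration and not the crate's implementation: `strip_insignificant_whitespace` is a hypothetical helper, and it assumes that whitespace inside quotes or after a backslash (both described under Literals) is significant and must be kept.

```rust
/// Hypothetical helper: drops whitespace that is neither quoted nor escaped.
/// Assumption: escapes inside quoted text are not treated specially.
fn strip_insignificant_whitespace(pattern: &str) -> String {
    let mut out = String::new();
    let mut quote: Option<char> = None; // the currently open quote character, if any
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match (quote, c) {
            // Inside quotes: copy verbatim and watch for the closing quote.
            (Some(q), _) => {
                out.push(c);
                if c == q {
                    quote = None;
                }
            }
            // A backslash keeps the following character, even if it is a space.
            (None, '\\') => {
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            // An opening quote or tick starts a quoted literal.
            (None, '"') | (None, '\'') => {
                quote = Some(c);
                out.push(c);
            }
            // Bare whitespace carries no meaning and is dropped.
            (None, c) if c.is_ascii_whitespace() => {}
            (None, _) => out.push(c),
        }
    }
    out
}
```

Under these assumptions `f r e d` and `fred` normalize to the same pattern, while `"carrot cake"` keeps its space.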
§Literals
Alphanumerics, underscores (_), and everything enclosed within quotes (") and ticks (') are the only literals.
hello_world // Matches `hello_world`.
"carrot cake" // Matches `carrot cake`.
'apple pie' // Matches `apple pie`.
Everything else must be escaped with a backslash (\) to match literally.
it\'s\ my\ birthday // Matches `it's my birthday`.
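As a rough illustration of the escaping rule, the sketch below turns arbitrary ASCII text into a pattern that matches it literally. Both `is_bare_literal` and `escape_literal` are hypothetical helpers written for this example. They are not part of the crate.

```rust
/// True for characters that match themselves without quoting or escaping:
/// ASCII alphanumerics and the underscore.
fn is_bare_literal(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical helper: backslash-escapes every character that is not a bare literal.
fn escape_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if !is_bare_literal(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

Applied to `it's my birthday`, this produces the escaped form shown above. `hello_world` passes through unchanged.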
§Chevrons: <>
Chevrons are a grouping metacharacter whose behavior depends on the first character found inside them. The behavior for each first character is:
First character | Example | Result |
---|---|---|
Whitespace | < big small > | Alternative quotes: matches `big` or `small` |
Alphabetic | <alpha> | Named character class, which captures |
? | <?before foo> | A positive zero-width assertion |
! | <!before foo> | A negative zero-width assertion |
[ | <[ ab ]> | A character class: matches `a` or `b` |
- | <-[a] + [b]> | Negated character class: [ab] negated |
+ | <+ [a] > | Doesn't modify the class |
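The dispatch described by the table could be sketched as a single `match` on the first character inside the chevrons. The `ChevronKind` enum and `classify_chevron` function are hypothetical names chosen for this illustration, and how the crate actually parses chevrons is not shown in this documentation.

```rust
/// Hypothetical classification of a chevron group, one variant per table row.
#[derive(Debug, PartialEq)]
enum ChevronKind {
    AlternativeQuotes,
    NamedClass,
    PositiveAssertion,
    NegativeAssertion,
    CharacterClass,
    NegatedClass,
    UnmodifiedClass,
    Unknown,
}

/// Classifies the text found between `<` and `>` by its first character.
fn classify_chevron(inner: &str) -> ChevronKind {
    match inner.chars().next() {
        Some(c) if c.is_ascii_whitespace() => ChevronKind::AlternativeQuotes,
        Some(c) if c.is_ascii_alphabetic() => ChevronKind::NamedClass,
        Some('?') => ChevronKind::PositiveAssertion,
        Some('!') => ChevronKind::NegativeAssertion,
        Some('[') => ChevronKind::CharacterClass,
        Some('-') => ChevronKind::NegatedClass,
        Some('+') => ChevronKind::UnmodifiedClass,
        // Anything else, including an empty group, is not covered by the table.
        _ => ChevronKind::Unknown,
    }
}
```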
§Lookaround
- lookahead - `foo <?after bar>` matches `foo` in `foobar`
- negative lookahead - `foo <!after bar>` matches `foo` in `foobaz`
- lookbehind - `<?before foo> bar` matches `bar` in `foobar`
- negative lookbehind - `<!before foo> bar` matches `bar` in `sushibar`

An example with both: `<?before foo> bar <?after baz>` matches `bar` in `foobarbaz`.
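The key property of these assertions is that they are zero-width: the surrounding text is checked but does not become part of the match. A minimal sketch of that idea with plain string operations follows. `find_in_context` is a hypothetical helper, not the crate's API. Its parameters follow the examples above, where the text required in front of the match comes first and the text required behind it comes last.

```rust
/// Hypothetical helper: finds `needle` only where it is directly preceded by
/// `preceded_by` and directly followed by `followed_by`. Returns the byte
/// offset of `needle` itself, since the context is checked but not consumed.
/// Pass an empty string to leave one side unconstrained.
fn find_in_context(
    haystack: &str,
    preceded_by: &str,
    needle: &str,
    followed_by: &str,
) -> Option<usize> {
    haystack
        .match_indices(needle)
        .map(|(start, _)| start)
        .find(|&start| {
            let end = start + needle.len();
            haystack[..start].ends_with(preceded_by)
                && haystack[end..].starts_with(followed_by)
        })
}
```

The negative forms would invert the corresponding `ends_with` or `starts_with` check.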
§Set operators
These operators can be applied to groups, which are described later:
+ Union // [123] + [345] = [12345]
| Union // Same
& Intersection // [123] & [345] = [3]
- Difference // [123] - [345] = [12]
^ Symmetric difference // [123] ^ [345] = [1245]
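These operators behave like ordinary set algebra over characters. The sketch below reproduces the results listed above using the standard library's `BTreeSet`. The helpers `char_set` and `apply_set_operator` are hypothetical names for this example and say nothing about how the crate evaluates sets internally.

```rust
use std::collections::BTreeSet;

/// Builds a set from the characters of a string, e.g. "123" -> {1, 2, 3}.
fn char_set(chars: &str) -> BTreeSet<char> {
    chars.chars().collect()
}

/// Applies one of the set operators to two character sets.
/// Returns `None` for a character that is not a set operator.
fn apply_set_operator(
    op: char,
    a: &BTreeSet<char>,
    b: &BTreeSet<char>,
) -> Option<BTreeSet<char>> {
    match op {
        '+' | '|' => Some(a | b), // union
        '&' => Some(a & b),       // intersection
        '-' => Some(a - b),       // difference
        '^' => Some(a ^ b),       // symmetric difference
        _ => None,
    }
}
```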
§Character classes
§Default character classes
Character | Matches | Inverse |
---|---|---|
. | Any character | N/A |
\d | Digit | \D |
\h | Horizontal whitespace | \H |
\n | Newline | \N |
\s | Any whitespace | \S |
\t | Tab | \T |
\w | Alphanumeric or _ | \W |
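One way to read the table is as a lookup from an escape letter to a predicate, where the uppercase letter negates the lowercase one. The sketch below is an illustration under stated assumptions: `class_matches` is a hypothetical helper, matching is ASCII-only (in line with the note under Syntax), and "horizontal whitespace" is assumed to mean space or tab.

```rust
/// Hypothetical helper: tests `c` against the class written as `\<escape>`.
/// Returns `None` if the escape letter is not in the table.
fn class_matches(escape: char, c: char) -> Option<bool> {
    let positive = match escape.to_ascii_lowercase() {
        'd' => c.is_ascii_digit(),
        'h' => c == ' ' || c == '\t', // assumption: space or tab only
        'n' => c == '\n',
        's' => c.is_ascii_whitespace(),
        't' => c == '\t',
        'w' => c.is_ascii_alphanumeric() || c == '_',
        _ => return None,
    };
    // An uppercase escape letter is the inverse of its lowercase form.
    Some(if escape.is_ascii_uppercase() { !positive } else { positive })
}
```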
§Custom character classes
Characters inside a set of <[ ]> form a custom character class:
// Matches `a` or `b` or `c`
<[ a b c ]>
// `..` expresses a range so this matches
// from `a` to `g` or a digit
<[ a .. g \d ]>
// The `[]` bind the sets together into (non-capturing)
// groups so set operators can be used.
<[0..9] - [13579]> // Matches an even digit
<\d - [13579]> // Same
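To make the range and difference examples concrete, the sketch below expands a range into the characters it covers and then subtracts the odd digits. `expand_range` and `even_digits` are hypothetical helpers for illustration only, and ranges are assumed to be inclusive at both ends.

```rust
use std::collections::BTreeSet;

/// Expands an inclusive range into the set of characters it covers,
/// e.g. 'a' and 'g' -> {a, b, c, d, e, f, g}.
fn expand_range(start: char, end: char) -> BTreeSet<char> {
    (start..=end).collect()
}

/// The digits minus the odd digits, which is the set that the
/// "even digit" patterns above describe.
fn even_digits() -> BTreeSet<char> {
    let digits = expand_range('0', '9');
    let odd: BTreeSet<char> = "13579".chars().collect();
    &digits - &odd
}
```

The resulting set holds the five characters `0`, `2`, `4`, `6` and `8`.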
§Comments
Comments are allowed inside a regex.
// This matches `myregex`
r"my // This is a comment which goes to the end of the line
regex"
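Comment handling can be thought of as a pass that cuts each line at `//` before the pattern is interpreted. The sketch below shows that idea with a hypothetical `strip_comments` helper. It assumes that `//` always starts a comment and ignores the possibility of `//` appearing inside quotes or after a backslash.

```rust
/// Hypothetical helper: removes everything from `//` to the end of each line.
/// Assumption: `//` is never quoted or escaped in the pattern.
fn strip_comments(pattern: &str) -> String {
    pattern
        .lines()
        .map(|line| match line.find("//") {
            Some(idx) => &line[..idx], // keep only the text before the comment
            None => line,
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because whitespace is ignored as well, the two-line pattern above reduces to `myregex` once the comment is removed.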