pest. The Elegant Parser

pest is a PEG parser built with simplicity and speed in mind.

This crate works in conjunction with the pest crate by deriving a grammar implementation based on a provided grammar.

`.pest` files

Grammar definitions reside in custom .pest files located in the src directory. Their path is relative to src and is given in a grammar attribute placed between the derive attribute and the empty struct that Parser will be derived on.

Because of a limitation in procedural macros, Cargo cannot know that a module depends on the file that the procedural macro opens. As a result, modifying a .pest file without touching the file that contains the derive does not trigger a recompile if a working binary is already in the cache. To avoid this issue, the grammar file can be included in a dummy const definition while debugging.

const _GRAMMAR: &'static str = include_str!("path/to/my_grammar.pest"); // relative to this file

#[derive(Parser)]
#[grammar = "path/to/my_grammar.pest"] // relative to src
struct MyParser;

Grammar

A grammar is a series of rules separated by whitespace, possibly containing comments.

Comments

Comments start with // and end at the end of the line.

// a comment

Rules

Rules have the following form:

name = optional_modifier { expression }

A rule name consists of alphanumeric characters or _ and must not start with a digit. The name is used to create token pairs: the start token is produced when the rule begins parsing, and the end token is produced when the rule finishes parsing.

The following token pair notation a(b(), c()) denotes the tokens: start a, start b, end b, start c, end c, end a.
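The mapping from the nested notation to a flat token stream can be pictured with a small sketch. The `Node` type and the `flatten` and `example_tokens` functions below are hypothetical names chosen for illustration; they are not part of pest's API.

```rust
/// Hypothetical tree node: a matched rule plus the pairs matched inside it.
struct Node {
    rule: &'static str,
    inner: Vec<Node>,
}

/// Appends the start token, then the tokens of all inner pairs, then the end token.
fn flatten(node: &Node, out: &mut Vec<String>) {
    out.push(format!("start {}", node.rule));
    for child in &node.inner {
        flatten(child, out);
    }
    out.push(format!("end {}", node.rule));
}

/// Builds the example tree a(b(), c()) and returns its flat token stream.
fn example_tokens() -> Vec<String> {
    let tree = Node {
        rule: "a",
        inner: vec![
            Node { rule: "b", inner: vec![] },
            Node { rule: "c", inner: vec![] },
        ],
    };
    let mut out = Vec::new();
    flatten(&tree, &mut out);
    out
}
```

Calling `example_tokens()` yields the six tokens in the order listed above.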

Modifiers

Modifiers are optional and can be one of _, @, $, or !. These modifiers change the behavior of the rules.

Silent (_)

Silent rules do not create token pairs during parsing, and they do not appear in error reports.
```
a = _{ "a" }
b =  { a ~ "b" }
```
Parsing "ab" produces the token pair b().

Atomic (@)

Atomic rules do not accept whitespace or comments within their expressions and have a cascading effect on any rule they call. I.e. rules that are not atomic but are called by atomic rules behave atomically.

Any rules called by atomic rules do not generate token pairs.
```
a =  { "a" }
b = @{ a ~ "b" }

whitespace = _{ " " }
```
Parsing "ab" produces the token pair b(), while "a b" produces an error.

Compound-atomic ($)

Compound-atomic rules are identical to atomic rules, except that rules called by them may still generate token pairs.
```
a =  { "a" }
b = ${ a ~ "b" }

whitespace = _{ " " }
```
Parsing "ab" produces the token pairs b(a()), while "a b" produces an error.

Non-atomic (!)

Non-atomic rules are identical to normal rules, except that they stop the cascading effect of atomic and compound-atomic rules.
```
a =  { "a" }
b = !{ a ~ "b" }
c = @{ b }

whitespace = _{ " " }
```
Parsing either "ab" or "a b" produces the token pairs c(b(a())).
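The whitespace handling that separates normal (and non-atomic) rules from atomic ones can be pictured with a hand-written matcher for `a ~ "b"`. This is a simplified sketch under stated assumptions, not the code the derive generates: the function names are hypothetical, `skip_whitespace` stands in for `whitespace = _{ " " }`, and token pairs are not modeled.

```rust
/// Skips a run of spaces starting at `pos`, standing in for `whitespace = _{ " " }`.
fn skip_whitespace(input: &str, pos: usize) -> usize {
    let rest = &input[pos..];
    pos + (rest.len() - rest.trim_start_matches(' ').len())
}

/// Matches the literal `lit` at `pos` and returns the position just past it.
fn literal(input: &str, pos: usize, lit: &str) -> Option<usize> {
    if input[pos..].starts_with(lit) {
        Some(pos + lit.len())
    } else {
        None
    }
}

/// Matches `"a" ~ "b"` from the start of `input`.
/// Whitespace between the two literals is skipped only when `atomic` is false.
fn matches_a_then_b(input: &str, atomic: bool) -> bool {
    let after_a = match literal(input, 0, "a") {
        Some(pos) => pos,
        None => return false,
    };
    let before_b = if atomic {
        after_a
    } else {
        skip_whitespace(input, after_a)
    };
    literal(input, before_b, "b").is_some()
}
```

With `atomic` set to true, "ab" matches and "a b" does not; with `atomic` set to false, both inputs match.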

Expressions

Expressions can be either terminals or non-terminals.

Terminals

Terminal	Usage
`"a"`	matches the exact string `"a"`
`^"a"`	matches the exact string `"a"` case insensitively (ASCII only)
`'a'..'z'`	matches one character between `'a'` and `'z'`
`a`	matches rule `a`

Non-terminals

Non-terminal	Usage
`(e)`	matches `e`
`e1 ~ e2`	matches the sequence `e1` `e2`
`e1 | e2`	matches either `e1` or `e2`
`e*`	matches `e` zero or more times
`e+`	matches `e` one or more times
`e?`	optionally matches `e`
`&e`	matches `e` without making progress
`!e`	matches, without making progress, if `e` doesn't match
`push(e)`	matches `e` and pushes its captured string down the stack

where e, e1, and e2 are expressions.
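As a rough illustration of how these operators behave, the sketch below evaluates a tiny expression tree against an input string. The `Expr` enum and the `eval`, `repeat`, and `lit` functions are hypothetical and exist only for this example. The sketch ignores implicit whitespace, token pairs, and `push`, and it assumes that `|` tries its alternatives in order, as is usual for PEGs.

```rust
/// Hypothetical expression tree covering the operators in the table (without `push`).
enum Expr {
    Str(&'static str),            // "a"
    Seq(Box<Expr>, Box<Expr>),    // e1 ~ e2
    Choice(Box<Expr>, Box<Expr>), // e1 | e2
    Star(Box<Expr>),              // e*
    Plus(Box<Expr>),              // e+
    Opt(Box<Expr>),               // e?
    And(Box<Expr>),               // &e
    Not(Box<Expr>),               // !e
}

/// Applies `e` repeatedly from `pos`, stopping when it fails or stops consuming input.
fn repeat(e: &Expr, input: &str, mut pos: usize) -> usize {
    while let Some(next) = eval(e, input, pos) {
        if next == pos {
            break;
        }
        pos = next;
    }
    pos
}

/// Returns the position after a successful match of `e` at `pos`, or `None` on failure.
fn eval(e: &Expr, input: &str, pos: usize) -> Option<usize> {
    match e {
        Expr::Str(s) => {
            if input[pos..].starts_with(*s) {
                Some(pos + s.len())
            } else {
                None
            }
        }
        Expr::Seq(a, b) => eval(a, input, pos).and_then(|p| eval(b, input, p)),
        // Ordered choice: the second alternative is tried only if the first fails.
        Expr::Choice(a, b) => eval(a, input, pos).or_else(|| eval(b, input, pos)),
        Expr::Star(inner) => Some(repeat(inner, input, pos)),
        Expr::Plus(inner) => eval(inner, input, pos).map(|p| repeat(inner, input, p)),
        Expr::Opt(inner) => Some(eval(inner, input, pos).unwrap_or(pos)),
        // Lookaheads succeed or fail without moving the position.
        Expr::And(inner) => eval(inner, input, pos).map(|_| pos),
        Expr::Not(inner) => match eval(inner, input, pos) {
            Some(_) => None,
            None => Some(pos),
        },
    }
}

/// Shorthand for a boxed string literal.
fn lit(s: &'static str) -> Box<Expr> {
    Box::new(Expr::Str(s))
}
```

For instance, `eval(&Expr::Choice(lit("a"), lit("ab")), "ab", 0)` stops after the first character, because the first alternative already matched and the second is never tried.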

Special rules

Special rules can be called within the grammar. They are:

whitespace - gets run between rules and sub-rules
comment - gets run between rules and sub-rules
any - matches exactly one char
soi - (start-of-input) matches only when a Parser is still at the starting position
eoi - (end-of-input) matches only when a Parser has reached its end
pop - pops a string from the stack and matches it
peek - peeks a string from the stack and matches it

whitespace and comment should be defined manually if needed. All other rules cannot be overridden.
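The stack behind `push`, `pop`, and `peek` can be sketched as follows. The `State` type and its methods are hypothetical names, not pest's API. Two behaviours here are assumptions the text above does not specify: `pop` removes the top entry even when the match then fails, and both `pop` and `peek` simply fail on an empty stack.

```rust
/// Hypothetical parser state: the input, a byte position, and the stack of captured strings.
struct State<'i> {
    input: &'i str,
    pos: usize,
    stack: Vec<&'i str>,
}

impl<'i> State<'i> {
    fn new(input: &'i str) -> Self {
        State { input, pos: 0, stack: Vec::new() }
    }

    /// Matches `lit` at the current position and advances past it.
    fn match_str(&mut self, lit: &str) -> bool {
        if self.input[self.pos..].starts_with(lit) {
            self.pos += lit.len();
            true
        } else {
            false
        }
    }

    /// Models `push("lit")`: matches the literal and pushes the captured slice.
    fn push_str(&mut self, lit: &str) -> bool {
        let input = self.input;
        let start = self.pos;
        if self.match_str(lit) {
            self.stack.push(&input[start..self.pos]);
            true
        } else {
            false
        }
    }

    /// Models `peek`: matches the string on top of the stack and leaves it there.
    fn peek(&mut self) -> bool {
        match self.stack.last().copied() {
            Some(top) => self.match_str(top),
            None => false,
        }
    }

    /// Models `pop`: removes the string on top of the stack and matches it.
    fn pop(&mut self) -> bool {
        match self.stack.pop() {
            Some(top) => self.match_str(top),
            None => false,
        }
    }
}
```

A typical use of such a stack is matching a closing delimiter that must repeat whatever opening delimiter was captured earlier.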

`Rule`

All rules defined or used in the grammar populate a generated enum called Rule. This implements pest's RuleType and can be used throughout the API.
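For a grammar that defines the rules a, b, and whitespace, the generated enum might look roughly like the sketch below. The exact attributes and derives are an assumption, and `is_silent` is a hypothetical helper that only shows that rules can be matched on like any other enum.

```rust
/// Sketch of the enum a derive might generate for rules `a`, `b`, and `whitespace`.
/// The attribute and derive lists are assumptions, not confirmed output.
#[allow(dead_code, non_camel_case_types)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum Rule {
    a,
    b,
    whitespace,
}

/// Hypothetical helper: reports which rules were declared silent in the example grammar.
fn is_silent(rule: Rule) -> bool {
    match rule {
        Rule::whitespace => true,
        Rule::a | Rule::b => false,
    }
}
```

Because each variant is named after its rule, the variants can be compared, hashed, and printed for debugging.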

pest_derive 1.0.0-beta.2