Crate pest_derive
pest. The Elegant Parser
pest is a PEG parser built with simplicity and speed in mind.
This crate works in conjunction with the `pest` crate by deriving a grammar implementation based on a provided grammar.
`.pest` files
Grammar definitions reside in custom `.pest` files located in the `src` directory. Their path is relative to `src` and is given in a `grammar` attribute placed between the `derive` attribute and the empty `struct` that `Parser` will be derived on.
Because of a limitation in procedural macros, Cargo has no way to know that a module needs to be recompiled because of a file the procedural macro opens. As a result, if a working binary is already cached, modifying a `.pest` file without also touching the file that contains the `derive` does not trigger a recompile. To avoid this issue while debugging, include the grammar file in a dummy `const` definition. Because `include_str!` records the file as a build dependency, editing the grammar then forces a rebuild.
    extern crate pest;
    #[macro_use]
    extern crate pest_derive; // provides #[derive(Parser)]

    const _GRAMMAR: &'static str = include_str!("path/to/my_grammar.pest"); // relative to this file

    #[derive(Parser)]
    #[grammar = "path/to/my_grammar.pest"] // relative to src
    struct MyParser;
Grammar
A grammar is a series of rules separated by whitespace, possibly containing comments.
Comments
Comments start with `//` and end at the end of the line.

    // a comment
Rules
Rules have the following form:
    name = optional_modifier { expression }
A rule name is made of alphanumeric characters or `_`, and its first character must not be a digit. The name is used to create token pairs. When the rule starts being parsed, the start token is produced. When the rule finishes parsing, the end token is produced.
The token pair notation `a(b(), c())` denotes the tokens: start `a`, start `b`, end `b`, start `c`, end `c`, end `a`.
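The ordering can be checked with a small, self-contained sketch that turns the pair notation into a flat token list. `Token`, `Pair`, `leaf`, and `flatten` are hypothetical names chosen for illustration. They are not part of pest's API.

```rust
// Sketch only: models the pair notation, not pest's own token types.

/// One half of a token pair: the start or the end of a rule.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Start(&'a str),
    End(&'a str),
}

/// A node in the pair notation, e.g. `a(b(), c())`.
struct Pair<'a> {
    rule: &'a str,
    children: Vec<Pair<'a>>,
}

/// A pair with no nested pairs, e.g. `b()`.
fn leaf(rule: &str) -> Pair<'_> {
    Pair { rule, children: Vec::new() }
}

/// Emits the start token when the rule begins, then its children's tokens,
/// then the end token when the rule finishes.
fn flatten<'a>(pair: &Pair<'a>, out: &mut Vec<Token<'a>>) {
    out.push(Token::Start(pair.rule));
    for child in &pair.children {
        flatten(child, out);
    }
    out.push(Token::End(pair.rule));
}

fn main() {
    // a(b(), c())
    let tree = Pair { rule: "a", children: vec![leaf("b"), leaf("c")] };
    let mut tokens = Vec::new();
    flatten(&tree, &mut tokens);
    // [Start("a"), Start("b"), End("b"), Start("c"), End("c"), End("a")]
    println!("{:?}", tokens);
}
```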
Modifiers
Modifiers are optional and can be one of `_`, `@`, `$`, or `!`. These modifiers change the behavior of the rules.
Silent (`_`)
Silent rules do not create token pairs during parsing, nor are they error-reported.

    a = _{ "a" }
    b = { a ~ "b" }

Parsing `"ab"` produces the token pair `b()`.

Atomic (`@`)
Atomic rules do not accept whitespace or comments within their expressions, and they have a cascading effect on any rule they call: rules that are not atomic but are called by atomic rules behave atomically. Rules called by atomic rules also do not generate token pairs.

    a = { "a" }
    b = @{ a ~ "b" }
    whitespace = _{ " " }

Parsing `"ab"` produces the token pair `b()`, while `"a b"` produces an error.

Compound-atomic (`$`)
Compound-atomic rules are identical to atomic rules, except that the rules they call are allowed to generate token pairs.

    a = { "a" }
    b = ${ a ~ "b" }
    whitespace = _{ " " }

Parsing `"ab"` produces the token pairs `b(a())`, while `"a b"` produces an error.

Non-atomic (`!`)
Non-atomic rules are identical to normal rules, except that they stop the cascading effect of atomic and compound-atomic rules.

    a = { "a" }
    b = !{ a ~ "b" }
    c = @{ b }
    whitespace = _{ " " }

Parsing either `"ab"` or `"a b"` produces the token pairs `c(a())`. Because `b` is called by the atomic rule `c`, it produces no pair of its own. Because `b` is non-atomic, however, whitespace is accepted between `a` and `"b"`, and `a` produces its pair `a()`.
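The token-pair outcomes of the four examples can be reproduced with a toy model. This is only a sketch consistent with the examples above, not pest's implementation. It assumes that a rule emits a pair unless it is silent or is called from an atomic context, that `@` and `$` set the context for their callees, and that `!` resets it. It ignores whitespace handling and error reporting, so the `"a b"` failures are not modelled. `Modifier`, `Context`, `Call`, `call`, and `pairs` are hypothetical names.

```rust
// Sketch only: a toy model of token-pair emission, not pest's implementation.

#[derive(Clone, Copy, PartialEq)]
enum Modifier {
    Normal,         // name = { ... }
    Silent,         // name = _{ ... }
    Atomic,         // name = @{ ... }
    CompoundAtomic, // name = ${ ... }
    NonAtomic,      // name = !{ ... }
}

/// The context a rule is called in, inherited from its caller (assumed model).
#[derive(Clone, Copy, PartialEq)]
enum Context {
    Normal,
    Atomic,
    Compound,
}

/// A successful rule call and the rules it called in turn.
struct Call {
    rule: &'static str,
    modifier: Modifier,
    children: Vec<Call>,
}

fn call(rule: &'static str, modifier: Modifier, children: Vec<Call>) -> Call {
    Call { rule, modifier, children }
}

/// Renders the pairs a call produces, in `b(a())` notation.
fn pairs(c: &Call, ctx: Context) -> Vec<String> {
    // Silent rules never emit; nothing emits inside an atomic context.
    let emits = c.modifier != Modifier::Silent && ctx != Context::Atomic;
    // `@` and `$` set the callees' context, `!` resets it,
    // and normal or silent rules pass their caller's context through.
    let child_ctx = match c.modifier {
        Modifier::Atomic => Context::Atomic,
        Modifier::CompoundAtomic => Context::Compound,
        Modifier::NonAtomic => Context::Normal,
        Modifier::Normal | Modifier::Silent => ctx,
    };
    let inner: Vec<String> = c.children.iter().flat_map(|ch| pairs(ch, child_ctx)).collect();
    if emits {
        vec![format!("{}({})", c.rule, inner.join(", "))]
    } else {
        // A rule without its own pair hands its callees' pairs to its caller.
        inner
    }
}

fn main() {
    // c = @{ b }, b = !{ a ~ "b" }, a = { "a" }
    let tree = call("c", Modifier::Atomic, vec![
        call("b", Modifier::NonAtomic, vec![call("a", Modifier::Normal, vec![])]),
    ]);
    println!("{:?}", pairs(&tree, Context::Normal)); // ["c(a())"]
}
```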
Expressions
Expressions can be either terminals or non-terminals.
Terminals
| Terminal | Usage |
|----------|-------|
| `"a"` | matches the exact string `"a"` |
| `^"a"` | matches the exact string `"a"` case insensitively (ASCII only) |
| `'a'..'z'` | matches one character between `'a'` and `'z'` |
| `a` | matches rule `a` |
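The following minimal sketch shows how the three literal terminals behave. Each one is written as a plain function that returns the number of bytes consumed on success. The function names are hypothetical, the range is assumed to include both endpoints, and rule references (`a`) are not modelled.

```rust
// Sketch only: plain-Rust stand-ins for the terminals, not pest's generated code.

/// `"a"`: exact, case-sensitive match.
fn literal(input: &str, s: &str) -> Option<usize> {
    if input.starts_with(s) { Some(s.len()) } else { None }
}

/// `^"a"`: exact match ignoring ASCII case; non-ASCII characters must match exactly.
fn insensitive(input: &str, s: &str) -> Option<usize> {
    // `get` returns None if the input is too short or the cut is not on a char boundary.
    let prefix = input.get(..s.len())?;
    if prefix.eq_ignore_ascii_case(s) { Some(s.len()) } else { None }
}

/// `'a'..'z'`: exactly one character in the range (assumed inclusive).
fn range(input: &str, lo: char, hi: char) -> Option<usize> {
    let c = input.chars().next()?;
    if (lo..=hi).contains(&c) { Some(c.len_utf8()) } else { None }
}

fn main() {
    println!("{:?}", insensitive("Hello world", "HELLO")); // Some(5)
    println!("{:?}", literal("Hello world", "HELLO")); // None
    println!("{:?}", range("q!", 'a', 'z')); // Some(1)
}
```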
Non-terminals
| Non-terminal | Usage |
|--------------|-------|
| `(e)` | matches `e` |
| `e1 ~ e2` | matches the sequence `e1` `e2` |
| `e1 \| e2` | matches either `e1` or `e2` |
| `e*` | matches `e` zero or more times |
| `e+` | matches `e` one or more times |
| `e?` | optionally matches `e` |
| `&e` | matches `e` without making progress |
| `!e` | matches if `e` doesn't match, without making progress |
| `push(e)` | matches `e` and pushes its captured string onto the stack |

Here `e`, `e1`, and `e2` are expressions.
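To make these semantics concrete, here is a minimal interpreter over a hypothetical `Expr` tree. It is a sketch of standard PEG behaviour, not pest's generated parser. It omits rule calls, `push`, and the implicit whitespace pest inserts in non-atomic rules. `(e)` grouping is implied by the tree structure.

```rust
// Sketch only: a tiny PEG interpreter, not pest's generated parser.

/// Expression tree mirroring the non-terminals above (grouping is implicit).
enum Expr {
    Str(&'static str),            // "a"
    Seq(Box<Expr>, Box<Expr>),    // e1 ~ e2
    Choice(Box<Expr>, Box<Expr>), // e1 | e2
    Star(Box<Expr>),              // e*
    Plus(Box<Expr>),              // e+
    Opt(Box<Expr>),               // e?
    And(Box<Expr>),               // &e
    Not(Box<Expr>),               // !e
}

/// Matches `e` at byte offset `pos`, returning the offset after the match.
fn eval(e: &Expr, input: &str, pos: usize) -> Option<usize> {
    match e {
        Expr::Str(s) => input[pos..].starts_with(*s).then(|| pos + s.len()),
        Expr::Seq(a, b) => eval(b, input, eval(a, input, pos)?),
        // Ordered choice: `e2` is only tried if `e1` fails.
        Expr::Choice(a, b) => eval(a, input, pos).or_else(|| eval(b, input, pos)),
        Expr::Star(a) => Some(repeat(a, input, pos)),
        Expr::Plus(a) => eval(a, input, pos).map(|p| repeat(a, input, p)),
        Expr::Opt(a) => Some(eval(a, input, pos).unwrap_or(pos)),
        // Lookaheads never consume input.
        Expr::And(a) => eval(a, input, pos).map(|_| pos),
        Expr::Not(a) => match eval(a, input, pos) {
            Some(_) => None,
            None => Some(pos),
        },
    }
}

/// Greedy repetition: once the loop stops, it is never revisited.
fn repeat(e: &Expr, input: &str, mut pos: usize) -> usize {
    while let Some(next) = eval(e, input, pos) {
        if next == pos {
            break; // an empty match would otherwise loop forever
        }
        pos = next;
    }
    pos
}

fn main() {
    use Expr::*;
    // ("a" | "b")+ ~ !"c"
    let e = Seq(
        Box::new(Plus(Box::new(Choice(Box::new(Str("a")), Box::new(Str("b")))))),
        Box::new(Not(Box::new(Str("c")))),
    );
    println!("{:?}", eval(&e, "abba!", 0)); // Some(4)
    println!("{:?}", eval(&e, "abbac", 0)); // None
}
```

Choice is ordered and committed in this model, as in PEGs generally. For example, `("a" | "ab") ~ "c"` fails on `"abc"` because `"a"` succeeds first and `"ab"` is never retried.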
Special rules
Special rules can be called within the grammar. They are:
- `whitespace`: run implicitly between rules and sub-rules (except inside atomic rules)
- `comment`: run implicitly between rules and sub-rules (except inside atomic rules)
- `any`: matches exactly one `char`
- `soi` (start-of-input): matches only when a `Parser` is still at the starting position
- `eoi` (end-of-input): matches only when a `Parser` has reached its end
- `pop`: pops a string from the stack and matches it
- `peek`: matches the string on top of the stack without popping it

`whitespace` and `comment` should be defined manually if needed. The other special rules cannot be overridden.
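The stack rules are easiest to see on a delimiter that must be repeated exactly, such as a run of `#` characters around a quoted string. The sketch below hand-codes a hypothetical rule, `fenced = { soi ~ push("#"*) ~ "'" ~ (!("'" ~ peek) ~ any)* ~ "'" ~ pop ~ eoi }`, in plain Rust. `State`, its methods, and `fenced` are illustrative names, not generated pest code. The grammar line itself has not been checked against pest's grammar parser.

```rust
// Sketch only: hand-written parser state, not pest's generated code.

struct State<'a> {
    input: &'a str,
    pos: usize,
    stack: Vec<&'a str>,
}

impl<'a> State<'a> {
    fn new(input: &'a str) -> Self {
        State { input, pos: 0, stack: Vec::new() }
    }

    /// `soi`: succeeds only at the starting position.
    fn soi(&self) -> bool { self.pos == 0 }

    /// `eoi`: succeeds only once all input has been consumed.
    fn eoi(&self) -> bool { self.pos == self.input.len() }

    /// `any`: consumes exactly one `char`.
    fn any(&mut self) -> bool {
        match self.input[self.pos..].chars().next() {
            Some(c) => { self.pos += c.len_utf8(); true }
            None => false,
        }
    }

    /// A string literal such as `"'"`.
    fn literal(&mut self, s: &str) -> bool {
        if self.input[self.pos..].starts_with(s) { self.pos += s.len(); true } else { false }
    }

    /// `push("#"*)`: matches a run of `#` and pushes the captured slice.
    fn push_hashes(&mut self) {
        let input = self.input;
        let start = self.pos;
        while input[self.pos..].starts_with('#') { self.pos += 1; }
        self.stack.push(&input[start..self.pos]);
    }

    /// `peek`: matches the top of the stack without removing it.
    fn peek(&mut self) -> bool {
        let top = match self.stack.last() { Some(&t) => t, None => return false };
        self.literal(top)
    }

    /// `pop`: matches the top of the stack, then removes it.
    fn pop(&mut self) -> bool {
        if self.peek() { self.stack.pop(); true } else { false }
    }
}

/// Hypothetical rule:
/// fenced = { soi ~ push("#"*) ~ "'" ~ (!("'" ~ peek) ~ any)* ~ "'" ~ pop ~ eoi }
fn fenced(input: &str) -> bool {
    let mut st = State::new(input);
    if !st.soi() { return false; }
    st.push_hashes();
    if !st.literal("'") { return false; }
    // (!("'" ~ peek) ~ any)*
    loop {
        let save = st.pos;
        let closing = st.literal("'") && st.peek();
        st.pos = save; // the negative lookahead consumes nothing
        if closing || !st.any() { break; }
    }
    st.literal("'") && st.pop() && st.eoi()
}

fn main() {
    println!("{}", fenced("##'it's'##")); // true
    println!("{}", fenced("##'a'#")); // false
}
```

Using `peek` inside the loop lets the body contain `'` as long as it is not followed by the same run of `#` that was pushed. `pop` then consumes that run once more and clears the stack.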
Rule
All rules defined or used in the grammar populate a generated `enum` called `Rule`. This enum implements `pest`'s `RuleType` and can be used throughout the API.
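The sketch below shows roughly what the enum might look like for the grammar `a = _{ "a" }  b = { a ~ "b" }` and how generic code can rely on it. Several parts are assumptions rather than confirmed details. The enum is hand-written with a guessed set of derives. `RuleType` is mirrored locally as a marker trait over `Copy + Debug + Eq + Hash + Ord`, so the example compiles without pest. `count` is a hypothetical helper.

```rust
// Sketch only: a hand-written approximation, not pest_derive's actual output.
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;

/// Local stand-in for pest's `RuleType` (assumed to be a marker trait with these bounds).
trait RuleType: Copy + Debug + Eq + Hash + Ord {}
impl<T: Copy + Debug + Eq + Hash + Ord> RuleType for T {}

/// One variant per rule, named exactly like the rule (assumed derives).
#[allow(dead_code, non_camel_case_types)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum Rule {
    a,
    b,
}

/// Generic code can work with any rule enum through the `RuleType` bounds,
/// e.g. using rules as map keys.
fn count<R: RuleType>(rules: &[R]) -> HashMap<R, usize> {
    let mut counts = HashMap::new();
    for &r in rules {
        *counts.entry(r).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count(&[Rule::b, Rule::b, Rule::a]);
    println!("{:?}", counts.get(&Rule::b)); // Some(2)
}
```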
Functions
derive_parser