Crate peg
rust-peg is a simple yet flexible parser generator based on the Parsing Expression Grammar formalism. It provides the parser!{} macro that builds a recursive descent parser from a concise definition of the grammar.
The parser!{} macro encloses a grammar definition containing a set of rules which match components of your language. It expands to a Rust mod containing functions corresponding to each rule marked pub.
    peg::parser!{
        grammar list_parser() for str {
            rule number() -> u32
                = n:$(['0'..='9']+) { n.parse().unwrap() }

            pub rule list() -> Vec<u32>
                = "[" l:number() ** "," "]" { l }
        }
    }

    pub fn main() {
        assert_eq!(list_parser::list("[1,1,2,3,5,8]"), Ok(vec![1, 1, 2, 3, 5, 8]));
    }
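To give a feel for what a recursive descent parser for this grammar does, here is a hand-written sketch in plain Rust. It is not the code that parser!{} expands to; the function names parse_number and parse_list and the Option-based return type are assumptions made purely for illustration.

```rust
/// Match one or more ASCII digits starting at byte offset `pos`.
/// Returns the parsed value and the offset just past the match.
fn parse_number(input: &str, pos: usize) -> Option<(u32, usize)> {
    let rest = &input[pos..];
    let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    // Unlike the grammar's `unwrap()`, overflow is reported as a failed match.
    let value = rest[..len].parse::<u32>().ok()?;
    Some((value, pos + len))
}

/// Match `"[" number ** "," "]"` against the whole input.
fn parse_list(input: &str) -> Option<Vec<u32>> {
    let mut pos = 0;
    if !input[pos..].starts_with('[') {
        return None;
    }
    pos += 1;

    // Delimited repeat: zero or more numbers separated by ",".
    let mut items = Vec::new();
    if let Some((first, next)) = parse_number(input, pos) {
        items.push(first);
        pos = next;
        while input[pos..].starts_with(',') {
            // A delimiter is only consumed if another element follows it.
            match parse_number(input, pos + 1) {
                Some((n, next)) => {
                    items.push(n);
                    pos = next;
                }
                None => break,
            }
        }
    }

    if !input[pos..].starts_with(']') {
        return None;
    }
    pos += 1;

    // The sketch only succeeds if the entire input was consumed.
    if pos == input.len() {
        Some(items)
    } else {
        None
    }
}
```

Calling parse_list("[1,1,2,3,5,8]") in this sketch yields Some(vec![1, 1, 2, 3, 5, 8]), while a trailing delimiter such as "[1,]" is rejected.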
Expressions
"keyword" - Literal: match a literal string.
['0'..='9'] - Pattern: match a single element that matches a Rust match-style pattern. (see Match expressions below)
some_rule() - Rule: match a rule defined elsewhere in the grammar and return its result.
e1 e2 e3 - Sequence: match expressions in sequence (e1 followed by e2 followed by e3).
e1 / e2 / e3 - Ordered choice: try to match e1. If the match succeeds, return its result, otherwise try e2, and so on.
expression? - Optional: match one or zero repetitions of expression. Returns an Option.
expression* - Repeat: match zero or more repetitions of expression and return the results as a Vec.
expression+ - One-or-more: match one or more repetitions of expression and return the results as a Vec.
expression*<n,m> - Range repeat: match between n and m repetitions of expression and return the results as a Vec. (see Repeat ranges below)
expression ** delim - Delimited repeat: match zero or more repetitions of expression delimited with delim and return the results as a Vec.
&expression - Positive lookahead: match only if expression matches at this position, without consuming any characters.
!expression - Negative lookahead: match only if expression does not match at this position, without consuming any characters.
a:e1 b:e2 c:e3 { rust } - Action: match e1, e2, e3 in sequence. If they match successfully, run the Rust code in the block and return its return value. The variable names before the colons in the preceding sequence are bound to the results of the corresponding expressions.
a:e1 b:e2 c:e3 {? rust } - Like above, but the Rust block returns a Result<T, &str> instead of a value directly. On Ok(v), it matches successfully and returns v. On Err(e), the match of the entire expression fails and it tries alternatives or reports a parse error with the &str e.
$(e) - Slice: match the expression e, and return the &str slice of the input corresponding to the match.
position!() - Return a usize representing the current offset into the input, without consuming any characters.
quiet!{ e } - Match the expression e, but don't report literals within it as "expected" in error messages.
expected!("something") - Fail to match, and report the specified string as an expected symbol at the current location.
precedence!{ ... } - Parse infix, prefix, or postfix expressions by precedence climbing. (see Precedence climbing below)
Match expressions
The [pat] syntax expands into a Rust match pattern against the next character (or element) of the input.
This is commonly used for matching sets of characters with Rust's ..= inclusive range pattern syntax and | to match multiple patterns. For example ['a'..='z' | 'A'..='Z'] matches an upper or lower case ASCII alphabet character.
If your input type is a slice of an enum type, a pattern could match an enum variant like [Token::Operator('+')].
[_] matches any single element. As this always matches except at end-of-file, combining it with negative lookahead as ![_] is the idiom for matching EOF in PEG.
Repeat ranges
The repeat operators * and ** can be followed by an optional range specification of the form <n> (exact), <n,> (min), <,m> (max) or <n,m> (range), where n and m are either integers, or a Rust usize expression enclosed in {}.
Precedence climbing
precedence!{ rules... } provides a convenient way to parse infix, prefix, and postfix operators using the precedence climbing algorithm.
    pub rule arithmetic() -> i64 = precedence!{
        x:(@) "+" y:@ { x + y }
        x:(@) "-" y:@ { x - y }
        --
        x:(@) "*" y:@ { x * y }
        x:(@) "/" y:@ { x / y }
        --
        x:@ "^" y:(@) { x.pow(y as u32) }
        --
        n:number() { n }
    }
Each -- introduces a new precedence level that binds more tightly than previous precedence levels. The levels consist of one or more operator rules each followed by a Rust action expression.
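The (@) and @ are the operands, and the parentheses indicate associativity. In the example above, (@) on the left of "+" makes it left-associative, while (@) on the right of "^" makes it right-associative. An operator rule beginning and ending with @ is an infix expression. Prefix and postfix rules have one @ at the beginning or end, and atoms do not include @.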
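For readers unfamiliar with precedence climbing, the following is a minimal sketch of the algorithm in plain Rust, evaluating a pre-tokenized expression with the same three operator levels as the example above. It is not rust-peg's generated code; the names Tok, level, apply, climb, and evaluate are hypothetical and exist only for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
}

/// Precedence level and associativity of each infix operator.
/// Higher levels bind more tightly; `true` means right-associative.
fn level(op: char) -> Option<(u8, bool)> {
    match op {
        '+' | '-' => Some((1, false)),
        '*' | '/' => Some((2, false)),
        '^' => Some((3, true)),
        _ => None,
    }
}

/// Evaluate a single infix operation.
fn apply(op: char, x: i64, y: i64) -> i64 {
    match op {
        '+' => x + y,
        '-' => x - y,
        '*' => x * y,
        '/' => x / y,
        '^' => x.pow(y as u32),
        _ => unreachable!("level() only admits the operators above"),
    }
}

/// Parse and evaluate an expression whose operators all have
/// a level of at least `min_level`, starting at `*pos`.
fn climb(toks: &[Tok], pos: &mut usize, min_level: u8) -> Option<i64> {
    // Atom: a plain number.
    let mut lhs = match toks.get(*pos)? {
        Tok::Num(n) => *n,
        Tok::Op(_) => return None,
    };
    *pos += 1;

    while let Some(Tok::Op(op)) = toks.get(*pos).copied() {
        let (lvl, right_assoc) = level(op)?;
        if lvl < min_level {
            break; // This operator belongs to an enclosing, looser level.
        }
        *pos += 1;
        // A left-associative operator needs a strictly tighter right operand;
        // a right-associative one may recurse at its own level.
        let next_min = if right_assoc { lvl } else { lvl + 1 };
        let rhs = climb(toks, pos, next_min)?;
        lhs = apply(op, lhs, rhs);
    }
    Some(lhs)
}

/// Evaluate a complete token sequence, failing on leftover tokens.
fn evaluate(toks: &[Tok]) -> Option<i64> {
    let mut pos = 0;
    let value = climb(toks, &mut pos, 0)?;
    if pos == toks.len() { Some(value) } else { None }
}
```

In this sketch the minimum level passed to the recursive call plays the role of the parenthesized operand: raising it by one for the right operand gives left associativity, and keeping it unchanged gives right associativity.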
Custom input types
rust-peg handles input types through a series of traits, and comes with implementations for str, [u8], and [T].
Parse is the base trait for all inputs. The others are only required to use the corresponding expressions.
ParseElem implements the [_] pattern operator, with a method returning the next item of the input to match.
ParseLiteral implements matching against a "string" literal.
ParseSlice implements the $() operator, returning a slice from a span of indexes.
Error reporting
When a match fails, position information is automatically recorded to report a set of "expected" tokens that would have allowed the parser to advance further.
Some rules should never appear in error messages, and can be suppressed with quiet!{e}:
    rule whitespace() = quiet!{[' ' | '\n' | '\t']+}
If you want the "expected" set to contain a more helpful string instead of character sets, you can use quiet!{} and expected!() together:
    rule identifier()
        = quiet!{[ 'a'..='z' | 'A'..='Z']['a'..='z' | 'A'..='Z' | '0'..='9' ]+}
        / expected!("identifier")
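The bookkeeping behind the "expected" set can be pictured roughly as follows. This is a minimal sketch in plain Rust, not rust-peg's internal implementation; the ErrorState type, its fields, and the mark_failure method are hypothetical names chosen for illustration. It keeps only the failures recorded at the furthest position reached, and records nothing while inside a quiet region.

```rust
use std::collections::BTreeSet;

/// Tracks the furthest failure position and what was expected there.
#[derive(Debug, Default)]
struct ErrorState {
    /// Furthest input offset at which a match has failed so far.
    max_pos: usize,
    /// Descriptions of what would have matched at `max_pos`.
    expected: BTreeSet<&'static str>,
    /// Nesting depth of quiet regions; failures inside are not recorded.
    quiet_depth: usize,
}

impl ErrorState {
    /// Record that `what` was expected at `pos` but did not match.
    fn mark_failure(&mut self, pos: usize, what: &'static str) {
        if self.quiet_depth > 0 {
            return; // Suppressed, as inside a quiet region.
        }
        if pos > self.max_pos {
            // The parser got further than before: earlier failures are stale.
            self.max_pos = pos;
            self.expected.clear();
        }
        if pos == self.max_pos {
            self.expected.insert(what);
        }
    }
}
```

Under this model, reporting a named symbol amounts to calling mark_failure with a string such as "identifier" after the quiet alternative has failed.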
Imports
    mod ast {
        pub struct Expr;
    }

    peg::parser!{grammar doc() for str {
        use self::ast::Expr;
    }}
The grammar may begin with a series of use declarations, just like in Rust, which are included in the generated module. Unlike normal mod {} blocks, use super::* is inserted by default, so you don't have to deal with this most of the time.
Rustdoc comments
rustdoc comments with /// before a grammar or pub rule are propagated to the resulting function:
    /// Parse an array expression.
    pub rule array() -> Vec<i32> = "[...]" { vec![] }
As with all procedural macros, non-doc comments are ignored by the lexer and can be used like in any other Rust code.
Tracing
If you pass the peg/trace feature to Cargo when building your project, a trace of the parsing will be printed to stdout when parsing. For example,
    $ cargo run --features peg/trace
    ...
    [PEG_TRACE] Matched rule type at 8:5
    [PEG_TRACE] Attempting to match rule ident at 8:12
    [PEG_TRACE] Attempting to match rule letter at 8:12
    [PEG_TRACE] Failed to match rule letter at 8:12
    ...
Modules
error | Parse error reporting |
str | Utilities for str input |
Macros
parser | The main macro for creating a PEG parser. |
Enums
RuleResult | The result type used internally in the parser. |
Traits
Parse | A type that can be used as input to a parser. |
ParseElem | A parser input type supporting the [_] syntax. |
ParseLiteral | A parser input type supporting the "literal" syntax. |
ParseSlice | A parser input type supporting the $() syntax. |