Parsing Expression Grammars in Rust
This is a simple parser generator based on the Parsing Expression Grammar formalism.
Please see the release notes for breaking changes between rust-peg 0.3.x and 0.4.x.
Grammar Definition Syntax
use name;
The grammar may begin with a series of use declarations, just like in Rust, which are included in the generated module. Since the grammar is in its own module, you must use super::StructName; to access a structure from the parent module.
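The scoping rule behind use super::StructName; is ordinary Rust module scoping, which the following minimal sketch illustrates. The my_grammar module here is written by hand as a stand-in for a generated module, and the names app, Point, and origin are hypothetical.

```rust
// Hypothetical parent module that owns a structure the grammar wants to build.
mod app {
    #[derive(Debug, PartialEq)]
    pub struct Point {
        pub x: i64,
        pub y: i64,
    }

    // Hand-written stand-in for the module generated from a grammar.
    pub mod my_grammar {
        // A nested module does not inherit its parent's items, so `Point`
        // must be imported explicitly from the parent with `super`.
        use super::Point;

        // Stand-in for a rule action that constructs the parent's type.
        pub fn origin() -> Point {
            Point { x: 0, y: 0 }
        }
    }
}
```

Removing the use line makes the reference to Point inside my_grammar fail to compile, which is the situation a grammar runs into when it refers to a parent-module type without importing it.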
#[pub]
rule_name -> type
    = expression
If a rule is marked with #[pub], the generated module has a public function that begins parsing at that rule.
. - match any single character
"literal" - match a literal string
[a-z] - match a single character from a set
[^a-z] - match a single character not in a set
rule - match a production defined elsewhere in the grammar and return its result
expression* - Match zero or more repetitions of expression and return the results as a Vec
expression+ - Match one or more repetitions of expression and return the results as a Vec
expression*<n> - Match n repetitions of expression and return the results as a Vec
expression*<n,m> - Match between n and m repetitions of expression and return the results as a Vec
expression? - Match one or zero repetitions of expression. Returns an Option
&expression - Match only if expression matches at this position, without consuming any characters
!expression - Match only if expression does not match at this position, without consuming any characters
expression ** delim - Match zero or more repetitions of expression delimited with delim and return the results as a Vec
expression ++ delim - Match one or more repetitions of expression delimited with delim and return the results as a Vec
e1 / e2 / e3 - Try to match e1. If the match succeeds, return its result, otherwise try e2, and so on.
e1 e2 e3 - Match expressions in sequence
a:e1 b:e2 c:e3 { rust } - Match e1, e2, e3 in sequence. If they match successfully, run the Rust code in the block and return its return value. The variable names before the colons in the preceding sequence are bound to the results of the corresponding expressions. The Rust code must contain matched curly braces, including those in strings and comments.
a:e1 b:e2 c:e3 {? rust } - Like above, but the Rust block returns a Result instead of a value directly. On Ok(v), it matches successfully and returns v. On Err(e), the match of the entire expression fails and it tries alternatives or reports a parse error with the &str e.
$(e) - matches the expression e, and returns the &str slice of the input string corresponding to the match
#position - returns a usize representing the current offset into the input string, and consumes no characters
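Several of the expression forms above can be modeled by hand in plain Rust. The following is a minimal sketch and not the code rust-peg generates: the Parsed type and every function name are invented for illustration, and each function takes the input and a byte offset and returns the value plus the new offset, or None on failure. The sketch assumes ASCII input so that byte offsets are always valid string boundaries.

```rust
/// Result of a hand-written rule: the parsed value and the offset after the
/// match, or `None` if the rule does not match. (Hypothetical helper type.)
type Parsed<T> = Option<(T, usize)>;

/// Models `$([0-9]+)`: one or more digits, returning the matched slice.
fn digits(input: &str, pos: usize) -> Parsed<&str> {
    let len = input[pos..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if len == 0 {
        None
    } else {
        Some((&input[pos..pos + len], pos + len))
    }
}

/// Models `"literal"`: matches an exact string at the current offset.
fn literal(input: &str, pos: usize, lit: &str) -> Parsed<()> {
    if input[pos..].starts_with(lit) {
        Some(((), pos + lit.len()))
    } else {
        None
    }
}

/// Models `n:$([0-9]+) {? ... }`: the action returns a `Result`, and an
/// `Err` turns an otherwise successful match into a failure.
fn number(input: &str, pos: usize) -> Parsed<u8> {
    let (text, next) = digits(input, pos)?;
    let value: Result<u8, &'static str> =
        text.parse::<u8>().map_err(|_| "number out of range");
    value.ok().map(|v| (v, next))
}

/// Models `number ** ","`: zero or more numbers separated by commas.
/// A trailing delimiter with no item after it is left unconsumed.
fn number_list(input: &str, start: usize) -> Parsed<Vec<u8>> {
    let mut items = Vec::new();
    let mut pos = start;
    loop {
        // After the first item, a delimiter must precede each further item.
        let item_start = if items.is_empty() {
            pos
        } else {
            match literal(input, pos, ",") {
                Some((_, next)) => next,
                None => break,
            }
        };
        match number(input, item_start) {
            Some((value, next)) => {
                items.push(value);
                pos = next;
            }
            None => break,
        }
    }
    Some((items, pos))
}

/// Models `"true" / "false"`: ordered choice tries alternatives left to right.
fn boolean(input: &str, pos: usize) -> Parsed<bool> {
    literal(input, pos, "true")
        .map(|(_, next)| (true, next))
        .or_else(|| literal(input, pos, "false").map(|(_, next)| (false, next)))
}

/// Models `![0-9]`: succeeds only if no digit follows, consuming nothing.
fn not_digit_ahead(input: &str, pos: usize) -> Parsed<()> {
    match digits(input, pos) {
        Some(_) => None,
        None => Some(((), pos)),
    }
}
```

In this model, number_list("1,22,3", 0) yields the three values and offset 6, while number("300", 0) fails because the action rejects a value that does not fit in a u8.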
You can use line comments and block comments just as in Rust code, for example:
// comment
name -> u64
    = /* weirdly placed comment */ n:$([0-9]+) { n.parse::<u64>().unwrap() } // comment
Usage
With a build script
A Cargo build script can compile your PEG grammar to Rust source automatically.
Example crate using rust-peg with a build script
Add to your Cargo.toml:
# Under [package]
build = "build.rs"
[build-dependencies]
peg = { version = "0.4" }
Create build.rs with:
extern crate peg;

fn main() {
    peg::cargo_build("src/my_grammar.rustpeg");
}
And import the generated code:
mod my_grammar {
    include!(concat!(env!("OUT_DIR"), "/my_grammar.rs"));
}
As a syntax extension
rust-syntax-ext only works on Nightly builds of Rust.
Examples using rust-peg as a syntax extension
Add to your Cargo.toml:
[]
= "0.4.0"
Add to your crate root:
Use peg_file! modname("mygrammarfile.rustpeg"); to include the grammar from an external file. The macro expands into a module called modname with functions corresponding to the #[pub] rules in your grammar.
Or, use peg! modname; to embed a short PEG grammar inline in your Rust source file. Example.
As a standalone code generator
Run peg input_file.rustpeg to compile a grammar and generate Rust code on stdout.
Tracing
If you pass the peg/trace feature to Cargo when building your project, a trace of the parsing will be output to stdout when running the binary. For example,
$ cargo run --features peg/trace
...
[PEG_TRACE] Matched rule type at 8:5
[PEG_TRACE] Attempting to match rule ident at 8:12
[PEG_TRACE] Attempting to match rule letter at 8:12
[PEG_TRACE] Failed to match rule letter at 8:12
...
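Because the trace shares stdout with the program's own output, it can help to filter and parse the trace lines separately. The following is a minimal sketch of such a helper, based only on the line shape in the sample above. TraceEvent and parse_trace_line are hypothetical names, not part of rust-peg, and reading the trailing pair as line:column is an assumption.

```rust
/// One parsed trace line. (Hypothetical helper type, not part of rust-peg.)
#[derive(Debug, PartialEq)]
pub struct TraceEvent {
    /// Text before "rule", e.g. "Matched" or "Attempting to match".
    pub action: String,
    /// Name of the rule the event refers to.
    pub rule: String,
    /// First number of the trailing pair (assumed to be the line).
    pub line: usize,
    /// Second number of the trailing pair (assumed to be the column).
    pub column: usize,
}

/// Parses a line shaped like "[PEG_TRACE] <action> rule <name> at <n>:<m>".
/// Returns `None` for any line that does not have that shape, so ordinary
/// program output mixed into stdout is skipped.
pub fn parse_trace_line(text: &str) -> Option<TraceEvent> {
    let rest = text.strip_prefix("[PEG_TRACE] ")?;
    // Split from the right so the location is always the final component.
    let (head, location) = rest.rsplit_once(" at ")?;
    let (action, rule) = head.split_once(" rule ")?;
    let (line, column) = location.split_once(':')?;
    Some(TraceEvent {
        action: action.to_string(),
        rule: rule.to_string(),
        line: line.trim().parse().ok()?,
        column: column.trim().parse().ok()?,
    })
}
```

Feeding each line of the captured output through parse_trace_line and keeping the Some results leaves just the trace events, which can then be grouped by rule or position.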