lemon-tree
The famous Lemon Parser Generator, exposed as a Rust library that builds an LALR(1) parser transparently during cargo build.
Instead of maintaining a separate grammar file and running a code-generation step, you describe the grammar directly in Rust by adding annotation attributes to ordinary functions, structs and enums. The generated parser produces your own Rust types, so a successful parse yields a ready-to-use syntax tree (or a computed value) with no glue code in between.
This crate uses lemon-mint as the parser-generation backend.
How it works
The annotation attributes are procedural macros. As cargo compiles your source file, the macros accumulate the grammar rules they encounter, and the final #[derive(LemonTree)] (the start symbol) triggers generation of a complete LALR(1) parser. The parser is emitted into a private submodule and connected to your types through the LemonTree and LemonTreeNode traits. Because everything happens at compile time, grammar conflicts and errors surface as ordinary build errors.
Installation
Add this to your project's Cargo.toml:
[]
= "0.2"
The crate exports three items: lem_fn, LemonTree and LemonTreeNode.
Core concepts
A grammar is made of terminal symbols (tokens, supplied by your tokenizer) and nonterminal symbols (produced by reducing rules). By convention, names containing a lowercase letter are nonterminals, and all-uppercase names like PLUS or NUM are tokens.
Each nonterminal is backed by a Rust type. There are three ways to attach rules to it:
- A struct annotated with
#[derive(LemonTreeNode)](or#[derive(LemonTree)]for the start symbol). Rules are listed in#[lem("...")]attributes on the struct, and each rule constructs the struct. - An enum annotated with
#[derive(LemonTreeNode)](or#[derive(LemonTree)]). The#[lem("...")]attributes are placed on the individual variants. - A function annotated with
#[lem_fn("...")]. The function's return type is the left-hand-side nonterminal, and the body is the action run when the rule reduces.
The start symbol is marked with #[derive(LemonTree)] and represents the final result of a parse. Parser-wide options are set with #[lem_opt(...)] attributes placed next to it.
Rule syntax
The string inside #[lem(...)] / #[lem_fn(...)] is the right-hand side of a Lemon rule — a whitespace-separated sequence of symbols. The left-hand side is implied by where the attribute sits (the struct/enum/variant type, or the function return type).
- Multiple rules. An attribute may list several alternatives, and may be repeated:
#[lem("A(value)", "B(value)")]. - Aliases. A symbol can be followed by an alias in parentheses to bind its value. For structs, the alias is a field name (
"VALUE(value)"). For enum variants and#[lem_fn]functions it is an argument/tuple-field name or a zero-based index ("Expr(0) PLUS Expr(1)"). - Value conversion. Bound values are moved into their target via
.into(), so the target type only needsFromfor the matched type (this is why anExprflows into aBox<Expr>field automatically). ImplementInto<TargetType>yourself for arbitrary conversions. Unbound fields/arguments are filled withDefault::default(). - Optional symbols. Square brackets mark an optional part and expand into alternatives:
"Exprs(exprs) [SEMICOLON]"means both"Exprs(exprs)"and"Exprs(exprs) SEMICOLON". Brackets may be nested.
Parser options — #[lem_opt(...)]
| Option | Lemon directive | Meaning |
|---|---|---|
token_type |
%token_type |
The Rust type carried by every token's value. |
extra_argument |
%extra_argument |
Type of a user value made available to every action. |
left |
%left |
Declare tokens left-associative (precedence grows with each line). |
right |
%right |
Declare tokens right-associative. |
nonassoc |
%nonassoc |
Declare tokens non-associative. |
fallback |
%fallback |
"FALLBACK_TOK TOK_A TOK_B ..." — fall back the listed tokens. |
trace |
%trace |
Print a parser trace to stderr, prefixed with the given prompt. |
Associativity options may be repeated; rules declared earlier have lower precedence than later ones, exactly as in Lemon.
The generated parser API
Deriving LemonTree on a start symbol S generates:
S::get_parser(extra)— create a parser (extrais the%extra_argument; use()when none is set).<S as LemonTree>::Token— an enum with one variant per terminal symbol.parser.add_token(token, value)— feed one token; returnsResult<(), ()>(Erron syntax error).parser.try_add_token(token, value)— likeadd_token, but returnsOk(false)when the token is not accepted instead of erroring.parser.end()— signal end of input; returnsResult<S, ()>.parser.extra— public field holding theextra_argumentvalue.
Example: a calculator
This first example builds a calculator that evaluates expressions to f64. Create an empty cargo project and put the following in its main.rs:
use ;
type Expr = f64; // in lemon this corresponds to: %type Expr {f64}
type Exprs = ; // %type Exprs {Vec<f64>}
// Expr ::= NUM(value). value
// Expr ::= MINUS Expr(a). -a
// Expr ::= PLUS Expr(a). a
// Expr ::= Expr(a) PLUS Expr(b). a + b
// Expr ::= Expr(a) MINUS Expr(b). a - b
// Expr ::= Expr(a) TIMES Expr(b). a * b
// Expr ::= Expr(a) DIVIDE Expr(b). a / b
// Expr ::= PAR_OPEN Expr(a) PAR_CLOSE. a
// Exprs ::= Expr(item). vec![item]
// Exprs ::= Exprs(items) SEMICOLON Expr(item). items.push(item); items
// The start symbol is marked with #[derive(LemonTree)]
// The above code generates the following lemon directives:
// %start_symbol {Program}
// %token_type {f64}
// %left PLUS MINUS.
// %left DIVIDE TIMES.
// Program ::= Exprs(exprs). Program {exprs}
// Program ::= Exprs(exprs) SEMICOLON. Program {exprs}
// This is the tokenizer function. It takes "input", and feeds tokens to the "parser".
Example: building a syntax tree
Rather than computing a value, you can produce an AST. Here Expr is an enum whose variants carry the rules, so the parser hands back a tree of Expr nodes. Note how each matched Expr is automatically wrapped in a Box<Expr> via .into().
use ;
pub type Exprs = ;
Constraints
- All attributes describing a single parser must live in one Rust file.
#[derive(LemonTree)](the start symbol) must be the last parser attribute in that file — it is what triggers parser generation.- A file may define only one parser (one
#[derive(LemonTree)]). - Unions are not supported as symbol types.
You can have several parsers in one crate, as long as each lives in its own file.
Cargo features
These features (forwarded to lemon-tree-derive) help debug the grammar at build time:
dump-grammar— print the generated grammar, with Rust actions, to stderr during the build.dump-lemon-grammar— print the grammar in classic Lemon.ysyntax to stderr.debug-parser-to-file— write the generated parser source to a file next to your source instead of inlining it, making the generated code easy to inspect.
License
Licensed under the MIT license.