Crate lemon_mint

§lemon-mint

A pure-Rust port of the famous Lemon parser generator, packaged as a library with an API instead of a command-line tool.

You describe an LALR(1) grammar — either as a Lemon-style .y grammar string/file, or by calling builder methods — and lemon-mint emits a self-contained Rust source file containing the parser’s state-machine tables and a small driver. The generated parser is plain Rust with no runtime dependency on this crate, so you can commit it to your repository or generate it from a build.rs build script.

Unlike a hand-written recursive-descent parser, the output is a table-driven bottom-up (shift/reduce) LALR(1) parser: compact, fast, and able to report grammar conflicts at generation time rather than surprising you at run time.

§The three steps

1. Build a grammar with LemonMintBuilder. Either feed it a whole grammar with load_y / load_y_file, or assemble it piece by piece with set_start_symbol, set_token_type, add_type, add_rule, and friends. The two styles can be mixed freely.
2. Compile the grammar into the parser tables with try_into_lemon, which returns a LemonMint. Grammar errors and unresolved parser conflicts are reported here as an Err.
3. Emit Rust source with gen_rust. Optionally write a human-readable report of the state machine with gen_log (the classic Lemon y.output), or a normalized copy of the grammar with gen_y.

§Example: a calculator

This generates a parser for newline-separated arithmetic expressions. The main function embedded through %code becomes part of the generated file and shows the parser evaluating 15 / 5 to 3.0:

use std::sync::Arc;
use lemon_mint::LemonMintBuilder;

let builder = LemonMintBuilder::new().load_y
(	&Arc::new("source.y".to_string()), // fake source name that appears in error messages
	"	%trace {>> }
		%extra_argument {()}
		%left PLUS MINUS.
		%left TIMES DIVIDE.
		%token_type {f64}
		%type Expr {super::Expr}
		%type Exprs {Vec<super::Expr>}
		%type Program {Vec<super::Expr>}

		Program ::= Exprs(exprs). exprs
		Program ::= Exprs(exprs) NEW_LINE. exprs

		Exprs ::= Expr(item).                       vec![item]
		Exprs ::= Exprs(items) NEW_LINE Expr(item). let mut items = items; items.push(item); items

		Expr ::= NUM(value). super::Expr {value}
		Expr ::= PAR_OPEN Expr(a) PAR_CLOSE. a
		Expr ::= PLUS Expr(a). a
		Expr ::= MINUS Expr(a). let mut a = a; a.value = -a.value; a
		Expr ::= Expr(a) PLUS Expr(b). super::Expr{value: a.value + b.value}
		Expr ::= Expr(a) MINUS Expr(b). super::Expr{value: a.value - b.value}
		Expr ::= Expr(a) TIMES Expr(b). super::Expr{value: a.value * b.value}
		Expr ::= Expr(a) DIVIDE Expr(b). super::Expr{value: a.value / b.value}

		%code {
			use code::{Parser, Token};

			#[derive(Debug, PartialEq)]
			pub struct Expr {value: f64}

			fn main()
			{	let mut parser = Parser::new(()); // () is our extra argument, accessible in actions and through parser.extra

				parser.add_token(Token::NUM, 15.0).unwrap();
				parser.add_token(Token::DIVIDE, 0.0).unwrap();
				parser.add_token(Token::NUM, 5.0).unwrap();
				parser.add_token(Token::NEW_LINE, 0.0).unwrap();

				let result = parser.end().unwrap(); // type is %type of the start symbol Program
				assert_eq!(result, vec![Expr {value: 3.0}]);
				println!(\"Result: {:?}\", result);
			}
		}
	".as_bytes()
).unwrap();

let lemon = builder.try_into_lemon().unwrap();

// Emit the parser. Here we write to in-memory buffers; in practice you would write `parser.rs`
// (e.g. to `OUT_DIR` from a build script) and `include!` it.
let mut out_rust: Vec<u8> = Vec::new();
let mut out_log: Vec<u8> = Vec::new();
lemon.gen_rust(&mut out_rust).unwrap();
lemon.gen_log(&mut out_log, false, false).unwrap();

§Using the generated parser

gen_rust wraps everything in a code module. The two items you use from it are Token (an enum with one variant per terminal symbol) and Parser:

use code::{Parser, Token};

let mut parser = Parser::new(extra);           // `extra` is the initial %extra_argument value
parser.add_token(Token::NUM, 15.0)?;           // feed terminals in input order, with their values
parser.add_token(Token::DIVIDE, 0.0)?;
parser.add_token(Token::NUM, 5.0)?;
let result = parser.end()?;                    // finish; result has the start symbol's %type

add_token returns Err(()) on a syntax error, and end returns Err(()) if the input is not a complete sentence of the grammar. The whole driver lives in code; a second rules module holds the action code and is not used directly. Items from your crate are reachable from actions and %code as super::… or crate::… (e.g. super::Expr).
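The snippet above feeds tokens by hand. In practice they usually come from a tokenizer, and a minimal sketch of one for the calculator grammar might look like the following. The Token enum here is a hand-written stand-in for the generated code::Token (an assumption made so the sketch compiles on its own), and tokenize is a hypothetical helper, not part of lemon-mint.

```rust
/// Stand-in for the generated `code::Token` enum (assumption: the real enum
/// is emitted by `gen_rust` with one variant per terminal of the grammar).
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token { NUM, PLUS, MINUS, TIMES, DIVIDE, PAR_OPEN, PAR_CLOSE, NEW_LINE }

/// Hypothetical helper: splits `input` into (token, value) pairs.
/// Tokens other than NUM carry a dummy value of 0.0, as in the example above.
pub fn tokenize(input: &str) -> Result<Vec<(Token, f64)>, String> {
    let mut out = Vec::new();
    let mut chars = input.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        let token = match c {
            // Skip horizontal whitespace; '\n' is a real token in this grammar.
            ' ' | '\t' | '\r' => { chars.next(); continue; }
            '0'..='9' | '.' => {
                // Consume the longest run of digits and dots, then parse it.
                let mut end = start;
                while let Some(&(i, d)) = chars.peek() {
                    if d.is_ascii_digit() || d == '.' {
                        end = i + d.len_utf8();
                        chars.next();
                    } else {
                        break;
                    }
                }
                let value: f64 = input[start..end]
                    .parse()
                    .map_err(|e| format!("bad number at byte {}: {}", start, e))?;
                out.push((Token::NUM, value));
                continue;
            }
            '+' => Token::PLUS,
            '-' => Token::MINUS,
            '*' => Token::TIMES,
            '/' => Token::DIVIDE,
            '(' => Token::PAR_OPEN,
            ')' => Token::PAR_CLOSE,
            '\n' => Token::NEW_LINE,
            other => return Err(format!("unexpected character {:?} at byte {}", other, start)),
        };
        chars.next();
        out.push((token, 0.0));
    }
    Ok(out)
}
```

Each (token, value) pair would then be passed to parser.add_token(token, value)?, followed by parser.end()? once the input is exhausted.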

§Soft keywords with `try_add_token`

Parser::try_add_token is a non-committal variant of add_token: it returns Ok(true) if the token is accepted in the current state, or Ok(false) if it is not — without raising a syntax error. This lets a tokenizer treat a word as a keyword where the grammar expects one and fall back to feeding it as an identifier elsewhere (“soft” / non-reserved keywords):

match parser.try_add_token(Token::DAY, value) {
    Ok(true)  => {}                                   // consumed as the DAY keyword
    Ok(false) => parser.add_token(Token::IDENT, ident)?, // not valid here — re-feed as identifier
    Err(())   => return Err(SyntaxError),
}
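To make the fallback pattern concrete without the generated code, here is a self-contained sketch built around a mock parser. MockParser, its toy grammar (Stmt ::= EVERY DAY | SET IDENT), and feed_word are all hypothetical; only the method names try_add_token and add_token and their return shapes follow the description above.

```rust
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token { EVERY, DAY, SET, IDENT }

/// Mock stand-in for a generated parser of the toy grammar
/// `Stmt ::= EVERY DAY | SET IDENT` (not produced by lemon-mint).
pub struct MockParser {
    state: u8,
    /// Tokens accepted so far, kept only so the sketch can be inspected.
    pub accepted: Vec<Token>,
}

impl MockParser {
    pub fn new() -> Self {
        MockParser { state: 0, accepted: Vec::new() }
    }

    /// Which tokens the toy grammar allows in the current state.
    fn accepts(&self, token: Token) -> bool {
        matches!(
            (self.state, token),
            (0, Token::EVERY) | (0, Token::SET) | (1, Token::DAY) | (2, Token::IDENT)
        )
    }

    /// Ok(true) if the token was consumed, Ok(false) if it is not valid here.
    /// This mock never returns Err; a real parser may.
    pub fn try_add_token(&mut self, token: Token, _value: String) -> Result<bool, ()> {
        if !self.accepts(token) {
            return Ok(false);
        }
        self.state = match token {
            Token::EVERY => 1,
            Token::SET => 2,
            _ => 3, // statement complete
        };
        self.accepted.push(token);
        Ok(true)
    }

    /// Committal variant: a rejected token is a syntax error.
    pub fn add_token(&mut self, token: Token, value: String) -> Result<(), ()> {
        if self.try_add_token(token, value)? { Ok(()) } else { Err(()) }
    }
}

/// Feeds one word: as a keyword where the grammar allows it, else as IDENT.
pub fn feed_word(parser: &mut MockParser, word: &str) -> Result<(), ()> {
    let keyword = match word {
        "every" => Some(Token::EVERY),
        "day" => Some(Token::DAY),
        "set" => Some(Token::SET),
        _ => None,
    };
    if let Some(kw) = keyword {
        if parser.try_add_token(kw, word.to_string())? {
            return Ok(());
        }
    }
    parser.add_token(Token::IDENT, word.to_string())
}
```

With this mock, the word "day" is consumed as the DAY keyword after "every" but as an IDENT after "set", which is the soft-keyword behavior described above.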

§Y-grammar reference

lemon-mint accepts a grammar syntax close to classic Lemon. Line (//) and block (/* … */) C-style comments are allowed anywhere.

§Rules and actions

A rule is Lhs ::= Rhs. optionally followed by an action — Rust code producing the semantic value of the left-hand side:

Expr ::= Expr(a) PLUS Expr(b). super::Expr{value: a.value + b.value}

The . ends the right-hand side; everything after it (up to the next rule or directive) is the action. A rule with no action produces (). Parenthesized names like (a) are aliases that bind the semantic value of that symbol to a variable usable in the action. The action also has &mut extra in scope (the %extra_argument).

§Terminals vs nonterminals

A symbol whose name contains at least one lowercase letter is a nonterminal; an all-uppercase name is a terminal (a Token). (Classic Lemon decides this from the first letter only.)
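The two classification rules can be sketched as a pair of predicates. This is an illustration of the rule as stated here, not the crate's actual implementation; both function names are hypothetical, and the classic variant assumes Lemon's convention that a leading uppercase letter marks a terminal.

```rust
/// Rule described above: any lowercase letter makes the symbol a nonterminal.
pub fn is_nonterminal(name: &str) -> bool {
    name.chars().any(|c| c.is_lowercase())
}

/// Classic Lemon's rule (assumed): only the first character is inspected,
/// and a leading uppercase letter means "terminal".
pub fn is_nonterminal_classic(name: &str) -> bool {
    name.chars().next().map_or(false, |c| !c.is_uppercase())
}
```

Under these predicates a name such as Expr counts as a nonterminal in lemon-mint but would be read as a terminal by the first-letter rule, which is why the calculator grammar can use capitalized nonterminal names.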

§Types

Every symbol carries a semantic value. Terminals all share the %token_type. Each nonterminal’s type is set with %type Name {Type}, or defaults to %default_type. The start symbol’s type is what the generated Parser::end returns.

§Precedence and associativity

%left, %right, and %nonassoc declare terminal precedence to resolve shift/reduce conflicts. Each successive declaration binds more tightly than the previous one:

%left PLUS MINUS.    // lower precedence
%left TIMES DIVIDE.  // higher precedence
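How such declarations settle a shift/reduce conflict can be sketched as a small decision function. This follows the conventional yacc/Lemon scheme (compare the precedence of the rule being reduced with that of the lookahead terminal, and break ties by associativity). It is a sketch under that assumption, not code taken from lemon-mint, and all type and function names are hypothetical.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Assoc { Left, Right, NonAssoc }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Resolution { Shift, Reduce, Error }

/// Precedence of a terminal: a higher `level` binds more tightly.
#[derive(Debug, Clone, Copy)]
pub struct Prec {
    pub level: u32,
    pub assoc: Assoc,
}

/// Decides between reducing a rule with precedence `rule` and shifting
/// the lookahead terminal with precedence `lookahead`.
pub fn resolve(rule: Prec, lookahead: Prec) -> Resolution {
    match lookahead.level.cmp(&rule.level) {
        Ordering::Greater => Resolution::Shift, // lookahead binds tighter
        Ordering::Less => Resolution::Reduce,   // finished rule binds tighter
        Ordering::Equal => match lookahead.assoc {
            Assoc::Left => Resolution::Reduce,    // a - b - c  =>  (a - b) - c
            Assoc::Right => Resolution::Shift,    // a ^ b ^ c  =>  a ^ (b ^ c)
            Assoc::NonAssoc => Resolution::Error, // a < b < c is rejected
        },
    }
}
```

With the two %left lines above, PLUS and MINUS would sit at level 1 and TIMES and DIVIDE at level 2, so after parsing 1 + 2 with TIMES as lookahead the sketch chooses to shift, giving 1 + (2 * 3).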

§Supported directives

Directive	Purpose
`%token_type {T}`	Type of every terminal’s semantic value
`%type Name {T}`	Type of one nonterminal’s semantic value
`%default_type {T}`	Default type for nonterminals without a `%type`
`%start_symbol Name`	The grammar’s start symbol
`%extra_argument {T}`	Type of the user value threaded through the parser (name is always `extra`, default `()`)
`%left` / `%right` / `%nonassoc`	Terminal precedence and associativity
`%fallback FB A B C.`	Tokens `A B C` fall back to `FB` when they have no action in the current state
`%trace {prompt}`	Enable tracing to stderr, prefixing each message with `prompt`
`%code` / `%include`	Verbatim Rust code appended to the generated file (synonyms)

Directive syntax is permissive — braces, a trailing ., or a bare token all work, and a braced value may span multiple lines up to its matching closing brace:

%start_symbol {Unit}
%start_symbol Unit.
%start_symbol Unit

§Differences from classic Lemon

A symbol is a nonterminal if its name contains any lowercase letter (classic Lemon looks only at the first letter).
Tracing is enabled by the %trace directive rather than a ParseTrace() call.
No %name directive — each generated file is its own Rust module.
No destructors — Rust’s ownership and Drop handle cleanup.

Structs§

LemonMint: The compiled parser, which can be saved to a Rust source file. Use LemonMintBuilder to build it.
LemonMintBuilder: Builder that ultimately produces a LemonMint. Call builder methods to supply parser rules and options (everything you would normally put in Lemon’s Y-grammar file), or feed it the Y-file itself (its syntax is similar to Lemon’s).
LemonMintError
