//! The famous [Lemon Parser Generator](https://www.hwaci.com/sw/lemon/), exposed as a Rust library
//! that builds an LALR(1) parser transparently during `cargo build`. Instead of writing a separate
//! grammar file and running a code generator, you describe the grammar directly in Rust by adding
//! annotation attributes to ordinary functions, structs and enums. The resulting parser produces
//! your own Rust types, so parsing yields a ready-to-use syntax tree (or computed value) with no
//! glue code in between.
//!
//! This crate uses [lemon-mint](https://crates.io/crates/lemon-mint) as the parser-generation backend.
//! You can find more usage examples [here](https://github.com/jeremiah-shaulov/lemon-tree).
//!
//! # How it works
//!
//! The annotation attributes are procedural macros. As `cargo` compiles your source file, the macros
//! accumulate the grammar rules they see, and the final `#[derive(LemonTree)]` (the start symbol)
//! triggers generation of a complete LALR(1) parser. The generated parser is emitted into a private
//! submodule and wired to your types through the [`LemonTree`] and [`LemonTreeNode`] traits. Because
//! everything happens at compile time, grammar conflicts and errors are reported as ordinary build
//! errors.
//!
//! # Mapping to Lemon
//!
//! If you are familiar with Lemon's `.y` grammar files, this is the correspondence. Let's say we want
//! to create a Lemon parser like this:
//!
//! ```ignore
//! %token_type {f64}
//! %left PLUS
//!
//! Unit ::= Expr(expr).
//! Expr ::= VALUE(value).
//! Expr ::= Expr(a) PLUS Expr(b).
//! ```
//!
//! We want `Unit` and `Expr` to be represented by the following Rust types:
//!
//! ```ignore
//! struct Unit
//! { expr: Expr,
//! }
//!
//! struct Expr
//! { value: f64,
//! }
//! ```
//!
//! Annotate every symbol except the start symbol with `#[derive(LemonTreeNode)]`, and the start symbol with `#[derive(LemonTree)]`.
//! The parser rules that produce a symbol go into `#[lem()]` attributes on its type.
//! All `#[derive(LemonTreeNode)]`, `#[derive(LemonTree)]` and `#[lem_fn()]` attributes that describe a single Lemon parser must be contained in a single Rust file,
//! and the `#[derive(LemonTree)]` must come last.
//!
//! ```ignore
//! #[derive(LemonTreeNode)]
//! #[lem("VALUE(value)")]
//! pub struct Expr
//! { value: f64,
//! }
//!
//! #[derive(LemonTree)]
//! #[lem("Expr(expr)")]
//! pub struct Unit
//! { expr: Expr,
//! }
//! ```
//!
//! The `#[lem()]` attribute can appear multiple times, and each attribute can contain multiple rules, like `#[lem("A(value)", "B(value)")]`.
//!
//! Each rule produces code that creates a new struct instance. Aliases given in parentheses are assigned to struct fields.
//! If a struct has more fields than appear in the rule, the remaining fields are set to `Default::default()`, so their types must implement the `std::default::Default` trait.
//! Fields that do appear are assigned like this: `Type {field: value.into()}`. So a field's type can be the type of the value, or any type the value converts into.
//!
//! The example above contains one Lemon rule that doesn't simply store its operands, but needs to perform a calculation.
//! We expect the rule `Expr ::= Expr(a) PLUS Expr(b)` to produce `Expr {value: a.value + b.value}`.
//! We can implement this rule as a Rust function:
//!
//! ```ignore
//! #[lem_fn("Expr(a) PLUS Expr(b)")]
//! pub fn expr_1(a: Expr, b: Expr) -> Expr
//! { Expr {value: a.value + b.value}
//! }
//! ```
//!
//! So the `#[lem_fn()]` attribute creates a parser rule whose action is a call to a module-level function.
//! The return type of the function is the left-hand side symbol of the Lemon rule, as in `Expr ::= Expr(a) PLUS Expr(b)`.
//!
//! To specify Lemon parser directives, like `%token_type {f64}`, use `#[lem_opt()]` attributes next to the start symbol, like `#[lem_opt(token_type="f64")]`.
//!
//! Here is a complete example:
//!
//! ```
//! use lemon_tree::{lem_fn, LemonTree, LemonTreeNode};
//!
//! #[derive(LemonTreeNode, Debug)]
//! #[lem("VALUE(value)")]
//! pub struct Expr
//! { value: f64,
//! }
//!
//! #[lem_fn("Expr(a) PLUS Expr(b)")]
//! pub fn expr_1(a: Expr, b: Expr) -> Expr
//! { Expr {value: a.value + b.value}
//! }
//!
//! #[derive(LemonTree, Debug)]
//! #[lem("Expr(expr)")]
//! #[lem_opt(token_type="f64", left="PLUS")]
//! pub struct Unit
//! { expr: Expr,
//! }
//!
//! fn main()
//! { let mut parser = Unit::get_parser(());
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 10.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::PLUS, 0.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 20.0).unwrap();
//!   let result = parser.end().unwrap();
//!   assert_eq!(result.expr.value, 30.0);
//!   println!("Result: {:?}", result);
//! }
//! ```
//!
//! Enums can be used as symbol types as well. With enums, put the `#[lem()]` parser rules on the enum variants.
//! Example:
//!
//! ```
//! use lemon_tree::{lem_fn, LemonTree, LemonTreeNode};
//!
//! #[derive(LemonTreeNode, Debug, PartialEq)]
//! pub enum Expr
//! { #[lem("VALUE(0)")]
//!   Value(f64),
//!
//!   #[lem("Expr(0) PLUS Expr(1)")]
//!   Plus(Box<Expr>, Box<Expr>), // the generated action will look like: Expr::Plus(arg_0.into(), arg_1.into())
//! }
//!
//! #[derive(LemonTree, Debug, PartialEq)]
//! #[lem("Expr(expr)")]
//! #[lem_opt(token_type="f64", left="PLUS")]
//! pub struct Unit
//! { expr: Expr,
//! }
//!
//! fn main()
//! { let mut parser = Unit::get_parser(());
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 10.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::PLUS, 0.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 20.0).unwrap();
//!   let result = parser.end().unwrap();
//!   assert_eq!
//!   ( result,
//!     Unit
//!     { expr: Expr::Plus
//!       ( Box::new(Expr::Value(10.0)),
//!         Box::new(Expr::Value(20.0)),
//!       )
//!     }
//!   );
//!   println!("Result: {:?}", result);
//! }
//! ```
//!
//! Notice that in the `Expr::Plus` action, each `Expr` value is converted to `Box<Expr>` automatically: `Box<T>` implements `From<T>`, so `into()` performs the conversion.
//!
//! What if we want a more complex conversion? We can convert anything to anything by manually implementing the `Into<T>` trait.
//! Example:
//!
//! ```
//! use lemon_tree::{lem_fn, LemonTree, LemonTreeNode};
//!
//! #[derive(LemonTreeNode, Debug, PartialEq)]
//! pub enum Expr
//! { #[lem("VALUE(0)")]
//!   Value(f64),
//!
//!   #[lem("Expr(0) PLUS Expr(1)")]
//!   Plus(String, String),
//! }
//!
//! impl Into<String> for Expr
//! { fn into(self) -> String
//!   { match self
//!     { Expr::Value(v) => format!("{}", v),
//!       Expr::Plus(a, b) => format!("{} + {}", a, b),
//!     }
//!   }
//! }
//!
//! #[derive(LemonTree, Debug, PartialEq)]
//! #[lem("Expr(expr)")]
//! #[lem_opt(token_type="f64", left="PLUS")]
//! pub struct Unit
//! { expr: Expr,
//! }
//!
//! fn main()
//! { let mut parser = Unit::get_parser(());
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 10.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::PLUS, 0.0).unwrap();
//!   parser.add_token(<Unit as LemonTree>::Token::VALUE, 20.0).unwrap();
//!   let result = parser.end().unwrap();
//!   assert_eq!
//!   ( result,
//!     Unit
//!     { expr: Expr::Plus("10".to_string(), "20".to_string())
//!     }
//!   );
//!   println!("Result: {:?}", result);
//! }
//! ```
//!
//! # Defining grammar symbols
//!
//! A grammar is made of *terminal* symbols (tokens, fed in by your tokenizer) and *nonterminal*
//! symbols (produced by reducing rules). By convention, names that contain a lowercase letter are
//! treated as nonterminals, and all-uppercase names like `PLUS` or `VALUE` are treated as tokens.
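//!
//! As a rough illustration of this naming convention, the check can be written as a one-line
//! predicate. This is a sketch only, not code taken from this crate, and the helper name
//! `is_nonterminal` is made up for the example:
//!
//! ```rust
//! /// Returns `true` if `name` would be treated as a nonterminal under the convention
//! /// described above, i.e. if it contains at least one lowercase letter.
//! /// (Hypothetical helper, not part of the crate's API.)
//! pub fn is_nonterminal(name: &str) -> bool {
//!     name.chars().any(|c| c.is_lowercase())
//! }
//! ```
//!
//! Under this sketch `Expr` and `Unit` classify as nonterminals, while `PLUS` and `VALUE` classify
//! as tokens.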
//!
//! Each nonterminal is backed by a Rust type, and there are three ways to attach rules to it:
//!
//! * **A struct** annotated with `#[derive(LemonTreeNode)]` (or `#[derive(LemonTree)]` for the start
//!   symbol). Rules are listed in `#[lem("...")]` attributes on the struct. Each rule constructs the
//!   struct; aliases in the rule name the fields to fill.
//! * **An enum** annotated with `#[derive(LemonTreeNode)]` (or `#[derive(LemonTree)]`). Here the
//!   `#[lem("...")]` attributes are placed on the individual variants, and each rule constructs that
//!   variant.
//! * **A function** annotated with `#[lem_fn("...")]`. The function's return type is the left-hand
//!   side nonterminal, and the function body is the action executed when the rule reduces. This is the
//!   most flexible form, useful when a reduction needs to compute something rather than just build a value.
//!
//! All rules for one parser must live in a single Rust file, and the `#[derive(LemonTree)]` start
//! symbol must come last in that file (see [Constraints](#constraints)).
//!
//! # Rule syntax
//!
//! The string inside `#[lem(...)]` / `#[lem_fn(...)]` is the right-hand side of a Lemon rule: a
//! sequence of terminal and nonterminal symbols separated by whitespace. The left-hand side is implied
//! by where the attribute is placed (the struct/enum/variant type, or the function return type).
//!
//! A single attribute may contain several alternative rules, and the attribute may be repeated:
//!
//! ```ignore
//! #[lem("A(value)", "B(value)")] // two rules, both producing this symbol
//! #[lem("C(value)")]             // another rule
//! ```
//!
//! ## Aliases
//!
//! A symbol in the rule can be followed by an alias in parentheses, which binds the symbol's value so
//! the action can use it:
//!
//! * For **structs**, the alias is a field name: `"VALUE(value)"` assigns the matched value to the
//!   `value` field.
//! * For **enum variants** and **`#[lem_fn]` functions**, the alias may be a function argument /
//!   tuple-field name, or a zero-based positional index: `"Expr(0) PLUS Expr(1)"` binds the first and
//!   second `Expr` to tuple fields `0` and `1`.
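//!
//! To make the shape of a rule string concrete, here is a minimal sketch that splits a right-hand
//! side into `(symbol, alias)` pairs. It only illustrates the syntax described above; it is not the
//! parsing code used by the macros, the names `split_symbol` and `parse_rule` are hypothetical, and
//! it assumes aliases contain no whitespace:
//!
//! ```rust
//! /// Splits one symbol such as `Expr(a)` into its name and optional alias.
//! /// (Hypothetical helper, not part of the crate's API.)
//! pub fn split_symbol(symbol: &str) -> (&str, Option<&str>) {
//!     match symbol.split_once('(') {
//!         // `Expr(a)` -> name `Expr`, alias `a` (the closing parenthesis is trimmed).
//!         Some((name, rest)) => (name, Some(rest.trim_end_matches(')'))),
//!         // `PLUS` -> name `PLUS`, no alias.
//!         None => (symbol, None),
//!     }
//! }
//!
//! /// Splits a whole right-hand side on whitespace and parses each symbol.
//! pub fn parse_rule(rule: &str) -> Vec<(&str, Option<&str>)> {
//!     rule.split_whitespace().map(split_symbol).collect()
//! }
//! ```
//!
//! With this sketch, `parse_rule("Expr(a) PLUS Expr(b)")` yields `Expr` bound to `a`, an unbound
//! `PLUS`, and `Expr` bound to `b`.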
//!
//! ## Value conversion
//!
//! Bound values are moved into the target field via `.into()`, so the field type only needs to
//! implement `From` for the matched value's type (for example `Box<T>` implements `From<T>`, which is
//! why `Expr` can flow into a `Box<Expr>` field automatically). For arbitrary conversions you can
//! implement `Into<TargetType>` yourself. Fields and arguments that are *not* bound by an alias are
//! filled with [`Default::default()`], so those types must implement [`Default`].
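//!
//! The following standalone sketch shows roughly what such actions boil down to, using only the
//! standard library. The function names `build_plus` and `build_literal` and the `Literal` type are
//! invented for the illustration; the code the macros actually generate may look different:
//!
//! ```rust
//! #[derive(Debug, PartialEq)]
//! pub enum Expr {
//!     Value(f64),
//!     Plus(Box<Expr>, Box<Expr>),
//! }
//!
//! #[derive(Debug, PartialEq)]
//! pub struct Literal {
//!     pub value: f64,
//!     pub line: u32,
//! }
//!
//! /// Bound values are converted with `.into()`: `Box<Expr>` implements `From<Expr>`.
//! pub fn build_plus(arg_0: Expr, arg_1: Expr) -> Expr {
//!     Expr::Plus(arg_0.into(), arg_1.into())
//! }
//!
//! /// `value` is bound and converted (`f64` implements `From<f32>`);
//! /// `line` is not bound by any alias, so it falls back to `Default::default()`.
//! pub fn build_literal(value: f32) -> Literal {
//!     Literal { value: value.into(), line: Default::default() }
//! }
//! ```
//!
//! Any field type works as long as the conversion from the matched value exists, which is why
//! implementing `From` or `Into` for your own types is enough to plug them in.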
//!
//! ## Optional symbols `[...]`
//!
//! Square brackets mark an optional part of a rule and expand into multiple alternatives. For example
//! `"Exprs(exprs) [SEMICOLON]"` is shorthand for the two rules `"Exprs(exprs)"` and
//! `"Exprs(exprs) SEMICOLON"`. Brackets may be nested.
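//!
//! The expansion can be pictured with the small recursive sketch below. It is an illustration of the
//! behaviour described above under the assumption that brackets are balanced, not the implementation
//! used by the macros, and `expand_optional` is a hypothetical name:
//!
//! ```rust
//! /// Expands every `[...]` group in `rule` into the alternatives without and with it.
//! /// (Hypothetical helper, not part of the crate's API.)
//! pub fn expand_optional(rule: &str) -> Vec<String> {
//!     let open = match rule.find('[') {
//!         Some(open) => open,
//!         // No brackets left: normalize whitespace and return the single rule.
//!         None => return vec![rule.split_whitespace().collect::<Vec<_>>().join(" ")],
//!     };
//!     // Find the `]` that matches this `[`, skipping over nested pairs.
//!     let mut depth = 0;
//!     let mut close = None;
//!     for (i, c) in rule[open..].char_indices() {
//!         match c {
//!             '[' => depth += 1,
//!             ']' => {
//!                 depth -= 1;
//!                 if depth == 0 {
//!                     close = Some(open + i);
//!                     break;
//!                 }
//!             }
//!             _ => {}
//!         }
//!     }
//!     let close = close.expect("unbalanced `[` in rule");
//!     let prefix = &rule[..open];
//!     let inner = &rule[open + 1..close];
//!     let suffix = &rule[close + 1..];
//!     // First the alternative that omits the group, then the one that keeps it.
//!     let mut rules = expand_optional(&format!("{} {}", prefix, suffix));
//!     rules.extend(expand_optional(&format!("{} {} {}", prefix, inner, suffix)));
//!     rules
//! }
//! ```
//!
//! Because the function recurses on both alternatives, nested groups such as `"A [B [C]]"` expand
//! into every valid combination (`A`, `A B` and `A B C`).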
//!
//! # Parser options: `#[lem_opt(...)]`
//!
//! Lemon directives are set with `#[lem_opt(...)]` attributes placed next to the start symbol
//! (`#[derive(LemonTree)]`). Each option takes a string value:
//!
//! | Option           | Lemon directive   | Meaning                                                                          |
//! |------------------|-------------------|----------------------------------------------------------------------------------|
//! | `token_type`     | `%token_type`     | The Rust type carried by every token's value (the `minor` value).                |
//! | `extra_argument` | `%extra_argument` | Type of a user value made available to all actions (see below).                  |
//! | `left`           | `%left`           | Declare tokens left-associative (precedence increases with each declaration).    |
//! | `right`          | `%right`          | Declare tokens right-associative.                                                |
//! | `nonassoc`       | `%nonassoc`       | Declare tokens non-associative.                                                  |
//! | `fallback`       | `%fallback`       | `"FALLBACK_TOK TOK_A TOK_B ..."`: the listed tokens fall back to `FALLBACK_TOK`. |
//! | `trace`          | `%trace`          | Print a parser trace to stderr, prefixed with the given prompt string.           |
//!
//! Associativity / precedence options may be repeated; tokens named in earlier declarations have
//! lower precedence than those named in later ones, exactly as in Lemon. Example:
//!
//! ```ignore
//! #[lem_opt(token_type="f64", left="PLUS MINUS", left="DIVIDE TIMES", trace=">>")]
//! ```
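//!
//! The ordering rule can be sketched as a table that assigns each token a precedence level from the
//! position of its declaration. This is only a model of the rule stated above; the `Assoc` enum and
//! the `precedence_table` function are hypothetical and do not exist in this crate:
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Associativity of a precedence declaration (hypothetical type for this sketch).
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! pub enum Assoc {
//!     Left,
//!     Right,
//!     NonAssoc,
//! }
//!
//! /// Maps each token to `(level, associativity)`. Declarations are given in source order,
//! /// so tokens from later declarations receive a higher level (bind tighter).
//! pub fn precedence_table<'a>(decls: &[(Assoc, &'a str)]) -> HashMap<&'a str, (usize, Assoc)> {
//!     let mut table = HashMap::new();
//!     for (index, &(assoc, tokens)) in decls.iter().enumerate() {
//!         for token in tokens.split_whitespace() {
//!             table.insert(token, (index + 1, assoc));
//!         }
//!     }
//!     table
//! }
//! ```
//!
//! For the `lem_opt` line above, `PLUS` and `MINUS` would share level 1, and `DIVIDE` and `TIMES`
//! would share level 2, so multiplication and division bind tighter than addition and subtraction.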
//!
//! # The generated parser API
//!
//! Deriving [`LemonTree`] on the start symbol `S` generates:
//!
//! * `S::get_parser(extra)`: create a parser. `extra` is the `%extra_argument` value (use `()` when
//!   no `extra_argument` is set). Its type is `<S as LemonTree>::Parser`.
//! * `<S as LemonTree>::Token`: an enum with one variant per terminal symbol used anywhere in the
//!   grammar.
//! * `parser.add_token(token, value)`: feed one token. `value` has the `token_type` type. Returns
//!   `Result<(), ()>`; an `Err` means a syntax error at that token.
//! * `parser.try_add_token(token, value)`: like `add_token`, but returns `Result<bool, ()>` where
//!   `Ok(false)` indicates the token was not accepted in the current state instead of erroring.
//! * `parser.end()`: signal end of input. Returns `Result<S, ()>`: the constructed start symbol on
//!   success, or `Err(())` on a syntax error.
//! * `parser.extra`: public field holding the `extra_argument` value, readable and writable between
//!   tokens.
//!
//! You drive the parser from your own tokenizer: repeatedly call `add_token`, then call `end` to get
//! the result. See the [`README`](https://github.com/jeremiah-shaulov/lemon-tree) for a complete
//! calculator with a hand-written tokenizer.
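//!
//! As a rough idea of what the tokenizer side can look like for the `VALUE` / `PLUS` grammar used in
//! this documentation, here is a standalone sketch. The `Tok` enum is a hypothetical stand-in for
//! `<Unit as LemonTree>::Token`, and `tokenize` is not part of this crate; it is not the tokenizer
//! from the README either:
//!
//! ```rust
//! /// Stand-in for the generated token enum (hypothetical, for illustration only).
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! pub enum Tok {
//!     Value,
//!     Plus,
//! }
//!
//! /// Splits input such as `"10 + 20"` into `(token, value)` pairs.
//! /// Tokens that carry no number get the placeholder value `0.0`.
//! pub fn tokenize(input: &str) -> Result<Vec<(Tok, f64)>, String> {
//!     let mut tokens = Vec::new();
//!     let mut chars = input.char_indices().peekable();
//!     while let Some(&(start, c)) = chars.peek() {
//!         if c.is_whitespace() {
//!             chars.next();
//!         } else if c == '+' {
//!             chars.next();
//!             tokens.push((Tok::Plus, 0.0));
//!         } else if c.is_ascii_digit() || c == '.' {
//!             // Consume the longest run of digits and dots, then let `f64` validate it.
//!             let mut end = start;
//!             while let Some(&(i, d)) = chars.peek() {
//!                 if d.is_ascii_digit() || d == '.' {
//!                     end = i + d.len_utf8();
//!                     chars.next();
//!                 } else {
//!                     break;
//!                 }
//!             }
//!             let value = input[start..end]
//!                 .parse::<f64>()
//!                 .map_err(|e| format!("bad number at byte {}: {}", start, e))?;
//!             tokens.push((Tok::Value, value));
//!         } else {
//!             return Err(format!("unexpected character {:?} at byte {}", c, start));
//!         }
//!     }
//!     Ok(tokens)
//! }
//! ```
//!
//! Each resulting pair would then be passed to `parser.add_token(token, value)` in order, followed
//! by a single call to `parser.end()`.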
//!
//! # The `extra_argument`
//!
//! Setting `#[lem_opt(extra_argument="MyType")]` gives every action access to a shared value. In
//! `#[lem_fn]` functions, add a final argument literally named `extra` to receive it:
//!
//! ```ignore
//! #[lem_fn("Expr(a) PLUS Expr(b)")]
//! pub fn expr_plus(a: Expr, b: Expr, extra: &mut Context) -> Expr
//! { extra.count += 1;
//!   Expr {value: a.value + b.value}
//! }
//! ```
//!
//! The value is supplied when constructing the parser via `get_parser(extra)` and is also accessible
//! as `parser.extra`.
//!
//! # Constraints
//!
//! * All attributes describing a single parser must live in **one** Rust file.
//! * `#[derive(LemonTree)]` (the start symbol) must be the **last** parser attribute in that file; it
//!   is what triggers parser generation.
//! * A file may define only one parser (one `#[derive(LemonTree)]`).
//! * Unions are not supported as symbol types.
//!
//! Different parsers can coexist in the same crate as long as each lives in its own file.
//!
//! # Cargo features
//!
//! These features (forwarded to `lemon-tree-derive`) help with debugging the grammar at build time:
//!
//! * `dump-grammar`: print the generated grammar, with Rust actions, to stderr during the build.
//! * `dump-lemon-grammar`: print the grammar in classic Lemon `.y` syntax to stderr.
//! * `debug-parser-to-file`: write the generated parser source to a file next to your source instead
//!   of inlining it, which makes the generated code easy to inspect.

pub use lemon_tree_derive::{lem_fn, LemonTree, LemonTreeNode};

/// Implemented for the parser's *start symbol*: the type that a successful parse produces.
///
/// You don't implement this trait by hand: annotate a struct or enum with `#[derive(LemonTree)]`
/// and the implementation, together with a `get_parser` constructor, is generated for you. This
/// derive must be the last parser attribute in the file, because it is what triggers parser
/// generation (see the [crate-level documentation](crate) for the full picture).
///
/// The implementation provides two associated types:
/// * [`Parser`](LemonTree::Parser): the parser, which accepts tokens and finally returns the start symbol.
/// * [`Token`](LemonTree::Token): an enum whose variants are all the terminal symbols (tokens) that
///   appear anywhere in your grammar (in `#[lem()]` and `#[lem_fn()]` attributes).
///
/// If you annotate a struct like this:
///
/// ```ignore
/// #[derive(LemonTree)]
/// struct Unit
/// {
/// }
/// ```
///
/// And you have terminal symbols `HELLO` and `WORLD`, then you can:
///
/// ```ignore
/// let mut parser = Unit::get_parser(()); // where () is initializer for %extra_argument
/// // the type of parser is <Unit as LemonTree>::Parser
/// parser.add_token(<Unit as LemonTree>::Token::HELLO, ()).unwrap();
/// parser.add_token(<Unit as LemonTree>::Token::WORLD, ()).unwrap();
/// let resulting_unit = parser.end().unwrap(); // returns Unit
/// ```
pub trait LemonTree
{ /// The generated parser. Create one with `get_parser(extra)`, feed it tokens with
  /// `add_token` / `try_add_token`, and call `end` to obtain the start symbol.
  type Parser;

  /// Enum of all terminal symbols (tokens) used in the grammar; pass its variants to `add_token`.
  type Token;
}

/// Marker trait implemented by every nonterminal symbol other than the start symbol.
///
/// Derive it with `#[derive(LemonTreeNode)]` on a struct or enum to make that type a nonterminal
/// of the current parser. Attach the rules that produce it with `#[lem("...")]` attributes, either on the
/// struct itself or on each enum variant. The start symbol uses [`LemonTree`] instead.
pub trait LemonTreeNode
{
}