lexpar/parser.rs
//! This module contains parser related macros and structures.
//!
//! The following chapters will walk you through the basics of the `parse_rules` macro.
//! Prior knowledge of [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar)
//! is recommended but not necessary. If you are familiar with grammars, just skip to the
//! basic syntax section.
//!
//! # Introduction
//!
//! First of all, why do we need parsers? We need a way to express grammars that abstracts
//! away from automata and lets us think about grammars in a more intuitive manner. This more
//! intuitive way is pattern matching. Let's say we have this code snippet
//!
//! `x = 1 + 2 * 3`
//!
//! Depending on the context, we as humans can easily parse this in our heads:
//!
//! > Assign to a variable called "x" the expression 1 + 2 * 3
//!
//! And as we are familiar with basic maths, the expression 1 + 2 * 3 would result in the number 7.
//! So we now have the following human expression
//!
//! > Assign the number 7 to a variable called "x"
//!
//! or shorter
//!
//! > Assign 7 to "x"
//!
//! So what we did here is look at the code and match familiar patterns from our experience
//! with programming languages and maths.
//!
//! The thing is that machines can't differentiate contexts as well as humans unless they are
//! programmed to. For a machine this could mean
//!
//! > Is x equal to 9? Where 9 is (1 + 2) * 3.
//!
//! or just a bunch of numbers
//!
//! > `120 32 61 32 49 32 43 32 50 32 42 32 51`
//!
//! Directly using [automata](https://en.wikipedia.org/wiki/Automata_theory) could tell the machine
//! how to recognize these patterns by switching between states and so on, but they are not really
//! intuitive for humans, especially when the grammar we want to parse is large.
//!
//! This is where formal grammars and parsers come in.
//!
//! # Basic syntax of the parse_rules macro
//!
//! Say we have the following grammar
//!
//! > Expression -> Number<br>
//! > Expression -> ( Expression )
//!
//! This means that an expression can be a single number or another expression surrounded with
//! parentheses. The grammar would recognize expressions like
//!
//! > `1`<br>
//! > `(123)`<br>
//! > `(((5)))`
//!
//! We can use the `parse_rules!` macro to express that in a Rusty way
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! enum Token {
//!     LParen,
//!     RParen,
//!     Number(u32)
//! }
//!
//! # fn main() {
//! use Token::*;
//!
//! parse_rules! {
//!     term: Token;
//!
//!     expression: u32 => {
//!         [Number(value)] => value,
//!         [LParen, expr: expression, RParen] => expr
//!     }
//! }
//! # }
//! ```
//!
//! Let's break this down. The first thing you might have noticed is `term: Token;`. This tells
//! the macro that the type of each item (terminal) of the input that we are going to
//! parse is `Token`.
//!
//! The `expression: u32 => { /* ... */ }` notation defines a new function called `expression`
//! with a return type of `u32`. So now we have a function that looks like this
//!
//! ```no_run
//! fn expression(/* ... */) -> u32 {
//!     // ...
//!     # 0
//! }
//! ```
//!
//! This function is the representation of a nonterminal in a formal grammar. When called, it
//! parses the input based on whatever rules we've defined and spits out a `u32`. We'll take a look
//! at how input is passed to the function in a moment.
//!
//! For now, let's see what these array-looking things inside the notation are
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! # parse_rules! { term: Token; expression: u32 => {
//! [Number(value)] => value,
//! [LParen, expr: expression, RParen] => expr
//! # }}}
//! ```
//!
//! Those are the rules of a nonterminal. They are very similar to Rust's `match`. Each rule is an
//! arm of the match and the arms are comma separated, but there are differences. One of them is
//! that each rule follows this format
//!
//! ```ignore
//! [ (terminal|nonterminal),+ ] => rust_expression
//! ```
//!
//! Inside the brackets we can match more than just patterns (terminals); we can also match
//! function calls (nonterminals). That means we can invoke another function generated
//! by the macro and then decide what to do with the result. Example syntax would be
//!
//! * `Number(num)` is a destructuring pattern
//! * `expr: expression` expands to the statement `let expr = expression(/* ... */)?`
//!
//! In our little example `expr: expression` is a recursive call of `expression`. We'll see why
//! there is a `?` in the expanded statement in the next section.
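//!
//! To make that expansion concrete, here is a hand-written sketch of roughly what the macro
//! generates for `expression`. It is deliberately simplified - it uses a plain `Peekable` and a
//! single-variant error instead of the crate's real `UnshiftIter` and the backtracking between
//! arms:
//!
//! ```
//! use std::iter::Peekable;
//!
//! enum Token { LParen, RParen, Number(u32) }
//!
//! #[derive(Debug, PartialEq)]
//! enum ParseError { Eof, Unexpected }
//!
//! fn expression<I>(iter: &mut Peekable<I>) -> Result<u32, ParseError>
//!     where I: Iterator<Item = Token>
//! {
//!     match iter.next() {
//!         // [Number(value)] => value
//!         Some(Token::Number(value)) => Ok(value),
//!         // [LParen, expr: expression, RParen] => expr
//!         Some(Token::LParen) => {
//!             // `expr: expression` expands to a call followed by `?`
//!             let expr = expression(iter)?;
//!             match iter.next() {
//!                 Some(Token::RParen) => Ok(expr),
//!                 Some(_) => Err(ParseError::Unexpected),
//!                 None => Err(ParseError::Eof),
//!             }
//!         }
//!         Some(_) => Err(ParseError::Unexpected),
//!         None => Err(ParseError::Eof),
//!     }
//! }
//!
//! let mut input = vec![Token::LParen, Token::Number(42), Token::RParen]
//!     .into_iter()
//!     .peekable();
//! assert_eq!(expression(&mut input), Ok(42));
//! ```
//!
//! The real generated code additionally unshifts non-matching terminals back into the iterator
//! so that the next arm can try them.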
//!
//! Extending the example into a more verbose form shows how multiple nonterminals can be defined
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! parse_rules! {
//!     term: Token;
//!
//!     number: u32 => {
//!         [Number(value)] => value
//!     },
//!
//!     expression: u32 => {
//!         [num: number] => num,
//!         [LParen, expr: expression, RParen] => expr
//!     }
//! }
//! # }
//! ```
//!
//! # Using the parser
//!
//! Now we have this 'parser' generated for us, but how do we use it?
//! There is no `Parser` structure or anything like it. The macro generates the private nonterminal
//! functions and leaves the rest to the user.
//!
//! Each function takes an iterator, but not just any iterator - `lexpar::parser::UnshiftIter`.
//! To ease the import and use of that exact iterator we can call any nonterminal like this
//!
//! ```
//! # fn expression<I>(_: ::lexpar::parser::UnshiftIter<I>) where I: Iterator {}
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! let input = vec![LParen, Number(42), RParen].into_iter();
//!
//! let result = expression(input.into());
//! # }
//! ```
//!
//! After we get a result we should match it for any parsing errors
//!
//! ```
//! # enum Token { LParen, RParen, Number(u32) }
//! use lexpar::parser::{self, ParseError};
//!
//! let result: parser::Result<u32, Token> = Ok(0u32);
//!
//! match result {
//!     Ok(value) => { /* Do something with the value */ },
//!     Err(ParseError::Eof) => { /* Unexpected end of input */ },
//!     Err(ParseError::Unexpected(token)) => { /* Unexpected token */ },
//!     Err(ParseError::UnexpectedRoot) => { /* Unexpected head token */ }
//! }
//! ```
//!
//! This is why in the previous section we saw a `?` in the expanded statement of a nonterminal
//! match. It propagates parsing errors internally so you don't have to worry about them.
//!
//! # User defined errors
//!
//! If necessary, a nonterminal can be defined with a `Result<...>` return type. Then we can have
//! more than just the normal parsing errors - we can match each nonterminal result instead of
//! propagating it with `?` and assemble our own errors and warnings. This can be used to
//! improve the error variants for your compiler/interpreter.
//!
//! Right now the parser is limited in that we have to handle custom errors ourselves
//! instead of using the macro to do it for us. This is a goal for future improvement.
//!
//! # Epsilon, recursion and folding
//!
//! What we know until now is that we can use the macro to express `and` and `or` with match arms
//! and recursive grammars by calling the same nonterminal.
//!
//! There are some more complex compositions we might need to parse, for example:
//!
//! * `?` - `zero or one`
//! * `*` - `zero or more`
//! * `+` - `one or more`
//!
//! ### Zero or one
//!
//! This sounds like Rust's `Option`, doesn't it? Well, it does.
//!
//! Let's see how to express that
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! enum Token {
//!     Something
//! }
//!
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     zero_or_one: Option<()> => {
//!         [Token::Something] => Some(()),
//!         [@] => None
//!     }
//! }
//! # }
//! ```
//!
//! Everything seems familiar except `[@]`. What is that?
//! Grammars have the concept of an 'empty token'. It's usually called `epsilon` and that's what
//! we'll call it as well. It's used to match the empty string `""`, or in our case it acts like an
//! `else` arm for the nonterminal, just like `_` does in match expressions.
//!
//! So the epsilon is a way of saying
//!
//! > Nothing matched so far but it's OK since we can use this default value.
//!
//! It can be used only as the last arm of a nonterminal.
//!
//! ### Zero or more (Kleene star)
//!
//! Now this one is a bit trickier. I'll try not to bore you with set theory,
//! so let's jump straight into the definition
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     zero_or_more: Vec<()> => {
//!         [Something, mut acc: zero_or_more] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! The `zero_or_more` nonterminal will give us a vector of the items we've matched, or an empty
//! vector if none matched. The catch is that the resulting vector will be in the reverse order of
//! what we want, since it uses recursive backtracking to accumulate the items. To handle this
//! situation we can use one of these approaches:
//!
//! * Insert the items at the beginning of the vector, which has a time complexity of `O(n^2)`.
//! * Have a second nonterminal dedicated to calling this one and reversing the result.
//!   This one is faster than the previous, being `O(n)`.
//! * Use a special syntax to avoid recursion and use a loop internally. `O(n)` time complexity
//!   again, but without stack overflows.
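//!
//! For illustration, the second approach might look like this (the nonterminal names here are
//! made up for the sketch):
//!
//! ```ignore
//! parse_rules! {
//!     term: Token;
//!
//!     // Wrapper that calls the recursive nonterminal and reverses once: O(n).
//!     zero_or_more: Vec<()> => {
//!         [mut acc: zero_or_more_rev] => {
//!             acc.reverse();
//!             acc
//!         }
//!     },
//!
//!     zero_or_more_rev: Vec<()> => {
//!         [Something, mut acc: zero_or_more_rev] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! ```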
//!
//! The third approach can be achieved with
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     #[fold(acc)]
//!     zero_or_more: Vec<()> => {
//!         [Something] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! As a result it will use far fewer stack frames and contain the items in the right order.
//! A win-win situation. Doing a little bit of code pattern matching we can see that we've
//! replaced the call to the nonterminal with a `#[fold(acc)]`. The `acc` can be any identifier we
//! want to name our accumulator variable, and it's always mutable.
//!
//! The downside of this is that we can't have multiple arms in the nonterminal.
//! This might be improved in later versions of the crate. Right now it takes exactly one matching
//! arm and an epsilon arm, which is the starting value of the accumulator. A workaround is to
//! call another nonterminal that has the desired matching arms.
//!
//! ### One or more
//!
//! In contrast to the last section, this one is pretty obvious
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     one_or_more: Vec<()> => {
//!         [Something, mut zom: zero_or_more] => {
//!             zom.insert(0, ());
//!             zom
//!         }
//!     },
//!
//!     #[fold(acc)]
//!     zero_or_more: Vec<()> => {
//!         [Something] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! Some computation time could be saved by creating a vector with the expected or an averaged
//! capacity and inserting a `mem::uninitialized` element before pushing the folded elements. Then
//! the first element that is matched in `one_or_more` can be inserted into the vector with
//! `mem::replace`, and the element returned from the replace must be passed to `mem::forget`
//! to prevent Rust from dropping an uninitialized value. This is an advanced (and unsafe)
//! approach, so if you're not familiar with what these functions do, go and read about them in
//! the Rust docs. In later versions of the crate this construct could receive its own syntax for
//! convenience.
//!
//! ### Synopsis
//!
//! Handling these compositions in real situations might be a bit more complex, but the
//! examples should have given you a good idea of how to approach such definitions.
//!
//! # Binary operators and precedence
//!
//! In this section we'll see how to define infix binary operators. These are always a bit of a
//! hassle to define by hand, so the macro provides a dedicated syntax.
//!
//! Without going through all the trial and error examples, let's jump straight into the working
//! definition
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! enum Ast {
//!     BinOp(String, Box<Ast>, Box<Ast>),
//!     Number(u32)
//! }
//!
//! impl Ast {
//!     fn binop(op: &str, lhs: Ast, rhs: Ast) -> Self {
//!         Ast::BinOp(op.to_string(), Box::new(lhs), Box::new(rhs))
//!     }
//! }
//!
//! parse_rules! {
//!     term: &'static str;
//!
//!     #[binop(infix)]
//!     expr: Ast => _expr where u32 => |lhs, rhs| {
//!         &"==" | 0 => Ast::binop("eq", lhs, rhs),
//!         &"!=" | 0 => Ast::binop("neq", lhs, rhs),
//!         &"+" | 1 => Ast::binop("add", lhs, rhs),
//!         &"-" | 1 => Ast::binop("sub", lhs, rhs),
//!         &"*" | 2 => Ast::binop("mul", lhs, rhs),
//!         &"/" | 2 => Ast::binop("div", lhs, rhs),
//!     },
//!
//!     _expr: Ast => {
//!         ["7"] => Ast::Number(7)
//!     },
//! }
//! # }
//! ```
//!
//! Well, this is a complex one... Let's break it down.
//!
//! Firstly, `#[binop(infix)]` triggers the special syntax.
//! Then there are three things to observe:
//!
//! 1. Naming the nonterminal and giving it a type: `expr: Ast`.
//! 2. Giving the name of the nonterminal used for the left and right-hand sides as `_expr`.
//!    It needs to have the same type as this nonterminal.
//! 3. Declaring the precedence type as `u32`. The type can be anything with ordering, but
//!    preferably stick to unsigned integers.
//!
//! After that we get to the actual rules. The closure-looking syntax describes how to handle the
//! different operators. `lhs` and `rhs` can be any identifiers and are respectively the
//! left-hand side and the right-hand side of the operator. The 'body' contains some
//! `match`-looking arms, with the difference that we use `| <value>` to state the operator
//! precedence. Lower precedence means that the operator will be processed later than operators
//! with higher precedence.
//!
//! Notice the `&`s. This is because the internals take references to the original terms.
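//!
//! Internally the macro follows the classic precedence climbing algorithm (its pseudocode is
//! quoted in the macro source). As a self-contained illustration - not the macro's actual
//! output - here is that loop over string tokens:
//!
//! ```
//! fn prec(op: &str) -> Option<u32> {
//!     match op {
//!         "+" | "-" => Some(1),
//!         "*" | "/" => Some(2),
//!         _ => None,
//!     }
//! }
//!
//! fn parse_binop<'a, I>(iter: &mut std::iter::Peekable<I>, mut lhs: u32, min_prec: u32) -> u32
//!     where I: Iterator<Item = &'a str>
//! {
//!     while let Some(&op) = iter.peek() {
//!         let p = match prec(op) {
//!             Some(p) if p >= min_prec => p,
//!             _ => break,
//!         };
//!         iter.next();
//!         // The primary nonterminal would be called here; we just read a number.
//!         let mut rhs: u32 = iter.next().unwrap().parse().unwrap();
//!         while let Some(&la) = iter.peek() {
//!             match prec(la) {
//!                 // A tighter-binding operator folds into the right-hand side first.
//!                 Some(q) if q > p => rhs = parse_binop(iter, rhs, q),
//!                 _ => break,
//!             }
//!         }
//!         lhs = match op {
//!             "+" => lhs + rhs,
//!             "-" => lhs - rhs,
//!             "*" => lhs * rhs,
//!             _ => lhs / rhs,
//!         };
//!     }
//!     lhs
//! }
//!
//! // 1 + 2 * 3: `*` binds tighter, so the result is 7, not 9.
//! let mut iter = ["+", "2", "*", "3"].iter().cloned().peekable();
//! assert_eq!(parse_binop(&mut iter, 1, 0), 7);
//! ```
//!
//! Swapping the precedences of `+` and `*` in `prec` would flip the result to 9, which is the
//! whole point of the precedence column in the binop rules.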
//!
//! # Unary operators
//!
//! Defining unary operators is fairly simple. The only thing to consider is giving them
//! precedence, but that can be achieved by arranging the nonterminal calls. Let's take a look
//!
//! ### Prefix unary operators
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: &'static str;
//!
//!     prefix: bool => {
//!         [op: not] => op
//!     },
//!
//!     not: bool => {
//!         ["!", expr: expr] => !expr
//!     },
//!
//!     expr: bool => {
//!         ["true"] => true,
//!         ["false"] => false,
//!     },
//! }
//! # }
//! ```
//!
//! ### Postfix unary operators
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: &'static str;
//!
//!     postfix: bool => {
//!         [expr: expr, op] => match op {
//!             "?" => expr.is_ok(),
//!             _ => false
//!         }
//!     },
//!
//!     expr: Result<(), ()> => {
//!         ["Ok"] => Ok(()),
//!         ["Err"] => Err(()),
//!     },
//! }
//! # }
//! ```
//!
//! # Custom handlers
//!
//! Finally, if we want to create our own custom nonterminal, we can use this special syntax
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: ();
//!
//!     my_handler: () => |iter| {
//!         /* handler logic goes here */
//!         Ok(())
//!     }
//! }
//! # }
//! ```
//!
//! The `iter` is a mutable reference to the `UnshiftIter` that the internals use.
//! As opposed to the normal nonterminals, we need to handle result propagation from other
//! nonterminals called in the handler ourselves, and also wrap our result in `Ok` or `Err`,
//! which gives us control over the error handling.
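//!
//! For example, a handler can drive the iterator manually with `next` and `peek`. This
//! hypothetical nonterminal sums consecutive terms, stopping (without consuming) at the first
//! term greater than 100:
//!
//! ```ignore
//! parse_rules! {
//!     term: u32;
//!
//!     sum_small: u32 => |iter| {
//!         let mut sum = 0;
//!         while let Some(&n) = iter.peek() {
//!             if n > 100 {
//!                 break;
//!             }
//!             iter.next();
//!             sum += n;
//!         }
//!         Ok(sum)
//!     }
//! }
//! ```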
//!
//! # Debugging
//!
//! One of the future goals for the crate is to have a parser debug mode that shows what exactly
//! failed and where. This would lower the time spent on debugging your grammar.
//!
//! For now, here are some approaches to making debugging more pleasant:
//!
//! ### Common mistakes
//!
//! * Forgotten commas (`,`) between rules or nonterminals
//! * Forgotten arrows (`=>`)
//!
//! ### EchoIterator
//!
//! You can create an echo iterator with a simple map to print each term as it is consumed
//!
//! ```ignore
//! iter.map(|term| {
//!     println!("{:?}", term);
//!     term
//! })
//! ```
//!
//! # Examples
//!
//! Run the example projects with `cargo run --example <name>` where name is `ml` or `rust`.
//!
//! Some more examples can be found in the form of integration tests.

use std::iter::Peekable;

/// Common errors that occur during parsing.
///
/// The nonterminals generated by `parse_rules` handle these errors by default.
#[derive(Debug, Clone, PartialEq)]
pub enum ParseError<T> {
    UnexpectedRoot,
    Unexpected(T),
    Eof,
}

/// The result type returned by any nonterminal.
pub type Result<P, T> = ::std::result::Result<P, ParseError<T>>;

/// Unshiftable iterator.
///
/// Can unshift one element back into the iterator as the next element to be iterated.
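///
/// A short example of the unshift behaviour:
///
/// ```
/// use lexpar::parser::UnshiftIter;
///
/// let mut iter = UnshiftIter::from(vec![1, 2, 3].into_iter());
/// assert_eq!(iter.next(), Some(1));
///
/// // Put the element back; it becomes the next item again.
/// iter.unshift(1);
/// assert_eq!(iter.peek(), Some(&1));
/// assert_eq!(iter.next(), Some(1));
/// assert_eq!(iter.next(), Some(2));
/// ```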
pub struct UnshiftIter<I> where I: Iterator {
    iter: Peekable<I>,
    head: Option<I::Item>
}

impl<I> From<I> for UnshiftIter<I> where I: Iterator {
    fn from(iter: I) -> Self {
        Self {
            iter: iter.peekable(),
            head: None
        }
    }
}

impl<I> Iterator for UnshiftIter<I> where I: Iterator {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.head.is_some() {
            self.head.take()
        } else {
            self.iter.next()
        }
    }
}

impl<I> UnshiftIter<I> where I: Iterator {
    pub fn unshift(&mut self, item: I::Item) {
        self.head = Some(item);
    }

    pub fn peek(&mut self) -> Option<&I::Item> {
        match self.head {
            Some(ref item) => Some(item),
            None => self.iter.peek()
        }
    }
}

/// Macro that generates a parser from a formal grammar.
///
/// The macro does not yet account for:
///
/// * Left Factoring (`A -> qB | qC`)
/// * Left Recursion (`A -> Aq` (direct) and `A -> Bq, B -> Ar` (indirect))
///
/// ### Crude example syntax
///
/// ```no_run
/// # #[macro_use]
/// # extern crate lexpar;
///
/// #[derive(Debug, PartialEq)]
/// enum Token {
///     LParen,
///     RParen,
///     Number(u32),
///     Ident(String)
/// }
///
/// # fn main() {
/// use Token::*;
///
/// parse_rules! {
///     term: Token;
///     ident: () => {
///         [Ident(name)] => {}
///     },
///     expr: () => {
///         [LParen, ex: expr, RParen] => {},
///         [id: ident] => {},
///         [Number(n)] => {}
///     },
///     eps: () => {
///         [@] => { /* Epsilon */ }
///     },
///     handle: () => |iter| Ok({ /* Custom code */ })
/// }
/// # }
/// ```
#[macro_export]
macro_rules! parse_rules {
    {
        term: $term_type: ty;
        $($nonterm_def: tt)+
    } => {
        parse_rules! {
            // `iter` is passed explicitly so that each arm has it in its macro scope
            @NONTERM __iter; $term_type;
            $($nonterm_def)+
        }
    };

    // Loop nonterms
    //
    // <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => {
            $( [$($rule_token: tt)*] => $logic: expr ),+
            $(,)*
        },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; $nonterm : $ret_type => {
            $( [$($rule_token)*] => $logic ),+
        });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop nonterm handlers
    //
    // <nonterm>: <type> => |<iter>| { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => |$iter_name: ident| $logic: expr,
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; $nonterm : $ret_type => |$iter_name| $logic);

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop folds
    //
    // #[fold(<acc>)] <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[fold($acc: ident)]
        $nonterm: ident : $ret_type: ty => {
            [$($rule_token: tt)*] => $logic: expr,
            [@] => $acc_expr: expr
            $(,)*
        },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; #[fold($acc)] $nonterm : $ret_type => {
            [$($rule_token)*] => $logic,
            [@] => $acc_expr
        });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop bin ops
    //
    // #[binop(<affix>)] <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[binop($affix: ident)]
        $nonterm: ident : $prim_type: ty => $primary: ident where
        $prec_type: ty => |$lhs: ident, $rhs: ident| { $($binop_def: tt)+ },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; #[binop($affix)]
            $nonterm: $prim_type => $primary where $prec_type => |$lhs, $rhs| { $($binop_def)+ });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Nonterm rules
    //
    // <nonterm>: <type> => { arms+ }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => {
            $( [$($rule_token: tt)*] => $logic: expr ),+
            $(,)*
        } $(,)*
    } => {
        // Disable the warning since Epsilon does not use iter
        #[allow(unused_variables)]
        fn $nonterm<I>($iter: &mut ::lexpar::parser::UnshiftIter<I>)
            -> ::lexpar::parser::Result<$ret_type, $term_type>
            where I: Iterator<Item = $term_type>
        {
            $(parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);)*

            #[allow(unreachable_code)]
            match $iter.peek() {
                Some(_) => Err(::lexpar::parser::ParseError::UnexpectedRoot),
                None => Err(::lexpar::parser::ParseError::Eof)
            }
        }
    };

    // Nonterm handler
    //
    // <nonterm>: <type> => |<iter>| expr
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => |$iter_name: ident| $logic: expr $(,)*
    } => {
        fn $nonterm<I>($iter_name: &mut ::lexpar::parser::UnshiftIter<I>)
            -> ::lexpar::parser::Result<$ret_type, $term_type>
            where I: Iterator<Item = $term_type>
        {
            $logic
        }
    };

    // Fold syntax
    //
    // #[fold(<acc>)] <nonterm>: <type> => {
    //     [ rules+ ] => logic,
    //     [@] => start_acc
    // }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[fold($acc: ident)]
        $nonterm: ident : $ret_type: ty => {
            [$($rule_token: tt)*] => $logic: expr,
            [@] => $acc_expr: expr
            $(,)*
        } $(,)*
    } => {
        #[allow(unused_variables)]
        parse_rules!(@NONTERM $iter; $term_type; $nonterm: $ret_type => |$iter| {
            use ::lexpar::parser::{self, UnshiftIter};

            type ParserResult = parser::Result<$ret_type, $term_type>;

            fn matcher_root<I>($iter: &mut UnshiftIter<I>, $acc: $ret_type) -> ParserResult
                where I: Iterator<Item = $term_type>
            {
                #[allow(unused_mut)]
                let mut $acc = $acc;

                parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);

                #[allow(unreachable_code)]
                Ok($acc)
            }

            fn matcher<I>($iter: &mut UnshiftIter<I>, $acc: $ret_type, __ur: &mut bool) -> ParserResult
                where I: Iterator<Item = $term_type>
            {
                #[allow(unused_mut)]
                let mut $acc = $acc;

                parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);

                if $iter.peek().is_some() {
                    *__ur = true;
                }

                #[allow(unreachable_code)]
                Ok($acc)
            }

            let mut acc = $acc_expr;
            let mut unexpected_root = false;

            macro_rules! matcher {
                ($matcher: expr => $end_cond: expr) => {
                    match $matcher {
                        Ok(res) => {
                            if $end_cond {
                                return Ok(res);
                            } else {
                                acc = res;
                            }
                        },
                        Err(err) => return Err(err)
                    }
                };
            }

            matcher!((matcher_root)($iter, acc) => $iter.peek().is_none());

            loop {
                matcher!{
                    (matcher)($iter, acc, &mut unexpected_root) =>
                        $iter.peek().is_none() || unexpected_root
                }
            }
        });
    };

    // Infix binop syntax
    //
    // #[binop(infix)] <nonterm>: <type> => <primary_nonterm>
    // where <prec_type> => |<lhs>, <rhs>| {
    //     (<op> | <precedence> => logic),+
    // }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[binop($affix: ident)]
        $nonterm: ident : $prim_type: ty => $primary: ident where
        $prec_type: ty => |$lhs: ident, $rhs: ident| {
            $($op: pat | $precedence: expr => $logic: expr),+
            $(,)*
        }
        $(,)*
    } => {
        /*
        parse_expression ()
            return parse_binop (parse_primary (), 0)

        parse_binop (lhs, min_precedence)
            lookahead := peek next token
            while lookahead is a binary operator whose precedence is >= min_precedence
                op := lookahead
                advance to next token
                rhs := parse_primary ()
                lookahead := peek next token
                while lookahead is a binary operator whose precedence is greater
                        than op's, or a right-associative operator
                        whose precedence is equal to op's
                    rhs := parse_binop (rhs, lookahead's precedence)
                    lookahead := peek next token
                lhs := the result of applying op with operands lhs and rhs
            return lhs
        */

        parse_rules!(@NONTERM $iter; $term_type; $nonterm: $prim_type => |$iter| {
            use ::lexpar::parser::{self, UnshiftIter};
            use ::lexpar::parser::ParseError::*;

            type ParserResult = parser::Result<$prim_type, $term_type>;

            fn prec(term: &$term_type) -> Option<$prec_type> {
                #[allow(unused_variables)]
                match term {
                    $($op => Some($precedence)),+,
                    _ => None
                }
            }

            fn eval(__term: &$term_type, $lhs: $prim_type, $rhs: $prim_type) -> $prim_type {
                match __term {
                    $($op => $logic),+,
                    _ => unreachable!()
                }
            }

            fn parse_binop<I>($iter: &mut UnshiftIter<I>, mut lhs: $prim_type, min_prec: $prec_type)
                -> ParserResult where I: Iterator<Item = $term_type>
            {
                while let Some(la) = $iter.next() {
                    match prec(&la) {
                        Some(precedence) if precedence >= min_prec => {
                            let op = la;
                            let mut rhs = match $primary($iter) {
                                Ok(rhs) => rhs,
                                Err(Eof) | Err(UnexpectedRoot) => break,
                                Err(err) => return Err(err)
                            };

                            while let Some(la_inner) = $iter.next() {
                                match prec(&la_inner) {
                                    Some(next_prec) if next_prec > precedence => {
                                        $iter.unshift(la_inner);
                                        rhs = match parse_binop($iter, rhs, next_prec) {
                                            Ok(rhs) => rhs,
                                            Err(err) => return Err(err)
                                        };
                                    },
                                    _ => {
                                        $iter.unshift(la_inner);
                                        break;
                                    }
                                }
                            }

                            lhs = eval(&op, lhs, rhs);
                        },
                        _ => {
                            $iter.unshift(la);
                            break;
                        }
                    }
                }

                Ok(lhs)
            }

            // With the + repetition it's guaranteed that there is at least one element
            let min_prec = vec![ $($precedence),+ ].into_iter().min().unwrap();

            let lhs = $primary($iter)?;
            parse_binop($iter, lhs, min_prec)
        });
    };

    // Epsilon
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        @ => $logic: expr
    } => {
        return Ok($logic);
    };

    // First rule and more rules
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        $term: pat, $($rule_token: tt)+
    } => {
        let item = $iter.next();

        match item {
            Some($term) => {
                return parse_rules!(@RULE $iter; $($rule_token)+);
            },
            // Skip to the next branch of the nonterm
            Some(_) => {
                $iter.unshift(item.unwrap());
            },
            // Let the nonterm handle the eof
            // This is so we can enter an Epsilon root rule if it exists
            None => ()
        }
    };

    // Only rule
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        $term: pat => $logic: expr
    } => {
        let item = $iter.next();

        match item {
            Some($term) => {
                return Ok($logic);
            },
            // Skip to the next branch of the nonterm
            Some(_) => {
                $iter.unshift(item.unwrap());
            },
            // Let the nonterm handle the eof
            // This is so we can enter an Epsilon root rule if it exists
            None => ()
        }
    };

    // First nonterm and more rules
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr, $($rule_token: tt)+
    } => {
        let __temp = $nonterm($iter);

        if let Err(::lexpar::parser::ParseError::UnexpectedRoot) = __temp {
            // Skip to the next branch of the nonterm
        } else {
            let $($id)+ = __temp?;
            return parse_rules!(@RULE $iter; $($rule_token)+);
        }
    };

    // Only nonterm
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr => $logic: expr
    } => {
        let __temp = $nonterm($iter);

        if let Err(::lexpar::parser::ParseError::UnexpectedRoot) = __temp {
            // Skip to the next branch of the nonterm
        } else {
            let $($id)+ = __temp?;
            return Ok($logic);
        }
    };

    // One and more rules
    {
        @RULE $iter: ident;

        $term: pat, $($rule_token: tt)+
    } => {
        match $iter.next() {
            Some($term) => {
                parse_rules!(@RULE $iter; $($rule_token)+)
            },
            Some(u) => Err(::lexpar::parser::ParseError::Unexpected(u)),
            None => Err(::lexpar::parser::ParseError::Eof)
        }
    };

    // Last rule
    {
        @RULE $iter: ident;

        $term: pat => $logic: expr
    } => {
        match $iter.next() {
            Some($term) => Ok($logic),
            Some(u) => Err(::lexpar::parser::ParseError::Unexpected(u)),
            None => Err(::lexpar::parser::ParseError::Eof)
        }
    };

    // Nonterm and more rules
    {
        @RULE $iter: ident;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr, $($rule_token: tt)+
    } => {
        {
            #[allow(unused_variables)]
            let $($id)+ = $nonterm($iter)?;

            parse_rules!(@RULE $iter; $($rule_token)+)
        }
    };

    // Last nonterm
    {
        @RULE $iter: ident;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr => $logic: expr
    } => {
        {
            #[allow(unused_variables)]
            let $($id)+ = $nonterm($iter)?;

            Ok($logic)
        }
    };
}