lexpar/parser.rs
//! This module contains parser related macros and structures.
//!
//! The following chapters will walk you through the basics of the `parse_rules` macro.
//! Prior knowledge of [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar)
//! is recommended but not necessary. If you are familiar with grammars, just skip to the
//! basic syntax section.
//!
//! # Introduction
//!
//! First of all, why do we need parsers? We need a way to express grammars that abstracts
//! away from automata and lets us think about grammars in a more intuitive manner. This more
//! intuitive way is pattern matching. Let's say we have this code snippet
//!
//! `x = 1 + 2 * 3`
//!
//! Depending on the context, we as humans can easily parse this in our heads:
//!
//! > Assign to a variable called "x" the expression 1 + 2 * 3
//!
//! And as we are familiar with basic maths, the expression 1 + 2 * 3 would result in the number 7.
//! So we now have the following human expression
//!
//! > Assign the number 7 to a variable called "x"
//!
//! or shorter
//!
//! > Assign 7 to "x"
//!
//! So what we did here is look at the code and match familiar patterns from our experience
//! with programming languages and maths.
//!
//! The thing is that machines can't differentiate contexts as well as humans unless they are
//! programmed to. For a machine this could mean
//!
//! > Is x equal to 9? Where 9 is (1 + 2) * 3.
//!
//! or just a bunch of numbers
//!
//! > `120 32 61 32 49 32 43 32 50 32 42 32 51`
//!
//! Directly using [automata](https://en.wikipedia.org/wiki/Automata_theory) could tell the machine
//! how to recognize these patterns by switching between states and so on, but they are not really
//! intuitive for humans, especially when the grammar we want to parse is large.
//!
//! This is where formal grammars and parsers come in.
//!
//! # Basic syntax of the parse_rules macro
//!
//! Say we have the following grammar
//!
//! > Expression -> Number<br>
//! > Expression -> ( Expression )
//!
//! This means that an expression can be a single number or another expression surrounded with
//! parentheses. The grammar would recognize expressions like
//!
//! > `1`<br>
//! > `(123)`<br>
//! > `(((5)))`
//!
//! We can use the `parse_rules!` macro to express that in a Rusty way
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! enum Token {
//!     LParen,
//!     RParen,
//!     Number(u32)
//! }
//!
//! # fn main() {
//! use Token::*;
//!
//! parse_rules! {
//!     term: Token;
//!
//!     expression: u32 => {
//!         [Number(value)] => value,
//!         [LParen, expr: expression, RParen] => expr
//!     }
//! }
//! # }
//! ```
//!
//! Let's break this down. The first thing you might have noticed is `term: Token;`. This tells
//! the macro that the type of each item (terminal) of the input that we are going to
//! parse is `Token`.
//!
//! The `expression: u32 => { /* ... */ }` notation defines a new function called `expression`
//! with a return type of `u32`. So now we have a function that looks like this
//!
//! ```no_run
//! fn expression(/* ... */) -> u32 {
//!     // ...
//!     # 0
//! }
//! ```
//!
//! This function is the representation of a nonterminal in a formal grammar. When called, it
//! parses the input based on whatever rules we've defined and spits out a `u32`. We'll take a look
//! at how input is passed to the function in a moment.
//!
//! For now, let's see what these array-looking things inside the notation are
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! # parse_rules! { term: Token; expression: u32 => {
//! [Number(value)] => value,
//! [LParen, expr: expression, RParen] => expr
//! # }}}
//! ```
//!
//! Those are the rules of a nonterminal. They are very similar to Rust's `match`. Each rule is an
//! arm of the match and the arms are comma separated, but there are differences. One of them is
//! that each rule follows this format
//!
//! ```ignore
//! [ (terminal|nonterminal),+ ] => rust_expression
//! ```
//!
//! Inside the brackets we can match more than just patterns (terminals); we can also match
//! function calls (nonterminals). That means we can invoke another function generated
//! by the macro and then decide what to do with the result. Example syntax would be
//!
//! * `Number(num)` is a destructuring pattern
//! * `expr: expression` expands to the statement `let expr = expression(/* ... */)?`
//!
//! In our little example `expr: expression` is a recursive call of `expression`. We'll see why
//! there is a `?` in the expanded statement in the next section.
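//!
//! To make that expansion concrete, here is a hand-written sketch of roughly what the macro
//! generates for `expression`. It is deliberately simplified - it uses a plain `Peekable` and a
//! single-variant error instead of the crate's real `UnshiftIter` and the backtracking between
//! arms:
//!
//! ```
//! use std::iter::Peekable;
//!
//! enum Token { LParen, RParen, Number(u32) }
//!
//! #[derive(Debug, PartialEq)]
//! enum ParseError { Eof, Unexpected }
//!
//! fn expression<I>(iter: &mut Peekable<I>) -> Result<u32, ParseError>
//!     where I: Iterator<Item = Token>
//! {
//!     match iter.next() {
//!         // [Number(value)] => value
//!         Some(Token::Number(value)) => Ok(value),
//!         // [LParen, expr: expression, RParen] => expr
//!         Some(Token::LParen) => {
//!             // `expr: expression` expands to a call followed by `?`
//!             let expr = expression(iter)?;
//!             match iter.next() {
//!                 Some(Token::RParen) => Ok(expr),
//!                 Some(_) => Err(ParseError::Unexpected),
//!                 None => Err(ParseError::Eof),
//!             }
//!         }
//!         Some(_) => Err(ParseError::Unexpected),
//!         None => Err(ParseError::Eof),
//!     }
//! }
//!
//! let mut input = vec![Token::LParen, Token::Number(42), Token::RParen]
//!     .into_iter()
//!     .peekable();
//! assert_eq!(expression(&mut input), Ok(42));
//! ```
//!
//! The real generated code additionally unshifts non-matching terminals back into the iterator
//! so that the next arm can try them.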
//!
//! Extending the example into a more verbose form shows how multiple nonterminals can be defined
//!
//! ```no_run
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! parse_rules! {
//!     term: Token;
//!
//!     number: u32 => {
//!         [Number(value)] => value
//!     },
//!
//!     expression: u32 => {
//!         [num: number] => num,
//!         [LParen, expr: expression, RParen] => expr
//!     }
//! }
//! # }
//! ```
//!
//! # Using the parser
//!
//! Now we have this 'parser' generated for us, but how do we use it?
//! There is no `Parser` structure or anything like it. The macro generates the private nonterminal
//! functions and leaves the rest to the user.
//!
//! Each function takes an iterator, but not just any iterator - `lexpar::parser::UnshiftIter`.
//! To ease the import and use of that exact iterator we can call any nonterminal like this
//!
//! ```
//! # fn expression<I>(_: ::lexpar::parser::UnshiftIter<I>) where I: Iterator {}
//! # enum Token { LParen, RParen, Number(u32) }
//! # fn main() {
//! # use Token::*;
//! let input = vec![LParen, Number(42), RParen].into_iter();
//!
//! let result = expression(input.into());
//! # }
//! ```
//!
//! After we get a result we should match it for any parsing errors
//!
//! ```
//! # enum Token { LParen, RParen, Number(u32) }
//! use lexpar::parser::{self, ParseError};
//!
//! let result: parser::Result<u32, Token> = Ok(0u32);
//!
//! match result {
//!     Ok(value) => { /* Do something with the value */ },
//!     Err(ParseError::Eof) => { /* Unexpected end of input */ },
//!     Err(ParseError::Unexpected(token)) => { /* Unexpected token */ },
//!     Err(ParseError::UnexpectedRoot) => { /* Unexpected head token */ }
//! }
//! ```
//!
//! This is why in the previous section we saw a `?` in the expanded statement of a nonterminal
//! match. It propagates parsing errors internally so you don't have to worry about them.
//!
//! # User defined errors
//!
//! If necessary, a nonterminal can be defined with a `Result<...>` return type. Then we can have
//! more than just the normal parsing errors - we can match each nonterminal result instead of
//! propagating it with `?` and assemble our own errors and warnings. This can be used to
//! improve the error variants for your compiler/interpreter.
//!
//! Right now the parser is limited in that we have to handle custom errors ourselves
//! instead of using the macro to do it for us. This is a goal for future improvement.
//!
//! # Epsilon, recursion and folding
//!
//! What we know until now is that we can use the macro to express `and` and `or` with match arms
//! and recursive grammars by calling the same nonterminal.
//!
//! There are some more complex compositions we might need to parse, for example:
//!
//! * `?` - `zero or one`
//! * `*` - `zero or more`
//! * `+` - `one or more`
//!
//! ### Zero or one
//!
//! This sounds like Rust's `Option`, doesn't it? Well, it does.
//!
//! Let's see how to express that
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! enum Token {
//!     Something
//! }
//!
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     zero_or_one: Option<()> => {
//!         [Token::Something] => Some(()),
//!         [@] => None
//!     }
//! }
//! # }
//! ```
//!
//! Everything seems familiar except `[@]`. What is that?
//! Grammars have the concept of an 'empty token'. It's usually called `epsilon` and that's what
//! we'll call it as well. It's used to match the empty string `""`, or in our case it acts like an
//! `else` arm for the nonterminal, just like `_` does in match expressions.
//!
//! So the epsilon is a way of saying
//!
//! > Nothing matched so far but it's OK since we can use this default value.
//!
//! It can be used only as the last arm of a nonterminal.
//!
//! ### Zero or more (Kleene star)
//!
//! Now this one is a bit trickier. I'll try not to bore you with set theory,
//! so let's jump straight into the definition
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     zero_or_more: Vec<()> => {
//!         [Something, mut acc: zero_or_more] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! The `zero_or_more` nonterminal will give us a vector of the items we've matched, or an empty
//! vector if none matched. The catch is that the resulting vector will be in the reverse order of
//! what we want, since it uses recursive backtracking to accumulate the items. To handle this
//! situation we can use one of these approaches:
//!
//! * Insert the items at the beginning of the vector, which has a time complexity of `O(n^2)`.
//! * Have a second nonterminal dedicated to calling this one and reversing the result.
//!   This one is faster than the previous, being `O(n)`.
//! * Use a special syntax to avoid recursion and use a loop internally. `O(n)` time complexity
//!   again, but without stack overflows.
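//!
//! For illustration, the second approach might look like this (the nonterminal names here are
//! made up for the sketch):
//!
//! ```ignore
//! parse_rules! {
//!     term: Token;
//!
//!     // Wrapper that calls the recursive nonterminal and reverses once: O(n).
//!     zero_or_more: Vec<()> => {
//!         [mut acc: zero_or_more_rev] => {
//!             acc.reverse();
//!             acc
//!         }
//!     },
//!
//!     zero_or_more_rev: Vec<()> => {
//!         [Something, mut acc: zero_or_more_rev] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! ```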
//!
//! The third approach can be achieved with
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     #[fold(acc)]
//!     zero_or_more: Vec<()> => {
//!         [Something] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! As a result it will use far fewer stack frames and contain the items in the right order.
//! A win-win situation. Doing a little bit of code pattern matching we can see that we've
//! replaced the call to the nonterminal with a `#[fold(acc)]`. The `acc` can be any identifier we
//! want to name our accumulator variable, and it's always mutable.
//!
//! The downside of this is that we can't have multiple arms in the nonterminal.
//! This might be improved in later versions of the crate. Right now it takes exactly one matching
//! arm and an epsilon arm, which is the starting value of the accumulator. A workaround is to
//! call another nonterminal that has the desired matching arms.
//!
//! ### One or more
//!
//! In contrast to the last section, this one is pretty obvious
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # enum Token { Something }
//! # use self::Token::*;
//! # fn main() {
//! parse_rules! {
//!     term: Token;
//!
//!     one_or_more: Vec<()> => {
//!         [Something, mut zom: zero_or_more] => {
//!             zom.insert(0, ());
//!             zom
//!         }
//!     },
//!
//!     #[fold(acc)]
//!     zero_or_more: Vec<()> => {
//!         [Something] => {
//!             acc.push(());
//!             acc
//!         },
//!         [@] => Vec::new()
//!     }
//! }
//! # }
//! ```
//!
//! Some computation time could be saved by creating a vector with the expected or an averaged
//! capacity and inserting a `mem::uninitialized` element before pushing the folded elements. Then
//! the first element that is matched in `one_or_more` can be inserted into the vector with
//! `mem::replace`, and the element returned from the replace must be passed to `mem::forget`
//! to prevent Rust from dropping an uninitialized value. This is an advanced (and unsafe)
//! approach, so if you're not familiar with what these functions do, go and read about them in
//! the Rust docs. In later versions of the crate this construct could receive its own syntax for
//! convenience.
//!
//! ### Synopsis
//!
//! Handling these compositions in real situations might be a bit more complex, but the
//! examples should have given you a good idea of how to approach such definitions.
//!
//! # Binary operators and precedence
//!
//! In this section we'll see how to define infix binary operators. These are always a bit of a
//! hassle to define by hand, so the macro provides a dedicated syntax.
//!
//! Without going through all the trial and error examples, let's jump straight into the working
//! definition
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! enum Ast {
//!     BinOp(String, Box<Ast>, Box<Ast>),
//!     Number(u32)
//! }
//!
//! impl Ast {
//!     fn binop(op: &str, lhs: Ast, rhs: Ast) -> Self {
//!         Ast::BinOp(op.to_string(), Box::new(lhs), Box::new(rhs))
//!     }
//! }
//!
//! parse_rules! {
//!     term: &'static str;
//!
//!     #[binop(infix)]
//!     expr: Ast => _expr where u32 => |lhs, rhs| {
//!         &"==" | 0 => Ast::binop("eq", lhs, rhs),
//!         &"!=" | 0 => Ast::binop("neq", lhs, rhs),
//!         &"+" | 1 => Ast::binop("add", lhs, rhs),
//!         &"-" | 1 => Ast::binop("sub", lhs, rhs),
//!         &"*" | 2 => Ast::binop("mul", lhs, rhs),
//!         &"/" | 2 => Ast::binop("div", lhs, rhs),
//!     },
//!
//!     _expr: Ast => {
//!         ["7"] => Ast::Number(7)
//!     },
//! }
//! # }
//! ```
//!
//! Well, this is a complex one... Let's break it down.
//!
//! Firstly, `#[binop(infix)]` triggers the special syntax.
//! Then there are three things to observe:
//!
//! 1. Naming the nonterminal and giving it a type: `expr: Ast`.
//! 2. Giving the name of the nonterminal used for the left and right-hand sides as `_expr`.
//!    It needs to have the same type as this nonterminal.
//! 3. Declaring the precedence type as `u32`. The type can be anything with ordering, but
//!    preferably stick to unsigned integers.
//!
//! After that we get to the actual rules. The closure-looking syntax describes how to handle the
//! different operators. `lhs` and `rhs` can be any identifiers and are respectively the
//! left-hand side and the right-hand side of the operator. The 'body' contains some
//! `match`-looking arms, with the difference that we use `| <value>` to state the operator
//! precedence. Lower precedence means that the operator will be processed later than operators
//! with higher precedence.
//!
//! Notice the `&`s. This is because the internals take references to the original terms.
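//!
//! Internally the macro follows the classic precedence climbing algorithm (its pseudocode is
//! quoted in the macro source). As a self-contained illustration - not the macro's actual
//! output - here is that loop over string tokens:
//!
//! ```
//! fn prec(op: &str) -> Option<u32> {
//!     match op {
//!         "+" | "-" => Some(1),
//!         "*" | "/" => Some(2),
//!         _ => None,
//!     }
//! }
//!
//! fn parse_binop<'a, I>(iter: &mut std::iter::Peekable<I>, mut lhs: u32, min_prec: u32) -> u32
//!     where I: Iterator<Item = &'a str>
//! {
//!     while let Some(&op) = iter.peek() {
//!         let p = match prec(op) {
//!             Some(p) if p >= min_prec => p,
//!             _ => break,
//!         };
//!         iter.next();
//!         // The primary nonterminal would be called here; we just read a number.
//!         let mut rhs: u32 = iter.next().unwrap().parse().unwrap();
//!         while let Some(&la) = iter.peek() {
//!             match prec(la) {
//!                 // A tighter-binding operator folds into the right-hand side first.
//!                 Some(q) if q > p => rhs = parse_binop(iter, rhs, q),
//!                 _ => break,
//!             }
//!         }
//!         lhs = match op {
//!             "+" => lhs + rhs,
//!             "-" => lhs - rhs,
//!             "*" => lhs * rhs,
//!             _ => lhs / rhs,
//!         };
//!     }
//!     lhs
//! }
//!
//! // 1 + 2 * 3: `*` binds tighter, so the result is 7, not 9.
//! let mut iter = ["+", "2", "*", "3"].iter().cloned().peekable();
//! assert_eq!(parse_binop(&mut iter, 1, 0), 7);
//! ```
//!
//! Swapping the precedences of `+` and `*` in `prec` would flip the result to 9, which is the
//! whole point of the precedence column in the binop rules.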
//!
//! # Unary operators
//!
//! Defining unary operators is fairly simple. The only thing to consider is giving them
//! precedence, but that can be achieved by arranging the nonterminal calls. Let's take a look
//!
//! ### Prefix unary operators
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: &'static str;
//!
//!     prefix: bool => {
//!         [op: not] => op
//!     },
//!
//!     not: bool => {
//!         ["!", expr: expr] => !expr
//!     },
//!
//!     expr: bool => {
//!         ["true"] => true,
//!         ["false"] => false,
//!     },
//! }
//! # }
//! ```
//!
//! ### Postfix unary operators
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: &'static str;
//!
//!     postfix: bool => {
//!         [expr: expr, op] => match op {
//!             "?" => expr.is_ok(),
//!             _ => false
//!         }
//!     },
//!
//!     expr: Result<(), ()> => {
//!         ["Ok"] => Ok(()),
//!         ["Err"] => Err(()),
//!     },
//! }
//! # }
//! ```
//!
//! # Custom handlers
//!
//! Finally, if we want to create our own custom nonterminal, we can use this special syntax
//!
//! ```
//! # #[macro_use]
//! # extern crate lexpar;
//! # fn main() {
//! parse_rules! {
//!     term: ();
//!
//!     my_handler: () => |iter| {
//!         /* handler logic goes here */
//!         Ok(())
//!     }
//! }
//! # }
//! ```
//!
//! The `iter` is a mutable reference to the `UnshiftIter` that the internals use.
//! As opposed to the normal nonterminals, we need to handle result propagation from other
//! nonterminals called in the handler ourselves, and also wrap our result in `Ok` or `Err`,
//! which gives us control over the error handling.
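//!
//! For example, a handler can drive the iterator manually with `next` and `peek`. This
//! hypothetical nonterminal sums consecutive terms, stopping (without consuming) at the first
//! term greater than 100:
//!
//! ```ignore
//! parse_rules! {
//!     term: u32;
//!
//!     sum_small: u32 => |iter| {
//!         let mut sum = 0;
//!         while let Some(&n) = iter.peek() {
//!             if n > 100 {
//!                 break;
//!             }
//!             iter.next();
//!             sum += n;
//!         }
//!         Ok(sum)
//!     }
//! }
//! ```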
//!
//! # Debugging
//!
//! One of the future goals for the crate is to have a parser debug mode that shows what exactly
//! failed and where. This would lower the time spent on debugging your grammar.
//!
//! For now, here are some approaches to making debugging more pleasant:
//!
//! ### Common mistakes
//!
//! * Forgotten commas (`,`) between rules or nonterminals
//! * Forgotten arrows (`=>`)
//!
//! ### EchoIterator
//!
//! You can create an echo iterator with a simple map to print each term as it is consumed
//!
//! ```ignore
//! iter.map(|term| {
//!     println!("{:?}", term);
//!     term
//! })
//! ```
//!
//! # Examples
//!
//! Run the example projects with `cargo run --example <name>` where name is `ml` or `rust`.
//!
//! Some more examples can be found in the form of integration tests.

use std::iter::Peekable;

/// Common errors that occur during parsing.
///
/// The nonterminals generated by `parse_rules` handle these errors by default.
#[derive(Debug, Clone, PartialEq)]
pub enum ParseError<T> {
    UnexpectedRoot,
    Unexpected(T),
    Eof,
}

/// The result type returned by any nonterminal.
pub type Result<P, T> = ::std::result::Result<P, ParseError<T>>;

/// Unshiftable iterator.
///
/// Can unshift one element back into the iterator as the next element to be iterated.
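///
/// A short example of the unshift behaviour:
///
/// ```
/// use lexpar::parser::UnshiftIter;
///
/// let mut iter = UnshiftIter::from(vec![1, 2, 3].into_iter());
/// assert_eq!(iter.next(), Some(1));
///
/// // Put the element back; it becomes the next item again.
/// iter.unshift(1);
/// assert_eq!(iter.peek(), Some(&1));
/// assert_eq!(iter.next(), Some(1));
/// assert_eq!(iter.next(), Some(2));
/// ```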
pub struct UnshiftIter<I> where I: Iterator {
    iter: Peekable<I>,
    head: Option<I::Item>
}

impl<I> From<I> for UnshiftIter<I> where I: Iterator {
    fn from(iter: I) -> Self {
        Self {
            iter: iter.peekable(),
            head: None
        }
    }
}

impl<I> Iterator for UnshiftIter<I> where I: Iterator {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.head.is_some() {
            self.head.take()
        } else {
            self.iter.next()
        }
    }
}

impl<I> UnshiftIter<I> where I: Iterator {
    pub fn unshift(&mut self, item: I::Item) {
        self.head = Some(item);
    }

    pub fn peek(&mut self) -> Option<&I::Item> {
        match self.head {
            Some(ref item) => Some(item),
            None => self.iter.peek()
        }
    }
}

/// Macro that generates a parser from a formal grammar.
///
/// The macro does not yet account for:
///
/// * Left Factoring (`A -> qB | qC`)
/// * Left Recursion (`A -> Aq` (direct) and `A -> Bq, B -> Ar` (indirect))
///
/// ### Crude example syntax
///
/// ```no_run
/// # #[macro_use]
/// # extern crate lexpar;
///
/// #[derive(Debug, PartialEq)]
/// enum Token {
///     LParen,
///     RParen,
///     Number(u32),
///     Ident(String)
/// }
///
/// # fn main() {
/// use Token::*;
///
/// parse_rules! {
///     term: Token;
///     ident: () => {
///         [Ident(name)] => {}
///     },
///     expr: () => {
///         [LParen, ex: expr, RParen] => {},
///         [id: ident] => {},
///         [Number(n)] => {}
///     },
///     eps: () => {
///         [@] => { /* Epsilon */ }
///     },
///     handle: () => |iter| Ok({ /* Custom code */ })
/// }
/// # }
/// ```
#[macro_export]
macro_rules! parse_rules {
    {
        term: $term_type: ty;
        $($nonterm_def: tt)+
    } => {
        parse_rules! {
            // `iter` is passed explicitly so that each arm has it in its macro scope
            @NONTERM __iter; $term_type;
            $($nonterm_def)+
        }
    };

    // Loop nonterms
    //
    // <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => {
            $( [$($rule_token: tt)*] => $logic: expr ),+
            $(,)*
        },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; $nonterm : $ret_type => {
            $( [$($rule_token)*] => $logic ),+
        });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop nonterm handlers
    //
    // <nonterm>: <type> => |<iter>| { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => |$iter_name: ident| $logic: expr,
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; $nonterm : $ret_type => |$iter_name| $logic);

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop folds
    //
    // #[fold(<acc>)] <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[fold($acc: ident)]
        $nonterm: ident : $ret_type: ty => {
            [$($rule_token: tt)*] => $logic: expr,
            [@] => $acc_expr: expr
            $(,)*
        },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; #[fold($acc)] $nonterm : $ret_type => {
            [$($rule_token)*] => $logic,
            [@] => $acc_expr
        });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Loop bin ops
    //
    // #[binop(<affix>)] <nonterm>: <type> => { ... }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[binop($affix: ident)]
        $nonterm: ident : $prim_type: ty => $primary: ident where
        $prec_type: ty => |$lhs: ident, $rhs: ident| { $($binop_def: tt)+ },
        $($nonterm_def: tt)+
    } => {
        parse_rules!(@NONTERM $iter; $term_type; #[binop($affix)]
            $nonterm: $prim_type => $primary where $prec_type => |$lhs, $rhs| { $($binop_def)+ });

        parse_rules!(@NONTERM $iter; $term_type; $($nonterm_def)+);
    };

    // Nonterm rules
    //
    // <nonterm>: <type> => { arms+ }
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => {
            $( [$($rule_token: tt)*] => $logic: expr ),+
            $(,)*
        } $(,)*
    } => {
        // Disable the warning since Epsilon does not use iter
        #[allow(unused_variables)]
        fn $nonterm<I>($iter: &mut ::lexpar::parser::UnshiftIter<I>)
            -> ::lexpar::parser::Result<$ret_type, $term_type>
            where I: Iterator<Item = $term_type>
        {
            $(parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);)*

            #[allow(unreachable_code)]
            match $iter.peek() {
                Some(_) => Err(::lexpar::parser::ParseError::UnexpectedRoot),
                None => Err(::lexpar::parser::ParseError::Eof)
            }
        }
    };

    // Nonterm handler
    //
    // <nonterm>: <type> => |<iter>| expr
    {
        @NONTERM $iter: ident; $term_type: ty;

        $nonterm: ident : $ret_type: ty => |$iter_name: ident| $logic: expr $(,)*
    } => {
        fn $nonterm<I>($iter_name: &mut ::lexpar::parser::UnshiftIter<I>)
            -> ::lexpar::parser::Result<$ret_type, $term_type>
            where I: Iterator<Item = $term_type>
        {
            $logic
        }
    };

    // Fold syntax
    //
    // #[fold(<acc>)] <nonterm>: <type> => {
    //     [ rules+ ] => logic,
    //     [@] => start_acc
    // }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[fold($acc: ident)]
        $nonterm: ident : $ret_type: ty => {
            [$($rule_token: tt)*] => $logic: expr,
            [@] => $acc_expr: expr
            $(,)*
        } $(,)*
    } => {
        #[allow(unused_variables)]
        parse_rules!(@NONTERM $iter; $term_type; $nonterm: $ret_type => |$iter| {
            use ::lexpar::parser::{self, UnshiftIter};

            type ParserResult = parser::Result<$ret_type, $term_type>;

            fn matcher_root<I>($iter: &mut UnshiftIter<I>, $acc: $ret_type) -> ParserResult
                where I: Iterator<Item = $term_type>
            {
                #[allow(unused_mut)]
                let mut $acc = $acc;

                parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);

                #[allow(unreachable_code)]
                Ok($acc)
            }

            fn matcher<I>($iter: &mut UnshiftIter<I>, $acc: $ret_type, __ur: &mut bool) -> ParserResult
                where I: Iterator<Item = $term_type>
            {
                #[allow(unused_mut)]
                let mut $acc = $acc;

                parse_rules!(@ROOT_RULE $iter; $term_type; $($rule_token)* => $logic);

                if $iter.peek().is_some() {
                    *__ur = true;
                }

                #[allow(unreachable_code)]
                Ok($acc)
            }

            let mut acc = $acc_expr;
            let mut unexpected_root = false;

            macro_rules! matcher {
                ($matcher: expr => $end_cond: expr) => {
                    match $matcher {
                        Ok(res) => {
                            if $end_cond {
                                return Ok(res);
                            } else {
                                acc = res;
                            }
                        },
                        Err(err) => return Err(err)
                    }
                };
            }

            matcher!((matcher_root)($iter, acc) => $iter.peek().is_none());

            loop {
                matcher!{
                    (matcher)($iter, acc, &mut unexpected_root) =>
                        $iter.peek().is_none() || unexpected_root
                }
            }
        });
    };

    // Infix binop syntax
    //
    // #[binop(infix)] <nonterm>: <type> => <primary_nonterm>
    // where <prec_type> => |<lhs>, <rhs>| {
    //     (<op> | <precedence> => logic),+
    // }
    {
        @NONTERM $iter: ident; $term_type: ty;

        #[binop($affix: ident)]
        $nonterm: ident : $prim_type: ty => $primary: ident where
        $prec_type: ty => |$lhs: ident, $rhs: ident| {
            $($op: pat | $precedence: expr => $logic: expr),+
            $(,)*
        }
        $(,)*
    } => {
        /*
        parse_expression ()
            return parse_binop (parse_primary (), 0)

        parse_binop (lhs, min_precedence)
            lookahead := peek next token
            while lookahead is a binary operator whose precedence is >= min_precedence
                op := lookahead
                advance to next token
                rhs := parse_primary ()
                lookahead := peek next token
                while lookahead is a binary operator whose precedence is greater
                        than op's, or a right-associative operator
                        whose precedence is equal to op's
                    rhs := parse_binop (rhs, lookahead's precedence)
                    lookahead := peek next token
                lhs := the result of applying op with operands lhs and rhs
            return lhs
        */

        parse_rules!(@NONTERM $iter; $term_type; $nonterm: $prim_type => |$iter| {
            use ::lexpar::parser::{self, UnshiftIter};
            use ::lexpar::parser::ParseError::*;

            type ParserResult = parser::Result<$prim_type, $term_type>;

            fn prec(term: &$term_type) -> Option<$prec_type> {
                #[allow(unused_variables)]
                match term {
                    $($op => Some($precedence)),+,
                    _ => None
                }
            }

            fn eval(__term: &$term_type, $lhs: $prim_type, $rhs: $prim_type) -> $prim_type {
                match __term {
                    $($op => $logic),+,
                    _ => unreachable!()
                }
            }

            fn parse_binop<I>($iter: &mut UnshiftIter<I>, mut lhs: $prim_type, min_prec: $prec_type)
                -> ParserResult where I: Iterator<Item = $term_type>
            {
                while let Some(la) = $iter.next() {
                    match prec(&la) {
                        Some(precedence) if precedence >= min_prec => {
                            let op = la;
                            let mut rhs = match $primary($iter) {
                                Ok(rhs) => rhs,
                                Err(Eof) | Err(UnexpectedRoot) => break,
                                Err(err) => return Err(err)
                            };

                            while let Some(la_inner) = $iter.next() {
                                match prec(&la_inner) {
                                    Some(next_prec) if next_prec > precedence => {
                                        $iter.unshift(la_inner);
                                        rhs = match parse_binop($iter, rhs, next_prec) {
                                            Ok(rhs) => rhs,
                                            Err(err) => return Err(err)
                                        };
                                    },
                                    _ => {
                                        $iter.unshift(la_inner);
                                        break;
                                    }
                                }
                            }

                            lhs = eval(&op, lhs, rhs);
                        },
                        _ => {
                            $iter.unshift(la);
                            break;
                        }
                    }
                }

                Ok(lhs)
            }

            // With the + repetition it's guaranteed that there is at least one element
            let min_prec = vec![ $($precedence),+ ].into_iter().min().unwrap();

            let lhs = $primary($iter)?;
            parse_binop($iter, lhs, min_prec)
        });
    };

    // Epsilon
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        @ => $logic: expr
    } => {
        return Ok($logic);
    };

    // First rule and more rules
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        $term: pat, $($rule_token: tt)+
    } => {
        let item = $iter.next();

        match item {
            Some($term) => {
                return parse_rules!(@RULE $iter; $($rule_token)+);
            },
            // Skip to the next branch of the nonterm
            Some(_) => {
                $iter.unshift(item.unwrap());
            },
            // Let the nonterm handle the eof
            // This is so we can enter an Epsilon root rule if it exists
            None => ()
        }
    };

    // Only rule
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        $term: pat => $logic: expr
    } => {
        let item = $iter.next();

        match item {
            Some($term) => {
                return Ok($logic);
            },
            // Skip to the next branch of the nonterm
            Some(_) => {
                $iter.unshift(item.unwrap());
            },
            // Let the nonterm handle the eof
            // This is so we can enter an Epsilon root rule if it exists
            None => ()
        }
    };

    // First nonterm and more rules
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr, $($rule_token: tt)+
    } => {
        let __temp = $nonterm($iter);

        if let Err(::lexpar::parser::ParseError::UnexpectedRoot) = __temp {
            // Skip to the next branch of the nonterm
        } else {
            let $($id)+ = __temp?;
            return parse_rules!(@RULE $iter; $($rule_token)+);
        }
    };

    // Only nonterm
    {
        @ROOT_RULE $iter: ident; $term_type: ty;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr => $logic: expr
    } => {
        let __temp = $nonterm($iter);

        if let Err(::lexpar::parser::ParseError::UnexpectedRoot) = __temp {
            // Skip to the next branch of the nonterm
        } else {
            let $($id)+ = __temp?;
            return Ok($logic);
        }
    };

    // One and more rules
    {
        @RULE $iter: ident;

        $term: pat, $($rule_token: tt)+
    } => {
        match $iter.next() {
            Some($term) => {
                parse_rules!(@RULE $iter; $($rule_token)+)
            },
            Some(u) => Err(::lexpar::parser::ParseError::Unexpected(u)),
            None => Err(::lexpar::parser::ParseError::Eof)
        }
    };

    // Last rule
    {
        @RULE $iter: ident;

        $term: pat => $logic: expr
    } => {
        match $iter.next() {
            Some($term) => Ok($logic),
            Some(u) => Err(::lexpar::parser::ParseError::Unexpected(u)),
            None => Err(::lexpar::parser::ParseError::Eof)
        }
    };

    // Nonterm and more rules
    {
        @RULE $iter: ident;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr, $($rule_token: tt)+
    } => {
        {
            #[allow(unused_variables)]
            let $($id)+ = $nonterm($iter)?;

            parse_rules!(@RULE $iter; $($rule_token)+)
        }
    };

    // Last nonterm
    {
        @RULE $iter: ident;

        // A hack to allow the mut specifier
        $($id: ident)+ : $nonterm: expr => $logic: expr
    } => {
        {
            #[allow(unused_variables)]
            let $($id)+ = $nonterm($iter)?;

            Ok($logic)
        }
    };
}