fuel_pest_derive/lib.rs
// pest. The Elegant Parser
// Copyright (c) 2018 Dragoș Tiselice
//
// Licensed under the Apache License, Version 2.0
// <LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0> or the MIT
// license <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. All files in the project carrying such notice may not be copied,
// modified, or distributed except according to those terms.

//! # pest. The Elegant Parser
//!
//! pest is a general purpose parser written in Rust with a focus on accessibility, correctness,
//! and performance. It uses parsing expression grammars (or [PEG]) as input, which are similar in
//! spirit to regular expressions, but which offer the enhanced expressivity needed to parse
//! complex languages.
//!
//! [PEG]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
//!
//! ## Getting started
//!
//! The recommended way to start parsing with pest is to read the official [book].
//!
//! Other helpful resources:
//!
//! * API reference on [docs.rs]
//! * play with grammars and share them on our [fiddle]
//! * leave feedback, ask questions, or greet us on [Gitter]
//!
//! [book]: https://pest-parser.github.io/book
//! [docs.rs]: https://docs.rs/pest
//! [fiddle]: https://pest-parser.github.io/#editor
//! [Gitter]: https://gitter.im/dragostis/pest
//!
//! ## `.pest` files
//!
//! Grammar definitions reside in custom `.pest` files located in the `src` directory. Their path is
//! relative to `src` and is specified between the `derive` attribute and the empty `struct` that
//! `Parser` is derived on.
//!
//! ```ignore
//! #[derive(Parser)]
//! #[grammar = "path/to/my_grammar.pest"] // relative to src
//! struct MyParser;
//! ```
//!
//! ## Inline grammars
//!
//! Grammars can also be inlined by using the `#[grammar_inline = "..."]` attribute, which takes
//! the grammar source as a string literal in place of a file path.
//!
//! ## Grammar
//!
//! A grammar is a series of rules separated by whitespace, possibly containing comments.
//!
//! ### Comments
//!
//! Comments start with `//` and end at the end of the line.
//!
//! ```ignore
//! // a comment
//! ```
//!
//! ### Rules
//!
//! Rules have the following form:
//!
//! ```ignore
//! name = optional_modifier { expression }
//! ```
//!
//! The name of a rule consists of alphanumeric characters or `_`, and its first character must
//! not be a digit. The name is used to create token pairs: when the rule starts being parsed, the
//! starting part of the token is produced, and the ending part is produced when the rule finishes
//! parsing.
//!
//! The following token pair notation `a(b(), c())` denotes the tokens: start `a`, start `b`, end
//! `b`, start `c`, end `c`, end `a`.
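//!
//! As a rough illustration of this notation, nested pairs can be flattened into the start/end
//! sequence described above. This is a standalone sketch, not pest's actual token types; `Pair`
//! and `flatten` are hypothetical names chosen for the example.
//!
//! ```rust
//! /// A hypothetical nested pair: a rule name plus the pairs produced inside it.
//! struct Pair {
//!     name: &'static str,
//!     children: Vec<Pair>,
//! }
//!
//! /// Flattens a pair into its token sequence: start, the children's tokens, end.
//! fn flatten(pair: &Pair, out: &mut Vec<String>) {
//!     out.push(format!("start {}", pair.name));
//!     for child in &pair.children {
//!         flatten(child, out);
//!     }
//!     out.push(format!("end {}", pair.name));
//! }
//! ```
//!
//! Flattening the pair `a(b(), c())` with this helper yields the six tokens listed above, in the
//! same order.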
//!
//! #### Modifiers
//!
//! Modifiers are optional and can be one of `_`, `@`, `$`, or `!`. These modifiers change the
//! behavior of the rules.
//!
//! 1. Silent (`_`)
//!
//!     Silent rules do not create token pairs during parsing, nor are they reported in errors.
//!
//!     ```ignore
//!     a = _{ "a" }
//!     b = { a ~ "b" }
//!     ```
//!
//!     Parsing `"ab"` produces the token pair `b()`.
//!
//! 2. Atomic (`@`)
//!
//!     Atomic rules do not accept whitespace or comments within their expressions and have a
//!     cascading effect on any rule they call. That is, rules that are not atomic but are called
//!     by atomic rules behave atomically.
//!
//!     Any rules called by atomic rules do not generate token pairs.
//!
//!     ```ignore
//!     a = { "a" }
//!     b = @{ a ~ "b" }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing `"ab"` produces the token pair `b()`, while `"a b"` produces an error.
//!
//! 3. Compound-atomic (`$`)
//!
//!     Compound-atomic rules are identical to atomic rules, except that rules called by them
//!     are not forbidden from generating token pairs.
//!
//!     ```ignore
//!     a = { "a" }
//!     b = ${ a ~ "b" }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing `"ab"` produces the token pairs `b(a())`, while `"a b"` produces an error.
//!
//! 4. Non-atomic (`!`)
//!
//!     Non-atomic rules are identical to normal rules, except that they stop the cascading
//!     effect of atomic and compound-atomic rules.
//!
//!     ```ignore
//!     a = { "a" }
//!     b = !{ a ~ "b" }
//!     c = @{ b }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing either `"ab"` or `"a b"` produces the token pairs `c(a())`.
//!
//! #### Expressions
//!
//! Expressions can be either terminals or non-terminals.
//!
//! 1. Terminals
//!
//!     | Terminal   | Usage                                                          |
//!     |------------|----------------------------------------------------------------|
//!     | `"a"`      | matches the exact string `"a"`                                 |
//!     | `^"a"`     | matches the exact string `"a"` case insensitively (ASCII only) |
//!     | `'a'..'z'` | matches one character between `'a'` and `'z'` (inclusive)      |
//!     | `a`        | matches rule `a`                                               |
//!
//!     Strings and characters follow
//!     [Rust's escape mechanisms](https://doc.rust-lang.org/reference/tokens.html#byte-escapes),
//!     while identifiers can contain alphanumeric characters and underscores (`_`), as long as
//!     they do not start with a digit.
//!
//! 2. Non-terminals
//!
//!     | Non-terminal          | Usage                                                      |
//!     |-----------------------|------------------------------------------------------------|
//!     | `(e)`                 | matches `e`                                                |
//!     | `e1 ~ e2`             | matches the sequence `e1` `e2`                             |
//!     | <code>e1 \| e2</code> | matches either `e1` or `e2`                                |
//!     | `e*`                  | matches `e` zero or more times                             |
//!     | `e+`                  | matches `e` one or more times                              |
//!     | `e{n}`                | matches `e` exactly `n` times                              |
//!     | `e{, n}`              | matches `e` at most `n` times                              |
//!     | `e{n,}`               | matches `e` at least `n` times                             |
//!     | `e{m, n}`             | matches `e` between `m` and `n` times inclusively          |
//!     | `e?`                  | optionally matches `e`                                     |
//!     | `&e`                  | matches `e` without making progress                        |
//!     | `!e`                  | matches if `e` doesn't match, without making progress      |
//!     | `PUSH(e)`             | matches `e` and pushes its captured string onto the stack  |
//!
//!     where `e`, `e1`, and `e2` are expressions.
//!
//! Expressions can modify the stack only if they match the input. For example,
//! if `e1` in the compound expression `e1 | e2` does not match the input, then
//! it does not modify the stack, so `e2` sees the stack in the same state as
//! `e1` did. Repetitions and optionals (`e*`, `e+`, `e{n}`, `e{, n}`, `e{n,}`,
//! `e{m, n}`, `e?`) can modify the stack each time `e` matches. The `!e` and `&e`
//! expressions are a special case; they never modify the stack.
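//!
//! One way to picture the rule for `e1 | e2` is a snapshot of the stack that is restored when
//! `e1` fails. The sketch below is a simplified model under stated assumptions (a plain
//! `Vec<String>` stack and a hypothetical `choice` helper); it does not show how pest implements
//! this internally.
//!
//! ```rust
//! /// Tries `e1`; if it fails, restores the stack so that `e2` sees the original state.
//! fn choice<F, G>(stack: &mut Vec<String>, e1: F, e2: G) -> bool
//! where
//!     F: FnOnce(&mut Vec<String>) -> bool,
//!     G: FnOnce(&mut Vec<String>) -> bool,
//! {
//!     let snapshot = stack.clone();
//!     if e1(stack) {
//!         return true;
//!     }
//!     // `e1` did not match, so any pushes or pops it made are undone.
//!     *stack = snapshot;
//!     e2(stack)
//! }
//! ```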
//!
//! ## Special rules
//!
//! Special rules can be called within the grammar. They are:
//!
//! * `WHITESPACE` - runs between rules and sub-rules
//! * `COMMENT` - runs between rules and sub-rules
//! * `ANY` - matches exactly one `char`
//! * `SOI` - (start-of-input) matches only when a `Parser` is still at the starting position
//! * `EOI` - (end-of-input) matches only when a `Parser` has reached its end
//! * `POP` - pops a string from the stack and matches it
//! * `POP_ALL` - pops the entire state of the stack and matches it
//! * `PEEK` - peeks a string from the stack and matches it
//! * `PEEK[a..b]` - peeks part of the stack and matches it
//! * `PEEK_ALL` - peeks the entire state of the stack and matches it
//! * `DROP` - drops the top of the stack (fails to match if the stack is empty)
//!
//! `WHITESPACE` and `COMMENT` should be defined manually if needed. All other rules cannot be
//! overridden.
//!
//! ## `WHITESPACE` and `COMMENT`
//!
//! When defined, these rules get matched automatically in sequences (`~`) and repetitions
//! (`*`, `+`) between expressions. Atomic rules and those rules called by atomic rules are exempt
//! from this behavior.
//!
//! Because both rules are run in repetitions, `WHITESPACE` should match a single whitespace
//! character and `COMMENT` a single comment.
//!
//! If both `WHITESPACE` and `COMMENT` are defined, this grammar:
//!
//! ```ignore
//! a = { b ~ c }
//! ```
//!
//! is effectively transformed into this one behind the scenes:
//!
//! ```ignore
//! a = { b ~ WHITESPACE* ~ (COMMENT ~ WHITESPACE*)* ~ c }
//! ```
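//!
//! The implicit `WHITESPACE* ~ (COMMENT ~ WHITESPACE*)*` step can be pictured as a skip function
//! that runs between `b` and `c`. The sketch below assumes, purely for illustration, that
//! `WHITESPACE` matches a space or a newline and that `COMMENT` is a `//` line comment; `skip` is
//! a hypothetical helper, not part of pest's API.
//!
//! ```rust
//! /// Returns the input left after skipping `WHITESPACE* ~ (COMMENT ~ WHITESPACE*)*`.
//! fn skip(input: &str) -> &str {
//!     let is_whitespace = |c: char| c == ' ' || c == '\n';
//!     // WHITESPACE*
//!     let mut rest = input.trim_start_matches(is_whitespace);
//!     // (COMMENT ~ WHITESPACE*)*
//!     while rest.starts_with("//") {
//!         // COMMENT: everything up to, but not including, the end of the line.
//!         let end = rest.find('\n').unwrap_or(rest.len());
//!         rest = rest[end..].trim_start_matches(is_whitespace);
//!     }
//!     rest
//! }
//! ```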
//!
//! ## `PUSH`, `POP`, `DROP`, and `PEEK`
//!
//! `PUSH(e)` pushes the captured string of the expression `e` onto a stack. This stack can
//! later be used to match input based on its content with `POP` and `PEEK`.
//!
//! `PEEK` always matches the string at the top of the stack. So, if the stack contains
//! `["b", "a"]` (`"a"` being on top), this grammar:
//!
//! ```ignore
//! a = { PEEK }
//! ```
//!
//! is effectively transformed into the following at parse time:
//!
//! ```ignore
//! a = { "a" }
//! ```
//!
//! `POP` works the same way, except that it also pops the string off the stack if the match
//! succeeds. With the stack from above, if `POP` matches `"a"`, the stack will be mutated
//! to `["b"]`.
//!
//! `DROP` makes it possible to remove the string at the top of the stack
//! without matching it. If the stack is nonempty, `DROP` drops the top of the
//! stack. If the stack is empty, then `DROP` fails to match.
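//!
//! The four operations can be modelled on a plain vector. In the sketch below, each matching
//! function takes the remaining input and returns what is left after a successful match. All
//! names are hypothetical, and treating `PEEK` or `POP` on an empty stack as a failed match is
//! an assumption of this sketch rather than a statement about pest's behavior.
//!
//! ```rust
//! /// `PUSH(e)`: stores the string captured by `e` on top of the stack.
//! fn push(stack: &mut Vec<String>, captured: &str) {
//!     stack.push(captured.to_string());
//! }
//!
//! /// `PEEK`: matches the string on top of the stack without removing it.
//! fn peek<'i>(stack: &[String], input: &'i str) -> Option<&'i str> {
//!     input.strip_prefix(stack.last()?.as_str())
//! }
//!
//! /// `POP`: like `PEEK`, but removes the top string when the match succeeds.
//! fn pop<'i>(stack: &mut Vec<String>, input: &'i str) -> Option<&'i str> {
//!     let rest = peek(stack, input)?;
//!     stack.pop();
//!     Some(rest)
//! }
//!
//! /// `DROP`: removes the top string without matching it; fails on an empty stack.
//! fn drop_top(stack: &mut Vec<String>) -> bool {
//!     stack.pop().is_some()
//! }
//! ```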
//!
//! ### Advanced peeking
//!
//! `PEEK[start..end]` and `PEEK_ALL` allow peeking deeper into the stack. The syntax works exactly
//! like Rust’s exclusive slice syntax, with index `0` referring to the bottom of the stack.
//! Additionally, negative indices can be used to indicate an offset from the top. If the end lies
//! before or at the start, the expression matches without consuming any input (as does a
//! `PEEK_ALL` on an empty stack). With the stack `["c", "b", "a"]` (`"a"` on top):
//!
//! ```ignore
//! fill = { PUSH("c") ~ PUSH("b") ~ PUSH("a") }
//! v = { PEEK_ALL } = { "a" ~ "b" ~ "c" } // top to bottom
//! w = { PEEK[..] } = { "c" ~ "b" ~ "a" } // bottom to top
//! x = { PEEK[1..2] } = { PEEK[1..-1] } = { "b" }
//! y = { PEEK[..-2] } = { PEEK[0..1] } = { "c" }
//! z = { PEEK[1..] } = { PEEK[-2..3] } = { "b" ~ "a" }
//! n = { PEEK[2..-2] } = { PEEK[2..1] } = { "" }
//! ```
//!
//! For historical reasons, `PEEK_ALL` matches from top to bottom, while `PEEK[start..end]` matches
//! from bottom to top. There is currently no syntax to match a slice of the stack top to bottom.
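//!
//! The index arithmetic behind `PEEK[start..end]` can be sketched as follows. The helper names
//! are hypothetical, and rejecting out-of-range indices with `None` is an assumption of this
//! sketch. The slice is returned bottom to top, with index `0` at the bottom of the stack.
//!
//! ```rust
//! /// Resolves a possibly negative index against a stack of length `len`.
//! /// Negative values count down from the top, so `-1` resolves to `len - 1`.
//! fn resolve(index: i32, len: usize) -> Option<usize> {
//!     let resolved = if index < 0 {
//!         len as i64 + index as i64
//!     } else {
//!         index as i64
//!     };
//!     if (0..=len as i64).contains(&resolved) {
//!         Some(resolved as usize)
//!     } else {
//!         None
//!     }
//! }
//!
//! /// Returns the strings that `PEEK[start..end]` would match, bottom to top.
//! /// `None` stands for an open bound, as in `PEEK[1..]` or `PEEK[..-2]`.
//! fn peek_slice<'s>(
//!     stack: &[&'s str],
//!     start: Option<i32>,
//!     end: Option<i32>,
//! ) -> Option<Vec<&'s str>> {
//!     let from = resolve(start.unwrap_or(0), stack.len())?;
//!     let to = match end {
//!         Some(index) => resolve(index, stack.len())?,
//!         None => stack.len(),
//!     };
//!     // An empty or inverted range selects nothing and still counts as a match.
//!     if to <= from {
//!         return Some(Vec::new());
//!     }
//!     Some(stack[from..to].to_vec())
//! }
//! ```
//!
//! With the stack `["c", "b", "a"]`, `peek_slice(&stack, Some(1), Some(-1))` selects `["b"]`,
//! which corresponds to rule `x` above.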
//!
//! ## `Rule`
//!
//! All rules defined or used in the grammar populate a generated `enum` called `Rule`. This
//! implements `pest`'s `RuleType` and can be used throughout the API.
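//!
//! For a grammar that defines the rules `a` and `b`, the generated type could look roughly like
//! the sketch below. The exact attributes and derived traits are an assumption here; the code
//! emitted by the derive is the authoritative definition.
//!
//! ```rust
//! /// Hypothetical shape of the generated enum for a grammar with rules `a` and `b`.
//! #[allow(dead_code, non_camel_case_types)]
//! #[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
//! pub enum Rule {
//!     a,
//!     b,
//! }
//! ```
//!
//! Variants are named after the rules themselves, which is why the enum does not follow Rust's
//! usual camel-case convention.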
//!
//! ## Built-in rules
//!
//! pest also comes with a number of built-in rules for convenience. They are:
//!
//! * `ASCII_DIGIT` - matches a numeric character from 0..9
//! * `ASCII_NONZERO_DIGIT` - matches a numeric character from 1..9
//! * `ASCII_BIN_DIGIT` - matches a numeric character from 0..1
//! * `ASCII_OCT_DIGIT` - matches a numeric character from 0..7
//! * `ASCII_HEX_DIGIT` - matches a hexadecimal digit from 0..9 or a..f or A..F
//! * `ASCII_ALPHA_LOWER` - matches a character from a..z
//! * `ASCII_ALPHA_UPPER` - matches a character from A..Z
//! * `ASCII_ALPHA` - matches a character from a..z or A..Z
//! * `ASCII_ALPHANUMERIC` - matches a character from a..z or A..Z or 0..9
//! * `ASCII` - matches a character from \x00..\x7f
//! * `NEWLINE` - matches either "\n" or "\r\n" or "\r"
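//!
//! Several of these rules have close counterparts among the `char` methods in Rust's standard
//! library, which can help when reasoning about what a rule accepts. The helpers below are
//! hypothetical illustrations of `ASCII_HEX_DIGIT` and `NEWLINE`, not pest's implementation.
//!
//! ```rust
//! /// Mirrors `ASCII_HEX_DIGIT`: 0..9, a..f, or A..F.
//! fn is_ascii_hex_digit(c: char) -> bool {
//!     c.is_ascii_hexdigit()
//! }
//!
//! /// Mirrors `NEWLINE`: returns the input after a leading "\n", "\r\n", or "\r".
//! fn newline(input: &str) -> Option<&str> {
//!     // "\r\n" is tried before "\r" so that the pair is consumed as one newline.
//!     ["\r\n", "\n", "\r"]
//!         .iter()
//!         .find_map(|nl| input.strip_prefix(*nl))
//! }
//! ```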

#![doc(html_root_url = "https://docs.rs/pest_derive")]
extern crate pest_generator;
extern crate proc_macro;

use proc_macro::TokenStream;

/// Entry point for `#[derive(Parser)]`: forwards the annotated item to `pest_generator`.
#[proc_macro_derive(Parser, attributes(grammar, grammar_inline))]
pub fn derive_parser(input: TokenStream) -> TokenStream {
    pest_generator::derive_parser(input.into(), true).into()
}