// pest. The Elegant Parser
// Copyright (c) 2018 Dragoș Tiselice
//
// Licensed under the Apache License, Version 2.0
// <LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0> or the MIT
// license <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. All files in the project carrying such notice may not be copied,
// modified, or distributed except according to those terms.

//! # pest. The Elegant Parser
//!
//! pest is a general purpose parser written in Rust with a focus on accessibility, correctness,
//! and performance. It uses parsing expression grammars (or [PEG]) as input, which are similar in
//! spirit to regular expressions, but which offer the enhanced expressivity needed to parse
//! complex languages.
//!
//! [PEG]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
//!
//! ## Getting started
//!
//! The recommended way to start parsing with pest is to read the official [book].
//!
//! Other helpful resources:
//!
//! * API reference on [docs.rs]
//! * play with grammars and share them on our [fiddle]
//! * leave feedback, ask questions, or greet us on [Gitter]
//!
//! [book]: https://pest-parser.github.io/book
//! [docs.rs]: https://docs.rs/pest
//! [fiddle]: https://pest-parser.github.io/#editor
//! [Gitter]: https://gitter.im/dragostis/pest
//!
//! ## `.pest` files
//!
//! Grammar definitions reside in custom `.pest` files located in the `src` directory. The path to
//! the file is relative to `src` and is given in a `grammar` attribute placed between the `derive`
//! attribute and the empty `struct` that `Parser` will be derived on.
//!
//! ```ignore
//! #[derive(Parser)]
//! #[grammar = "path/to/my_grammar.pest"] // relative to src
//! struct MyParser;
//! ```
//!
//! ## Inline grammars
//!
//! Grammars can also be inlined by using the `#[grammar_inline = "..."]` attribute.
//!
//! ## Grammar
//!
//! A grammar is a series of rules separated by whitespace, possibly containing comments.
//!
//! ### Comments
//!
//! Comments start with `//` and end at the end of the line.
//!
//! ```ignore
//! // a comment
//! ```
//!
//! ### Rules
//!
//! Rules have the following form:
//!
//! ```ignore
//! name = optional_modifier { expression }
//! ```
//!
//! The name of a rule consists of alphanumeric characters and `_`, and must not start with a
//! digit. The name is used to create token pairs: the start token is produced when the rule
//! starts being parsed, and the end token is produced when the rule finishes parsing.
//!
//! The following token pair notation `a(b(), c())` denotes the tokens: start `a`, start `b`, end
//! `b`, start `c`, end `c`, end `a`.
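//!
//! As an illustration of this notation, a minimal grammar in which `a` calls `b` and then `c`
//! could look like the sketch below. The rule names are chosen only for this example, and no
//! `WHITESPACE` or `COMMENT` rule is assumed to be defined.
//!
//! ```ignore
//! b = { "b" }
//! c = { "c" }
//! a = { b ~ c } // `b` and `c` are nested inside `a`
//! ```
//!
//! Parsing `"bc"` with rule `a` should then produce the token pairs `a(b(), c())`.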
//!
//! #### Modifiers
//!
//! Modifiers are optional and can be one of `_`, `@`, `$`, or `!`. These modifiers change the
//! behavior of the rules.
//!
//! 1. Silent (`_`)
//!
//!     Silent rules do not create token pairs during parsing, nor are they reported in errors.
//!
//!     ```ignore
//!     a = _{ "a" }
//!     b =  { a ~ "b" }
//!     ```
//!
//!     Parsing `"ab"` produces the token pair `b()`.
//!
//! 2. Atomic (`@`)
//!
//!     Atomic rules do not accept whitespace or comments within their expressions and have a
//!     cascading effect on any rule they call, i.e. rules that are not atomic but are called by
//!     atomic rules behave atomically.
//!
//!     Any rules called by atomic rules do not generate token pairs.
//!
//!     ```ignore
//!     a =  { "a" }
//!     b = @{ a ~ "b" }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing `"ab"` produces the token pair `b()`, while `"a   b"` produces an error.
//!
//! 3. Compound-atomic (`$`)
//!
//!     Compound-atomic rules are identical to atomic rules with the exception that rules called
//!     by them are not forbidden from generating token pairs.
//!
//!     ```ignore
//!     a =  { "a" }
//!     b = ${ a ~ "b" }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing `"ab"` produces the token pairs `b(a())`, while `"a   b"` produces an error.
//!
//! 4. Non-atomic (`!`)
//!
//!     Non-atomic rules are identical to normal rules with the exception that they stop the
//!     cascading effect of atomic and compound-atomic rules.
//!
//!     ```ignore
//!     a =  { "a" }
//!     b = !{ a ~ "b" }
//!     c = @{ b }
//!
//!     WHITESPACE = _{ " " }
//!     ```
//!
//!     Parsing either `"ab"` or `"a   b"` produces the token pairs `c(a())`.
//!
//! #### Expressions
//!
//! Expressions can be either terminals or non-terminals.
//!
//! 1. Terminals
//!
//!     | Terminal   | Usage                                                          |
//!     |------------|----------------------------------------------------------------|
//!     | `"a"`      | matches the exact string `"a"`                                 |
//!     | `^"a"`     | matches the exact string `"a"` case insensitively (ASCII only) |
//!     | `'a'..'z'` | matches one character between `'a'` and `'z'`                  |
//!     | `a`        | matches rule `a`                                               |
//!
//! Strings and characters follow
//! [Rust's escape mechanisms](https://doc.rust-lang.org/reference/tokens.html#byte-escapes), while
//! identifiers can contain alphanumeric characters and underscores (`_`), as long as they do not
//! start with a digit.
//!
//! 2. Non-terminals
//!
//!     | Non-terminal          | Usage                                                      |
//!     |-----------------------|------------------------------------------------------------|
//!     | `(e)`                 | matches `e`                                                |
//!     | `e1 ~ e2`             | matches the sequence `e1` `e2`                             |
//!     | <code>e1 \| e2</code> | matches either `e1` or `e2`                                |
//!     | `e*`                  | matches `e` zero or more times                             |
//!     | `e+`                  | matches `e` one or more times                              |
//!     | `e{n}`                | matches `e` exactly `n` times                              |
//!     | `e{, n}`              | matches `e` at most `n` times                              |
//!     | `e{n,}`               | matches `e` at least `n` times                             |
//!     | `e{m, n}`             | matches `e` between `m` and `n` times inclusively          |
//!     | `e?`                  | optionally matches `e`                                     |
//!     | `&e`                  | matches `e` without making progress                        |
//!     | `!e`                  | matches if `e` doesn't match, without making progress      |
//!     | `PUSH(e)`             | matches `e` and pushes its captured string onto the stack  |
//!
//!     where `e`, `e1`, and `e2` are expressions.
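//!
//! As a sketch of how these terminals and non-terminals combine, the rules below describe an
//! unsigned integer, an identifier, and a double-quoted string without escape sequences. The
//! rule names are made up for this example; `ANY` is one of the special rules described further
//! down.
//!
//! ```ignore
//! // one or more digits
//! integer = @{ '0'..'9'+ }
//! // a letter or `_`, followed by any number of letters, digits, or `_`
//! ident   = @{ ('a'..'z' | 'A'..'Z' | "_") ~ ('a'..'z' | 'A'..'Z' | '0'..'9' | "_")* }
//! // `!"\"" ~ ANY` consumes one character only if it is not a closing quote
//! string  = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }
//! ```
//!
//! All three rules are atomic here so that a `WHITESPACE` rule, if one is defined, is not
//! accepted in the middle of a token.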
//!
//! Expressions can modify the stack only if they match the input. For example,
//! if `e1` in the compound expression `e1 | e2` does not match the input, then
//! it does not modify the stack, so `e2` sees the stack in the same state as
//! `e1` did. Repetitions and optionals (`e*`, `e+`, `e{, n}`, `e{n,}`,
//! `e{m, n}`, `e?`) can modify the stack each time `e` matches. The `!e` and `&e`
//! expressions are a special case; they never modify the stack.
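//!
//! To illustrate the `e1 | e2` case, consider the following hypothetical rule applied to the
//! input `"ab"`:
//!
//! ```ignore
//! // The first alternative matches `PUSH("a")` but then fails on "x", so the alternative as a
//! // whole does not match and leaves the stack as it found it. The second alternative therefore
//! // starts from the original stack and matches "a" ~ "b".
//! e = { (PUSH("a") ~ "x") | ("a" ~ "b") }
//! ```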
//!
//! ## Special rules
//!
//! Special rules can be called within the grammar. They are:
//!
//! * `WHITESPACE` - runs between rules and sub-rules
//! * `COMMENT` - runs between rules and sub-rules
//! * `ANY` - matches exactly one `char`
//! * `SOI` - (start-of-input) matches only when a `Parser` is still at the starting position
//! * `EOI` - (end-of-input) matches only when a `Parser` has reached its end
//! * `POP` - pops a string from the stack and matches it
//! * `POP_ALL` - pops the entire state of the stack and matches it
//! * `PEEK` - peeks a string from the stack and matches it
//! * `PEEK[a..b]` - peeks part of the stack and matches it
//! * `PEEK_ALL` - peeks the entire state of the stack and matches it
//! * `DROP` - drops the top of the stack (fails to match if the stack is empty)
//!
//! `WHITESPACE` and `COMMENT` should be defined manually if needed. The other special rules
//! cannot be overridden.
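//!
//! For instance, `SOI` and `EOI` can be used to require that a rule consumes the entire input.
//! A small sketch (the rule names `file` and `item` are invented for this example):
//!
//! ```ignore
//! item = { 'a'..'z'+ }
//! // without `EOI`, `file` would also succeed on a matching prefix of the input
//! file = { SOI ~ item ~ ("," ~ item)* ~ EOI }
//! ```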
//!
//! ## `WHITESPACE` and `COMMENT`
//!
//! When defined, these rules get matched automatically in sequences (`~`) and repetitions
//! (`*`, `+`) between expressions. Atomic rules and those rules called by atomic rules are exempt
//! from this behavior.
//!
//! `WHITESPACE` should be defined to match a single whitespace character and `COMMENT` to match
//! a single comment, since both are run in repetitions.
//!
//! If both `WHITESPACE` and `COMMENT` are defined, this grammar:
//!
//! ```ignore
//! a = { b ~ c }
//! ```
//!
//! is effectively transformed into this one behind the scenes:
//!
//! ```ignore
//! a = { b ~ WHITESPACE* ~ (COMMENT ~ WHITESPACE*)* ~ c }
//! ```
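//!
//! Definitions that match one whitespace character and one comment at a time could look like
//! the sketch below. The block-comment syntax is only an assumption made for this example. Both
//! rules are silent so that they do not produce token pairs.
//!
//! ```ignore
//! WHITESPACE = _{ " " | "\t" | "\r" | "\n" }      // exactly one whitespace character
//! COMMENT    = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }  // exactly one block comment
//! ```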
//!
//! ## `PUSH`, `POP`, `DROP`, and `PEEK`
//!
//! `PUSH(e)` pushes the captured string of the expression `e` onto a stack. This stack can
//! later be used to match input based on its content with `POP` and `PEEK`.
//!
//! `PEEK` always matches the string at the top of the stack. So, if the stack contains
//! `["b", "a"]` (`"a"` being on top), this grammar:
//!
//! ```ignore
//! a = { PEEK }
//! ```
//!
//! is effectively transformed at parse time into:
//!
//! ```ignore
//! a = { "a" }
//! ```
//!
//! `POP` works the same way with the exception that it pops the string off the stack if the
//! match succeeds. With the stack from above, if `POP` matches `"a"`, the stack will be mutated
//! to `["b"]`.
//!
//! `DROP` makes it possible to remove the string at the top of the stack
//! without matching it. If the stack is nonempty, `DROP` drops the top of the
//! stack. If the stack is empty, then `DROP` fails to match.
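//!
//! A typical use of the stack is matching delimiters whose exact form is only known at parse
//! time. The sketch below is modeled on Rust-style raw strings such as `r##"text"##`; the rule
//! names are chosen for this example, and no `WHITESPACE` rule is assumed to be defined. It
//! remembers the opening run of `#` characters and requires the same run after the closing quote:
//!
//! ```ignore
//! // push the (possibly empty) run of `#`, then pop and match it again at the end
//! raw_string = { "r" ~ PUSH("#"*) ~ "\"" ~ raw_string_interior ~ "\"" ~ POP }
//! // consume characters until a quote that is followed by the pushed run of `#`
//! raw_string_interior = { (!("\"" ~ PEEK) ~ ANY)* }
//! ```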
//!
//! ### Advanced peeking
//!
//! `PEEK[start..end]` and `PEEK_ALL` allow peeking deeper into the stack. The syntax works exactly
//! like Rust’s exclusive slice syntax, with index `0` referring to the bottom of the stack.
//! Additionally, negative indices can be used to indicate an offset from the top. If the end lies
//! before or at the start, the expression matches (as does a `PEEK_ALL` on an empty stack). With
//! the stack `["c", "b", "a"]` (`"a"` on top):
//!
//! ```ignore
//! fill = PUSH("c") ~ PUSH("b") ~ PUSH("a")
//! v = { PEEK_ALL } = { "a" ~ "b" ~ "c" }  // top to bottom
//! w = { PEEK[..] } = { "c" ~ "b" ~ "a" }  // bottom to top
//! x = { PEEK[1..2] } = { PEEK[1..-1] } = { "b" }
//! y = { PEEK[..-2] } = { PEEK[0..1] } = { "c" }
//! z = { PEEK[1..] } = { PEEK[-2..3] } = { "b" ~ "a" }
//! n = { PEEK[2..-2] } = { PEEK[2..1] } = { "" }
//! ```
//!
//! For historical reasons, `PEEK_ALL` matches from top to bottom, while `PEEK[start..end]` matches
//! from bottom to top. There is currently no syntax to match a slice of the stack top to bottom.
//!
//! ## `Rule`
//!
//! All rules defined or used in the grammar populate a generated `enum` called `Rule`. This
//! `enum` implements `pest`'s `RuleType` and can be used throughout the API.
//!
//! ## Built-in rules
//!
//! pest also comes with a number of built-in rules for convenience. The character ranges below
//! are inclusive. The rules are:
//!
//! * `ASCII_DIGIT` - matches a digit in `'0'..'9'`
//! * `ASCII_NONZERO_DIGIT` - matches a digit in `'1'..'9'`
//! * `ASCII_BIN_DIGIT` - matches a digit in `'0'..'1'`
//! * `ASCII_OCT_DIGIT` - matches a digit in `'0'..'7'`
//! * `ASCII_HEX_DIGIT` - matches a digit in `'0'..'9'`, `'a'..'f'`, or `'A'..'F'`
//! * `ASCII_ALPHA_LOWER` - matches a character in `'a'..'z'`
//! * `ASCII_ALPHA_UPPER` - matches a character in `'A'..'Z'`
//! * `ASCII_ALPHA` - matches a character in `'a'..'z'` or `'A'..'Z'`
//! * `ASCII_ALPHANUMERIC` - matches a character in `'a'..'z'`, `'A'..'Z'`, or `'0'..'9'`
//! * `ASCII` - matches a character in `'\x00'..'\x7f'`
//! * `NEWLINE` - matches either `"\n"`, `"\r\n"`, or `"\r"`
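//!
//! A short sketch of how the built-in rules can shorten a grammar (the rule names on the left
//! are invented for this example):
//!
//! ```ignore
//! hex_color = @{ "#" ~ ASCII_HEX_DIGIT{6} }
//! ident     = @{ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }
//! line      =  { ident ~ "=" ~ hex_color ~ NEWLINE }
//! ```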

#![doc(html_root_url = "https://docs.rs/pest_derive")]
extern crate pest_generator;
extern crate proc_macro;

use proc_macro::TokenStream;

/// Derives `Parser` for the annotated `struct`, using the grammar named by its `grammar` or
/// `grammar_inline` attribute. The work is delegated to `pest_generator`.
#[proc_macro_derive(Parser, attributes(grammar, grammar_inline))]
pub fn derive_parser(input: TokenStream) -> TokenStream {
    pest_generator::derive_parser(input.into(), true).into()
}