tree_builder/
lib.rs

//! # TreeBuilder, a lightweight, nom-based parser generator library for Rust
//!
//! TreeBuilder is a parser generator focused on correctly parsing ASCII input strings against
//! context-free grammars. It generates recursive-descent, backtracking parsers with as little code
//! as possible and with convenient features. The parsers it generates aim to be very easy to use
//! and to be compatible with Nom parsers, should you wish to use Nom together with TreeBuilder.
//! This version of TreeBuilder does not support left-recursive grammars.
//!
//! TreeBuilder's [Parser] trait shows the type signature of all
//! TreeBuilder-generated parsers.
//!
//! Example:
//!
//! ```rust
//! use tree_builder::{build_tree, Parser};
//!
//! build_tree! {
//!     // Parses a Hex color
//!     HexColor #=> "#",
//!                  HexDigit,
//!                  HexDigit?, HexDigit?, HexDigit?, HexDigit?, HexDigit?;
//!
//!     // Parses a Hex digit
//!     HexDigit #=> [0-9a-fA-F];
//! }
//!
//! fn main() {
//!     // Will pass
//!     HexColor::parse("#abd325").unwrap();
//!     HexColor::parse("#asd000").unwrap();
//!     // Will fail
//!     HexColor::parse("#whatever").unwrap();
//! }
//! ```
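//!
//! To make the behaviour of the rule above concrete, here is a hand-written
//! sketch of the same recognition logic in plain Rust. It is not the code
//! TreeBuilder generates, and `hex_color` is a hypothetical helper. It
//! assumes that, like other nom-style parsers, a rule succeeds on a matching
//! prefix and hands back the unconsumed remainder, which would explain why
//! `"#asd000"` is listed among the passing inputs.
//!
//! ```rust
//! /// Stand-in for the `HexColor` rule: returns the matched prefix and the
//! /// unconsumed remainder, or `None` when the rule fails.
//! fn hex_color(input: &str) -> Option<(&str, &str)> {
//!     // The literal "#" must come first.
//!     let rest = input.strip_prefix('#')?;
//!     // Count leading hex digits, capped at six (one required, five optional).
//!     let digits = rest
//!         .chars()
//!         .take(6)
//!         .take_while(|c| c.is_ascii_hexdigit())
//!         .count();
//!     if digits == 0 {
//!         return None; // the mandatory HexDigit is missing
//!     }
//!     // Hex digits are one byte each, so `1 + digits` is a valid split point.
//!     Some(input.split_at(1 + digits))
//! }
//!
//! fn main() {
//!     assert_eq!(hex_color("#abd325"), Some(("#abd325", "")));
//!     assert_eq!(hex_color("#asd000"), Some(("#a", "sd000")));
//!     assert_eq!(hex_color("#whatever"), None);
//! }
//! ```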
//!
//! ## Kinds of rules
//!
//! TreeBuilder can either generate parsers whose output data structure is just
//! a String, or parsers whose output data structure is a more complicated tuple
//! type. If you use the arrow **#=>** to separate the rule name from its
//! definition, you have created a lexical rule which outputs String, but if you
//! use **=>**, you have the option to specify which parts of a rule definition
//! you wish to keep.
//!
//! You specify the parts you wish to keep using the **@** operator (called
//! Include). The data structure of a parser is built from the includes you
//! have made, in order from left to right.
//!
//! Take, for example, the rules Array and ArrayElems below, which parse JSON
//! arrays:
//!
//! ```rust
//! build_tree!{
//!     JValue => /* Definition of a JSON value */;
//!
//!     Array => "[", #s*, @ArrayElems?, #s*, "]";
//!     ArrayElems => @JValue, #s*, @(",", #s*, @JValue, #s*)*;
//! }
//! ```
//!
//! The data structures generated for the rules above are:
//!
//! ```rust
//! struct Array(Option<Box<ArrayElems>>);
//!
//! struct ArrayElems(Box<JValue>, Vec<Box<JValue>>);
//! ```
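//!
//! As a sketch of how such output might be consumed, the block below declares
//! stand-in types with the same shapes by hand instead of invoking
//! `build_tree!`. It assumes the optional include maps to `Option`, simplifies
//! `JValue` to a string wrapper, and uses a hypothetical helper named
//! `elements` to flatten an array.
//!
//! ```rust
//! // Hand-declared stand-ins; `JValue` is simplified for this sketch.
//! struct JValue(String);
//! struct ArrayElems(Box<JValue>, Vec<Box<JValue>>);
//! struct Array(Option<Box<ArrayElems>>);
//!
//! /// Flattens an `Array` into a list of references to its elements.
//! fn elements(array: &Array) -> Vec<&JValue> {
//!     match &array.0 {
//!         // `@ArrayElems?` matched nothing: the array is empty.
//!         None => Vec::new(),
//!         // First include is the head element, second the repeated group.
//!         Some(elems) => {
//!             let mut out = vec![&*elems.0];
//!             out.extend(elems.1.iter().map(|v| &**v));
//!             out
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let array = Array(Some(Box::new(ArrayElems(
//!         Box::new(JValue("1".into())),
//!         vec![Box::new(JValue("2".into())), Box::new(JValue("3".into()))],
//!     ))));
//!     let values: Vec<&str> = elements(&array).iter().map(|v| v.0.as_str()).collect();
//!     assert_eq!(values, ["1", "2", "3"]);
//!     assert!(elements(&Array(None)).is_empty());
//! }
//! ```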
//!
//! When you use alternations, the generated data structures are enums. A
//! variant's name is inferred when the alternative includes a single
//! non-terminal; otherwise you need to specify it (as with `<True>` below).
//!
//! Alternation rule example:
//!
//! ```rust
//! build_tree!{
//! // Rule representing a JSON value
//! JValue => @JString
//!        |  @Number
//!        |  @Object
//!        |  @Array
//!        |  "true" <True>
//!        |  "false" <False>
//!        |  "null" <Null>;
//! }
//! ```
//!
//! And the data structure generated by this rule:
//!
//! ```rust
//! enum JValue {
//!     JString(Box<JString>),
//!     Number(Box<Number>),
//!     Object(Box<Object>),
//!     Array(Box<Array>),
//!     True(),
//!     False(),
//!     Null(),
//! }
//! ```
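//!
//! An enum of this shape is typically consumed with a `match`. The sketch
//! below declares a trimmed, hand-written stand-in (four variants, with
//! `Number` as a simplified placeholder type) instead of using the generated
//! one; `type_name` is a hypothetical helper.
//!
//! ```rust
//! // Simplified placeholder payload type for this sketch.
//! struct Number(String);
//!
//! enum JValue {
//!     Number(Box<Number>),
//!     True(),
//!     False(),
//!     Null(),
//! }
//!
//! /// Names the JSON type of a parsed value.
//! fn type_name(value: &JValue) -> &'static str {
//!     match value {
//!         // Inferred variant name: the alternative includes one non-terminal.
//!         JValue::Number(_) => "number",
//!         // Explicitly named variants carry no data, hence the empty `()`.
//!         JValue::True() | JValue::False() => "boolean",
//!         JValue::Null() => "null",
//!     }
//! }
//!
//! fn main() {
//!     let number = JValue::Number(Box::new(Number("42".into())));
//!     assert_eq!(type_name(&number), "number");
//!     assert_eq!(type_name(&JValue::True()), "boolean");
//!     assert_eq!(type_name(&JValue::Null()), "null");
//! }
//! ```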
//!
//! Keep in mind that in the rules denoted by **=>**, you cannot use
//! alternations inside of groupings at the moment. This limitation will
//! probably be fixed in the next update.
//!
//! ## TreeBuilder's language
//!
//! TreeBuilder allows for the specification of grammars which use terminals,
//! non-terminals, metacharacters and groupings/sub-parsers. Operators can be
//! applied to these elements to define how many times each of them may be
//! repeated.
//!
//! ### Available terms:
//!
//! 1. **Terminals**: Terminals are written just like strings in any C-like
//!    language. They are defined as a sequence of characters delimited by
//!    double quotes, where each character is either any character except `\`
//!    and `"`, or a character escape sequence such as `\n`.
//!    The possible escape sequences are:
//!      * `\n` (newline)
//!      * `\r` (carriage return)
//!      * `\t` (tab)
//!      * `\b` (non-destructive backspace)
//!      * `\f` (form feed)
//!      * `\\` (backslash)
//!      * `\"` (double quote)
//!
//!    Some examples of terminals:
//!      * `"for"`
//!      * `"if"`
//!      * `"a123\nb!?.\tc\""`
//!
//! 2. **Metacharacters**: A metacharacter is a special character that carries a
//!    specific meaning or functionality within a grammar specification. Unlike
//!    terminals, which parse their string equivalent, metacharacters have
//!    special interpretations and serve as building blocks for constructing
//!    powerful and elegant parser rules with as little code as possible. Here
//!    are the metacharacters and their meanings:
//!
//!    * **. (Dot)**: Matches any single character except a newline. It acts as a wildcard,
//!      representing any character in the input string.
//!    * **\[ \] (Character Class)**: Matches any single character specified inside the square
//!      brackets. For example, \[aeiou\] matches any vowel character.
//!    * **\[^ \] (Negated Character Class)**: Matches any single character that is not specified
//!      inside the square brackets. For example, \[^0-9\] matches any non-digit character.
//!    * **#d**: Matches any digit character, from 0 to 9. It is equivalent to the character
//!      class \[0-9\].
//!    * **#D**: Matches any character which isn't a digit. It is equivalent to the character
//!      class \[^0-9\].
//!    * **#w**: Matches any word character. It includes alphanumeric characters (a-z, A-Z, and
//!      0-9) as well as the underscore (_) character. It is equivalent to the character class
//!      \[a-zA-Z0-9_\].
//!    * **#W**: Matches any non-word character. It excludes alphanumeric characters (a-z, A-Z,
//!      and 0-9) as well as the underscore (_) character, matching anything else. It is
//!      equivalent to the character class \[^a-zA-Z0-9_\].
//!    * **#s**: Matches any single ASCII whitespace character, that is, any one of `\f`, `\n`,
//!      `\r`, `\t`.
//!
//! 3. **Nonterminals**: Their syntax is exactly the same as that of ASCII Rust identifiers: a
//!    string beginning with either an alphabetic character or an underscore, followed by any
//!    number of alphanumerics and underscores. In a grammar definition, a nonterminal specifies
//!    that another rule is part of the rule you are currently defining. The parser generated
//!    from a rule which uses a nonterminal applies the parser related to that nonterminal at
//!    the point where the nonterminal appears.
//!
//! 4. **Groupings**: Grouping refers to the act of enclosing a sequence of elements or
//!    subexpressions within parentheses. It allows for establishing precedence and controlling
//!    the evaluation order of elements within a grammar specification. Groupings count as one
//!    of the smaller elements of the language because operators can be applied to groupings
//!    just like they can be applied to the other terms.
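//!
//! The character sets behind the metacharacters in item 2 can be restated as
//! plain Rust predicates. This is only an illustration of the documented
//! equivalences, not the implementation TreeBuilder uses; the function names
//! are hypothetical.
//!
//! ```rust
//! /// `#d`: same set as the character class [0-9].
//! fn is_digit(c: char) -> bool {
//!     c.is_ascii_digit()
//! }
//!
//! /// `#w`: same set as the character class [a-zA-Z0-9_].
//! fn is_word(c: char) -> bool {
//!     c.is_ascii_alphanumeric() || c == '_'
//! }
//!
//! /// `.`: any single character except a newline.
//! fn is_dot(c: char) -> bool {
//!     c != '\n'
//! }
//!
//! fn main() {
//!     assert!(is_digit('7') && !is_digit('x'));
//!     assert!(is_word('_') && !is_word('-'));
//!     // `#D` and `#W` accept exactly what `#d` and `#w` reject.
//!     assert!(!is_digit('x') && !is_word('-'));
//!     assert!(is_dot('a') && !is_dot('\n'));
//! }
//! ```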
//!
//! ### Available operators:
//!
//! | Operator | Description        |
//! |----------|--------------------|
//! | e \| e   | Alternation        |
//! | @e       | Include            |
//! | e?       | Optional           |
//! | e*       | Zero-or-more       |
//! | e+       | One-or-more        |
//!
//! Alternations in TreeBuilder are ordered choices, meaning that out of all
//! the alternatives that you may define for a rule, at most one can succeed.
//! TreeBuilder tries the alternatives in the order in which they are defined,
//! from the first to the last, and as soon as one of them successfully parses
//! the input string, the alternatives after it are ignored completely.
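//!
//! A practical consequence of ordered choice is that the order of
//! alternatives matters when one is a prefix of another. The sketch below
//! models the behaviour for literal alternatives in plain Rust;
//! `ordered_choice` is a hypothetical helper, not part of TreeBuilder.
//!
//! ```rust
//! /// Tries each literal alternative in order and commits to the first
//! /// one that matches, returning the match and the unconsumed input.
//! fn ordered_choice<'a>(alternatives: &[&str], input: &'a str) -> Option<(&'a str, &'a str)> {
//!     alternatives.iter().find_map(|alt| {
//!         input
//!             .strip_prefix(*alt)
//!             .map(|rest| (&input[..alt.len()], rest))
//!     })
//! }
//!
//! fn main() {
//!     // "a" is listed first, so "ab" is never tried on this input.
//!     assert_eq!(ordered_choice(&["a", "ab"], "abc"), Some(("a", "bc")));
//!     // Listing the longer alternative first lets it match.
//!     assert_eq!(ordered_choice(&["ab", "a"], "abc"), Some(("ab", "c")));
//!     assert_eq!(ordered_choice(&["a", "ab"], "xyz"), None);
//! }
//! ```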
//!

pub use __private::nom::error::convert_error;
/// Accepts a single grammar rule. This macro only generates a parser for
/// a data structure; it expects you to supply the data structure yourself.
/// This was added to allow for custom derives and visibility modifiers for data
/// structures.
///
/// Example:
/// ```
/// use tree_builder::ast_parser_maker;
///
/// #[derive(Clone, Debug)]
/// pub struct Digit(String);
///
/// ast_parser_maker!{
///     Digit => @#d*
/// }
/// ```
pub use tree_builder_macro::ast_parser_maker;
/// Accepts grammar specifications separated by semicolons. This macro generates
/// parsers and data structures for all the grammar specifications passed to it.
///
/// Example:
/// ```
/// use tree_builder::build_tree;
///
/// build_tree!{
///     Digit => #d;
///     Letter => [a-zA-Z];
/// }
/// ```
pub use tree_builder_macro::build_tree;
/// Accepts a single grammar rule. This macro generates
/// a parser and a data structure for the grammar specification passed to it.
///
/// Example:
/// ```
/// use tree_builder::rule;
///
/// rule!{
///     Digit => #d
/// }
/// ```
pub use tree_builder_macro::rule;
/// Module which gives the generated parsers access to Nom and other
/// dependencies
pub mod __private;
/// Module in which new parsers written for this library are kept
pub mod public_parsers;

/// Trait which specifies the function `parse`, the function which gets
/// generated by TreeBuilder.
pub trait Parser {
    fn parse(input: &str) -> nom::IResult<&str, Box<Self>, nom::error::VerboseError<&str>>;
}