1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
//! All the usual imports that the user will need to use.
//!
//! This is not the same as the prelude used by the generated code. Users should this this module,
//! (e.g. `use parce::prelude::*;`), while the generated code uses the internal prelude,
//! (e.g. `use parce::internal_prelude::*;`).
/// Generates a lexer enum and implements the [Lexer](crate::lexer::Lexer) trait for it.
///
/// Must be applied to an enum with patterns as the discriminants. Requires name of lexer to be
/// passed as argument. You will rarely (if ever) need to use the lexer directly,
/// and you can find those docs on the Lexer trait in the main crate.
///
/// This macro also generates a ParserSubmission struct, which has to be publicly accessible to the user,
/// but they shouldn't use it because it is only for internal use (it registers parsers through the
/// inventory crate).
///
/// # Examples
///
/// ## Basic Example
///
/// This lexer matches directly onto the characters 'a', 'b', and the words "Hello World!". Some valid inputs
/// to this lexer would be "ab", "aHello World!b", "abbabbbbaaababHello World!", etc. Notice how whitespace
/// will not be valid unless explicitly allowed inside one of the rules.
///
/// ```
/// use parce::prelude::*; // omitted in future examples
/// #[lexer(BasicLexer)]
/// enum BasicLexemes {
/// A = "'a'",
/// B = 'b',
/// HelloWorld = " 'Hello World!' "
/// }
///
/// assert!(BasicLexer::default().lex("aHello World!b").is_ok());
/// ```
///
/// Typically, all patterns must be surrounded by "double quotes", and all literal strings inside the patterns are
/// surrounded by 'single quotes'. *However* in the special case where the lexeme pattern is a single character,
/// the double quotes can be omitted (as in the B lexeme above).
///
/// ## Or operator
///
/// Lexemes can match on multiple different character sequences. One way to specify that is through
/// the pipe ( | ) "or" operator:
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(OrLexer)]
/// enum OrExampleLexemes {
/// Hello = " 'world' | 'there' ",
/// ValOrVar = " 'va' ('l' | 'r') ",
/// }
///
/// assert!(OrLexer::default().lex("world").is_ok());
/// assert!(OrLexer::default().lex("var").is_ok());
/// ```
///
/// The Hello lexeme would match on either "world" or "there", and the ValOrVar lexeme would match on either
/// "val" or "var". Notice the use of parenthesis in the ValOrVar lexeme.
///
/// ## Character Classes
///
/// In cases where you want to match on many possible characters, using | can be extremely tedious. Instead,
/// you can use character classes like in regex. In fact, any character classes you use are forwarded
/// directly to the [regex] crate, so you can use all of the features available there. See [their docs](https://docs.rs/regex/1.5.4/regex/index.html#character-classes).
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(ClassLexer)]
/// enum ClassLexemes {
/// Digit = " [0-9] ",
/// Alpha = " [[:alpha:]] "
/// }
/// ```
///
/// ## Repetition Operators
///
/// You can require or allow repetitions with the usual operators:
/// - `*`: Any number
/// - `+`: one or more
/// - `?`: one or one
/// - `{n}`: exactly n
/// - `{n,}`: n or more
/// - `{n,m}`: between n and m (inclusive)
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(RepeatLexer)]
/// enum RepeatLexemes {
/// AB = " 'a' ' '* 'b' ",
/// Integer = " [0-9]+ ",
/// // ...etc
/// }
/// ```
///
/// ## Skipped lexemes
///
/// Some lexemes can be skipped, meaning that if they are matched and would be added to the lexemes
/// output, the lexer instead just moves on. This is helpful for whitespace:
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(SkipLexer)]
/// enum SkipLexemes {
/// #[skip] Whitespace = " [ \n\r\t] ",
/// A = 'a',
/// B = 'b'
/// }
/// ```
///
/// This lexer will act like all spaces, tabs, newlines, and carraige returns were removed from the input
/// string. *However* if, during parsing, you capture a portion of the input string in the output,
/// any characters associated with skipped lexemes inside that portion will still be present.
///
/// ## Nested Lexemes
///
/// Lexemes can require other lexemes. This can be helpful to avoid code reuse.
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(NestLexer)]
/// enum NestLexemes {
/// Integer = " [0-9]+ ",
/// Float = " Integer? '.' Integer | Integer '.' "
/// }
/// assert!(NestLexer::default().lex("0.1").is_ok());
/// assert!(NestLexer::default().lex(".1").is_ok());
/// assert!(NestLexer::default().lex("0.").is_ok());
/// ```
///
/// Recursion is usually fine, but left recursive lexemes will not lex, and instead will stack overflow.
/// Your lexers are not checked for left recursion. An example is:
///
/// ```should_panic
/// # use parce::prelude::*;
/// #[lexer(BadLexer)]
/// enum BadLexemes {
/// RecurseLeft = " 'a' | RecurseLeft 'b' ", // <- don't be like this guy
/// }
///
/// // PANICS due to stack overflow!
/// BadLexer::default().lex("baaa");
/// ```
///
/// Technically this is valid syntax, and defines an unambiguous lexer, but in practice it will get stuck
/// in a loop.
///
/// ## Lexeme fragments
///
/// Lexeme fragments are lexemes that will not be matched on their own, but can be nested into other lexemes.
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(FragmentLexer)]
/// enum FragmentLexemes {
/// #[frag] FragA = 'a',
/// AB = " FragA 'b' "
/// }
///
/// assert!(FragmentLexer::default().lex("ab").is_ok());
/// assert!(FragmentLexer::default().lex("a").is_err());
/// ```
///
/// Adding `#[skip]` to a lexeme that is already `#[frag]` will do nothing, and it will behave like a
/// normal fragment.
///
/// ## Modal Lexers
///
/// Lexers can have multiple modes. So far, all of the examples have been single-mode lexers, but you
/// can have as many modes as you like. First, you declare the modes you want to use above the enum.
/// The first mode in the list is the default mode that your lexer will start in.
///
/// Then, for each lexeme, you can assign it to one or modes, and it will only be matched if the lexer
/// is in one of those modes. Also, upon matching a given lexeme you can set the lexer to another mode
/// (which takes effect *after* the current lexeme is matched).
///
/// The mode attribute applies to the lexeme it is applied to and all further lexemes until you
/// use another mode attribute. Lexemes at the top of the enum that don't have a mode applied to them
/// will be in the default mode.
///
/// ```
/// # use parce::prelude::*;
/// #[lexer(ModalLexer)]
/// #[modes(Mode0, Mode1)] // Mode0 is the default
/// enum ModalLexemes {
/// A = 'a', // only in Mode0
/// #[set_mode(Mode1)] B = 'b', // only in Mode0; sets mode to Mode1 after being matched
///
/// #[mode(Mode1)]
/// C = 'c', // only in Mode1
/// #[set_mode(Mode0)] D = 'd', // only in Mode1; sets mode back to Mode0 after being matched
///
/// #[mode(Mode0, Mode1)]
/// E = 'e', // matched in either mode
/// #[skip] Whitespace = " [ \n\t\r] " // matched in either mode, and skipped
/// }
///
/// assert!(ModalLexer::default().lex("aeab ccecd ab cd").is_ok());
/// // Modes: 1--- 2---- 1- 2-
///
/// assert!(ModalLexer::default().lex("aaa c").is_err())
/// // Modes: 1-- x fails because mode was not set to Mode1 before the c.
/// ```
///
/// Applying `#[mode]` or `#[set_mode]` to a fragment lexeme will do nothing. Fragments do not have modes,
/// they can be used in any mode that has a lexeme that requires them. They also cannot set a new mode
/// because they are never matched directly.
pub use lexer;
/// Generates an implementation of the [Parseable](crate::parser::Parseable) and [FromStr](std::str::FromStr)
/// traits for an enum.
///
/// Must be applied to an enum with patterns as the discriminants. Requires name of *lexer* to be
/// passed as argument.
///
/// # Examples
///
/// ## Basic Example
///
///
pub use parser;
pub use crateLexer;
pub use crateParse;
pub use crateParseCompletion;
pub use crate;