//! # Grammar Reference
//! ```text
//! "abc" '-'? hex+ | _[3] & !"abc" & 'a'..'z' | (str: String = '"' !'"'* '"') !~','
//! ```
//! The `gramex` grammar syntax is inspired by standard metasyntax languages (notably **Wirth syntax notation (WSN)**) and regular expressions, but with a native Rust flavor.
//!
//! The `gramex` grammar operates on a stream of tokens; the stream must implement the [`MatchAble`](`crate::MatchAble`) trait.
//!
//! `gramex` is composed of expressions that define patterns to match against that stream.
//!
//! `gramex` features the following expressions:
//! - **[`unit`](#unit)**: A matcher with modifiers.
//! - **[`range`](#range)**: Matches a token against an inclusive range.
//! - **[`sequence`](#sequence)**: Matches a sequence of expressions.
//! - **[`or`](#or)**: Matches exactly one of the given expressions.
//! - **[`and`](#and)**: Matches multiple expressions against the exact same input.
//! - **[`imply`](#imply)**: Matches an expression only if a condition expression matches.
//! - **[`capture`](#capture)**: Matches a section and captures its value.
//!
//! **Precedence:** `capture` > `range` > `unit` > `and` > `sequence` > `imply` > `or`.
//!
//! # Atoms
//! ```text
//! 'a' 1 "abc" _
//! ident path::to::matcher list<alpha+, ','>
//! { value.field } { |value, ind, status| ... } ( 'a' 'b' 'c' )
//! ```
//! Atoms are the fundamental units of the grammar; they perform the actual token-to-token matching.
//!
//! Atoms primarily resolve into values for which the stream implements [`MatchBy`](crate::MatchBy).
//!
//! Atoms come in the following types:
//! #### Literal
//! Any [literal](https://doc.rust-lang.org/stable/reference/tokens.html#literals) (strings, numbers, booleans, floats) that the stream implements [`MatchBy`](crate::MatchBy) for.
//! ```rust
//! # use crate::{matches};
//! assert!(matches!("abc": str, 'a' "bc"));
//! assert!(!matches!("cba": str, "abc"));
//! ```
//! #### Path
//! A [path](https://doc.rust-lang.org/stable/reference/paths.html#simple-paths) to a matching item (constant, static, function, or unit struct).
//! ```rust
//! # use crate::*;
//! mod example {
//! pub const PAT: &str = "b";
//! }
//! static PAT1: &str = "a";
//! let pat2 = "c";
//! assert!(matches!("abc": str, PAT1 example::PAT pat2));
//! ```
//! #### Any (`_`)
//! Matches any single token. Under the hood, this calls [`MatchAble::skip_1`](crate::MatchAble::skip_1).
//! #### Block
//! A [block expression](https://doc.rust-lang.org/stable/reference/expressions/block-expr.html) that resolves to a matching value. It is evaluated on each iteration.
//! ```rust
//! # use crate::*;
//! let pats = ['a', 'b'];
//! assert!(matches!("abc": str, { pats[0] } { pats[1] } { a(|v| v == 'c') } ));
//! ```
//! #### Group
//! An expression wrapped inside parentheses.
//! ```rust
//! # use crate::*;
//! assert!(matches!("ab123": str, ('a'..'z')+ ('0'..'9' | '-')*));
//! ```
//! #### Call
//! Matches using a [parameterized matcher](glossary#parameterized-matcher), called with a set of matchers created from the passed expressions.
//! ```rust
//! # use crate::*;
//! assert!(matches!("a,b,c": str, list<alpha+, ','>));
//! ```
//!
//! # Unit
//! ```text
//! "abc" !'a' ~"dec" hex? !('a'..'z')[3..5]
//! ```
//! A unit is an [`atom`](#atoms) combined with optional modifiers and repetitions.
//!
//! ### Modifiers
//! Modifiers are operators prefixed to the atom that change its behavior.
//!
//! #### Not (`!`)
//! Matches exactly one token if the atom doesn't match.
//!
//! It consumes exactly one token upon success, even if the negated pattern spans multiple tokens.
//!
//! It will fail on incomplete input.
//! ```rust
//! # use crate::*;
//! assert!(matches!("b": str, !'a'));
//! assert!(matches!("b": str, !"abc"));
//! assert!(!matches!("abc": str, !"abc"));
//! ```
//! #### Near (`~`)
//! Matches an atom without advancing the stream.
//!
//! Can be prefixed with the `!` modifier to invert the result of the lookahead (does not fail on incomplete input).
//! ```rust
//! # use crate::*;
//! assert!(matches!("abc": str, ~'a' ~"abc" !~"dec" _[3]));
//! ```
//!
//! ### Repetition
//! Repetitions are operators suffixed to the atom that specify how many times it should be matched.
//!
//! #### Optional (`?`)
//! Matches the atom 0 or 1 time.
//! ```rust
//! # use crate::*;
//! assert!(matches!("ab": str, 'a'? 'b' 'c'?));
//! ```
//! #### Multi (`*`)
//! Matches the atom 0 or more times.
//! ```rust
//! # use crate::*;
//! assert!(matches!("aaac": str, 'a'* 'b'* 'c'*));
//! ```
//! #### Plus (`+`)
//! Matches the atom 1 or more times.
//! ```rust
//! # use crate::*;
//! assert!(matches!("aaab": str, 'a'+ 'b'+));
//! assert!(!matches!("aaac": str, 'a'+ 'b'+ 'c'+));
//! ```
//! #### Exact (`[count]`)
//! Matches the atom exactly `count` times.
//! ```rust
//! # use crate::*;
//! assert!(matches!("aaabb": str, 'a'[3] 'b'[2]));
//! assert!(!matches!("a": str, 'a'[2]));
//! assert!(!matches!("aaa": str, 'a'[2]));
//! ```
//! #### Range (`[min..max]`)
//! Matches the atom between `min` and `max` (inclusive) times.
//!
//! `min` and `max` are optional; they default to `0` and infinity, respectively.
//! ```rust
//! # use crate::*;
//! assert!(matches!("aaabbcccc": str, 'a'[2..4] 'b'[..3] 'c'[3..]));
//! assert!(!matches!("a": str, 'a'[2..]));
//! assert!(!matches!("aaaaa": str, 'a'[2..4]));
//! ```
//! <br>
//!
//! **Note**: `?` is `[0..1]`, `*` is `[0..]`, `+` is `[1..]`, and no repetition implies `[1]`.
//!
//! Unbounded repetition is greedy, stopping only at a mismatch or the end of input. You can use an [`&` expression](#and) to control the bounds.
//! ```rust
//! # use crate::*;
//! // `!"abc"*` consumes everything up to the trailing "abc"; `"ab"*` then runs on that slice only
//! assert!(matches!("ababababc": str, !"abc"* & "ab"* _[3]));
//! ```
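//!
//! As a rough mental model of bounded, greedy repetition, the sketch below counts leading tokens in plain Rust. It is not how `gramex` is implemented: `greedy_repeat` is a hypothetical helper, and it assumes a single `char` token.
//! ```rust
//! /// Models `token[min..max]`: greedily counts leading `token`s, stopping at
//! /// `max` (if any), and fails when fewer than `min` were found.
//! /// Returns the number of tokens consumed.
//! fn greedy_repeat(input: &str, token: char, min: usize, max: Option<usize>) -> Option<usize> {
//!     // No upper bound means "keep going until a mismatch or the end of input".
//!     let limit = max.unwrap_or(usize::MAX);
//!     let count = input
//!         .chars()
//!         .take(limit)
//!         .take_while(|&t| t == token)
//!         .count();
//!     if count >= min { Some(count) } else { None }
//! }
//! ```
//! Under this model, `greedy_repeat("aaaaa", 'a', 2, Some(4))` consumes four tokens and leaves one `a` unmatched, which is why `'a'[2..4]` rejects `"aaaaa"` as a whole.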
//!
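//! When a unit has both a modifier and a repetition, the grouping depends on the modifier. With [`!`](#not), the repetition applies to the negated atom (`!'a'[3]` is `(!'a')[3]`). With [`~`](#near), the lookahead wraps the repeated atom (`~'b'[2]` is `~('b'[2])`).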
//! ```rust
//! # use crate::*;
//! assert!(!matches!("bbc": str, !~'b'[3] ~'b'[2] !'a'[3]));
//! assert!(!matches!("bbc": str, !~('b'[3]) ~('b'[2]) (!'a')[3]));
//! ```
//!
//! # Range
//! Range expressions (`lh`..`rh`) match a single token and check if it falls between `lh` and `rh` inclusively.
//!
//! `lh` and `rh` must only be literals, paths, or block atoms (without any modifiers or repetitions), or a group containing such atoms.
//!
//! `lh` and `rh` can be of different atom types but must resolve to the same underlying type.
//!
//! Range expressions resolve to [`RangeInclusive`](std::ops::RangeInclusive).
//! ```rust
//! # use crate::*;
//! assert!(matches!("abc": str, 'a'..'z' !('0'..'9') (('a'..'z'))+));
//! assert!(!matches!("1": str, 'a'..'z'));
//! ```
//!
//! # Sequence
//! A sequence expression is a list of expressions separated by whitespace and matched in order.
//! ```rust
//! # use crate::*;
//! assert!(matches!("abc": str, 'a' 'b' 'c'));
//! ```
//!
//! # Or
//! An `or` expression is a `|`-separated list of expressions that matches exactly one of them.
//!
//! The first expression that matches wins, and the rest are ignored. If none match, the error returned is from the last expression evaluated.
//!
//! `or` expressions have the lowest precedence, meaning they wrap all expressions until the next `|` or the end of the group.
//! ```rust
//! # use crate::*;
//! assert!(matches!("a": str, 'a' | 'b' 'c' | "abc"));
//! assert!(matches!("bcd": str, ('a' | 'b' 'c' | "abc") 'd'));
//! assert!(!matches!("d": str, 'a' | 'b' 'c' | "abc"));
//! // 'a' wins, leaving "bc" unmatched, which fails the overall match
//! assert!(!matches!("abc": str, 'a' | 'b' 'c' | "abc"));
//! ```
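//!
//! The first-match-wins rule can be pictured with a small plain-Rust model. This is a sketch under simplifying assumptions (every alternative is a string literal, and `first_match` is a hypothetical helper), not `gramex`'s implementation.
//! ```rust
//! /// Models `a | b | c` for literal alternatives: tries each one in order and
//! /// returns the unconsumed remainder after the first that matches.
//! fn first_match<'a>(input: &'a str, alternatives: &[&str]) -> Option<&'a str> {
//!     alternatives
//!         .iter()
//!         .find_map(|alt| input.strip_prefix(*alt))
//! }
//! ```
//! With the alternatives `["a", "bc", "abc"]`, the input `"abc"` yields the remainder `"bc"`: the first alternative wins and the later, longer one is never tried.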
//!
//! # And
//! An `and` expression is an `&`-separated list of expressions that matches all of them against the exact same input.
//!
//! The first expression specifies the bounded matched section. The rest of the expressions then match against that specific section, ignoring any excess input. The first failure wins.
//!
//! `and` expressions have higher precedence than sequences.
//! ```rust
//! # use crate::*;
//! assert!(matches!("abc": str, ('a'..'z')[3] & ('a' _ _)));
//! assert!(matches!("abc": str, ('a'..'z')[3] & 'a' & (_ 'b')));
//! assert!(!matches!("abc": str, ('a'..'z')[3] & !"abc" & { touch(|_| print!("not reached")) }));
//! ```
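//!
//! To illustrate how the first operand bounds the section that the others see, here is a simplified plain-Rust model. It assumes the first operand is a fixed-length skip and the remaining operands are literal prefixes; `and_prefixes` is a hypothetical helper, not part of `gramex`.
//! ```rust
//! /// Models `_[n] & p1 & p2 ...`: the first operand fixes the section (here,
//! /// the first `section_len` bytes), and every other operand must match at the
//! /// start of that section. Returns the input left after the section.
//! fn and_prefixes<'a>(input: &'a str, section_len: usize, prefixes: &[&str]) -> Option<&'a str> {
//!     // `get` returns `None` if the input is too short for the section.
//!     let section = input.get(..section_len)?;
//!     if prefixes.iter().all(|p| section.starts_with(*p)) {
//!         Some(&input[section_len..])
//!     } else {
//!         None
//!     }
//! }
//! ```
//! In this model, `and_prefixes("abcd", 3, &["a", "abcd"])` fails: the second prefix would need the `d` that lies outside the three-byte section.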
//!
//! # Imply
//! An `imply` expression (`cond -> expr`) matches `expr` only if `cond` matches.
//!
//! It matches `expr` against the same input as `cond`, and matches nothing if `cond` fails.
//!
//! It has lower precedence than `sequence`s and `and`s, but higher precedence than `or`s.
//!
//! ```rust
//! # use crate::*;
//! assert!(matches!("abc123": str, alpha -> (alpha | dec)+));
//! assert!(matches!("": str, alpha -> (alpha | dec)+));
//! ```
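//!
//! The behavior above can be modeled in a few lines of plain Rust. This is only a sketch: `imply` is a hypothetical helper, both operands are assumed to be literals, and it assumes that a matching `cond` followed by a failing `expr` fails the whole expression.
//! ```rust
//! /// Models `cond -> expr` for literal operands. If `cond` does not match,
//! /// the expression succeeds without consuming anything; otherwise `expr`
//! /// is matched against the same input that `cond` was tested on.
//! fn imply<'a>(input: &'a str, cond: &str, expr: &str) -> Option<&'a str> {
//!     if input.starts_with(cond) {
//!         input.strip_prefix(expr)
//!     } else {
//!         Some(input)
//!     }
//! }
//! ```
//! Here `imply("bc", "b", "bc")` consumes the whole input, while `imply("", "b", "bc")` succeeds and leaves the (empty) input untouched.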
//!
//! # Captures
//! ```text
//! (ident = 'a'..'z' | 'A'..'Z' | '0'..'9' | '_')
//! (value*: String = nb | str | ident)
//! ```
//! Captures are matched sections that are extracted for later use.
//!
//! Captures are defined inside parentheses with the syntax `(name = expr)`, where `name` is the identifier and `expr` is the matched expression.
//!
//! Captures can occur in any expression except inside modified or repeated units and inside call atom arguments.
//! ```text
//! (allowed1 = 1) (allowed2 = 2) & (allowed3 = 3) | (allowed4 = 4) ((allowed5 = 5 (allowed6 = 6)))
//! !(not_allowed1 = 1) ((not_allowed2 = 2))? list<(not_allowed3 = 3), ','>
//! ```
//! Captures can be repeated by suffixing their name with a [repetition](#repetition) operator.
//!
//! `?` resolves to [`Option<T>`] and others resolve to [`Vec<T>`], where `T` is the type of the capture.
//! ```rust
//! # use crate::*;
//! assert_eq!(try_match!("abcd": str, 'a' (bc? = "bc") 'd').unwrap().bc, Some("bc"));
//! assert_eq!(try_match!("ad": str, 'a' (bc? = "bc") 'd').unwrap().bc, None);
//! assert_eq!(try_match!("abcbcbcd": str, 'a' (bc[2..5] = "bc") 'd').unwrap().bc, vec!["bc", "bc", "bc"]);
//! ```
//!
//! ### Capture Types
//! #### Normal
//! The matched expression does not contain any nested captures.
//!
//! These captures inherit the type of [`MatchAble::Slice`].
//! ```rust
//! # use crate::*;
//! assert_eq!(try_match!("abcd": str, 'a' (bc = "bc") 'd').unwrap().bc, "bc");
//! ```
//!
//! #### Term
//! Matches a local unparameterized [term](crate::gramex!#term) and inherits the type of that term capture.
//!
//! Their matched expression must be a lone, unmodified [`path` atom](#path) referring to that term.
//! ```rust
//! # use crate::*;
//! gramex! {
//! for str;
//! let ident = ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')+;
//! let value = (ident = ident);
//! }
//! assert_eq!(match_value("abc").unwrap().ident, "abc");
//! ```
//!
//! #### Imply
//! Matches an [`imply`](#imply) expression. It resolves to `None` if `cond` fails and to `Some(T)` if `expr` matches, where `T` is the capture type of `expr`.
//!
//! It inherits `expr`'s own capture type in type maps.
//!
//! ```rust
//! # use crate::*;
//! assert_eq!(try_match!("abc": str, 'a' (b = 'b' -> "bc")).unwrap().b, Some("bc"));
//! assert_eq!(try_match!("a": str, 'a' (b = 'b' -> "bc")).unwrap().b, None);
//! ```
//!
//! #### Structured
//! Captures that contain nested captures inside them.
//!
//! These nested captures can occur in any allowed place except inside an [`or`](#or) expression.
//!
//! Structured captures generate their own type, a struct containing their inner captures as fields, plus a `matched` field containing the whole matched section.
//! ```rust
//! # use crate::*;
//! let capture = try_match!("abcd": str, 'a' (bc = (b = 'b') (c = 'c')) 'd').unwrap().bc;
//! assert_eq!(capture.matched, "bc");
//! assert_eq!(capture.b, "b");
//! assert_eq!(capture.c, "c");
//! ```
//!
//! #### Enumerated
//! The matched expression is an [`or`](#or) expression containing nested captures.
//!
//! The inner captures do not need to be the `or` branches directly; they can be nested deeper inside them. However, no two captures can exist in the same branch.
//!
//! Not all branches need to contain captures.
//!
//! Enumerated captures generate their own enum type, representing the inner captures as variants, plus a default `None` variant if not all branches have captures.
//! ```rust
//! # use crate::*;
//! gramex! { for str; let example = 'a' (bc = "bc" | (b2c3 = "bbccc") | (b3c1 = "bbbc")) 'd'; }
//! use example_captures::root_types::bc as BC;
//! assert_eq!(match_example("abcd").unwrap().bc, BC::None);
//! assert_eq!(match_example("abbcccd").unwrap().bc, BC::b2c3("bbccc"));
//! assert_eq!(match_example("abbbcd").unwrap().bc, BC::b3c1("bbbc"));
//! ```
//!
//! ### Capture Mapping
//! Captures can specify their mapped type through `(name: Type = expr)`. `Type` can be any Rust type, but it must implement [`From<T>`] if a mapping block is not provided.
//! ```rust
//! # use crate::*;
//! assert_eq!(try_match!("abcd": str, 'a' (bc: String = "bc") 'd').unwrap().bc, "bc");
//! ```
//!
//! Captures can also be dynamically mapped using a block, defined as `(name = expr => { map_block })`. The `map_block` is a regular Rust block that resolves to a `Fn(T) -> U`, converting the capture's matched type into the desired mapped type.
//!
//! Map blocks can be used even if no explicit type is specified, transforming the data while retaining the original type.
//! ```rust
//! # use crate::*;
//! assert_eq!(try_match!("abcd": str, 'a' (bc = "bc" => { |v| &v[1..] }) 'd').unwrap().bc, "c");
//! assert_eq!(try_match!("abcd": str, 'a' (bc: String = "bc" => { |v| v.to_uppercase() }) 'd').unwrap().bc, "BC");
//! ```
//!
//! Type specifiers and map blocks control the capture's base type before it is passed to a repetition container (like `Vec`), not the inverse.
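//!
//! In plain-Rust terms, the mapping runs once per matched item and the container is built afterwards. The sketch below is an illustrative model only: `map_each` is a hypothetical helper, and the uppercase conversion stands in for a map block.
//! ```rust
//! /// Models a repeated capture with a map block: each matched slice is mapped
//! /// on its own (`&str -> String`), and the results are then collected into
//! /// the repetition container (`Vec<String>`).
//! fn map_each(matched: &[&str]) -> Vec<String> {
//!     matched.iter().map(|m| m.to_uppercase()).collect()
//! }
//! ```
//! The map step never sees the `Vec` itself, only its elements.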
//!
//! For more info about capture types, see the [`gramex`](crate::gramex!#capture-types) macro documentation.
use crate::*;
use RangeInclusive;