1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
//! Parser Combinators for Unique Results.
//!
//! This crate provides building blocks to create parsers (and lexers) that return at most one output.
//!
//! # Quickstart
//!
//! The following shows a CSV line parser realised in parcours:
//!
//! ~~~
//! use parcours::{Parser, Combinator};
//!
//! // CSV line parser 👇 input 👇 state
//! fn csv_line<'a>() -> impl Parser<&'a str, (), O = Vec<&'a str>> {
//! use parcours::str::{matches, take_while};
//! take_while(|c, _| *c != b',').separated_by(matches(","))
//! }
//!
//! assert_eq!(
//! // 👇 input 👇 state
//! csv_line().parse("apples,oranges,pears", &mut ()),
//! // 👇 output 👇 remaining input
//! Some((vec!["apples", "oranges", "pears"], ""))
//! );
//! ~~~
//!
//! # Examples
//!
//! parcours provides a few examples that are each designed to teach a few concepts:
//!
//! * [JSON](../../../examples/json.rs): zero-copy string parsing, combinators
//! * [Lambda calculus](../../../examples/lambda.rs): error handling & mutable state
//! * [bc](../../../examples/bc.rs): separate lexer/parser, precedence climbing
//!
//! # Topics
//!
//! ## Error Recovery
//!
//! My opinion on error recovery has been shaped by the lecture of
//! [Don't Panic! Better, Fewer, Syntax Errors for LR Parsers](https://arxiv.org/pdf/1804.07133.pdf)
//! by Lukas Diekmann and Laurence Tratt, who write in their introduction:
//!
//! > When error recovery works well, it is a useful productivity gain.
//! > Unfortunately, most current error recovery approaches are simplistic.
//! > The most common grammar-neutral approach to error recovery are those algorithms described as
//! > "panic mode" [...] which skip input until the parser finds something it is able to parse.
//! > A more grammar-specific variation of this idea is to skip input until a
//! > pre-determined synchronisation token (e.g. ';' in Java) is reached [...],
//! > or to try inserting a single synchronisation token.
//! > Such strategies are often unsuccessful, leading to a cascade of spurious syntax errors [...].
//! > Programmers quickly learn that only the location of the first error in a file --- not
//! > the reported repair, nor the location of subsequent errors --- can be relied upon to be accurate.
//!
//! Their proposed solution is for LR parsers, which parcours cannot adapt.
//! Therefore, instead of adopting a "simplistic" error recovery approach,
//! I leave all control about error recovery to the user of the parcours.
//! While parcours can be used to emit multiple error messages due to its mutable state,
//! I have found it most useful to emit only the first syntax error message, as
//! I find myself only looking at the first syntax error anyway (as observed in the citation).
//! If you want support for multiple parse errors that are automatically derived and good,
//! you may consider using the [grmtools](https://github.com/softdevteam/grmtools) suite of the authors.
pub use ;
/// A parser takes input and mutable state, and maybe yields an output and remaining input.
///
/// # History
///
/// This trait, and especially its function [`Parser::parse`] shapes everything that goes on in this crate.
/// It is inspired to a great extent by a rhyme:
///
/// > *A Parser for Things
/// > is a function from Strings
/// > to Lists of Pairs
/// > of Things and Strings!*
/// > (Fritz Ruehr, ["Dr. Seuss on Parser Monads"](https://willamette.edu/~fruehr/haskell/seuss.html))
///
/// Ruehr uses the following type signature for parsers in Haskell:
///
/// ~~~ haskell
/// type Parser a = String -> [(a, String)]
/// ~~~
///
/// The function signature in this crate diverges from that in several aspects:
///
/// * It generalises the type of input to be `I`,
/// whereas it is always `String` in the Haskell case.
/// * It takes some mutable reference to a state `S`,
/// which is passed to every sub-parser and can be very useful,
/// for example to keep track of errors.
/// * The largest change, however, is the output type, which is
/// an `Option` instead of a list in Haskell.
///
/// The last change deserves some justification.
/// I tried for some time to recreate Ruehr's approach,
/// using `Iterator` as output type,
/// because this most closely resembles Haskell's lazy lists.
/// (The codename for this experiment was "parasite", standing for
/// "parsers as iterators".)
/// I imagined the ideal type signature to resemble this:
///
/// ~~~
/// trait Parser<I> {
/// type O;
/// fn parse(&self, x: I) -> Box<dyn Iterator<Item = (Self::O, I)>>;
/// }
/// ~~~
///
/// However, as I pursued with this approach,
/// I found myself quickly in lifetime and typing hell.
/// Everything was so hard that I stepped back and asked myself:
///
/// > *Is this too much Voodoo?*
/// > (Terry Davis, ["Hardest Question in Programming"](https://www.youtube.com/watch?v=EAeYn1xWqm8))
///
/// I then thought that for my purposes,
/// I could actually do with a single output (instead of arbitrarily many),
/// thus eliminating lifetime hell that stems from returning a `dyn Iterator`.
/// And thus `parcours` was born.
///
/// ## The Importance of Being Your`self`
///
/// You might ask why [`Parser::parse`] takes `self` instead of `&self`.
/// After all, the former consumes a parser when calling it, whereas
/// the latter allows calling a parser multiple times.
///
/// To answer your question, curious reader,
/// let us suppose that the [`Parser`] trait was defined as follows:
///
/// ~~~
/// pub trait Parser<I, S = ()> {
/// type O;
/// fn parse(&self, input: I, state: &mut S) -> Option<(Self::O, I)>;
/// }
/// ~~~
///
/// This is basically the same as what we have in parcours, only that
/// `parse` takes `&self`, not `self`.
///
/// The problem with this approach is that it effectively
/// **prohibits the construction of parsers that take [`FnOnce`] closures**.
/// That is because we cannot call an [`FnOnce`] if we have a reference to it:
///
/// ~~~ compile_fail
/// fn call_fn_once(f: &impl FnOnce()) {
/// f()
/// }
/// ~~~
///
/// And that means that if we have only a *reference* to a parser,
/// the parser cannot call any [`FnOnce`] closure that it might possess.
/// Which is logical, if you consider that a function taking `&self`
/// can be called arbitrarily often on some object,
/// yet [`FnOnce`] means that it can be called, well, only once.
/// This clashes.
///
/// So what's the big deal about [`FnOnce`]? Can't we just use [`Fn`] and take `&self`?
/// Yes, we can. That's actually the first thing I tried.
/// But this has a double negative impact on performance.
/// First, while it allows us to take a `Box<dyn Parser>` and call `parse` on it,
/// I found in benchmarks that just putting a single `Box` around a (heavily used) parser
/// increased runtime by about 10%.
/// So just being able to box a parser might tempt people to do it,
/// without realising the negative impact on performance.
/// Furthermore, when we use [`Fn`], we tend to have to move and clone more data.
/// Consider the following example, which is a very inefficient implementation of
/// [`core::iter::once`]:
///
/// ~~~ compile_fail
/// pub fn stupid_once<T>(x: T) -> impl Iterator<Item = T> {
/// core::iter::once(0).map(|_| x)
/// }
/// ~~~
///
/// Here, the compiler errors lead us quickly to the following solution:
///
/// ~~~
/// pub fn stupid_once<T: Clone>(x: T) -> impl Iterator<Item = T> {
/// core::iter::once(0).map(move |_| x.clone())
/// }
/// ~~~
///
/// This means that we *have* to clone `x` here,
/// even if we know that it will be returned just once.
/// That is because [`Iterator::map`] does not accept [`FnOnce`].
/// This pattern showed up a lot when using
/// the alternative `Parser` trait where `parse()` takes `&self`.
/// Aside from looking verbose, it is bad for performance.
///
/// On the other hand, take the following implementation:
///
/// ~~~
/// pub fn nice_once<T>(x: T) -> Option<T> {
/// Some(0).map(|_| x)
/// }
/// ~~~
///
/// Here, because [`Option::map`] takes [`FnOnce`], we neither have to move nor clone `x`.
///
/// All this means that once there was the decision for [`Parser::parse`] to return [`Option`],
/// there was a clear incentive to take `self` in [`Parser::parse`] in order to allow for [`FnOnce`].
/// Construct a parser from a function.
///
/// This is similar to [`core::iter::from_fn`].
/// It can be used to write a custom parser without having to write a lot of code.
/// For example, consider the following parser `value(o)` that simply returns
/// `o` without consuming any input:
///
/// ~~~
/// # use parcours::Parser;
/// fn value<O>(o: O) -> Value<O> {
/// Value(o)
/// }
///
/// struct Value<O>(O);
///
/// impl<I, S, O> Parser<I, S> for Value<O> {
/// type O = O;
///
/// fn parse(self, input: I, state: &mut S) -> Option<(Self::O, I)> {
/// Some((self.0, input))
/// }
/// }
/// ~~~
///
/// `value` can be written more concisely as:
///
/// ~~~
/// # use parcours::{Parser, from_fn};
/// fn value<I, S, O>(o: O) -> impl Parser<I, S, O = O> {
/// from_fn(|input, _state| Some((o, input)))
/// }
/// ~~~
/// A parser returned by [`from_fn`].
;
/// Lazily construct a parser from a function.
///
/// # Use cases
///
/// ## Construct recursive parsers
///
/// This allows you to create recursive parsers, like so:
///
/// ~~~
/// # use parcours::{Parser, lazy};
/// fn rec<I, S>() -> impl Parser<I, S, O = ()> {
/// lazy!(rec)
/// }
/// ~~~
///
/// This is equivalent to writing:
///
/// ~~~
/// # use parcours::{Parser, from_fn};
/// fn rec<I, S>() -> impl Parser<I, S, O = ()> {
/// from_fn(|input, state| rec().parse(input, state))
/// }
///
/// ~~~
///
/// You can also create recursive parsers from functions that take arguments:
///
/// ~~~
/// # use parcours::{Parser, Combinator, lazy};
/// fn rec<I: Clone, S, T: Clone>(x: T) -> impl Parser<I, S, O = ()> {
/// let x1 = x.clone();
/// lazy!(|| rec(x1)).or(lazy!(|| rec(x)))
/// }
/// ~~~
///
/// ## Reduce build times by type erasure
///
/// Like in `chumsky`, we can sometimes get very large parser types, even without recursion.
/// Such large parser types can significantly increase build times.
/// Consider the following example:
///
/// ~~~
/// # use parcours::{Parser, Combinator, lazy};
/// fn exp_type<I: Clone, S, P: Parser<I, S> + Clone>(p0: P) -> impl Parser<I, S, O = P::O> {
/// let p1 = p0.clone().or(p0);
/// let p2 = p1.clone().or(p1);
/// let p3 = p2.clone().or(p2);
/// let p4 = p3.clone().or(p3);
/// let p5 = p4.clone().or(p4);
/// let p6 = p5.clone().or(p5);
/// let p7 = p6.clone().or(p6);
/// let p8 = p7.clone().or(p7);
/// let p8 = lazy!(|| p8); // <---- erase type, comment out to see effect
/// let p9 = p8.clone().or(p8);
/// let p10 = p9.clone().or(p9);
/// let p11 = p10.clone().or(p10);
/// let p12 = p11.clone().or(p11);
/// let p13 = p12.clone().or(p12);
/// let p14 = p13.clone().or(p13);
/// let p15 = p14.clone().or(p14);
/// let p16 = p15.clone().or(p15);
/// p16
/// }
/// ~~~
///
/// Here, we construct a number of parsers, where
/// the type of each parser contains *twice* the type of the previous parser.
/// Therefore, the type of the last parser has a size that is
/// *exponential* in the number of parsers.
///
/// Or let's rather say that the type of the last parser *would* have exponential size
/// *if* we wouldn't use [`lazy!`] to treat `p8`.
/// (Feel free to comment out this line if you want your CPU to get some exercise.)
/// Indeed, this line erases the type of `p8`, so the size of the total parser remains tractable.
///
/// While this is an artificial example,
/// such large types may well occur in your parsers ---
/// they have certainly occured in mine!
/// In such cases, putting a little [`lazy!`] at strategic places
/// should reduce your build time.
/// It may also decrease your runtime --- if you frequently construct the parser.
///
/// ## Make [`Parser`] clonable
///
/// If you have a parser that does not implement [`Clone`],
/// you might make it clonable by wrapping it in a little [`lazy!`] call:
///
/// ~~~
/// # use parcours::{Parser, lazy, str::take_while};
/// fn non_clonable<'a>() -> impl Parser<&'a str, O = &'a str> {
/// take_while(|_, _| true)
/// }
/// fn clonable<'a>() -> impl Parser<&'a str, O = &'a str> + Clone {
/// lazy!(clonable)
/// }
/// ~~~
///
/// # Background
///
/// In parcours, like in many other parser combinator libraries in Rust,
/// the type of a parser is determined by how the parser is constructed.
/// For example, if we have parsers `p1: P1` and `p2: P2` as well as a function `f: F`,
/// the combined parser `p1.or(p2).map(f)` has the type
/// `Map<Any<(P1, P2)>, F>`:
///
/// ~~~
/// # use parcours::Parser;
/// # use parcours::combinator::{Combinator, Any, Map};
/// fn map_any<I: Clone, S, X, Y, P1, P2, F>(p1: P1, p2: P2, f: F) -> Map<Any<(P1, P2)>, F>
/// where
/// P1: Parser<I, S, O = X>,
/// P2: Parser<I, S, O = X>,
/// F: FnOnce(X) -> Y,
/// {
/// p1.or(p2).map(f)
/// }
/// ~~~
///
/// That means that the type of the parser grows with the size of its definition.
/// The same mechanism occurs when using traits like [`Iterator`].
///
/// What happens when we want to construct an `Iterator` recursively?
///
/// ~~~ compile_fail
/// fn rec_iter() -> impl Iterator<Item = u8> {
/// core::iter::once(0).chain(rec_iter())
/// }
/// ~~~
///
/// This fails, because the compiler points out to us that
/// the return type of `rec_iter` is a "recursive opaque type".
/// In particular, the return type would look something like
/// `Chain<Once<u8>, Chain<Once<u8>, ...>>` and be of infinite size!
/// Rust does not allow that.
///
/// So how do we get around this restriction?
/// When using iterators, we can simply put a `Box` around its definition:
///
/// ~~~
/// fn rec_iter() -> Box<dyn Iterator<Item = u8>> {
/// Box::new(core::iter::once(0).chain(rec_iter()))
/// }
/// ~~~
///
/// The `Box` around our definition effectively erases its type,
/// which allows for the recursive definition to go through.
/// This approach is also used, for example, in the `chumsky` crate.
///
/// I originally tried to use the same approach in parcours;
/// however, I hurt myself against another restriction:
/// In particular, if we have a `Box<dyn Trait>` and we want to call
/// a function of that trait, that function is not allowed to take `self`.
/// It is allowed, however, to take `&self` or `&mut self`.
/// That is why we can box an [`Iterator`], because
/// [`Iterator::next`] takes `&mut self`.
/// Same story for `chumsky`, whose `Parser::parse` function takes `&self`.
/// See the documentation of [`Parser`] for why [`Parser::parse`] takes `self` instead of `&self`.
///
/// So how does [`lazy!`] now actually work?
/// As you can see from its definition, `lazy!(p)` expands to
/// `from_fn(|input, state| p().parse(input, state))`.
/// It turns out that the result of this has a very small type, namely `FromFn<F>`, where
/// `F` *does not contain any information about how the parser `p()` was constructed*.
/// So [`lazy!`] effectively performs type erasure, like boxing an [`Iterator`] did.
/// And this is also precisely what allows us to construct recursive parsers with it.
///
/// I asked myself whether this technique could be transferred over to [`Iterator`].
/// The compiler guided me towards this solution:
///
/// ~~~ compile_fail
/// fn rec_iter() -> impl Iterator<Item = u8> {
/// let mut iter = core::iter::once(0).chain(rec_iter());
/// core::iter::from_fn(move || iter.next())
/// }
/// ~~~
///
/// It does not work! Here, the problem is that
/// [`core::iter::from_fn`] always only yields a *single* element,
/// which means that we *have* to construct `iter` *outside* the closure,
/// if we want to return more than one element.
/// But this means that we again hit the nasty "recursive opaque type" error.
/// So the [`from_fn`] trick that [`lazy!`] so successfully exploits
/// cannot be used for [`Iterator`]. There, we are stuck with `Box` for good.
///
/// # Limitations
///
/// Sometimes, unfortunately, [`lazy!`] doesn't cut it,
/// in particular when we apply it to parsers returned by closures.
/// In such cases, the compiler complains that we have to `move` stuff into the closures.
/// So here, we may fall back to using [`from_fn`]:
///
/// ~~~
/// # use parcours::{Parser, Combinator, from_fn};
/// fn exp_type<I: Clone, S, P: Parser<I, S>>(p0: impl Fn() -> P + Copy) -> impl Parser<I, S, O = P::O> {
/// let p1 = move || from_fn(move |i, s| p0().parse(i, s)).or(from_fn(move |i, s| p0().parse(i, s)));
/// let p2 = move || from_fn(move |i, s| p1().parse(i, s)).or(from_fn(move |i, s| p1().parse(i, s)));
/// p2()
/// }
/// ~~~
///
/// However, I noticed that in such cases, the build time remains exponential.
/// [`from_fn`] seems to fail to erase types! Yikes.
/// At this point, I have to confess that I do not really know *why* the [`from_fn`] trick works.
/// I just conclude that it must erase types, because we can then construct recursive parsers,
/// but I do not really understand why.
/// If you can point me to the cause, I would be most glad ...
///
/// # Future Work
///
/// On a related subject, I tried really hard to make [`lazy!`] a regular function, but I failed ...
/// For example, consider the following approach:
///
/// ~~~ compile_fail
/// # use parcours::{Parser, from_fn};
/// pub fn lazy2<I, S, P: Parser<I, S>, F: FnOnce() -> P>(f: F) -> impl Parser<I, S, O = P::O> {
/// from_fn(|input, state| f().parse(input, state))
/// }
///
/// fn rec<I, S>() -> impl Parser<I, S, O = ()> {
/// lazy2(rec)
/// }
/// ~~~
///
/// This fails because Rust encounters a "recursive opaque type" in `rec`.
///
/// If you can figure out a way to make [`lazy!`] a function, I would gladly accept a PR.
/// Lazily construct a parser from a function.
///
/// This can be useful to make a parser that implements [`Clone`] from
/// a function that returns a parser which does *not* implement [`Clone`].
///
/// This function complements the [`lazy!`] macro, as
/// it can be applied in places where [`lazy!`] does not work.
/// In particular, when the compiler suggests that we use `move` when using [`lazy!`]
/// (which seems to happen when we give it a closure), consider this function.
///
/// However, unlike [`lazy!`], this function is not suitable for type erasure,
/// so it cannot be used to construct recursive parsers.
/// A parser returned by [`lazy()`].
;
/// Pattern matching for successful cases.
///
/// When using functions like [`slice::first_filter_map`],
/// we frequently end up with patterns like this:
///
/// ~~~
/// # use parcours::{Parser, slice};
/// let p = || slice::first_filter_map(|first, _state| match first {
/// 0 => Some(false),
/// 1 => Some(true),
/// _ => None,
/// });
/// assert_eq!(p().parse(&[0], &mut ()), Some((false, &[] as &[usize])));
/// assert_eq!(p().parse(&[1], &mut ()), Some((true, &[] as &[usize])));
/// ~~~
///
/// In the `match` statement, we have a few cases that yield [`Some`],
/// followed by a final one yielding [`None`].
/// Because this is such a common case, `select!` allows us to write it like that:
///
/// ~~~
/// # use parcours::{Parser, select, slice};
/// let p = || slice::first_filter_map(select!(
/// 0 => false,
/// 1 => true,
/// ));
/// assert_eq!(p().parse(&[0], &mut ()), Some((false, &[] as &[usize])));
/// assert_eq!(p().parse(&[1], &mut ()), Some((true, &[] as &[usize])));
/// ~~~
///
/// We can also use pattern guards like in normal `match` statements.
///
/// This is inspired by [chumsky's `select!` macro](https://github.com/zesterer/chumsky/blob/40fe7d1966f375b3c676d01e04c5dca08f7615ac/src/lib.rs#L1486).
);
}