mpl 0.3.1

One-rule TDPL/PEG parsing language with a static-codegen backend (FastParse) that beats pest, peg, nom, winnow, and chumsky on equal-work benchmarks.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
# Minimal Parsing Language (MPL)

[![Crate](https://img.shields.io/crates/v/mpl.svg)](https://crates.io/crates/mpl)
[![API](https://docs.rs/mpl/badge.svg)](https://docs.rs/mpl)
[![CI](https://github.com/kurotakazuki/mpl/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/kurotakazuki/mpl/actions/workflows/ci.yml)

A small parsing language for Rust built around a single rule form `A = B C / D`,
with a static-codegen backend (`FastParse`) that beats `pest`, `peg`, `nom`,
`winnow`, `chumsky`, and even `serde_json` on equal-work recognition benchmarks.

MPL is deliberately minimal: TDPL/GTDPL is equivalent to PEG in recognition power,
and one rule shape is enough to encode sequence, choice, recursion, repetition,
and negative predicates. The compile-time analyzer (`mpl-macro`) detects char-class
cascades, multi-prefix dispatch, tail loops, and direct left recursion in your
grammar, then generates one inlined Rust function per rule with the appropriate
fast path automatically.

## Highlights

- **Static code generation** via `#[derive(FastParse)]` reads your `.mplg` and emits one
  function per rule — no `HashMap` rule lookup, no trait dispatch.
- **Six automatic optimizations** at codegen time:
  - char-class cascade → `[bool; 256]` byte table
  - multi-prefix cascade → `match input[pos]` first-byte dispatch
  - `R = X R / ()``loop {}` (LLVM TCO is unreliable for `Result<u32,()>`)
  - direct left recursion → Squirrel-style seed-growing, position-only memo
  - Mizushima cut analysis: rules with disjoint FIRST sets are exempt from memoisation
  - thin `CheckState` for recognition-only paths (peg-compatible calling convention)
- **Squirrel left recursion**: `R = R x / Y` works directly — no grammar rewriting,
  no `precedence!` macro, no manual seed handling.
- **Beats every comparator** measured (criterion, sample size 100, `RUSTFLAGS=-C target-cpu=native`).

## Performance

All numbers below are recognition-only (`bool` return), apples-to-apples.
Lower is better; first-place per row in **bold**.

Benchmarks were re-run on 2026-05-30 with `RUSTFLAGS="-C target-cpu=native"`
after dependency refresh (`pest 2.8.6`, `pest_derive 2.8.6`, `peg 0.8.6`,
`winnow 1.0.3`, `nom 8.0.0`, `chumsky 0.13.0`, `criterion 0.8.2`).

### Parentheses (no backtracking, depth N)

| Size | **mpl-fast-check** | peg | winnow | nom | pest | chumsky |
|---|---|---|---|---|---|---|
| nested-1024 | **2.26 µs** | 8.70 µs | 8.68 µs | 7.50 µs | 93.8 µs | 23.7 µs |
| nested-4096 | **9.23 µs** | 35.6 µs | 33.5 µs | 20.8 µs | 389 µs | 93.2 µs |
| flat-1024 | **2.11 µs** | 4.37 µs | 4.76 µs | 4.92 µs | 96.7 µs | 17.8 µs |

### JSON (full subset with escapes)

`mpl/json.mplg` mirrors the JSON spec for whitespace, string escapes
(`\\n`, `\\t`, `\\\\`, `\\"`, etc.), objects, arrays, integers, and the
literals. peg and pest grammars match the same subset.

Each parser library produces a different default output shape, so we
stratify the comparison into three apples-to-apples tiers.

#### Tier 1: Recognition only (no parse tree)

| Shape (~ size) | **mpl-fast-gen** | peg `()` | mpl/peg |
|---|---|---|---|
| json-1024 (~85 KB) | **148 µs** | 261 µs | mpl 1.76× faster |
| canada-1024 | **18.2 µs** | 45.1 µs | mpl 2.48× faster |
| twitter-256 | **34.3 µs** | 62.3 µs | mpl 1.82× faster |
| citm-1024 | **87.8 µs** | 171.9 µs | mpl 1.96× faster |

#### Tier 2: Flat token / Pair queue construction

`mpl-fast-gen-tokens` is `fast_parse` (`Vec<Token>`); `pest` always
emits its `Pair` queue (`Vec<QueueableToken>`). Both produce navigable
token streams that the caller walks — same kind of work.

| Shape | mpl-fast-gen-tokens | pest | mpl/pest |
|---|---|---|---|
| json-1024 | **1.52 ms** | 1.78 ms | mpl 1.17× faster |
| canada-1024 | 406 µs | **241 µs** | pest 1.68× faster |
| twitter-256 | **362 µs** | 397 µs | mpl 1.10× faster |
| citm-1024 | **1.10 ms** | 1.15 ms | mpl 1.05× faster |

`pest` wins on `canada-1024` because mpl's single-rule TDPL form emits
a `Start`/`End` pair per variable per match — finer-grained than pest's
non-silent-rule queue. For the deeply-nested coordinate arrays in
canada this multiplies the token count. Every other shape mpl wins.

#### Tier 3: Typed AST construction

Only `serde_json` builds a typed `Value` tree directly — that is much
more work than recognition or flat tokens, so a fair direct comparison
would need an `mpl::Token` → `Value` walker (not yet shipped). For
reference, here is what each tier looks like on the same input:

| Shape | mpl-fast-gen-recognize | mpl-fast-gen-tokens | serde_json (`Value`) |
|---|---|---|---|
| json-1024 | 148 µs | 1.52 ms | 637 µs |
| canada-1024 | 18.2 µs | 406 µs | 94.5 µs |
| twitter-256 | 34.3 µs | 362 µs | 125 µs |
| citm-1024 | 87.8 µs | 1.10 ms | 473 µs |

`serde_json` is the de-facto reference and is between the recognition
tier and the token tier — it builds nested `Vec<Value>` / `BTreeMap`
allocations directly, which is faster than mpl's per-variable Start/End
emission for these shapes but slower than pure recognition.

### What "AST output" means in mpl

mpl-fast-gen exposes three output tiers; the row labels in the bench
tables map to these directly.

| Method | Output | Tier |
|---|---|---|
| `fast_recognize_only(&[u8]) -> bool` | nothing — just a yes/no | Tier 1 |
| `fast_recognize(&[u8]) -> bool` | builds tokens internally; returns bool | Tier 2 (close to pest's `parse(...).is_ok()`) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ParseError>` | flat token buffer | Tier 2 (apples-to-apples with pest's `Pairs`) |

The flat token buffer is navigable as a typed AST view via
[`mpl::fast::ParseTree`] — `Pair`/`Leaf`/`NodeView`/`Children` provide
zero-copy iteration, span ranges, and parent/child traversal. This
matches pest's `Pairs` API.

Building a typed `Value`-style enum from the token buffer (e.g. for a
JSON deserializer) is a small additional walker that the caller writes;
the cost is added on top of `fast_parse`.

### Left-recursive (`LSum = LSum N / N`)

`peg` here uses `#[cache_left_rec]` (van Rossum seed-growing, the same family
as MPL's Squirrel pass).

| Size | **mpl-fast-gen** | peg cache-lr | mpl-squirrel-recognize (trait) |
|---|---|---|---|
| lsum-64 | **1.95 µs** | 2.28 µs | 28.8 µs |
| lsum-256 | **7.92 µs** | 9.04 µs | 118 µs |
| lsum-1024 | **30.0 µs** | 36.0 µs | 464 µs |

Across `lsum-4` → `lsum-1024`, `mpl-fast-gen` scales 4×-per-step (linear) just
like `peg`, and beats it by 1.04×–1.27× on every size measured.

Bench harness and grammars are in [examples/paren_bench/](examples/paren_bench/).

## Quick Start

`Cargo.toml`:
```toml
[dependencies]
mpl = "0.3.1"
mpl-macro = "0.2.1"
```

The fast path is `#[derive(FastParse)]`. Write a grammar file and derive:

```rust
use mpl_macro::FastParse;

#[derive(FastParse)]
#[mplg = "parens.mplg"]
pub struct ParenParser;

fn main() {
    let p = ParenParser;
    assert!(p.fast_recognize(b"(()(()))"));
    assert!(!p.fast_recognize(b"(()"));
}
```

Where `parens.mplg`:

```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```

The derive emits four entry points on the parser struct:

| Method | What it returns | When to use |
|---|---|---|
| `fast_recognize_only(&[u8]) -> bool` | bool, no AST | hot recognition; thin `CheckState`, peg-compatible calling convention |
| `fast_recognize(&[u8]) -> bool` | bool, builds tokens internally | parity with `pest` (always materialises a token queue) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ()>` | flat token buffer | walk the parse tree |
| `fast_recognize_in(&[u8], &Bump) -> bool` | bool, externally-supplied arena | high-throughput pipelines that reset `Bump` between parses |

For an AST view over the flat token buffer, use [`mpl::fast::ParseTree`] —
it provides `Pair` / `Leaf` / `Children` iterators without any extra allocation.

## Three API tiers

| Tier | API | Codegen | Tradeoff |
|---|---|---|---|
| 1. Hand-rolled | `Parser::parse` (trait) | none — interprets `RightRule` data | full control over rules, slowest |
| 2. Legacy derive | `#[derive(Parse)]` | generates `Rules` HashMap impl | API stable since 0.2; same speed as tier 1 |
| 3. **Fast derive** | `#[derive(FastParse)]` | one static `fn` per rule + auto-optimisations | fastest path; recommended for new code |

Tiers 1 and 2 stay supported for back-compat. New code should use tier 3.
Migration: typically a one-line change from `#[derive(Parse)]` to `#[derive(FastParse)]`,
plus replacing `parser.parse(...)` with `parser.fast_parse(...)`.

## How the codegen earns its speed

The `FastParse` derive runs four compile-time analyses against your grammar
and picks the cheapest valid lowering for each rule. Concretely, for the
included BubFns (99-rule arithmetic) grammar:

- `OrOrExpr = AndAndExpr OrOrExpr1 / AndAndExpr` and `Power = Atom PowerAndFactor / Atom`  flagged as needing memoisation by Mizushima cut analysis (FIRST sets overlap).
  Other 92 rules are LL(1)-disjoint and skip the cache entirely.
- `Variable = UppercaseX () / Variable1` and `DecDigit = '0' () / DecDigit1` cascades →
  collapsed into `[bool; 256]` byte tables.
- `Atom = ExprInParens / FloatLiteral / IntegerLiteral / Function / Variable / Constant`  first-byte dispatch (single `match input[pos]`).
- `ZeroOrMoreDecDigits = DecDigit ZeroOrMoreDecDigits / ()`  `while let Ok(p) = parse_dec_digit::<false>(state, p) { ... }` (no recursion).

Recognition-only callers go through a thin `CheckState` (input + furthest pos
only); emit-mode callers go through `ParserState` and produce flat `Token`
buffers (Sampson-style — no nested boxes).

## Left recursion: `Squirrel` algorithm

When a rule has the shape `R = R X / Y`, `FastParse` automatically routes
recognition through Squirrel-style seed-growing memoisation. No grammar
rewriting needed — the algorithm is invoked transparently.

```text
LSum = LSum N / N
N = { Char('1') } () / N1
...
```

Generated code holds a position-only `(rule_id, pos) -> Option<u32>` memo
per parse and iterates seed expansion until convergence. This matches peg's
`#[cache_left_rec]` family and was the basis on which we beat peg on
`lsum-256` and `lsum-1024`.

## Legacy APIs (tier 1 & 2)

The original 0.2 APIs remain supported. Pick one only if you need
features `FastParse` does not yet cover (custom `Output` type integration,
runtime-built grammars, or non-byte input).

### Tier 2: legacy `#[derive(Parse)]`

Generates a `Rules` HashMap impl plus the `Parser` trait method calls.
Identical perf characteristics to tier 1; identical AST shape.

`parentheses.mplg`:
```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```

`main.rs`:
```rust
use mpl::parser::Parser;
use mpl::span::StartAndLenSpan;
use mpl::trees::AST;
use mpl_macro::Parse;

#[derive(Parse, Debug)]
#[mplg = "parentheses.mplg"]
pub struct ParenParser;

fn main() {
    let parser = ParenParser;
    let input = b"(()(()))";

    let all_of_the_span =
        StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);
    let result: Result<
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &ParenRules, &ParenVariable::Open, &all_of_the_span);

    assert!(result.is_ok());
}
```

The derive strips a trailing `Parser` from the struct name and uses the
remaining stem (here `Paren`) to generate `ParenVariable` and `ParenRules`.

### Tier 1: build a parser in Rust

If you want the same one-rule model with complete control over rules and outputs, define them directly in Rust.

```rust
use crate::ParenthesesVariable::*;
use mpl::parser::Parser;
use mpl::rules::{RightRule, RightRuleKind::*, Rules};
use mpl::span::{StartAndLenSpan, Start, Len};
use mpl::output::Output;
use mpl::symbols::{StrTerminal, StrTerminal::*, Variable};
use mpl::trees::AST;
use std::collections::HashMap;

#[derive(Clone, Debug, Hash, Eq, PartialEq)]
enum ParenthesesVariable {
    Open,
    Parentheses,
    Close,
}

impl Variable for ParenthesesVariable {}

struct ParenthesesParser;

impl<'i, V, P, L, R, O> Parser<'i, str, StrTerminal<'i>, V, StartAndLenSpan<P, L>, P, R, O>
    for ParenthesesParser
where
    V: Variable,
    P: Start<str, L>,
    L: Len<str, P>,
    R: Rules<StrTerminal<'i>, V>,
    O: Output<'i, str, V, StartAndLenSpan<P, L>>,
{
}

/// ```
/// Open = '(' Parentheses / ()
/// Parentheses = Open Close / f
/// Close = ")" Open / f
/// ```
fn main() {
    let parser = ParenthesesParser;
    let mut rules = HashMap::new();

    rules.insert(
        Open,
        RightRule::from_right_rule_kind((T(Char('(')), V(Parentheses)), Empty),
    );
    rules.insert(
        Parentheses,
        RightRule::from_right_rule_kind((V(Open), V(Close)), Failure),
    );
    rules.insert(
        Close,
        RightRule::from_right_rule_kind((T(Str(")")), V(Open)), Failure),
    );

    let input = "(()(()))";

    // all of the span
    let all_of_the_span = StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);

    let result: Result<
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &rules, &Open, &all_of_the_span);

    if let Ok(ast) = result {
        println!("{}", ast);
    }
}
```

## Examples
<!-- - [Number](tests/number.rs) -->
- [Parentheses]examples/paren
<!-- - [Wav Riff](tests/wav_riff.rs) -->

<!-- ### Parsers written with MPL
- [WAV AST](https://github.com/kurotakazuki/wav_ast) : RIFF waveform Audio Format -->

## MPL
### Definition of MPL grammar
A MPL grammar `G` is a tuple `G = (V, Σ, R, S)` in which:
- `V` is a finite set of variables.
- `Σ` is a finite set of original terminal symbols.
- `T` is an union of `Σ` or `M`&cup; M) (`M` (= {(), f}) is a finite set of metasymbols).
- `R` is a finite set of rules of the form
    - `A = B C / D`  
    A in V (A &isin; V),  
    B, C, D in E (E = T &cup; V) (T &cap; V = &empty;) (B, C, D &isin; E).  
    For any variable A there is exactly one rule with A to the left of `=`.
- S in V (S &isin; V) is the start variable.

#### Empty
`()` is a metasymbol that always succeeds without consuming input.

```rust ignore
Empty = () () / ()
```

#### Failure
`f` is a metasymbol that always fails without consuming input.

```rust ignore
Failure = f f / f
```

### Extended MPL
Since one of the goals of MPL is to create an AST, it also supports two features in terms of ease of use and speed.

#### Any
`?` is a metasymbol representing any single input like wildcard character. This succeeds if there is any input left, and fails if there is no input left.

```rust ignore
Any = ? () / f
```

To extend the difinition of MPL grammar, let ? &isin; M.

#### All
`*` is a metasymbol representing All remaining input like wildcard character. This will succeed even if the remaining inputs are zero.

```rust ignore
All = * () / f
```

Same as `All = ? All / ()`.

To extend the difinition of MPL grammar, let * &isin; M.

<!---
#### Variable type
Variables can have a type.

If the variable contains a type, it will include the value of that type, such as a token, when the AST is created. Therefore rules decomposed from variable including rule has a role like lexical analysis. The following syntax is a lexical syntax for numbers.

```
Number: String = Digit Numeral / f
Numeral = Digit Numeral / ()
Digit = Zero () / f
Zero = "0" () / One
One = "1" () / Two
// ...
Nine = "9" () / f
```

An error (TODO: maybe failure would be better) will occur if the input cannot be converted to the variable type.

To extend the difinition of MPL grammar, change `A = B C / D` to `A = B C / D` or `A: TYPE = B C / D`, or seperate definition of `V` by including type or not.
--->


<!-- #### Terminal symbol type
Terminal symbols supports several types.

```
A = "A" "abc" / [0, 0, 0]
```

Supports `&str, &[u8]` at this moment. -->

## Difference between TDPL and MPL
The core design choice in MPL is to reduce the grammar to one rule form. TDPL has two rule forms, and PEGs were later shown to be reducible to TDPL/GTDPL with equivalent effective recognition power.

> `A..BC/D`, A,B,C,D in V.  
> `A..a`, a in &sum; &cup; {&epsilon;, f}, f is a metasymbol not in &sum; and &epsilon; is the null string.

MPL, on the other hand, keeps everything in a single form:

> `A = B C / D`

That is the main simplification and the main selling point of the language: MPL aims to keep TDPL/PEG-class expressive power in a much smaller surface syntax.


## MPLG (MPL Grammar) syntax
<!-- ### In PEG like grammar
```rust ignore
// Hierarchical syntax
MPLG = (Line)*
Line = (LineComment / Rule / ()) EndOfLine
Rule = Variable " = " E Space E " / " E
E = TerminalSymbol / Variable

// Lexical syntax
// Variable
Variable = Uppercase (Alphabet / DecDigit)*

// Terminal symbol
TerminalSymbol = Expr

// Expr
Expr = LiteralExpr

// Literal
LiteralExpr = MetasymbolLiteral / StringLiteral

// Metasymbol
MetasymbolLiteral = EmptyLiteral / FailureLiteral / AnyLiteral / AllLiteral
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' () / f
AllLiteral = '*' () / f

// String
StringLiteral = "\"" (NotStringLetter / QuoteEscape / ?)* "\""
NotStringLetter = !("\"")

// Letters
Alphabet = Lowercase / Uppercase
UppercaseAToF = "A" / "B" / "C" / "D" / "E" / "F"
LowercaseAToF = "a" / "b" / "c" / "d" / "e" / "f"
Uppercase = UppercaseAToF / "G" / "H" / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" / "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
Lowercase = LowercaseAToF / "g" / "h" / "i" / "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" / "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z"

QuoteEscape = "\\'" / "\\\""
EndOfLine = "\r\n" / '\n'
Space = " "

// Digits
BinDigit = "0" / "1"
OctDigit = BinDigit / "2" / "3" / "4" / "5" / "6" / "7"
DecDigit = OctDigit / "8" / "9"
HexDigit = DecDigit / UppercaseAToF / LowercaseAToF

// Comment
LineComment = "//" (!(EndOfLine) ?)*
``` -->

### In MPL grammar
```rust ignore
// Hierarchical syntax
Mplg = ZeroOrMoreLines () / f
ZeroOrMoreLines = Line ZeroOrMoreLines / ()

Line = Line1 EndOfLine / f
Line1 = LineComment () / Line2
Line2 = Rule () / ()

Rule = Variable Rule1 / f
Rule1 = " = " Rule2 / f
Rule2 = E Rule3 / f
Rule3 = Space Rule4 / f
Rule4 = E Rule5 / f
Rule5 = " / " Rule6 / f
Rule6 = E () / f
E = TerminalSymbol () / Variable


// Lexical syntax
// Variable
Variable = Identifier () / f

// Terminal symbol
TerminalSymbol = MetasymbolLiteral () / OriginalSymbolExpr

// Expr
Expr = ExprWithoutBlock () / f

// Without Block
ExprWithoutBlock = LiteralExpr () / ExprWithoutBlock1
ExprWithoutBlock1 = StructExpr () / f

// Struct
StructExpr = StructExprStruct () / StructExpr1
StructExpr1 = StructExprTuple () / StructExprUnit

StructExprStruct = f f / f

StructExprTuple = PathInExpr StructExprTuple1 / f
StructExprTuple1 = '(' StructExprTuple2 / f
StructExprTuple2 = ZeroOrMoreExpr ')' / f
ZeroOrMoreExpr = Expr () / f

StructExprUnit = PathInExpr () / f

// PathInExpr
PathInExpr = ZeroOrOneDoubleColon OneOrMorePathExprSegment / f
ZeroOrOneDoubleColon = "::" () / ()
OneOrMorePathExprSegment = PathExprSegment () / f

PathExprSegment = PathIdentSegment PathExprSegment1 / f
PathExprSegment1 = "::" GenericArgs / ()

PathIdentSegment = Identifier () / f

GenericArgs = f f / f

// Literal
LiteralExpr = CharLiteral () / LiteralExpr1
LiteralExpr1 = StringLiteral () / LiteralExpr2
LiteralExpr2 = IntegerLiteral () / f

// Metasymbol
MetasymbolLiteral = EmptyLiteral () / MetasymbolLiteral1
MetasymbolLiteral1 = FailureLiteral () / MetasymbolLiteral2
MetasymbolLiteral2 = AnyLiteral () / MetasymbolLiteral3
MetasymbolLiteral3 = AllLiteral () / f
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' ZeroOrMoreAny / f
ZeroOrMoreAny = '?' ZeroOrMoreAny / ()
AllLiteral = '*' () / f

// Original symbol
OriginalSymbolExpr = "{ " OriginalSymbolExpr1 / f
OriginalSymbolExpr1 = ExprWithoutBlock " }" / f

// Char
CharLiteral = '\'' CharLiteral1 / f
CharLiteral1 = InnerCharLiteral '\'' / f
InnerCharLiteral = NotCharLetter InnerCharLiteral1 / f
NotCharLetter = '\'' * / ()
InnerCharLiteral1 = QuoteEscape () / ?

// String
StringLiteral = '"' StringLiteral1 / f
StringLiteral1 = InnerStringLiteral '"' / f
InnerStringLiteral = InnerStringLiteralLetter InnerStringLiteral / ()
// InnerStringLiteralLetter
InnerStringLiteralLetter = NotStringLetter InnerStringLiteralLetter1 / f
NotStringLetter = '"' * / ()
InnerStringLiteralLetter1 = QuoteEscape () / ?

// Integer
IntegerLiteral = IntegerLiterals () / f
IntegerLiterals = DecLiteral () / f
DecLiteral = DecDigit ZeroOrMoreDecDigit / f
ZeroOrMoreDecDigit = DecDigitOrUnderscore ZeroOrMoreDecDigit / ()
DecDigitOrUnderscore = DecDigit () / '_'

// IDENTIFIER
Identifier = Uppercase ZeroOrMoreIdentifierContinue / f
ZeroOrMoreIdentifierContinue =  IdentifierContinue ZeroOrMoreIdentifierContinue / ()
IdentifierContinue =  Alphabet () / DecDigit


// Letters
Alphabet = Lowercase () / Uppercase
// Lowercase
LowercaseAToF = LowercaseAToF1 () / f
LowercaseAToF1 = 'a' () / LowercaseAToF2
LowercaseAToF2 = 'b' () / LowercaseAToF3
LowercaseAToF3 = 'c' () / LowercaseAToF4
LowercaseAToF4 = 'd' () / LowercaseAToF5
LowercaseAToF5 = 'e' () / LowercaseAToF6
LowercaseAToF6 = 'f' () / f
Lowercase = LowercaseAToF () / Lowercase1
Lowercase1 = 'g' () / Lowercase2
Lowercase2 = 'h' () / Lowercase3
Lowercase3 = 'i' () / Lowercase4
Lowercase4 = 'j' () / Lowercase5
Lowercase5 = 'k' () / Lowercase6
Lowercase6 = 'l' () / Lowercase7
Lowercase7 = 'm' () / Lowercase8
Lowercase8 = 'n' () / Lowercase9
Lowercase9 = 'o' () / Lowercase10
Lowercase10 = 'p' () / Lowercase11
Lowercase11 = 'q' () / Lowercase12
Lowercase12 = 'r' () / Lowercase13
Lowercase13 = 's' () / Lowercase14
Lowercase14 = 't' () / Lowercase15
Lowercase15 = 'u' () / Lowercase16
Lowercase16 = 'v' () / Lowercase17
Lowercase17 = 'w' () / Lowercase18
Lowercase18 = 'x' () / Lowercase19
Lowercase19 = 'y' () / Lowercase20
Lowercase20 = 'z' () / f
// Uppercase
UppercaseAToF = UppercaseAToF1 () / f
UppercaseAToF1 = 'A' () / UppercaseAToF2
UppercaseAToF2 = 'B' () / UppercaseAToF3
UppercaseAToF3 = 'C' () / UppercaseAToF4
UppercaseAToF4 = 'D' () / UppercaseAToF5
UppercaseAToF5 = 'E' () / UppercaseAToF6
UppercaseAToF6 = 'F' () / f
Uppercase = UppercaseAToF () / Uppercase1
Uppercase1 = 'G' () / Uppercase2
Uppercase2 = 'H' () / Uppercase3
Uppercase3 = 'I' () / Uppercase4
Uppercase4 = 'J' () / Uppercase5
Uppercase5 = 'K' () / Uppercase6
Uppercase6 = 'L' () / Uppercase7
Uppercase7 = 'M' () / Uppercase8
Uppercase8 = 'N' () / Uppercase9
Uppercase9 = 'O' () / Uppercase10
Uppercase10 = 'P' () / Uppercase11
Uppercase11 = 'Q' () / Uppercase12
Uppercase12 = 'R' () / Uppercase13
Uppercase13 = 'S' () / Uppercase14
Uppercase14 = 'T' () / Uppercase15
Uppercase15 = 'U' () / Uppercase16
Uppercase16 = 'V' () / Uppercase17
Uppercase17 = 'W' () / Uppercase18
Uppercase18 = 'X' () / Uppercase19
Uppercase19 = 'Y' () / Uppercase20
Uppercase20 = 'Z' () / f

QuoteEscape = "\\'" () / "\\\""
EndOfLine = "\r\n" () / '\n'
Space = ' ' () / f

// Digits
BinDigit = "0" () / "1"
OctDigit = BinDigit () / OctDigit1
OctDigit1 = "2" () / OctDigit2
OctDigit2 = "3" () / OctDigit3
OctDigit3 = "4" () / OctDigit4
OctDigit4 = "5" () / OctDigit5
OctDigit5 = "6" () / OctDigit6
OctDigit6 = "7" () / f
DecDigit = OctDigit () / DecDigit1
DecDigit1 = "8" () / DecDigit2
DecDigit2 = "9" () / f

// Comment
LineComment = "//" InnerLineComment / f
InnerLineComment = AnyExceptLF InnerLineComment / ()
AnyExceptLF = AnyExceptLF1 ? / f
AnyExceptLF1 = EndOfLine * / ()
```

<!---
```rust ignore
// Hierarchical syntax
MPLG = (Line)*
Line = (Rule / LineComment / ()) EndOfLine
Rule = Variable " = " E Space E " / " E
E = TerminalSymbol / Variable

// Lexical syntax
// Variable
Variable = Uppercase (Alphabet / DecDigit)*

// Terminal symbol
TerminalSymbol = Expr

// Expr
Expr = LiteralExpr / ArrayExpr

// Array
ArrayExpr = "[" ArrayElements? "]"
ArrayElements = Expr ("," Expr)* ","? / Expr ";" Expr

// Literal
LiteralExpr = MetasymbolLiteral / StringLiteral / IntegerLiteral
// LiteralExpr = MetasymbolLiteral / CharLiteral / StringLiteral / IntegerLiteral / FloatLiteral

// Metasymbol
MetasymbolLiteral = EmptyLiteral / FailureLiteral / AnyLiteral / AllLiteral
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' () / f
AllLiteral = '*' () / f

// String
// TODO Multibyte character may not work.
StringLiteral = "\"" (!("\"" / "\\" / IsolatedCR) . / QuoteEscape / ASCIIEscape / UnicodeEscape / StringContinue)* "\""
StringContinue = "\\" &"\n" 

// Char
// CharLiteral = "\'" (!("\'" / "\n" / "\r" / "\t") . / QuoteEscape / ASCIIEscape / UnicodeEscape) '\''
QuoteEscape = "\\'" / "\\\""
ASCIIEscape = "\\x" OctDigit HexDigit / "\\n" / "\\r" / "\\t" / "\\\\" / "\\0"
UnicodeEscape = "\\u{" (HexDigit "_"*)^1..6 "}"

// Integer
IntegerLiteral = (DecLiteral / HexLiteral) IntegerSuffix?
DecLiteral = DecDigit (DecDigit / "_")*
HexLiteral =  "0x" (HexDigit / "_")* HexDigit (HexDigit / "_")*

IntegerSuffix = "u8" / "u16" / "u32" / "u64" / "u128" / "usize" / "i8" / "i16" / "i32" / "i64" / "i128" / "isize"

// Float
FloatLiteral = DecLiteral "." ()

FloatExponent = ("e" / "E") ("+" / "-")? (DecDigit / "_")* DecDigit (DecDigit / "_")*
FloatSuffix = "f32" / "f64"


// Letters
UppercaseAToF = "A" / "B" / "C" / "D" / "E" / "F"
LowercaseAToF = "a" / "b" / "c" / "d" / "e" / "f"
Uppercase = UppercaseAToF / "G" / "H" / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" / "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
Lowercase = LowercaseAToF / "g" / "h" / "i" / "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" / "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z"
Alphabet = Uppercase / Lowercase

// Digits
BinDigit = "0" / "1"
OctDigit = BinDigit / "2" / "3" / "4" / "5" / "6" / "7"
DecDigit = OctDigit / "8" / "9"
HexDigit = DecDigit / UppercaseAToF / LowercaseAToF

// Comment
LineComment = "//" (!("\n") .)*

IsolatedCR = "\r" !"\n" ()
// TODO: Maybe need EOF
EndOfLine = "\n" / "\r\n"
Space = " "
```

--->

## TODO

### Tasks
- into_first() in CST

### 
- Add { Original } in mplg
- Add functions that easy to get Variable from AST
- Add RowColSpan
- Create parser from MPLG file.
- Error Handling
- Packrat Parsing
- Left Recursion

## Next implementation
- [ ] Add functions that easy to get Variable from AST
- [ ] Can be Variable in Leaf Node
- [ ] Error Handling

## Practice
### Sequence
`A <- e1 e2`
```rust ignore
A = e1 e2 / f
```

### Choice
`A <- e1 / e2`
```rust ignore
A = e1 () / e2
```

### Zero or more
`A <- e*`
```rust ignore
A = e A / ()
```

### Not predicate
`A <- !e ?`
```rust ignore
A = B ? / f
B = e * / ()
```

## References

### Foundations of TDPL / PEG / packrat parsing

- Alexander Birman. *The TMG Recognition Schema.* PhD thesis,
  Princeton University, February 1970. (TDPL — the "one rule"
  formulation that MPL specialises.)
- Alfred V. Aho and Jeffrey D. Ullman. *The Theory of Parsing,
  Translation and Compiling — Vol. I: Parsing.* Prentice Hall, 1972.
- Bryan Ford. 2002. *Packrat parsing: a practical linear-time
  algorithm with backtracking.* PhD dissertation, MIT.
- Bryan Ford. 2004. *Parsing expression grammars: a recognition-based
  syntactic foundation.* POPL 2004, pp. 111–122.

### Algorithms used by mpl-macro's optimisations

- Kota Mizushima, Atusi Maeda, and Yoshinori Yamaguchi. 2010.
  *Packrat parsers can handle practical grammars in mostly constant
  space.* PASTE 2010. **— FIRST-set / cut analysis used by
  `cascade::analyze_char_classes` and `cut::compute_first_sets` to
  exclude LL(1)-disjoint rules from memoisation.**
- Alessandro Warth, James R. Douglass, and Todd Millstein. 2008.
  *Packrat parsers can support left recursion.* PEPM 2008.
  **— seed-growing iteration used in
  [`Parser::squirrel_parse`]https://docs.rs/mpl and the
  `FastParse`-generated LR codegen.**
- Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014.
  *Left recursion in Parsing Expression Grammars.* Science of Computer
  Programming 96, pp. 177–190.
- Luke A. D. Hutchison. 2020. *Pika parsing: reformulating packrat
  parsing as a dynamic programming algorithm solves the left
  recursion and error recovery problems.* arXiv preprint
  [arXiv:2005.06444]https://arxiv.org/abs/2005.06444.
- Luke A. D. Hutchison. 2026. *Squirrel: a unified packrat parsing
  algorithm with linear-time guarantees, optimal error recovery, and
  full left-recursion support.* arXiv preprint
  [arXiv:2601.05012]https://arxiv.org/abs/2601.05012. **— the
  family of position-only seed-growing memoisation MPL's
  `squirrel_recognize` and the `FastParse` LR path implement.**

### Parser data-layout influences

- Adrian Sampson. *Flattening ASTs (and other compiler data
  structures).* 2023.
  [cs.cornell.edu/~asampson/blog/flattening.html]https://www.cs.cornell.edu/~asampson/blog/flattening.html
  **— the flat `Vec<Token>` representation used in `mpl::fast`.**
- Aleksey Kladov ("matklad"). *Resilient LL parsing tutorial.* 2023.
  Background on lossy / lossless syntax tree design.
- Joshua Haberman ("Reverberate"). *Parsing Protobuf at 2+ GB/s: how I
  learned to love tail calls in C.* blog.reverberate.org, 2021.
  **— motivation for the explicit `loop {}` codegen in
  `cascade::is_tail_loop` (Rust's TCO is unreliable for
  `Result<u32, ()>`-shaped recursive returns).**

All algorithms above are implemented from the published descriptions;
no source code is copied. PEG / TDPL / packrat parsing and their
extensions are not patented to our knowledge.

## License

This project is licensed under either of:

- Apache License 2.0 ([LICENSE-APACHE]LICENSE-APACHE or
  <https://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT]LICENSE-MIT or
  <https://opensource.org/licenses/MIT>)

at your option. All runtime dependencies (`bumpalo`, `proc-macro2`,
`quote`, `syn`) are dual-licensed `MIT OR Apache-2.0`, compatible with
both.