mpl-macro 0.2.1

# Minimal Parsing Language (MPL)

[![Crate](https://img.shields.io/crates/v/mpl.svg)](https://crates.io/crates/mpl)
[![API](https://docs.rs/mpl/badge.svg)](https://docs.rs/mpl)
[![CI](https://github.com/kurotakazuki/mpl/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/kurotakazuki/mpl/actions/workflows/ci.yml)

A small parsing language for Rust built around a single rule form `A = B C / D`,
with a static-codegen backend (`FastParse`) that beats `pest`, `peg`, `nom`,
`winnow`, `chumsky`, and even `serde_json` on equal-work recognition benchmarks.

MPL is deliberately minimal: TDPL/GTDPL is equivalent to PEG in recognition power,
and one rule shape is enough to encode sequence, choice, recursion, repetition,
and negative predicates. The compile-time analyzer (`mpl-macro`) detects char-class
cascades, multi-prefix dispatch, tail loops, and direct left recursion in your
grammar, then generates one inlined Rust function per rule with the appropriate
fast path automatically.

## Highlights

- **Static code generation** via `#[derive(FastParse)]` reads your `.mplg` and emits one
  function per rule — no `HashMap` rule lookup, no trait dispatch.
- **Six automatic optimizations** at codegen time:
  - char-class cascade → `[bool; 256]` byte table
  - multi-prefix cascade → `match input[pos]` first-byte dispatch
  - `R = X R / ()` → `loop {}` (LLVM TCO is unreliable for `Result<u32,()>`)
  - direct left recursion → Squirrel-style seed-growing, position-only memo
  - Mizushima cut analysis: rules with disjoint FIRST sets are exempt from memoisation
  - thin `CheckState` for recognition-only paths (peg-compatible calling convention)
- **Squirrel left recursion**: `R = R x / Y` works directly — no grammar rewriting,
  no `precedence!` macro, no manual seed handling.
- **Beats every comparator** measured (criterion, sample size 100, `RUSTFLAGS=-C target-cpu=native`).

## Performance

All numbers below are recognition-only (`bool` return), apples-to-apples.
Lower is better; first-place per row in **bold**.

Benchmarks were re-run on 2026-05-30 with `RUSTFLAGS="-C target-cpu=native"`
after dependency refresh (`pest 2.8.6`, `pest_derive 2.8.6`, `peg 0.8.6`,
`winnow 1.0.3`, `nom 8.0.0`, `chumsky 0.13.0`, `criterion 0.8.2`).

### Parentheses (no backtracking, depth N)

| Size | **mpl-fast-check** | peg | winnow | nom | pest | chumsky |
|---|---|---|---|---|---|---|
| nested-1024 | **2.26 µs** | 8.70 µs | 8.68 µs | 7.50 µs | 93.8 µs | 23.7 µs |
| nested-4096 | **9.23 µs** | 35.6 µs | 33.5 µs | 20.8 µs | 389 µs | 93.2 µs |
| flat-1024 | **2.11 µs** | 4.37 µs | 4.76 µs | 4.92 µs | 96.7 µs | 17.8 µs |

### JSON (full subset with escapes)

`mpl/json.mplg` mirrors the JSON spec for whitespace, string escapes
(`\\n`, `\\t`, `\\\\`, `\\"`, etc.), objects, arrays, integers, and the
literals. peg and pest grammars match the same subset.

Each parser library produces a different default output shape, so we
stratify the comparison into three apples-to-apples tiers.

#### Tier 1: Recognition only (no parse tree)

| Shape (~ size) | **mpl-fast-gen** | peg `()` | mpl/peg |
|---|---|---|---|
| json-1024 (~85 KB) | **148 µs** | 261 µs | mpl 1.76× faster |
| canada-1024 | **18.2 µs** | 45.1 µs | mpl 2.48× faster |
| twitter-256 | **34.3 µs** | 62.3 µs | mpl 1.82× faster |
| citm-1024 | **87.8 µs** | 171.9 µs | mpl 1.96× faster |

#### Tier 2: Flat token / Pair queue construction

`mpl-fast-gen-tokens` is `fast_parse` (`Vec<Token>`); `pest` always
emits its `Pair` queue (`Vec<QueueableToken>`). Both produce navigable
token streams that the caller walks — same kind of work.

| Shape | mpl-fast-gen-tokens | pest | mpl/pest |
|---|---|---|---|
| json-1024 | **1.52 ms** | 1.78 ms | mpl 1.17× faster |
| canada-1024 | 406 µs | **241 µs** | pest 1.68× faster |
| twitter-256 | **362 µs** | 397 µs | mpl 1.10× faster |
| citm-1024 | **1.10 ms** | 1.15 ms | mpl 1.05× faster |

`pest` wins on `canada-1024` because mpl's single-rule TDPL form emits
a `Start`/`End` pair per variable per match — finer-grained than pest's
non-silent-rule queue. For the deeply-nested coordinate arrays in
canada this multiplies the token count. Every other shape mpl wins.

#### Tier 3: Typed AST construction

Only `serde_json` builds a typed `Value` tree directly — that is much
more work than recognition or flat tokens, so a fair direct comparison
would need an `mpl::Token` → `Value` walker (not yet shipped). For
reference, here is what each tier looks like on the same input:

| Shape | mpl-fast-gen-recognize | mpl-fast-gen-tokens | serde_json (`Value`) |
|---|---|---|---|
| json-1024 | 148 µs | 1.52 ms | 637 µs |
| canada-1024 | 18.2 µs | 406 µs | 94.5 µs |
| twitter-256 | 34.3 µs | 362 µs | 125 µs |
| citm-1024 | 87.8 µs | 1.10 ms | 473 µs |

`serde_json` is the de-facto reference and is between the recognition
tier and the token tier — it builds nested `Vec<Value>` / `BTreeMap`
allocations directly, which is faster than mpl's per-variable Start/End
emission for these shapes but slower than pure recognition.

### What "AST output" means in mpl

mpl-fast-gen exposes three output tiers; the row labels in the bench
tables map to these directly.

| Method | Output | Tier |
|---|---|---|
| `fast_recognize_only(&[u8]) -> bool` | nothing — just a yes/no | Tier 1 |
| `fast_recognize(&[u8]) -> bool` | builds tokens internally; returns bool | Tier 2 (close to pest's `parse(...).is_ok()`) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ParseError>` | flat token buffer | Tier 2 (apples-to-apples with pest's `Pairs`) |

The flat token buffer is navigable as a typed AST view via
[`mpl::fast::ParseTree`] — `Pair`/`Leaf`/`NodeView`/`Children` provide
zero-copy iteration, span ranges, and parent/child traversal. This
matches pest's `Pairs` API.

Building a typed `Value`-style enum from the token buffer (e.g. for a
JSON deserializer) is a small additional walker that the caller writes;
the cost is added on top of `fast_parse`.

### Left-recursive (`LSum = LSum N / N`)

`peg` here uses `#[cache_left_rec]` (van Rossum seed-growing, the same family
as MPL's Squirrel pass).

| Size | **mpl-fast-gen** | peg cache-lr | mpl-squirrel-recognize (trait) |
|---|---|---|---|
| lsum-64 | **1.95 µs** | 2.28 µs | 28.8 µs |
| lsum-256 | **7.92 µs** | 9.04 µs | 118 µs |
| lsum-1024 | **30.0 µs** | 36.0 µs | 464 µs |

Across `lsum-4` → `lsum-1024`, `mpl-fast-gen` scales 4×-per-step (linear) just
like `peg`, and beats it by 1.04×–1.27× on every size measured.

Bench harness and grammars are in [examples/paren_bench/](examples/paren_bench/).

## Quick Start

`Cargo.toml`:
```toml
[dependencies]
mpl = "0.3.1"
mpl-macro = "0.2.1"
```

The fast path is `#[derive(FastParse)]`. Write a grammar file and derive:

```rust
use mpl_macro::FastParse;

#[derive(FastParse)]
#[mplg = "parens.mplg"]
pub struct ParenParser;

fn main() {
    let p = ParenParser;
    assert!(p.fast_recognize(b"(()(()))"));
    assert!(!p.fast_recognize(b"(()"));
}
```

Where `parens.mplg`:

```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```

The derive emits four entry points on the parser struct:

| Method | What it returns | When to use |
|---|---|---|
| `fast_recognize_only(&[u8]) -> bool` | bool, no AST | hot recognition; thin `CheckState`, peg-compatible calling convention |
| `fast_recognize(&[u8]) -> bool` | bool, builds tokens internally | parity with `pest` (always materialises a token queue) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ()>` | flat token buffer | walk the parse tree |
| `fast_recognize_in(&[u8], &Bump) -> bool` | bool, externally-supplied arena | high-throughput pipelines that reset `Bump` between parses |

For an AST view over the flat token buffer, use [`mpl::fast::ParseTree`] —
it provides `Pair` / `Leaf` / `Children` iterators without any extra allocation.

## Three API tiers

| Tier | API | Codegen | Tradeoff |
|---|---|---|---|
| 1. Hand-rolled | `Parser::parse` (trait) | none — interprets `RightRule` data | full control over rules, slowest |
| 2. Legacy derive | `#[derive(Parse)]` | generates `Rules` HashMap impl | API stable since 0.2; same speed as tier 1 |
| 3. **Fast derive** | `#[derive(FastParse)]` | one static `fn` per rule + auto-optimisations | fastest path; recommended for new code |

Tiers 1 and 2 stay supported for back-compat. New code should use tier 3.
Migration: typically a one-line change from `#[derive(Parse)]` to `#[derive(FastParse)]`,
plus replacing `parser.parse(...)` with `parser.fast_parse(...)`.

## How the codegen earns its speed

The `FastParse` derive runs four compile-time analyses against your grammar
and picks the cheapest valid lowering for each rule. Concretely, for the
included BubFns (99-rule arithmetic) grammar:

- `OrOrExpr = AndAndExpr OrOrExpr1 / AndAndExpr` and `Power = Atom PowerAndFactor / Atom` →
  flagged as needing memoisation by Mizushima cut analysis (FIRST sets overlap).
  Other 92 rules are LL(1)-disjoint and skip the cache entirely.
- `Variable = UppercaseX () / Variable1` and `DecDigit = '0' () / DecDigit1` cascades →
  collapsed into `[bool; 256]` byte tables.
- `Atom = ExprInParens / FloatLiteral / IntegerLiteral / Function / Variable / Constant` →
  first-byte dispatch (single `match input[pos]`).
- `ZeroOrMoreDecDigits = DecDigit ZeroOrMoreDecDigits / ()` →
  `while let Ok(p) = parse_dec_digit::<false>(state, p) { ... }` (no recursion).

Recognition-only callers go through a thin `CheckState` (input + furthest pos
only); emit-mode callers go through `ParserState` and produce flat `Token`
buffers (Sampson-style — no nested boxes).

## Left recursion: `Squirrel` algorithm

When a rule has the shape `R = R X / Y`, `FastParse` automatically routes
recognition through Squirrel-style seed-growing memoisation. No grammar
rewriting needed — the algorithm is invoked transparently.

```text
LSum = LSum N / N
N = { Char('1') } () / N1
...
```

Generated code holds a position-only `(rule_id, pos) -> Option<u32>` memo
per parse and iterates seed expansion until convergence. This matches peg's
`#[cache_left_rec]` family and was the basis on which we beat peg on
`lsum-256` and `lsum-1024`.

## Legacy APIs (tier 1 & 2)

The original 0.2 APIs remain supported. Pick one only if you need
features `FastParse` does not yet cover (custom `Output` type integration,
runtime-built grammars, or non-byte input).

### Tier 2: legacy `#[derive(Parse)]`

Generates a `Rules` HashMap impl plus the `Parser` trait method calls.
Identical perf characteristics to tier 1; identical AST shape.

`parentheses.mplg`:
```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```

`main.rs`:
```rust
use mpl::parser::Parser;
use mpl::span::StartAndLenSpan;
use mpl::trees::AST;
use mpl_macro::Parse;

#[derive(Parse, Debug)]
#[mplg = "parentheses.mplg"]
pub struct ParenParser;

fn main() {
    let parser = ParenParser;
    let input = b"(()(()))";

    let all_of_the_span =
        StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);
    let result: Result<
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &ParenRules, &ParenVariable::Open, &all_of_the_span);

    assert!(result.is_ok());
}
```

The derive strips a trailing `Parser` from the struct name and uses the
remaining stem (here `Paren`) to generate `ParenVariable` and `ParenRules`.

### Tier 1: build a parser in Rust

If you want the same one-rule model with complete control over rules and outputs, define them directly in Rust.

```rust
use crate::ParenthesesVariable::*;
use mpl::parser::Parser;
use mpl::rules::{RightRule, RightRuleKind::*, Rules};
use mpl::span::{StartAndLenSpan, Start, Len};
use mpl::output::Output;
use mpl::symbols::{StrTerminal, StrTerminal::*, Variable};
use mpl::trees::AST;
use std::collections::HashMap;

#[derive(Clone, Debug, Hash, Eq, PartialEq)]
enum ParenthesesVariable {
    Open,
    Parentheses,
    Close,
}

impl Variable for ParenthesesVariable {}

struct ParenthesesParser;

impl<'i, V, P, L, R, O> Parser<'i, str, StrTerminal<'i>, V, StartAndLenSpan<P, L>, P, R, O>
    for ParenthesesParser
where
    V: Variable,
    P: Start<str, L>,
    L: Len<str, P>,
    R: Rules<StrTerminal<'i>, V>,
    O: Output<'i, str, V, StartAndLenSpan<P, L>>,
{
}

/// ```
/// Open = '(' Parentheses / ()
/// Parentheses = Open Close / f
/// Close = ")" Open / f
/// ```
fn main() {
    let parser = ParenthesesParser;
    let mut rules = HashMap::new();

    rules.insert(
        Open,
        RightRule::from_right_rule_kind((T(Char('(')), V(Parentheses)), Empty),
    );
    rules.insert(
        Parentheses,
        RightRule::from_right_rule_kind((V(Open), V(Close)), Failure),
    );
    rules.insert(
        Close,
        RightRule::from_right_rule_kind((T(Str(")")), V(Open)), Failure),
    );

    let input = "(()(()))";

    // all of the span
    let all_of_the_span = StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);

    let result: Result<
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &rules, &Open, &all_of_the_span);

    if let Ok(ast) = result {
        println!("{}", ast);
    }
}
```

## Examples
<!-- - [Number](tests/number.rs) -->
- [Parentheses](examples/paren)
<!-- - [Wav Riff](tests/wav_riff.rs) -->

<!-- ### Parsers written with MPL
- [WAV AST](https://github.com/kurotakazuki/wav_ast) : RIFF waveform Audio Format -->

## MPL
### Definition of MPL grammar
A MPL grammar `G` is a tuple `G = (V, Σ, R, S)` in which:
- `V` is a finite set of variables.
- `Σ` is a finite set of original terminal symbols.
- `T` is an union of `Σ` or `M` (Σ &cup; M) (`M` (= {(), f}) is a finite set of metasymbols).
- `R` is a finite set of rules of the form
    - `A = B C / D`  
    A in V (A &isin; V),  
    B, C, D in E (E = T &cup; V) (T &cap; V = &empty;) (B, C, D &isin; E).  
    For any variable A there is exactly one rule with A to the left of `=`.
- S in V (S &isin; V) is the start variable.

#### Empty
`()` is a metasymbol that always succeeds without consuming input.

```rust ignore
Empty = () () / ()
```

#### Failure
`f` is a metasymbol that always fails without consuming input.

```rust ignore
Failure = f f / f
```

### Extended MPL
Since one of the goals of MPL is to create an AST, it also supports two features in terms of ease of use and speed.

#### Any
`?` is a metasymbol representing any single input like wildcard character. This succeeds if there is any input left, and fails if there is no input left.

```rust ignore
Any = ? () / f
```

To extend the difinition of MPL grammar, let ? &isin; M.

#### All
`*` is a metasymbol representing All remaining input like wildcard character. This will succeed even if the remaining inputs are zero.

```rust ignore
All = * () / f
```

Same as `All = ? All / ()`.

To extend the difinition of MPL grammar, let * &isin; M.

<!---
#### Variable type
Variables can have a type.

If the variable contains a type, it will include the value of that type, such as a token, when the AST is created. Therefore rules decomposed from variable including rule has a role like lexical analysis. The following syntax is a lexical syntax for numbers.

```
Number: String = Digit Numeral / f
Numeral = Digit Numeral / ()
Digit = Zero () / f
Zero = "0" () / One
One = "1" () / Two
// ...
Nine = "9" () / f
```

An error (TODO: maybe failure would be better) will occur if the input cannot be converted to the variable type.

To extend the difinition of MPL grammar, change `A = B C / D` to `A = B C / D` or `A: TYPE = B C / D`, or seperate definition of `V` by including type or not.
--->


<!-- #### Terminal symbol type
Terminal symbols supports several types.

```
A = "A" "abc" / [0, 0, 0]
```

Supports `&str, &[u8]` at this moment. -->

## Difference between TDPL and MPL
The core design choice in MPL is to reduce the grammar to one rule form. TDPL has two rule forms, and PEGs were later shown to be reducible to TDPL/GTDPL with equivalent effective recognition power.

> `A..BC/D`, A,B,C,D in V.  
> `A..a`, a in &sum; &cup; {&epsilon;, f}, f is a metasymbol not in &sum; and &epsilon; is the null string.

MPL, on the other hand, keeps everything in a single form:

> `A = B C / D`

That is the main simplification and the main selling point of the language: MPL aims to keep TDPL/PEG-class expressive power in a much smaller surface syntax.


## MPLG (MPL Grammar) syntax
<!-- ### In PEG like grammar
```rust ignore
// Hierarchical syntax
MPLG = (Line)*
Line = (LineComment / Rule / ()) EndOfLine
Rule = Variable " = " E Space E " / " E
E = TerminalSymbol / Variable

// Lexical syntax
// Variable
Variable = Uppercase (Alphabet / DecDigit)*

// Terminal symbol
TerminalSymbol = Expr

// Expr
Expr = LiteralExpr

// Literal
LiteralExpr = MetasymbolLiteral / StringLiteral

// Metasymbol
MetasymbolLiteral = EmptyLiteral / FailureLiteral / AnyLiteral / AllLiteral
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' () / f
AllLiteral = '*' () / f

// String
StringLiteral = "\"" (NotStringLetter / QuoteEscape / ?)* "\""
NotStringLetter = !("\"")

// Letters
Alphabet = Lowercase / Uppercase
UppercaseAToF = "A" / "B" / "C" / "D" / "E" / "F"
LowercaseAToF = "a" / "b" / "c" / "d" / "e" / "f"
Uppercase = UppercaseAToF / "G" / "H" / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" / "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
Lowercase = LowercaseAToF / "g" / "h" / "i" / "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" / "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z"

QuoteEscape = "\\'" / "\\\""
EndOfLine = "\r\n" / '\n'
Space = " "

// Digits
BinDigit = "0" / "1"
OctDigit = BinDigit / "2" / "3" / "4" / "5" / "6" / "7"
DecDigit = OctDigit / "8" / "9"
HexDigit = DecDigit / UppercaseAToF / LowercaseAToF

// Comment
LineComment = "//" (!(EndOfLine) ?)*
``` -->

### In MPL grammar
```rust ignore
// Hierarchical syntax
Mplg = ZeroOrMoreLines () / f
ZeroOrMoreLines = Line ZeroOrMoreLines / ()

Line = Line1 EndOfLine / f
Line1 = LineComment () / Line2
Line2 = Rule () / ()

Rule = Variable Rule1 / f
Rule1 = " = " Rule2 / f
Rule2 = E Rule3 / f
Rule3 = Space Rule4 / f
Rule4 = E Rule5 / f
Rule5 = " / " Rule6 / f
Rule6 = E () / f
E = TerminalSymbol () / Variable


// Lexical syntax
// Variable
Variable = Identifier () / f

// Terminal symbol
TerminalSymbol = MetasymbolLiteral () / OriginalSymbolExpr

// Expr
Expr = ExprWithoutBlock () / f

// Without Block
ExprWithoutBlock = LiteralExpr () / ExprWithoutBlock1
ExprWithoutBlock1 = StructExpr () / f

// Struct
StructExpr = StructExprStruct () / StructExpr1
StructExpr1 = StructExprTuple () / StructExprUnit

StructExprStruct = f f / f

StructExprTuple = PathInExpr StructExprTuple1 / f
StructExprTuple1 = '(' StructExprTuple2 / f
StructExprTuple2 = ZeroOrMoreExpr ')' / f
ZeroOrMoreExpr = Expr () / f

StructExprUnit = PathInExpr () / f

// PathInExpr
PathInExpr = ZeroOrOneDoubleColon OneOrMorePathExprSegment / f
ZeroOrOneDoubleColon = "::" () / ()
OneOrMorePathExprSegment = PathExprSegment () / f

PathExprSegment = PathIdentSegment PathExprSegment1 / f
PathExprSegment1 = "::" GenericArgs / ()

PathIdentSegment = Identifier () / f

GenericArgs = f f / f

// Literal
LiteralExpr = CharLiteral () / LiteralExpr1
LiteralExpr1 = StringLiteral () / LiteralExpr2
LiteralExpr2 = IntegerLiteral () / f

// Metasymbol
MetasymbolLiteral = EmptyLiteral () / MetasymbolLiteral1
MetasymbolLiteral1 = FailureLiteral () / MetasymbolLiteral2
MetasymbolLiteral2 = AnyLiteral () / MetasymbolLiteral3
MetasymbolLiteral3 = AllLiteral () / f
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' ZeroOrMoreAny / f
ZeroOrMoreAny = '?' ZeroOrMoreAny / ()
AllLiteral = '*' () / f

// Original symbol
OriginalSymbolExpr = "{ " OriginalSymbolExpr1 / f
OriginalSymbolExpr1 = ExprWithoutBlock " }" / f

// Char
CharLiteral = '\'' CharLiteral1 / f
CharLiteral1 = InnerCharLiteral '\'' / f
InnerCharLiteral = NotCharLetter InnerCharLiteral1 / f
NotCharLetter = '\'' * / ()
InnerCharLiteral1 = QuoteEscape () / ?

// String
StringLiteral = '"' StringLiteral1 / f
StringLiteral1 = InnerStringLiteral '"' / f
InnerStringLiteral = InnerStringLiteralLetter InnerStringLiteral / ()
// InnerStringLiteralLetter
InnerStringLiteralLetter = NotStringLetter InnerStringLiteralLetter1 / f
NotStringLetter = '"' * / ()
InnerStringLiteralLetter1 = QuoteEscape () / ?

// Integer
IntegerLiteral = IntegerLiterals () / f
IntegerLiterals = DecLiteral () / f
DecLiteral = DecDigit ZeroOrMoreDecDigit / f
ZeroOrMoreDecDigit = DecDigitOrUnderscore ZeroOrMoreDecDigit / ()
DecDigitOrUnderscore = DecDigit () / '_'

// IDENTIFIER
Identifier = Uppercase ZeroOrMoreIdentifierContinue / f
ZeroOrMoreIdentifierContinue =  IdentifierContinue ZeroOrMoreIdentifierContinue / ()
IdentifierContinue =  Alphabet () / DecDigit


// Letters
Alphabet = Lowercase () / Uppercase
// Lowercase
LowercaseAToF = LowercaseAToF1 () / f
LowercaseAToF1 = 'a' () / LowercaseAToF2
LowercaseAToF2 = 'b' () / LowercaseAToF3
LowercaseAToF3 = 'c' () / LowercaseAToF4
LowercaseAToF4 = 'd' () / LowercaseAToF5
LowercaseAToF5 = 'e' () / LowercaseAToF6
LowercaseAToF6 = 'f' () / f
Lowercase = LowercaseAToF () / Lowercase1
Lowercase1 = 'g' () / Lowercase2
Lowercase2 = 'h' () / Lowercase3
Lowercase3 = 'i' () / Lowercase4
Lowercase4 = 'j' () / Lowercase5
Lowercase5 = 'k' () / Lowercase6
Lowercase6 = 'l' () / Lowercase7
Lowercase7 = 'm' () / Lowercase8
Lowercase8 = 'n' () / Lowercase9
Lowercase9 = 'o' () / Lowercase10
Lowercase10 = 'p' () / Lowercase11
Lowercase11 = 'q' () / Lowercase12
Lowercase12 = 'r' () / Lowercase13
Lowercase13 = 's' () / Lowercase14
Lowercase14 = 't' () / Lowercase15
Lowercase15 = 'u' () / Lowercase16
Lowercase16 = 'v' () / Lowercase17
Lowercase17 = 'w' () / Lowercase18
Lowercase18 = 'x' () / Lowercase19
Lowercase19 = 'y' () / Lowercase20
Lowercase20 = 'z' () / f
// Uppercase
UppercaseAToF = UppercaseAToF1 () / f
UppercaseAToF1 = 'A' () / UppercaseAToF2
UppercaseAToF2 = 'B' () / UppercaseAToF3
UppercaseAToF3 = 'C' () / UppercaseAToF4
UppercaseAToF4 = 'D' () / UppercaseAToF5
UppercaseAToF5 = 'E' () / UppercaseAToF6
UppercaseAToF6 = 'F' () / f
Uppercase = UppercaseAToF () / Uppercase1
Uppercase1 = 'G' () / Uppercase2
Uppercase2 = 'H' () / Uppercase3
Uppercase3 = 'I' () / Uppercase4
Uppercase4 = 'J' () / Uppercase5
Uppercase5 = 'K' () / Uppercase6
Uppercase6 = 'L' () / Uppercase7
Uppercase7 = 'M' () / Uppercase8
Uppercase8 = 'N' () / Uppercase9
Uppercase9 = 'O' () / Uppercase10
Uppercase10 = 'P' () / Uppercase11
Uppercase11 = 'Q' () / Uppercase12
Uppercase12 = 'R' () / Uppercase13
Uppercase13 = 'S' () / Uppercase14
Uppercase14 = 'T' () / Uppercase15
Uppercase15 = 'U' () / Uppercase16
Uppercase16 = 'V' () / Uppercase17
Uppercase17 = 'W' () / Uppercase18
Uppercase18 = 'X' () / Uppercase19
Uppercase19 = 'Y' () / Uppercase20
Uppercase20 = 'Z' () / f

QuoteEscape = "\\'" () / "\\\""
EndOfLine = "\r\n" () / '\n'
Space = ' ' () / f

// Digits
BinDigit = "0" () / "1"
OctDigit = BinDigit () / OctDigit1
OctDigit1 = "2" () / OctDigit2
OctDigit2 = "3" () / OctDigit3
OctDigit3 = "4" () / OctDigit4
OctDigit4 = "5" () / OctDigit5
OctDigit5 = "6" () / OctDigit6
OctDigit6 = "7" () / f
DecDigit = OctDigit () / DecDigit1
DecDigit1 = "8" () / DecDigit2
DecDigit2 = "9" () / f

// Comment
LineComment = "//" InnerLineComment / f
InnerLineComment = AnyExceptLF InnerLineComment / ()
AnyExceptLF = AnyExceptLF1 ? / f
AnyExceptLF1 = EndOfLine * / ()
```

<!---
```rust ignore
// Hierarchical syntax
MPLG = (Line)*
Line = (Rule / LineComment / ()) EndOfLine
Rule = Variable " = " E Space E " / " E
E = TerminalSymbol / Variable

// Lexical syntax
// Variable
Variable = Uppercase (Alphabet / DecDigit)*

// Terminal symbol
TerminalSymbol = Expr

// Expr
Expr = LiteralExpr / ArrayExpr

// Array
ArrayExpr = "[" ArrayElements? "]"
ArrayElements = Expr ("," Expr)* ","? / Expr ";" Expr

// Literal
LiteralExpr = MetasymbolLiteral / StringLiteral / IntegerLiteral
// LiteralExpr = MetasymbolLiteral / CharLiteral / StringLiteral / IntegerLiteral / FloatLiteral

// Metasymbol
MetasymbolLiteral = EmptyLiteral / FailureLiteral / AnyLiteral / AllLiteral
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' () / f
AllLiteral = '*' () / f

// String
// TODO Multibyte character may not work.
StringLiteral = "\"" (!("\"" / "\\" / IsolatedCR) . / QuoteEscape / ASCIIEscape / UnicodeEscape / StringContinue)* "\""
StringContinue = "\\" &"\n" 

// Char
// CharLiteral = "\'" (!("\'" / "\n" / "\r" / "\t") . / QuoteEscape / ASCIIEscape / UnicodeEscape) '\''
QuoteEscape = "\\'" / "\\\""
ASCIIEscape = "\\x" OctDigit HexDigit / "\\n" / "\\r" / "\\t" / "\\\\" / "\\0"
UnicodeEscape = "\\u{" (HexDigit "_"*)^1..6 "}"

// Integer
IntegerLiteral = (DecLiteral / HexLiteral) IntegerSuffix?
DecLiteral = DecDigit (DecDigit / "_")*
HexLiteral =  "0x" (HexDigit / "_")* HexDigit (HexDigit / "_")*

IntegerSuffix = "u8" / "u16" / "u32" / "u64" / "u128" / "usize" / "i8" / "i16" / "i32" / "i64" / "i128" / "isize"

// Float
FloatLiteral = DecLiteral "." ()

FloatExponent = ("e" / "E") ("+" / "-")? (DecDigit / "_")* DecDigit (DecDigit / "_")*
FloatSuffix = "f32" / "f64"


// Letters
UppercaseAToF = "A" / "B" / "C" / "D" / "E" / "F"
LowercaseAToF = "a" / "b" / "c" / "d" / "e" / "f"
Uppercase = UppercaseAToF / "G" / "H" / "I" / "J" / "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" / "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
Lowercase = LowercaseAToF / "g" / "h" / "i" / "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" / "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z"
Alphabet = Uppercase / Lowercase

// Digits
BinDigit = "0" / "1"
OctDigit = BinDigit / "2" / "3" / "4" / "5" / "6" / "7"
DecDigit = OctDigit / "8" / "9"
HexDigit = DecDigit / UppercaseAToF / LowercaseAToF

// Comment
LineComment = "//" (!("\n") .)*

IsolatedCR = "\r" !"\n" ()
// TODO: Maybe need EOF
EndOfLine = "\n" / "\r\n"
Space = " "
```

--->

## TODO

### Tasks
- into_first() in CST

### 
- Add { Original } in mplg
- Add functions that easy to get Variable from AST
- Add RowColSpan
- Create parser from MPLG file.
- Error Handling
- Packrat Parsing
- Left Recursion

## Next implementation
- [ ] Add functions that easy to get Variable from AST
- [ ] Can be Variable in Leaf Node
- [ ] Error Handling

## Practice
### Sequence
`A <- e1 e2`
```rust ignore
A = e1 e2 / f
```

### Choice
`A <- e1 / e2`
```rust ignore
A = e1 () / e2
```

### Zero or more
`A <- e*`
```rust ignore
A = e A / ()
```

### Not predicate
`A <- !e ?`
```rust ignore
A = B ? / f
B = e * / ()
```

## References

### Foundations of TDPL / PEG / packrat parsing

- Alexander Birman. *The TMG Recognition Schema.* PhD thesis,
  Princeton University, February 1970. (TDPL — the "one rule"
  formulation that MPL specialises.)
- Alfred V. Aho and Jeffrey D. Ullman. *The Theory of Parsing,
  Translation and Compiling — Vol. I: Parsing.* Prentice Hall, 1972.
- Bryan Ford. 2002. *Packrat parsing: a practical linear-time
  algorithm with backtracking.* PhD dissertation, MIT.
- Bryan Ford. 2004. *Parsing expression grammars: a recognition-based
  syntactic foundation.* POPL 2004, pp. 111–122.

### Algorithms used by mpl-macro's optimisations

- Kota Mizushima, Atusi Maeda, and Yoshinori Yamaguchi. 2010.
  *Packrat parsers can handle practical grammars in mostly constant
  space.* PASTE 2010. **— FIRST-set / cut analysis used by
  `cascade::analyze_char_classes` and `cut::compute_first_sets` to
  exclude LL(1)-disjoint rules from memoisation.**
- Alessandro Warth, James R. Douglass, and Todd Millstein. 2008.
  *Packrat parsers can support left recursion.* PEPM 2008.
  **— seed-growing iteration used in
  [`Parser::squirrel_parse`](https://docs.rs/mpl) and the
  `FastParse`-generated LR codegen.**
- Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014.
  *Left recursion in Parsing Expression Grammars.* Science of Computer
  Programming 96, pp. 177–190.
- Luke A. D. Hutchison. 2020. *Pika parsing: reformulating packrat
  parsing as a dynamic programming algorithm solves the left
  recursion and error recovery problems.* arXiv preprint
  [arXiv:2005.06444](https://arxiv.org/abs/2005.06444).
- Luke A. D. Hutchison. 2026. *Squirrel: a unified packrat parsing
  algorithm with linear-time guarantees, optimal error recovery, and
  full left-recursion support.* arXiv preprint
  [arXiv:2601.05012](https://arxiv.org/abs/2601.05012). **— the
  family of position-only seed-growing memoisation MPL's
  `squirrel_recognize` and the `FastParse` LR path implement.**

### Parser data-layout influences

- Adrian Sampson. *Flattening ASTs (and other compiler data
  structures).* 2023.
  [cs.cornell.edu/~asampson/blog/flattening.html](https://www.cs.cornell.edu/~asampson/blog/flattening.html)
  **— the flat `Vec<Token>` representation used in `mpl::fast`.**
- Aleksey Kladov ("matklad"). *Resilient LL parsing tutorial.* 2023.
  Background on lossy / lossless syntax tree design.
- Joshua Haberman ("Reverberate"). *Parsing Protobuf at 2+ GB/s: how I
  learned to love tail calls in C.* blog.reverberate.org, 2021.
  **— motivation for the explicit `loop {}` codegen in
  `cascade::is_tail_loop` (Rust's TCO is unreliable for
  `Result<u32, ()>`-shaped recursive returns).**

All algorithms above are implemented from the published descriptions;
no source code is copied. PEG / TDPL / packrat parsing and their
extensions are not patented to our knowledge.

## License

This project is licensed under either of:

- Apache License 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
  <https://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT](LICENSE-MIT) or
  <https://opensource.org/licenses/MIT>)

at your option. All runtime dependencies (`bumpalo`, `proc-macro2`,
`quote`, `syn`) are dual-licensed `MIT OR Apache-2.0`, compatible with
both.