# Minimal Parsing Language (MPL)
[](https://crates.io/crates/mpl)
[](https://docs.rs/mpl)
[](https://github.com/kurotakazuki/mpl/actions/workflows/ci.yml)
A small parsing language for Rust built around a single rule form `A = B C / D`,
with a static-codegen backend (`FastParse`) that beats `pest`, `peg`, `nom`,
`winnow`, `chumsky`, and even `serde_json` on equal-work recognition benchmarks.
MPL is deliberately minimal: TDPL/GTDPL is equivalent to PEG in recognition power,
and one rule shape is enough to encode sequence, choice, recursion, repetition,
and negative predicates. The compile-time analyzer (`mpl-macro`) detects char-class
cascades, multi-prefix dispatch, tail loops, and direct left recursion in your
grammar, then generates one inlined Rust function per rule with the appropriate
fast path automatically.
## Highlights
- **Static code generation** via `#[derive(FastParse)]` reads your `.mplg` and emits one
function per rule — no `HashMap` rule lookup, no trait dispatch.
- **Six automatic optimizations** at codegen time:
- char-class cascade → `[bool; 256]` byte table
- multi-prefix cascade → `match input[pos]` first-byte dispatch
- `R = X R / ()` → `loop {}` (LLVM TCO is unreliable for `Result<u32,()>`)
- direct left recursion → Squirrel-style seed-growing, position-only memo
- Mizushima cut analysis: rules with disjoint FIRST sets are exempt from memoisation
- thin `CheckState` for recognition-only paths (peg-compatible calling convention)
- **Squirrel left recursion**: `R = R x / Y` works directly — no grammar rewriting,
no `precedence!` macro, no manual seed handling.
- **Beats every comparator** measured (criterion, sample size 100, `RUSTFLAGS=-C target-cpu=native`).
## Performance
All numbers below are recognition-only (`bool` return), apples-to-apples.
Lower is better; first-place per row in **bold**.
Benchmarks were re-run on 2026-05-30 with `RUSTFLAGS="-C target-cpu=native"`
after dependency refresh (`pest 2.8.6`, `pest_derive 2.8.6`, `peg 0.8.6`,
`winnow 1.0.3`, `nom 8.0.0`, `chumsky 0.13.0`, `criterion 0.8.2`).
### Parentheses (no backtracking, depth N)
| nested-1024 | **2.26 µs** | 8.70 µs | 8.68 µs | 7.50 µs | 93.8 µs | 23.7 µs |
| nested-4096 | **9.23 µs** | 35.6 µs | 33.5 µs | 20.8 µs | 389 µs | 93.2 µs |
| flat-1024 | **2.11 µs** | 4.37 µs | 4.76 µs | 4.92 µs | 96.7 µs | 17.8 µs |
### JSON (full subset with escapes)
`mpl/json.mplg` mirrors the JSON spec for whitespace, string escapes
(`\\n`, `\\t`, `\\\\`, `\\"`, etc.), objects, arrays, integers, and the
literals. peg and pest grammars match the same subset.
Each parser library produces a different default output shape, so we
stratify the comparison into three apples-to-apples tiers.
#### Tier 1: Recognition only (no parse tree)
| json-1024 (~85 KB) | **148 µs** | 261 µs | mpl 1.76× faster |
| canada-1024 | **18.2 µs** | 45.1 µs | mpl 2.48× faster |
| twitter-256 | **34.3 µs** | 62.3 µs | mpl 1.82× faster |
| citm-1024 | **87.8 µs** | 171.9 µs | mpl 1.96× faster |
#### Tier 2: Flat token / Pair queue construction
`mpl-fast-gen-tokens` is `fast_parse` (`Vec<Token>`); `pest` always
emits its `Pair` queue (`Vec<QueueableToken>`). Both produce navigable
token streams that the caller walks — same kind of work.
| json-1024 | **1.52 ms** | 1.78 ms | mpl 1.17× faster |
| canada-1024 | 406 µs | **241 µs** | pest 1.68× faster |
| twitter-256 | **362 µs** | 397 µs | mpl 1.10× faster |
| citm-1024 | **1.10 ms** | 1.15 ms | mpl 1.05× faster |
`pest` wins on `canada-1024` because mpl's single-rule TDPL form emits
a `Start`/`End` pair per variable per match — finer-grained than pest's
non-silent-rule queue. For the deeply-nested coordinate arrays in
canada this multiplies the token count. Every other shape mpl wins.
#### Tier 3: Typed AST construction
Only `serde_json` builds a typed `Value` tree directly — that is much
more work than recognition or flat tokens, so a fair direct comparison
would need an `mpl::Token` → `Value` walker (not yet shipped). For
reference, here is what each tier looks like on the same input:
| json-1024 | 148 µs | 1.52 ms | 637 µs |
| canada-1024 | 18.2 µs | 406 µs | 94.5 µs |
| twitter-256 | 34.3 µs | 362 µs | 125 µs |
| citm-1024 | 87.8 µs | 1.10 ms | 473 µs |
`serde_json` is the de-facto reference and is between the recognition
tier and the token tier — it builds nested `Vec<Value>` / `BTreeMap`
allocations directly, which is faster than mpl's per-variable Start/End
emission for these shapes but slower than pure recognition.
### What "AST output" means in mpl
mpl-fast-gen exposes three output tiers; the row labels in the bench
tables map to these directly.
| `fast_recognize_only(&[u8]) -> bool` | nothing — just a yes/no | Tier 1 |
| `fast_recognize(&[u8]) -> bool` | builds tokens internally; returns bool | Tier 2 (close to pest's `parse(...).is_ok()`) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ParseError>` | flat token buffer | Tier 2 (apples-to-apples with pest's `Pairs`) |
The flat token buffer is navigable as a typed AST view via
[`mpl::fast::ParseTree`] — `Pair`/`Leaf`/`NodeView`/`Children` provide
zero-copy iteration, span ranges, and parent/child traversal. This
matches pest's `Pairs` API.
Building a typed `Value`-style enum from the token buffer (e.g. for a
JSON deserializer) is a small additional walker that the caller writes;
the cost is added on top of `fast_parse`.
### Left-recursive (`LSum = LSum N / N`)
`peg` here uses `#[cache_left_rec]` (van Rossum seed-growing, the same family
as MPL's Squirrel pass).
| lsum-64 | **1.95 µs** | 2.28 µs | 28.8 µs |
| lsum-256 | **7.92 µs** | 9.04 µs | 118 µs |
| lsum-1024 | **30.0 µs** | 36.0 µs | 464 µs |
Across `lsum-4` → `lsum-1024`, `mpl-fast-gen` scales 4×-per-step (linear) just
like `peg`, and beats it by 1.04×–1.27× on every size measured.
Bench harness and grammars are in [examples/paren_bench/](examples/paren_bench/).
## Quick Start
`Cargo.toml`:
```toml
[dependencies]
mpl = "0.3.1"
mpl-macro = "0.2.1"
```
The fast path is `#[derive(FastParse)]`. Write a grammar file and derive:
```rust
use mpl_macro::FastParse;
#[derive(FastParse)]
#[mplg = "parens.mplg"]
pub struct ParenParser;
fn main() {
let p = ParenParser;
assert!(p.fast_recognize(b"(()(()))"));
assert!(!p.fast_recognize(b"(()"));
}
```
Where `parens.mplg`:
```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```
The derive emits four entry points on the parser struct:
| `fast_recognize_only(&[u8]) -> bool` | bool, no AST | hot recognition; thin `CheckState`, peg-compatible calling convention |
| `fast_recognize(&[u8]) -> bool` | bool, builds tokens internally | parity with `pest` (always materialises a token queue) |
| `fast_parse(&[u8]) -> Result<Vec<Token>, ()>` | flat token buffer | walk the parse tree |
| `fast_recognize_in(&[u8], &Bump) -> bool` | bool, externally-supplied arena | high-throughput pipelines that reset `Bump` between parses |
For an AST view over the flat token buffer, use [`mpl::fast::ParseTree`] —
it provides `Pair` / `Leaf` / `Children` iterators without any extra allocation.
## Three API tiers
| 1. Hand-rolled | `Parser::parse` (trait) | none — interprets `RightRule` data | full control over rules, slowest |
| 2. Legacy derive | `#[derive(Parse)]` | generates `Rules` HashMap impl | API stable since 0.2; same speed as tier 1 |
| 3. **Fast derive** | `#[derive(FastParse)]` | one static `fn` per rule + auto-optimisations | fastest path; recommended for new code |
Tiers 1 and 2 stay supported for back-compat. New code should use tier 3.
Migration: typically a one-line change from `#[derive(Parse)]` to `#[derive(FastParse)]`,
plus replacing `parser.parse(...)` with `parser.fast_parse(...)`.
## How the codegen earns its speed
The `FastParse` derive runs four compile-time analyses against your grammar
and picks the cheapest valid lowering for each rule. Concretely, for the
included BubFns (99-rule arithmetic) grammar:
- `OrOrExpr = AndAndExpr OrOrExpr1 / AndAndExpr` and `Power = Atom PowerAndFactor / Atom` →
flagged as needing memoisation by Mizushima cut analysis (FIRST sets overlap).
Other 92 rules are LL(1)-disjoint and skip the cache entirely.
- `Variable = UppercaseX () / Variable1` and `DecDigit = '0' () / DecDigit1` cascades →
collapsed into `[bool; 256]` byte tables.
- `Atom = ExprInParens / FloatLiteral / IntegerLiteral / Function / Variable / Constant` →
first-byte dispatch (single `match input[pos]`).
- `ZeroOrMoreDecDigits = DecDigit ZeroOrMoreDecDigits / ()` →
`while let Ok(p) = parse_dec_digit::<false>(state, p) { ... }` (no recursion).
Recognition-only callers go through a thin `CheckState` (input + furthest pos
only); emit-mode callers go through `ParserState` and produce flat `Token`
buffers (Sampson-style — no nested boxes).
## Left recursion: `Squirrel` algorithm
When a rule has the shape `R = R X / Y`, `FastParse` automatically routes
recognition through Squirrel-style seed-growing memoisation. No grammar
rewriting needed — the algorithm is invoked transparently.
```text
LSum = LSum N / N
N = { Char('1') } () / N1
...
```
Generated code holds a position-only `(rule_id, pos) -> Option<u32>` memo
per parse and iterates seed expansion until convergence. This matches peg's
`#[cache_left_rec]` family and was the basis on which we beat peg on
`lsum-256` and `lsum-1024`.
## Legacy APIs (tier 1 & 2)
The original 0.2 APIs remain supported. Pick one only if you need
features `FastParse` does not yet cover (custom `Output` type integration,
runtime-built grammars, or non-byte input).
### Tier 2: legacy `#[derive(Parse)]`
Generates a `Rules` HashMap impl plus the `Parser` trait method calls.
Identical perf characteristics to tier 1; identical AST shape.
`parentheses.mplg`:
```text
Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f
```
`main.rs`:
```rust
use mpl::parser::Parser;
use mpl::span::StartAndLenSpan;
use mpl::trees::AST;
use mpl_macro::Parse;
#[derive(Parse, Debug)]
#[mplg = "parentheses.mplg"]
pub struct ParenParser;
fn main() {
let parser = ParenParser;
let input = b"(()(()))";
let all_of_the_span =
StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);
let result: Result<
AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
> = parser.parse(input, &ParenRules, &ParenVariable::Open, &all_of_the_span);
assert!(result.is_ok());
}
```
The derive strips a trailing `Parser` from the struct name and uses the
remaining stem (here `Paren`) to generate `ParenVariable` and `ParenRules`.
### Tier 1: build a parser in Rust
If you want the same one-rule model with complete control over rules and outputs, define them directly in Rust.
```rust
use crate::ParenthesesVariable::*;
use mpl::parser::Parser;
use mpl::rules::{RightRule, RightRuleKind::*, Rules};
use mpl::span::{StartAndLenSpan, Start, Len};
use mpl::output::Output;
use mpl::symbols::{StrTerminal, StrTerminal::*, Variable};
use mpl::trees::AST;
use std::collections::HashMap;
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
enum ParenthesesVariable {
Open,
Parentheses,
Close,
}
impl Variable for ParenthesesVariable {}
struct ParenthesesParser;
impl<'i, V, P, L, R, O> Parser<'i, str, StrTerminal<'i>, V, StartAndLenSpan<P, L>, P, R, O>
for ParenthesesParser
where
V: Variable,
P: Start<str, L>,
L: Len<str, P>,
R: Rules<StrTerminal<'i>, V>,
O: Output<'i, str, V, StartAndLenSpan<P, L>>,
{
}
/// ```
/// Open = '(' Parentheses / ()
/// Parentheses = Open Close / f
/// Close = ")" Open / f
/// ```
fn main() {
let parser = ParenthesesParser;
let mut rules = HashMap::new();
rules.insert(
Open,
RightRule::from_right_rule_kind((T(Char('(')), V(Parentheses)), Empty),
);
rules.insert(
Parentheses,
RightRule::from_right_rule_kind((V(Open), V(Close)), Failure),
);
rules.insert(
Close,
RightRule::from_right_rule_kind((T(Str(")")), V(Open)), Failure),
);
let input = "(()(()))";
// all of the span
let all_of_the_span = StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);
let result: Result<
AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
> = parser.parse(input, &rules, &Open, &all_of_the_span);
if let Ok(ast) = result {
println!("{}", ast);
}
}
```
## Examples
- [Parentheses](examples/paren)
## MPL
### Definition of MPL grammar
A MPL grammar `G` is a tuple `G = (V, Σ, R, S)` in which:
- `V` is a finite set of variables.
- `Σ` is a finite set of original terminal symbols.
- `T` is an union of `Σ` or `M` (Σ ∪ M) (`M` (= {(), f}) is a finite set of metasymbols).
- `R` is a finite set of rules of the form
- `A = B C / D`
A in V (A ∈ V),
B, C, D in E (E = T ∪ V) (T ∩ V = ∅) (B, C, D ∈ E).
For any variable A there is exactly one rule with A to the left of `=`.
- S in V (S ∈ V) is the start variable.
#### Empty
`()` is a metasymbol that always succeeds without consuming input.
```rust ignore
Empty = () () / ()
```
#### Failure
`f` is a metasymbol that always fails without consuming input.
```rust ignore
Failure = f f / f
```
### Extended MPL
Since one of the goals of MPL is to create an AST, it also supports two features in terms of ease of use and speed.
#### Any
`?` is a metasymbol representing any single input like wildcard character. This succeeds if there is any input left, and fails if there is no input left.
```rust ignore
Any = ? () / f
```
To extend the difinition of MPL grammar, let ? ∈ M.
#### All
`*` is a metasymbol representing All remaining input like wildcard character. This will succeed even if the remaining inputs are zero.
```rust ignore
All = * () / f
```
Same as `All = ? All / ()`.
To extend the difinition of MPL grammar, let * ∈ M.
## Difference between TDPL and MPL
The core design choice in MPL is to reduce the grammar to one rule form. TDPL has two rule forms, and PEGs were later shown to be reducible to TDPL/GTDPL with equivalent effective recognition power.
> `A..BC/D`, A,B,C,D in V.
> `A..a`, a in ∑ ∪ {ε, f}, f is a metasymbol not in ∑ and ε is the null string.
MPL, on the other hand, keeps everything in a single form:
> `A = B C / D`
That is the main simplification and the main selling point of the language: MPL aims to keep TDPL/PEG-class expressive power in a much smaller surface syntax.
## MPLG (MPL Grammar) syntax
### In MPL grammar
```rust ignore
// Hierarchical syntax
Mplg = ZeroOrMoreLines () / f
ZeroOrMoreLines = Line ZeroOrMoreLines / ()
Line = Line1 EndOfLine / f
Line1 = LineComment () / Line2
Line2 = Rule () / ()
Rule = Variable Rule1 / f
Rule1 = " = " Rule2 / f
Rule2 = E Rule3 / f
Rule3 = Space Rule4 / f
Rule4 = E Rule5 / f
Rule5 = " / " Rule6 / f
Rule6 = E () / f
E = TerminalSymbol () / Variable
// Lexical syntax
// Variable
Variable = Identifier () / f
// Terminal symbol
TerminalSymbol = MetasymbolLiteral () / OriginalSymbolExpr
// Expr
Expr = ExprWithoutBlock () / f
// Without Block
ExprWithoutBlock = LiteralExpr () / ExprWithoutBlock1
ExprWithoutBlock1 = StructExpr () / f
// Struct
StructExpr = StructExprStruct () / StructExpr1
StructExpr1 = StructExprTuple () / StructExprUnit
StructExprStruct = f f / f
StructExprTuple = PathInExpr StructExprTuple1 / f
StructExprTuple1 = '(' StructExprTuple2 / f
StructExprTuple2 = ZeroOrMoreExpr ')' / f
ZeroOrMoreExpr = Expr () / f
StructExprUnit = PathInExpr () / f
// PathInExpr
PathInExpr = ZeroOrOneDoubleColon OneOrMorePathExprSegment / f
ZeroOrOneDoubleColon = "::" () / ()
OneOrMorePathExprSegment = PathExprSegment () / f
PathExprSegment = PathIdentSegment PathExprSegment1 / f
PathExprSegment1 = "::" GenericArgs / ()
PathIdentSegment = Identifier () / f
GenericArgs = f f / f
// Literal
LiteralExpr = CharLiteral () / LiteralExpr1
LiteralExpr1 = StringLiteral () / LiteralExpr2
LiteralExpr2 = IntegerLiteral () / f
// Metasymbol
MetasymbolLiteral = EmptyLiteral () / MetasymbolLiteral1
MetasymbolLiteral1 = FailureLiteral () / MetasymbolLiteral2
MetasymbolLiteral2 = AnyLiteral () / MetasymbolLiteral3
MetasymbolLiteral3 = AllLiteral () / f
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' ZeroOrMoreAny / f
ZeroOrMoreAny = '?' ZeroOrMoreAny / ()
AllLiteral = '*' () / f
// Original symbol
OriginalSymbolExpr = "{ " OriginalSymbolExpr1 / f
OriginalSymbolExpr1 = ExprWithoutBlock " }" / f
// Char
CharLiteral = '\'' CharLiteral1 / f
CharLiteral1 = InnerCharLiteral '\'' / f
InnerCharLiteral = NotCharLetter InnerCharLiteral1 / f
NotCharLetter = '\'' * / ()
InnerCharLiteral1 = QuoteEscape () / ?
// String
StringLiteral = '"' StringLiteral1 / f
StringLiteral1 = InnerStringLiteral '"' / f
InnerStringLiteral = InnerStringLiteralLetter InnerStringLiteral / ()
// InnerStringLiteralLetter
InnerStringLiteralLetter = NotStringLetter InnerStringLiteralLetter1 / f
NotStringLetter = '"' * / ()
InnerStringLiteralLetter1 = QuoteEscape () / ?
// Integer
IntegerLiteral = IntegerLiterals () / f
IntegerLiterals = DecLiteral () / f
DecLiteral = DecDigit ZeroOrMoreDecDigit / f
ZeroOrMoreDecDigit = DecDigitOrUnderscore ZeroOrMoreDecDigit / ()
DecDigitOrUnderscore = DecDigit () / '_'
// IDENTIFIER
Identifier = Uppercase ZeroOrMoreIdentifierContinue / f
ZeroOrMoreIdentifierContinue = IdentifierContinue ZeroOrMoreIdentifierContinue / ()
IdentifierContinue = Alphabet () / DecDigit
// Letters
Alphabet = Lowercase () / Uppercase
// Lowercase
LowercaseAToF = LowercaseAToF1 () / f
LowercaseAToF1 = 'a' () / LowercaseAToF2
LowercaseAToF2 = 'b' () / LowercaseAToF3
LowercaseAToF3 = 'c' () / LowercaseAToF4
LowercaseAToF4 = 'd' () / LowercaseAToF5
LowercaseAToF5 = 'e' () / LowercaseAToF6
LowercaseAToF6 = 'f' () / f
Lowercase = LowercaseAToF () / Lowercase1
Lowercase1 = 'g' () / Lowercase2
Lowercase2 = 'h' () / Lowercase3
Lowercase3 = 'i' () / Lowercase4
Lowercase4 = 'j' () / Lowercase5
Lowercase5 = 'k' () / Lowercase6
Lowercase6 = 'l' () / Lowercase7
Lowercase7 = 'm' () / Lowercase8
Lowercase8 = 'n' () / Lowercase9
Lowercase9 = 'o' () / Lowercase10
Lowercase10 = 'p' () / Lowercase11
Lowercase11 = 'q' () / Lowercase12
Lowercase12 = 'r' () / Lowercase13
Lowercase13 = 's' () / Lowercase14
Lowercase14 = 't' () / Lowercase15
Lowercase15 = 'u' () / Lowercase16
Lowercase16 = 'v' () / Lowercase17
Lowercase17 = 'w' () / Lowercase18
Lowercase18 = 'x' () / Lowercase19
Lowercase19 = 'y' () / Lowercase20
Lowercase20 = 'z' () / f
// Uppercase
UppercaseAToF = UppercaseAToF1 () / f
UppercaseAToF1 = 'A' () / UppercaseAToF2
UppercaseAToF2 = 'B' () / UppercaseAToF3
UppercaseAToF3 = 'C' () / UppercaseAToF4
UppercaseAToF4 = 'D' () / UppercaseAToF5
UppercaseAToF5 = 'E' () / UppercaseAToF6
UppercaseAToF6 = 'F' () / f
Uppercase = UppercaseAToF () / Uppercase1
Uppercase1 = 'G' () / Uppercase2
Uppercase2 = 'H' () / Uppercase3
Uppercase3 = 'I' () / Uppercase4
Uppercase4 = 'J' () / Uppercase5
Uppercase5 = 'K' () / Uppercase6
Uppercase6 = 'L' () / Uppercase7
Uppercase7 = 'M' () / Uppercase8
Uppercase8 = 'N' () / Uppercase9
Uppercase9 = 'O' () / Uppercase10
Uppercase10 = 'P' () / Uppercase11
Uppercase11 = 'Q' () / Uppercase12
Uppercase12 = 'R' () / Uppercase13
Uppercase13 = 'S' () / Uppercase14
Uppercase14 = 'T' () / Uppercase15
Uppercase15 = 'U' () / Uppercase16
Uppercase16 = 'V' () / Uppercase17
Uppercase17 = 'W' () / Uppercase18
Uppercase18 = 'X' () / Uppercase19
Uppercase19 = 'Y' () / Uppercase20
Uppercase20 = 'Z' () / f
QuoteEscape = "\\'" () / "\\\""
EndOfLine = "\r\n" () / '\n'
Space = ' ' () / f
// Digits
BinDigit = "0" () / "1"
OctDigit = BinDigit () / OctDigit1
OctDigit1 = "2" () / OctDigit2
OctDigit2 = "3" () / OctDigit3
OctDigit3 = "4" () / OctDigit4
OctDigit4 = "5" () / OctDigit5
OctDigit5 = "6" () / OctDigit6
OctDigit6 = "7" () / f
DecDigit = OctDigit () / DecDigit1
DecDigit1 = "8" () / DecDigit2
DecDigit2 = "9" () / f
// Comment
LineComment = "//" InnerLineComment / f
InnerLineComment = AnyExceptLF InnerLineComment / ()
AnyExceptLF = AnyExceptLF1 ? / f
AnyExceptLF1 = EndOfLine * / ()
```
## TODO
### Tasks
- into_first() in CST
###
- Add { Original } in mplg
- Add functions that easy to get Variable from AST
- Add RowColSpan
- Create parser from MPLG file.
- Error Handling
- Packrat Parsing
- Left Recursion
## Next implementation
- [ ] Add functions that easy to get Variable from AST
- [ ] Can be Variable in Leaf Node
- [ ] Error Handling
## Practice
### Sequence
`A <- e1 e2`
```rust ignore
A = e1 e2 / f
```
### Choice
`A <- e1 / e2`
```rust ignore
A = e1 () / e2
```
### Zero or more
`A <- e*`
```rust ignore
A = e A / ()
```
### Not predicate
`A <- !e ?`
```rust ignore
A = B ? / f
B = e * / ()
```
## References
### Foundations of TDPL / PEG / packrat parsing
- Alexander Birman. *The TMG Recognition Schema.* PhD thesis,
Princeton University, February 1970. (TDPL — the "one rule"
formulation that MPL specialises.)
- Alfred V. Aho and Jeffrey D. Ullman. *The Theory of Parsing,
Translation and Compiling — Vol. I: Parsing.* Prentice Hall, 1972.
- Bryan Ford. 2002. *Packrat parsing: a practical linear-time
algorithm with backtracking.* PhD dissertation, MIT.
- Bryan Ford. 2004. *Parsing expression grammars: a recognition-based
syntactic foundation.* POPL 2004, pp. 111–122.
### Algorithms used by mpl-macro's optimisations
- Kota Mizushima, Atusi Maeda, and Yoshinori Yamaguchi. 2010.
*Packrat parsers can handle practical grammars in mostly constant
space.* PASTE 2010. **— FIRST-set / cut analysis used by
`cascade::analyze_char_classes` and `cut::compute_first_sets` to
exclude LL(1)-disjoint rules from memoisation.**
- Alessandro Warth, James R. Douglass, and Todd Millstein. 2008.
*Packrat parsers can support left recursion.* PEPM 2008.
**— seed-growing iteration used in
[`Parser::squirrel_parse`](https://docs.rs/mpl) and the
`FastParse`-generated LR codegen.**
- Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014.
*Left recursion in Parsing Expression Grammars.* Science of Computer
Programming 96, pp. 177–190.
- Luke A. D. Hutchison. 2020. *Pika parsing: reformulating packrat
parsing as a dynamic programming algorithm solves the left
recursion and error recovery problems.* arXiv preprint
[arXiv:2005.06444](https://arxiv.org/abs/2005.06444).
- Luke A. D. Hutchison. 2026. *Squirrel: a unified packrat parsing
algorithm with linear-time guarantees, optimal error recovery, and
full left-recursion support.* arXiv preprint
[arXiv:2601.05012](https://arxiv.org/abs/2601.05012). **— the
family of position-only seed-growing memoisation MPL's
`squirrel_recognize` and the `FastParse` LR path implement.**
### Parser data-layout influences
- Adrian Sampson. *Flattening ASTs (and other compiler data
structures).* 2023.
[cs.cornell.edu/~asampson/blog/flattening.html](https://www.cs.cornell.edu/~asampson/blog/flattening.html)
**— the flat `Vec<Token>` representation used in `mpl::fast`.**
- Aleksey Kladov ("matklad"). *Resilient LL parsing tutorial.* 2023.
Background on lossy / lossless syntax tree design.
- Joshua Haberman ("Reverberate"). *Parsing Protobuf at 2+ GB/s: how I
learned to love tail calls in C.* blog.reverberate.org, 2021.
**— motivation for the explicit `loop {}` codegen in
`cascade::is_tail_loop` (Rust's TCO is unreliable for
`Result<u32, ()>`-shaped recursive returns).**
All algorithms above are implemented from the published descriptions;
no source code is copied. PEG / TDPL / packrat parsing and their
extensions are not patented to our knowledge.
## License
This project is licensed under either of:
- Apache License 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
<https://www.apache.org/licenses/LICENSE-2.0>)
- MIT License ([LICENSE-MIT](LICENSE-MIT) or
<https://opensource.org/licenses/MIT>)
at your option. All runtime dependencies (`bumpalo`, `proc-macro2`,
`quote`, `syn`) are dual-licensed `MIT OR Apache-2.0`, compatible with
both.