mpl 0.3.1

One-rule TDPL/PEG parsing language with a static-codegen backend (FastParse) that beats pest, peg, nom, winnow, and chumsky on equal-work benchmarks.
Documentation

Minimal Parsing Language (MPL)

Crate API CI

A small parsing language for Rust built around a single rule form A = B C / D, with a static-codegen backend (FastParse) that beats pest, peg, nom, winnow, chumsky, and even serde_json on equal-work recognition benchmarks.

MPL is deliberately minimal: TDPL/GTDPL is equivalent to PEG in recognition power, and one rule shape is enough to encode sequence, choice, recursion, repetition, and negative predicates. The compile-time analyzer (mpl-macro) detects char-class cascades, multi-prefix dispatch, tail loops, and direct left recursion in your grammar, then generates one inlined Rust function per rule with the appropriate fast path automatically.

Highlights

  • Static code generation via #[derive(FastParse)] reads your .mplg and emits one function per rule — no HashMap rule lookup, no trait dispatch.
  • Six automatic optimizations at codegen time:
    • char-class cascade → [bool; 256] byte table
    • multi-prefix cascade → match input[pos] first-byte dispatch
    • R = X R / ()loop {} (LLVM TCO is unreliable for Result<u32,()>)
    • direct left recursion → Squirrel-style seed-growing, position-only memo
    • Mizushima cut analysis: rules with disjoint FIRST sets are exempt from memoisation
    • thin CheckState for recognition-only paths (peg-compatible calling convention)
  • Squirrel left recursion: R = R x / Y works directly — no grammar rewriting, no precedence! macro, no manual seed handling.
  • Beats every comparator measured (criterion, sample size 100, RUSTFLAGS=-C target-cpu=native).

Performance

All numbers below are recognition-only (bool return), apples-to-apples. Lower is better; first-place per row in bold.

Benchmarks were re-run on 2026-05-30 with RUSTFLAGS="-C target-cpu=native" after dependency refresh (pest 2.8.6, pest_derive 2.8.6, peg 0.8.6, winnow 1.0.3, nom 8.0.0, chumsky 0.13.0, criterion 0.8.2).

Parentheses (no backtracking, depth N)

Size mpl-fast-check peg winnow nom pest chumsky
nested-1024 2.26 µs 8.70 µs 8.68 µs 7.50 µs 93.8 µs 23.7 µs
nested-4096 9.23 µs 35.6 µs 33.5 µs 20.8 µs 389 µs 93.2 µs
flat-1024 2.11 µs 4.37 µs 4.76 µs 4.92 µs 96.7 µs 17.8 µs

JSON (full subset with escapes)

mpl/json.mplg mirrors the JSON spec for whitespace, string escapes (\\n, \\t, \\\\, \\", etc.), objects, arrays, integers, and the literals. peg and pest grammars match the same subset.

Each parser library produces a different default output shape, so we stratify the comparison into three apples-to-apples tiers.

Tier 1: Recognition only (no parse tree)

Shape (~ size) mpl-fast-gen peg () mpl/peg
json-1024 (~85 KB) 148 µs 261 µs mpl 1.76× faster
canada-1024 18.2 µs 45.1 µs mpl 2.48× faster
twitter-256 34.3 µs 62.3 µs mpl 1.82× faster
citm-1024 87.8 µs 171.9 µs mpl 1.96× faster

Tier 2: Flat token / Pair queue construction

mpl-fast-gen-tokens is fast_parse (Vec<Token>); pest always emits its Pair queue (Vec<QueueableToken>). Both produce navigable token streams that the caller walks — same kind of work.

Shape mpl-fast-gen-tokens pest mpl/pest
json-1024 1.52 ms 1.78 ms mpl 1.17× faster
canada-1024 406 µs 241 µs pest 1.68× faster
twitter-256 362 µs 397 µs mpl 1.10× faster
citm-1024 1.10 ms 1.15 ms mpl 1.05× faster

pest wins on canada-1024 because mpl's single-rule TDPL form emits a Start/End pair per variable per match — finer-grained than pest's non-silent-rule queue. For the deeply-nested coordinate arrays in canada this multiplies the token count. Every other shape mpl wins.

Tier 3: Typed AST construction

Only serde_json builds a typed Value tree directly — that is much more work than recognition or flat tokens, so a fair direct comparison would need an mpl::TokenValue walker (not yet shipped). For reference, here is what each tier looks like on the same input:

Shape mpl-fast-gen-recognize mpl-fast-gen-tokens serde_json (Value)
json-1024 148 µs 1.52 ms 637 µs
canada-1024 18.2 µs 406 µs 94.5 µs
twitter-256 34.3 µs 362 µs 125 µs
citm-1024 87.8 µs 1.10 ms 473 µs

serde_json is the de-facto reference and is between the recognition tier and the token tier — it builds nested Vec<Value> / BTreeMap allocations directly, which is faster than mpl's per-variable Start/End emission for these shapes but slower than pure recognition.

What "AST output" means in mpl

mpl-fast-gen exposes three output tiers; the row labels in the bench tables map to these directly.

Method Output Tier
fast_recognize_only(&[u8]) -> bool nothing — just a yes/no Tier 1
fast_recognize(&[u8]) -> bool builds tokens internally; returns bool Tier 2 (close to pest's parse(...).is_ok())
fast_parse(&[u8]) -> Result<Vec<Token>, ParseError> flat token buffer Tier 2 (apples-to-apples with pest's Pairs)

The flat token buffer is navigable as a typed AST view via [mpl::fast::ParseTree] — Pair/Leaf/NodeView/Children provide zero-copy iteration, span ranges, and parent/child traversal. This matches pest's Pairs API.

Building a typed Value-style enum from the token buffer (e.g. for a JSON deserializer) is a small additional walker that the caller writes; the cost is added on top of fast_parse.

Left-recursive (LSum = LSum N / N)

peg here uses #[cache_left_rec] (van Rossum seed-growing, the same family as MPL's Squirrel pass).

Size mpl-fast-gen peg cache-lr mpl-squirrel-recognize (trait)
lsum-64 1.95 µs 2.28 µs 28.8 µs
lsum-256 7.92 µs 9.04 µs 118 µs
lsum-1024 30.0 µs 36.0 µs 464 µs

Across lsum-4lsum-1024, mpl-fast-gen scales 4×-per-step (linear) just like peg, and beats it by 1.04×–1.27× on every size measured.

Bench harness and grammars are in examples/paren_bench/.

Quick Start

Cargo.toml:

[dependencies]
mpl = "0.3.1"
mpl-macro = "0.2.1"

The fast path is #[derive(FastParse)]. Write a grammar file and derive:

use mpl_macro::FastParse;

#[derive(FastParse)]
#[mplg = "parens.mplg"]
pub struct ParenParser;

fn main() {
    let p = ParenParser;
    assert!(p.fast_recognize(b"(()(()))"));
    assert!(!p.fast_recognize(b"(()"));
}

Where parens.mplg:

Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f

The derive emits four entry points on the parser struct:

Method What it returns When to use
fast_recognize_only(&[u8]) -> bool bool, no AST hot recognition; thin CheckState, peg-compatible calling convention
fast_recognize(&[u8]) -> bool bool, builds tokens internally parity with pest (always materialises a token queue)
fast_parse(&[u8]) -> Result<Vec<Token>, ()> flat token buffer walk the parse tree
fast_recognize_in(&[u8], &Bump) -> bool bool, externally-supplied arena high-throughput pipelines that reset Bump between parses

For an AST view over the flat token buffer, use [mpl::fast::ParseTree] — it provides Pair / Leaf / Children iterators without any extra allocation.

Three API tiers

Tier API Codegen Tradeoff
1. Hand-rolled Parser::parse (trait) none — interprets RightRule data full control over rules, slowest
2. Legacy derive #[derive(Parse)] generates Rules HashMap impl API stable since 0.2; same speed as tier 1
3. Fast derive #[derive(FastParse)] one static fn per rule + auto-optimisations fastest path; recommended for new code

Tiers 1 and 2 stay supported for back-compat. New code should use tier 3. Migration: typically a one-line change from #[derive(Parse)] to #[derive(FastParse)], plus replacing parser.parse(...) with parser.fast_parse(...).

How the codegen earns its speed

The FastParse derive runs four compile-time analyses against your grammar and picks the cheapest valid lowering for each rule. Concretely, for the included BubFns (99-rule arithmetic) grammar:

  • OrOrExpr = AndAndExpr OrOrExpr1 / AndAndExpr and Power = Atom PowerAndFactor / Atom → flagged as needing memoisation by Mizushima cut analysis (FIRST sets overlap). Other 92 rules are LL(1)-disjoint and skip the cache entirely.
  • Variable = UppercaseX () / Variable1 and DecDigit = '0' () / DecDigit1 cascades → collapsed into [bool; 256] byte tables.
  • Atom = ExprInParens / FloatLiteral / IntegerLiteral / Function / Variable / Constant → first-byte dispatch (single match input[pos]).
  • ZeroOrMoreDecDigits = DecDigit ZeroOrMoreDecDigits / ()while let Ok(p) = parse_dec_digit::<false>(state, p) { ... } (no recursion).

Recognition-only callers go through a thin CheckState (input + furthest pos only); emit-mode callers go through ParserState and produce flat Token buffers (Sampson-style — no nested boxes).

Left recursion: Squirrel algorithm

When a rule has the shape R = R X / Y, FastParse automatically routes recognition through Squirrel-style seed-growing memoisation. No grammar rewriting needed — the algorithm is invoked transparently.

LSum = LSum N / N
N = { Char('1') } () / N1
...

Generated code holds a position-only (rule_id, pos) -> Option<u32> memo per parse and iterates seed expansion until convergence. This matches peg's #[cache_left_rec] family and was the basis on which we beat peg on lsum-256 and lsum-1024.

Legacy APIs (tier 1 & 2)

The original 0.2 APIs remain supported. Pick one only if you need features FastParse does not yet cover (custom Output type integration, runtime-built grammars, or non-byte input).

Tier 2: legacy #[derive(Parse)]

Generates a Rules HashMap impl plus the Parser trait method calls. Identical perf characteristics to tier 1; identical AST shape.

parentheses.mplg:

Open = { Char('(') } Parentheses / ()
Parentheses = Open Close / f
Close = { Char(')') } Open / f

main.rs:

use mpl::parser::Parser;
use mpl::span::StartAndLenSpan;
use mpl::trees::AST;
use mpl_macro::Parse;

#[derive(Parse, Debug)]
#[mplg = "parentheses.mplg"]
pub struct ParenParser;

fn main() {
    let parser = ParenParser;
    let input = b"(()(()))";

    let all_of_the_span =
        StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);
    let result: Result<
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &ParenRules, &ParenVariable::Open, &all_of_the_span);

    assert!(result.is_ok());
}

The derive strips a trailing Parser from the struct name and uses the remaining stem (here Paren) to generate ParenVariable and ParenRules.

Tier 1: build a parser in Rust

If you want the same one-rule model with complete control over rules and outputs, define them directly in Rust.

use crate::ParenthesesVariable::*;
use mpl::parser::Parser;
use mpl::rules::{RightRule, RightRuleKind::*, Rules};
use mpl::span::{StartAndLenSpan, Start, Len};
use mpl::output::Output;
use mpl::symbols::{StrTerminal, StrTerminal::*, Variable};
use mpl::trees::AST;
use std::collections::HashMap;

#[derive(Clone, Debug, Hash, Eq, PartialEq)]
enum ParenthesesVariable {
    Open,
    Parentheses,
    Close,
}

impl Variable for ParenthesesVariable {}

struct ParenthesesParser;

impl<'i, V, P, L, R, O> Parser<'i, str, StrTerminal<'i>, V, StartAndLenSpan<P, L>, P, R, O>
    for ParenthesesParser
where
    V: Variable,
    P: Start<str, L>,
    L: Len<str, P>,
    R: Rules<StrTerminal<'i>, V>,
    O: Output<'i, str, V, StartAndLenSpan<P, L>>,
{
}

/// ```
/// Open = '(' Parentheses / ()
/// Parentheses = Open Close / f
/// Close = ")" Open / f
/// ```
fn main() {
    let parser = ParenthesesParser;
    let mut rules = HashMap::new();

    rules.insert(
        Open,
        RightRule::from_right_rule_kind((T(Char('(')), V(Parentheses)), Empty),
    );
    rules.insert(
        Parentheses,
        RightRule::from_right_rule_kind((V(Open), V(Close)), Failure),
    );
    rules.insert(
        Close,
        RightRule::from_right_rule_kind((T(Str(")")), V(Open)), Failure),
    );

    let input = "(()(()))";

    // all of the span
    let all_of_the_span = StartAndLenSpan::<u32, u16>::from_start_len(0, input.len() as u16);

    let result: Result<
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
        AST<ParenthesesVariable, StartAndLenSpan<u32, u16>, ()>,
    > = parser.parse(input, &rules, &Open, &all_of_the_span);

    if let Ok(ast) = result {
        println!("{}", ast);
    }
}

Examples

MPL

Definition of MPL grammar

A MPL grammar G is a tuple G = (V, Σ, R, S) in which:

  • V is a finite set of variables.
  • Σ is a finite set of original terminal symbols.
  • T is an union of Σ or M (Σ ∪ M) (M (= {(), f}) is a finite set of metasymbols).
  • R is a finite set of rules of the form
    • A = B C / D
      A in V (A ∈ V),
      B, C, D in E (E = T ∪ V) (T ∩ V = ∅) (B, C, D ∈ E).
      For any variable A there is exactly one rule with A to the left of =.
  • S in V (S ∈ V) is the start variable.

Empty

() is a metasymbol that always succeeds without consuming input.

Empty = () () / ()

Failure

f is a metasymbol that always fails without consuming input.

Failure = f f / f

Extended MPL

Since one of the goals of MPL is to create an AST, it also supports two features in terms of ease of use and speed.

Any

? is a metasymbol representing any single input like wildcard character. This succeeds if there is any input left, and fails if there is no input left.

Any = ? () / f

To extend the difinition of MPL grammar, let ? ∈ M.

All

* is a metasymbol representing All remaining input like wildcard character. This will succeed even if the remaining inputs are zero.

All = * () / f

Same as All = ? All / ().

To extend the difinition of MPL grammar, let * ∈ M.

Difference between TDPL and MPL

The core design choice in MPL is to reduce the grammar to one rule form. TDPL has two rule forms, and PEGs were later shown to be reducible to TDPL/GTDPL with equivalent effective recognition power.

A..BC/D, A,B,C,D in V.
A..a, a in ∑ ∪ {ε, f}, f is a metasymbol not in ∑ and ε is the null string.

MPL, on the other hand, keeps everything in a single form:

A = B C / D

That is the main simplification and the main selling point of the language: MPL aims to keep TDPL/PEG-class expressive power in a much smaller surface syntax.

MPLG (MPL Grammar) syntax

In MPL grammar

// Hierarchical syntax
Mplg = ZeroOrMoreLines () / f
ZeroOrMoreLines = Line ZeroOrMoreLines / ()

Line = Line1 EndOfLine / f
Line1 = LineComment () / Line2
Line2 = Rule () / ()

Rule = Variable Rule1 / f
Rule1 = " = " Rule2 / f
Rule2 = E Rule3 / f
Rule3 = Space Rule4 / f
Rule4 = E Rule5 / f
Rule5 = " / " Rule6 / f
Rule6 = E () / f
E = TerminalSymbol () / Variable


// Lexical syntax
// Variable
Variable = Identifier () / f

// Terminal symbol
TerminalSymbol = MetasymbolLiteral () / OriginalSymbolExpr

// Expr
Expr = ExprWithoutBlock () / f

// Without Block
ExprWithoutBlock = LiteralExpr () / ExprWithoutBlock1
ExprWithoutBlock1 = StructExpr () / f

// Struct
StructExpr = StructExprStruct () / StructExpr1
StructExpr1 = StructExprTuple () / StructExprUnit

StructExprStruct = f f / f

StructExprTuple = PathInExpr StructExprTuple1 / f
StructExprTuple1 = '(' StructExprTuple2 / f
StructExprTuple2 = ZeroOrMoreExpr ')' / f
ZeroOrMoreExpr = Expr () / f

StructExprUnit = PathInExpr () / f

// PathInExpr
PathInExpr = ZeroOrOneDoubleColon OneOrMorePathExprSegment / f
ZeroOrOneDoubleColon = "::" () / ()
OneOrMorePathExprSegment = PathExprSegment () / f

PathExprSegment = PathIdentSegment PathExprSegment1 / f
PathExprSegment1 = "::" GenericArgs / ()

PathIdentSegment = Identifier () / f

GenericArgs = f f / f

// Literal
LiteralExpr = CharLiteral () / LiteralExpr1
LiteralExpr1 = StringLiteral () / LiteralExpr2
LiteralExpr2 = IntegerLiteral () / f

// Metasymbol
MetasymbolLiteral = EmptyLiteral () / MetasymbolLiteral1
MetasymbolLiteral1 = FailureLiteral () / MetasymbolLiteral2
MetasymbolLiteral2 = AnyLiteral () / MetasymbolLiteral3
MetasymbolLiteral3 = AllLiteral () / f
EmptyLiteral = "()" () / f
FailureLiteral = 'f' () / f
AnyLiteral = '?' ZeroOrMoreAny / f
ZeroOrMoreAny = '?' ZeroOrMoreAny / ()
AllLiteral = '*' () / f

// Original symbol
OriginalSymbolExpr = "{ " OriginalSymbolExpr1 / f
OriginalSymbolExpr1 = ExprWithoutBlock " }" / f

// Char
CharLiteral = '\'' CharLiteral1 / f
CharLiteral1 = InnerCharLiteral '\'' / f
InnerCharLiteral = NotCharLetter InnerCharLiteral1 / f
NotCharLetter = '\'' * / ()
InnerCharLiteral1 = QuoteEscape () / ?

// String
StringLiteral = '"' StringLiteral1 / f
StringLiteral1 = InnerStringLiteral '"' / f
InnerStringLiteral = InnerStringLiteralLetter InnerStringLiteral / ()
// InnerStringLiteralLetter
InnerStringLiteralLetter = NotStringLetter InnerStringLiteralLetter1 / f
NotStringLetter = '"' * / ()
InnerStringLiteralLetter1 = QuoteEscape () / ?

// Integer
IntegerLiteral = IntegerLiterals () / f
IntegerLiterals = DecLiteral () / f
DecLiteral = DecDigit ZeroOrMoreDecDigit / f
ZeroOrMoreDecDigit = DecDigitOrUnderscore ZeroOrMoreDecDigit / ()
DecDigitOrUnderscore = DecDigit () / '_'

// IDENTIFIER
Identifier = Uppercase ZeroOrMoreIdentifierContinue / f
ZeroOrMoreIdentifierContinue =  IdentifierContinue ZeroOrMoreIdentifierContinue / ()
IdentifierContinue =  Alphabet () / DecDigit


// Letters
Alphabet = Lowercase () / Uppercase
// Lowercase
LowercaseAToF = LowercaseAToF1 () / f
LowercaseAToF1 = 'a' () / LowercaseAToF2
LowercaseAToF2 = 'b' () / LowercaseAToF3
LowercaseAToF3 = 'c' () / LowercaseAToF4
LowercaseAToF4 = 'd' () / LowercaseAToF5
LowercaseAToF5 = 'e' () / LowercaseAToF6
LowercaseAToF6 = 'f' () / f
Lowercase = LowercaseAToF () / Lowercase1
Lowercase1 = 'g' () / Lowercase2
Lowercase2 = 'h' () / Lowercase3
Lowercase3 = 'i' () / Lowercase4
Lowercase4 = 'j' () / Lowercase5
Lowercase5 = 'k' () / Lowercase6
Lowercase6 = 'l' () / Lowercase7
Lowercase7 = 'm' () / Lowercase8
Lowercase8 = 'n' () / Lowercase9
Lowercase9 = 'o' () / Lowercase10
Lowercase10 = 'p' () / Lowercase11
Lowercase11 = 'q' () / Lowercase12
Lowercase12 = 'r' () / Lowercase13
Lowercase13 = 's' () / Lowercase14
Lowercase14 = 't' () / Lowercase15
Lowercase15 = 'u' () / Lowercase16
Lowercase16 = 'v' () / Lowercase17
Lowercase17 = 'w' () / Lowercase18
Lowercase18 = 'x' () / Lowercase19
Lowercase19 = 'y' () / Lowercase20
Lowercase20 = 'z' () / f
// Uppercase
UppercaseAToF = UppercaseAToF1 () / f
UppercaseAToF1 = 'A' () / UppercaseAToF2
UppercaseAToF2 = 'B' () / UppercaseAToF3
UppercaseAToF3 = 'C' () / UppercaseAToF4
UppercaseAToF4 = 'D' () / UppercaseAToF5
UppercaseAToF5 = 'E' () / UppercaseAToF6
UppercaseAToF6 = 'F' () / f
Uppercase = UppercaseAToF () / Uppercase1
Uppercase1 = 'G' () / Uppercase2
Uppercase2 = 'H' () / Uppercase3
Uppercase3 = 'I' () / Uppercase4
Uppercase4 = 'J' () / Uppercase5
Uppercase5 = 'K' () / Uppercase6
Uppercase6 = 'L' () / Uppercase7
Uppercase7 = 'M' () / Uppercase8
Uppercase8 = 'N' () / Uppercase9
Uppercase9 = 'O' () / Uppercase10
Uppercase10 = 'P' () / Uppercase11
Uppercase11 = 'Q' () / Uppercase12
Uppercase12 = 'R' () / Uppercase13
Uppercase13 = 'S' () / Uppercase14
Uppercase14 = 'T' () / Uppercase15
Uppercase15 = 'U' () / Uppercase16
Uppercase16 = 'V' () / Uppercase17
Uppercase17 = 'W' () / Uppercase18
Uppercase18 = 'X' () / Uppercase19
Uppercase19 = 'Y' () / Uppercase20
Uppercase20 = 'Z' () / f

QuoteEscape = "\\'" () / "\\\""
EndOfLine = "\r\n" () / '\n'
Space = ' ' () / f

// Digits
BinDigit = "0" () / "1"
OctDigit = BinDigit () / OctDigit1
OctDigit1 = "2" () / OctDigit2
OctDigit2 = "3" () / OctDigit3
OctDigit3 = "4" () / OctDigit4
OctDigit4 = "5" () / OctDigit5
OctDigit5 = "6" () / OctDigit6
OctDigit6 = "7" () / f
DecDigit = OctDigit () / DecDigit1
DecDigit1 = "8" () / DecDigit2
DecDigit2 = "9" () / f

// Comment
LineComment = "//" InnerLineComment / f
InnerLineComment = AnyExceptLF InnerLineComment / ()
AnyExceptLF = AnyExceptLF1 ? / f
AnyExceptLF1 = EndOfLine * / ()

TODO

Tasks

  • into_first() in CST

  • Add { Original } in mplg
  • Add functions that easy to get Variable from AST
  • Add RowColSpan
  • Create parser from MPLG file.
  • Error Handling
  • Packrat Parsing
  • Left Recursion

Next implementation

  • Add functions that easy to get Variable from AST
  • Can be Variable in Leaf Node
  • Error Handling

Practice

Sequence

A <- e1 e2

A = e1 e2 / f

Choice

A <- e1 / e2

A = e1 () / e2

Zero or more

A <- e*

A = e A / ()

Not predicate

A <- !e ?

A = B ? / f
B = e * / ()

References

Foundations of TDPL / PEG / packrat parsing

  • Alexander Birman. The TMG Recognition Schema. PhD thesis, Princeton University, February 1970. (TDPL — the "one rule" formulation that MPL specialises.)
  • Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling — Vol. I: Parsing. Prentice Hall, 1972.
  • Bryan Ford. 2002. Packrat parsing: a practical linear-time algorithm with backtracking. PhD dissertation, MIT.
  • Bryan Ford. 2004. Parsing expression grammars: a recognition-based syntactic foundation. POPL 2004, pp. 111–122.

Algorithms used by mpl-macro's optimisations

  • Kota Mizushima, Atusi Maeda, and Yoshinori Yamaguchi. 2010. Packrat parsers can handle practical grammars in mostly constant space. PASTE 2010. — FIRST-set / cut analysis used by cascade::analyze_char_classes and cut::compute_first_sets to exclude LL(1)-disjoint rules from memoisation.
  • Alessandro Warth, James R. Douglass, and Todd Millstein. 2008. Packrat parsers can support left recursion. PEPM 2008. — seed-growing iteration used in Parser::squirrel_parse and the FastParse-generated LR codegen.
  • Sérgio Medeiros, Fabio Mascarenhas, and Roberto Ierusalimschy. 2014. Left recursion in Parsing Expression Grammars. Science of Computer Programming 96, pp. 177–190.
  • Luke A. D. Hutchison. 2020. Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems. arXiv preprint arXiv:2005.06444.
  • Luke A. D. Hutchison. 2026. Squirrel: a unified packrat parsing algorithm with linear-time guarantees, optimal error recovery, and full left-recursion support. arXiv preprint arXiv:2601.05012. — the family of position-only seed-growing memoisation MPL's squirrel_recognize and the FastParse LR path implement.

Parser data-layout influences

  • Adrian Sampson. Flattening ASTs (and other compiler data structures). 2023. cs.cornell.edu/~asampson/blog/flattening.html — the flat Vec<Token> representation used in mpl::fast.
  • Aleksey Kladov ("matklad"). Resilient LL parsing tutorial. 2023. Background on lossy / lossless syntax tree design.
  • Joshua Haberman ("Reverberate"). Parsing Protobuf at 2+ GB/s: how I learned to love tail calls in C. blog.reverberate.org, 2021. — motivation for the explicit loop {} codegen in cascade::is_tail_loop (Rust's TCO is unreliable for Result<u32, ()>-shaped recursive returns).

All algorithms above are implemented from the published descriptions; no source code is copied. PEG / TDPL / packrat parsing and their extensions are not patented to our knowledge.

License

This project is licensed under either of:

at your option. All runtime dependencies (bumpalo, proc-macro2, quote, syn) are dual-licensed MIT OR Apache-2.0, compatible with both.