Minimal Parsing Language (MPL)
This is minimal parser combinator of Minimal Parsing Language (MPL) like Top-Down Parsing Language (TDPL). It creates a abstract syntax tree (AST) for each input.
Getting Started
- implement
Variable - insert each rule into
HashMap minimal_parse()
- Optional
- implement
Input- supports
[T]andstrby default
- supports
- implement
Position- supports
u*,i*, andf*by default
- supports
- implement
Span- supports
StartAndLenSpanby default
- supports
- implement
Terminal- supports
SliceTerminal,StrTerminal, andU8SliceTerminalby default
- supports
- implement
Output- supports
()by default
- supports
- implement
Rules- supports
HashMapby default
- supports
- implement
Parse- supports
[T],str, and[u8]by default
- supports
- implement
Example
use crate*;
use Parse;
use ;
use StartAndLenSpan;
use ;
use AST;
use HashMap;
/// ```
/// Open = '(' Parentheses / ()
/// Parentheses = Open Close / f
/// Close = ")" Open / f
/// ```
Test Examples
Parsers written with MPL
- WAV AST : RIFF waveform Audio Format
MPL
Definition of MPL grammar
A MPL grammar G is a tuple G = (V, Σ, R, S) in which:
Vis a finite set of variables.Σis a finite set of original terminal symbols.Tis an union ofΣorM(Σ ∪ M) (M(= {(), f}) is a finite set of metasymbols).Ris a finite set of rules of the formA = B C / D
A in V (A ∈ V),
B, C, D in E (E = T ∪ V) (T ∩ V = ∅) (B, C, D ∈ E).
For any variable A there is exactly one rule with A to the left of=.
- S in V (S ∈ V) is the start variable.
Empty
() is a metasymbol that always succeeds without consuming input.
Empty = () () / ()
Failure
f is a metasymbol that always fails without consuming input.
Failure = f f / f
Extended MPL
Since one of the goals of MPL is to create an AST, it also supports two features in terms of ease of use and speed.
Any
? is a metasymbol representing any single input like wildcard character. This succeeds if there is any input left, and fails if there is no input left.
Any = ? () / f
To extend the difinition of MPL grammar, let ? ∈ M.
All
* is a metasymbol representing All remaining input like wildcard character. This will succeed even if the remaining inputs are zero.
All = * () / f
Same as All = ? All / ().
To extend the difinition of MPL grammar, let * ∈ M.
Difference between TDPL and MPL
The biggest difference between the two grammars is the rule form. There are two rule forms in TDPL.
A..BC/D, A,B,C,D in V.
A..a, a in ∑ ∪ {ε, f}, f is a metasymbol not in ∑ and ε is the null string.
MPL, on the other hand, has one rule form.
TODO
- Visualize AST
- Add RowColSpan
- Create parser from MPLG file.
- Error Handling
- Packrat Parsing
- Left Recursion
Practice
Sequence
A <- e1 e2
A = e1 e2 / f
Choice
A <- e1 / e2
A = e1 / e2
Zero or more
A <- e*
A = e A /
Not predicate
A <- !e .
A = B ? / f
B = e * /
References
- Alexander Birman. The TMG Recognition Schema. PhD thesis, Princeton University, February 1970
- Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling - Vol. I: Parsing. Prentice Hall, Englewood Cliffs, N.J., 1972.
- Bryan Ford. 2002. Packrat parsing: a practical linear-time algorithm with backtracking. Ph.D. Dissertation. Massachusetts Institute of Technology.
- Bryan Ford. 2004. Parsing expression grammars: a recognition-based syntactic foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 111–122.
- Hutchison, Luke AD. "Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems." arXiv preprint arXiv:2005.06444 (2020).