This crate provides a derive macro, #[derive(Parse)], that derives a recursive descent parser for a syntax tree node from its fields' Parse implementations, together with attributes on those fields.
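To make the idea concrete, here is a hand-written sketch of the kind of code such a derive could expand to. Every name in it (Parse, Input, Token, TokenKind, Name, Value, Entry) is a hypothetical stand-in rather than derive_parser's actual API, and the real expansion may look quite different. The point is only that each field is parsed in declaration order, delegating to the field type's own parser.

```rust
// All names below are hypothetical stand-ins, not derive_parser's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Colon,
    Number,
}

#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokenKind,
    text: String,
}

/// A cursor over already-lexed tokens.
struct Input {
    tokens: Vec<Token>,
    pos: usize,
}

impl Input {
    /// Consume the next token if it has kind `kind`; otherwise fail without consuming.
    fn expect(&mut self, kind: TokenKind) -> Result<Token, String> {
        match self.tokens.get(self.pos) {
            Some(tok) if tok.kind == kind => {
                self.pos += 1;
                Ok(tok.clone())
            }
            other => Err(format!("expected {:?}, found {:?}", kind, other.map(|t| t.kind))),
        }
    }
}

/// Hypothetical stand-in for the crate's `Parse` trait.
trait Parse: Sized {
    fn parse(input: &mut Input) -> Result<Self, String>;
}

/// Leaf nodes wrap a single token.
#[derive(Debug)]
struct Name(Token);
#[derive(Debug)]
struct Value(Token);

impl Parse for Name {
    fn parse(input: &mut Input) -> Result<Self, String> {
        input.expect(TokenKind::Ident).map(Name)
    }
}

impl Parse for Value {
    fn parse(input: &mut Input) -> Result<Self, String> {
        input.expect(TokenKind::Number).map(Value)
    }
}

/// `name : value`, e.g. `x: 42`.
#[derive(Debug)]
struct Entry {
    name: Name,
    colon: Token,
    value: Value,
}

/// Roughly what a derive could generate: parse each field in declaration
/// order, delegating to the field type's `Parse` impl where it has one and
/// matching a raw token where the field is annotated instead.
impl Parse for Entry {
    fn parse(input: &mut Input) -> Result<Self, String> {
        let name = Name::parse(input)?;
        let colon = input.expect(TokenKind::Colon)?;
        let value = Value::parse(input)?;
        Ok(Entry { name, colon, value })
    }
}
```

Because each node's parser only calls its fields' parsers, nesting structs nests the recursive calls, which is what makes the result a recursive descent parser.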
Important disclaimer: I had an interesting idea and sketched out a proof of concept. As of right now, that's all this is. It mostly works, but it is in no way feature-complete or efficient. If people end up showing interest, I'll rewrite it from scratch with better design choices. For now, feel free to experiment, but don't expect feature or performance parity with existing parser generators. It's worth noting that the nature of derive macros limits how efficient an approach like this can be: each derive only sees the item it is attached to, so it's not possible to collect definitions globally across structs and build, e.g., an LR transition table.
As of v0.x, there are NO semver guarantees; breaking changes may occur in any version.
Example
use ;
use *; // Implemented e.g. with Logos
])
// Support stuff; we could've just implemented `Token` for `TokenKind`:
By annotating only what can't be inferred from the syntax tree structs and deriving the rest of the parser implementation, derive_parser minimizes the amount of code needed to express your parser. It also keeps your CST and parser implementations in sync, which should help you avoid bugs when updating your parser. The whole thing can also be made zero-copy just by adding a lifetime parameter to Token and the node structs.
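As a rough illustration of the zero-copy point, the sketch below gives a hypothetical Token type a lifetime parameter so that its text borrows from the source string, and a node struct carries the same lifetime. The tokenize and parse_pair helpers are toy stand-ins (whitespace splitting, not a real lexer) and are not part of derive_parser.

```rust
/// Hypothetical token that borrows its text from the source instead of owning a String.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token<'src> {
    text: &'src str,
    /// Byte offset of `text` within the source.
    start: usize,
}

/// A node type gains the same lifetime parameter as the tokens it stores.
#[derive(Debug)]
struct Pair<'src> {
    key: Token<'src>,
    value: Token<'src>,
}

/// Hypothetical toy tokenizer: splits on ASCII whitespace by slicing `src`
/// rather than copying it.
fn tokenize(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in src.char_indices() {
        if c.is_ascii_whitespace() {
            // End of a word: emit a slice of the original input.
            if let Some(s) = start.take() {
                tokens.push(Token { text: &src[s..i], start: s });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        tokens.push(Token { text: &src[s..], start: s });
    }
    tokens
}

/// Build a `Pair` from the first two tokens, if present.
fn parse_pair<'src>(tokens: &[Token<'src>]) -> Option<Pair<'src>> {
    match tokens {
        [key, value, ..] => Some(Pair { key: *key, value: *value }),
        _ => None,
    }
}
```

Since every token is a slice of the input, no String is allocated per token; the trade-off is that the resulting tree cannot outlive the source text it borrows from.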
use ;
use TokenKind; // Implemented e.g. with logos
Even with all that, many of the convenience methods (like .span() on nodes) that derive_parser provides are still not implemented here, and I didn't even try to make this zero-copy due to the sheer amount of lifetime-juggling.
Pratt Parsing
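Currently, derive_parser generates recursive descent parsers. Recursive descent cannot handle left recursion directly (a rule like Expr = Expr "+" Term would recurse forever without consuming any input), so grammars that are naturally left-recursive, such as arithmetic expressions, are hard to represent. To solve this, a built-in Pratt parser is provided: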
use ;
;
This will parse an operator-precedence expression like 1 + 2 * 3 + 4 * -5! as (1 + (2 * 3)) + (4 * (-5)!).
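For readers new to the technique, here is a minimal, self-contained Pratt parser over single-character operators, written independently of derive_parser (it does not use or mirror the crate's built-in Pratt parser API). The binding powers are assumptions chosen to reproduce the grouping above: + binds looser than *, and prefix - binds tighter than postfix !, so -5! groups as (-5)!. To keep the sketch short, it returns a fully parenthesized string instead of a tree.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

#[derive(Debug, Clone, Copy)]
enum Tok {
    Num(u64),
    Op(char),
}

/// Split the input into numbers and single-character operators, skipping whitespace.
fn lex(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n = 0u64;
            while let Some(v) = chars.peek().and_then(|d| d.to_digit(10)) {
                n = n * 10 + v as u64;
                chars.next();
            }
            toks.push(Tok::Num(n));
        } else {
            chars.next();
            if !c.is_whitespace() {
                toks.push(Tok::Op(c));
            }
        }
    }
    toks
}

// Assumed binding powers; higher binds tighter.
fn prefix_bp(op: char) -> Option<u8> {
    match op { '-' => Some(7), _ => None }
}
fn postfix_bp(op: char) -> Option<u8> {
    match op { '!' => Some(5), _ => None }
}
/// (left, right); right = left + 1 makes the operator left-associative.
fn infix_bp(op: char) -> Option<(u8, u8)> {
    match op { '+' => Some((1, 2)), '*' => Some((3, 4)), _ => None }
}

/// Parse an expression whose operators all bind at least as tightly as `min_bp`.
fn expr_bp(toks: &mut Peekable<IntoIter<Tok>>, min_bp: u8) -> Result<String, String> {
    // Prefix position: a number or a prefix operator.
    let mut lhs = match toks.next() {
        Some(Tok::Num(n)) => n.to_string(),
        Some(Tok::Op(op)) => match prefix_bp(op) {
            Some(r_bp) => format!("({}{})", op, expr_bp(toks, r_bp)?),
            None => return Err(format!("unexpected operator {:?}", op)),
        },
        None => return Err("unexpected end of input".to_string()),
    };
    // Operator position: keep extending `lhs` while operators bind tightly enough.
    loop {
        let op = match toks.peek() {
            Some(&Tok::Op(op)) => op,
            Some(&Tok::Num(n)) => return Err(format!("unexpected number {}", n)),
            None => break,
        };
        if let Some(l_bp) = postfix_bp(op) {
            if l_bp < min_bp { break; }
            toks.next();
            lhs = format!("({}{})", lhs, op);
            continue;
        }
        if let Some((l_bp, r_bp)) = infix_bp(op) {
            if l_bp < min_bp { break; }
            toks.next();
            let rhs = expr_bp(toks, r_bp)?;
            lhs = format!("({} {} {})", lhs, op, rhs);
            continue;
        }
        return Err(format!("unknown operator {:?}", op));
    }
    Ok(lhs)
}

fn parse_expr(src: &str) -> Result<String, String> {
    let mut toks = lex(src).into_iter().peekable();
    expr_bp(&mut toks, 0)
}
```

With these binding powers, parse_expr("1 + 2 * 3 + 4 * -5!") returns "((1 + (2 * 3)) + (4 * ((-5)!)))". The loop never recurses on its own left operand, which is how a Pratt parser sidesteps the left-recursion problem of plain recursive descent.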
Attributes
If a field's type implements Parse, you don't need any attributes: the parser will automatically try to parse the field with that type's Parse implementation.
Otherwise, you can use attributes to explain how to parse the field. Currently, the only attribute for this case is #[token(...)].
#[token(PATTERN)]
By far the most common attribute is #[token(...)]. It applies a pattern to the next input token, consuming it if it matches:
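One way to picture what a #[token(PATTERN)] field does is a helper that tests the next token against a Rust pattern (here via the standard matches! macro) and advances only on a match. The Cursor type, its eat method, and the TokenKind variants are hypothetical, not the code derive_parser generates.

```rust
/// Hypothetical token kinds; `Number` carries a payload to show that a
/// pattern can look inside a token, not just compare kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Let,
    Ident,
    Number(i64),
}

/// Hypothetical cursor over already-lexed tokens.
struct Cursor {
    tokens: Vec<TokenKind>,
    pos: usize,
}

impl Cursor {
    fn new(tokens: Vec<TokenKind>) -> Self {
        Cursor { tokens, pos: 0 }
    }

    /// Test the next token with `accept`. On a match, consume and return it;
    /// otherwise return what was found (None at end of input) and leave the
    /// cursor where it was.
    fn eat(&mut self, accept: impl Fn(&TokenKind) -> bool) -> Result<TokenKind, Option<TokenKind>> {
        match self.tokens.get(self.pos) {
            Some(tok) if accept(tok) => {
                self.pos += 1;
                Ok(*tok)
            }
            other => Err(other.copied()),
        }
    }
}
```

A call such as cursor.eat(|t| matches!(t, TokenKind::Number(n) if *n > 0)) shows the appeal of patterns over plain equality: guards and payload bindings come for free.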
It is common to use TokenKind::* in your parser module to avoid repetition. You may use #[token] multiple times to allow different tokens on the same field:
// -- snip --
use *;
]
Token
);
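Allowing several tokens on one field boils down to accepting the next token if it is any one of a set of kinds. The sketch below expresses that set as a slice; TokenKind, eat_any, AddSub, and parse_add_sub are hypothetical names used only for illustration, and hand-written code could just as well use an or-pattern such as TokenKind::Plus | TokenKind::Minus.

```rust
/// Hypothetical token kinds for a small expression language.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Plus,
    Minus,
    Star,
    Number,
}

/// Consume `tokens[*pos]` if it is one of `allowed`; otherwise leave `pos` unchanged.
fn eat_any(tokens: &[TokenKind], pos: &mut usize, allowed: &[TokenKind]) -> Option<TokenKind> {
    let tok = *tokens.get(*pos)?;
    if allowed.contains(&tok) {
        *pos += 1;
        Some(tok)
    } else {
        None
    }
}

/// A binary-operator node whose `op` field accepts either `+` or `-`,
/// analogous to putting two token attributes on one field.
#[derive(Debug, PartialEq)]
struct AddSub {
    lhs: TokenKind,
    op: TokenKind,
    rhs: TokenKind,
}

fn parse_add_sub(tokens: &[TokenKind]) -> Option<AddSub> {
    let mut pos = 0;
    let lhs = eat_any(tokens, &mut pos, &[TokenKind::Number])?;
    let op = eat_any(tokens, &mut pos, &[TokenKind::Plus, TokenKind::Minus])?;
    let rhs = eat_any(tokens, &mut pos, &[TokenKind::Number])?;
    Some(AddSub { lhs, op, rhs })
}
```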
Planned Features
- Error recovery (#[required], #[recover])
- Better Delimited API (#[delimited])