§syn-grammar-macros
The code generation engine for syn-grammar.
Note: You should not add this crate to your Cargo.toml directly. Instead, use the syn-grammar crate, which re-exports the macros from this crate.
This crate defines the procedural macros (grammar!) that compile the EBNF-like grammar DSL into actual Rust code. While it is an internal implementation detail of syn-grammar, understanding its architecture is useful if you intend to write a custom parser backend.
§Responsibilities
- Parsing & Validation: It delegates parsing, transformation, and semantic validation to syn-grammar-model.
- Code Generation: It transforms the validated model into a Rust module containing syn-based parser functions.
§Code Generation Details
The code generation phase (codegen module) transforms the semantic model into a Rust module containing syn parser functions.
§1. Rule Generation
For each rule in the grammar, a public function parse_<rule_name> and an internal implementation function parse_<rule_name>_impl are generated.
- parse_<rule_name>: The public entry point. It initializes the ParseContext (used for error reporting and state) and calls the implementation. It handles converting internal errors into syn::Result.
- parse_<rule_name>_impl: The actual parser logic. It takes input: ParseStream and ctx: &mut ParseContext.
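The two-function shape can be illustrated with a minimal, std-only sketch for a hypothetical rule named number. Input, ParseContext, and ParseError below are simplified stand-ins invented for this example; the real generated code operates on syn's ParseStream and returns syn::Result, and its exact shape is not shown here.

```rust
// Hypothetical stand-ins, not the types used by syn-grammar.

/// Collects errors recorded while parsing, as (token position, message).
#[derive(Debug, Default)]
struct ParseContext {
    errors: Vec<(usize, String)>,
}

/// Error type returned by the public entry point.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub pos: usize,
    pub msg: String,
}

/// Cursor over whitespace-separated tokens.
struct Input<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Input<'a> {
    fn new(src: &'a str) -> Self {
        Input { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    /// Returns the current token and advances past it.
    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.tokens.get(self.pos).copied();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }
}

/// Public entry point: creates the context, calls the implementation,
/// and converts a recorded internal error into a Result.
pub fn parse_number(src: &str) -> Result<i64, ParseError> {
    let mut input = Input::new(src);
    let mut ctx = ParseContext::default();
    parse_number_impl(&mut input, &mut ctx).ok_or_else(|| {
        let (pos, msg) = ctx
            .errors
            .pop()
            .unwrap_or((0, "unknown error".to_string()));
        ParseError { pos, msg }
    })
}

/// Implementation: the actual parsing logic, threaded with the context.
fn parse_number_impl(input: &mut Input, ctx: &mut ParseContext) -> Option<i64> {
    let start = input.pos;
    match input.bump().and_then(|tok| tok.parse::<i64>().ok()) {
        Some(n) => Some(n),
        None => {
            ctx.errors.push((start, "expected number".to_string()));
            None
        }
    }
}
```

Keeping the context out of the public signature means callers never construct it themselves, while every internal rule function can still share it.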
§2. Pattern Matching
The generator converts EBNF patterns into syn parsing calls:
- Literals ("fn"): Converted to input.parse::<Token![fn]>()? or custom keyword parsing.
- Sequences (A B): Generated as a sequence of statements.
- Alternatives (A | B):
  - If the alternatives have unique starting tokens (determined by peek), if input.peek(...) blocks are generated for efficient dispatch.
  - Otherwise, syn::parse::discouraged::Speculative (via rt::attempt) is used to try each alternative in order.
- Repetitions (A*): Converted to while loops.
- Groups ((A B)): Treated as nested sequences.
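The difference between peek-based dispatch, speculative attempts, and repetition loops can be sketched without syn. The Input cursor, the Item enum, and the attempt helper below are hypothetical stand-ins written for this example; they only mimic the roles of ParseStream, the rule's output type, and rt::attempt.

```rust
/// Cursor over whitespace-separated tokens (stand-in for ParseStream).
#[derive(Clone)]
struct Input<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Input<'a> {
    fn new(src: &'a str) -> Self {
        Input { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    /// Looks at the current token without consuming it.
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    /// Consumes the current token if it equals `tok`.
    fn expect(&mut self, tok: &str) -> Result<(), String> {
        if self.peek() == Some(tok) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected `{}` at token {}", tok, self.pos))
        }
    }
}

#[derive(Debug, PartialEq)]
enum Item {
    Fn,
    Struct,
    Pair,
    Single,
}

/// `"fn" | "struct"`: unique first tokens, so one peek selects the branch.
fn parse_item(input: &mut Input) -> Result<Item, String> {
    if input.peek() == Some("fn") {
        input.expect("fn")?;
        Ok(Item::Fn)
    } else if input.peek() == Some("struct") {
        input.expect("struct")?;
        Ok(Item::Struct)
    } else {
        Err(format!("expected `fn` or `struct` at token {}", input.pos))
    }
}

/// Runs `f` on a copy of the cursor and commits only on success.
fn attempt<'a, T>(
    input: &mut Input<'a>,
    f: impl FnOnce(&mut Input<'a>) -> Result<T, String>,
) -> Option<T> {
    let mut fork = input.clone();
    match f(&mut fork) {
        Ok(value) => {
            *input = fork;
            Some(value)
        }
        Err(_) => None,
    }
}

/// `"a" "b" | "a"`: both alternatives start with "a", so each is tried in order.
fn parse_ab_or_a(input: &mut Input) -> Result<Item, String> {
    if let Some(v) = attempt(input, |i| {
        i.expect("a")?;
        i.expect("b")?;
        Ok(Item::Pair)
    }) {
        return Ok(v);
    }
    if let Some(v) = attempt(input, |i| {
        i.expect("a")?;
        Ok(Item::Single)
    }) {
        return Ok(v);
    }
    Err("no alternative matched".to_string())
}

/// `"x"*`: a repetition becomes a while loop driven by peek.
fn parse_xs(input: &mut Input) -> usize {
    let mut count = 0;
    while input.peek() == Some("x") {
        input.pos += 1;
        count += 1;
    }
    count
}
```

The peek path never copies the cursor, which is why it is preferred whenever the first tokens are enough to pick a branch.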
§3. Left Recursion Handling
Standard recursive descent parsers loop infinitely on left-recursive rules (e.g., expr = expr + term). syn-grammar automatically detects direct left recursion and transforms it into an iterative loop:
- Split Variants: The rule’s variants are split into “base cases” (non-recursive) and “recursive cases” (starting with the rule itself).
- Parse Base: The parser first attempts to match one of the base cases to establish an initial value (lhs).
- Loop: It then enters a loop. Inside the loop, it checks if the input matches the “tail” of any recursive variant (the part after the recursive call).
  - If it matches, the action is executed using the current lhs and the parsed tail, updating lhs with the result.
  - If no recursive variant matches, the loop terminates, and lhs is returned.
This transformation allows writing natural expression grammars without manual restructuring.
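As an illustration, here is a hand-written, std-only sketch of what the transformed parser could look like for a hypothetical grammar expr = expr "+" term | expr "-" term | term, where term is an integer. The Input cursor and the evaluation-as-action are assumptions made for this example; the real generated code works on syn's ParseStream.

```rust
/// Cursor over whitespace-separated tokens (stand-in for ParseStream).
struct Input<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Input<'a> {
    fn new(src: &'a str) -> Self {
        Input { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }
}

/// term = integer literal
fn parse_term(input: &mut Input) -> Result<i64, String> {
    let tok = input
        .peek()
        .ok_or_else(|| "unexpected end of input".to_string())?;
    let n = tok
        .parse::<i64>()
        .map_err(|_| format!("expected number, found `{}`", tok))?;
    input.pos += 1;
    Ok(n)
}

/// expr = expr "+" term | expr "-" term | term, rewritten as base + loop.
fn parse_expr(input: &mut Input) -> Result<i64, String> {
    // Parse Base: the only non-recursive variant is `term`.
    let mut lhs = parse_term(input)?;
    // Loop: match the tail of a recursive variant and fold it into lhs.
    loop {
        match input.peek() {
            Some("+") => {
                input.pos += 1;
                let rhs = parse_term(input)?;
                lhs += rhs; // action for `expr "+" term`
            }
            Some("-") => {
                input.pos += 1;
                let rhs = parse_term(input)?;
                lhs -= rhs; // action for `expr "-" term`
            }
            // No recursive variant matches: stop and return lhs.
            _ => break,
        }
    }
    Ok(lhs)
}
```

Because each iteration folds the new tail into the existing lhs, the result is left-associative: 10 - 3 - 2 is evaluated as (10 - 3) - 2.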
§4. The Cut Operator (=>)
The cut operator is handled during the generation of alternative branches. When a pattern contains =>:
- The pattern is split into pre_cut and post_cut.
- If pre_cut matches successfully, the parser commits to this branch.
- Any failure in post_cut becomes a fatal error, preventing backtracking to other alternatives.
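The commit behavior can be sketched with a small std-only model of a hypothetical rule stmt = "let" => ident | ident. The Failure enum, the Input cursor, and the function names are assumptions made for this example; they show one way to distinguish recoverable from fatal failures, not how the generated code represents them.

```rust
/// Distinguishes failures that allow backtracking from those that do not.
#[derive(Debug, PartialEq)]
enum Failure {
    Backtrack(String),
    Fatal(String),
}

#[derive(Debug, PartialEq)]
enum Stmt {
    Let(String),
    Expr(String),
}

/// Cursor over whitespace-separated tokens (stand-in for ParseStream).
#[derive(Clone)]
struct Input<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Input<'a> {
    fn new(src: &'a str) -> Self {
        Input { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }
}

/// ident = a token made only of ASCII letters
fn ident(input: &mut Input) -> Result<String, String> {
    match input.peek() {
        Some(tok) if tok.chars().all(|c| c.is_ascii_alphabetic()) => {
            input.pos += 1;
            Ok(tok.to_string())
        }
        _ => Err(format!("expected identifier at token {}", input.pos)),
    }
}

/// Branch for `"let" => ident`.
fn let_branch(input: &mut Input) -> Result<Stmt, Failure> {
    // pre_cut: a failure here is recoverable, other alternatives may run.
    if input.peek() != Some("let") {
        return Err(Failure::Backtrack("expected `let`".to_string()));
    }
    input.pos += 1;
    // post_cut: the parser has committed, so any failure is fatal.
    let name = ident(input).map_err(Failure::Fatal)?;
    Ok(Stmt::Let(name))
}

/// stmt = "let" => ident | ident
fn parse_stmt(src: &str) -> Result<Stmt, Failure> {
    let mut input = Input::new(src);
    let mut fork = input.clone();
    match let_branch(&mut fork) {
        Ok(stmt) => return Ok(stmt),
        // A fatal failure skips the remaining alternatives.
        Err(Failure::Fatal(msg)) => return Err(Failure::Fatal(msg)),
        // A recoverable failure falls through to the next alternative.
        Err(Failure::Backtrack(_)) => {}
    }
    ident(&mut input).map(Stmt::Expr).map_err(Failure::Backtrack)
}
```

In this model, the input let 42 reports the missing identifier after let. Without the cut, the parser would backtrack and accept let as a plain identifier through the second alternative, hiding the real mistake.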
§5. Error Reporting
The generated code uses ParseContext to track errors. When speculative parsing (attempt) fails, the error is recorded. The context keeps track of the “deepest” error (the one that consumed the most tokens) to provide helpful diagnostics to the user, rather than just reporting the last failure.
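A minimal sketch of the "deepest error wins" idea, assuming a simplified ParseContext that stores a single (position, message) pair. The record method, match_seq, and parse_alternatives are hypothetical names written for this example and are not part of the real ParseContext API.

```rust
/// Simplified context that remembers only the deepest failure seen so far.
#[derive(Debug, Default)]
struct ParseContext {
    deepest: Option<(usize, String)>,
}

impl ParseContext {
    /// Keeps the error only if it occurred further into the input
    /// than any error recorded before it.
    fn record(&mut self, pos: usize, msg: &str) {
        let is_deeper = match &self.deepest {
            Some((best, _)) => pos > *best,
            None => true,
        };
        if is_deeper {
            self.deepest = Some((pos, msg.to_string()));
        }
    }
}

/// Matches `expected` token by token; on failure returns where it stopped.
fn match_seq(tokens: &[&str], expected: &[&str]) -> Result<(), (usize, String)> {
    for (i, want) in expected.iter().enumerate() {
        if tokens.get(i) != Some(want) {
            return Err((i, format!("expected `{}`", want)));
        }
    }
    Ok(())
}

/// Tries each alternative in order and returns the index of the first match,
/// or the deepest recorded error if none matches.
fn parse_alternatives(
    src: &str,
    alternatives: &[&[&str]],
) -> Result<usize, (usize, String)> {
    let tokens: Vec<&str> = src.split_whitespace().collect();
    let mut ctx = ParseContext::default();
    for (index, alt) in alternatives.iter().enumerate() {
        match match_seq(&tokens, alt) {
            Ok(()) => return Ok(index),
            // A failed speculative attempt is recorded, not returned.
            Err((pos, msg)) => ctx.record(pos, &msg),
        }
    }
    Err(ctx
        .deepest
        .unwrap_or((0, "no alternatives".to_string())))
}
```

For the input fn name ) tried against the alternatives fn name ( ) and struct name, the last failure is the second alternative's complaint about a missing struct at position 0, but the reported error is the first alternative's missing ( at position 2, which is much closer to what the user meant.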
§Creating a Custom Backend
If you want to generate parsers for a different library (e.g., winnow, chumsky, or a documentation generator) instead of syn, you cannot simply “plug in” a generator to this crate. Procedural macros are compiled as separate artifacts, so the code generation logic must be baked into the macro crate itself.
To create a new backend:
- Create a new proc-macro crate (e.g., my-grammar-macros).
- Depend on syn-grammar-model. This gives you the parser for the DSL, so you don’t have to rewrite the grammar syntax parsing.
- Implement your own codegen module. This module will take the GrammarDefinition from the model and output your desired code.
- Define your own grammar! macro. This is necessary because of a fundamental limitation in Rust procedural macros: a macro crate must contain the logic it executes. You cannot dynamically inject a generator function into an existing compiled macro crate. Therefore, you must define the macro entry point in your own crate to invoke your custom generator.
Example of a custom backend entry point:
    use proc_macro::TokenStream;
    use syn_grammar_model::parse_grammar;
    // use my_custom_codegen::generate;

    #[proc_macro]
    pub fn grammar(input: TokenStream) -> TokenStream {
        // 1. Reuse the shared model parser
        let model = match parse_grammar(input.into()) {
            Ok(m) => m,
            Err(e) => return e.to_compile_error().into(),
        };

        // 2. Use your custom generator
        // let output = generate(model);
        // output.into()
        TokenStream::new() // Placeholder
    }

This architecture ensures that the syntax remains consistent across different backends while allowing complete flexibility in the generated output.
§Macros
- grammar: The main macro for defining grammars.