miden-assembly-syntax
This crate provides parsing and semantic analysis of the Miden Assembly language.
Developer guide
The following sections will guide you through how to make some of the more
common syntax changes that come up from time to time. Syntax parsing and
analysis is split between this crate and miden-assembly-syntax-cst, where
the latter contains the concrete syntax tree (represents what was literally
written by a user in source code), while this crate contains the abstract
syntax tree (the form we perform semantic analysis and assembly on), as well
as tools for lowering the CST to AST, and performing semantic analysis.
Adding new instructions
One of the more common tasks you may need to perform is adding support for a new instruction. This is done as follows:
- Add a new variant to the
Instructionenum in this crate - Add support for assembling this new variant in
crates/assembly/src/instruction/mod.rs - Add support for parsing the new instruction from the CST in
parser/cst/instructions.moda. If the instruction is a new simple primitive instruction, add a new entry toPRIMITIVE_SPECSb. If the instruction accepts an immediate value, add a new entry toCOMPACT_SUFFIX_SPECS. Try and make use of one of the existing patterns, but if a new one is needed, you'll need to add a newCompactSuffixKindvariant to represent the pattern, and implement a new lowering function for that variant. c. For any other instruction, a new entry must be added toEXTENDED_INSTRUCTION_SPECS. You'll likely need to add a newExtendedInstructionKindvariant as well, unless you are adding a new variation on one of the existing patterns. New variants of that enum require a corresponding lowering function as well. - Add tests to make sure that you get the desired AST given the MASM syntax you wish to support.
Deprecating instructions
Removing instructions typically requires a deprecation period, so this section is presumed to be a prelude to later removing the same instruction. If you wish to just remove the instruction with no deprecation, you can skip this section.
If the instruction is being renamed, you can deprecate the old alias and automatically replace it with the new name by adding an entry to DEPRECATED_ALIAS_SPECS
For other forms of deprecation, we have not yet added infrastructure for it, but the best place to add new types of deprecation handling is in try_lower_instruction, which is where the existing deprecation handling lives.
Removing instructions
After deprecating an instruction, removing an instruction is straightforward:
- Remove the variant from the
Instructionenum in this crate - Remove the lowering code from the
parser::cstmodule in this crate - Remove any references in existing tests or MASM sources
- Remove any dead code that results from the above changes
Adding new top-level forms
This document will not go into deep detail on this topic, as new forms are typically accompanied by new syntax that may require modifications to parser machinery that is not typically touched for routine changes.
However, to provide some orientation for those that may find themselves needing to do so, a few pointers to help you decide how best to tackle this:
- To start with, is the syntax of the new form unambiguous? You must ensure that it can be unambiguously parsed, or it can cause weird edge cases to crop up. For example, you should attempt to avoid left-recursive non-terminal rules where possible, as it can make it impossible to determine whether to produce a node, or keep consuming more tokens. In general, prefer to have some form of explicit terminator token for top-level forms, rather than relying on implicit termination by encountering the start of some other top-level item - this is often unreliable, and makes it more difficult to detect subtle syntax mistakes.
- Once you know what the syntax will be, the next step is to define the AST node(s) corresponding to that syntax. Take a look at existing AST nodes for reference, if you are unsure how to structure them.
- Next, you will need to support parsing the new syntax into CST, and then implement support for lowering the new CST syntax to the AST node(s) you created. The exact details of this are highly dependent on the specific syntax, and how much of it is brand new, versus how much of the existing machinery can be used for the new form.
a. Does the new form require tokens that are not currently valid in MASM? For example, some new punctuation, or a sigil that needs special recognition. If so, you will need to add a new
SyntaxKindvariant, a newTokenvariant, and modify the lexer (all inmiden-assembly-syntax-cst), in order to even be able to parse a source file containing the new token. Next, you will generally need to modify the CST parser inmiden-assembly-syntax-cstto handle the new token/syntax kind type in existing parser rules. From there, you can move on to parsing the form itself. b. Is the new form largely a duplicate of an existing form, just with minor differences? If so, you may find it easiest to simply duplicate the existing parser logic for that form, and modify it as appropriate. The specific details depend on the form.
Ultimately, the CST -> AST lowering lives in crates/assembly-syntax/src/parser/cst, in forms.rs and fragments.rs. Depending on how much new syntax there is, you may find it necessary to modify the parser itself in new ways - if so, make sure you update this document as appropriate.
License
This project is dual-licensed under the MIT and Apache 2.0 licenses.