Contains functionality for compiling the language of Dialectic's `Session!` macro into an actual Rust type. This crate is considered an internal implementation detail, and none of its API is subject to semantic versioning rules. The compilation process gradually transforms the syntax across three representations:
- [`syntax::Syntax`] - the "surface-level" abstract syntax tree. This is what gets parsed from the token stream provided by the proc macro interface.
- [`cfg::Cfg`] - a control flow graph representation, which is gradually transformed by semantics-preserving passes until it takes on a form more suitable for emitting to the target.
- [`target::Target`] - the "target language" syntax tree, which directly maps to Dialectic session types.
[`Syntax`] AST transformations
After parsing, the [`Syntax`] AST really only undergoes one transformation: conversion to the control flow graph representation.
Conversion to CFG - [`Syntax::to_cfg`]
During this conversion, we resolve labels in the AST, ensure that all `break` and `continue` constructs refer to valid nodes, and emit errors for malformed loop nodes and similar problems. Note that this method is often referred to as `Syntax::to_cfg`, but it is actually implemented on `Spanned<Syntax>`.
Errors
This pass may emit several errors:
- [`CompileError::UndeclaredLabel`] - emitted when we find a reference to a label that's not in scope
- [`CompileError::ShadowedLabel`] - emitted when two loops in the same scope have the same label
- [`CompileError::ContinueOutsideLoop`] - emitted when we find a `Continue` in an empty environment
- [`CompileError::BreakOutsideLoop`] - emitted when we find a `Break` in an empty environment
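The following is a minimal sketch of label resolution over a toy surface syntax. It shows how a stack of enclosing loops lets each check above be made. The `Syntax`, `Resolver`, and `CompileError` types are hypothetical simplifications: the real pass works on `Spanned<Syntax>` and builds CFG nodes. The sketch makes two assumptions. First, a loop that reuses the label of any enclosing loop counts as shadowing. Second, a jump in an empty environment reports the outside-loop error even when it carries a label.

```rust
// Hypothetical, simplified surface syntax; not the crate's real `Syntax` type.
enum Syntax {
    Send(String),
    Loop(Option<String>, Vec<Syntax>),
    Break(Option<String>),
    Continue(Option<String>),
}

#[derive(Debug, PartialEq)]
enum CompileError {
    UndeclaredLabel(String),
    ShadowedLabel(String),
    ContinueOutsideLoop,
    BreakOutsideLoop,
}

#[derive(Default)]
struct Resolver {
    env: Vec<(usize, Option<String>)>, // enclosing loops as (id, label), innermost last
    next_id: usize,
    targets: Vec<usize>, // loop id each break/continue resolved to, in order
    errors: Vec<CompileError>,
}

impl Resolver {
    fn resolve(&mut self, block: &[Syntax]) {
        for stmt in block {
            match stmt {
                Syntax::Send(_) => {}
                Syntax::Loop(label, body) => {
                    // Assumption: reusing any enclosing loop's label counts as shadowing.
                    if let Some(l) = label {
                        if self.env.iter().any(|(_, outer)| outer.as_ref() == Some(l)) {
                            self.errors.push(CompileError::ShadowedLabel(l.clone()));
                        }
                    }
                    let id = self.next_id;
                    self.next_id += 1;
                    self.env.push((id, label.clone()));
                    self.resolve(body);
                    self.env.pop();
                }
                Syntax::Break(label) => self.jump(label, CompileError::BreakOutsideLoop),
                Syntax::Continue(label) => self.jump(label, CompileError::ContinueOutsideLoop),
            }
        }
    }

    fn jump(&mut self, label: &Option<String>, outside: CompileError) {
        if self.env.is_empty() {
            self.errors.push(outside);
            return;
        }
        let found = match label {
            // An unlabeled jump targets the innermost loop.
            None => self.env.last().map(|(id, _)| *id),
            Some(l) => self
                .env
                .iter()
                .rev()
                .find(|(_, lab)| lab.as_ref() == Some(l))
                .map(|(id, _)| *id),
        };
        match (found, label) {
            (Some(id), _) => self.targets.push(id),
            (None, Some(l)) => self.errors.push(CompileError::UndeclaredLabel(l.clone())),
            (None, None) => unreachable!("env is nonempty"),
        }
    }
}
```

In `'a: loop { send i64; loop { break 'a; continue; } }`, the labeled `break` resolves to the outer loop and the bare `continue` resolves to the inner one.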
[`Cfg`] transformations
The CFG undergoes only one truly crucial explicit transform before flow analysis, error reporting, and finally lowering to the [`Target`]. The rough outline of CFG processing looks like this:
- Scope resolution and implicit continue insertion
- Control flow analysis
- Reachability analysis, using the control flow analysis output
- Dead code analysis - report unreachable code, which will never be emitted by the compiler
- Target generation, break elimination, and loop productivity analysis
Scope resolution - [`Cfg::resolve_scopes`]
The scope resolution pass is implemented in the [`Cfg::resolve_scopes`] method. It traverses the CFG from a given root node and converts "implicit" continuations to "explicit" continuations. Implicit `continue`s are another kind of implicit continuation, represented by the absence of a continuation within the scope of a loop body, so we insert them here as well. A machine-generated `Continue` node of this sort may end up as unreachable code, which would trigger an error during dead code analysis. To prevent that, machine-generated `Continue`s are marked with the `machine_generated` field of `CfgNode`, and dead code reporting is configured to ignore them.
After the scope resolution pass, there are three critical invariants we can use:
- [`Ir::Choose`] and [`Ir::Offer`] nodes will never have a continuation (their `node.next` fields will always be `None`).
- [`Ir::Loop`] nodes will always have a nonempty body (the `Option<Index>` inside the [`Ir::Loop`] variant will always be `Some(body)`).
- Paths through the body of an [`Ir::Loop`] node will always terminate in an [`Ir::Break`] or an [`Ir::Continue`].

Most other passes make use of these assumptions.
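To show what these invariants mean in practice, here is a sketch that checks all three over a toy CFG arena. The `Ir`, `CfgNode`, `node`, and `invariants_hold` names are hypothetical. They cover only a subset of the real node kinds; `Offer` would be handled like `Choose` and is omitted.

```rust
// Hypothetical, simplified CFG node types for illustration; not the crate's real `Ir`.
enum Ir {
    Send(String),
    Break(usize),               // index of the loop being exited
    Continue(usize),            // index of the loop being restarted
    Loop(Option<usize>),        // first node of the body, if any
    Choose(Vec<Option<usize>>), // first node of each arm, if any
}

struct CfgNode {
    expr: Ir,
    next: Option<usize>,
}

fn node(expr: Ir, next: Option<usize>) -> CfgNode {
    CfgNode { expr, next }
}

/// Returns `true` if every path starting at `node` satisfies the three
/// post-scope-resolution invariants. `in_loop` records whether the path is
/// currently inside some loop body.
fn invariants_hold(nodes: &[CfgNode], node: usize, in_loop: bool) -> bool {
    let n = &nodes[node];
    match &n.expr {
        // Invariant 1: a `Choose` has no continuation; each arm is its own path.
        Ir::Choose(arms) => {
            n.next.is_none()
                && arms.iter().all(|arm| match arm {
                    Some(a) => invariants_hold(nodes, *a, in_loop),
                    None => !in_loop, // an empty arm ends the path right here
                })
        }
        // `Break` and `Continue` end a path; anything after them is dead code.
        Ir::Break(_) | Ir::Continue(_) => true,
        // Invariant 2: the loop body is never empty. The loop's own continuation
        // is checked in the enclosing context.
        Ir::Loop(body) => {
            matches!(body, Some(b) if invariants_hold(nodes, *b, true))
                && path_continues(nodes, n.next, in_loop)
        }
        Ir::Send(_) => path_continues(nodes, n.next, in_loop),
    }
}

/// Invariant 3: inside a loop body, a path may not simply run out; it must
/// reach a `Break` or a `Continue` first.
fn path_continues(nodes: &[CfgNode], next: Option<usize>, in_loop: bool) -> bool {
    match next {
        Some(n) => invariants_hold(nodes, n, in_loop),
        None => !in_loop,
    }
}
```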
The scope resolution algorithm looks something like this:
```rust,ignore
// We begin w/ an empty implicit continuation (the Done continuation) and start traversing from the
// root.
resolve_scopes(None, root);
```
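Here is a minimal sketch of such a traversal under some assumptions. It uses a hypothetical arena of nodes indexed by `usize` with only `Send`, `Break`, `Continue`, `Loop`, and `Choose`, and an implicit continuation of `None` stands for `Done`. Each loop body receives a fresh machine-generated `Continue` as its implicit continuation. A `Choose` resolves its own continuation first, hands it to the end of every arm, and then drops it. The `Ir`, `CfgNode`, and `Cfg` types here are illustrative stand-ins, not the crate's definitions.

```rust
// Hypothetical, simplified stand-ins for the crate's `Ir`, `CfgNode`, and `Cfg`.
#[derive(Debug, Clone, PartialEq)]
enum Ir {
    Send(String),
    Break(usize),               // index of the loop being exited
    Continue(usize),            // index of the loop being restarted
    Loop(Option<usize>),        // first node of the body, if any
    Choose(Vec<Option<usize>>), // first node of each arm, if any
}

#[derive(Debug, Clone)]
struct CfgNode {
    expr: Ir,
    next: Option<usize>,
    machine_generated: bool,
}

#[derive(Default)]
struct Cfg {
    nodes: Vec<CfgNode>,
}

impl Cfg {
    fn push(&mut self, expr: Ir, next: Option<usize>, machine_generated: bool) -> usize {
        self.nodes.push(CfgNode { expr, next, machine_generated });
        self.nodes.len() - 1
    }

    /// Makes implicit continuations explicit. `implicit` is where a path ending
    /// at `node` falls through to (`None` means `Done`).
    fn resolve_scopes(&mut self, implicit: Option<usize>, node: usize) {
        match self.nodes[node].expr.clone() {
            Ir::Loop(body) => {
                // The implicit continuation of a loop body is a jump back to the loop.
                let cont = self.push(Ir::Continue(node), None, true);
                match body {
                    Some(b) => self.resolve_scopes(Some(cont), b),
                    None => self.nodes[node].expr = Ir::Loop(Some(cont)),
                }
            }
            Ir::Choose(arms) => {
                // Resolve the choice's own continuation, then hand it to every arm.
                let after = match self.nodes[node].next.take() {
                    Some(n) => {
                        self.resolve_scopes(implicit, n);
                        Some(n)
                    }
                    None => implicit,
                };
                let arms: Vec<Option<usize>> = arms
                    .into_iter()
                    .map(|arm| match arm {
                        Some(a) => {
                            self.resolve_scopes(after, a);
                            Some(a)
                        }
                        None => after, // an empty arm is just the continuation
                    })
                    .collect();
                self.nodes[node].expr = Ir::Choose(arms);
                return; // a `Choose` keeps no continuation of its own
            }
            _ => {}
        }
        // Follow an explicit continuation, or fill a missing one with the implicit one.
        match self.nodes[node].next {
            Some(n) => self.resolve_scopes(implicit, n),
            None => self.nodes[node].next = implicit,
        }
    }
}
```

Applied to `loop { choose { 0 => break, 1 => send i64 } }`, both arms end up pointing at one new machine-generated `Continue`. The copy after `break` is exactly the kind of unreachable node that dead code reporting must ignore.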
Errors
This pass never emits any errors.
Dead code reporting - [`Cfg::report_dead_code`]
The dead code reporting pass is itself responsible for running the flow analysis and reachability analysis. After these are run, it traverses the control flow graph from the root, looking for nodes which the flow analysis considers "impassable" (control flow does not proceed to their continuations) but which still have a continuation. This gives us the location of every point in the program where reachable code is immediately followed by unreachable code. Reporting errors at these boundaries is much friendlier to the user than reporting an error for every unreachable node, of which there may be many along the same flow path.
The dead code reporting algorithm looks something like this:
```rust,ignore
report_dead_code(root);
```
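Here is a minimal sketch of the flow, reachability, and reporting steps over a hypothetical toy CFG. In this sketch, `Break` and `Continue` are always impassable. A `Loop` is treated as passable only if some reachable `Break` targets it, since a loop nobody breaks out of never finishes. The real flow analysis covers more node kinds. Reporting flags an impassable reachable node only when its continuation is unreachable and not machine-generated, so each boundary yields one error pair.

```rust
use std::collections::HashSet;

// Hypothetical, simplified CFG types for illustration.
#[derive(Debug, Clone, Copy)]
enum Ir {
    Send,
    Break(usize),    // index of the loop being exited
    Continue(usize), // index of the loop being restarted
    Loop(usize),     // first node of the (nonempty) body
}

struct CfgNode {
    expr: Ir,
    next: Option<usize>,
    machine_generated: bool,
}

fn node(expr: Ir, next: Option<usize>, machine_generated: bool) -> CfgNode {
    CfgNode { expr, next, machine_generated }
}

#[derive(Debug, PartialEq)]
enum CompileError {
    FollowingCodeUnreachable(usize), // reachable node whose continuation is dead
    UnreachableStatement(usize),     // the dead continuation itself
}

struct Analysis {
    reachable: HashSet<usize>,
    passable: Vec<bool>,
    breaks: HashSet<usize>, // loops targeted by at least one reachable `Break`
}

/// Flow and reachability analysis: marks reachable nodes and whether control
/// can pass through each one to its continuation.
fn visit(nodes: &[CfgNode], node: usize, a: &mut Analysis) {
    if !a.reachable.insert(node) {
        return; // already visited
    }
    let passable = match nodes[node].expr {
        Ir::Send => true,
        Ir::Break(target) => {
            a.breaks.insert(target);
            false
        }
        Ir::Continue(_) => false,
        Ir::Loop(body) => {
            visit(nodes, body, a);
            a.breaks.contains(&node)
        }
    };
    a.passable[node] = passable;
    if let (true, Some(next)) = (passable, nodes[node].next) {
        visit(nodes, next, a);
    }
}

fn report_dead_code(nodes: &[CfgNode], root: usize) -> Vec<CompileError> {
    let mut a = Analysis {
        reachable: HashSet::new(),
        passable: vec![false; nodes.len()],
        breaks: HashSet::new(),
    };
    visit(nodes, root, &mut a);
    let mut errors = Vec::new();
    for n in 0..nodes.len() {
        if !a.reachable.contains(&n) || a.passable[n] {
            continue;
        }
        if let Some(next) = nodes[n].next {
            // Skip continuations reachable some other way and machine-generated `Continue`s.
            if !a.reachable.contains(&next) && !nodes[next].machine_generated {
                errors.push(CompileError::FollowingCodeUnreachable(n));
                errors.push(CompileError::UnreachableStatement(next));
            }
        }
    }
    errors
}
```

In this sketch, the check that the continuation is itself unreachable matters. Scope resolution may attach one shared continuation to several `Choose` arms, and some of those arms do reach it.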
Errors
This pass emits two types of errors:
- [`CompileError::FollowingCodeUnreachable`] - emitted on a reachable node which has an unreachable continuation
- [`CompileError::UnreachableStatement`] - emitted on an unreachable node that is the continuation of a reachable node
Lowering to target/target code generation - [`Cfg::generate_target`]
The code generation/lowering pass also performs loop productivity analysis. An unproductive loop is technically a valid surface-language program. However, it causes the Rust typechecker (as of Rust 1.51) to loop infinitely and produce an error that is very difficult for new users to decipher. To prevent this, we detect and report unproductive loops rather than compiling a technically valid session type that will not compile as valid Rust.
Loop productivity analysis is conducted during the target generation pass because it is a syntactic property of the target language itself.
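One way to implement a productivity check on a target AST is sketched here. It assumes a loop is unproductive when some path from the loop's head reaches a `Continue` targeting it without any intervening communication step. It also assumes that `Send`, `Recv`, and `Choose` each count as such a step; the crate's actual set of productive constructs may differ. `Target`, `mark_progress`, `check`, and `unproductive_continues` are hypothetical names.

```rust
// Hypothetical, simplified target AST; the real `Target` has more variants.
enum Target {
    Done,
    Send(String, Box<Target>),
    Recv(String, Box<Target>),
    Choose(Vec<Target>),
    Loop(Box<Target>),
    Continue(usize), // De Bruijn index: 0 = innermost enclosing loop
}

/// Records a communication step for every enclosing loop, returning the old state.
fn mark_progress(progress: &mut [bool]) -> Vec<bool> {
    let saved = progress.to_vec();
    progress.iter_mut().for_each(|p| *p = true);
    saved
}

/// `progress[k]` says whether the current path has communicated since entering
/// the k-th enclosing loop (innermost last). Leaves `progress` unchanged on return.
fn check(t: &Target, progress: &mut Vec<bool>, unproductive: &mut Vec<usize>) {
    match t {
        Target::Done => {}
        // Assumption: `Send`, `Recv`, and `Choose` all count as communication.
        Target::Send(_, next) | Target::Recv(_, next) => {
            let saved = mark_progress(progress);
            check(next, progress, unproductive);
            *progress = saved;
        }
        Target::Choose(arms) => {
            let saved = mark_progress(progress);
            for arm in arms {
                check(arm, progress, unproductive);
            }
            *progress = saved;
        }
        Target::Loop(body) => {
            progress.push(false);
            check(body, progress, unproductive);
            progress.pop();
        }
        Target::Continue(i) => {
            // Assumes `i` refers to an enclosing loop (guaranteed by earlier passes).
            if !progress[progress.len() - 1 - *i] {
                unproductive.push(*i);
            }
        }
    }
}

/// Returns the De Bruijn index of every unproductive `Continue` in `t`.
fn unproductive_continues(t: &Target) -> Vec<usize> {
    let mut out = Vec::new();
    check(t, &mut Vec::new(), &mut out);
    out
}
```

Progress is tracked per enclosing loop because a communication step outside a nested loop does not make that nested loop productive. In `Loop<Send<i64, Loop<Continue<0>>>>`, for example, the inner loop is still flagged.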
For most nodes, generating the corresponding target code is trivial: `Send`, `Recv`, `Call`, `Split`, `Choose`, and `Offer` all map very directly to their [`Target`] counterparts. The complex components are `Loop`, `Continue`, `Type`, `Break`, and `Error`:
- `Continue`s must be converted from directly referencing their target to referencing the correct De Bruijn index of the target loop. For this purpose, we keep a loop environment as a stack, which is pushed to when entering a loop node and popped when exiting. The De Bruijn index corresponding to a loop is calculated with a linear search for the position of the matching loop in the loop environment, counting from the top of the stack (the innermost loop).
- `Break` has no corresponding representation in the `Target`; instead, it is lowered by substituting any reference to it with the continuation of the loop it references. The loop's continuation must also be placed into its own loop environment, because in the `Target`, the continuations of a loop are simply the parts of the loop which do not end in a `Continue`. This actually makes codegen a little more convenient, but is worth noting.
- `Loop` must push its index to the loop environment stack before generating its body, and pop it once the body is generated, so that the loop environment properly corresponds to the scope in the target AST.
- `Type` in the `Target` has nowhere to put the continuation corresponding to its `next` pointer, because in the `Target` a `Type` is run until it is `Done`. So there are two cases. If the `Type`'s continuation is `None` (the "done" continuation), we can emit the type directly as a [`Target::Type`] node. If it is `Some`, we must emit a [`Target::Then`], which sequences the `Type`'s continuation after it.
- `Error` has no corresponding representation in the `Target` and is substituted with its continuation during codegen. `Error` nodes are allowed during codegen, but a codegen pass which encounters one will already have emitted at least one corresponding [`CompileError`], and the resulting target AST will not be returned from `Cfg::generate_target`.
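Here is a minimal sketch of the lowering rules for `Loop`, `Continue`, `Break`, `Type`, and `Error`. It assumes a hypothetical arena-based CFG in which `Continue` and `Break` store the index of their target loop node; the types and helper names are illustrative, not the crate's. A `Break` is lowered by generating its loop's continuation in the current loop environment. This places the continuation inside the loop's own scope, as described above.

```rust
// Hypothetical, simplified stand-ins for the crate's `Ir`, `Cfg`, and `Target`.
#[derive(Debug, Clone, PartialEq)]
enum Target {
    Done,
    Send(String, Box<Target>),
    Recv(String, Box<Target>),
    Loop(Box<Target>),
    Continue(usize), // De Bruijn index: 0 = innermost enclosing loop
    Type(String),
    Then(Box<Target>, Box<Target>),
}

enum Ir {
    Send(String),
    Recv(String),
    Loop(usize),     // first node of the (nonempty) body
    Continue(usize), // index of the loop node jumped back to
    Break(usize),    // index of the loop node being exited
    Type(String),
    Error,
}

struct Node {
    expr: Ir,
    next: Option<usize>,
}

fn node(expr: Ir, next: Option<usize>) -> Node {
    Node { expr, next }
}

struct Cfg {
    nodes: Vec<Node>,
}

impl Cfg {
    fn generate_target(&self, root: usize) -> Target {
        self.generate(Some(root), &mut Vec::new())
    }

    /// `env` is the loop environment: indices of enclosing loops, innermost last.
    fn generate(&self, node: Option<usize>, env: &mut Vec<usize>) -> Target {
        let Some(i) = node else { return Target::Done };
        let next = self.nodes[i].next;
        match &self.nodes[i].expr {
            Ir::Send(t) => Target::Send(t.clone(), Box::new(self.generate(next, env))),
            Ir::Recv(t) => Target::Recv(t.clone(), Box::new(self.generate(next, env))),
            Ir::Loop(body) => {
                // Push before generating the body so nested `Continue`s can find this loop.
                // The loop's own `next` is not emitted here; it appears only where a
                // `Break` refers to it.
                env.push(i);
                let body = self.generate(Some(*body), env);
                env.pop();
                Target::Loop(Box::new(body))
            }
            Ir::Continue(target) => {
                // Linear search from the top of the stack yields the De Bruijn index.
                let index = env
                    .iter()
                    .rev()
                    .position(|l| l == target)
                    .expect("continue refers to an enclosing loop");
                Target::Continue(index)
            }
            // Substitute the loop's continuation, generated inside the loop's scope.
            Ir::Break(target) => self.generate(self.nodes[*target].next, env),
            Ir::Type(name) => match next {
                None => Target::Type(name.clone()),
                Some(_) => Target::Then(
                    Box::new(Target::Type(name.clone())),
                    Box::new(self.generate(next, env)),
                ),
            },
            // Errors were already reported; lower them to their continuation.
            Ir::Error => self.generate(next, env),
        }
    }
}
```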
The codegen algorithm looks something like this:
Errors
This pass emits two types of errors:
- [`CompileError::UnproductiveLoop`] - emitted on a loop node which is found to be unproductive
- [`CompileError::UnproductiveContinue`] - emitted on a non-machine-generated continue node which causes a loop to be unproductive
[`Target`] transformations
At present, the target AST does not undergo any transformation before it is converted very transparently to its destination format (whether that is a string to be displayed or a Rust token tree to be emitted).
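As an example of how direct the string conversion can be, here is a sketch of rendering a simplified target AST through `std::fmt::Display`. The `Target` enum is hypothetical, and the rendered names are assumed to mirror Dialectic's session types (`Send<T, P>`, `Recv<T, P>`, `Choose<...>`, `Loop<P>`, `Continue<N>`, `Done`).

```rust
use std::fmt;

// Hypothetical, simplified target AST.
enum Target {
    Done,
    Send(String, Box<Target>),
    Recv(String, Box<Target>),
    Choose(Vec<Target>),
    Loop(Box<Target>),
    Continue(usize),
}

impl fmt::Display for Target {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Target::Done => write!(f, "Done"),
            Target::Send(t, next) => write!(f, "Send<{}, {}>", t, next),
            Target::Recv(t, next) => write!(f, "Recv<{}, {}>", t, next),
            Target::Loop(body) => write!(f, "Loop<{}>", body),
            Target::Continue(i) => write!(f, "Continue<{}>", i),
            Target::Choose(arms) => {
                // Choices are rendered as a Rust tuple type.
                write!(f, "Choose<(")?;
                for (i, arm) in arms.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{}", arm)?;
                }
                // A one-element tuple type needs a trailing comma.
                if arms.len() == 1 {
                    write!(f, ",")?;
                }
                write!(f, ")>")
            }
        }
    }
}
```

With this sketch, a loop that sends an `i64` and repeats renders as `Loop<Send<i64, Continue<0>>>`.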