/*!
[![Rust](https://github.com/boltlabs-inc/dialectic/actions/workflows/rust.yml/badge.svg)](https://github.com/boltlabs-inc/dialectic/actions/workflows/rust.yml)
![license: MIT](https://img.shields.io/github/license/boltlabs-inc/dialectic)
[![crates.io](https://img.shields.io/crates/v/dialectic-compiler)](https://crates.io/crates/dialectic-compiler)
[![docs.rs documentation](https://docs.rs/dialectic-compiler/badge.svg)](https://docs.rs/dialectic-compiler)

Contains functionality for compiling the language of Dialectic's `Session!` macro into an actual Rust type. This crate is considered an internal implementation detail, and none of its API is subject to semantic versioning rules. Compilation gradually transforms the syntax through three representations:

- [`syntax::Syntax`] - the "surface-level" abstract syntax tree. This is what gets parsed from the token stream provided by the proc macro interface.
- [`cfg::Cfg`] - a control flow graph representation, which is gradually transformed by semantics-preserving passes until it takes on a form more suitable for emitting to the target.
- [`target::Target`] - the "target language" syntax tree, which maps directly to Dialectic session types.

# `Syntax` AST transformations

After parsing, the `Syntax` AST undergoes only one transformation: conversion to the control flow graph representation.

## Conversion to CFG - [`Syntax::to_cfg`]

During this conversion, we resolve labels in the AST, check that every `break` and `continue` refers to a valid enclosing loop, and emit errors for malformed loop constructs. Note that although this method is often referred to as `Syntax::to_cfg`, it is *actually* implemented on `Spanned<Syntax>`.

### Errors

This pass may emit several errors:

- [`CompileError::UndeclaredLabel`] - emitted when we find a reference to a label that's not in scope
- [`CompileError::ShadowedLabel`] - emitted when a loop's label shadows the label of an enclosing loop
- [`CompileError::ContinueOutsideLoop`] - emitted when we find a `continue` in an empty loop environment, i.e. outside of any loop
- [`CompileError::BreakOutsideLoop`] - emitted when we find a `break` in an empty loop environment, i.e. outside of any loop
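
As a rough illustration of how these four errors arise, the following sketch checks labels over a hypothetical, heavily simplified statement type. `Stmt`, `LabelError`, and `check_labels` are illustrative names, not part of this crate's API, whereas the real pass resolves labels while converting `Spanned<Syntax>` to the CFG. The sketch keeps a stack of the labels of all enclosing loops as its loop environment:

```rust
// Hypothetical, simplified surface syntax: only what matters for label resolution.
#[derive(Debug)]
enum Stmt {
    // A `loop`, optionally labeled, with its body.
    Loop(Option<&'static str>, Vec<Stmt>),
    // `break` and `continue`, optionally naming a label.
    Break(Option<&'static str>),
    Continue(Option<&'static str>),
    // Any other statement (`send`, `recv`, ...), irrelevant to labels.
    Other,
}

#[derive(Debug, PartialEq)]
enum LabelError {
    ShadowedLabel(String),
    UndeclaredLabel(String),
    ContinueOutsideLoop,
    BreakOutsideLoop,
}

// `env` holds the labels of all enclosing loops, innermost last; errors are appended to `errors`.
fn check_labels(
    stmts: &[Stmt],
    env: &mut Vec<Option<&'static str>>,
    errors: &mut Vec<LabelError>,
) {
    for stmt in stmts {
        match stmt {
            Stmt::Loop(label, body) => {
                // A label already used by an enclosing loop is shadowed.
                if let Some(l) = label {
                    if env.contains(&Some(*l)) {
                        errors.push(LabelError::ShadowedLabel(l.to_string()));
                    }
                }
                env.push(*label);
                check_labels(body, env, errors);
                env.pop();
            }
            Stmt::Break(label) | Stmt::Continue(label) => {
                if env.is_empty() {
                    // No enclosing loop at all.
                    errors.push(match stmt {
                        Stmt::Break(_) => LabelError::BreakOutsideLoop,
                        _ => LabelError::ContinueOutsideLoop,
                    });
                } else if let Some(l) = label {
                    // Inside some loop, but none with this label.
                    if !env.contains(&Some(*l)) {
                        errors.push(LabelError::UndeclaredLabel(l.to_string()));
                    }
                }
            }
            Stmt::Other => {}
        }
    }
}
```

For example, a top-level `continue` followed by `'a loop { 'a loop { break 'b } }` yields `ContinueOutsideLoop`, `ShadowedLabel("a")`, and `UndeclaredLabel("b")`, in that order.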

# [`Cfg`] transformations

The CFG undergoes only one crucial explicit transformation before flow analysis, error reporting, and finally lowering to the `Target`. CFG processing roughly proceeds as follows:

- Scope resolution and implicit continue insertion
- Control flow analysis
- Reachability analysis, using the control flow analysis output
- Dead code analysis - report unreachable code, which will never be emitted by the compiler
- Target generation, break elimination, and loop productivity analysis

## Scope resolution - [`Cfg::resolve_scopes`]

The scope resolution pass is implemented in the [`Cfg::resolve_scopes`] method. It traverses the CFG from a given root node and converts "implicit" continuations into "explicit" ones. Inside a loop body, the absence of a continuation implicitly means "continue the loop", so this pass also inserts those implicit `Continue` nodes. Such a machine-generated `Continue` may turn out to be unreachable, which would trigger an error during dead code analysis; to prevent this, machine-generated `Continue` nodes are flagged via the `machine_generated` field of `CfgNode`, and dead code reporting is configured to ignore them.

After the scope resolution pass, there are three critical invariants we can rely on:

- [`Ir::Choose`] and [`Ir::Offer`] nodes will never have a continuation (their `node.next` fields will always be `None`).
- [`Ir::Loop`] nodes will always have a nonempty body (the `Option<Index>` inside the [`Ir::Loop`] variant will always be `Some(body)`).
- Paths through the body of an [`Ir::Loop`] node will always terminate in an [`Ir::Break`] or an [`Ir::Continue`].

Most other passes make use of these assumptions.

The scope resolution algorithm looks something like this:

```text
fn resolve_scopes(implicit_cont, node) {
    if the node is a Recv, Send, Type, Break, Continue, Call, Split, or Error {
        // The implicit continuation of the callee of a Call node or arm of a Split node is always
        // Done, because we want to run it until Done rather than to continue afterwards, as the
        // Call/Split node takes care of the continuation.
        if the node is a Call(callee) {
            resolve_scopes(None, callee);
        }

        if the node is a Split(tx, rx) {
            resolve_scopes(None, tx);
            resolve_scopes(None, rx);
        }

        // If this node has an explicit continuation, then we follow it and continue visiting! If it
        // doesn't, then we assign the implicit continuation in its scope to become its new explicit
        // continuation.
        match node.next {
            Some(next) => resolve_scopes(implicit_cont, next),
            None => node.next = implicit_cont,
        }
    } else if the node is a Choose(choices) or Offer(choices) {
        // Remove the continuation from the node if present; we inline it into all arms of the
        // Choose/Offer.
        let cont = match node.next.take() {
            // If the node has an explicit continuation, then it becomes the new implicit
            // continuation ("top of the continuation scope") for this node's arms. To properly
            // inline the outer scope's implicit continuation, we first visit that explicit
            // continuation with the outer implicit continuation, so that it is inlined in the
            // correct places as well.
            Some(next) => {
                resolve_scopes(implicit_cont, next);
                Some(next)
            }
            // If this node doesn't have an explicit continuation, then there's no need to worry
            // about inlining the outer scope's implicit continuation into it, as we can just inline
            // the outer implicit continuation into every arm instead.
            None => implicit_cont,
        };

        for choice in choices {
            resolve_scopes(cont, choice);
        }

        // We never follow or reassign a Choose/Offer node's continuation, because it's been
        // inlined into all its arms already.
    } else if the node is a Loop(body) {
        // Inside a loop body, the absence of an explicit continuation doesn't imply following the
        // implicit continuation held by the scope outside the loop (the continuation of the loop
        // block) - instead, it implies continuing the loop! So to handle that, we visit the body of
        // the loop with its implicit continuation set to a Continue node we generate, targeted at
        // the loop node itself.
        let continue0 = Continue(node);
        continue0.machine_generated = true;
        resolve_scopes(continue0, body);

        // The loop's own continuation (where a Break targeting it jumps) is resolved in the
        // enclosing scope, just like any other node's.
        match node.next {
            Some(next) => resolve_scopes(implicit_cont, next),
            None => node.next = implicit_cont,
        }
    }
}

// We begin with an empty implicit continuation (the Done continuation) and start traversing from
// the root.
resolve_scopes(None, root);
```
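
To make the algorithm concrete, here is a minimal runnable sketch in Rust under several simplifying assumptions: nodes live in a plain `Vec` addressed by `usize` (standing in for `thunderdome::Index`), only `Send`, `Choose`, `Loop`, `Continue`, and `Break` are modeled, and the `Cfg`, `Node`, and `follow_or_assign` shown here are hypothetical stand-ins rather than this crate's types. It also resolves a `Loop` node's own continuation in the enclosing scope, since that is where a `Break` out of the loop ends up.

```rust
// Hypothetical, minimal arena-allocated CFG: `usize` stands in for `thunderdome::Index`.
#[derive(Debug, Clone, PartialEq)]
enum Ir {
    Send(&'static str),
    Choose(Vec<usize>),
    Loop(usize),
    Continue(usize),
    Break(usize),
}

#[derive(Debug, Clone)]
struct Node {
    expr: Ir,
    next: Option<usize>,
    machine_generated: bool,
}

#[derive(Debug, Default)]
struct Cfg {
    nodes: Vec<Node>,
}

impl Cfg {
    // Allocates a new node and returns its index.
    fn node(&mut self, expr: Ir, next: Option<usize>) -> usize {
        self.nodes.push(Node { expr, next, machine_generated: false });
        self.nodes.len() - 1
    }

    fn resolve_scopes(&mut self, implicit_cont: Option<usize>, node: usize) {
        let expr = self.nodes[node].expr.clone();
        match expr {
            Ir::Send(_) | Ir::Continue(_) | Ir::Break(_) => {
                self.follow_or_assign(implicit_cont, node)
            }
            Ir::Choose(choices) => {
                // The Choose's own continuation (if any) becomes the implicit continuation of
                // every arm; otherwise the arms inherit the enclosing implicit continuation.
                let explicit = self.nodes[node].next.take();
                let cont = match explicit {
                    Some(next) => {
                        self.resolve_scopes(implicit_cont, next);
                        Some(next)
                    }
                    None => implicit_cont,
                };
                for choice in choices {
                    self.resolve_scopes(cont, choice);
                }
            }
            Ir::Loop(body) => {
                // A body path with no continuation implicitly continues this loop.
                let continue0 = self.node(Ir::Continue(node), None);
                self.nodes[continue0].machine_generated = true;
                self.resolve_scopes(Some(continue0), body);
                // The loop's own continuation (where a `break` lands) uses the enclosing scope.
                self.follow_or_assign(implicit_cont, node);
            }
        }
    }

    // Follows an explicit continuation, or adopts the implicit one if there is none.
    fn follow_or_assign(&mut self, implicit_cont: Option<usize>, node: usize) {
        let next = self.nodes[node].next;
        match next {
            Some(next) => self.resolve_scopes(implicit_cont, next),
            None => self.nodes[node].next = implicit_cont,
        }
    }
}
```

For a `Choose` whose arms are `Send(A)` and a loop over `Send(B)`, and whose continuation is `Send(C)`, this leaves the `Choose` without a continuation, points `Send(A)` and the loop at `Send(C)`, and points `Send(B)` at a fresh machine-generated `Continue` targeting the loop.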

### Errors

This pass never emits any errors.

## Dead code reporting - [`Cfg::report_dead_code`]

The dead code reporting pass is itself responsible for running the flow analysis and reachability analysis. After these have run, it traverses the control flow graph from the root, looking for nodes which the flow analysis considers "impassable" (control flow does not proceed to their continuations) but which still have a continuation. This locates every point in the program where reachable code is immediately followed by unreachable code. Reporting only these boundaries is much friendlier to the user than reporting an error for *every* unreachable node, of which there may be many along the same flow path.

The dead code reporting algorithm looks something like this:

```text
fn report_dead_code(node) {
    // We want to follow every child node except for the node's continuation. If we did follow the
    // continuation, we would end up reporting every unreachable node instead of just the
    // unreachable nodes on the boundary between reachable and unreachable code.
    for child in all children of node (excluding node.next) {
        report_dead_code(child);
    }

    if let Some(cont) = node.next {
        if node is passable {
            report_dead_code(cont);
        } else if !cont.machine_generated && cont is not reachable {
            emit FollowingCodeUnreachable on node and UnreachableStatement on cont;
        }
    }
}

report_dead_code(root);
```
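
The traversal can be sketched in Rust as follows. This is a hypothetical, simplified model: each `Node` carries a `passable` flag standing in for the flow analysis result, a `HashSet` of node indices stands in for the reachability analysis, and `DeadCodeError` carries node indices instead of spans. In this sketch, one boundary yields both a `FollowingCodeUnreachable` error on the impassable node and an `UnreachableStatement` error on its continuation.

```rust
use std::collections::HashSet;

// Hypothetical, simplified node: `passable` stands in for the flow analysis result
// ("does control flow proceed to this node's continuation?").
struct Node {
    children: Vec<usize>, // e.g. loop bodies and choice arms, but never `next`
    next: Option<usize>,
    passable: bool,
    machine_generated: bool,
}

#[derive(Debug, PartialEq)]
enum DeadCodeError {
    FollowingCodeUnreachable(usize),
    UnreachableStatement(usize),
}

fn report_dead_code(
    nodes: &[Node],
    reachable: &HashSet<usize>,
    node: usize,
    errors: &mut Vec<DeadCodeError>,
) {
    // Visit every child except the continuation.
    for &child in &nodes[node].children {
        report_dead_code(nodes, reachable, child, errors);
    }
    if let Some(cont) = nodes[node].next {
        if nodes[node].passable {
            report_dead_code(nodes, reachable, cont, errors);
        } else if !nodes[cont].machine_generated && !reachable.contains(&cont) {
            // Report only the boundary between reachable and unreachable code.
            errors.push(DeadCodeError::FollowingCodeUnreachable(node));
            errors.push(DeadCodeError::UnreachableStatement(cont));
        }
    }
}
```

For a loop (node 0) whose body is a `send` (1), then a `continue` (2), then another `send` (3), only the boundary is reported: `FollowingCodeUnreachable(2)` and `UnreachableStatement(3)`. If node 3 were machine-generated, nothing would be reported.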

### Errors

This pass emits two types of errors:

- [`CompileError::FollowingCodeUnreachable`] - emitted on a reachable node which has an unreachable continuation
- [`CompileError::UnreachableStatement`] - emitted on an unreachable node that is the continuation of a reachable node

## Lowering to target/target code generation - [`Cfg::generate_target`]

The code generation/lowering pass also performs loop productivity analysis. An unproductive loop is one that takes no actions before repeating. A program containing one is technically valid in the surface language, but as of Rust 1.51 it causes the Rust typechecker to loop until it overflows, generating an error which is very difficult for new users to decipher. To prevent this, we detect unproductive loops and report them, rather than compiling a technically valid session type which will not compile as valid Rust.

Loop productivity analysis is conducted during the target generation pass because productivity is a syntactic property of the target language itself.

For most nodes, generating the corresponding target code is trivial: `Send`, `Recv`, `Call`, `Split`, `Choose`, and `Offer` all map very directly to their [`Target`] counterparts. The complex components are `Loop`, `Continue`, `Type`, `Break`, and `Error`:

- `Continue` nodes must be converted from directly referencing their target loop to referencing the correct de Bruijn index of that loop. For this purpose, we keep a loop environment as a stack, which is pushed to when entering a loop node and popped when exiting one. The de Bruijn index corresponding to a loop is calculated by a linear search of the loop environment for the position of the matching loop, counting outward from the innermost loop (index 0).
- `Break` has no corresponding representation in the `Target`; instead, it is lowered by substituting any reference to it with the continuation of the loop it references. The loop's continuation must also be generated inside that loop's own loop environment, because in the `Target`, the continuation of a loop is simply the part of the loop which does not end in a `Continue`. This actually makes codegen a little more convenient, but is still worth noting.
- `Loop` must push its index to the loop environment stack before generating its body, and then pop once its body is generated, so that the loop environment properly corresponds to the scope in the target AST.
- `Type` in the `Target` has nowhere to put the continuation corresponding to its `next` pointer, because in the `Target`, a `Type` is run until it is `Done`. So there are two cases: if the `Type`'s continuation is `None` (the "done" continuation), we can emit the type directly as a [`Target::Type`] node; if it is `Some`, we must emit a [`Target::Then`] which sequences the `Type`'s continuation after it.
- `Error` has no corresponding representation in the `Target` and is substituted with its continuation during codegen. Although `Error` nodes are tolerated *during* codegen, by the time codegen encounters one, at least one corresponding [`CompileError`] will already have been emitted, and the resulting target AST will not be returned from `Cfg::generate_target`.

The codegen algorithm looks something like this:

```text
fn generate_target(loop_env, maybe_node) {
    if maybe_node is None {
        // If the node is empty, that's the "done" continuation.
        return Done;
    } else if maybe_node is Some(node) {
        if node is NOT Loop, Continue, Break {
            // Note that the current loop's target representation now contains something which is
            // not another Loop in between itself and its `Continue`, if present.
            loop_env.mark_productive();
        }

        if node is Recv, Send, Call, Split, Choose, or Offer {
            // Recursively call generate_target on child nodes and convert to the directly
            // corresponding Target.
            return Target:: ...;
        } else if node is Loop(body) {
            loop_env.push(node);
            let body = generate_target(loop_env, body);
            loop_env.pop();
            return Target::Loop(body);
        } else if node is Continue(jump_target) {
            let debruijn_index = loop_env.debruijn_index_of(jump_target);
            // Once we hit a continue, we know whether the corresponding loop is productive, and
            // can emit an error if it is not.
            if !loop_env.productive_for_continue(debruijn_index) {
                // Emit an error for the unproductive loop, and for the unproductive continue *if*
                // it is not machine generated
                ...
            }
            return Target::Continue(debruijn_index);
        } else if node is Break(jump_target) {
            // For a break, return the result of generating the target form of its corresponding
            // loop's continuation.
            return generate_target(loop_env, jump_target.next);
        } else if node is Type(ty) {
            // If the continuation is "done", then we don't need to emit a Then.
            if node.next is None {
                return Target::Type(ty);
            } else {
                let cont = generate_target(loop_env, node.next);
                return Target::Then(Target::Type(ty), cont);
            }
        } else if node is Error {
            // Error nodes are replaced by their continuation; keep going so we can collect more
            // loop productivity errors.
            return generate_target(loop_env, node.next);
        }
    }
}
```
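
For a concrete picture, the sketch below implements this lowering in Rust for a hypothetical, stripped-down CFG. It assumes scope resolution has already run, renders the target as a string in the same notation the tests in this file compare against, omits `Call`, `Split`, `Offer`, and `Error`, and records unproductive loops in a `Vec` instead of emitting `CompileError`s. `Codegen` and its fields are illustrative names, not this crate's types. One design choice worth flagging: when it meets an action, this sketch marks *every* enclosing loop as productive, and a `continue` then checks only the loop it targets.

```rust
// Hypothetical, stripped-down CFG: `usize` indices stand in for arena indices, and only a
// handful of node kinds are modeled. Scope resolution is assumed to have already run.
enum Ir {
    Send(&'static str),
    Recv(&'static str),
    Type(&'static str),
    Choose(Vec<Option<usize>>),
    Loop(Option<usize>),
    Continue(usize),
    Break(usize),
}

struct Node {
    expr: Ir,
    next: Option<usize>,
}

struct Codegen<'a> {
    nodes: &'a [Node],
    // Loop environment: (loop node, productive since entering it?), innermost last.
    loop_env: Vec<(usize, bool)>,
    // Loop nodes found to be unproductive (a stand-in for emitted `CompileError`s).
    unproductive_loops: Vec<usize>,
}

impl<'a> Codegen<'a> {
    fn new(nodes: &'a [Node]) -> Self {
        Codegen { nodes, loop_env: Vec::new(), unproductive_loops: Vec::new() }
    }

    // Lowers a node (and everything after it) to a string in `Target`-like notation.
    fn generate_target(&mut self, maybe_node: Option<usize>) -> String {
        // `None` is the "done" continuation.
        let node = match maybe_node {
            Some(node) => node,
            None => return "Done".to_string(),
        };
        // Copy the slice reference out of `self` so borrows of nodes don't borrow `self`.
        let nodes = self.nodes;
        let current = &nodes[node];
        if !matches!(current.expr, Ir::Loop(_) | Ir::Continue(_) | Ir::Break(_)) {
            // Any other node is an action, so every enclosing loop is now productive.
            for entry in self.loop_env.iter_mut() {
                entry.1 = true;
            }
        }
        match &current.expr {
            Ir::Send(ty) => format!("Send<{}, {}>", ty, self.generate_target(current.next)),
            Ir::Recv(ty) => format!("Recv<{}, {}>", ty, self.generate_target(current.next)),
            // A `Type` runs until `Done`, so a non-done continuation needs a `Then`.
            Ir::Type(ty) => match current.next {
                None => ty.to_string(),
                Some(_) => format!("Then<{}, {}>", ty, self.generate_target(current.next)),
            },
            Ir::Choose(arms) => {
                let lowered: Vec<String> =
                    arms.iter().map(|&arm| self.generate_target(arm)).collect();
                format!("Choose<({})>", lowered.join(", "))
            }
            Ir::Loop(body) => {
                self.loop_env.push((node, false));
                let body = self.generate_target(*body);
                self.loop_env.pop();
                format!("Loop<{}>", body)
            }
            Ir::Continue(target) => {
                // Linear search from the innermost loop outward gives the de Bruijn index.
                let index = self
                    .loop_env
                    .iter()
                    .rev()
                    .position(|&(l, _)| l == *target)
                    .expect("continue target should be in scope");
                let (_, productive) = self.loop_env[self.loop_env.len() - 1 - index];
                if !productive {
                    self.unproductive_loops.push(*target);
                }
                format!("Continue<{}>", index)
            }
            // A `break` is replaced by its loop's continuation, lowered in the current environment.
            Ir::Break(target) => self.generate_target(nodes[*target].next),
        }
    }
}
```

Run on the same graph as the `tally_client_cfg_direct_subst` test, this sketch produces `Loop<Choose<(Done, Send<Operation, Loop<Choose<(Send<i64, Continue<0>>, Recv<i64, Continue<1>>)>>>)>>`, while a bare `loop { continue }` lowers to `Loop<Continue<0>>` and flags the loop as unproductive.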

### Errors

This pass emits two types of errors:

- [`CompileError::UnproductiveLoop`] - emitted on a loop node which is found to be unproductive
- [`CompileError::UnproductiveContinue`] - emitted on a non-machine-generated continue node which causes a loop to be unproductive

# [`Target`] transformations

Currently, the target AST does not undergo any transformation before it is converted very transparently to its destination format (whether that's to be displayed as a string or emitted as a Rust token tree).

<!-- snip -->

[`Syntax::to_cfg`]: crate::Spanned::to_cfg
[`Cfg`]: crate::cfg::Cfg
[`Cfg::resolve_scopes`]: crate::cfg::Cfg::resolve_scopes
[`Cfg::report_dead_code`]: crate::cfg::Cfg::report_dead_code
[`Cfg::generate_target`]: crate::cfg::Cfg::generate_target
[`Ir`]: crate::cfg::Ir
[`Ir::Offer`]: crate::cfg::Ir::Offer
[`Ir::Choose`]: crate::cfg::Ir::Choose
[`Ir::Loop`]: crate::cfg::Ir::Loop
[`Ir::Break`]: crate::cfg::Ir::Break
[`Ir::Continue`]: crate::cfg::Ir::Continue
*/

#![warn(missing_docs)]
#![warn(missing_copy_implementations, missing_debug_implementations)]
#![warn(unused_qualifications)]
#![warn(future_incompatible)]
#![warn(unused)]
#![forbid(broken_intra_doc_links)]

use {
    proc_macro2::Span,
    proc_macro_crate::FoundCrate,
    quote::format_ident,
    std::{
        env,
        fmt::{Display, Formatter},
    },
    syn::{parse_quote, Path},
    thiserror::Error,
};

pub mod cfg;
pub mod flow;
pub mod parse;
pub mod syntax;
pub mod target;

pub use crate::{
    syntax::{Invocation, Syntax},
    target::Target,
};

/// A compilation error due to invalid (but parseable) input in the surface macro syntax.
#[derive(Error, Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum CompileError {
    /// Error resulting from `'a loop { ... 'a loop { ... }}`.
    #[error("label name `'{0}` shadows a label name that is already in scope")]
    ShadowedLabel(String),
    /// Error resulting from `continue 'a` or `break 'a` which *are* inside a `loop`, but not inside
    /// any loop with the label `'a`.
    #[error("undeclared label `'{0}`")]
    UndeclaredLabel(String),
    /// Error resulting from any use of `continue` outside of a `loop`.
    #[error("cannot `continue` outside of a loop")]
    ContinueOutsideLoop,
    /// Error resulting from any use of `break` outside of a `loop`.
    #[error("cannot `break` outside of a loop")]
    BreakOutsideLoop,
    /// Error resulting from control flow analysis finding that a statement unconditionally jumps
    /// away from following code.
    #[error("any code following this statement is unreachable")]
    FollowingCodeUnreachable,
    /// Error resulting from control flow analysis finding that a statement can never be reached
    /// because of preceding control flow.
    #[error("unreachable statement")]
    UnreachableStatement,
    /// Error resulting from unproductive loop analysis finding that a loop is unproductive and
    /// cannot be compiled without resulting in typechecker overflow.
    #[error("this `loop` statement is unproductive (it takes no actions before repeating)")]
    UnproductiveLoop,
    /// Error resulting from any `continue` or `break` which produces an unproductive loop.
    #[error("this `continue` statement causes an unproductive `loop`")]
    UnproductiveContinue,
}

/// A thing attached to some `Span` that tracks its origin in the macro invocation.
#[derive(Debug, Clone, Copy)]
pub struct Spanned<T> {
    /// The thing to which the [`Span`] is attached.
    pub inner: T,
    /// The [`Span`] which is attached to the thing.
    pub span: Span,
}

impl<T> From<T> for Spanned<T> {
    fn from(inner: T) -> Self {
        Self {
            inner,
            span: Span::call_site(),
        }
    }
}

impl<T: Display> Display for Spanned<T> {
    fn fmt(&self, f: &mut Formatter) -> Result<(), std::fmt::Error> {
        write!(f, "{}", self.inner)
    }
}

/// Returns a path prefix which refers to the root module of the dialectic crate, regardless of
/// where it is called (inside dialectic, doctests in dialectic, unit tests in dialectic,
/// integration tests in dialectic, outside dialectic.)
pub fn dialectic_path() -> Path {
    // We need to find the right path where we can reference types in our proc macro. This is a
    // little tricky. There are three cases to consider.
    match proc_macro_crate::crate_name("dialectic") {
        // The first case is that we are in dialectic and compiling dialectic itself, OR we are
        // compiling a dialectic doctest. In this case, we want to use `crate::dialectic`, which
        // will grab the symbol "dialectic" in the crate root. In the case of a doctest, this
        // will result in the extern crate dialectic; in the case of dialectic itself, it will
        // result in a private dummy module called "dialectic", which exists to support macro
        // calls like these and re-exports dialectic::types.
        Ok(FoundCrate::Itself) if env::var("CARGO_CRATE_NAME").as_deref() == Ok("dialectic") => {
            parse_quote!(crate::dialectic)
        }
        // The second case is that we are in an integration test of dialectic. This one's
        // straightforward.
        Ok(FoundCrate::Itself) | Err(_) => parse_quote!(::dialectic),
        // And lastly, the third case: we are in a user's crate. We found the crate with the
        // name `dialectic` and will use that identifier as our crate name, in a similar manner
        // to the second case, prefixed with `::` to ensure it is a "global" path.
        Ok(FoundCrate::Name(name)) => {
            let name_ident = format_ident!("{}", name);
            parse_quote!(::#name_ident)
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::cfg::*;
    use thunderdome::Index;

    impl Cfg {
        fn send(&mut self, ty: &str) -> Index {
            self.singleton(Ir::Send(syn::parse_str(ty).unwrap()))
        }

        fn recv(&mut self, ty: &str) -> Index {
            self.singleton(Ir::Recv(syn::parse_str(ty).unwrap()))
        }

        fn type_(&mut self, ty: &str) -> Index {
            self.singleton(Ir::Type(syn::parse_str(ty).unwrap()))
        }
    }

    #[test]
    fn tally_client_cfg_direct_subst() {
        let mut cfg = Cfg::new();
        let client_tally = cfg.singleton(Ir::Loop(None));
        let client = cfg.singleton(Ir::Loop(None));

        let send = cfg.send("i64");
        let recv = cfg.recv("i64");
        let continue0 = cfg.singleton(Ir::Continue(client_tally));
        cfg[send].next = Some(continue0);
        let continue1 = cfg.singleton(Ir::Continue(client));
        cfg[recv].next = Some(continue1);
        let choose_opts = vec![Some(send), Some(recv)];
        let choose = cfg.singleton(Ir::Choose(choose_opts));

        cfg[client_tally].expr = Ir::Loop(Some(choose));

        let break0 = cfg.singleton(Ir::Break(client));
        let send = cfg.send("Operation");
        cfg[send].next = Some(client_tally);
        let choose_opts = vec![Some(break0), Some(send)];
        let choose = cfg.singleton(Ir::Choose(choose_opts));

        cfg[client].expr = Ir::Loop(Some(choose));

        let s = format!("{}", cfg.generate_target(Some(client)).unwrap());
        assert_eq!(s, "Loop<Choose<(Done, Send<Operation, Loop<Choose<(Send<i64, Continue<0>>, Recv<i64, Continue<1>>)>>>)>>");
    }

    #[test]
    fn tally_client_cfg_call() {
        let mut cfg = Cfg::new();
        let client = cfg.singleton(Ir::Loop(None));
        let break0 = cfg.singleton(Ir::Break(client));
        let send = cfg.send("Operation");
        let callee = cfg.type_("ClientTally");
        let call = cfg.singleton(Ir::Call(Some(callee)));
        cfg[send].next = Some(call);
        let continue0 = cfg.singleton(Ir::Continue(client));
        cfg[call].next = Some(continue0);
        let choose_opts = vec![Some(break0), Some(send)];
        let choose = cfg.singleton(Ir::Choose(choose_opts));

        cfg[client].expr = Ir::Loop(Some(choose));

        let s = format!("{}", cfg.generate_target(Some(client)).unwrap());
        assert_eq!(
            s,
            "Loop<Choose<(Done, Send<Operation, Call<ClientTally, Continue<0>>>)>>"
        );
    }
}