Crate stack_graphs

source ·
Expand description

Stack graphs provide a single framework for performing name resolution for any programming language, while abstracting away the specific name resolution rules for each of those languages. The basic idea is to represent the definitions and references in a program using graph. A name binding maps a reference to all of the possible definitions that the reference could refer to. Because we’ve represented definitions and references as a graph, bindings are represented by paths within that graph.

While searching for a path in an incremental stack graph, we keep track of two stacks: a symbol stack and a scope stack. Broadly speaking, the symbol stack keeps track of what symbols we’re trying to resolve, while the scope stack gives us control over which particular scopes we look for those symbols in.

§Relationship to scope graphs

Stack graphs are based in the scope graphs formalism from Eelco Visser’s group at TU Delft. Scope graphs also model the name binding structure of a program using a graph, and uses paths within that graph to represent which definitions a reference might refer to.

Stack graphs add incrementality to scope graphs. An incremental analysis is one where we can reuse analysis results for source code files that haven’t changed. In a non-incremental analysis, we have to reanalyze the entire contents of a repository or package when any file is changed.

As one example, the Scopes as Types paper presents some rewrite rules that let you handle field access in a record or object type — for instance, being able to resolve the foo part of A.foo. In Scopes as Types, we’d handle this by performing the path-finding algorithm when constructing the graph, to find out what A resolves to. Once we find its definition, we then attach the reference to foo to that result.

This is not incremental because the reference to foo “lives” over in some unrelated part of the graph. If A is defined in a different file (or worse, a different package), then we have to update that distant part of the graph with the node representing the reference. We end up cluttering the graph for a class definition with nodes representing all of its field references, which is especially problematic for popular packages with lots of downstream dependencies.

Our key insight is to recognize that when resolving A.foo, we have to “pause” the resolution of foo to start resolving A. Once we’ve found the binding for A, we can resume the original resolution of foo. This describes a stack! So instead of having our path-finding algorithm look for exactly one symbol, we can keep track of a stack of things to look for. To correctly resolve foo, we do still have to eventually make our way over to the part of the graph where A is defined. But by having a stack of path-finding targets, we can resolve the reference to A.foo with a single execution of the path-finding algorithm. And most importantly, each “chunk” of the overall graph only depends on “local” information from the original source file. (a.k.a., it’s incremental!)

Modules§

  • Cache-friendly arena allocation for stack graph data.
  • Defines assertions that can be run against a stack graph.
  • Defines a C API for working with stack graphs in other languages.
  • Detect and avoid cycles in our path-finding algorithm.
  • Defines the structure of a stack graph.
  • Partial paths are “snippets” of paths that we can precalculate for each file that we analyze.
  • Paths represent name bindings in a source language.
  • Partial paths can be “stitched together” to produce name-binding paths.

Macros§

Structs§

Traits§