Crate cstree[−][src]
Expand description
cstree
is a generic library for creating and working with concrete syntax trees.
“Traditional” abstract syntax trees (ASTs) usually contain different types of nodes which represent information
about the source text of a document and reduce this information to the minimal amount necessary to correctly
interpret it. In contrast, CSTs are lossless representations of the entire input where all tree nodes are
represented uniformly (i.e. the nodes are untyped), but include a SyntaxKind
field to determine the kind of
node.
One of the big advantages of this representation is not only that it can recreate the original source exactly, but
also that it lends itself very well to the representation of incomplete or erroneous trees and is thus very suited
for usage in contexts such as IDEs.
The concept of and the data structures for CSTs are inspired in part by Swift’s libsyntax.
Trees consist of two layers: the inner tree (called green tree) contains the actual source text in position
independent green nodes. Tokens and nodes that appear identically at multiple places in the source text are
deduplicated in this representation in order to store the tree efficiently. This means that the green tree may not
structurally be a tree. To remedy this, the actual syntax tree is constructed on top of the green tree as a
secondary tree (called red tree), which models the exact source structure.
The cstree
implementation is a fork of the excellent rowan
, developed by the authors of rust-analyzer who wrote up a conceptual overview of their implementation in their repository.
Notable differences of cstree
compared to rowan
:
- Syntax trees (red trees) are created lazily, but are persistent. Once a node has been created,
it will remain allocated, while
rowan
re-creates the red layer on the fly. Apart from the trade-off discussed here, this helps to achieve good tree traversal speed while providing the next points: - Syntax (red) nodes are
Send
andSync
, allowing to share realized trees across threads. This is achieved by atomically reference counting syntax trees as a whole, which also gets rid of the need to reference count individual nodes (helping with the point above). - Syntax nodes can hold custom data.
cstree
trees are trees over interned strings. This meanscstree
will deduplicate the text of tokens such as identifiers with the same name. In this position,rowan
stores each string, with a small string optimization (seeSmolStr
).- Performance optimizations for tree creation: only allocate new nodes on the heap if they are not in cache, avoid recursively hashing subtrees
- Performance optimizations for tree traversal: persisting red nodes allows tree traversal methods to return
references. You can still
clone
to obtain an owned node, but you only pay that cost when you need to.
Getting Started
The main entry points for constructing syntax trees are GreenNodeBuilder
and SyntaxNode::new_root
for green
and red trees respectively. See examples/s_expressions.rs
for a guided tutorial to cstree
.
AST Layer
While cstree
is built for concrete syntax trees, applications are quite easily able to work with either a CST or
an AST representation, or freely switch between them. To do so, use cstree
to build syntax and underlying green
tree and provide AST wrappers for your different kinds of nodes. An example of how this is done can be seen here and here (note that the latter file is automatically generated by a task).
Modules
Types and Traits for efficient String storage and deduplication.
Structs
An atomically reference counted shared pointer
A checkpoint for maybe wrapping a node. See GreenNodeBuilder::checkpoint
for details.
Internal node in the immutable “green” tree. It contains other nodes and tokens as its children.
A builder for green trees.
Construct with new
, with_cache
, or
from_cache
. To add tree nodes, start them with
start_node
, add token
s and then
finish_node
. When the whole tree is constructed, call
finish
to obtain the root.
An iterator over a GreenNode
’s children.
Leaf node in the immutable “green” tree.
A NodeCache
deduplicates identical tokens and small nodes during tree construction.
You can re-use the same cache for multiple similar trees with GreenNodeBuilder::with_cache
.
Syntax tree node that is guaranteed to belong to a tree that contains an associated
Resolver
.
Syntax tree token that is guaranteed to belong to a tree that contains an associated
Resolver
.
An iterator over the children of a SyntaxNode
.
SyntaxKind is a type tag for each token or node.
Inner syntax tree node. Syntax nodes can be shared between threads. Every syntax tree is reference counted as a whole and nodes are pointer-sized, so copying individual nodes is relatively cheap.
An iterator over the child nodes of a SyntaxNode
.
An efficient representation of the text that is covered by a SyntaxNode
, i.e. the combined
source text of all tokens that are descendants of the node.
Syntax tree token.
A range in text, represented as a pair of TextSize
.
A measure of text length. Also, equivalently, an index into text.
Enums
Convenience type to represent tree elements which may either be a node or a token.
There might be zero, one or two leaves at a given offset.
WalkEvent
describes tree walking process.
Traits
The Language
trait is the bridge between the internal cstree
representation and your language
types.
This is essential to providing a SyntaxNode
API that can be used with your types, as in the
s_expressions
example:
Primitives with a textual length that can be passed to TextSize::of
.
Type Definitions
An element of the tree that is guaranteed to belong to a tree that contains an associated
Resolver
, can be either a node or a token.
A reference to an element of the tree that is guaranteed to belong to a tree that contains an
associated Resolver
, can be either a reference to a node or one to a token.
An element of the tree, can be either a node or a token.
A reference to an element of the tree, can be either a reference to a node or one to a token.