cstree/
lib.rs

1//! `cstree` is a generic library for creating and working with concrete syntax trees (CSTs).
2//!
3//! "Traditional" abstract syntax trees (ASTs) usually contain different types of nodes which
4//! represent different syntactical elements of the source text of a document and reduce its
5//! information to the minimal amount necessary to correctly interpret it. In contrast, CSTs are
6//! lossless representations of the entire input where all tree nodes are represented homogeneously
7//! (i.e., the nodes are _untyped_), but are tagged with a [`RawSyntaxKind`]  to determine the kind
8//! of grammatical element they represent.
9//!
10//! One big advantage of this representation is that it cannot only recreate the original source
11//! exactly, but also lends itself very well to the representation of _incomplete or erroneous_
12//! trees and is thus highly suited for usage in contexts such as IDEs or any other application
13//! where a user is _editing_ the source text.
14//!
15//! The concept of and the data structures for `cstree`'s syntax trees are inspired in part by
16//! Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
17//! Trees consist of two layers: the inner tree (called _green_ tree) contains the actual source
18//! text as position independent green nodes. Tokens and nodes that appear identically at multiple
19//! places in the source are deduplicated in this representation in order to store the tree
20//! efficiently. This means that a green tree may not actually structurally be a tree. To remedy
21//! this, the real syntax tree is constructed on top of the green tree as a secondary tree (called
22//! the _red_ tree), which models the exact source structure.
23//! As a possible third layer, a strongly typed AST [can be built] on top of the red tree.
24//!
25//! [can be built]: #ast-layer
26//!
27//! The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/),
28//! developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/) who
29//! wrote up a conceptual overview of their implementation in
30//! [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
31//! Notable differences of `cstree` compared to `rowan`:
32//!
33//! - Syntax trees (red trees) are created lazily, but are persistent. Once a red node has been
34//!   created by `cstree`, it will remain allocated. In contrast, `rowan` re-creates the red layer on
35//!   the fly on each traversal of the tree. Apart from the trade-off discussed
36//!   [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#memoized-rednodes),
37//!   this helps to achieve good tree traversal speed while helping to provide the following:
38//! - Syntax (red) nodes are `Send` and `Sync`, allowing to share realized trees across threads. This is achieved by
39//!   atomically reference counting syntax trees as a whole, which also gets rid of the need to reference count
40//!   individual nodes.
41//! - [`SyntaxNode`](syntax::SyntaxNode)s can hold custom data.
42//! - `cstree` trees are trees over interned strings. This means `cstree` will deduplicate the text of tokens with the
43//!   same source string, such as identifiers with the same name. In this position, `rowan` stores each token's text
44//!   together with its metadata as a custom DST (dynamically-sized type).
45//! - `cstree` includes some performance optimizations for tree creation: it only allocates space for new nodes on the
46//!   heap if they are not in cache and avoids recursively hashing subtrees by pre-hashing them.
47//! - `cstree` includes some performance optimizations for tree traversal: persisting red nodes allows tree traversal
48//!   methods to return references instead of cloning nodes, which involves updating the tree's reference count. You can
49//!   still `clone` the reference to obtain an owned node, but you only pay that cost when you need to.
50//! - The downside of offering thread safe syntax trees is that `cstree` cannot offer any mutability API for its CSTs.
51//!   Trees can still be updated into new trees through [replacing] nodes, but cannot be mutated in place.
52//!
53//! [replacing]: syntax::SyntaxNode::replace_with
54//!
55//! ## Getting Started
56//! If you're looking at `cstree`, you're probably looking at or already writing a parser and are considering using
57//! concrete syntax trees as its output. We'll talk more about parsing below -- first, let's have a look at what needs
58//! to happen to go from input text to a `cstree` syntax tree:
59//!
60//!  1. Define an enumeration of the types of tokens (like keywords) and nodes (like "an expression") that you want to
61//!     have in your syntax and implement [`Syntax`]
62//!
63//!  2. Create a [`GreenNodeBuilder`](build::GreenNodeBuilder) and call
64//!     [`start_node`](build::GreenNodeBuilder::start_node), [`token`](build::GreenNodeBuilder::token) and
65//!     [`finish_node`](build::GreenNodeBuilder::finish_node) from your parser
66//!
67//!  3. Call [`SyntaxNode::new_root`](syntax::SyntaxNode::new_root) or
68//!     [`SyntaxNode::new_root_with_resolver`](syntax::SyntaxNode::new_root_with_resolver) with the resulting
69//!     [`GreenNode`](green::GreenNode) to obtain a syntax tree that you can traverse
70//!
71//! There's a full [getting started guide] that walks through each of the above steps in detail in the documentation for
72//! the `getting_started` module. The walkthrough goes through the necessary steps bit by bit and skips the lexer, but
73//! the full code plus a runnable interpreter for the small walkthrough language are available in the `readme` example.
74//! [Additional examples] can be found in the `examples/` folder in the repository.
75//! A good starting point is the `s_expressions` example, which implements a parser for a small S-Expression language
76//! with guiding comments.
77//!
78//! [getting started guide]: getting_started/index.html
79//! [Additional examples]: https://github.com/domenicquirl/cstree/tree/master/cstree/examples
80//!
81//! ## License
82//!
83//! `cstree` is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
84//!
85//! See `LICENSE-APACHE` and `LICENSE-MIT` for details.
86
87#![forbid(missing_debug_implementations, unconditional_recursion)]
88#![deny(unsafe_code, future_incompatible)]
89#![allow(
90    unstable_name_collisions, // strict provenance - must come after `future_incompatible` to take precedence
91    unexpected_cfgs, // nightly docs.rs features and `salsa-2022` feature until that is figured out
92    clippy::duplicated_attributes, // interning modules
93)]
94#![warn(missing_docs)]
95// Docs.rs
96#![doc(html_root_url = "https://docs.rs/cstree/0.13.0")]
97#![cfg_attr(doc_cfg, feature(doc_cfg))]
98
99pub mod getting_started;
100
101#[allow(unsafe_code)]
102pub mod green;
103#[allow(unsafe_code)]
104pub mod syntax;
105
106#[allow(unsafe_code)]
107pub mod interning;
108
109#[cfg(feature = "serialize")]
110mod serde_impls;
111#[allow(missing_docs)]
112mod utility_types;
113
114use std::fmt;
115
116/// `RawSyntaxKind` is a type tag for each token or node.
117#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
118pub struct RawSyntaxKind(pub u32);
119
120/// Typesafe representations of text ranges and sizes.
121pub mod text {
122    pub use crate::syntax::SyntaxText;
123    pub use text_size::{TextLen, TextRange, TextSize};
124}
125
126/// A tree builder for the construction of syntax trees.
127///
128/// Please refer to the documentation on [`GreenNodeBuilder`](build::GreenNodeBuilder) itself and the ["getting started"
129/// section](../index.html#getting-started) from the top-level documentation for an introduction to how to build a
130/// syntax tree.
131pub mod build {
132    pub use crate::green::builder::{Checkpoint, GreenNodeBuilder, NodeCache};
133}
134
135/// A convenient collection of the most used parts of `cstree`.
136pub mod prelude {
137    pub use crate::{
138        RawSyntaxKind,
139        Syntax,
140        build::GreenNodeBuilder,
141        green::{GreenNode, GreenToken},
142        syntax::{SyntaxElement, SyntaxNode, SyntaxToken},
143    };
144}
145
146/// Types for syntax tree traversal / moving through trees.
147pub mod traversal {
148    pub use crate::utility_types::{Direction, WalkEvent};
149}
150
151/// Utility types. It shouldn't be needed to reference these directly, but they are returned in several places in
152/// `cstree` and may come in handy.
153pub mod util {
154    pub use crate::utility_types::{NodeOrToken, TokenAtOffset};
155}
156
157/// Synchronization primitives.
158pub mod sync {
159    /// An atomically reference counted shared pointer.
160    ///
161    /// This is like [`Arc`](std::sync::Arc) in the standard library, but more efficient for how `cstree` stores
162    /// syntax trees internally. This Arc does not support weak reference counting.
163    pub use triomphe::Arc;
164}
165
166/// A type that represents what items in your language can be.
167/// Typically, this is an `enum` with variants such as `Identifier`, `Literal`, ...
168///
169/// The `Syntax` trait is the bridge between the internal `cstree` representation and your
170/// language's types.
171/// This is essential for providing a [`SyntaxNode`] API that can be used with your types, as in the
172/// `s_expressions` example:
173///
174/// ```
175/// #[derive(Debug, Clone, Copy, PartialEq, Eq, cstree::Syntax)]
176/// # #[allow(non_camel_case_types)]
177/// #[repr(u32)]
178/// enum SyntaxKind {
179///     #[static_text("+")]
180///     Plus, // `+`
181///     #[static_text("-")]
182///     Minus, // `-`
183///     Integer,    // like `15`
184///     Expression, // combined expression, like `5 + 4 - 3`
185///     Whitespace, // whitespace is explicit
186/// }
187/// ```
188///
189/// `cstree` provides a procedural macro called `cstree_derive` to automatically generate `Syntax` implementations for
190/// syntax kind enums if its `derive` feature is enabled.
191///
192/// [`SyntaxNode`]: crate::syntax::SyntaxNode
193pub trait Syntax: Sized + Copy + fmt::Debug + Eq {
194    /// Construct a semantic item kind from the compact representation.
195    fn from_raw(raw: RawSyntaxKind) -> Self;
196
197    /// Convert a semantic item kind into a more compact representation.
198    fn into_raw(self) -> RawSyntaxKind;
199
200    /// Fixed text for a particular syntax kind.
201    /// Implement for kinds that will only ever represent the same text, such as punctuation (like a
202    /// semicolon), keywords (like `fn`), or operators (like `<=`).
203    ///
204    /// Indicating tokens that have a `static_text` this way allows `cstree` to store them more efficiently, which makes
205    /// it faster to add them to a syntax tree and to look up their text. Since there can often be many occurrences
206    /// of these tokens inside a file, doing so will improve the performance of using `cstree`.
207    fn static_text(self) -> Option<&'static str>;
208}
209
210#[cfg(feature = "derive")]
211#[allow(unused_imports)]
212#[macro_use]
213extern crate cstree_derive;
214
215#[cfg(feature = "derive")]
216/// Derive macro available if `cstree` is built with `features = ["derive"]`.
217pub use cstree_derive::Syntax;
218
219#[doc(hidden)]
220#[allow(unsafe_code, unused)]
221pub mod testing {
222    pub use crate::prelude::*;
223    pub fn parse<S: Syntax, I>(_b: &mut GreenNodeBuilder<S, I>, _s: &str) {}
224
225    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
226    #[repr(u32)]
227    #[allow(non_camel_case_types)]
228    pub enum TestSyntaxKind {
229        Plus,
230        Identifier,
231        Int,
232        Float,
233        Operation,
234        Root,
235        Whitespace,
236        __LAST,
237    }
238    pub type MySyntax = TestSyntaxKind;
239    pub use TestSyntaxKind::*;
240
241    impl Syntax for TestSyntaxKind {
242        fn from_raw(raw: RawSyntaxKind) -> Self {
243            assert!(raw.0 <= TestSyntaxKind::__LAST as u32);
244            unsafe { std::mem::transmute::<u32, Self>(raw.0) }
245        }
246
247        fn into_raw(self) -> RawSyntaxKind {
248            RawSyntaxKind(self as u32)
249        }
250
251        fn static_text(self) -> Option<&'static str> {
252            match self {
253                TestSyntaxKind::Plus => Some("+"),
254                _ => None,
255            }
256        }
257    }
258}