1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
//! `cstree` is a generic library for creating and working with concrete syntax trees (CSTs).
//!
//! "Traditional" abstract syntax trees (ASTs) usually contain different types of nodes which
//! represent different syntactical elements of the source text of a document and reduce its
//! information to the minimal amount necessary to correctly interpret it. In contrast, CSTs are
//! lossless representations of the entire input where all tree nodes are represented homogeneously
//! (i.e., the nodes are _untyped_), but are tagged with a [`RawSyntaxKind`] to determine the kind
//! of grammatical element they represent.
//!
//! One big advantage of this representation is that it cannot only recreate the original source
//! exactly, but also lends itself very well to the representation of _incomplete or erroneous_
//! trees and is thus highly suited for usage in contexts such as IDEs or any other application
//! where a user is _editing_ the source text.
//!
//! The concept of and the data structures for `cstree`'s syntax trees are inspired in part by
//! Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
//! Trees consist of two layers: the inner tree (called _green_ tree) contains the actual source
//! text as position independent green nodes. Tokens and nodes that appear identically at multiple
//! places in the source are deduplicated in this representation in order to store the tree
//! efficiently. This means that a green tree may not actually structurally be a tree. To remedy
//! this, the real syntax tree is constructed on top of the green tree as a secondary tree (called
//! the _red_ tree), which models the exact source structure.
//! As a possible third layer, a strongly typed AST [can be built] on top of the red tree.
//!
//! [can be built]: #ast-layer
//!
//! The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/),
//! developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/) who
//! wrote up a conceptual overview of their implementation in
//! [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
//! Notable differences of `cstree` compared to `rowan`:
//!
//! - Syntax trees (red trees) are created lazily, but are persistent. Once a red node has been
//! created by `cstree`, it will remain allocated. In contrast, `rowan` re-creates the red layer on
//! the fly on each traversal of the tree. Apart from the trade-off discussed
//! [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#memoized-rednodes),
//! this helps to achieve good tree traversal speed while helping to provide the following:
//! - Syntax (red) nodes are `Send` and `Sync`, allowing to share realized trees across threads. This is achieved by
//! atomically reference counting syntax trees as a whole, which also gets rid of the need to reference count
//! individual nodes.
//! - [`SyntaxNode`](syntax::SyntaxNode)s can hold custom data.
//! - `cstree` trees are trees over interned strings. This means `cstree` will deduplicate the text of tokens with the
//! same source string, such as identifiers with the same name. In this position, `rowan` stores each token's text
//! together with its metadata as a custom DST (dynamically-sized type).
//! - `cstree` includes some performance optimizations for tree creation: it only allocates space for new nodes on the
//! heap if they are not in cache and avoids recursively hashing subtrees by pre-hashing them.
//! - `cstree` includes some performance optimizations for tree traversal: persisting red nodes allows tree traversal
//! methods to return references instead of cloning nodes, which involves updating the tree's reference count. You can
//! still `clone` the reference to obtain an owned node, but you only pay that cost when you need to.
//! - The downside of offering thread safe syntax trees is that `cstree` cannot offer any mutability API for its CSTs.
//! Trees can still be updated into new trees through [replacing] nodes, but cannot be mutated in place.
//!
//! [replacing]: syntax::SyntaxNode::replace_with
//!
//! ## Getting Started
//! If you're looking at `cstree`, you're probably looking at or already writing a parser and are considering using
//! concrete syntax trees as its output. We'll talk more about parsing below -- first, let's have a look at what needs
//! to happen to go from input text to a `cstree` syntax tree:
//!
//! 1. Define an enumeration of the types of tokens (like keywords) and nodes (like "an expression") that you want to
//! have in your syntax and implement [`Syntax`]
//!
//! 2. Create a [`GreenNodeBuilder`](build::GreenNodeBuilder) and call
//! [`start_node`](build::GreenNodeBuilder::start_node), [`token`](build::GreenNodeBuilder::token) and
//! [`finish_node`](build::GreenNodeBuilder::finish_node) from your parser
//!
//! 3. Call [`SyntaxNode::new_root`](syntax::SyntaxNode::new_root) or
//! [`SyntaxNode::new_root_with_resolver`](syntax::SyntaxNode::new_root_with_resolver) with the resulting
//! [`GreenNode`](green::GreenNode) to obtain a syntax tree that you can traverse
//!
//! There's a full [getting started guide] that walks through each of the above steps in detail in the documentation for
//! the `getting_started` module. The walkthrough goes through the necessary steps bit by bit and skips the lexer, but
//! the full code plus a runnable interpreter for the small walkthrough language are available in the `readme` example.
//! [Additional examples] can be found in the `examples/` folder in the repository.
//! A good starting point is the `s_expressions` example, which implements a parser for a small S-Expression language
//! with guiding comments.
//!
//! [getting started guide]: getting_started/index.html
//! [Additional examples]: https://github.com/domenicquirl/cstree/tree/master/cstree/examples
//!
//! ## License
//!
//! `cstree` is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
//!
//! See `LICENSE-APACHE` and `LICENSE-MIT` for details.
#![forbid(missing_debug_implementations, unconditional_recursion)]
#![deny(unsafe_code, future_incompatible)]
#![allow(unstable_name_collisions)] // strict provenance - must come after `future_incompatible` to take precedence
#![warn(missing_docs)]
// Docs.rs
#![doc(html_root_url = "https://docs.rs/cstree/0.12.0")]
#![cfg_attr(doc_cfg, feature(doc_cfg))]
pub mod getting_started;
#[allow(unsafe_code)]
pub mod green;
#[allow(unsafe_code)]
pub mod syntax;
#[allow(unsafe_code)]
pub mod interning;
#[cfg(feature = "serialize")]
mod serde_impls;
#[allow(missing_docs)]
mod utility_types;
use std::fmt;
/// `RawSyntaxKind` is a type tag for each token or node.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct RawSyntaxKind(pub u32);
/// Typesafe representations of text ranges and sizes.
pub mod text {
pub use crate::syntax::SyntaxText;
pub use text_size::{TextLen, TextRange, TextSize};
}
/// A tree builder for the construction of syntax trees.
///
/// Please refer to the documentation on [`GreenNodeBuilder`](build::GreenNodeBuilder) itself and the ["getting started"
/// section](../index.html#getting-started) from the top-level documentation for an introduction to how to build a
/// syntax tree.
pub mod build {
pub use crate::green::builder::{Checkpoint, GreenNodeBuilder, NodeCache};
}
/// A convenient collection of the most used parts of `cstree`.
pub mod prelude {
pub use crate::{
build::GreenNodeBuilder,
green::{GreenNode, GreenToken},
syntax::{SyntaxElement, SyntaxNode, SyntaxToken},
RawSyntaxKind, Syntax,
};
}
/// Types for syntax tree traversal / moving through trees.
pub mod traversal {
pub use crate::utility_types::{Direction, WalkEvent};
}
/// Utility types. It shouldn't be needed to reference these directly, but they are returned in several places in
/// `cstree` and may come in handy.
pub mod util {
pub use crate::utility_types::{NodeOrToken, TokenAtOffset};
}
/// Synchronization primitives.
pub mod sync {
/// An atomically reference counted shared pointer.
///
/// This is like [`Arc`](std::sync::Arc) in the standard library, but more efficient for how `cstree` stores
/// syntax trees internally. This Arc does not support weak reference counting.
pub use triomphe::Arc;
}
/// A type that represents what items in your language can be.
/// Typically, this is an `enum` with variants such as `Identifier`, `Literal`, ...
///
/// The `Syntax` trait is the bridge between the internal `cstree` representation and your
/// language's types.
/// This is essential for providing a [`SyntaxNode`] API that can be used with your types, as in the
/// `s_expressions` example:
///
/// ```
/// #[derive(Debug, Clone, Copy, PartialEq, Eq, cstree::Syntax)]
/// # #[allow(non_camel_case_types)]
/// #[repr(u32)]
/// enum SyntaxKind {
/// #[static_text("+")]
/// Plus, // `+`
/// #[static_text("-")]
/// Minus, // `-`
/// Integer, // like `15`
/// Expression, // combined expression, like `5 + 4 - 3`
/// Whitespace, // whitespace is explicit
/// }
/// ```
///
/// `cstree` provides a procedural macro called `cstree_derive` to automatically generate `Syntax` implementations for
/// syntax kind enums if its `derive` feature is enabled.
///
/// [`SyntaxNode`]: crate::syntax::SyntaxNode
pub trait Syntax: Sized + Copy + fmt::Debug + Eq {
/// Construct a semantic item kind from the compact representation.
fn from_raw(raw: RawSyntaxKind) -> Self;
/// Convert a semantic item kind into a more compact representation.
fn into_raw(self) -> RawSyntaxKind;
/// Fixed text for a particular syntax kind.
/// Implement for kinds that will only ever represent the same text, such as punctuation (like a
/// semicolon), keywords (like `fn`), or operators (like `<=`).
///
/// Indicating tokens that have a `static_text` this way allows `cstree` to store them more efficiently, which makes
/// it faster to add them to a syntax tree and to look up their text. Since there can often be many occurrences
/// of these tokens inside a file, doing so will improve the performance of using `cstree`.
fn static_text(self) -> Option<&'static str>;
}
#[cfg(feature = "derive")]
#[allow(unused_imports)]
#[macro_use]
extern crate cstree_derive;
#[cfg(feature = "derive")]
/// Derive macro available if `cstree` is built with `features = ["derive"]`.
pub use cstree_derive::Syntax;
#[doc(hidden)]
#[allow(unsafe_code, unused)]
pub mod testing {
pub use crate::prelude::*;
pub fn parse<S: Syntax, I>(_b: &mut GreenNodeBuilder<S, I>, _s: &str) {}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
#[allow(non_camel_case_types)]
pub enum TestSyntaxKind {
Plus,
Identifier,
Int,
Float,
Operation,
Root,
Whitespace,
__LAST,
}
pub type MySyntax = TestSyntaxKind;
pub use TestSyntaxKind::*;
impl Syntax for TestSyntaxKind {
fn from_raw(raw: RawSyntaxKind) -> Self {
assert!(raw.0 <= TestSyntaxKind::__LAST as u32);
unsafe { std::mem::transmute::<u32, Self>(raw.0) }
}
fn into_raw(self) -> RawSyntaxKind {
RawSyntaxKind(self as u32)
}
fn static_text(self) -> Option<&'static str> {
match self {
TestSyntaxKind::Plus => Some("+"),
_ => None,
}
}
}
}