1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
//! `cstree` is a generic library for creating and working with concrete syntax trees (CSTs).
//!
//! "Traditional" abstract syntax trees (ASTs) usually contain different types of nodes which
//! represent different syntactical elements of the source text of a document and reduce its
//! information to the minimal amount necessary to correctly interpret it. In contrast, CSTs are
//! lossless representations of the entire input where all tree nodes are represented homogeneously
//! (i.e., the nodes are _untyped_), but are tagged with a [`RawSyntaxKind`]  to determine the kind
//! of grammatical element they represent.
//!
//! One big advantage of this representation is that it cannot only recreate the original source
//! exactly, but also lends itself very well to the representation of _incomplete or erroneous_
//! trees and is thus highly suited for usage in contexts such as IDEs or any other application
//! where a user is _editing_ the source text.
//!
//! The concept of and the data structures for `cstree`'s syntax trees are inspired in part by
//! Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
//! Trees consist of two layers: the inner tree (called _green_ tree) contains the actual source
//! text as position independent green nodes. Tokens and nodes that appear identically at multiple
//! places in the source are deduplicated in this representation in order to store the tree
//! efficiently. This means that a green tree may not actually structurally be a tree. To remedy
//! this, the real syntax tree is constructed on top of the green tree as a secondary tree (called
//! the _red_ tree), which models the exact source structure.
//! As a possible third layer, a strongly typed AST [can be built] on top of the red tree.
//!
//! [can be built]: #ast-layer
//!
//! The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/),
//! developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/) who
//! wrote up a conceptual overview of their implementation in
//! [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
//! Notable differences of `cstree` compared to `rowan`:
//!
//! - Syntax trees (red trees) are created lazily, but are persistent. Once a red node has been
//!   created by `cstree`, it will remain allocated. In contrast, `rowan` re-creates the red layer on
//!   the fly on each traversal of the tree. Apart from the trade-off discussed
//!   [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#memoized-rednodes),
//!   this helps to achieve good tree traversal speed while helping to provide the following:
//! - Syntax (red) nodes are `Send` and `Sync`, allowing to share realized trees across threads. This is achieved by
//!   atomically reference counting syntax trees as a whole, which also gets rid of the need to reference count
//!   individual nodes.
//! - [`SyntaxNode`](syntax::SyntaxNode)s can hold custom data.
//! - `cstree` trees are trees over interned strings. This means `cstree` will deduplicate the text of tokens with the
//!   same source string, such as identifiers with the same name. In this position, `rowan` stores each token's text
//!   together with its metadata as a custom DST (dynamically-sized type).
//! - `cstree` includes some performance optimizations for tree creation: it only allocates space for new nodes on the
//!   heap if they are not in cache and avoids recursively hashing subtrees by pre-hashing them.
//! - `cstree` includes some performance optimizations for tree traversal: persisting red nodes allows tree traversal
//!   methods to return references instead of cloning nodes, which involves updating the tree's reference count. You can
//!   still `clone` the reference to obtain an owned node, but you only pay that cost when you need to.
//! - The downside of offering thread safe syntax trees is that `cstree` cannot offer any mutability API for its CSTs.
//!   Trees can still be updated into new trees through [replacing] nodes, but cannot be mutated in place.
//!
//! [replacing]: syntax::SyntaxNode::replace_with
//!
//! ## Getting Started
//! If you're looking at `cstree`, you're probably looking at or already writing a parser and are considering using
//! concrete syntax trees as its output. We'll talk more about parsing below -- first, let's have a look at what needs
//! to happen to go from input text to a `cstree` syntax tree:
//!
//!  1. Define an enumeration of the types of tokens (like keywords) and nodes (like "an expression") that you want to
//! have in your syntax and implement [`Syntax`]
//!
//!  2. Create a [`GreenNodeBuilder`](build::GreenNodeBuilder) and call
//! [`start_node`](build::GreenNodeBuilder::start_node), [`token`](build::GreenNodeBuilder::token) and
//! [`finish_node`](build::GreenNodeBuilder::finish_node) from your parser  
//!
//!  3. Call [`SyntaxNode::new_root`](syntax::SyntaxNode::new_root) or
//! [`SyntaxNode::new_root_with_resolver`](syntax::SyntaxNode::new_root_with_resolver) with the resulting
//! [`GreenNode`](green::GreenNode) to obtain a syntax tree that you can traverse
//!
//! There's a full [getting started guide] that walks through each of the above steps in detail in the documentation for
//! the `getting_started` module. The walkthrough goes through the necessary steps bit by bit and skips the lexer, but
//! the full code plus a runnable interpreter for the small walkthrough language are available in the `readme` example.
//! [Additional examples] can be found in the `examples/` folder in the repository.
//! A good starting point is the `s_expressions` example, which implements a parser for a small S-Expression language
//! with guiding comments.
//!
//! [getting started guide]: getting_started/index.html
//! [Additional examples]: https://github.com/domenicquirl/cstree/tree/master/cstree/examples
//!
//! ## License
//!
//! `cstree` is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
//!
//! See `LICENSE-APACHE` and `LICENSE-MIT` for details.

#![forbid(missing_debug_implementations, unconditional_recursion)]
#![deny(unsafe_code, future_incompatible)]
#![allow(unstable_name_collisions)] // strict provenance - must come after `future_incompatible` to take precedence
#![warn(missing_docs)]
// Docs.rs
#![doc(html_root_url = "https://docs.rs/cstree/0.12.0")]
#![cfg_attr(doc_cfg, feature(doc_cfg))]

pub mod getting_started;

#[allow(unsafe_code)]
pub mod green;
#[allow(unsafe_code)]
pub mod syntax;

#[allow(unsafe_code)]
pub mod interning;

#[cfg(feature = "serialize")]
mod serde_impls;
#[allow(missing_docs)]
mod utility_types;

use std::fmt;

/// `RawSyntaxKind` is a type tag for each token or node.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct RawSyntaxKind(pub u32);

/// Typesafe representations of text ranges and sizes.
pub mod text {
    pub use crate::syntax::SyntaxText;
    pub use text_size::{TextLen, TextRange, TextSize};
}

/// A tree builder for the construction of syntax trees.
///
/// Please refer to the documentation on [`GreenNodeBuilder`](build::GreenNodeBuilder) itself and the ["getting started"
/// section](../index.html#getting-started) from the top-level documentation for an introduction to how to build a
/// syntax tree.
pub mod build {
    pub use crate::green::builder::{Checkpoint, GreenNodeBuilder, NodeCache};
}

/// A convenient collection of the most used parts of `cstree`.
pub mod prelude {
    pub use crate::{
        build::GreenNodeBuilder,
        green::{GreenNode, GreenToken},
        syntax::{SyntaxElement, SyntaxNode, SyntaxToken},
        RawSyntaxKind, Syntax,
    };
}

/// Types for syntax tree traversal / moving through trees.
pub mod traversal {
    pub use crate::utility_types::{Direction, WalkEvent};
}

/// Utility types. It shouldn't be needed to reference these directly, but they are returned in several places in
/// `cstree` and may come in handy.
pub mod util {
    pub use crate::utility_types::{NodeOrToken, TokenAtOffset};
}

/// Synchronization primitives.
pub mod sync {
    /// An atomically reference counted shared pointer.
    ///
    /// This is like [`Arc`](std::sync::Arc) in the standard library, but more efficient for how `cstree` stores
    /// syntax trees internally. This Arc does not support weak reference counting.
    pub use triomphe::Arc;
}

/// A type that represents what items in your language can be.
/// Typically, this is an `enum` with variants such as `Identifier`, `Literal`, ...
///
/// The `Syntax` trait is the bridge between the internal `cstree` representation and your
/// language's types.
/// This is essential for providing a [`SyntaxNode`] API that can be used with your types, as in the
/// `s_expressions` example:
///
/// ```
/// #[derive(Debug, Clone, Copy, PartialEq, Eq, cstree::Syntax)]
/// # #[allow(non_camel_case_types)]
/// #[repr(u32)]
/// enum SyntaxKind {
///     #[static_text("+")]
///     Plus, // `+`
///     #[static_text("-")]
///     Minus, // `-`
///     Integer,    // like `15`
///     Expression, // combined expression, like `5 + 4 - 3`
///     Whitespace, // whitespace is explicit
/// }
/// ```
///
/// `cstree` provides a procedural macro called `cstree_derive` to automatically generate `Syntax` implementations for
/// syntax kind enums if its `derive` feature is enabled.
///
/// [`SyntaxNode`]: crate::syntax::SyntaxNode
pub trait Syntax: Sized + Copy + fmt::Debug + Eq {
    /// Construct a semantic item kind from the compact representation.
    fn from_raw(raw: RawSyntaxKind) -> Self;

    /// Convert a semantic item kind into a more compact representation.
    fn into_raw(self) -> RawSyntaxKind;

    /// Fixed text for a particular syntax kind.
    /// Implement for kinds that will only ever represent the same text, such as punctuation (like a
    /// semicolon), keywords (like `fn`), or operators (like `<=`).
    ///
    /// Indicating tokens that have a `static_text` this way allows `cstree` to store them more efficiently, which makes
    /// it faster to add them to a syntax tree and to look up their text. Since there can often be many occurrences
    /// of these tokens inside a file, doing so will improve the performance of using `cstree`.
    fn static_text(self) -> Option<&'static str>;
}

#[cfg(feature = "derive")]
#[allow(unused_imports)]
#[macro_use]
extern crate cstree_derive;

#[cfg(feature = "derive")]
/// Derive macro available if `cstree` is built with `features = ["derive"]`.
pub use cstree_derive::Syntax;

#[doc(hidden)]
#[allow(unsafe_code, unused)]
pub mod testing {
    pub use crate::prelude::*;
    pub fn parse<S: Syntax, I>(_b: &mut GreenNodeBuilder<S, I>, _s: &str) {}

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    #[repr(u32)]
    #[allow(non_camel_case_types)]
    pub enum TestSyntaxKind {
        Plus,
        Identifier,
        Int,
        Float,
        Operation,
        Root,
        Whitespace,
        __LAST,
    }
    pub type MySyntax = TestSyntaxKind;
    pub use TestSyntaxKind::*;

    impl Syntax for TestSyntaxKind {
        fn from_raw(raw: RawSyntaxKind) -> Self {
            assert!(raw.0 <= TestSyntaxKind::__LAST as u32);
            unsafe { std::mem::transmute::<u32, Self>(raw.0) }
        }

        fn into_raw(self) -> RawSyntaxKind {
            RawSyntaxKind(self as u32)
        }

        fn static_text(self) -> Option<&'static str> {
            match self {
                TestSyntaxKind::Plus => Some("+"),
                _ => None,
            }
        }
    }
}