1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
//! `cstree` is a generic library for creating and working with concrete syntax trees.
//! "Traditional" abstract syntax trees (ASTs) usually contain different types of nodes which represent information
//! about the source text of a document and reduce this information to the minimal amount necessary to correctly
//! interpret it. In contrast, CSTs are lossless representations of the entire input where all tree nodes are
//! represented uniformly (i.e. the nodes are _untyped_), but include a [`SyntaxKind`] field to determine the kind of
//! node.
//! One of the big advantages of this representation is not only that it can recreate the original source exactly, but
//! also that it lends itself very well to the representation of _incomplete or erroneous_ trees and is thus very suited
//! for usage in contexts such as IDEs.
//!
//! The concept of and the data structures for CSTs are inspired in part by Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
//! Trees consist of two layers: the inner tree (called _green_ tree) contains the actual source text in _position
//! independent_ green nodes. Tokens and nodes that appear identically at multiple places in the source text are
//! _deduplicated_ in this representation in order to store the tree efficiently. This means that the green tree may not
//! structurally be a tree. To remedy this, the actual syntax tree is constructed on top of the green tree as a
//! secondary tree (called _red_ tree), which models the exact source structure.

//! The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/), developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/) who wrote up a conceptual overview of their implementation in [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
//! Notable differences of `cstree` compared to `rowan`:
//! - Syntax trees (red trees) are created lazily, but are persistent. Once a node has been created,
//!   it will remain allocated, while `rowan` re-creates the red layer on the fly. Apart from the
//!   trade-off discussed
//!   [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#memoized-rednodes),
//!   this helps to achieve good tree traversal speed while providing the next points:
//! - Syntax (red) nodes are `Send` and `Sync`, allowing to share realized trees across threads. This is achieved by
//!   atomically reference counting syntax trees as a whole, which also gets rid of the need to reference count
//!   individual nodes (helping with the point above).
//! - Syntax nodes can hold custom data.
//! - `cstree` trees are trees over interned strings. This means `cstree` will deduplicate the text
//!   of tokens such as identifiers with the same name. In this position, `rowan` stores each string,
//!   with a small string optimization (see [`SmolStr`](https://crates.io/crates/smol_str)).
//! - Performance optimizations for tree creation: only allocate new nodes on the heap if they are not in cache, avoid
//!   recursively hashing subtrees
//! - Performance optimizations for tree traversal: persisting red nodes allows tree traversal methods to return
//!   references. You can still `clone` to obtain an owned node, but you only pay that cost when you need to.
//!
//! ## Getting Started
//! The main entry points for constructing syntax trees are [`GreenNodeBuilder`] and [`SyntaxNode::new_root`] for green
//! and red trees respectively. See `examples/s_expressions.rs` for a guided tutorial to `cstree`.
//!
//! ## AST Layer
//! While `cstree` is built for concrete syntax trees, applications are quite easily able to work with either a CST or
//! an AST representation, or freely switch between them. To do so, use `cstree` to build syntax and underlying green
//! tree and provide AST wrappers for your different kinds of nodes. An example of how this is done can be seen [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated.rs) and [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated/nodes.rs) (note that the latter file is automatically generated by a task).

#![forbid(
    // missing_debug_implementations,
    unconditional_recursion,
    future_incompatible,
)]
#![deny(unsafe_code, missing_docs)]

#[allow(unsafe_code)]
mod green;
#[allow(unsafe_code)]
pub mod syntax;

#[cfg(feature = "serde1")]
mod serde_impls;
#[allow(missing_docs)]
mod utility_types;

/// Types and Traits for efficient String storage and deduplication.
pub mod interning {
    pub use crate::green::TokenInterner;
    pub use lasso::{Interner, IntoReader, IntoReaderAndResolver, IntoResolver, Reader, Resolver};
}
use std::fmt;

// Reexport types for working with strings.
pub use text_size::{TextLen, TextRange, TextSize};

pub use crate::{
    green::{Checkpoint, Children, GreenNode, GreenNodeBuilder, GreenToken, NodeCache, SyntaxKind},
    syntax::*,
    utility_types::{Direction, NodeOrToken, TokenAtOffset, WalkEvent},
};
pub use triomphe::Arc;

/// The `Language` trait is the bridge between the internal `cstree` representation and your language
/// types.
/// This is essential to providing a [`SyntaxNode`] API that can be used with your types, as in the
/// `s_expressions` example:
/// ```
/// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
/// # #[allow(non_camel_case_types)]
/// #[repr(u16)]
/// enum SyntaxKind {
///     ROOT,       // top-level node
///     ATOM,       // `+`, `15`
///     WHITESPACE, // whitespaces is explicit
///     #[doc(hidden)]
///     __LAST,
/// }
/// use SyntaxKind::*;
///
/// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
/// enum Lang {}
///
/// impl cstree::Language for Lang {
///     type Kind = SyntaxKind;
///
///     fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
///         assert!(raw.0 <= __LAST as u16);
///         unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
///     }
///
///     fn kind_to_raw(kind: Self::Kind) -> cstree::SyntaxKind {
///         cstree::SyntaxKind(kind as u16)
///     }
/// }
/// ```
pub trait Language: Sized + Clone + Copy + fmt::Debug + Eq + Ord + std::hash::Hash {
    /// A type that represents what items in your Language can be.
    /// Typically, this is an `enum` with variants such as `Identifier`, `Literal`, ...
    type Kind: fmt::Debug;

    /// Construct a semantic item kind from the compact representation.
    fn kind_from_raw(raw: SyntaxKind) -> Self::Kind;

    /// Convert a semantic item kind into a more compact representation.
    fn kind_to_raw(kind: Self::Kind) -> SyntaxKind;
}