Expand description
§Abstract Syntax Tree Interface
Typst’s Abstract Syntax Tree (AST) is a lazy, typed view over the untyped
Concrete Syntax Tree (CST) and is rooted in the Markup node.
§The AST is a View
Most AST nodes are wrapper structs around SyntaxNode pointers. This summary
will use a running example of the Raw node type, which is declared (after
macro expansion) as: struct Raw<'a>(&'a SyntaxNode);.
SyntaxNodes are generated by the parser and constitute the Concrete Syntax
Tree (CST). The CST is concrete because it has the property that an in-order
tree traversal will recreate the text of the source file exactly.
SyntaxNodes in the CST contain their SyntaxKind, but don’t themselves
provide access to the semantic meaning of their contents. That semantic meaning
is available through the Abstract Syntax Tree by iterating over CST nodes and
inspecting their contents. The format is prepared ahead-of-time by the parser so
that this module can unpack the abstract meaning from the CST’s structure.
Raw nodes are parsed by recognizing paired backtick delimiters, which you will
find as CST nodes with the RawDelim kind. However, the AST doesn’t include
these delimiters because it abstracts over the backticks. Instead, the parent
raw node will only use its child RawDelim CST nodes to determine whether the
element is a block or inline.
§The AST is Typed
AST nodes all implement the AstNode trait, but nodes can also implement
their own unique methods. These unique methods are the “real” interface of the
AST, and provide access to the abstract, semantic, representation of each kind
of node. For example, the Raw node provides 3 methods that specify its
abstract representation: Raw::lines() returns the raw text as an iterator of
lines, Raw::lang() provides the optionally present RawLang language tag,
and Raw::block() gives a bool for whether the raw element is a block or
inline.
This semantic information is unavailable in the CST. Only by converting a CST
node to an AST struct will Rust let you call a method of that struct. This is a
safe interface because the only way to create an AST node outside this file is
to call AstNode::from_untyped. The node! macro implements from_untyped
by checking the node’s kind before constructing it, returning Some() only if
the kind matches. So we know that it will have the expected children underneath,
otherwise the parser wouldn’t have produced this node.
§The AST is rooted in the Markup node
The AST is rooted in the Markup node, which provides only one method:
Markup::exprs. This returns an iterator of the main Expr enum. Expr
is important because it contains the majority of expressions that Typst will
evaluate. Not just markup, but also math and code expressions. Not all
expression types are available from the parser at every step, but this does
decrease the amount of wrapper enums needed in the AST (and this file is long
enough already).
Expressions also branch off into the remaining tree. You can view enums in this file as edges on a graph: areas where the tree has paths from one type to another (accessed through methods), then structs are the nodes of the graph, providing methods that return enums, etc. etc.
§The AST is Lazy
Being lazy means that the untyped CST nodes are converted to typed AST nodes only as the tree is traversed. If we parse a file and a raw block is contained in a branch of an if-statement that we don’t take, then we won’t pay the cost of creating an iterator over the lines or checking whether it was a block or inline (although it will still be parsed into nodes).
This is also a factor of the current “tree-interpreter” evaluation model. A bytecode interpreter might instead eagerly convert the AST into bytecode, but it would still traverse using this lazy interface. While the tree-interpreter evaluation is straightforward and easy to add new features onto, it has to re-traverse the AST every time a function is evaluated. A bytecode interpreter using the lazy interface would only need to traverse each node once, improving throughput at the cost of initial latency and development flexibility.
Structs§
- Args
- A function call’s argument list:
(12pt, y). - Array
- An array:
(1, "hi", 12cm). - Auto
- The
autoliteral. - Binary
- A binary operation:
a + b. - Bool
- A boolean:
true,false. - Closure
- A closure:
(x, y) => z. - Code
- The body of a code block.
- Code
Block - A code block:
{ let x = 1; x + 2 }. - Conditional
- An if-else conditional:
if x { y } else { z }. - Content
Block - A content block:
[*Hi* there!]. - Contextual
- A contextual expression:
context text.lang. - Destruct
Assignment - An assignment expression
(x, y) = (1, 2). - Destructuring
- A destructuring pattern:
xor(x, _, ..y). - Dict
- A dictionary:
(thickness: 3pt, dash: "solid"). - Emph
- Emphasized content:
_Emphasized_. - Enum
Item - An item in an enumeration (numbered list):
+ ...or1. .... - Equation
- A mathematical equation:
$x$,$ x^2 $. - Escape
- An escape sequence:
\#,\u{1F5FA}. - Field
Access - A field access:
properties.age. - Float
- A floating-point number:
1.2,10e-4. - ForLoop
- A for loop:
for x in y { z }. - Func
Call - An invocation of a function or method:
f(x, y). - Func
Return - A return from a function:
return,return x + 1. - Heading
- A section heading:
= Introduction. - Ident
- An identifier:
it. - Import
Item Path - A path to a submodule’s imported name:
a.b.c. - Import
Items - Items to import from a module:
a, b, c. - Int
- An integer:
120. - Keyed
- A keyed pair:
"spacy key": true. - Label
- A label:
<intro>. - LetBinding
- A let binding:
let x = 1. - Linebreak
- A forced line break:
\. - Link
- A hyperlink:
https://typst.org. - List
Item - An item in a bullet list:
- .... - Loop
Break - A break from a loop:
break. - Loop
Continue - A continue in a loop:
continue. - Markup
- The syntactical root capable of representing a full parsed document.
- Math
- The contents of a mathematical equation:
x^2 + 1. - Math
Align Point - An alignment point in math:
&. - Math
Attach - A base with optional attachments in math:
a_1^2. - Math
Delimited - Matched delimiters in math:
[x + y]. - Math
Frac - A fraction in math:
x/2 - Math
Ident - An identifier in math:
pi. - Math
Primes - Grouped primes in math:
a'''. - Math
Root - A root in math:
√x,∛xor∜x. - Math
Shorthand - A shorthand for a unicode codepoint in math:
a <= b. - Math
Text - A lone text fragment in math:
x,25,3.1415,=,[. - Module
Import - A module import:
import "utils.typ": a, b, c. - Module
Include - A module include:
include "chapter1.typ". - Named
- A named pair:
thickness: 3pt. - None
- The
noneliteral. - Numeric
- A numeric value with a unit:
12pt,3cm,2em,90deg,50%. - Params
- A closure’s parameters:
(x, y). - Parbreak
- A paragraph break, indicated by one or multiple blank lines.
- Parenthesized
- A grouped expression:
(1 + 2). - Raw
- Raw text with optional syntax highlighting:
`...`. - RawDelim
- A raw delimiter in single or 3+ backticks:
`. - RawLang
- A language tag at the start of raw element:
typ. - Ref
- A reference:
@target,@target[..]. - Renamed
Import Item - A renamed import item:
a as d - SetRule
- A set rule:
set text(...). - Shorthand
- A shorthand for a unicode codepoint. For example,
~for a non-breaking space or-?for a soft hyphen. - Show
Rule - A show rule:
show heading: it => emph(it.body). - Smart
Quote - A smart quote:
'or". - Space
- Whitespace in markup or math. Has at most one newline in markup, as more indicate a paragraph break.
- Spread
- A spread:
..xor..x.at(0). - Str
- A quoted string:
"...". - Strong
- Strong content:
*Strong*. - Term
Item - An item in a term list:
/ Term: Details. - Text
- Plain text without markup.
- Unary
- A unary operation:
-x. - Underscore
- An underscore:
_ - While
Loop - A while loop:
while x { y }.
Enums§
- Arg
- An argument to a function call.
- Array
Item - An item in an array.
- Assoc
- The associativity of a binary operator.
- Bare
Import Error - Reasons why a bare name cannot be determined for an import source.
- BinOp
- A binary operator.
- Destructuring
Item - The kind of an element in a destructuring pattern.
- Dict
Item - An item in an dictionary expression.
- Expr
- An expression in markup, math or code.
- Import
Item - An imported item, potentially renamed to another identifier.
- Imports
- The items that ought to be imported from a file.
- LetBinding
Kind - The kind of a let binding, either a normal one or a closure.
- Math
Text Kind - The underlying text kind.
- Param
- A parameter to a closure.
- Pattern
- The kind of a pattern.
- UnOp
- A unary operator.
- Unit
- Unit of a numeric value.
Traits§
- AstNode
- A typed AST node.