//! Core token types and helpers shared across the lexer, parser, and tooling.
//!
//! This module provides the token types used throughout the lexing and parsing
//! pipeline. Lex opts to handle more complexity in the lexing stage in order to
//! keep the parsing stage very simple. That trade-off implies greater token
//! complexity, which is why there are several token types.
//!
//! # Token Layers
//!
//! Even though the grammar operates mostly over lines, there are multiple
//! layers of tokens:
//!
//! ## Structural Tokens
//!
//! Indent and Dedent. These are semantic tokens that represent indentation
//! level changes, similar to opening/closing braces in C-style languages. They
//! are produced by the semantic indentation transformation from raw
//! Indentation tokens. See
//! [semantic_indentation](crate::lex::lexing::transformations::semantic_indentation).
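//!
//! The transformation can be sketched with a stack of indentation widths
//! (a simplified, stdlib-only sketch; `Structural` and `to_structural` are
//! illustrative stand-ins, not this crate's actual API):
//!
//! ```rust
//! // Illustrative stand-ins for the real structural tokens.
//! #[derive(Debug, PartialEq)]
//! enum Structural {
//!     Indent,
//!     Dedent,
//! }
//!
//! /// Turn per-line indentation widths into Indent/Dedent tokens
//! /// using a stack, Python-style.
//! fn to_structural(widths: &[usize]) -> Vec<Structural> {
//!     let mut stack = vec![0usize];
//!     let mut out = Vec::new();
//!     for &w in widths {
//!         if w > *stack.last().unwrap() {
//!             stack.push(w);
//!             out.push(Structural::Indent);
//!         } else {
//!             while w < *stack.last().unwrap() {
//!                 stack.pop();
//!                 out.push(Structural::Dedent);
//!             }
//!         }
//!     }
//!     // Close any levels still open at end of input.
//!     while stack.len() > 1 {
//!         stack.pop();
//!         out.push(Structural::Dedent);
//!     }
//!     out
//! }
//!
//! fn main() {
//!     use Structural::*;
//!     assert_eq!(to_structural(&[0, 4, 4, 0]), vec![Indent, Dedent]);
//!     assert_eq!(to_structural(&[0, 4, 8, 0]), vec![Indent, Indent, Dedent, Dedent]);
//! }
//! ```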
//!
//! ## Core Tokens
//!
//! Character/word level tokens, produced by the logos lexer. See the
//! [core](core) module for the complete list of core tokens. Grammar:
//! `specs/v1/grammar-core.lex`.
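//!
//! The real core lexer is generated with logos; this hand-rolled, stdlib-only
//! sketch only illustrates the character/word granularity (the `Core` variants
//! are made up for the example, not this crate's actual token set):
//!
//! ```rust
//! // Illustrative stand-ins for character/word-level core tokens.
//! #[derive(Debug, PartialEq)]
//! enum Core {
//!     Word(String),
//!     Colon,
//!     Dash,
//!     Whitespace,
//! }
//!
//! /// A toy character/word-level scanner over a single line.
//! fn scan(line: &str) -> Vec<Core> {
//!     let mut out = Vec::new();
//!     let mut chars = line.chars().peekable();
//!     while let Some(&c) = chars.peek() {
//!         match c {
//!             ':' => { chars.next(); out.push(Core::Colon); }
//!             '-' => { chars.next(); out.push(Core::Dash); }
//!             c if c.is_whitespace() => {
//!                 while chars.peek().map_or(false, |c| c.is_whitespace()) { chars.next(); }
//!                 out.push(Core::Whitespace);
//!             }
//!             _ => {
//!                 let mut w = String::new();
//!                 while chars.peek().map_or(false, |c| !c.is_whitespace() && *c != ':' && *c != '-') {
//!                     w.push(chars.next().unwrap());
//!                 }
//!                 out.push(Core::Word(w));
//!             }
//!         }
//!     }
//!     out
//! }
//!
//! fn main() {
//!     assert_eq!(
//!         scan("key: value"),
//!         vec![Core::Word("key".into()), Core::Colon, Core::Whitespace, Core::Word("value".into())]
//!     );
//! }
//! ```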
//!
//! ## Line Tokens
//!
//! A group of core tokens in a single line; these are what the parser actually
//! consumes. See the [line](line) module. The LineType enum is the definitive
//! set of line classifications (blank, annotation start/end, data, subject,
//! list, subject-or-list-item, paragraph, dialog, indent, dedent). Grammar:
//! `specs/v1/grammar-line.lex`.
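//!
//! A toy version of line classification, over raw text rather than core
//! tokens, with a deliberately reduced set of kinds and invented surface
//! syntax (`LineKind`, `classify`, and the `- `/`:` rules are illustrative
//! only; the real classifier works on core tokens and uses LineType):
//!
//! ```rust
//! // Illustrative, reduced stand-in for the real LineType enum.
//! #[derive(Debug, PartialEq)]
//! enum LineKind {
//!     Blank,
//!     Subject,
//!     ListItem,
//!     Paragraph,
//! }
//!
//! /// Classify a line from its raw text (toy rules for the example).
//! fn classify(line: &str) -> LineKind {
//!     let t = line.trim_start();
//!     if t.is_empty() {
//!         LineKind::Blank
//!     } else if t.starts_with("- ") {
//!         LineKind::ListItem
//!     } else if t.ends_with(':') {
//!         LineKind::Subject
//!     } else {
//!         LineKind::Paragraph
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(classify(""), LineKind::Blank);
//!     assert_eq!(classify("- item"), LineKind::ListItem);
//!     assert_eq!(classify("Notes:"), LineKind::Subject);
//!     assert_eq!(classify("plain text"), LineKind::Paragraph);
//! }
//! ```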
//!
//! ## Inline Tokens
//!
//! Span-based tokens that operate at the character level within text content.
//! Unlike line-based tokens, inline tokens can start and end at arbitrary
//! positions and can be nested within each other. See the [inline](inline)
//! module. Grammar: `specs/v1/grammar-inline.lex`.
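//!
//! A minimal illustration of span-based, nestable tokens (the `Span` struct,
//! the kind names, and the `*`/`_` delimiters are assumptions for the example,
//! not this crate's inline grammar):
//!
//! ```rust
//! /// Toy inline token: a kind plus a half-open byte range into the source.
//! #[derive(Debug, PartialEq)]
//! struct Span {
//!     kind: &'static str,
//!     range: std::ops::Range<usize>,
//! }
//!
//! fn main() {
//!     let text = "a *b _c_* d";
//!     // Inline tokens can nest: the underscore span sits inside the star span.
//!     let spans = vec![
//!         Span { kind: "strong", range: 2..9 },
//!         Span { kind: "emphasis", range: 5..8 },
//!     ];
//!     // A nested span's byte range is contained in its parent's range.
//!     assert!(spans[1].range.start >= spans[0].range.start);
//!     assert!(spans[1].range.end <= spans[0].range.end);
//!     assert_eq!(&text[spans[1].range.clone()], "_c_");
//! }
//! ```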
//!
//! ## Line Container Tokens
//!
//! A vector of line tokens or other line container tokens: a tree
//! representation of each level's lines, created and used by the parser. See
//! the [to_line_container] module.
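//!
//! The container shape can be sketched as a fold of a flat line/Indent/Dedent
//! stream into a tree, one container per indentation level (`Item`, `Tok`, and
//! `to_tree` are illustrative stand-ins, not the real LineContainer types):
//!
//! ```rust
//! // Illustrative stand-in for a line container tree.
//! #[derive(Debug, PartialEq)]
//! enum Item {
//!     Line(&'static str),
//!     Container(Vec<Item>),
//! }
//!
//! #[derive(Clone, Copy)]
//! enum Tok {
//!     Line(&'static str),
//!     Indent,
//!     Dedent,
//! }
//!
//! /// Fold a flat token stream into a tree: Indent opens a child
//! /// container, Dedent closes the current one.
//! fn to_tree(toks: &[Tok], i: &mut usize) -> Vec<Item> {
//!     let mut items = Vec::new();
//!     while *i < toks.len() {
//!         match toks[*i] {
//!             Tok::Line(s) => { *i += 1; items.push(Item::Line(s)); }
//!             Tok::Indent => { *i += 1; let child = to_tree(toks, i); items.push(Item::Container(child)); }
//!             Tok::Dedent => { *i += 1; return items; }
//!         }
//!     }
//!     items
//! }
//!
//! fn main() {
//!     use Tok::*;
//!     let toks = [Line("a"), Indent, Line("b"), Dedent, Line("c")];
//!     let mut i = 0;
//!     let tree = to_tree(&toks, &mut i);
//!     assert_eq!(tree, vec![
//!         Item::Line("a"),
//!         Item::Container(vec![Item::Line("b")]),
//!         Item::Line("c"),
//!     ]);
//! }
//! ```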
//!
//! ## Synthetic Tokens
//!
//! Tokens that are not produced by the logos lexer but are created by the
//! lexing pipeline to capture context information from parent to child
//! elements, so that parsing can be done in a regular single pass.
//!
//! Context injection: synthetic tokens enable single-pass parsing by injecting
//! parent context into child scopes. This avoids making the grammar
//! context-sensitive and eliminates the need for tree walking during parsing.
//!
//! Example (session preceding blank lines): sessions require preceding blank
//! lines, but for a session that is the first element in its parent, that
//! preceding blank line belongs to the parent session's scope. A synthetic
//! BlankLine token is injected at the start of the child scope to represent
//! this parent context, allowing the parser to check for the required
//! preceding blank line without looking upward in the tree.
//!
//! Properties: synthetic tokens are not consumed during parsing and do not
//! become AST nodes. They exist solely to inform parsing decisions. Since they
//! have no source text, they carry no byte range information.
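//!
//! The injection idea above can be sketched as follows (`LineTok` and
//! `inject_context` are illustrative stand-ins, not this crate's actual
//! pipeline API):
//!
//! ```rust
//! // Illustrative stand-ins for the real line tokens.
//! #[derive(Debug, Clone, PartialEq)]
//! enum LineTok {
//!     Blank,
//!     SyntheticBlank, // carries no byte range: it has no source text
//!     Session(&'static str),
//! }
//!
//! /// If the parent scope ends with a blank line, inject a synthetic
//! /// blank at the start of the child scope so the child parser can
//! /// verify "session preceded by blank line" without looking upward.
//! fn inject_context(parent_tail: &[LineTok], child: &[LineTok]) -> Vec<LineTok> {
//!     let mut out = Vec::new();
//!     if matches!(parent_tail.last(), Some(LineTok::Blank)) {
//!         out.push(LineTok::SyntheticBlank);
//!     }
//!     out.extend_from_slice(child);
//!     out
//! }
//!
//! fn main() {
//!     let parent_tail = [LineTok::Session("parent"), LineTok::Blank];
//!     let child = [LineTok::Session("child")];
//!     let scope = inject_context(&parent_tail, &child);
//!     // The child scope now starts with the injected synthetic token.
//!     assert_eq!(scope[0], LineTok::SyntheticBlank);
//! }
//! ```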
pub mod core;
pub mod formatting;
pub mod inline;
pub mod line;
pub mod normalization;
pub mod testing;
pub mod to_line_container;

pub use core::Token;
pub use formatting::{detokenize, ToLexString};
pub use inline::InlineKind;
pub use line::{LineContainer, LineToken, LineType};
pub use normalization::utilities;