lex_core/lex/token.rs
//! Core token types and helpers shared across the lexer, parser, and tooling.
//!
//! This module provides the token types used throughout the lexing and parsing pipeline.
//! Lex opts to handle more complexity in the lexing stage in order to keep the parsing
//! stage simple. This results in greater token complexity, which is why there are
//! several token types.
//!
//! Token Layers
//!
//! Even though the grammar operates mostly over lines, we have multiple layers of tokens:
//!
//! Structural Tokens:
//! Indent, Dedent. These are semantic tokens that represent indentation level changes,
//! similar to open/close braces in C-style languages. They are produced by the
//! semantic indentation transformation from raw Indentation tokens. See
//! [semantic_indentation](crate::lex::lexing::transformations::semantic_indentation).
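//!
//! For illustration (token names simplified; not the crate's exact output), an
//! indented block such as:
//!
//! ```text
//! parent
//!     child a
//!     child b
//! sibling
//! ```
//!
//! lexes to a stream along the lines of
//! `parent, Indent, child a, child b, Dedent, sibling`, mirroring the role of
//! `{`/`}` in C-style syntax.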
//!
//! Core Tokens:
//! Character- and word-level tokens, produced by the logos lexer. See the [core](core) module
//! for the complete list of core tokens. Grammar: [specs/v1/grammar-core.lex].
//!
//! Line Tokens:
//! A group of core tokens forming a single line, used in the actual parsing. See the
//! [line](line) module. The LineType enum is the definitive set of line classifications
//! (blank, annotation start/end, data, subject, list, subject-or-list-item, paragraph,
//! dialog, indent, dedent). Grammar: [specs/v1/grammar-line.lex].
//!
//! Inline Tokens:
//! Span-based tokens that operate at the character level within text content. Unlike
//! line-based tokens, inline tokens can start and end at arbitrary positions and can be
//! nested within each other. See the [inline](inline) module. Grammar: [specs/v1/grammar-inline.lex].
//!
//! Line Container Tokens:
//! A vector of line tokens or other line container tokens, forming a tree representation
//! of each level's lines. Created and used by the parser. See the [to_line_container]
//! module.
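//!
//! As a rough sketch (shape only, not the exact API), a container nests like:
//!
//! ```text
//! Container
//! ├── LineToken (subject)
//! ├── Container            <- lines one indentation level deeper
//! │   ├── LineToken (data)
//! │   └── LineToken (data)
//! └── LineToken (paragraph)
//! ```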
//!
//! Synthetic Tokens:
//! Tokens that are not produced by the logos lexer, but are created by the lexing pipeline
//! to capture context information from parent to child elements so that parsing can be
//! done in a single, regular pass.
//!
//! Context Injection: Synthetic tokens enable single-pass parsing by injecting parent context
//! into child scopes. This avoids making the grammar context-sensitive and eliminates the need
//! for tree walking during parsing.
//!
//! Example - Session Preceding Blank Lines: Sessions require preceding blank lines, but for a
//! session that is the first element in its parent, that preceding blank line belongs to the
//! parent session's scope. A synthetic BlankLine token is injected at the start of the child
//! scope to represent this parent context, allowing the parser to check for the required
//! preceding blank line without looking upward in the tree.
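//!
//! Sketched as a token stream (names illustrative, not the crate's exact
//! stream), the child scope the parser sees begins with the injected token:
//!
//! ```text
//! BlankLine   <- synthetic, injected by the lexing pipeline; carries no byte range
//! Subject     <- first real line of the child session
//! ...
//! ```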
//!
//! Properties: Synthetic tokens are not consumed during parsing and do not become AST nodes.
//! They exist solely to inform parsing decisions. Since they have no source text, they carry
//! no byte range information.

pub mod core;
pub mod formatting;
pub mod inline;
pub mod line;
pub mod normalization;
pub mod testing;
pub mod to_line_container;

pub use core::Token;
pub use formatting::{detokenize, ToLexString};
pub use inline::InlineKind;
pub use line::{LineContainer, LineToken, LineType};
pub use normalization::utilities;