// lex_core/lex/parsing.rs

//! Parsing module for the lex format
//!
//!     This module provides the complete processing pipeline from source text to AST:
//!         1. Lexing: Tokenization of source text. See [lexing](crate::lex::lexing) module.
//!         2. Analysis: Syntactic analysis to produce IR nodes. See [engine](engine) module.
//!         3. Building: Construction of AST from IR nodes. See [building](crate::lex::building) module.
//!         4. Inline Parsing: Parse inline elements in text content. See [inlines](crate::lex::inlines) module.
//!         5. Assembling: Post-parsing transformations. See [assembling](crate::lex::assembling) module.
//!
//! Parsing End To End
//!
//!     The complete pipeline transforms a string of Lex source into the final AST through
//!     these stages:
//!
//!         Lexing (5.1):
//!             Tokenization and transformations that group tokens into lines. At the end of
//!             lexing, we have a TokenStream of Line tokens plus indent/dedent tokens.
//!
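//!             The shape of the lexer's output can be sketched with a toy,
//!             self-contained lexer (the token names and the 4-spaces-per-level
//!             rule here are illustrative, not the real lex_core types):
//!
//!             ```rust
//!             #[derive(Debug, PartialEq)]
//!             enum Tok { Line(String), Indent, Dedent }
//!
//!             /// Emit one Line token per source line, bracketed by Indent/Dedent
//!             /// tokens whenever the leading-space depth changes.
//!             fn lex(source: &str) -> Vec<Tok> {
//!                 let mut toks = Vec::new();
//!                 let mut depth = 0;
//!                 for line in source.lines().filter(|l| !l.trim().is_empty()) {
//!                     let d = (line.len() - line.trim_start().len()) / 4;
//!                     while depth < d { toks.push(Tok::Indent); depth += 1; }
//!                     while depth > d { toks.push(Tok::Dedent); depth -= 1; }
//!                     toks.push(Tok::Line(line.trim().to_string()));
//!                 }
//!                 while depth > 0 { toks.push(Tok::Dedent); depth -= 1; }
//!                 toks
//!             }
//!
//!             fn main() {
//!                 let toks = lex("Title\n    body\n");
//!                 assert_eq!(toks, vec![
//!                     Tok::Line("Title".into()),
//!                     Tok::Indent,
//!                     Tok::Line("body".into()),
//!                     Tok::Dedent,
//!                 ]);
//!             }
//!             ```
//!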
//!         Parsing - Semantic Analysis (5.2):
//!             At the very beginning of parsing we group line tokens into a tree of
//!             LineContainers. This gives us the ability to parse each level in isolation.
//!             Because we don't need to know what a LineContainer contains, only that it is a
//!             line container, we can parse each level with an ordinary regex: we simply
//!             render token names to a string and match the grammar patterns against it.
//!
//!             When tokens are matched, we create intermediate representation (IR) nodes,
//!             which carry only two bits of information: which node matched and which tokens
//!             it uses.
//!
//!             This separates the semantic analysis from the AST building. That is a good
//!             thing overall, but it was instrumental during development, when we ran multiple
//!             parsers in parallel and the AST building had to be unified (correct parsing
//!             would result in the same node types + tokens).
//!
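//!             The IR idea can be sketched self-contained (toy token and node
//!             types, with plain prefix matching standing in for the real regex):
//!
//!             ```rust
//!             #[derive(Debug)]
//!             enum Tok { Line, Indent, Dedent }
//!
//!             /// An IR node carries exactly two pieces of information:
//!             /// which rule matched, and the token range it consumed.
//!             #[derive(Debug, PartialEq)]
//!             struct IrNode { rule: &'static str, tokens: std::ops::Range<usize> }
//!
//!             /// Render token names into one string so a textual pattern
//!             /// can be matched against it.
//!             fn names(toks: &[Tok]) -> String {
//!                 toks.iter().map(|t| format!("{t:?}")).collect::<Vec<_>>().join(" ")
//!             }
//!
//!             fn main() {
//!                 let toks = [Tok::Line, Tok::Indent, Tok::Line, Tok::Dedent];
//!                 let s = names(&toks);
//!                 assert_eq!(s, "Line Indent Line Dedent");
//!
//!                 // A (hypothetical) rule: a line followed by an indented
//!                 // block matches as one node covering all four tokens.
//!                 let node = if s.starts_with("Line Indent") {
//!                     IrNode { rule: "session", tokens: 0..toks.len() }
//!                 } else {
//!                     IrNode { rule: "paragraph", tokens: 0..1 }
//!                 };
//!                 assert_eq!(node, IrNode { rule: "session", tokens: 0..4 });
//!             }
//!             ```
//!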
//!         AST Building (5.3):
//!             From the IR nodes, we build the actual AST nodes. During this step, several
//!             important things happen:
//!                 1. We unroll source tokens so that AST nodes have access to token values.
//!                 2. The location from tokens is used to calculate the location of the AST node.
//!                 3. The location is transformed from a byte range to a dual byte range +
//!                    line:column position.
//!             At this stage we create the root session node; it will be attached to the
//!             [`Document`] during assembling.
//!
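//!             Step 3 (deriving a line:column position from a byte offset) can be
//!             sketched with std only; the function name is illustrative:
//!
//!             ```rust
//!             /// 1-based line and column for a byte offset into `source`.
//!             fn line_col(source: &str, byte: usize) -> (usize, usize) {
//!                 let before = &source[..byte];
//!                 let line = before.matches('\n').count() + 1;
//!                 let col = byte - before.rfind('\n').map_or(0, |i| i + 1) + 1;
//!                 (line, col)
//!             }
//!
//!             fn main() {
//!                 let src = "Hello\nworld\n";
//!                 // Byte 6 is the 'w' of "world": line 2, column 1.
//!                 assert_eq!(line_col(src, 6), (2, 1));
//!                 assert_eq!(line_col(src, 0), (1, 1));
//!             }
//!             ```
//!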
//!         Inline Parsing (5.4):
//!             Before assembling the document (while annotations are still part of the content
//!             tree), we parse the TextContent nodes for inline elements. This parsing is much
//!             simpler, as it has formal start/end tokens and no structural elements.
//!
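//!             A sketch of why this is simpler: with explicit delimiters (a
//!             hypothetical `*emphasis*` syntax here, not necessarily the real
//!             lex inline grammar), a single left-to-right scan suffices:
//!
//!             ```rust
//!             #[derive(Debug, PartialEq)]
//!             enum Inline { Text(String), Emph(String) }
//!
//!             /// Split text into plain runs and *emphasized* spans; an
//!             /// unmatched opener is kept as plain text.
//!             fn parse_inlines(text: &str) -> Vec<Inline> {
//!                 let mut out = Vec::new();
//!                 let mut rest = text;
//!                 while let Some(open) = rest.find('*') {
//!                     match rest[open + 1..].find('*') {
//!                         Some(close) => {
//!                             if open > 0 { out.push(Inline::Text(rest[..open].to_string())); }
//!                             out.push(Inline::Emph(rest[open + 1..open + 1 + close].to_string()));
//!                             rest = &rest[open + close + 2..];
//!                         }
//!                         None => break,
//!                     }
//!                 }
//!                 if !rest.is_empty() { out.push(Inline::Text(rest.to_string())); }
//!                 out
//!             }
//!
//!             fn main() {
//!                 assert_eq!(parse_inlines("a *b* c"), vec![
//!                     Inline::Text("a ".into()),
//!                     Inline::Emph("b".into()),
//!                     Inline::Text(" c".into()),
//!                 ]);
//!             }
//!             ```
//!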
//!         Document Assembly (5.5):
//!             The assembling stage wraps the root session into a document node and performs
//!             metadata attachment. Annotations, which are metadata, are always attached to AST
//!             nodes, so they can be very targeted. Only with the full document in place can we
//!             attach annotations to their correct target nodes. This is harder than it seems:
//!             keeping with the Lex ethos of not enforcing structure, it must deal with several
//!             ambiguous cases, including some complex logic for calculating the "human
//!             understanding" distance between elements.
//!
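//!             The attachment heuristic can be caricatured as a scoring function
//!             (the weights and inputs here are invented for illustration; the
//!             real logic is considerably richer):
//!
//!             ```rust
//!             /// Score an annotation/node pair; the lowest score wins.
//!             /// Indent mismatches weigh more than line gaps, because a
//!             /// sibling at the same depth a few lines away usually reads
//!             /// as "closer" than an unrelated neighbor on the next line.
//!             fn distance(ann_line: usize, node_line: usize,
//!                         ann_indent: usize, node_indent: usize) -> usize {
//!                 ann_line.abs_diff(node_line) + 4 * ann_indent.abs_diff(node_indent)
//!             }
//!
//!             fn main() {
//!                 // Same indent two lines away beats a different indent one line away.
//!                 assert!(distance(10, 12, 0, 0) < distance(10, 11, 0, 1));
//!             }
//!             ```
//!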
//! Terminology
//!
//!     - parse: Colloquial term for the entire process (lexing + analysis + building)
//!     - analyze/analysis: The syntactic analysis phase specifically
//!     - build: The AST construction phase specifically
//!
//! Testing
//!
//!     All parser tests must follow strict guidelines. See the [testing module](crate::lex::testing)
//!     for comprehensive documentation on using verified lex sources and AST assertions.

// Parser implementations
pub mod common;
pub mod engine;
pub mod ir;
pub mod parser;

// Re-export common parser interfaces
pub use common::{ParseError, ParserInput};

// Re-export AST types and utilities from the ast module
pub use crate::lex::ast::{
    format_at_position, Annotation, AstNode, Container, ContentItem, Definition, Document, Label,
    List, ListItem, Paragraph, Parameter, Position, Range, Session, SourceLocation, TextNode,
    Verbatim,
};

pub use crate::lex::formats::{serialize_ast_tag, to_treeviz_str};

/// Type alias for processing results returned by helper APIs.
type ProcessResult = Result<Document, String>;

/// Process source text through the complete pipeline: lex, analyze, and build.
///
/// This is the primary entry point for processing lex documents. It performs:
/// 1. Lexing: Tokenizes the source text
/// 2. Analysis: Performs syntactic analysis to produce IR nodes
/// 3. Building: Constructs the root session tree from IR nodes (assembling wraps it in a
///    `Document` and attaches metadata)
///
/// # Arguments
///
/// * `source` - The source text to process
///
/// # Returns
///
/// A `Document` containing the complete AST, or an error string describing the parse failure.
///
/// # Example
///
/// ```rust,ignore
/// use lex::lex::parsing::process_full;
///
/// let source = "Hello world\n";
/// let document = process_full(source)?;
/// ```
pub fn process_full(source: &str) -> ProcessResult {
    use crate::lex::transforms::standard::STRING_TO_AST;
    STRING_TO_AST
        .run(source.to_string())
        .map_err(|e| e.to_string())
}

/// Alias for `process_full` to maintain backward compatibility.
///
/// The term "parse" colloquially refers to the entire processing pipeline
/// (lexing + analysis + building), even though technically parsing is just
/// the syntactic analysis phase.
pub fn parse_document(source: &str) -> ProcessResult {
    process_full(source)
}
127}