// lex_core/lex/parsing.rs
//! Parsing module for the Lex format
//!
//! This module provides the complete processing pipeline from source text to AST:
//! 1. Lexing: tokenization of source text. See the [lexing](crate::lex::lexing) module.
//! 2. Analysis: syntactic analysis to produce IR nodes. See the [engine](engine) module.
//! 3. Building: construction of the AST from IR nodes. See the [building](crate::lex::building) module.
//! 4. Inline parsing: parsing of inline elements in text content. See the [inlines](crate::lex::inlines) module.
//! 5. Assembling: post-parsing transformations. See the [assembling](crate::lex::assembling) module.
//!
//! # Parsing End to End
//!
//! The complete pipeline transforms a string of Lex source into the final AST through
//! these stages:
//!
//! ## Lexing (5.1)
//!
//! Tokenization, plus transformations that group tokens into lines. At the end of
//! lexing, we have a `TokenStream` of `Line` tokens plus indent/dedent tokens.
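//!
//! As a sketch (with hypothetical type names; the real token types live in the
//! lexing module), the stream handed to the parser looks roughly like this:
//!
//! ```rust,ignore
//! // One token per physical line, plus synthetic indentation tokens.
//! enum Token {
//!     Line(String), // a full line of source text
//!     Indent,       // indentation level increased
//!     Dedent,       // indentation level decreased
//! }
//!
//! // "a\n  b\n" lexes to roughly: [Line("a"), Indent, Line("b"), Dedent]
//! ```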
//!
//! ## Parsing: Semantic Analysis (5.2)
//!
//! At the very beginning of parsing, we group line tokens into a tree of
//! `LineContainer`s. This gives us the ability to parse each level in isolation.
//! Because we don't need to know what a `LineContainer` contains, only that it is a
//! line container, we can parse each level with an ordinary regular expression: we
//! simply print the token names and match the grammar patterns against them.
//!
//! When tokens are matched, we create intermediate representation (IR) nodes, which
//! carry only two pieces of information: the node that was matched and which tokens
//! it uses.
//!
30//! a good thing overall, but was instrumental during development, as we ran multiple
31//! parsers in parallel and the ast building had to be unified (correct parsing would
32//! result in the same node types + tokens).
33//!
//! ## AST Building (5.3)
//!
//! From the IR nodes, we build the actual AST nodes. During this step, several
//! important things happen:
//! 1. We unroll source tokens so that AST nodes have access to token values.
//! 2. The locations of the tokens are used to calculate the location of each AST node.
//! 3. Each location is transformed from a byte range into a dual byte range plus
//!    line:column position.
//!
//! At this stage we also create the root session node; it will be attached to the
//! [`Document`] during assembling.
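//!
//! For step 3, the line:column half of the dual location can be derived from the
//! byte offset alone. A self-contained sketch (not the builder's actual code;
//! columns are counted in bytes here):
//!
//! ```rust
//! /// Convert a byte offset into a (line, column) pair, both 1-based.
//! fn line_col(source: &str, offset: usize) -> (usize, usize) {
//!     let before = &source[..offset];
//!     let line = before.matches('\n').count() + 1;
//!     // Column = distance from the start of the current line, plus one.
//!     let col = offset - before.rfind('\n').map_or(0, |i| i + 1) + 1;
//!     (line, col)
//! }
//!
//! assert_eq!(line_col("ab\ncd\n", 4), (2, 2)); // 'd' is on line 2, column 2
//! ```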
//!
//! ## Inline Parsing (5.4)
//!
//! Before assembling the document (while annotations are still part of the content
//! tree), we parse the `TextContent` nodes for inline elements. This parsing is much
//! simpler, as inline elements have formal start/end tokens and no structural nesting.
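//!
//! Because inline elements have explicit delimiters, a sketch of the inline pass
//! can be as simple as scanning for start/end tokens (illustrative only, with
//! hypothetical token names and syntax; see the inlines module for the real thing):
//!
//! ```rust,ignore
//! // A start token opens an inline span, the matching end token closes it;
//! // there is no indentation or other structure to track in between.
//! match token {
//!     InlineToken::Star => toggle_strong(&mut spans),
//!     InlineToken::Text(t) => current_span(&mut spans).push_text(t),
//!     // unmatched delimiters fall back to literal text
//! }
//! ```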
//!
//! ## Document Assembly (5.5)
//!
//! The assembling stage wraps the root session in a document node and attaches
//! metadata. Annotations, which are metadata, are always attached to AST nodes, so
//! they can be very targeted. Only with the full document in place can we attach
//! annotations to their correct target nodes. This is harder than it seems: in
//! keeping with the Lex ethos of not enforcing structure, assembly must handle
//! several ambiguous cases, including some complex logic for calculating a "human
//! understanding" distance between elements.
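//!
//! As a sketch of the attachment rule (illustrative; the actual distance metric is
//! more involved than a single score): each annotation is attached to the candidate
//! node that minimizes a proximity score, where `distance` is a hypothetical stand-in
//! for the "human understanding" metric:
//!
//! ```rust,ignore
//! fn attach_target(annotation: &Annotation, candidates: &[&AstNode]) -> usize {
//!     candidates
//!         .iter()
//!         .enumerate()
//!         .min_by_key(|(_, node)| distance(annotation, node))
//!         .map(|(index, _)| index)
//!         .expect("document has at least one candidate node")
//! }
//! ```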
//!
//! # Terminology
//!
//! - parse: colloquial term for the entire process (lexing + analysis + building)
//! - analyze/analysis: the syntactic analysis phase specifically
//! - build: the AST construction phase specifically
//!
//! # Testing
//!
//! All parser tests must follow strict guidelines. See the [testing module](crate::lex::testing)
//! for comprehensive documentation on using verified Lex sources and AST assertions.

// Parser implementations
pub mod common;
pub mod engine;
pub mod ir;
pub mod parser;

// Re-export common parser interfaces
pub use common::{ParseError, ParserInput};

// Re-export AST types and utilities from the ast module
pub use crate::lex::ast::{
    format_at_position, Annotation, AstNode, Container, ContentItem, Definition, Document, Label,
    List, ListItem, Paragraph, Parameter, Position, Range, Session, SourceLocation, TextNode,
    Verbatim,
};

pub use crate::lex::formats::{serialize_ast_tag, to_treeviz_str};

/// Type alias for processing results returned by helper APIs.
type ProcessResult = Result<Document, String>;

/// Process source text through the complete pipeline: lex, analyze, and build.
///
/// This is the primary entry point for processing Lex documents. It performs:
/// 1. Lexing: tokenizes the source text
/// 2. Analysis: performs syntactic analysis to produce IR nodes
/// 3. Building: constructs the root session tree from the IR nodes (assembling then
///    wraps it in a `Document` and attaches metadata)
///
/// # Arguments
///
/// * `source` - The source text to process
///
/// # Returns
///
/// A `Document` containing the complete AST, or an error message if parsing fails.
///
/// # Example
///
/// ```rust,ignore
/// use lex::lex::parsing::process_full;
///
/// let source = "Hello world\n";
/// let document = process_full(source)?;
/// ```
pub fn process_full(source: &str) -> ProcessResult {
    use crate::lex::transforms::standard::STRING_TO_AST;
    STRING_TO_AST
        .run(source.to_string())
        .map_err(|e| e.to_string())
}
/// Alias for `process_full`, kept for backward compatibility.
///
/// The term "parse" colloquially refers to the entire processing pipeline
/// (lexing + analysis + building), even though, strictly speaking, parsing is just
/// the syntactic analysis phase.
pub fn parse_document(source: &str) -> ProcessResult {
    process_full(source)
127}