lex_core/lex.rs
//! Main module for lex library functionality
//!
//! This module orchestrates the complete lex parsing pipeline. Lex is a simple format,
//! and yet quite hard to parse. Tactically it is stateful, recursive, line-based and
//! indentation-significant. The combination of these makes it a parsing nightmare.
//!
//! While these are all true, the format is designed with enough constraints that,
//! if correctly implemented, it is quite easy to parse. However, it does mean that
//! using available libraries simply won't work. Libraries can handle context-free,
//! token-based, non-indentation-significant grammars. At best, they are flexible enough
//! to handle one of the traits listed above, but never all of them.
//!
//! The Parser Design
//!
//! After significant research and experimentation we settled on a design that is a bit
//! off the beaten path, but nicely breaks down complexity into very simple chunks.
//!
//! Instead of a straight lexing -> parsing pipeline, lex-parser does the following steps:
//!
//! 1. Semantic Indentation: we convert raw indentation tokens into semantic Indent
//! and Dedent events. This is a state machine that tracks changes in indentation
//! level and emits an event for each change. See
//! [semantic_indentation](lexing::transformations::semantic_indentation).
//!
//! 2. Line Grouping: we split the token stream at line breaks into groups of tokens.
//! Each group becomes a Line token whose category is determined by the tokens it
//! contains. See [line_grouping](lexing::line_grouping).
//!
//! 3. Tree Building (LineContainer): we group Line tokens into a hierarchical tree
//! that reflects the nesting structure, based on the Indent/Dedent markers. See
//! [to_line_container](token::to_line_container).
//!
//! 4. Context Injection: we inject context information into each group so that parsing
//! only has to read each level's lines. For example, sessions require a preceding blank
//! line, but for a session that is the first element in its parent, that preceding
//! blank line belongs to the parent. A synthetic token is injected to capture this
//! context.
//!
//! 5. Parsing by Level: parsing only needs to read each level's lines, which can
//! include a LineContainer (that is, there is child content there), with no tree
//! traversal needed. Parsing is done declaratively by processing the grammar patterns
//! (regular strings) through Rust's regex engine. See the [parsing](parsing) module.
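//!
//! As an illustration of step 1, here is a minimal sketch of turning indentation into
//! Indent and Dedent events. It is not the code in `lexing::transformations`: the
//! `Event` type, the `indent_events` function, the fixed 4-space indent width and the
//! skipping of blank lines are all assumptions made to keep the example short.
//!
//! ```rust
//! /// Hypothetical event type, for illustration only.
//! #[derive(Debug, PartialEq)]
//! pub enum Event {
//!     Indent,
//!     Dedent,
//!     Line(String),
//! }
//!
//! /// Emits one Indent or Dedent per level change, then the line itself.
//! /// Assumes 4 spaces per level and ignores blank lines (both are assumptions).
//! pub fn indent_events(source: &str) -> Vec<Event> {
//!     const INDENT_WIDTH: usize = 4;
//!     let mut events = Vec::new();
//!     // The only state the machine carries: the current nesting level.
//!     let mut level = 0usize;
//!     for line in source.lines() {
//!         if line.trim().is_empty() {
//!             continue;
//!         }
//!         let spaces = line.len() - line.trim_start_matches(' ').len();
//!         let new_level = spaces / INDENT_WIDTH;
//!         // At most one of these two ranges is non-empty.
//!         for _ in level..new_level {
//!             events.push(Event::Indent);
//!         }
//!         for _ in new_level..level {
//!             events.push(Event::Dedent);
//!         }
//!         level = new_level;
//!         events.push(Event::Line(line.trim().to_string()));
//!     }
//!     // Close every level that is still open at the end of the input.
//!     for _ in 0..level {
//!         events.push(Event::Dedent);
//!     }
//!     events
//! }
//! ```
//!
//! Under these assumptions, the input `"a\n    b\nc"` yields the events
//! `Line("a")`, `Indent`, `Line("b")`, `Dedent`, `Line("c")`.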
//!
//! On their own, each step is fairly simple; together they total some 500 lines of code.
//! Additionally, they are easy to test and verify.
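//!
//! To give a sense of how small a step can be, tree building (step 3) can be sketched
//! with a stack of open levels. `Flat`, `Node` and `build_tree` are hypothetical names,
//! not the real `token::to_line_container` API, and tolerating unbalanced markers is an
//! assumption of this sketch.
//!
//! ```rust
//! /// Hypothetical flat input: lines interleaved with Indent/Dedent markers.
//! #[derive(Debug, PartialEq)]
//! pub enum Flat {
//!     Indent,
//!     Dedent,
//!     Line(&'static str),
//! }
//!
//! /// Hypothetical tree output: a line, or a container of nested nodes.
//! #[derive(Debug, PartialEq)]
//! pub enum Node {
//!     Line(&'static str),
//!     Container(Vec<Node>),
//! }
//!
//! /// Pops the innermost open level and attaches it to its parent as a container.
//! fn close_level(stack: &mut Vec<Vec<Node>>) {
//!     if stack.len() > 1 {
//!         let children = stack.pop().unwrap();
//!         stack.last_mut().unwrap().push(Node::Container(children));
//!     }
//! }
//!
//! pub fn build_tree(flat: &[Flat]) -> Vec<Node> {
//!     // Each stack entry holds the nodes of one open level; the bottom is the root.
//!     let mut stack: Vec<Vec<Node>> = vec![Vec::new()];
//!     for item in flat {
//!         match item {
//!             Flat::Line(text) => stack.last_mut().unwrap().push(Node::Line(*text)),
//!             Flat::Indent => stack.push(Vec::new()),
//!             // A stray Dedent at the root is ignored.
//!             Flat::Dedent => close_level(&mut stack),
//!         }
//!     }
//!     // Close any levels left open by a missing Dedent.
//!     while stack.len() > 1 {
//!         close_level(&mut stack);
//!     }
//!     stack.pop().unwrap()
//! }
//! ```
//!
//! Because each step is a plain function from one token shape to another, a test can
//! feed it a short literal input and compare the result directly.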
//!
//! The key here is that parsing only needs to read each level's lines, which can include
//! a LineContainer (that is, there is child content there), with no tree traversal needed.
//! Parsing is done declaratively by processing the grammar patterns (regular strings)
//! through Rust's regex engine. Put another way, once tokens are grouped into a tree of
//! lines, each level can be parsed in a single pass with regular patterns.
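//!
//! A sketch of what parsing a single level could look like. The crate matches grammar
//! patterns with a regex engine; to stay dependency-free, this sketch swaps in Rust
//! slice patterns instead. `Kind`, `Element`, `parse_level` and the session rule used
//! here (blank line, subject line, container) are assumptions for illustration, not the
//! actual grammar.
//!
//! ```rust
//! /// Hypothetical line categories for one level of the tree.
//! #[derive(Debug, PartialEq)]
//! pub enum Kind {
//!     Blank,
//!     Subject,
//!     Paragraph,
//!     Container,
//! }
//!
//! /// Hypothetical parse results.
//! #[derive(Debug, PartialEq)]
//! pub enum Element {
//!     Session,
//!     BlankLine,
//!     Other,
//! }
//!
//! /// Scans one level left to right. A `Container` is matched as a single opaque
//! /// item, so its children are never visited here.
//! pub fn parse_level(mut kinds: &[Kind]) -> Vec<Element> {
//!     let mut out = Vec::new();
//!     while !kinds.is_empty() {
//!         let (element, consumed) = match kinds {
//!             // Assumed rule: blank line, then subject, then nested content.
//!             [Kind::Blank, Kind::Subject, Kind::Container, ..] => (Element::Session, 3),
//!             [Kind::Blank, ..] => (Element::BlankLine, 1),
//!             // Fallback: consume one line so the scan always advances.
//!             _ => (Element::Other, 1),
//!         };
//!         out.push(element);
//!         kinds = &kinds[consumed..];
//!     }
//!     out
//! }
//! ```
//!
//! Each child level would be parsed by a separate call on that container's own lines,
//! so every call stays a flat scan over a sequence of line kinds.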
//!
//! Whether steps 2-4 are lexing or actual parsing is left as a bikeshedding exercise.
//! The criterion for calling them lexing has been that each transformation is simply a
//! grouping of tokens, with no semantics involved.
//!
//! Pipeline Separation
//!
//! In addition to the transformations over tokens, the codebase separates the semantic
//! analysis (in [parsing](parsing)) from the AST building (in [building](building)) and
//! from the final document assembly step (in [assembling](assembling)). These are
//! separated with the same intention: keeping complexity localized and shallow at every
//! layer and making the system more testable. Line grouping and tree building happen at
//! the parsing stage, after lexing has already produced indent/dedent-aware flat tokens.
//!
//! For the complete end-to-end pipeline documentation, see the [parsing](parsing) module.

pub mod annotation;
pub mod assembling;
pub mod ast;
pub mod building;
pub mod formats;
pub mod inlines;
pub mod lexing;
pub mod loader;
pub mod parsing;
pub mod testing;
pub mod token;
pub mod transforms;