//! Main module for lex library functionality
//!
//!     This module orchestrates the complete lex parsing pipeline. Lex is a simple format,
//!     and yet quite hard to parse. Technically it is stateful, recursive, line-based and
//!     indentation-significant. The combination of these makes it a parsing nightmare.
//!
//!     While these are all true, the format is designed with enough constraints so that,
//!     if correctly implemented, it is quite easy to parse. However, it does mean that
//!     off-the-shelf libraries simply won't work. Libraries can handle context-free,
//!     token-based, non-indentation-significant grammars. At best, they are flexible
//!     enough to handle one of these patterns, but never all of them.
//!
//! The Parser Design
//!
//!     After significant research and experimentation we settled on a design that is a bit
//!     off the beaten path, but nicely breaks complexity down into very simple chunks.
//!
//!     Instead of a straight lexing -> parsing pipeline, lex-parser performs the following steps:
19//!
20//!         1. Semantic Indentation: we convert indent tokens into semantic events as indent
21//!            and dedent. This is a stateful machine that tracks changes in indentation
22//!            levels and emits indent and dedent events. See
23//!            [semantic_indentation](lexing::transformations::semantic_indentation).
24//!
25//!         2. Line Grouping: we group tokens into lines. Here we split tokens by line breaks
26//!            into groups of tokens. Each group is a Line token and which category is
27//!            determined by the tokens inside. See [line_grouping](lexing::line_grouping).
28//!
29//!         3. Tree Building (LineContainer): we build a tree of line groups reflecting the
30//!            nesting structure. This groups line tokens into a hierarchical tree structure
31//!            based on Indent/Dedent markers. See [to_line_container](token::to_line_container).
32//!
33//!         4. Context Injection: we inject context information into each group allowing parsing
34//!            to only read each level's lines. For example, sessions require preceding blank
35//!            lines, but for a session that is the first element in its parent, that preceding
36//!            blank line belongs to the parent. A synthetic token is injected to capture this
37//!            context.
38//!
39//!         5. Parsing by Level: parsing only needs to read each level's lines, which can
40//!            include a LineContainer (that is, there is child content there), with no tree
41//!            traversal needed. Parsing is done declaratively by processing the grammar patterns
42//!            (regular strings) through rust's regex engine. See [parsing](parsing) module.
43//!
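//!     The first step above can be sketched as a minimal, self-contained state machine.
//!     The names below (`Event`, `semantic_indentation`) are hypothetical, for illustration
//!     only, and are not this crate's API:
//!
//!     ```rust
//!     #[derive(Debug, PartialEq)]
//!     enum Event { Indent, Dedent, Line(String) }
//!
//!     // Track a stack of indentation levels; emit Indent/Dedent on level changes.
//!     fn semantic_indentation(src: &str) -> Vec<Event> {
//!         let mut levels = vec![0usize];
//!         let mut out = Vec::new();
//!         for line in src.lines() {
//!             let indent = line.len() - line.trim_start().len();
//!             while indent < *levels.last().unwrap() {
//!                 levels.pop();
//!                 out.push(Event::Dedent);
//!             }
//!             if indent > *levels.last().unwrap() {
//!                 levels.push(indent);
//!                 out.push(Event::Indent);
//!             }
//!             out.push(Event::Line(line.trim_start().to_string()));
//!         }
//!         // Close any still-open levels at end of input.
//!         while levels.len() > 1 {
//!             levels.pop();
//!             out.push(Event::Dedent);
//!         }
//!         out
//!     }
//!
//!     let events = semantic_indentation("a\n    b\nc");
//!     assert_eq!(events, vec![
//!         Event::Line("a".into()),
//!         Event::Indent,
//!         Event::Line("b".into()),
//!         Event::Dedent,
//!         Event::Line("c".into()),
//!     ]);
//!     ```
//!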
//!     On its own, each step is fairly simple; together they sum to some 500 lines of code.
//!     They are also easy to test and verify.
//!
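//!     For example, a hypothetical two-level input could end up grouped along these rough
//!     lines (illustrative shapes only, not this crate's actual token names):
//!
//!     ```text
//!     parent line
//!         child line 1
//!         child line 2
//!
//!     =>  Line("parent line")
//!         LineContainer [
//!             Line("child line 1"),
//!             Line("child line 2"),
//!         ]
//!     ```
//!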
//!     The key insight is that once tokens are grouped into a tree of lines, parsing can be
//!     done in a regular single pass: each level's lines are read in order, a LineContainer
//!     simply signals that child content is present, and no tree traversal is needed.
//!
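//!     For the parsing pass itself, a grammar pattern is a regular expression applied to a
//!     line's text. An illustrative sketch using the `regex` crate (the pattern below is
//!     invented for illustration, not part of the actual grammar):
//!
//!     ```rust
//!     use regex::Regex;
//!
//!     // A made-up pattern for a "list item" line, with a named capture group.
//!     let item = Regex::new(r"^- (?P<text>.+)$").unwrap();
//!     let caps = item.captures("- first point").unwrap();
//!     assert_eq!(&caps["text"], "first point");
//!     ```
//!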
//!     Whether passes 2-4 are indeed lexing or actual parsing is left as a bike-shedding
//!     exercise. The criterion for calling them lexing has been that each transformation is
//!     simply a grouping of tokens; no semantics are involved.
//!
//! Pipeline Separation
//!
//!     In addition to the transformations over tokens, the codebase separates the semantic
//!     analysis (in [parsing](parsing)) from the AST building (in [building](building)) and
//!     the final document assembly step (in [assembling](assembling)). These are done with
//!     the same intention: keeping complexity localized and shallow at each of these layers
//!     and making the system more testable. Line grouping and tree building happen at the
//!     parsing stage, after lexing has already produced indent/dedent-aware flat tokens.
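//!
//!     The stage boundaries can be pictured as a chain of functions, sketched here with
//!     hypothetical names and stub types (not this crate's actual signatures):
//!
//!     ```rust
//!     // Placeholder types standing in for the real intermediate representations.
//!     struct Parsed;
//!     struct Ast;
//!     struct Document;
//!
//!     fn parse(_input: &str) -> Parsed { Parsed }    // semantic analysis
//!     fn build(_p: Parsed) -> Ast { Ast }            // AST building
//!     fn assemble(_a: Ast) -> Document { Document }  // final document assembly
//!
//!     let _doc = assemble(build(parse("...")));
//!     ```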
//!
//!     For the complete end-to-end pipeline documentation, see the [parsing](parsing) module.

pub mod annotation;
pub mod assembling;
pub mod ast;
pub mod building;
pub mod escape;
pub mod formats;
pub mod inlines;
pub mod lexing;
pub mod loader;
pub mod parsing;
pub mod testing;
pub mod token;
pub mod transforms;