parlex_gen/lib.rs
//! # Lexer generator ALEX and parser generator ASLR.
//!
//! ## Overview
//!
//! **parlex-gen** is the companion crate to **parlex**, providing the **ALEX** lexer generator
//! and the **ASLR** parser generator. Together, these tools form the code generation component
//! of the Parlex framework, enabling the automatic construction of efficient lexical analyzers
//! and parsers in Rust.
//!
//! The system is inspired by the classic **lex (flex)** and **yacc (bison)** utilities written
//! for C, but provides a **Rust-based implementation** that is **more composable** and **improves
//! upon ambiguity resolution**. Unlike lex and yacc, which mix custom user code with automatically
//! generated code, Parlex cleanly separates the two: grammar rules and lexer definitions are
//! explicitly named, and user code refers to them by name.
//!
//! The **ALEX** lexer generator offers expressive power comparable to that of lex or flex. It
//! leverages established Rust regular expression libraries to construct deterministic finite
//! automata (DFAs) that operate efficiently at runtime to recognize permitted lexical patterns.
//! The system supports multiple lexical states, enabling context-sensitive tokenization.
//!
//! The **ASLR** parser generator implements the **SLR(1)** parsing algorithm, which is somewhat
//! less general than the **LALR(1)** method employed by yacc and bison. Nevertheless, ASLR introduces
//! a significant enhancement: it supports **dynamic runtime resolution of shift/reduce ambiguities**,
//! offering greater flexibility in domains such as **Prolog**, where operator definitions may be
//! introduced or redefined at runtime.
//!
//! Lexers and parsers generated by the **parlex-gen** tools depend on the
//! [parlex](https://crates.io/crates/parlex) core library, which provides the **traits**,
//! **data structures**, and **runtime support** necessary for their execution. Users define their
//! grammars and lexical rules declaratively, invoke **ALEX** and **ASLR** to generate Rust source code,
//! and integrate the resulting components with application logic through the abstractions
//! provided by `parlex`.
//!
//! ### Lexer and Parser Generation with `alex` and `aslr`
//!
//! Define your lexer in `lexer.alex` and your grammar in `parser.g`, then run the **ALEX** and
//! **ASLR** generators to produce the corresponding Rust source files.
//!
//! A typical `build.rs` script might look like this:
//!
//! ```ignore
//! // In your build.rs
//! use std::path::PathBuf;
//! use parlex_gen::{alex, aslr};
//!
//! fn main() {
//!     let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
//!     let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
//!
//!     // --- ALEX Lexer Generation ---
//!     let input_file = PathBuf::from(&manifest_dir).join("src/lexer.alex");
//!     println!("cargo:rerun-if-changed={}", input_file.display());
//!     println!("cargo:warning=ALEX input file: {}", input_file.display());
//!     println!("cargo:warning=ALEX output directory: {}", out_dir.display());
//!     alex::generate(&input_file, &out_dir, "lexer_data", false).unwrap();
//!
//!     // --- ASLR Parser Generation ---
//!     let input_file = PathBuf::from(&manifest_dir).join("src/parser.g");
//!     println!("cargo:rerun-if-changed={}", input_file.display());
//!     println!("cargo:warning=ASLR input file: {}", input_file.display());
//!     println!("cargo:warning=ASLR output directory: {}", out_dir.display());
//!     aslr::generate(&input_file, &out_dir, "parser_data", false).unwrap();
//! }
//! ```
//!
//! ## ALEX Lexer Specification Format
//!
//! The **ALEX** specification defines **lexical rules** for recognizing the textual
//! structure of a language before parsing.
//! It describes how to match the **components of tokens** (such as identifiers, numbers,
//! delimiters, operators, and string or block contents) using *regular expressions* and
//! *lexical states*.
//!
//! ### Structure
//!
//! An ALEX specification contains:
//!
//! 1. **Macro definitions**
//!    Named regular expressions declared as:
//!
//!    ```text
//!    NAME = regex
//!    ```
//!
//!    Macros can be referenced with `{{NAME}}` inside other patterns.
//!    They are used to build complex rules from smaller reusable fragments (e.g.,
//!    `{{DEC}}`, `{{ATOM}}`, `{{VAR}}`).
//!
//! 2. **Lexical rules**
//!    Each rule specifies **what pattern to match** and **in which lexical states** it applies:
//!
//!    ```text
//!    RuleName: <State1, State2> pattern
//!    ```
//!
//!    These rules describe low-level recognition of language elements: not yet semantic tokens,
//!    but the raw lexical building blocks.
//!
//! 3. **Lexical states**
//!    States define *contexts* that control which rules are active at any time.
//!    The lexer can switch states dynamically, allowing it to handle nested or context-dependent
//!    structures (for example, strings, comments, or embedded data blocks).
//!    A `*` in the state list indicates that the corresponding rule is active
//!    in **all lexical states**.
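//!
//! As a rough illustration of how `{{NAME}}` references compose, the sketch below expands
//! macros by plain textual substitution. The `expand` helper is hypothetical and is not part
//! of the parlex-gen API; how ALEX actually expands macros (for example, whether it groups
//! each macro body) is an assumption left open here.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical helper: replaces every `{{NAME}}` in `pattern` with the
//! /// (already expanded) body of the macro `NAME`. Unknown names are kept as-is.
//! fn expand(pattern: &str, macros: &HashMap<&str, String>) -> String {
//!     let mut out = String::new();
//!     let mut rest = pattern;
//!     while let Some(start) = rest.find("{{") {
//!         out.push_str(&rest[..start]);
//!         let after = &rest[start + 2..];
//!         match after.find("}}") {
//!             Some(end) => {
//!                 let name = &after[..end];
//!                 match macros.get(name) {
//!                     Some(body) => out.push_str(body),
//!                     None => {
//!                         // Unknown macro: keep the reference verbatim.
//!                         out.push_str("{{");
//!                         out.push_str(name);
//!                         out.push_str("}}");
//!                     }
//!                 }
//!                 rest = &after[end + 2..];
//!             }
//!             None => {
//!                 // Unterminated reference: copy the remainder unchanged.
//!                 out.push_str(&rest[start..]);
//!                 rest = "";
//!             }
//!         }
//!     }
//!     out.push_str(rest);
//!     out
//! }
//!
//! fn main() {
//!     let mut macros: HashMap<&str, String> = HashMap::new();
//!     macros.insert("DEC", "[0-9]".to_string());
//!     // A macro defined earlier can be used by a later definition.
//!     let int = expand("{{DEC}}+", &macros);
//!     macros.insert("INT", int);
//!     assert_eq!(macros["INT"], "[0-9]+");
//! }
//! ```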
//!
//! ### Example
//!
//! ```text
//! WS = [ \t\f]+
//! NL = \r?\n
//! DEC = [0-9]
//! ALPHA = [A-Za-z_]
//! ALNUM = [A-Za-z0-9_]
//! INT = {{DEC}}+
//! ID = {{ALPHA}}{{ALNUM}}*
//! OPER = [=+\-*/();]
//!
//! SkipWS: <Expr> {{WS}}
//! NewLine: <Expr> {{NL}}
//! OperSym: <Expr> {{OPER}}
//! Ident: <Expr> {{ID}}
//! Integer: <Expr> {{INT}}
//! CommentStart: <Expr, Comment> /\*
//! CommentEnd: <Comment> \*/
//! CommentChar: <Comment> [^*\r\n]
//! CommentNewLine: <Comment> {{NL}}
//! ErrorAny: <*> .
//! ```
//!
//! > **Note:** The first lexical state encountered in the specification file is used as the starting
//! > lexer state (in this case, `Expr`).
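//!
//! To make the role of lexical states concrete, the following sketch lists which rules of the
//! example specification are eligible in a given state. The `Rule` struct and the
//! `active_rules` function are illustrative assumptions; they do not mirror the code that
//! ALEX generates.
//!
//! ```rust
//! /// Illustrative model of one rule: its name and the states it is active in.
//! /// A state list containing "*" means "active in every state".
//! struct Rule {
//!     name: &'static str,
//!     states: &'static [&'static str],
//! }
//!
//! /// Rule names and state lists copied from the example specification.
//! const RULES: &[Rule] = &[
//!     Rule { name: "SkipWS", states: &["Expr"] },
//!     Rule { name: "NewLine", states: &["Expr"] },
//!     Rule { name: "OperSym", states: &["Expr"] },
//!     Rule { name: "Ident", states: &["Expr"] },
//!     Rule { name: "Integer", states: &["Expr"] },
//!     Rule { name: "CommentStart", states: &["Expr", "Comment"] },
//!     Rule { name: "CommentEnd", states: &["Comment"] },
//!     Rule { name: "CommentChar", states: &["Comment"] },
//!     Rule { name: "CommentNewLine", states: &["Comment"] },
//!     Rule { name: "ErrorAny", states: &["*"] },
//! ];
//!
//! /// Returns the names of all rules that may match while in `state`.
//! fn active_rules(state: &str) -> Vec<&'static str> {
//!     RULES
//!         .iter()
//!         .filter(|r| r.states.iter().any(|s| *s == "*" || *s == state))
//!         .map(|r| r.name)
//!         .collect()
//! }
//!
//! fn main() {
//!     // `Expr` is the first state mentioned in the specification, so lexing starts there.
//!     println!("Expr: {:?}", active_rules("Expr"));
//!     // Inside a comment only the comment rules and the catch-all remain active.
//!     println!("Comment: {:?}", active_rules("Comment"));
//! }
//! ```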
//!
//! ## ASLR Grammar Specification Format
//!
//! An **ASLR specification** defines a context-free grammar for use with the `aslr` SLR(1) parser
//! generator. It consists of **production rules**, written in a simple, line-oriented format:
//!
//! ```text
//! rule_name: Nonterminal -> Symbol Symbol ...
//! ```
//!
//! * Each line defines **one production**.
//! * The **left-hand side (LHS)** is a nonterminal being defined.
//! * The **right-hand side (RHS)** lists terminals and/or nonterminals.
//! * An **empty RHS** represents an ε-production (the symbol can derive nothing).
//! * Multiple alternative productions for the same nonterminal are written as separate rules.
//! * Grammars can express nested and recursive definitions suitable for SLR(1) parsing.
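//!
//! The line format can be illustrated with a small stand-alone parser. This is only a sketch:
//! the `Production` type and `parse_production` function are hypothetical, perform no
//! validation of names, and are unrelated to how `aslr` reads grammar files internally.
//!
//! ```rust
//! /// Illustrative representation of one production (not a parlex-gen type).
//! #[derive(Debug, PartialEq)]
//! struct Production {
//!     name: String,
//!     lhs: String,
//!     rhs: Vec<String>,
//! }
//!
//! /// Splits `rule_name: Nonterminal -> Symbol ...` into its parts.
//! /// Returns `None` if the `:` or `->` separator is missing.
//! fn parse_production(line: &str) -> Option<Production> {
//!     let (name, rest) = line.split_once(':')?;
//!     let (lhs, rhs) = rest.split_once("->")?;
//!     Some(Production {
//!         name: name.trim().to_string(),
//!         lhs: lhs.trim().to_string(),
//!         // An empty right-hand side yields an empty vector (an ε-production).
//!         rhs: rhs.split_whitespace().map(str::to_string).collect(),
//!     })
//! }
//!
//! fn main() {
//!     let p = parse_production("expr1: Expr -> Expr + Expr").unwrap();
//!     assert_eq!(p.lhs, "Expr");
//!     assert_eq!(p.rhs, vec!["Expr", "+", "Expr"]);
//!
//!     // `stat1: Stat ->` has an empty RHS.
//!     let empty = parse_production("stat1: Stat ->").unwrap();
//!     assert!(empty.rhs.is_empty());
//! }
//! ```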
//!
//! ### Naming Rules
//!
//! * **Rule names** follow the pattern:
//!   `[a-z]([a-zA-Z0-9])*`
//! * **Nonterminals** use **capitalized names** (e.g., `Expr`, `Term`, `Seq`).
//! * **Terminals** follow either:
//!
//!   * `[a-z]([a-zA-Z0-9])*`, for word-like tokens, or
//!   * one of the special symbols below, each of which is translated to the name shown
//!     beside it (a name starting with a lowercase letter).
//!
//! | | | | |
//! | :-------------- | :-------------- | :------------- | :--------------- |
//! | `.` dot | `-` minus | `~` tilde | `` ` `` backtick |
//! | `!` exclamation | `@` at | `#` hash | `$` dollar |
//! | `%` percent | `^` caret | `&` ampersand | `*` asterisk |
//! | `+` plus | `=` equals | `\|` pipe | `\\` backslash |
//! | `<` lessThan | `>` greaterThan | `?` question | `/` slash |
//! | `;` semicolon | `(` leftParen | `)` rightParen | `[` leftBrack |
//! | `]` rightBrack | `{` leftBrace | `}` rightBrace | `,` comma |
//! | `'` singleQuote | `"` doubleQuote | `:` colon | |
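//!
//! The terminal naming rules can be sketched as a lookup. The functions below are
//! hypothetical helpers written only from the pattern and table in this section; they are
//! not the routines `aslr` uses, and the handling of anything outside the table is an
//! assumption.
//!
//! ```rust
//! /// Returns the name for a special-symbol terminal, following the table above.
//! fn symbol_name(sym: char) -> Option<&'static str> {
//!     Some(match sym {
//!         '.' => "dot", '-' => "minus", '~' => "tilde", '`' => "backtick",
//!         '!' => "exclamation", '@' => "at", '#' => "hash", '$' => "dollar",
//!         '%' => "percent", '^' => "caret", '&' => "ampersand", '*' => "asterisk",
//!         '+' => "plus", '=' => "equals", '|' => "pipe", '\\' => "backslash",
//!         '<' => "lessThan", '>' => "greaterThan", '?' => "question", '/' => "slash",
//!         ';' => "semicolon", '(' => "leftParen", ')' => "rightParen", '[' => "leftBrack",
//!         ']' => "rightBrack", '{' => "leftBrace", '}' => "rightBrace", ',' => "comma",
//!         '\'' => "singleQuote", '"' => "doubleQuote", ':' => "colon",
//!         _ => return None,
//!     })
//! }
//!
//! /// Checks the word-like pattern `[a-z]([a-zA-Z0-9])*`.
//! fn is_word_terminal(s: &str) -> bool {
//!     let mut chars = s.chars();
//!     matches!(chars.next(), Some(c) if c.is_ascii_lowercase())
//!         && chars.all(|c| c.is_ascii_alphanumeric())
//! }
//!
//! /// Maps a terminal as written in a grammar to its name, or `None` if it is
//! /// neither word-like nor a single symbol from the table.
//! fn terminal_name(token: &str) -> Option<String> {
//!     if is_word_terminal(token) {
//!         return Some(token.to_string());
//!     }
//!     let mut chars = token.chars();
//!     match (chars.next(), chars.next()) {
//!         (Some(c), None) => symbol_name(c).map(str::to_string),
//!         _ => None,
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(terminal_name("ident").as_deref(), Some("ident"));
//!     assert_eq!(terminal_name("+").as_deref(), Some("plus"));
//!     assert_eq!(terminal_name("(").as_deref(), Some("leftParen"));
//!     // Capitalized names are nonterminals, not terminals.
//!     assert_eq!(terminal_name("Expr"), None);
//! }
//! ```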
//!
//! ### Example
//!
//! ```text
//! stat1: Stat ->
//! stat2: Stat -> ident = Expr
//! expr1: Expr -> Expr + Expr
//! expr2: Expr -> Expr * Expr
//! expr3: Expr -> ( Expr )
//! expr4: Expr -> int
//! ```
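//!
//! This grammar is ambiguous: after `Expr + Expr` with `*` as the next token, the parser could
//! either shift `*` or reduce by `expr1`. The overview notes that ASLR allows such shift/reduce
//! ambiguities to be resolved dynamically at runtime. The sketch below shows one conventional
//! way a resolver might decide, using a precedence table that can be changed while parsing.
//! All names here (`Action`, `Assoc`, `OpTable`, `resolve`) and the precedence values are
//! hypothetical and do not reflect the actual `parlex` traits.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! #[derive(Debug, PartialEq)]
//! enum Action { Shift, Reduce }
//!
//! #[derive(Clone, Copy)]
//! enum Assoc { Left, Right }
//!
//! /// Hypothetical operator table: terminal name -> (precedence, associativity).
//! struct OpTable {
//!     ops: HashMap<&'static str, (u32, Assoc)>,
//! }
//!
//! impl OpTable {
//!     /// Decides a conflict between reducing a rule built around `rule_op`
//!     /// and shifting the lookahead operator. Returns `None` for unknown operators.
//!     fn resolve(&self, rule_op: &str, lookahead: &str) -> Option<Action> {
//!         let (rule_prec, _) = *self.ops.get(rule_op)?;
//!         let (look_prec, look_assoc) = *self.ops.get(lookahead)?;
//!         Some(if look_prec > rule_prec {
//!             Action::Shift
//!         } else if look_prec < rule_prec {
//!             Action::Reduce
//!         } else {
//!             // Equal precedence: associativity breaks the tie.
//!             match look_assoc {
//!                 Assoc::Left => Action::Reduce,
//!                 Assoc::Right => Action::Shift,
//!             }
//!         })
//!     }
//! }
//!
//! fn main() {
//!     let mut table = OpTable { ops: HashMap::new() };
//!     table.ops.insert("plus", (10, Assoc::Left));
//!     table.ops.insert("asterisk", (20, Assoc::Left));
//!
//!     // `a + b * c`: the lookahead `*` binds tighter, so shift.
//!     assert_eq!(table.resolve("plus", "asterisk"), Some(Action::Shift));
//!     // `a + b + c`: equal precedence and left-associative, so reduce.
//!     assert_eq!(table.resolve("plus", "plus"), Some(Action::Reduce));
//!
//!     // Redefining an operator at runtime changes later decisions.
//!     table.ops.insert("plus", (30, Assoc::Left));
//!     assert_eq!(table.resolve("plus", "asterisk"), Some(Action::Reduce));
//! }
//! ```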
//!
//! ## License
//!
//! Copyright (c) 2005–2025 IKH Software, Inc.
//!
//! Released under the terms of the GNU Lesser General Public License, version 3.0 or
//! (at your option) any later version (LGPL-3.0-or-later).
//!
//! ## See Also
//!
//! - [`parlex`](https://crates.io/crates/parlex): core support library
//! - [arena-terms-parser](https://crates.io/crates/arena-terms-parser): real-world example using **ALEX** and **ASLR**

pub mod alex;
pub mod aslr;