//! # Lexer generator ALEX and parser generator ASLR.
//!
//! ## Overview
//!
//! **parlex-gen** is the companion crate to **parlex**, providing the **ALEX** lexer generator
//! and the **ASLR** parser generator. Together, these tools form the code generation component
//! of the Parlex framework, enabling the automatic construction of efficient lexical analyzers
//! and parsers in Rust.
//!
//! The system is inspired by the classic **lex (flex)** and **yacc (bison)** utilities written
//! for C, but provides a **Rust-based implementation** that is **more composable** and **improves
//! upon ambiguity resolution**. Unlike lex and yacc, which mix custom user code with automatically
//! generated code, Parlex cleanly separates the two: grammar rules and lexer definitions are
//! explicitly named, and user code refers to them by name.
//!
//! The **ALEX** lexer generator offers expressive power comparable to that of lex or flex. It
//! leverages Rust’s standard regular expression libraries to construct deterministic finite
//! automata (DFAs) that operate efficiently at runtime to recognize permitted lexical patterns.
//! The system supports multiple lexical states, enabling context-sensitive tokenization.
//!
//! The **ASLR** parser generator implements the **SLR(1)** parsing algorithm, which is somewhat
//! less general than the **LALR(1)** method employed by yacc and bison. Nevertheless, ASLR introduces
//! a significant enhancement: it supports **dynamic runtime resolution of shift/reduce ambiguities**,
//! offering greater flexibility in domains such as **Prolog**, where operator definitions may be
//! introduced or redefined at runtime.
//!
//! Lexers and parsers generated by the **parlex-gen** tools depend on the
//! [parlex](https://crates.io/crates/parlex) core library, which provides the **traits**,
//! **data structures**, and **runtime support** necessary for their execution. Users define their
//! grammars and lexical rules declaratively, invoke **ALEX** and **ASLR** to generate Rust source code,
//! and integrate the resulting components with application logic through the abstractions
//! provided by `parlex`.
//!
//! ### Lexer and Parser Generation with `alex` and `aslr`
//!
//! Define your lexer in `lexer.alex` and your grammar in `parser.g`, then run the **ALEX** and
//! **ASLR** generators to produce the corresponding Rust source files.
//!
//! A typical `build.rs` script might look like this:
//!
//! ```ignore
//! // In your build.rs
//! use std::path::PathBuf;
//! use parlex_gen::{alex, aslr};
//!
//! fn main() {
//!     let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
//!     let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
//!
//!     // --- ALEX Lexer Generation ---
//!     let input_file = PathBuf::from(&manifest_dir).join("src/lexer.alex");
//!     println!("cargo:rerun-if-changed={}", input_file.display());
//!     println!("cargo:warning=ALEX input file: {}", input_file.display());
//!     println!("cargo:warning=ALEX output directory: {}", out_dir.display());
//!     alex::generate(&input_file, &out_dir, "lexer_data", false).unwrap();
//!
//!     // --- ASLR Parser Generation ---
//!     let input_file = PathBuf::from(&manifest_dir).join("src/parser.g");
//!     println!("cargo:rerun-if-changed={}", input_file.display());
//!     println!("cargo:warning=ASLR input file: {}", input_file.display());
//!     println!("cargo:warning=ASLR output directory: {}", out_dir.display());
//!     aslr::generate(&input_file, &out_dir, "parser_data", false).unwrap();
//! }
//! ```
//!
//! ## ALEX Lexer Specification Format
//!
//! The **ALEX** specification defines **lexical rules** for recognizing the textual
//! structure of a language before parsing.
//! It describes how to match the **components of tokens** (such as identifiers, numbers,
//! delimiters, operators, and string or block contents) using *regular expressions* and
//! *lexical states*.
//!
//! ### Structure
//!
//! An ALEX specification contains:
//!
//! 1. **Macro definitions**
//!    Named regular expressions declared as:
//!
//!    ```text
//!    NAME = regex
//!    ```
//!
//!    Macros can be referenced with `{{NAME}}` inside other patterns.
//!    They are used to build complex rules from smaller reusable fragments (e.g.,
//!    `{{DEC}}`, `{{ATOM}}`, `{{VAR}}`).
//!
//! 2. **Lexical rules**
//!    Each rule specifies **what pattern to match** and **in which lexical states** it applies:
//!
//!    ```text
//!    RuleName: <State1, State2> pattern
//!    ```
//!
//!    These rules describe low-level recognition of language elements: not yet semantic tokens,
//!    but the raw lexical building blocks.
//!
//! 3. **Lexical states**
//!    States define *contexts* that control which rules are active at any time.
//!    The lexer can switch states dynamically, allowing it to handle nested or context-dependent
//!    structures (for example, strings, comments, or embedded data blocks).
//!    A `*` in the state list indicates that the corresponding regular expression rule is active
//!    in **all lexical states**.
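//!
//! As a rough illustration of how `{{NAME}}` references compose, the following sketch expands
//! macros by plain text substitution. It is not ALEX's actual implementation: the function
//! name `expand_macros`, the `HashMap` representation, and the choice to wrap each expansion
//! in a non-capturing group `(?:...)` are assumptions made for this example.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Expands every `{{NAME}}` reference in `pattern` using `macros`.
//! /// Returns `None` if a macro is undefined or a `{{` has no closing `}}`.
//! fn expand_macros(pattern: &str, macros: &HashMap<&str, String>) -> Option<String> {
//!     let mut out = String::new();
//!     let mut rest = pattern;
//!     while let Some(start) = rest.find("{{") {
//!         // Locate the matching `}}` after the opening braces.
//!         let end = start + rest[start..].find("}}")?;
//!         out.push_str(&rest[..start]);
//!         let name = &rest[start + 2..end];
//!         // Assumption: group the body so a following `*` or `+` applies to all of it.
//!         out.push_str("(?:");
//!         out.push_str(macros.get(name)?);
//!         out.push(')');
//!         rest = &rest[end + 2..];
//!     }
//!     out.push_str(rest);
//!     Some(out)
//! }
//! ```
//!
//! Under these assumptions, expanding `{{DEC}}+` with `DEC = [0-9]` gives `(?:[0-9])+`.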
//!
//! ### Example
//!
//! ```text
//! WS        = [ \t\f]+
//! NL        = \r?\n
//! DEC       = [0-9]
//! ALPHA     = [A-Za-z_]
//! ALNUM     = [A-Za-z0-9_]
//! INT       = {{DEC}}+
//! ID        = {{ALPHA}}{{ALNUM}}*
//! OPER      = [=+\-*/();]
//!
//! SkipWS: <Expr> {{WS}}
//! NewLine: <Expr> {{NL}}
//! OperSym: <Expr> {{OPER}}
//! Ident: <Expr> {{ID}}
//! Integer: <Expr> {{INT}}
//! CommentStart: <Expr, Comment> /\*
//! CommentEnd: <Comment> \*/
//! CommentChar: <Comment> [^*\r\n]
//! CommentNewLine: <Comment> {{NL}}
//! ErrorAny: <*> .
//! ```
//!
//! > **Note:** The first lexical state encountered in the specification file is used as the starting
//!   lexer state (in this case, `Expr`).
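//!
//! To make the role of the state lists concrete, the sketch below models which rules of the
//! example are candidates in a given lexical state. It is a hypothetical model rather than the
//! generated lexer: `Rule`, `EXAMPLE_RULES`, and `active_rules` are names invented here, and the
//! sketch says nothing about how ALEX chooses among several rules that match.
//!
//! ```rust
//! /// A lexical rule reduced to its name and the states listed for it.
//! struct Rule {
//!     name: &'static str,
//!     states: &'static [&'static str],
//! }
//!
//! /// State lists copied from the example specification above.
//! const EXAMPLE_RULES: &[Rule] = &[
//!     Rule { name: "SkipWS", states: &["Expr"] },
//!     Rule { name: "NewLine", states: &["Expr"] },
//!     Rule { name: "OperSym", states: &["Expr"] },
//!     Rule { name: "Ident", states: &["Expr"] },
//!     Rule { name: "Integer", states: &["Expr"] },
//!     Rule { name: "CommentStart", states: &["Expr", "Comment"] },
//!     Rule { name: "CommentEnd", states: &["Comment"] },
//!     Rule { name: "CommentChar", states: &["Comment"] },
//!     Rule { name: "CommentNewLine", states: &["Comment"] },
//!     Rule { name: "ErrorAny", states: &["*"] },
//! ];
//!
//! /// Returns the names of the rules active in `state`, in specification order.
//! /// A `*` entry makes a rule active in every state.
//! fn active_rules(rules: &[Rule], state: &str) -> Vec<&'static str> {
//!     rules
//!         .iter()
//!         .filter(|r| r.states.iter().any(|s| *s == "*" || *s == state))
//!         .map(|r| r.name)
//!         .collect()
//! }
//! ```
//!
//! In this model, the `Comment` state activates `CommentStart`, `CommentEnd`, `CommentChar`,
//! `CommentNewLine`, and `ErrorAny`, while rules such as `Ident` and `Integer` are inactive.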
//!
//! ## ASLR Grammar Specification Format
//!
//! An **ASLR specification** defines a context-free grammar for use with the `aslr` SLR(1) parser
//! generator. It consists of **production rules**, written in a simple, line-oriented format:
//!
//! ```text
//! rule_name:   Nonterminal -> Symbol Symbol ...
//! ```
//!
//! * Each line defines **one production**.
//! * The **left-hand side (LHS)** is the nonterminal being defined.
//! * The **right-hand side (RHS)** lists terminals and/or nonterminals.
//! * An **empty RHS** represents an ε-production (the symbol can derive nothing).
//! * Multiple alternative productions for the same nonterminal are written as separate rules.
//! * Grammars can express nested and recursive definitions suitable for SLR(1) parsing.
//!
//! ### Naming Rules
//!
//! * **Rule names** follow the pattern:
//!   `[a-z]([a-zA-Z0-9])*`
//! * **Nonterminals** use **capitalized names** (e.g., `Expr`, `Term`, `Seq`).
//! * **Terminals** follow either:
//!
//!   * `[a-z]([a-zA-Z0-9])*` for word-like tokens, or
//!   * one of the special symbols below, which are translated to the names shown
//!     (each starting with a lowercase letter, e.g. `leftParen`).
//!
//! |                 |                 |                |                  |
//! | :-------------- | :-------------- | :------------- | :--------------- |
//! | `.` dot         | `-` minus       | `~` tilde      | `` ` `` backtick |
//! | `!` exclamation | `@` at          | `#` hash       | `$` dollar       |
//! | `%` percent     | `^` caret       | `&` ampersand  | `*` asterisk     |
//! | `+` plus        | `=` equals      | `\|` pipe      | `\\` backslash   |
//! | `<` lessThan    | `>` greaterThan | `?` question   | `/` slash        |
//! | `;` semicolon   | `(` leftParen   | `)` rightParen | `[` leftBrack    |
//! | `]` rightBrack  | `{` leftBrace   | `}` rightBrace | `,` comma        |
//! | `'` singleQuote | `"` doubleQuote | `:` colon      |                  |
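//!
//! The table can be read as a lookup from a symbol to its name. The sketch below encodes it as
//! a `match`; the function `terminal_name` is hypothetical and not part of the `parlex-gen`
//! API, and the names are simply copied from the table above.
//!
//! ```rust
//! /// Maps a special-symbol terminal to the name listed in the table,
//! /// or returns `None` for any other character.
//! fn terminal_name(symbol: char) -> Option<&'static str> {
//!     let name = match symbol {
//!         '.' => "dot",
//!         '-' => "minus",
//!         '~' => "tilde",
//!         '`' => "backtick",
//!         '!' => "exclamation",
//!         '@' => "at",
//!         '#' => "hash",
//!         '$' => "dollar",
//!         '%' => "percent",
//!         '^' => "caret",
//!         '&' => "ampersand",
//!         '*' => "asterisk",
//!         '+' => "plus",
//!         '=' => "equals",
//!         '|' => "pipe",
//!         '\\' => "backslash",
//!         '<' => "lessThan",
//!         '>' => "greaterThan",
//!         '?' => "question",
//!         '/' => "slash",
//!         ';' => "semicolon",
//!         '(' => "leftParen",
//!         ')' => "rightParen",
//!         '[' => "leftBrack",
//!         ']' => "rightBrack",
//!         '{' => "leftBrace",
//!         '}' => "rightBrace",
//!         ',' => "comma",
//!         '\'' => "singleQuote",
//!         '"' => "doubleQuote",
//!         ':' => "colon",
//!         _ => return None,
//!     };
//!     Some(name)
//! }
//! ```
//!
//! For instance, `terminal_name('+')` yields `Some("plus")`, while a letter such as `'a'`
//! yields `None` because it is not one of the special symbols.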
//!
//! ### Example
//!
//! ```text
//! stat1:  Stat ->
//! stat2:  Stat -> ident = Expr
//! expr1:  Expr -> Expr + Expr
//! expr2:  Expr -> Expr * Expr
//! expr3:  Expr -> ( Expr )
//! expr4:  Expr -> int
//! ```
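//!
//! This example grammar is ambiguous: with `Expr -> Expr + Expr` and `Expr -> Expr * Expr`, an
//! SLR(1) table contains shift/reduce conflicts on `+` and `*`. These are the kind of conflicts
//! that the dynamic runtime resolution mentioned in the overview is meant to handle. The sketch
//! below shows one possible policy based on an operator table that can change while parsing.
//! All names here (`Action`, `Assoc`, `OpTable`, `resolve`) are hypothetical and do not reflect
//! the `parlex` API; the convention that a higher number binds tighter is also an assumption.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! #[derive(Debug, PartialEq)]
//! enum Action {
//!     Shift,
//!     Reduce,
//! }
//!
//! #[derive(Clone, Copy)]
//! enum Assoc {
//!     Left,
//!     Right,
//! }
//!
//! /// Operator symbol -> (precedence, associativity); may be updated at runtime.
//! type OpTable = HashMap<&'static str, (u32, Assoc)>;
//!
//! /// Decides between reducing a production whose operator is `reduce_op`
//! /// and shifting the `lookahead` operator. Returns `None` for unknown operators.
//! fn resolve(ops: &OpTable, reduce_op: &str, lookahead: &str) -> Option<Action> {
//!     let (reduce_prec, assoc) = *ops.get(reduce_op)?;
//!     let (look_prec, _) = *ops.get(lookahead)?;
//!     let action = if look_prec > reduce_prec {
//!         Action::Shift // lookahead binds tighter: keep reading
//!     } else if look_prec < reduce_prec {
//!         Action::Reduce // operator on the stack binds tighter
//!     } else {
//!         // Equal precedence: associativity breaks the tie.
//!         match assoc {
//!             Assoc::Left => Action::Reduce,
//!             Assoc::Right => Action::Shift,
//!         }
//!     };
//!     Some(action)
//! }
//! ```
//!
//! With `+` at precedence 1 and `*` at precedence 2, seeing `*` after `Expr + Expr` results in
//! a shift. If `+` is later redefined with precedence 3, the same situation reduces instead,
//! without regenerating the parser tables.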
//!
//! ## License
//!
//! Copyright (c) 2005–2025 IKH Software, Inc.
//!
//! Released under the terms of the GNU Lesser General Public License, version 3.0 or
//! (at your option) any later version (LGPL-3.0-or-later).
//!
//! ## See Also
//!
//! - [`parlex`](https://crates.io/crates/parlex): core support library
//! - [arena-terms-parser](https://crates.io/crates/arena-terms-parser): real-world example using **ALEX** and **ASLR**

pub mod alex;
pub mod aslr;