# parlex-gen

[![Crates.io](https://img.shields.io/crates/v/parlex-gen.svg)](https://crates.io/crates/parlex-gen)
[![Documentation](https://docs.rs/parlex-gen/badge.svg)](https://docs.rs/parlex-gen)
[![License: LGPL-3.0-or-later](https://img.shields.io/badge/License-LGPL%203.0--or--later-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0)
[![Rust](https://img.shields.io/badge/rust-stable-brightgreen.svg)](https://www.rust-lang.org)

Lexer generator ALEX and parser generator ASLR.

## Overview

**parlex-gen** is the companion crate to **parlex**, providing the **ALEX** lexer generator and the **ASLR** parser generator. Together, these tools form the code generation component of the Parlex framework, enabling the automatic construction of efficient lexical analyzers and parsers in Rust.

The system is inspired by the classic **lex (flex)** and **yacc (bison)** utilities written for C, but provides a **Rust-based implementation** that is **more composable** and offers **improved ambiguity resolution**. Unlike lex and yacc, which mix custom user code with automatically generated code, Parlex cleanly separates the two: grammar rules and lexer definitions are explicitly named, and user code refers to them by name.

The **ALEX** lexer generator offers expressive power comparable to that of lex or flex. It leverages Rust’s standard regular expression libraries to construct deterministic finite automata (DFAs) that operate efficiently at runtime to recognize permitted lexical patterns. The system supports multiple lexical states, enabling context-sensitive tokenization.
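The generated automata can be pictured as table-driven state machines. The following is a minimal conceptual sketch, not ALEX's actual generated code, of a DFA that recognizes the identifier pattern `[a-z_][a-z_A-Z0-9]*` with longest-match semantics:

```rust
// Conceptual sketch (not parlex-gen output): a hand-written DFA for
// the pattern [a-z_][a-z_A-Z0-9]*.
// States: 0 = start, 1 = accepting (inside identifier), 2 = dead.

fn step(state: usize, c: char) -> usize {
    match state {
        0 if c.is_ascii_lowercase() || c == '_' => 1,
        1 if c.is_ascii_alphanumeric() || c == '_' => 1,
        _ => 2, // dead state: no continuation can match
    }
}

/// Returns the byte length of the longest identifier prefix of `input`.
fn longest_match(input: &str) -> usize {
    let mut state = 0;
    let mut last_accept = 0;
    for (i, c) in input.char_indices() {
        state = step(state, c);
        if state == 2 {
            break;
        }
        // Remember the end of the most recent accepting position.
        last_accept = i + c.len_utf8();
    }
    last_accept
}

fn main() {
    assert_eq!(longest_match("foo_Bar42 + 1"), 9); // matches "foo_Bar42"
    assert_eq!(longest_match("123abc"), 0); // digits cannot start an identifier
}
```

A real generated lexer runs many such automata merged into one transition table, tagging each accepting state with the rule it satisfies.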

The **ASLR** parser generator implements the **SLR(1)** parsing algorithm, which is somewhat less general than the **LALR(1)** method employed by yacc and bison. Nevertheless, ASLR introduces a significant enhancement: it supports **dynamic runtime resolution of shift/reduce ambiguities**, offering greater flexibility in domains such as **Prolog**, where operator definitions may be introduced or redefined at runtime.
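To illustrate the idea (a conceptual sketch, not ASLR's actual API), a shift/reduce decision can consult a precedence table that user code mutates while parsing, much as a Prolog system updates operator definitions after an `op/3` directive:

```rust
use std::collections::HashMap;

// Conceptual sketch: resolve a shift/reduce conflict at runtime from a
// mutable operator-precedence table. Names and values are illustrative.

#[derive(Debug, PartialEq)]
enum Action {
    Shift,
    Reduce,
}

fn resolve(prec: &HashMap<&str, u32>, on_stack: &str, lookahead: &str) -> Action {
    // A tighter-binding lookahead operator wins: shift; otherwise reduce.
    if prec[lookahead] > prec[on_stack] {
        Action::Shift
    } else {
        Action::Reduce
    }
}

fn main() {
    let mut prec = HashMap::new();
    prec.insert("+", 1);
    prec.insert("*", 2);
    // In `a + b * c`, with `+` on the stack and `*` as lookahead: shift.
    assert_eq!(resolve(&prec, "+", "*"), Action::Shift);
    // Redefine `*` at runtime to bind looser than `+`: now reduce.
    prec.insert("*", 0);
    assert_eq!(resolve(&prec, "+", "*"), Action::Reduce);
}
```

A static LALR(1) table bakes such decisions in at generation time; deferring them to a runtime callback is what lets the same grammar track operator redefinitions.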

Lexers and parsers generated by the **parlex-gen** tools depend on the [parlex](https://crates.io/crates/parlex) core library, which provides the **traits**, **data structures**, and **runtime support** necessary for their execution. Users define their grammars and lexical rules declaratively, invoke **ALEX** and **ASLR** to generate Rust source code, and integrate the resulting components with application logic through the abstractions provided by `parlex`.


## Usage

Add this to your `Cargo.toml`:

```toml
[build-dependencies]
parlex-gen = "0.3"
```

You'll also need the core library:

```toml
[dependencies]
parlex = "0.3"
```


### Lexer and Parser Generation with `alex` and `aslr`

Define your lexer in `lexer.alex` and your grammar in `parser.g`, then run the **ALEX** and **ASLR** generators to produce the corresponding Rust source files.

A typical `build.rs` script might look like this:

```rust
// In your build.rs
use std::path::PathBuf;
use parlex_gen::{alex, aslr};

fn main() {
    let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());

    // --- ALEX Lexer Generation ---
    let input_file = PathBuf::from(&manifest_dir).join("src/lexer.alex");
    println!("cargo:rerun-if-changed={}", input_file.display());
    println!("cargo:warning=ALEX input file: {}", input_file.display());
    println!("cargo:warning=ALEX output directory: {}", out_dir.display());
    alex::generate(&input_file, &out_dir, "lexer_data", false).unwrap();

    // --- ASLR Parser Generation ---
    let input_file = PathBuf::from(&manifest_dir).join("src/parser.g");
    println!("cargo:rerun-if-changed={}", input_file.display());
    println!("cargo:warning=ASLR input file: {}", input_file.display());
    println!("cargo:warning=ASLR output directory: {}", out_dir.display());
    aslr::generate(&input_file, &out_dir, "parser_data", false).unwrap();
}
```
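The generated sources can then be pulled in from `OUT_DIR`. Assuming the module names passed to `generate()` above yield files named `lexer_data.rs` and `parser_data.rs` (an assumption — verify against the actual contents of your `OUT_DIR`), a typical inclusion looks like:

```rust
// In the module that hosts the generated tables. The file names below are
// assumed from the "lexer_data"/"parser_data" arguments in build.rs.
include!(concat!(env!("OUT_DIR"), "/lexer_data.rs"));
include!(concat!(env!("OUT_DIR"), "/parser_data.rs"));
```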


## Alex Lexer Specification Format

The **Alex** specification defines **lexical rules** for recognizing the textual structure of a language before parsing.
It describes how to match the **components of tokens** — such as identifiers, numbers, delimiters, operators, and string or block contents — using *regular expressions* and *lexical states*.

### Structure

An Alex specification contains:

1. **Macro definitions**
   Named regular expressions declared as:

   ```text
   NAME = regex
   ```

   Macros can be referenced with `{{NAME}}` inside other patterns.
   They are used to build complex rules from smaller reusable fragments (e.g., `{{DEC}}`, `{{ATOM}}`, `{{VAR}}`).

2. **Lexical rules**
   Each rule specifies **what pattern to match** and **in which lexical state** it applies:

   ```text
   RuleName: <State1, State2> pattern
   ```

   These rules describe low-level recognition of language elements — not yet semantic tokens, but the raw lexical building blocks.

3. **Lexical states**
   States define *contexts* that control which rules are active at any time.
   The lexer can switch states dynamically, allowing it to handle nested or context-dependent structures (for example, strings, comments, or embedded data blocks).
   A `*` in the state list indicates that the corresponding regular expression rule is active in **all lexical states**.


### Example

```text
WS = [ \t]
NL = \r?\n
IDENT = [a-z_][a-z_A-Z0-9]*
NUMBER = [0-9]+

Ident: <Expr> {{IDENT}}
Number: <Expr> {{NUMBER}}
Semicolon: <Expr> ;
Equals: <Expr> =
Plus: <Expr> \+
Minus: <Expr> -
Asterisk: <Expr> \*
Slash: <Expr> /
LeftParen: <Expr> \(
RightParen: <Expr> \)
CommentBegin: <Expr, Comment> /\*
CommentEnd: <Comment> \*/
CommentChar: <Comment> [^*\r\n]+
NewLine: <*> {{NL}}
WhiteSpace: <Expr> {{WS}}+
Error: <*> .
```

> **Note:**
> The first lexical state encountered in the specification file is used as the starting lexer state (in this case, `Expr`).


## ASLR Grammar Specification Format

An **ASLR specification** defines a context-free grammar for use with the `aslr` SLR(1) parser generator.
It consists of **production rules**, written in a simple, line-oriented format:

```text
rule_name:   Nonterminal -> Symbol Symbol ...
```

* Each line defines **one production**.
* The **left-hand side (LHS)** is a nonterminal being defined.
* The **right-hand side (RHS)** lists terminals and/or nonterminals.
* An **empty RHS** represents an ε-production (the symbol can derive nothing).
* Multiple alternative productions for the same nonterminal are written as separate rules.
* Grammars can express nested and recursive definitions suitable for SLR(1) parsing.

### Naming Rules

* **Rule names** follow the pattern:
  `[a-z]([a-zA-Z0-9])*`
* **Nonterminals** use **capitalized names** (e.g., `Expr`, `Term`, `Seq`).
* **Terminals** follow either:

  * `[a-z]([a-zA-Z0-9])*` — for word-like tokens, or
  * one of the special symbols below, which are translated to lowercase names.

|                 |                 |                |                  |
| :-------------- | :-------------- | :------------- | :--------------- |
| `.` dot         | `-` minus       | `~` tilde      | `` ` `` backtick |
| `!` exclamation | `@` at          | `#` hash       | `$` dollar       |
| `%` percent     | `^` caret       | `&` ampersand  | `*` asterisk     |
| `+` plus        | `=` equals      | `\|` pipe      | `\\` backslash   |
| `<` lessThan    | `>` greaterThan | `?` question   | `/` slash        |
| `;` semicolon   | `(` leftParen   | `)` rightParen | `[` leftBrack    |
| `]` rightBrack  | `{` leftBrace   | `}` rightBrace | `,` comma        |
| `'` singleQuote | `"` doubleQuote | `:` colon      |                  |

### Example

```text
stat1: Stat ->
stat2: Stat -> Expr
stat3: Stat -> ident = Expr
expr1: Expr -> number
expr2: Expr -> ident
expr3: Expr -> Expr + Expr
expr4: Expr -> Expr - Expr
expr5: Expr -> Expr * Expr
expr6: Expr -> Expr / Expr
expr7: Expr -> - Expr
expr8: Expr -> ( Expr )
```

## License

Copyright (c) 2005–2025 IKH Software, Inc.

Released under the terms of the GNU Lesser General Public License, version 3.0
or (at your option) any later version (LGPL-3.0-or-later).

## See Also

- [`parlex`](https://crates.io/crates/parlex) — core support library
- [`arena-terms-parser`](https://crates.io/crates/arena-terms-parser) — real-world example using **ALEX** and **ASLR**