rusty_lr 4.1.0

A Bison-like Parser generator & Compiler frontend framework for Rust generating IELR(1), LALR(1) parser tables, with deterministic LR and non-deterministic LR (GLR) parsing.
Documentation
# rusty_lr

[![crates.io](https://img.shields.io/crates/v/rusty_lr.svg)](https://crates.io/crates/rusty_lr)
[![docs.rs](https://docs.rs/rusty_lr/badge.svg)](https://docs.rs/rusty_lr)

***A Bison-like parser generator and compiler frontend for Rust. It generates optimized IELR(1) and LALR(1) parsing tables, supporting both deterministic LR and non-deterministic Generalized LR (GLR) parsing.***

RustyLR is a robust parser generator that converts context-free grammars into optimized IELR(1) or LALR(1) tables. It seamlessly integrates with the Rust ecosystem, allowing you to write custom reduce actions directly in Rust with rich, diagnostic-driven error reporting. 

Highly inspired by classic tools like *Bison* and *Yacc*, RustyLR uses a familiar syntax while offering modern features such as state optimization, location tracking, generalized parsing, and compile-time/runtime conflict resolution.

![title](images/title.png)

## Features

- **Custom Reduce Actions:** Define actions directly in Rust to build abstract syntax trees (ASTs) or custom data structures easily.
- **Automatic Parser Optimization:** Shrinks parsing tables and boosts runtime performance by grouping terminal symbols that exhibit identical behavior across parser states.
- **Multiple Parsing Strategies:** Supports minimal-LR(1) (IELR-style), LALR(1) tables, and Generalized LR (GLR) parsing.
- **Detailed Diagnostics:** Reports grammar conflicts, provides verbose traces of conflict resolution stages, and logs optimization passes.
- **Static & Runtime Conflict Resolution:** Resolve grammar conflicts at compile time (precedence/associativity) or dynamically at runtime.
- **Location Tracking:** Automatically tracks positions of tokens and non-terminals, simplifying error reporting in compiler diagnostics.
- **State Machine Debugging:** The `rustylr` CLI provides a `--state` flag to inspect and visualize the generated state machine, making conflict debugging straightforward.

---

## Quick Start: Using the `rustylr` CLI

The recommended way to use RustyLR is via the standalone `rustylr` CLI executable. It offers faster compilation, comprehensive grammar diagnostics, and interactive tools for debugging state machines.

### 1. Add `rusty_lr` to your dependencies
Add the runtime library to your `Cargo.toml`. The generated parser will depend on it.

```toml
[dependencies]
rusty_lr = "..." # Ensure this matches the version of the CLI executable
```

### 2. Install the `rustylr` CLI
Install the command-line generator from crates.io using Cargo:

```bash
cargo install rustylr
```

### 3. Create a Grammar File
Create a grammar file, e.g., `src/grammar.rs`. Any Rust code placed *above* the `%%` delimiter is copied directly to the generated parser file. The section *below* `%%` defines directives and grammar rules.

```rust
// src/grammar.rs

#[derive(Debug, Clone, Copy)]
pub enum Token {
    Num(i32),
    Plus,
    Minus,
    Mul,
    Div,
    LParen,
    RParen,
}

%%

// Define the token type and the start symbol
%tokentype Token;
%start Expr;

// Declare operator precedence and associativity (lowest to highest)
%left plus minus;
%left mul div;

// Map grammar terminals to Token enum variants
%token num Token::Num(_);
%token plus Token::Plus;
%token minus Token::Minus;
%token mul Token::Mul;
%token div Token::Div;
%token lparen Token::LParen;
%token rparen Token::RParen;

// Production rules
// Expr(i32) means the non-terminal Expr returns an i32.
// In the action block `{ ... }`, reference RHS symbols by their names.
Expr(i32)
    : e1=Expr plus e2=Expr   { e1 + e2 }
    | e1=Expr minus e2=Expr  { e1 - e2 }
    | e1=Expr mul e2=Expr    { e1 * e2 }
    | e1=Expr div e2=Expr    { e1 / e2 }
    | lparen e=Expr rparen   { e }
    | num {
        if let Token::Num(val) = num {
            val
        } else {
            unreachable!()
        }
    }
    ;
```

### 4. Generate the Parser Code
Run the CLI to compile your grammar into a Rust module:

```bash
rustylr src/grammar.rs src/parser.rs
```

### 5. Parse a Token Stream
Initialize the state context with initial user data (or `with_default_userdata()` when the user data type implements `Default`), and feed your tokens to it:

```rust
// src/main.rs
mod parser;
use parser::Token;

fn main() {
    // Represents the expression: 3 + 4 * 2
    let tokens = vec![
        Token::Num(3),
        Token::Plus,
        Token::Num(4),
        Token::Mul,
        Token::Num(2),
    ];

    let mut context = parser::ExprContext::with_default_userdata();

    for token in tokens {
        if let Err(err) = context.feed(token) {
            eprintln!("Parse error: {}", err);
            return;
        }
    }

    match context.accept() {
        Ok((result, _userdata)) => {
            println!("Parsed result: {}", result); // Output: 11
        }
        Err(err) => {
            eprintln!("Failed to finalize parsing: {}", err);
        }
    }
}
```

> [!IMPORTANT]
> The version of the `rustylr` CLI executable must exactly match the version of the `rusty_lr` library in your `Cargo.toml`. Version mismatches may result in build failures.

---

## Generated Code Structure

The generated parser module contains several generated components tailored to your start symbol:
- **`<Start>Parser`**: A lightweight struct containing the static parsing tables. [(docs)]https://docs.rs/rusty_lr/latest/rusty_lr/parser/trait.Parser.html
- **`<Start>Context`**: A mutable state context that keeps track of the stack and parsed symbol values. [(LR docs)]https://docs.rs/rusty_lr/latest/rusty_lr/parser/deterministic/struct.Context.html [(GLR docs)]https://docs.rs/rusty_lr/latest/rusty_lr/parser/nondeterministic/struct.Context.html

The generated module also includes internal state, rule, and non-terminal types used by the runtime and debugging APIs.

### Feeding Tokens
You can feed terminal symbols either with or without location information:
```rust
// Basic feeding
context.feed(token);

// Location-aware feeding (requires %location in grammar)
context.feed_location(token, token_location);
```

---

## GLR Parsing

RustyLR provides native support for Generalized LR (GLR) parsing. When you add the `%glr;` directive to a grammar, RustyLR generates a non-deterministic parser that forks state branches upon encountering shift/reduce or reduce/reduce conflicts. This is particularly useful for ambiguous grammars or complex programming languages.

In GLR mode, user data is branch-local. When the parser forks, the current user data is cloned so each active branch owns and mutates an independent `UserData` value.

For more details, see [GLR.md](GLR.md).

---

## Error Handling and Conflict Resolution

RustyLR provides multiple tools to resolve grammar ambiguities and handle parsing failures:
- **Panic-Mode Error Recovery:** Use the special `error` terminal to catch and recover from syntax errors. Unlike Bison's blocking loop-based recovery, RustyLR incrementally discards and merges subsequent unexpected tokens into the `error` token's location span on each `feed()`, enabling reactive stream parsing and accurate diagnostic spans.
- **Operator Precedence:** Disambiguate expressions with `%left`, `%right`, and `%precedence` directives.
- **Advanced Reduce Rule Priority:** Resolve reduce/reduce conflicts with `%dprec` when precedence declarations are not enough.
- **Runtime Error Propagation:** Return custom `Err` payloads from reduce actions to signal semantic or parsing errors.

See [SYNTAX.md - Resolving Conflicts](SYNTAX.md#resolving-conflicts) for in-depth information.

---

## Location Tracking

Track input spans automatically across tokens and non-terminals to print helpful compiler errors:

```rust
Expr(i32)
    : e1=Expr '+' e2=Expr {
        println!("Span of e1: {:?}", @e1);
        println!("Span of e2: {:?}", @e2);
        println!("Span of this Expr: {:?}", @$); // @$ (or @0) represents the current non-terminal's span
        e1 + e2
    }
    | Expr error Expr {
        println!("Syntax error recovery span: {:?}", @error);
        0 // Fallback value
    }
    ;
```

See [SYNTAX.md - Location Tracking](SYNTAX.md#location-tracking) for configuration details.

---

## State Machine Debugging

You can inspect the generated parser states using the `--state` option. This outputs a color-coded state listing showing core items, lookahead sets, transitions, and conflict reports.

```bash
rustylr --state src/grammar.rs
```

![State Machine Debug](images/state_option.png)

The `<Start>Context` also offers inspection utilities that are useful while debugging a parser:

```rust
let mut context = ExprContext::with_default_userdata();

// ... feed tokens ...

context.expected_token();  // Returns the expected symbols for the current state
context.can_feed(&token);  // Checks if a terminal can be fed next
println!("{}", context);   // Formats the state tree (requires 'tree' feature)
```

---

## Examples

- [Calculator (enum tokens)]https://github.com/ehwan/RustyLR/blob/main/example/calculator/src/parser.rs: A numeric expression parser using custom token enums.
- [Calculator (u8 tokens)]https://github.com/ehwan/RustyLR/blob/main/example/calculator_u8/src/parser.rs: A byte-stream numeric calculator.
- [JSON Validator]https://github.com/ehwan/RustyLR/blob/main/example/json/src/parser.rs: A validator checking JSON syntax.
- [Lua 5.4 Parser]https://github.com/ehwan/lua_rust/blob/main/parser/src/parser.rs: A complete parser for the Lua 5.4 programming language.
- [C Parser]https://github.com/ehwan/C-language-Parser-In-Rust/blob/main/src/ast/parser_lr.rs: An LR-based parser for the C programming language.
- [Bootstrap Parser]https://github.com/ehwan/RustyLR/blob/main/rusty_lr_parser/src/parser/parser.rs: RustyLR's own grammar parser, written using RustyLR.

---

## Cargo Features

- **`build`**: Enables helper functions in `rusty_lr_buildscript` for compiling grammars inside `build.rs` scripts.
- **`tree`**: Enables automatic syntax tree rendering for debugging. Implementing `Display` for `Context` outputs a formatted parse tree.

---

## Grammar Syntax

RustyLR's syntax builds upon standard Yacc/Bison design but is optimized for Rust.

See [SYNTAX.md](SYNTAX.md) for the complete reference.

### Type Inference with `_`
You can omit explicit rule types using the `_` placeholder. RustyLR will infer the type based on identity rules and reduce actions:

```rust
Expr(_): Term;
```

If a circular dependency prevents inference, RustyLR will report a compilation error.

---

## Contributing

We welcome issues and pull requests!

### Project Structure

This repository is organized as a Cargo workspace:

- **`rusty_lr/`**: The main user-facing library. Add this to your `Cargo.toml`.
- **`rusty_lr_core/`**: The runtime engine, defining stack logic, deterministic parsing (`src/parser/deterministic`), and GLR parsing (`src/parser/nondeterministic`).
- **`rusty_lr_parser/`**: The grammar compilation engine. Parses RustyLR files, constructs parsing tables, and generates Rust output.
- **`rusty_lr_derive/`**: Procedural macro wrapper around `rusty_lr_parser`, providing the `lr1!` macro.
- **`rusty_lr_buildscript/`**: Helper API for running RustyLR in cargo build scripts.
- **`rusty_lr_executable/`**: The standalone `rustylr` CLI executable.
- **`scripts/`**: Automation, regression test suites, and helper scripts.

```mermaid
graph TD;
    subgraph User Facing
        rusty_lr;
        rusty_lr_executable;
    end

    subgraph Internal
        rusty_lr_derive;
        rusty_lr_buildscript;
        rusty_lr_parser;
        rusty_lr_core;
    end

    rusty_lr --> rusty_lr_core;
    rusty_lr --> rusty_lr_derive;
    rusty_lr --> rusty_lr_buildscript;

    rusty_lr_derive --> rusty_lr_parser;
    rusty_lr_buildscript --> rusty_lr_parser;
    
    rusty_lr_executable --> rusty_lr_buildscript;

    rusty_lr_parser --> rusty_lr_core;
```

### Versioning Policy
RustyLR separates its components into two parts:
1. The compiler CLI (`rustylr`)
2. The runtime library (`rusty_lr`)

To maintain Cargo compatibility, patch versions are incremented when changes are backwards-compatible (meaning previously generated parser files compile without errors with the new library version). If a change to the code generator requires updates to the runtime library API that break older generated code, a minor version bump is performed.

---

## License

Dual-licensed under either:
- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)