rusty_lr
A Bison-like parser generator & compiler frontend for Rust supporting IELR(1), LALR(1) parser tables, with deterministic LR and non-deterministic LR (GLR) parsing.
RustyLR is a parser generator that converts context-free grammars into IELR(1)/LALR(1) tables, with both deterministic LR and non-deterministic GLR parsing strategies. It supports custom reduce actions written in Rust and reports detailed diagnostics for grammar conflicts. Inspired by tools like bison, it uses a similar syntax while integrating with Rust's ecosystem, and it optimizes the generated state machines to keep parser tables small and parsing efficient.

Features
- Custom Reduce Actions: Define custom actions in Rust, allowing you to build custom data structures easily.
- Automatic Optimization: Reduces parser table size and improves performance by grouping terminals with identical behavior across parser states.
- Multiple Parsing Strategies: Supports minimal-LR(1), LALR(1) parser tables, and GLR parsing strategy.
- Detailed Diagnostics: Reports grammar conflicts and gives verbose output for the conflict resolution and optimization stages.
- Static & Runtime Conflict Resolution: Provides mechanisms to resolve conflicts at compile time or runtime.
- Location Tracking: Tracks the location of every token in the parse tree, useful for error reporting and debugging.
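The "Automatic Optimization" item above can be illustrated with a minimal sketch. It assumes a toy, rectangular action table indexed as `table[state][terminal]`; the `Action` enum and `group_terminals` function are hypothetical names chosen for this example and are not RustyLR's internal types. Terminals whose column of actions is identical in every state fall into the same class, so the table only needs one column per class.

```rust
use std::collections::BTreeMap;

/// Hypothetical action type for a toy parser table (not RustyLR's internal type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Action {
    Shift(usize),
    Reduce(usize),
    Error,
}

/// Groups terminals whose action is identical in every state.
///
/// `table[state][terminal]` is the action taken in `state` on `terminal`.
/// Assumes every row has the same length. Classes are returned as lists of
/// terminal indices, ordered by their column of actions.
pub fn group_terminals(table: &[Vec<Action>]) -> Vec<Vec<usize>> {
    let num_terminals = table.first().map_or(0, |row| row.len());
    let mut classes: BTreeMap<Vec<Action>, Vec<usize>> = BTreeMap::new();
    for terminal in 0..num_terminals {
        // The column of actions for this terminal acts as its signature.
        let column: Vec<Action> = table.iter().map(|row| row[terminal]).collect();
        classes.entry(column).or_default().push(terminal);
    }
    classes.into_values().collect()
}
```

In a two-state table where terminals 0 and 1 always trigger the same action, they end up in one class and terminal 2 in another, so three columns shrink to two.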
Installation & Usage
Add RustyLR to your Cargo.toml:
[dependencies]
rusty_lr = "..."
To work with rusty_lr, you need to generate parser code using one of the following methods:
- Executable: Use the standalone rustylr executable to generate parser code
- Build script: Enable the build feature and generate parser code during the build process
  [build-dependencies]
  rusty_lr = { version = "...", features = ["build"] }
- Procedural macros: Use the built-in lr1! macro
Recommendation: Use the rustylr executable. It is faster and provides helpful grammar diagnostics, as well as commands for debugging state machines directly.
Important: Ensure the generated code targets the same version of rusty_lr as the one in your Cargo.toml. Otherwise, you may encounter build errors.
Using Procedural Macros
Define your grammar using the lr1! macro:
// This defines an `EParser` struct where `E` is the start symbol
lr1!
This defines a simple arithmetic expression parser that can handle expressions like 2 + 3 * 4.
Using Build Script
For complex grammars, you can use a build script to generate the parser. This approach provides more detailed error messages when conflicts occur.
1. Create a grammar file (e.g., src/parser.rs) with the following content:
// Rust code: `use` statements and type definitions
use std::collections::HashMap;
%% // Grammar definition starts here
%tokentype MyToken;
%start E;
%token id Identifier;
%token num Number;
E: id
| num
;
2. Set up build.rs:
// build.rs
use build;
3. Include the generated source code:
include!;
4. Use the parser in your code:
let parser = new; // Create <StartSymbol>Parser instance
let mut context = new; // Create <StartSymbol>Context instance
let mut userdata: i32 = 0;
for token in tokens
// Get the final parsed result
let result: i32 = context.accept.unwrap;
Using the rustylr Executable
See the Executable Documentation for more details.
Generated Code Structure
The generated code will include several structs and enums:
- <Start>Parser: A struct that holds the parser table. (LR docs) (GLR docs)
- <Start>Context: A struct that maintains the current parsing state and symbol values. (LR docs) (GLR docs)
- <Start>State: A type representing a parser state and its associated table.
- <Start>Rule: A type representing a production rule. (docs)
- <Start>NonTerminals: An enum representing all non-terminal symbols in the grammar. (docs)
Working with Context
You can also get contextual information from the <Start>Context struct:
let mut context = new;
// ... parsing ...
context.expected_token; // Get expected (terminal, non-terminal) symbols for current state
context.can_feed; // Check if a terminal symbol can be fed
context.trace; // Get all `%trace` non-terminals currently being parsed
println!; // Print backtrace of the parser state
println!; // Print tree structure of the parser state (`tree` feature)
The Feed Method
The generated code includes a feed method that processes tokens:
context.feed; // Feed a terminal symbol and update the state machine
context.feed_location; // Feed a terminal symbol with location tracking
Each call returns Ok(()) if the token was successfully parsed, or an Err if the token could not be accepted.
Note: The actual method signatures differ slightly when building a GLR parser.
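To illustrate the push-style shape of this API (the caller feeds one token at a time and checks the result), here is a minimal, hand-written sketch for balanced parentheses. `ParenContext`, `FeedError`, and `can_accept` are hypothetical names for this example only. This is not code generated by RustyLR, and the generated `feed` takes different arguments.

```rust
/// Hypothetical push-style context for balanced parentheses.
/// Hand-written for illustration; not generated by RustyLR.
#[derive(Debug, Default)]
pub struct ParenContext {
    depth: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FeedError {
    UnexpectedClose,
    InvalidToken(char),
}

impl ParenContext {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes one token and updates the state.
    /// On error the state is left unchanged, so the caller may recover.
    pub fn feed(&mut self, token: char) -> Result<(), FeedError> {
        match token {
            '(' => {
                self.depth += 1;
                Ok(())
            }
            ')' if self.depth > 0 => {
                self.depth -= 1;
                Ok(())
            }
            ')' => Err(FeedError::UnexpectedClose),
            other => Err(FeedError::InvalidToken(other)),
        }
    }

    /// Returns true if the tokens fed so far form a complete input.
    pub fn can_accept(&self) -> bool {
        self.depth == 0
    }
}
```

Because the caller drives the loop, it can stop at the first `Err`, report it, or try a different token without unwinding a recursive parse.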
GLR Parsing
RustyLR offers built-in support for Generalized LR (GLR) parsing, enabling it to handle ambiguous or nondeterministic grammars that traditional LR(1) or LALR(1) parsers cannot process. See GLR.md for details.
Error Handling and Conflict Resolution
RustyLR provides multiple mechanisms for handling semantic errors and resolving conflicts during parsing:
- Panic Mode Error Recovery: Use the error token for panic-mode error recovery
- Operator Precedence: Set precedence with %left, %right, %precedence for terminals
- Reduce Rule Priority: Set priority with %dprec for production rules
- Runtime Errors: Return Err from reduce actions to handle semantic errors
See SYNTAX.md - Resolving Conflicts for detailed information.
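As a rough illustration of how operator precedence can settle a shift/reduce conflict, the sketch below follows the conventional Yacc/Bison rule: compare the precedence of the rule being reduced with that of the lookahead terminal, and fall back to associativity on a tie. The `Assoc`, `Resolution`, and `resolve` names are hypothetical, and RustyLR's exact behavior is specified in SYNTAX.md rather than here.

```rust
use std::cmp::Ordering;

/// Hypothetical associativity kinds, mirroring %left, %right and %precedence.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Assoc {
    Left,
    Right,
    PrecedenceOnly,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    Shift,
    Reduce,
    Unresolved,
}

/// Resolves a shift/reduce conflict in the conventional Yacc/Bison way.
/// A higher level binds tighter. `assoc` is the associativity shared by
/// operators at the tied level and is only consulted when the levels are equal.
pub fn resolve(rule_level: u32, lookahead_level: u32, assoc: Assoc) -> Resolution {
    match lookahead_level.cmp(&rule_level) {
        // The lookahead binds tighter: keep shifting.
        Ordering::Greater => Resolution::Shift,
        // The rule binds tighter: reduce first.
        Ordering::Less => Resolution::Reduce,
        Ordering::Equal => match assoc {
            Assoc::Left => Resolution::Reduce,
            Assoc::Right => Resolution::Shift,
            Assoc::PrecedenceOnly => Resolution::Unresolved,
        },
    }
}
```

For `2 + 3 * 4`, with `+` at level 1 and `*` at level 2, the parser has `2 + 3` on the stack and sees `*`: `resolve(1, 2, Assoc::Left)` yields `Shift`, so the multiplication is grouped first.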
Location Tracking
Track the location of tokens and non-terminals for better error reporting and debugging:
Expr: exp1=Expr '+' exp2=Expr
| Expr error Expr
See SYNTAX.md - Location Tracking for detailed information.
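One common way to derive a non-terminal's location is to take the smallest span that covers the locations of its children. The sketch below shows that idea with a hypothetical `Span` type and `merge_spans` helper. It is an assumption made for illustration and does not reproduce RustyLR's own location interface, which is described in SYNTAX.md.

```rust
/// Hypothetical byte-offset span, half-open: [start, end).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Returns the smallest span covering every child span,
/// or `None` for an empty production with no children.
pub fn merge_spans(children: &[Span]) -> Option<Span> {
    let start = children.iter().map(|s| s.start).min()?;
    let end = children.iter().map(|s| s.end).max()?;
    Some(Span { start, end })
}
```

For a rule such as `Expr '+' Expr`, merging the spans of the three children gives the span of the whole expression, which an error message can then underline.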
Examples
- Calculator (enum version): A numeric expression parser using custom token enums
- Calculator (u8 version): A numeric expression parser using byte tokens
- JSON Validator: A JSON syntax validator
- Lua 5.4 syntax parser: A complete Lua language parser
- C language parser: A C language parser
- Bootstrap parser: RustyLR's own syntax parser is written in RustyLR itself
Cargo Features
- build: Enables build script tools for generating parsers at compile time.
- tree: Enables automatic syntax tree construction for debugging purposes. Makes Context implement Display for pretty-printing.
Grammar Syntax
RustyLR's grammar syntax is inspired by traditional Yacc/Bison formats. See SYNTAX.md for detailed grammar definition syntax.
Contributing
Contributions are welcome! Please feel free to open an issue or submit a pull request.
Project Structure
This project is organized as a Cargo workspace with the following crates:
- rusty_lr/: The main end-user library that provides the public API. This is what users add to their Cargo.toml.
- rusty_lr_core/: Core parsing engine containing the fundamental data structures, algorithms, and runtime components for both deterministic (src/parser/deterministic) and non-deterministic (src/parser/nondeterministic) parsing.
- rusty_lr_parser/: The main code generation engine that parses RustyLR's grammar syntax, builds parser tables, and generates the actual parser code. This is the core of the parser generation process.
- rusty_lr_derive/: Procedural macro interface that wraps rusty_lr_parser to provide the lr1! macro for inline grammar definitions.
- rusty_lr_buildscript/: Build script interface that wraps rusty_lr_parser for generating parser code at compile time when using the build feature.
- rusty_lr_executable/: Standalone rustylr executable for command-line parser generation.
- scripts/: Development and testing scripts
The crates have the following dependency relationships:
- rusty_lr depends on rusty_lr_core, rusty_lr_derive, and rusty_lr_buildscript (optional)
- rusty_lr_derive and rusty_lr_buildscript depend on rusty_lr_parser
- rusty_lr_parser depends on rusty_lr_core
- rusty_lr_executable depends on rusty_lr_buildscript
About the Versioning
RustyLR consists of two big parts:
- executable (rustylr), the code generator
- runtime (rusty_lr), the main library
Because Cargo automatically picks the latest patch release within a major.minor.patch version of a crate, we increase the patch number only when previously generated code remains compatible with the runtime. That is, for users who use the executable-generated code directly rather than the build script or procedural macros, any change that could cause compile errors with previously generated code results in a minor version bump.
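Under this policy, generated code and the runtime should be compatible whenever their major and minor numbers match. A minimal sketch of that check follows. The `parse_version` and `generated_code_compatible` helpers are hypothetical and not part of RustyLR, and the version strings used with them are made-up examples.

```rust
/// Parses a plain "major.minor.patch" string; returns `None` if malformed.
pub fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Returns `Some(true)` when the generator and runtime versions share the
/// same major.minor pair, which is the compatibility rule described above.
/// Returns `None` if either version string cannot be parsed.
pub fn generated_code_compatible(generator: &str, runtime: &str) -> Option<bool> {
    let (g_major, g_minor, _) = parse_version(generator)?;
    let (r_major, r_minor, _) = parse_version(runtime)?;
    Some(g_major == r_major && g_minor == r_minor)
}
```

A check like this could run in a build script to fail early with a clear message instead of a wall of compile errors from mismatched generated code.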
License
This project is dual-licensed under either of the following licenses, at your option:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)