
Rustlr is a Yacc-style parser generator in and for Rust, designed for the creation of parsers for programming language analysis. Version 0.2.0 introduces significant improvements, although older parsers are still supported.

Rustlr can create LALR(1) as well as full LR(1) parsers. It also recognizes operator precedence and associativity declarations, which allow the use of some ambiguous grammars. Parsers also have optional access to external state information that allows them to recognize more than just context-free languages. Rustlr implements methods of error recovery similar to those found in other LR generators. For error reporting, however, rustlr parsers can run in training mode: when a parse error is encountered, the human trainer can augment the parser’s state transition table with an appropriate error message. The augmented parser is automatically saved along with a training script that can be used to retrain a new parser after the grammar has been modified. See the ZCParser::parse_train and ZCParser::train_from_script functions.
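To illustrate how precedence and associativity declarations let a generator accept an ambiguous grammar, the following is a minimal sketch of the conventional Yacc-style rule for resolving a shift-reduce conflict. The names Assoc, Resolution and resolve are hypothetical and are not part of the rustlr API; rustlr's own resolution logic may differ in detail.

```rust
use std::cmp::Ordering;

/// Associativity of an operator, as declared in a Yacc-style grammar.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Assoc {
    Left,
    Right,
    NonAssoc,
}

/// Outcome of resolving a shift-reduce conflict.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Resolution {
    Shift,
    Reduce,
    Error,
}

/// Hypothetical helper (not rustlr's code): choose between reducing by a
/// rule and shifting the lookahead terminal.
///
/// `rule_prec` is the precedence of the rule (conventionally that of its
/// last terminal), `look_prec` is the precedence of the lookahead, and
/// `look_assoc` is the lookahead's declared associativity. A higher
/// number binds more tightly.
pub fn resolve(rule_prec: u32, look_prec: u32, look_assoc: Assoc) -> Resolution {
    match rule_prec.cmp(&look_prec) {
        // The rule binds tighter than the incoming operator: reduce first.
        Ordering::Greater => Resolution::Reduce,
        // The incoming operator binds tighter: keep shifting.
        Ordering::Less => Resolution::Shift,
        // Same level: associativity breaks the tie.
        Ordering::Equal => match look_assoc {
            Assoc::Left => Resolution::Reduce,
            Assoc::Right => Resolution::Shift,
            Assoc::NonAssoc => Resolution::Error,
        },
    }
}
```

With `*` at level 2 and `+` at level 1, a parser that has seen `a * b` and looks ahead at `+` reduces, so `a * b + c` groups as `(a * b) + c`.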

Rustlr can generate a full LR(1) parser from the ANSI C grammar in approximately 2-4 seconds on contemporary processors.

The user needs to provide a grammar and a lexical analyzer that implements the Tokenizer trait. Since Version 0.2.0, rustlr contains a general-purpose lexical scanner, StrTokenizer, that implements this trait and is good enough to “get the job done” in many cases. However, the user can choose any tokenizer by adapting it to the Tokenizer trait. Please note that, although nothing prevents it, rustlr was not designed to create parsers for binary formatted data. Rather, it is designed to parse text, and specifically programming language syntax for compilation and analysis.
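The idea of adapting an existing lexer to a tokenizer trait can be sketched as follows. This is only an illustration of the adapter pattern: the TokenSource trait, the Token struct and the WhitespaceAdapter type are hypothetical stand-ins and do not reproduce the actual signature of rustlr's Tokenizer trait, which should be taken from the lexer_interface module.

```rust
/// Hypothetical token type: a grammar-symbol name plus the matched text.
#[derive(Debug, PartialEq)]
pub struct Token<'t> {
    pub symbol: &'static str,
    pub text: &'t str,
}

/// Hypothetical stand-in for a tokenizer trait.
/// This is NOT the real signature of rustlr's `Tokenizer`.
pub trait TokenSource<'t> {
    /// Return the next token, or `None` at end of input.
    fn next_token(&mut self) -> Option<Token<'t>>;
}

/// Adapter around a pre-existing "lexer", here simply the standard
/// library's whitespace splitter.
pub struct WhitespaceAdapter<'t> {
    words: std::str::SplitWhitespace<'t>,
}

impl<'t> WhitespaceAdapter<'t> {
    pub fn new(input: &'t str) -> Self {
        WhitespaceAdapter { words: input.split_whitespace() }
    }
}

impl<'t> TokenSource<'t> for WhitespaceAdapter<'t> {
    fn next_token(&mut self) -> Option<Token<'t>> {
        // Pull the next word from the wrapped lexer; stop at end of input.
        let text = self.words.next()?;
        // Map the raw text to the terminal-symbol name a grammar would use.
        let symbol = match text {
            "(" => "LPAREN",
            ")" => "RPAREN",
            t if t.chars().all(|c| c.is_ascii_digit()) => "NUM",
            _ => "ID",
        };
        Some(Token { symbol, text })
    }
}
```

The adapter owns the wrapped lexer and only translates its output, so the parser side depends on the trait alone and the underlying tokenizer can be swapped without touching the grammar.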

A detailed tutorial is being prepared that will explain the format of grammars and how to generate and deploy parsers for several examples.

Rustlr should be installed as an executable (cargo install rustlr). Many of the items exported by this crate are only required by the parsers that are generated, and are not intended to be used in other programs. However, rustlr uses traits and trait objects to loosely couple the various components of the runtime parser so that custom interfaces, such as those for graphical IDEs, can be built around a basic ZCParser::parse_core function.

As a simplified, self-contained example of how to use rustlr, given this grammar with file name “brackets.grammar”,

 rustlr brackets.grammar lalr

generates an LALR parser as a Rust program. This program includes a ‘make_parser’ function, which can be used as in the included main. The program also contains a ‘load_extras’ function, which can be modified by interactive training to give error messages that are more helpful than the generic “unexpected symbol..”.

Another self-contained example is found here.

Re-exports

pub use lexer_interface::*;
pub use generic_absyn::*;
pub use runtime_parser::RuntimeParser;
pub use runtime_parser::RProduction;
pub use zc_parser::ZCParser;
pub use zc_parser::ZCRProduction;

Modules

Generic Abstract Syntax Support Module

Rustlr allows the use of any lexical analyzer (tokenizer) that satisfies the Tokenizer trait. However, a basic tokenizer, StrTokenizer, is provided that suffices for many examples. This tokenizer is not maximally efficient (not single-pass) as it uses regex.

This module is deprecated by the crate::zc_parser module and is only kept for compatibility with existing parsers.

This module implements a zero-copy version of the runtime parser that uses the LR state machine generated by rustlr. It will (for now) live side by side with the original parser implemented as crate::RuntimeParser.

Macros

macro for downcasting LBox<dyn Any> to LBox<A> for some concrete type A. Must be called from within the semantic actions of grammar productions. Warning: unwrap is called within the macro, so it panics if the boxed value is not of type A.
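The downcasting that this macro performs, and the consequence of the unwrap warning, can be illustrated with the standard library alone. The sketch below uses std's Box<dyn Any> rather than rustlr's LBox, and the Num type and both helper functions are hypothetical; it shows the general technique, not the macro's expansion.

```rust
use std::any::Any;

/// Hypothetical abstract-syntax node used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Num(pub i64);

/// Fallible downcast: yields `None` when the boxed value is not a `Num`.
pub fn as_num(value: Box<dyn Any>) -> Option<Box<Num>> {
    value.downcast::<Num>().ok()
}

/// Downcast followed by `unwrap`: panics when the boxed value is not a
/// `Num`. A macro that calls `unwrap` internally behaves like this.
pub fn expect_num(value: Box<dyn Any>) -> Box<Num> {
    value.downcast::<Num>().unwrap()
}
```

Inside a semantic action the concrete type is normally known from the grammar, which is why an unwrapping downcast is usually acceptable there; outside that setting the fallible form is the safer choice.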

similar to lbdown, but also extracts the boxed expression

macro for creating LBox<dyn Any> structures that can encapsulate any type as abstract syntax. Must be called from within the semantic actions of a grammar production rule as it calls the RuntimeParser::lb function to insert the lexical line/column/src information into the LBox.

macro for creating an LBox from a crate::StackedItem ($si) popped from the parse stack; should be called from within the semantic actions of a grammar to accurately encode lexical information.

similar to makelbox but creates an LRc from lexical information inside stack item $si

just extract value from LBox

Enums

this enum is only exported because it’s used by the generated parsers. There is no reason to use it in other programs.

Functions

this function is only exported because it’s used by the generated parsers.

this is the only function that can invoke the parser generator externally, without running rustlr (rustlr::main) directly. It expects to find a file of the form grammarname.grammar. The option argument can currently only be “lr1” or “lalr”. It generates a parser in a file named grammarnameparser.rs.
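The file-naming convention and the accepted option values described above can be written out as a small sketch. The three helper functions are hypothetical and are not exported by rustlr; they only restate the convention in code.

```rust
/// Hypothetical helper: the grammar file the generator expects to find.
pub fn grammar_file_name(grammar_name: &str) -> String {
    format!("{}.grammar", grammar_name)
}

/// Hypothetical helper: the file the generated parser is written to.
pub fn parser_file_name(grammar_name: &str) -> String {
    format!("{}parser.rs", grammar_name)
}

/// Hypothetical helper: the option values the description lists.
pub fn is_valid_option(option: &str) -> bool {
    matches!(option, "lr1" | "lalr")
}
```

For the grammar name brackets, these give brackets.grammar as the input file and bracketsparser.rs as the output file.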