Crate rustlr

Rustlr is a parser generator that can create LALR(1) as well as full LR(1) parsers. It also recognizes operator precedence and associativity declarations, which allow the use of some ambiguous grammars. Parsers have optional access to external state information, which allows them to recognize more than just context-free languages. Rustlr implements methods of error recovery similar to those found in other LR generators. For error reporting, however, rustlr parsers can run in training mode: when a parse error is encountered, the human trainer can augment the parser’s state transition table with an appropriate error message. The augmented parser is automatically saved along with a training script that can be used to retrain a new parser after the grammar has been modified. See the RuntimeParser::parse_stdio_train and RuntimeParser::train_from_script functions.
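The idea behind training mode can be illustrated with a small, standalone sketch. It is not rustlr's implementation: the `ErrorTable` type and its methods are hypothetical names invented here, and the table is assumed to be keyed by a parser state number and the unexpected symbol.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for trained error messages; NOT rustlr's actual
/// data structure. Assumption: messages are keyed by (state, symbol).
#[derive(Default)]
pub struct ErrorTable {
    messages: HashMap<(usize, String), String>,
}

impl ErrorTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record the trainer's message for an unexpected `symbol` seen in `state`.
    pub fn train(&mut self, state: usize, symbol: &str, message: &str) {
        self.messages
            .insert((state, symbol.to_string()), message.to_string());
    }

    /// Return the trained message if one exists, else a generic fallback.
    pub fn report(&self, state: usize, symbol: &str) -> String {
        match self.messages.get(&(state, symbol.to_string())) {
            Some(message) => message.clone(),
            None => format!("unexpected symbol {}", symbol),
        }
    }
}
```

An untrained (state, symbol) pair falls back to the generic message, so training only ever refines the reports for errors a trainer has actually seen.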

Rustlr can generate a full LR(1) parser for the ANSI C grammar in approximately 2-4 seconds on contemporary processors.

A detailed tutorial is being prepared that will explain the format of grammars and how to generate and use parsers for several sample languages.

Rustlr should be installed as an executable (cargo install rustlr). Many of the items exported by this crate are only required by the generated parsers and are not intended to be used in other programs. The user needs to provide a grammar and a lexical analyzer that implements the Lexer trait. Only a simple lexer that returns the individual characters in a string (charlexer) is provided. The examples in the tutorial use basic_lexer, which was written by the same author, but other tokenizers, such as scanlex, can easily be adopted as well.
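As a rough illustration of what a character-level tokenizer does, the sketch below yields each non-whitespace character as its own token. The `TokenSource` trait, `CharTokens` type, and `collect_tokens` helper are hypothetical names made up for this example; rustlr's real Lexer trait and charlexer may be shaped differently.

```rust
/// Hypothetical tokenizer interface for illustration only;
/// rustlr's real `Lexer` trait may differ.
pub trait TokenSource {
    /// Return the next token, or `None` at end of input.
    fn next_token(&mut self) -> Option<String>;
}

/// Yields each non-whitespace character of the input as its own token.
pub struct CharTokens {
    chars: Vec<char>,
    position: usize,
}

impl CharTokens {
    pub fn new(input: &str) -> Self {
        CharTokens {
            chars: input.chars().collect(),
            position: 0,
        }
    }
}

impl TokenSource for CharTokens {
    fn next_token(&mut self) -> Option<String> {
        // Skip over any whitespace before the next token.
        while self.position < self.chars.len()
            && self.chars[self.position].is_whitespace()
        {
            self.position += 1;
        }
        // `?` returns None once the input is exhausted.
        let c = *self.chars.get(self.position)?;
        self.position += 1;
        Some(c.to_string())
    }
}

/// Drain any token source into a vector (works through a trait object).
pub fn collect_tokens(source: &mut dyn TokenSource) -> Vec<String> {
    let mut tokens = Vec::new();
    while let Some(token) = source.next_token() {
        tokens.push(token);
    }
    tokens
}
```

Because `collect_tokens` takes a trait object, any other tokenizer implementing the same (hypothetical) trait could be swapped in without changing the consumer.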

As a simplified, self-contained example of how to use rustlr, given this grammar with file name “brackets.grammar”,

 rustlr brackets.grammar lalr

generates an LALR parser as a Rust program. This program includes a ‘make_parser’ function, which can be used as in the included main. The program also contains a ‘load_extras’ function, which can be modified by interactive training to give error messages more helpful than the generic “unexpected symbol..”.

Another self-contained example is found here.

Re-exports

pub use lexer_interface::*;
pub use generic_absyn::*;
pub use runtime_parser::RuntimeParser;
pub use runtime_parser::RProduction;

Modules

Generic Abstract Syntax Support Module

Rustlr allows the use of any lexical analyzer (tokenizer) that satisfies the Lexer trait. Only a simple charlexer tokenizer that separates non-whitespace characters is provided as an example.

This module implements the parsing routines that use the state machine generated by rustlr. The main structure here is RuntimeParser. All parsing functions are organized around the RuntimeParser::parse_base function, which implements the basic LR parsing algorithm. This function expects dynamic Lexer and ErrHandler trait-objects. This module provides generic parsing and parser-training routines that use stdio for their interface, but the ErrHandler trait allows custom user interfaces to be built separately.
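For readers unfamiliar with the basic LR parsing algorithm, here is a minimal, self-contained sketch of the shift/reduce loop. It is not RuntimeParser::parse_base. The table is hand-built for a toy grammar chosen for this example (rule 1: S → ( S ), rule 2: S → x), and all names (`Action`, `action`, `goto_on_s`, `lr_accepts`) are hypothetical.

```rust
/// Parser actions for the toy grammar  (1) S -> ( S )   (2) S -> x.
enum Action {
    Shift(usize),  // push this state and consume the lookahead
    Reduce(usize), // reduce by this rule number
    Accept,
    Error,
}

/// Hand-built SLR action table; `None` as lookahead means end of input.
fn action(state: usize, lookahead: Option<char>) -> Action {
    match (state, lookahead) {
        (0, Some('(')) | (2, Some('(')) => Action::Shift(2),
        (0, Some('x')) | (2, Some('x')) => Action::Shift(3),
        (1, None) => Action::Accept,
        (3, Some(')')) | (3, None) => Action::Reduce(2),
        (4, Some(')')) => Action::Shift(5),
        (5, Some(')')) | (5, None) => Action::Reduce(1),
        _ => Action::Error,
    }
}

/// Goto table for the only nonterminal, S.
fn goto_on_s(state: usize) -> Option<usize> {
    match state {
        0 => Some(1),
        2 => Some(4),
        _ => None,
    }
}

/// Number of right-hand-side symbols in each rule.
fn rhs_len(rule: usize) -> usize {
    if rule == 1 { 3 } else { 1 }
}

/// Run the LR loop over `input`, ignoring whitespace; true if it parses.
pub fn lr_accepts(input: &str) -> bool {
    let tokens: Vec<char> = input.chars().filter(|c| !c.is_whitespace()).collect();
    let mut stack = vec![0usize]; // stack of parser states
    let mut pos = 0;
    loop {
        let state = *stack.last().expect("state stack is never empty");
        match action(state, tokens.get(pos).copied()) {
            Action::Shift(next) => {
                stack.push(next);
                pos += 1;
            }
            Action::Reduce(rule) => {
                // Pop one state per right-hand-side symbol, then follow goto.
                let new_len = stack.len() - rhs_len(rule);
                stack.truncate(new_len);
                let top = *stack.last().expect("state stack is never empty");
                match goto_on_s(top) {
                    Some(next) => stack.push(next),
                    None => return false,
                }
            }
            Action::Accept => return true,
            Action::Error => return false,
        }
    }
}
```

A generated parser differs mainly in that the tables are produced from the grammar and each reduction also runs a semantic action; the control loop has the same shape.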

Macros

macro for downcasting LBox<dyn Any> to a concrete type. Must be called from within the semantic actions of grammar productions. Warning: unwrap is called within the macro, so a failed downcast panics.

similar to lbdown, but also extracts the boxed expression. Should be used for non-copyable LBoxed values.

macro for creating LBox<dyn Any> structures that can encapsulate any type as abstract syntax. Must be called from within the semantic actions of a grammar production rule, as it calls the RuntimeParser::lb function to insert the lexical line/column/src information into the LBox.
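The downcasting these macros perform can be sketched with the standard library alone. The example below uses a plain Box<dyn Any> rather than rustlr's LBox, and the helper names (`erase`, `downcast_ref_or_panic`, `downcast_owned`) are hypothetical; it only shows the difference between borrowing a downcast value and moving it out of the box, and why a wrong type leads to a panic.

```rust
use std::any::Any;

/// Erase a concrete type behind `Box<dyn Any>` (analogous in spirit to
/// wrapping a value as abstract syntax; plain `Box`, not rustlr's `LBox`).
pub fn erase<T: 'static>(value: T) -> Box<dyn Any> {
    Box::new(value)
}

/// Borrowing downcast. Panics if the boxed value is not a `T`,
/// mirroring the "unwrap is called" warning.
pub fn downcast_ref_or_panic<T: 'static>(value: &Box<dyn Any>) -> &T {
    value
        .downcast_ref::<T>()
        .expect("boxed value has a different concrete type")
}

/// Consuming downcast: moves the value out of the box, which is what
/// non-Copy types such as `String` need. Panics on a type mismatch.
pub fn downcast_owned<T: 'static>(value: Box<dyn Any>) -> T {
    *value
        .downcast::<T>()
        .expect("boxed value has a different concrete type")
}
```

Code that cannot guarantee the concrete type should call `downcast_ref` or `downcast` directly and handle the `None` or `Err` case instead of panicking.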

Enums

this enum is only exported because it’s used by the generated parsers. There is no reason to use it in other programs.

Functions

this function is only exported because it’s used by the generated parsers.

this is the only function that can invoke the parser generator externally, without running rustlr (rustlr::main) directly. It expects to find a file of the form grammarname.grammar. The option argument can currently only be “lr1” or “lalr”. It generates a parser in a file named grammarnameparser.rs.
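The naming convention and option values described above can be captured in a few lines. This is only a sketch of the convention as stated here; `TableKind`, `parse_table_kind`, `grammar_file_name`, and `parser_file_name` are hypothetical helpers, not part of the rustlr API.

```rust
/// The two table-construction options named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableKind {
    Lr1,
    Lalr,
}

/// Map the option string to a table kind; anything else is rejected.
pub fn parse_table_kind(option: &str) -> Option<TableKind> {
    match option {
        "lr1" => Some(TableKind::Lr1),
        "lalr" => Some(TableKind::Lalr),
        _ => None,
    }
}

/// Input file expected for a grammar called `name`.
pub fn grammar_file_name(name: &str) -> String {
    format!("{}.grammar", name)
}

/// Output file produced for a grammar called `name`.
pub fn parser_file_name(name: &str) -> String {
    format!("{}parser.rs", name)
}
```

For the brackets example, this convention gives brackets.grammar as the input and bracketsparser.rs as the generated file.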