
Rustlr is an LR(1)/LALR(1) parser generator for Rust. Advanced features include:

  1. Option to automatically generate the AST datatypes and semantic actions, with manual overrides possible
  2. Ability to train the parser interactively for better error reporting
  3. Access to external state, which allows parsers to go beyond context-free grammars (CFGs)
  4. Support for using *, ?, +, and the “unexpected token” wildcard _ in grammar productions.
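Operators such as *, + and ? in a production are typically handled by rewriting them into plain BNF productions over fresh nonterminals. The sketch below shows one common left-recursive desugaring. The Production type, the desugar function, and fresh names such as X_star are hypothetical and are not necessarily what rustlr generates internally; the _ wildcard is not covered.

```rust
/// A plain BNF production: left-hand nonterminal and right-hand symbols.
#[derive(Debug, Clone, PartialEq)]
struct Production {
    lhs: String,
    rhs: Vec<String>,
}

fn prod(lhs: &str, rhs: &[&str]) -> Production {
    Production {
        lhs: lhs.to_string(),
        rhs: rhs.iter().map(|s| s.to_string()).collect(),
    }
}

/// Rewrite `sym op` (op is '*', '+' or '?') into productions for a fresh
/// nonterminal. Left recursion keeps an LR parse stack shallow on long lists.
fn desugar(sym: &str, op: char) -> Option<(String, Vec<Production>)> {
    match op {
        '*' => {
            // X_star -> (empty) | X_star X
            let nt = format!("{}_star", sym);
            let ps = vec![prod(&nt, &[]), prod(&nt, &[nt.as_str(), sym])];
            Some((nt, ps))
        }
        '+' => {
            // X_plus -> X | X_plus X
            let nt = format!("{}_plus", sym);
            let ps = vec![prod(&nt, &[sym]), prod(&nt, &[nt.as_str(), sym])];
            Some((nt, ps))
        }
        '?' => {
            // X_opt -> (empty) | X
            let nt = format!("{}_opt", sym);
            let ps = vec![prod(&nt, &[]), prod(&nt, &[sym])];
            Some((nt, ps))
        }
        _ => None,
    }
}

fn main() {
    for op in ['*', '+', '?'] {
        let (nt, prods) = desugar("Expr", op).unwrap();
        println!("Expr{} becomes {}:", op, nt);
        for p in &prods {
            let rhs = if p.rhs.is_empty() { "(empty)".to_string() } else { p.rhs.join(" ") };
            println!("  {} -> {}", p.lhs, rhs);
        }
    }
}
```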

A tutorial is separately available that will explain the format of grammars and how to generate and deploy parsers for several examples.

Recent Updates: Version 0.2.3 added the ability to automatically generate a usable lexical scanner from a minimal set of specifications. Version 0.2.5 added the ability for semantic actions to return values of different types. Version 0.2.8 introduced the automatic generation of abstract syntax datatypes and the corresponding semantic actions. Version 0.2.9 introduced support for *, + and ? expressions, as well as an experimental wildcard expression.

In addition to creating LR/LALR state machines using the classic algorithms, rustlr is capable of recognizing operator precedence and associativity declarations that allow the use of ambiguous grammars. Parsers also have optional access to external state information that allows them to recognize more than just context-free languages. Rustlr implements methods of error recovery similar to those found in other LR generators. For error reporting, however, rustlr parsers can run in training mode: when a parse error is encountered, the human trainer can augment the parser’s state transition table with an appropriate error message. The augmented parser is automatically saved along with a training script that can be used to retrain a new parser after a grammar has been modified. See the ZCParser::parse_train and ZCParser::train_from_script functions.
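Precedence and associativity declarations are conventionally used to settle shift/reduce conflicts that an ambiguous grammar produces in the LR table. The sketch below encodes the classic yacc-style rule as a general illustration, not as rustlr's own code: compare the precedence of the production that could be reduced with that of the lookahead token, and fall back on associativity when they tie. The Assoc and Action types and the numeric precedence levels are hypothetical.

```rust
/// Associativity declared for a precedence level (hypothetical encoding).
#[derive(Debug, Clone, Copy)]
enum Assoc {
    Left,
    Right,
    NonAssoc,
}

#[derive(Debug, PartialEq)]
enum Action {
    Shift,
    Reduce,
    Error,
}

/// Yacc-style resolution of a shift/reduce conflict: the higher precedence
/// wins; on a tie, the associativity of that level decides.
fn resolve(rule_prec: u32, lookahead_prec: u32, assoc: Assoc) -> Action {
    if lookahead_prec > rule_prec {
        Action::Shift
    } else if lookahead_prec < rule_prec {
        Action::Reduce
    } else {
        match assoc {
            Assoc::Left => Action::Reduce,    // a - b - c parses as (a - b) - c
            Assoc::Right => Action::Shift,    // a = b = c parses as a = (b = c)
            Assoc::NonAssoc => Action::Error, // a < b < c is rejected
        }
    }
}

fn main() {
    // Hypothetical levels: '+' is 1 (left), '*' is 2 (left).
    // The state contains the item E -> E + E . (rule precedence taken from '+').
    println!("lookahead '*': {:?}", resolve(1, 2, Assoc::Left)); // Shift
    println!("lookahead '+': {:?}", resolve(1, 1, Assoc::Left)); // Reduce
}
```

For the ambiguous grammar E -> E + E | E * E with * binding tighter, this rule makes the parser shift on * (so multiplication groups first) and reduce on + (so addition groups to the left).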

The parser generator can also generate a lexical scanner using the built-in StrTokenizer from a minimal set of declarations. The generated scanner is “zero-copy” and good enough for processing most text. However, any scanner can be used by adapting it to the Tokenizer trait. A separate chapter of the tutorial will be devoted to manually adapting a tokenizer.
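To illustrate what “zero-copy” means and what adapting a scanner to a trait involves, here is a hand-written lexer whose tokens are borrowed slices of the input string rather than newly allocated Strings. The SimpleTokenizer trait and the Token type are hypothetical stand-ins; rustlr's actual Tokenizer trait has a different signature, so a real adapter would implement that trait instead.

```rust
/// Tokens borrow their text from the input (lifetime 't); nothing is copied.
#[derive(Debug, PartialEq)]
enum Token<'t> {
    Num(&'t str),
    Ident(&'t str),
    Sym(&'t str),
}

/// Hypothetical tokenizer interface, standing in for rustlr's Tokenizer trait.
trait SimpleTokenizer<'t> {
    fn next_token(&mut self) -> Option<Token<'t>>;
}

struct ZeroCopyLexer<'t> {
    input: &'t str,
    pos: usize, // byte offset of the next unread character
}

impl<'t> ZeroCopyLexer<'t> {
    fn new(input: &'t str) -> Self {
        ZeroCopyLexer { input, pos: 0 }
    }
}

impl<'t> SimpleTokenizer<'t> for ZeroCopyLexer<'t> {
    fn next_token(&mut self) -> Option<Token<'t>> {
        let input = self.input; // copy the &'t str so slices outlive &mut self
        let bytes = input.as_bytes();
        // Skip ASCII whitespace.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return None;
        }
        let start = self.pos;
        let first = bytes[start];
        if first.is_ascii_digit() {
            while self.pos < bytes.len() && bytes[self.pos].is_ascii_digit() {
                self.pos += 1;
            }
            Some(Token::Num(&input[start..self.pos]))
        } else if first.is_ascii_alphabetic() || first == b'_' {
            while self.pos < bytes.len()
                && (bytes[self.pos].is_ascii_alphanumeric() || bytes[self.pos] == b'_')
            {
                self.pos += 1;
            }
            Some(Token::Ident(&input[start..self.pos]))
        } else {
            // Any other character is a one-character symbol; step over its
            // full UTF-8 width so slicing stays on a char boundary.
            let ch = input[start..].chars().next().unwrap();
            self.pos += ch.len_utf8();
            Some(Token::Sym(&input[start..self.pos]))
        }
    }
}

fn main() {
    let src = "x1 = 42 + y";
    let mut lexer = ZeroCopyLexer::new(src);
    while let Some(tok) = lexer.next_token() {
        println!("{:?}", tok);
    }
}
```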

Rustlr should be installed as an executable (cargo install rustlr). Many of the items exported by this crate are only required by the parsers that are generated, and are not intended to be used in other programs. However, rustlr uses traits and trait objects to loosely couple the various components of the runtime parser, so that custom interfaces, such as those for graphical IDEs, can be built around a basic ZCParser::parse_core function.
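The loose coupling described above can be pictured with a small sketch: a core routine reports errors only through a trait object, so a command-line front end and an IDE-style collector can share it. The ErrorReporter trait and the check_brackets function are hypothetical and only mimic the idea (checking brackets, in the spirit of the example below); they are not rustlr's interfaces and do not reflect the signature of ZCParser::parse_core.

```rust
/// Hypothetical error-reporting interface; the core routine depends only on this trait.
trait ErrorReporter {
    fn report(&mut self, pos: usize, msg: &str);
}

/// Command-line front end: print to standard error.
struct StderrReporter;

impl ErrorReporter for StderrReporter {
    fn report(&mut self, pos: usize, msg: &str) {
        eprintln!("error at byte {}: {}", pos, msg);
    }
}

/// IDE-style front end: collect errors for later display.
#[derive(Default)]
struct CollectingReporter {
    errors: Vec<(usize, String)>,
}

impl ErrorReporter for CollectingReporter {
    fn report(&mut self, pos: usize, msg: &str) {
        self.errors.push((pos, msg.to_string()));
    }
}

/// Core routine (hypothetical): checks bracket nesting and reports every
/// problem through a trait object. Returns true if the input is balanced.
fn check_brackets(input: &str, reporter: &mut dyn ErrorReporter) -> bool {
    let mut open: Vec<(usize, char)> = Vec::new();
    let mut ok = true;
    for (i, c) in input.char_indices() {
        match c {
            '(' | '[' | '{' => open.push((i, c)),
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                match open.pop() {
                    Some((_, o)) if o == expected => {}
                    Some((_, o)) => {
                        reporter.report(i, &format!("'{}' does not match '{}'", c, o));
                        ok = false;
                    }
                    None => {
                        reporter.report(i, &format!("unexpected '{}'", c));
                        ok = false;
                    }
                }
            }
            _ => {}
        }
    }
    // Anything still open at the end was never closed.
    for (i, c) in open {
        reporter.report(i, &format!("unclosed '{}'", c));
        ok = false;
    }
    ok
}

fn main() {
    check_brackets("(()", &mut StderrReporter);
    let mut ide = CollectingReporter::default();
    check_brackets("[)", &mut ide);
    println!("{:?}", ide.errors);
}
```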

As a simplified, self-contained example of how to use rustlr, given a grammar file named “brackets.grammar”, the command

 rustlr brackets.grammar

generates an LALR parser as a Rust program. This program includes a ‘make_parser’ function and a ‘bracketslexer’ structure, which represents the lexical scanner. The program also contains a ‘load_extras’ function, which can be modified by interactive training to give more helpful error messages than the generic “unexpected symbol..”.

Re-exports

pub use lexer_interface::*;
pub use generic_absyn::*;
pub use runtime_parser::RuntimeParser;
pub use runtime_parser::RProduction;
pub use zc_parser::ZCParser;
pub use zc_parser::ZCRProduction;

Modules

generic_absyn: Generic Abstract Syntax Support Module

lexer_interface: Rustlr allows the use of any lexical analyzer (tokenizer) that satisfies the Tokenizer trait. However, a basic tokenizer, StrTokenizer, is provided that suffices for many examples. This tokenizer is not maximally efficient (not single-pass) because it uses regex.

runtime_parser: This module is deprecated in favor of the crate::zc_parser module and is kept only for compatibility with existing parsers.

zc_parser: This module implements a zero-copy version of the runtime parser that uses the LR state machine generated by rustlr. For now, it lives alongside the original parser, implemented as crate::RuntimeParser. Since version 0.2.3, this module can also generate a basic lexical scanner based on crate::RawToken and crate::StrTokenizer.

Macros

macro for downcasting LBox<dyn Any> to LBox<A> for some concrete type A. Must be called from within the semantic actions of grammar productions. Warning: the macro calls unwrap, so it panics if the downcast fails.

similar to lbdown, but also extracts the boxed expression

macro for creating LBox<dyn Any> structures that can encapsulate any type as abstract syntax. Must be called from within the semantic actions of a grammar production rule.

macro for creating an LBox from a crate::StackedItem ($si) popped from the parse stack; should be called from within the semantic actions of a grammar to accurately encode lexical information.

similar to makelbox but creates an LRc from lexical information inside stack item $si

just extracts the value from an LBox
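The macros above revolve around boxing values together with lexical information and later recovering their concrete type through std::any::Any. The following sketch reproduces that pattern with a hypothetical MyLBox type and a down! macro. It assumes only that an LBox pairs a boxed value with line and column numbers; it is not rustlr's actual LBox or macro definitions. As with the documented downcasting macro, down! calls unwrap and panics on a type mismatch.

```rust
use std::any::Any;

/// Hypothetical stand-in for rustlr's LBox: a boxed value plus the line
/// and column where the corresponding syntax began.
struct MyLBox<T: ?Sized> {
    exp: Box<T>,
    line: usize,
    column: usize,
}

impl<T: 'static> MyLBox<T> {
    /// Box a value together with its lexical position.
    fn new(value: T, line: usize, column: usize) -> Self {
        MyLBox { exp: Box::new(value), line, column }
    }

    /// Erase the concrete type, keeping the lexical information.
    fn into_any(self) -> MyLBox<dyn Any> {
        let exp: Box<dyn Any> = self.exp;
        MyLBox { exp, line: self.line, column: self.column }
    }

    /// Move the value out of the box, discarding the position.
    fn unbox(self) -> T {
        *self.exp
    }
}

/// Hypothetical analogue of a downcasting macro: turns MyLBox<dyn Any>
/// into MyLBox<$t>. It calls unwrap, so a wrong type panics.
macro_rules! down {
    ($lb:expr, $t:ty) => {{
        let lb: MyLBox<dyn std::any::Any> = $lb;
        MyLBox::<$t> {
            exp: lb.exp.downcast::<$t>().unwrap(),
            line: lb.line,
            column: lb.column,
        }
    }};
}

fn main() {
    // One semantic action might box a parsed number with its position...
    let erased = MyLBox::new(42i64, 3, 7).into_any();
    // ...and a later action recovers the concrete type.
    let n = down!(erased, i64);
    println!("value {} at line {}, column {}", n.exp, n.line, n.column);
    println!("unboxed: {}", n.unbox());
}
```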

Enums

this enum is only exported because it’s used by the generated parsers. There is no reason to use it in other programs.

Constants

Functions

this function is only exported because it’s used by the generated parsers.