Crate rustlr

Rustlr is a parser generator that can create LALR(1) as well as full LR(1) parsers. It also recognizes operator precedence and associativity declarations, which allow the use of some ambiguous grammars. Parsers have optional access to external state information, which lets them recognize more than just context-free languages. Rustlr implements methods of error recovery similar to those found in other LR generators. For error reporting, however, rustlr parsers can run in training mode: when a parse error is encountered, the human trainer can augment the parser’s state transition table with an appropriate error message. The augmented parser is automatically saved. See the RuntimeParser::parse_train function.

Rustlr can generate a full LR(1) parser for the ANSI C grammar in approximately 2-4 seconds on contemporary processors.

A detailed tutorial is being prepared that will explain the format of grammars and how to generate and use parsers for several sample languages.

Most of the items exported by this crate are only required by the generated parsers and do not form an API. The user needs to provide a grammar and a lexical analyzer that implements the Lexer trait. The only lexer provided is a simple one that returns individual characters in a string (charlexer). The examples in the tutorial use basic_lexer, which was written by the same author, but other tokenizers, such as scanlex, can easily be adapted as well.

Example

Given this grammar, with file name “calculator.grammar”,

 rustlr calculator.grammar lr1

generates an LR(1) parser as a Rust program. This program includes a make_parser function, which can be used as follows:

 let mut scanner = Exprscanner::new(&sourcefile);
 let mut parser1 = make_parser();
 let absyntree = parser1.parse(&mut scanner);

Here, Exprscanner is a structure that must implement the Lexer trait required by the generated parser.

A relatively self-contained example, containing both a grammar and code for using its generated parser, is found here.

Structs

This structure is expected to be returned by the lexical analyzer (Lexer objects). Furthermore, the .sym field of a Lextoken must match the name of a terminal symbol specified in the grammar that defines the language. AT is the type of the value attached to the token, which is usually some enum that distinguishes between numbers, keywords, alphanumeric symbols and other symbols. See the tutorial and examples on how to define the right kind of AT type.
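As a rough illustration of what an AT type might look like, the sketch below defines a local stand-in for the token structure and an enum of token values. Only the `.sym` field is named in the text above; the `value` field, the `Tokval` enum, the `classify` helper and the terminal names `int` and `var` are all assumptions made for this example, not rustlr definitions.

```rust
// Local stand-in for the crate's token structure. Only `.sym` is named in
// the documentation; the `value` field is an assumption of this sketch.
#[derive(Debug, PartialEq)]
pub struct Lextoken<AT> {
    pub sym: String, // must match the name of a terminal symbol in the grammar
    pub value: AT,   // semantic value attached to the token
}

// Hypothetical AT type distinguishing the kinds of token values.
#[derive(Debug, PartialEq, Clone)]
pub enum Tokval {
    Num(i64),
    Keyword(String),
    Alphanum(String),
    Symbol(String),
}

// Hypothetical helper: map one whitespace-delimited word to a token.
// The terminal names "int" and "var" are assumptions about the grammar.
pub fn classify(word: &str) -> Lextoken<Tokval> {
    const KEYWORDS: [&str; 2] = ["let", "in"];
    if let Ok(n) = word.parse::<i64>() {
        Lextoken { sym: "int".to_string(), value: Tokval::Num(n) }
    } else if KEYWORDS.contains(&word) {
        // Keywords are their own terminal symbols.
        Lextoken { sym: word.to_string(), value: Tokval::Keyword(word.to_string()) }
    } else if !word.is_empty() && word.chars().all(|c| c.is_alphanumeric() || c == '_') {
        Lextoken { sym: "var".to_string(), value: Tokval::Alphanum(word.to_string()) }
    } else {
        // Anything else (operators, punctuation) is its own terminal.
        Lextoken { sym: word.to_string(), value: Tokval::Symbol(word.to_string()) }
    }
}
```

The point of the design is that `.sym` carries the grammar-level category while the enum carries the payload, so several distinct values can share one terminal name.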

This structure is only exported because it is required by the generated parsers. There is no reason to use it in other programs.

This is the structure created by the generated parser. The generated parser program will contain a make_parser function that returns this structure. Most of the pub items, however, are only exported to support the operation of the parser and should not be accessed directly. User programs should call only the functions RuntimeParser::parse, RuntimeParser::report, RuntimeParser::abort and RuntimeParser::error_occurred, and should access only the field RuntimeParser::exstate.

This is a sample Lexer implementation designed to return every non-whitespace character in a string as a separate token, and is used in small grammars for testing and illustration purposes. It is assumed that the characters read are defined as terminal symbols in the grammar.
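The behavior described above can be sketched as follows. This is not the crate's own implementation: `Lextoken` and `Lexer` are local stand-ins, the method name `nextsym` and the `value` field are assumptions, and `CharTokens` and `symbols` are hypothetical names used only for this example.

```rust
// Local stand-ins; the real rustlr definitions may differ.
pub struct Lextoken<AT> {
    pub sym: String,
    pub value: AT,
}

pub trait Lexer<AT> {
    // Assumed method name: returns the next token, or None at end of input.
    fn nextsym(&mut self) -> Option<Lextoken<AT>>;
}

// Hypothetical character lexer: one token per non-whitespace character.
pub struct CharTokens {
    chars: Vec<char>,
    pos: usize, // index of the next unread character
}

impl CharTokens {
    pub fn new(input: &str) -> Self {
        CharTokens { chars: input.chars().collect(), pos: 0 }
    }
}

impl Lexer<char> for CharTokens {
    fn nextsym(&mut self) -> Option<Lextoken<char>> {
        // Skip any whitespace before the next token.
        while self.pos < self.chars.len() && self.chars[self.pos].is_whitespace() {
            self.pos += 1;
        }
        // Return None once the input is exhausted.
        let c = *self.chars.get(self.pos)?;
        self.pos += 1;
        // The character itself doubles as the terminal symbol name.
        Some(Lextoken { sym: c.to_string(), value: c })
    }
}

// Collect the terminal symbol names produced for an input string.
pub fn symbols(input: &str) -> Vec<String> {
    let mut lexer = CharTokens::new(input);
    let mut out = Vec::new();
    while let Some(tok) = lexer.nextsym() {
        out.push(tok.sym);
    }
    out
}
```

With this sketch, a grammar would need to declare each character it expects (for instance `+`, `(` and `)`) as a terminal symbol.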

Enums

This enum is only exported because it’s used by the generated parsers. There is no reason to use it in other programs.

Traits

This trait defines the interface that any lexical analyzer must be adapted to. The default implementations for linenum, column and current_line should be replaced. They’re provided only for compatibility.
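A minimal sketch of replacing those defaults is shown below. The trait here is a local stand-in: the method names linenum, column and current_line come from the text above, but their signatures, the default return values, the `nextsym` method and the `WordLexer` type are assumptions made for illustration.

```rust
// Local stand-ins; signatures are assumptions, not the crate's definitions.
pub struct Lextoken<AT> {
    pub sym: String,
    pub value: AT,
}

pub trait Lexer<AT> {
    fn nextsym(&mut self) -> Option<Lextoken<AT>>; // assumed method name
    fn linenum(&self) -> usize { 0 }               // placeholder default
    fn column(&self) -> usize { 0 }                // placeholder default
    fn current_line(&self) -> &str { "" }          // placeholder default
}

// Hypothetical lexer that splits input into whitespace-delimited words and
// tracks where the most recent token was found.
pub struct WordLexer<'a> {
    lines: Vec<&'a str>,
    line: usize,   // zero-based index of the line being scanned
    col: usize,    // byte offset of the next unread character in that line
    tokcol: usize, // one-based column of the most recent token
}

impl<'a> WordLexer<'a> {
    pub fn new(src: &'a str) -> Self {
        WordLexer { lines: src.lines().collect(), line: 0, col: 0, tokcol: 0 }
    }
}

impl<'a> Lexer<String> for WordLexer<'a> {
    fn nextsym(&mut self) -> Option<Lextoken<String>> {
        while self.line < self.lines.len() {
            let line: &'a str = self.lines[self.line];
            let rest = &line[self.col..];
            let trimmed = rest.trim_start();
            if trimmed.is_empty() {
                // Nothing left on this line: move on to the next one.
                self.line += 1;
                self.col = 0;
                continue;
            }
            let start = self.col + (rest.len() - trimmed.len());
            let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
            self.col = start + len;
            self.tokcol = start + 1;
            let word = &trimmed[..len];
            return Some(Lextoken { sym: word.to_string(), value: word.to_string() });
        }
        None
    }

    // Replace the defaults with real position information.
    fn linenum(&self) -> usize { self.line + 1 }
    fn column(&self) -> usize { self.tokcol }
    fn current_line(&self) -> &str {
        self.lines.get(self.line).copied().unwrap_or("")
    }
}
```

Reporting the position of the most recent token, rather than the scanner's read position, is a design choice of this sketch; it tends to give error messages that point at the offending token.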

Functions

This function is only exported because it’s used by the generated parsers.

This is the only function that can invoke the parser generator externally, without running rustlr (rustlr::main) directly. It expects to find a file of the form grammarname.grammar. The option argument can currently only be “lr1” or “lalr”. It generates a parser in a file named grammarnameparser.rs.
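The file naming convention and the accepted options can be summarized in a small sketch. `generator_paths` is a hypothetical helper written for this illustration and is not part of rustlr; it only computes the input and output file names described above and does not run the generator.

```rust
// Hypothetical helper, not part of rustlr: given a grammar name and an
// option, return the expected input file and the generated output file.
pub fn generator_paths(grammarname: &str, option: &str) -> Result<(String, String), String> {
    match option {
        // The only options currently described are "lr1" and "lalr".
        "lr1" | "lalr" => Ok((
            format!("{}.grammar", grammarname),
            format!("{}parser.rs", grammarname),
        )),
        other => Err(format!("unsupported option: {}", other)),
    }
}
```

For the calculator example, this maps the name `calculator` to the input `calculator.grammar` and the output `calculatorparser.rs`.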