RustyLR
RustyLR provides an LR(1) and LALR(1) Deterministic Finite Automaton (DFA) generator for Context-Free Grammars (CFGs).
[dependencies]
rusty_lr = "0.7.3"
Features
- pure Rust implementation
- readable error messages, both for grammar building and parsing
- compile-time DFA construction from CFGs (with proc-macro)
- customizable reduce actions
- conflict resolution for ambiguous grammars
- tracing of parser actions and error handling through callbacks
Sample
In example/calculator/parser.rs,
use lr1;
use lalr1;
// this defines the struct `EParser`,
// where `E` is the start symbol
lalr1!
In example/calculator/src/main.rs,
The result will be:
[Num(3)] [Plus] [Num(4)]
[Num(3), Plus, Num(4)]
[Num(1)] [Plus] [Num(2), Star, LParen, Num(3), Plus, Num(4), RParen]
[Num(1), Plus, Num(2), Star, LParen, Num(3), Plus, Num(4), RParen]
15
userdata: 2
Build Deterministic Finite Automata (DFA) from Context Free Grammar (CFG)
This section describes how to build a DFA from a CFG at runtime.
1. Define terminal and non-terminal symbols
// must implement these traits
// must implement these traits
/// implementing Display for TermType and NonTermType makes the related ProductionRule and error messages Display-able
Alternatively, you can simply use char or u8 as terminals, and &'static str or String as non-terminals.
Any type that implements the traits above can be used as a terminal or non-terminal symbol.
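As a minimal sketch of custom symbol types, the block below assumes the required traits are the usual derivable ones (Clone, Hash, PartialEq, Eq), with Debug and Display added for readable output. The names Term and NonTerm are hypothetical and are not taken from the crate.

```rust
use std::fmt;

// Hypothetical terminal symbol type. The derive list is an assumption
// about which traits the library requires.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub enum Term {
    Num,
    Plus,
    Eof,
}

// Hypothetical non-terminal symbol type, including the augmented symbol.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub enum NonTerm {
    A,
    M,
    Augmented,
}

// Display is optional; implementing it lets rules and errors be printed.
impl fmt::Display for Term {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Term::Num => "Num",
            Term::Plus => "+",
            Term::Eof => "$",
        };
        write!(f, "{}", s)
    }
}

impl fmt::Display for NonTerm {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            NonTerm::A => "A",
            NonTerm::M => "M",
            NonTerm::Augmented => "Augmented",
        };
        write!(f, "{}", s)
    }
}
```

Using enums rather than char or String keeps the symbol set closed, so a misspelled symbol name is caught by the compiler instead of at grammar build time.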
2. Define production rules
Consider the following context free grammar:
A -> A + A (reduce left)
A -> M
This grammar can be written as:
/// type alias
type Token = Token;
/// create grammar
let mut grammar = new;
grammar.add_rule;
grammar.add_rule;
/// set reduce type
grammar.set_reduce_type;
Note that the production rule A -> A + A has a shift/reduce conflict, and the reduce type is set to ReduceType::Left for the terminal symbol Plus to resolve it. By default, a conflict causes an error.
A reduce/reduce conflict (e.g. duplicated rules) is always an error.
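To make the effect of the reduce type concrete, here is a small standalone sketch that does not use the crate. It assumes the conventional meaning of left and right associativity: on a conflict, "left" reduces the rule already on the stack, while "right" shifts the next operator first. The Assoc enum and the group function are hypothetical stand-ins, not the crate's ReduceType.

```rust
// Which way a shift/reduce conflict on an operator is resolved.
// Illustration only; this is not the crate's `ReduceType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Assoc {
    Left,  // reduce first: (a + b) + c
    Right, // shift first:  a + (b + c)
}

// Group operands joined by `+` according to the chosen associativity
// and return the fully parenthesized form.
pub fn group(operands: &[&str], assoc: Assoc) -> String {
    match assoc {
        Assoc::Left => {
            let mut iter = operands.iter();
            let first = match iter.next() {
                Some(s) => s.to_string(),
                None => return String::new(),
            };
            // Fold from the left: the accumulated group becomes the left operand.
            iter.fold(first, |acc, x| format!("({} + {})", acc, x))
        }
        Assoc::Right => {
            let mut iter = operands.iter().rev();
            let last = match iter.next() {
                Some(s) => s.to_string(),
                None => return String::new(),
            };
            // Fold from the right: the accumulated group becomes the right operand.
            iter.fold(last, |acc, x| format!("({} + {})", x, acc))
        }
    }
}
```

For addition both groupings give the same value, but for operators such as subtraction or exponentiation the choice changes the result, which is why the conflict has to be resolved explicitly.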
3. Build DFA
Calling grammar.build() or grammar.build_lalr() will build the DFA from the CFGs.
let parser:Parser = match grammar.build ;
You must explicitly specify the Augmented non-terminal symbol, and the production rule
Augmented -> StartSymbol $
must be defined in the grammar.
The returned Parser struct contains the DFA and the production rules (cloned). It is completely independent of the Grammar struct, so you can drop the Grammar struct or export the Parser struct to another module.
4. Error messages
The Error type returned from Grammar::build() contains the error information.
Error is Display if both Term and NonTerm are Display, and it is Debug if both Term and NonTerm are Debug.
For Shift/Reduce conflicts,
Build failed: Shift/Reduce Conflict
NextTerm: '0'
Reduce Rule:
"Num" -> "Digit"
Shift Rules:
"Digit" -> '0' • /Lookaheads: '\0', '0'
Try rearanging the rules or set ReduceType to Terminal '0' to resolve the conflict.
For Reduce/Reduce Conflicts,
Build failed: Reduce/Reduce Conflict with lookahead: '\0'
Production Rule1:
"Num" -> "Digit"
Production Rule2:
"Num" -> "Digit"
Parse input sequence with generated DFA
For a given input sequence, start parsing with the Parser::begin() method. Once you get the Context from begin(), feed the input sequence to the parser one token at a time with the parser.feed() method.
let terms = vec!;
// start parsing
let mut context = parser.begin;
// feed input sequence
for term in terms
Note that the EOF token is fed at the end of the sequence, and the augmented rule Augmented -> StartSymbol $ will not be reduced since there are no lookahead symbols.
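The begin/feed pattern described above can be sketched without the crate. The block below is a hand-written state machine for the tiny language Num (Plus Num)* Eof, not a generated LR parser; ToyParser, Context, Token and parse_sum are hypothetical names chosen only to mirror the shape of the calls.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Token {
    Num(i32),
    Plus,
    Eof,
}

// Parsing state kept between calls to `feed`.
#[derive(Debug)]
pub struct Context {
    total: i32,       // running sum of the numbers seen so far
    expect_num: bool, // true when the next token must be a number
    done: bool,       // true once Eof has been accepted
}

pub struct ToyParser;

impl ToyParser {
    // Create a fresh context positioned at the start of the input.
    pub fn begin(&self) -> Context {
        Context { total: 0, expect_num: true, done: false }
    }

    // Consume one token, updating the context or reporting an error.
    pub fn feed(&self, ctx: &mut Context, tok: Token) -> Result<(), String> {
        if ctx.done {
            return Err(format!("unexpected token after end of input: {:?}", tok));
        }
        match (ctx.expect_num, tok) {
            (true, Token::Num(n)) => {
                ctx.total += n;
                ctx.expect_num = false;
                Ok(())
            }
            (false, Token::Plus) => {
                ctx.expect_num = true;
                Ok(())
            }
            (false, Token::Eof) => {
                ctx.done = true;
                Ok(())
            }
            (_, other) => Err(format!("unexpected token: {:?}", other)),
        }
    }
}

// Drive the parser over a whole token slice, stopping at the first error.
pub fn parse_sum(tokens: &[Token]) -> Result<i32, String> {
    let parser = ToyParser;
    let mut ctx = parser.begin();
    for &tok in tokens {
        parser.feed(&mut ctx, tok)?;
    }
    if ctx.done {
        Ok(ctx.total)
    } else {
        Err("input ended before Eof".to_string())
    }
}
```

Because all state lives in the context rather than in the parser, the same parser value can serve several independent inputs, and tokens can arrive incrementally, for example from a lexer that runs alongside it.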
Parse with callback
For complex error handling and for tracing parser actions, you can implement the Callback trait and pass it to the *_callback(...) methods.
// Num + Num * ( Num + Num )
let terms = vec!;
// start parsing
let mut context = parser.begin;
let mut callback = ParserCallback ;
// feed input sequence
for term in terms
The result will be:
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by P -> Num
Reduce by M -> P
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by A -> A + A
Reduce by E -> A
Reduce by P -> ( E )
Reduce by M -> P
Reduce by M -> M * M
Reduce by A -> M
Reduce by A -> A + A
Reduce by E -> A
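A trace like the one above is what a logging callback produces. The sketch below shows the general shape of such a callback under stated assumptions: TraceCallback, Recorder and replay are hypothetical, and the crate's real Callback trait may expose different methods and signatures. Here replay stands in for a parser that reports each reduction as it happens.

```rust
// Hypothetical callback interface; the crate's real `Callback` trait may
// have different methods and signatures.
pub trait TraceCallback {
    // Called after each reduction; returning Err stops parsing.
    fn on_reduce(&mut self, rule: &str) -> Result<(), String>;
}

// Records every reduction and aborts once a limit is reached.
pub struct Recorder {
    pub log: Vec<String>,
    pub max_reductions: usize,
}

impl TraceCallback for Recorder {
    fn on_reduce(&mut self, rule: &str) -> Result<(), String> {
        if self.log.len() >= self.max_reductions {
            return Err(format!("too many reductions before `{}`", rule));
        }
        self.log.push(format!("Reduce by {}", rule));
        Ok(())
    }
}

// Stand-in for a parser: drives the callback over a fixed sequence of
// reductions and stops at the first error the callback returns.
pub fn replay(rules: &[&str], cb: &mut dyn TraceCallback) -> Result<(), String> {
    for rule in rules {
        cb.on_reduce(rule)?;
    }
    Ok(())
}
```

Letting the callback return a Result gives user code a way to stop parsing early, which is one common way to combine tracing with error handling in a single hook.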
proc-macro lr1! and lalr1!
lr1! and lalr1! are procedural macros that generate a Parser struct from a CFG at compile time.
Every line in the macro must follow the syntax below.
Token type (must be defined)
'%tokentype' <RustType> ';'
Define the type of terminal symbols.
<RustType> must be accessible at the point where the macro is called.
Token definition (must be defined)
'%token' <Ident> <RustExpr> ';'
Map the terminal symbol name <Ident> to its actual value <RustExpr>.
<RustExpr> must be accessible at the point where the macro is called.
Start symbol (must be defined)
'%start' <Ident> ';'
Define the start symbol of the grammar.
Eof symbol (must be defined)
'%eof' <RustExpr> ';'
Define the eof terminal symbol.
<RustExpr> must be accessible at the point where the macro is called.
The 'eof' terminal symbol is automatically added to the grammar.
Userdata type (optional)
'%userdata' <RustType> ';'
Define the type of userdata passed to feed() function.
Reduce type (optional)
'%left' <Ident> ';'
'%right' <Ident> ';'
Set the reduce precedence of terminal symbols. <Ident> must be defined in %token.
Production rules
<Ident><RuleType>
':' <Ident>* <ReduceAction>
'|' <Ident>* <ReduceAction>
...
';'
Define the production rules.
Each <Ident> must be a valid terminal or non-terminal symbol.
(optional)
<RuleType> : '(' <RustType> ')'
|
;
<RuleType> is optional. It defines the type of the value that this production rule holds.
(optional)
<ReduceAction> : '{' <RustExpr> '}'
|
;
<ReduceAction> is optional.
It defines the action to be executed when the rule is reduced.
If <RuleType> is defined, <ReduceAction> itself must evaluate to a value of type <RuleType> (i.e. no semicolon at the end of the final expression).
Predefined variables can be used in <ReduceAction>:
- s0, s1, s2, ... : slice of shifted terminal symbols &[<TermType>] captured by the N'th symbol
- s : slice of shifted terminal symbols &[<TermType>] captured by the current rule
- v0, v1, v2, ... : value of the N'th symbol. If the N'th symbol is a Terminal, it will be &<TermType>; if it is a NonTerminal, it will be mut <RuleType>
- data : userdata passed to the feed() function
Result<(), String> can be returned from <ReduceAction>; a returned error value is treated as an error.
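The role of v0, v2 and data in a reduce action can be illustrated with plain functions. This is only a sketch: reduce_add, reduce_mul and reduce_div are hypothetical stand-ins for action bodies, the division rule is not part of the calculator grammar in this document, and the fallible variant models the error path with an ordinary Result rather than showing how the generated code threads errors.

```rust
// Stand-in for the action of a rule like `A -> A plus A`: v0 and v2 are the
// values of the first and third symbols, `data` is the userdata.
pub fn reduce_add(v0: i32, v2: i32, data: &mut i32) -> i32 {
    *data += 1; // count how many additions were reduced
    v0 + v2 // final expression without a semicolon: the rule's value
}

// Stand-in for the action of a rule like `M -> M star M`.
pub fn reduce_mul(v0: i32, v2: i32) -> i32 {
    v0 * v2
}

// A fallible action (hypothetical rule): an Err value reports a failure
// instead of producing a value for the rule.
pub fn reduce_div(v0: i32, v2: i32) -> Result<i32, String> {
    if v2 == 0 {
        return Err("division by zero".to_string());
    }
    Ok(v0 / v2)
}
```

Applying these in the order a parser would reduce 1 + 2 * (3 + 4), first the inner addition, then the multiplication, then the outer addition, yields 15 with the userdata counter at 2.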
Start Parsing
lr1! and lalr1! generate a struct named <StartSymbol>Parser, where <StartSymbol> is the name of the start symbol in the grammar.
The struct will have the following methods:
- new(): create new parser
- begin(): create new context
- feed(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>: feed token to the parser
- feed_callback(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>: feed token with callback
Note that &mut UserData is omitted if %userdata is not defined.
Once the input sequence (including the eof token) has been fed without errors, you can get the value of the start symbol by calling context.accept().
let parser = new;
// create context
let mut context = parser.begin;
// define userdata
let mut userdata: i32 = 0;
// start feeding tokens
for token in input_sequence
// res = value of start symbol
let res = context.accept;
println!;
println!;