# RustyLR
RustyLR is an LR(1) and LALR(1) Deterministic Finite Automaton (DFA) generator built from Context-Free Grammars (CFGs).
```toml
[dependencies]
rusty_lr = "0.7.3"
```
## Features
- pure Rust implementation
- readable error messages, both for grammar building and parsing
- compile-time DFA construction from CFGs (with proc-macro)
- customizable reduce actions
- conflict resolution for ambiguous grammars
- tracing of parser actions with callbacks, including error handling
## Sample
In [`example/calculator/src/parser.rs`](example/calculator/src/parser.rs),
```rust
use rusty_lr::lr1;
use rusty_lr::lalr1;

enum Token {
    // token definitions
    ...
}

// this defines the struct `EParser`,
// where 'E' is the start symbol
lalr1! {
    // type of userdata
    %userdata i32;
    // type of token ( as Terminal symbol )
    %tokentype Token;

    // start symbol
    %start E;
    // eof symbol; for augmented rule generation
    %eof Token::Eof;

    // define tokens
    %token num Token::Num(0); // `num` maps to `Token::Num(0)`
    %token plus Token::Plus;
    %token star Token::Star;
    %token lparen Token::LParen;
    %token rparen Token::RParen;

    // resolving shift/reduce conflicts
    %left plus;
    %left star;

    // s{N} is the slice of shifted terminal symbols captured by the N'th symbol
    // v{N} is the value of the N'th symbol ( if it has one )
    // s is the slice of shifted terminal symbols captured by the current rule
    // userdata can be accessed through `data` ( &mut i32 in this case )
    A(i32) : A plus A {
        println!("{:?} {:?} {:?}", s0, s1, s2);
        //                         ^   ^   ^
        //                         |   |   |- slice of 2nd 'A'
        //                         |   |- slice of 'plus'
        //                         |- slice of 1st 'A'
        println!("{:?}", s);
        *data += 1;
        v0 + v2 // --> this will be the new value of the current 'A'
        // ^    ^
        // |    |- value of 2nd 'A'
        // |- value of 1st 'A'
    }
    | M { v0 }
    ;

    M(i32) : M star M { v0 * v2 }
    | P { v0 }
    ;

    P(i32) : num {
        if let Token::Num(n) = v0 { *n }
        else { return Err(format!("{:?}", s0)); }
        //     ^^^^^^^^^^ reduce action returns Result<(), String>
    }
    | lparen E rparen { v1 }
    ;

    E(i32) : A { v0 };
}
```
In [`example/calculator/src/main.rs`](example/calculator/src/main.rs),
```rust
mod parser;

fn main() {
    use parser::Token;

    // 1 + 2 * ( 3 + 4 ) = 15
    let input = vec![
        Token::Num(1),
        Token::Plus,
        Token::Num(2),
        Token::Star,
        Token::LParen,
        Token::Num(3),
        Token::Plus,
        Token::Num(4),
        Token::RParen,
        Token::Eof,
    ];

    let parser = parser::EParser::new();
    let mut context = parser.begin();
    let mut userdata: i32 = 0;
    for token in input {
        match parser.feed(&mut context, token, &mut userdata) {
            //                          ^^^^^  ^^^^^^^^^^^^^ userdata passed here as `&mut i32`
            //                          |- feed token
            Ok(_) => {}
            Err(e) => {
                println!("{:?}", e);
                return;
            }
        }
    }

    // res = value of the start symbol ( E(i32) )
    let res = context.accept();
    println!("{}", res);
    println!("userdata: {}", userdata);
}
```
The result will be:
```
[Num(3)] [Plus] [Num(4)]
[Num(3), Plus, Num(4)]
[Num(1)] [Plus] [Num(2), Star, LParen, Num(3), Plus, Num(4), RParen]
[Num(1), Plus, Num(2), Star, LParen, Num(3), Plus, Num(4), RParen]
15
userdata: 2
```
## Build Deterministic Finite Automata (DFA) from Context Free Grammar (CFG)
This section describes how to build a DFA from a CFG at runtime.
### 1. Define terminal and non-terminal symbols
```rust
#[derive(Clone, Hash, PartialEq, Eq, PartialOrd, Ord)] // must implement these traits
pub enum Term {
Num,
Plus,
Mul,
LeftParen,
RightParen,
Eof,
}
#[derive(Clone, Hash, PartialEq, Eq)] // must implement these traits
pub enum NonTerm {
E,
A,
M,
P,
Augmented,
}
/// impl Display for Term and NonTerm makes the related ProductionRule and error messages Display-able
impl Display for Term { ... }
impl Display for NonTerm { ... }
```
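For reference, a `Display` implementation for these enums might look like the following. This is only a sketch with one possible formatting; the symbols printed here are illustrative, not required by the library.
```rust
use std::fmt::{self, Display, Formatter};

impl Display for Term {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // print terminals roughly as they appear in the input
        let s = match self {
            Term::Num => "Num",
            Term::Plus => "+",
            Term::Mul => "*",
            Term::LeftParen => "(",
            Term::RightParen => ")",
            Term::Eof => "$",
        };
        write!(f, "{}", s)
    }
}

impl Display for NonTerm {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // print non-terminals by their rule name
        let s = match self {
            NonTerm::E => "E",
            NonTerm::A => "A",
            NonTerm::M => "M",
            NonTerm::P => "P",
            NonTerm::Augmented => "Augmented",
        };
        write!(f, "{}", s)
    }
}
```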
Or simply, you can use `char` or `u8` as the terminal type, and `&'static str` or `String` as the non-terminal type.
***Any type*** that implements the traits above can be used as terminal and non-terminal symbols.
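For example, a sketch using `char` terminals and `&'static str` non-terminals (with the `Grammar` API introduced in the next step); the rule shown is purely illustrative:
```rust
use rusty_lr::Token;

// digits as terminals, rule names as non-terminals
let mut grammar = rusty_lr::Grammar::<char, &'static str>::new();

// Digit -> '0'
grammar.add_rule("Digit", vec![Token::Term('0')]);
```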
### 2. Define production rules
Consider the following context free grammar:
```
A -> A + A (reduce left)
A -> M
```
This grammar can be written as:
```rust
// type alias
type Token = rusty_lr::Token<Term, NonTerm>;

// create grammar
let mut grammar = rusty_lr::Grammar::<Term, NonTerm>::new();

grammar.add_rule(
    NonTerm::A,
    vec![Token::NonTerm(NonTerm::A), Token::Term(Term::Plus), Token::NonTerm(NonTerm::A)],
);
grammar.add_rule(
    NonTerm::A,
    vec![Token::NonTerm(NonTerm::M)],
);

// set reduce type
grammar.set_reduce_type(Term::Plus, ReduceType::Left);
```
Note that the production rule `A -> A + A` has a shift/reduce conflict, and the reduce type for the terminal symbol `Plus` is set to `ReduceType::Left` to resolve it. By default, a conflict causes `build()` to fail with an error.
A reduce/reduce conflict (e.g. duplicated rules) is always an error.
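For example, registering the same rule twice is enough to trigger one; a sketch continuing the `grammar` value from above:
```rust
// `A -> M` was already added above; adding it again gives the parser two
// identical rules to reduce by, so `build()` reports a reduce/reduce conflict.
grammar.add_rule(
    NonTerm::A,
    vec![Token::NonTerm(NonTerm::M)],
);
```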
### 3. Build DFA
Calling `grammar.build()` or `grammar.build_lalr()` will build the DFA from the CFGs.
```rust
let parser: rusty_lr::Parser<Term, NonTerm> = match grammar.build(NonTerm::Augmented) {
    Ok(parser) => parser,
    Err(err) => {
        // error is Display if Term, NonTerm is Display
        eprintln!("{}", err);
        return;
    }
};
```
You must explicitly specify the Augmented non-terminal symbol, and the production rule
```
Augmented -> StartSymbol $
```
must be defined in the grammar.
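With the `add_rule` API from step 2, the augmented rule is added like any other rule. A sketch, assuming `E` is the start symbol, `Term::Eof` plays the role of `$`, and the remaining rules for `E`, `M`, and `P` have been added the same way:
```rust
// Augmented -> E $
grammar.add_rule(
    NonTerm::Augmented,
    vec![Token::NonTerm(NonTerm::E), Token::Term(Term::Eof)],
);
```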
The returned `Parser` struct contains the DFA and the production rules (cloned). It is completely independent of the `Grammar` struct, so you can drop the `Grammar`, or export the `Parser` to another module.
### 4. Error messages
The `Error` type returned from `Grammar::build()` contains the error information.
`Error` is `Display` if both `Term` and `NonTerm` are `Display`, and it is `Debug` if both are `Debug`.
For Shift/Reduce conflicts,
```
Build failed: Shift/Reduce Conflict
NextTerm: '0'
Reduce Rule:
"Num" -> "Digit"
Shift Rules:
"Digit" -> '0' • /Lookaheads: '\0', '0'
Try rearranging the rules or set ReduceType to Terminal '0' to resolve the conflict.
```
For Reduce/Reduce Conflicts,
```
Build failed: Reduce/Reduce Conflict with lookahead: '\0'
Production Rule1:
"Num" -> "Digit"
Production Rule2:
"Num" -> "Digit"
```
## Parse input sequence with generated DFA
For a given input sequence, you can start parsing with the `Parser::begin()` method. Once you get a `Context` from `begin()`, feed the input sequence to the parser with the `feed()` method.
```rust
let terms = vec![
    Term::Num, Term::Plus, Term::Num, Term::Mul, Term::LeftParen,
    Term::Num, Term::Plus, Term::Num, Term::RightParen, Term::Eof,
];

// start parsing
let mut context = parser.begin();

// feed input sequence
for term in terms {
    match parser.feed(&mut context, term) {
        Ok(_) => (),
        Err(err) => {
            eprintln!("{:?}", err);
            return;
        }
    }
}
```
Note that the `Eof` token is fed at the end of the sequence, and the augmented rule `Augmented -> StartSymbol $` will not be reduced, since there are no lookahead symbols left.
### Parse with callback
For complex error handling and for tracing parser actions, you can implement the `Callback` trait and pass it to the `*_callback(...)` methods.
```rust
struct ParserCallback {}

impl rusty_lr::Callback<Term, NonTerm> for ParserCallback {
    /// Error type for callback
    type Error = String;

    fn reduce(
        &mut self,
        rules: &[rusty_lr::ProductionRule<Term, NonTerm>],
        states: &[rusty_lr::State<Term, NonTerm>],
        state_stack: &[usize],
        rule: usize,
    ) -> Result<(), Self::Error> {
        // `ProductionRule` is Display if Term, NonTerm is Display
        println!("Reduce by {}", rules[rule]);
        Ok(())
    }

    fn shift_and_goto(
        &mut self,
        rules: &[rusty_lr::ProductionRule<Term, NonTerm>],
        states: &[rusty_lr::State<Term, NonTerm>],
        state_stack: &[usize],
        term: &Term,
    ) -> Result<(), Self::Error> {
        Ok(())
    }

    fn shift_and_goto_nonterm(
        &mut self,
        rules: &[rusty_lr::ProductionRule<Term, NonTerm>],
        states: &[rusty_lr::State<Term, NonTerm>],
        state_stack: &[usize],
        nonterm: &NonTerm,
    ) -> Result<(), Self::Error> {
        Ok(())
    }
}
```
```rust
// Num + Num * ( Num + Num )
let terms = vec![
    Term::Num, Term::Plus, Term::Num, Term::Mul, Term::LeftParen,
    Term::Num, Term::Plus, Term::Num, Term::RightParen, Term::Eof,
];

// start parsing
let mut context = parser.begin();
let mut callback = ParserCallback {};

// feed input sequence
for term in terms {
    match parser.feed_callback(&mut context, &mut callback, term) {
        Ok(_) => (),
        Err(err) => {
            eprintln!("{:?}", err);
            return;
        }
    }
}
```
The result will be:
```
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by P -> Num
Reduce by M -> P
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by P -> Num
Reduce by M -> P
Reduce by A -> M
Reduce by A -> A + A
Reduce by E -> A
Reduce by P -> ( E )
Reduce by M -> P
Reduce by M -> M * M
Reduce by A -> M
Reduce by A -> A + A
Reduce by E -> A
```
## proc-macro `lr1!` and `lalr1!`
`lr1!` and `lalr1!` are procedural macros that generate a `Parser` struct from a CFG at compile time.
The macro body must follow the syntax described below.
### Token type <sub><sup>(must be defined)</sup></sub>
```
'%tokentype' <RustType> ';'
```
Define the type of terminal symbols.
`<RustType>` must be accessible at the point where the macro is called.
### Token definition <sub><sup>(must be defined)</sup></sub>
```
'%token' <Ident> <RustExpr> ';'
```
Map terminal symbols' name `<Ident>` to the actual value `<RustExpr>`.
`<RustExpr>` must be accessible at the point where the macro is called.
### Start symbol <sub><sup>(must be defined)</sup></sub>
```
'%start' <Ident> ';'
```
Define the start symbol of the grammar.
### Eof symbol <sub><sup>(must be defined)</sup></sub>
```
'%eof' <RustExpr> ';'
```
Define the `eof` terminal symbol.
`<RustExpr>` must be accessible at the point where the macro is called.
The `eof` terminal symbol will be automatically added to the grammar.
### Userdata type <sub><sup>(optional)</sup></sub>
```
'%userdata' <RustType> ';'
```
Define the type of userdata passed to the `feed()` function.
### Reduce type <sub><sup>(optional)</sup></sub>
```
'%left' <Ident> ';'
'%right' <Ident> ';'
```
Set the reduce precedence of terminal symbols. `<Ident>` must be defined by a `%token` directive.
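For example, in the calculator grammar above, `%left plus` tells the parser to prefer reducing on a shift/reduce conflict involving `plus`, which makes `+` group to the left; `%right` would prefer shifting instead. (The associativity reading is an interpretation of this shift/reduce resolution, not a separate feature.)
```
%token plus Token::Plus;
%left plus;     // prefer reduce on conflicts with `plus`: 1 + 2 + 3 groups as (1 + 2) + 3
// %right plus; // prefer shift instead: 1 + 2 + 3 would group as 1 + (2 + 3)
```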
### Production rules
```
<Ident><RuleType>
':' <Ident>* <ReduceAction>
';'
```
Define the production rules.
`<Ident>` must be valid terminal or non-terminal symbols.
```
(optional)
<RuleType> : '(' <RustType> ')'
|
;
```
`<RuleType>` is optional; it defines the type of value that this production rule holds.
```
(optional)
<ReduceAction> : '{' <RustExpr> '}'
|
;
```
`<ReduceAction>` is optional;
it defines the action to be executed when the rule is reduced.
If `<RuleType>` is defined, `<ReduceAction>` must evaluate to a value of `<RuleType>` (i.e. no semicolon after the final expression).
The following predefined variables can be used in `<ReduceAction>` (see the fragment after this list):
- `s0`, `s1`, `s2`, ... : slice of shifted terminal symbols `&[<TermType>]` captured by the N'th symbol
- `s` : slice of shifted terminal symbols `&[<TermType>]` captured by the current rule
- `v0`, `v1`, `v2`, ... : value of the N'th symbol.
If the N'th symbol is a terminal, it will be `&<TermType>`,
and if it is a non-terminal, it will be `mut <RuleType>`.
- `data` : userdata passed to the `feed()` function.

A `Result<(), String>` can be returned from `<ReduceAction>`; returning `Err` is treated as a parse error.
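Putting the pieces together, a single rule using these variables, in the same macro syntax and carrying over the `Token` type and `i32` userdata from the sample at the top of this README, might look like this:
```
P(i32) : num {
    println!("{:?}", s0); // s0 : &[Token], terminals captured by `num`
    println!("{:?}", s);  // s  : &[Token], terminals captured by the whole rule
    *data += 1;           // data : &mut i32, the %userdata
    if let Token::Num(n) = v0 { *n }                              // v0 : &Token
    else { return Err(format!("expected a number: {:?}", s0)); }
}
;
```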
## Start Parsing
`lr1!` and `lalr1!` generate a struct `<StartSymbol>Parser`, where `<StartSymbol>` is the name of the start symbol of the grammar.
The struct will have the following methods:
- `new()` : create new parser
- `begin()` : create new context
- `feed(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>` : feed token to the parser
- `feed_callback(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>` : feed token with callback
Note that `&mut UserData` is omitted if `%userdata` is not defined.
Once the input sequence (including the `eof` token) has been fed without errors, you can get the value of the start symbol by calling `context.accept()`.
```rust
let parser = parser::EParser::new();

// create context
let mut context = parser.begin();

// define userdata
let mut userdata: i32 = 0;

// start feeding tokens
for token in input_sequence {
    match parser.feed(&mut context, token, &mut userdata) {
        //                          ^^^^^  ^^^^^^^^^^^^^ userdata passed here as `&mut i32`
        //                          |- feed token
        Ok(_) => {}
        Err(e) => {
            println!("{:?}", e);
            return;
        }
    }
}

// res = value of the start symbol
let res = context.accept();
println!("{}", res);
println!("userdata: {}", userdata);
```