gazelle-parser 0.3.0

LR parser generator with runtime operator precedence and natural lexer feedback
Documentation
# Gazelle

An LR parser generator for Rust with runtime operator precedence and natural lexer feedback.

## What Makes Gazelle Different

### 1. Runtime Operator Precedence

Traditional grammars encode precedence through structure - one rule per level:

```
expr: add_expr;
add_expr: add_expr '+' mul_expr | mul_expr;
mul_expr: mul_expr '*' unary_expr | unary_expr;
// ... 10 more levels for a full language
```

Gazelle's `prec` terminals carry precedence at runtime:

```rust
gazelle! {
    grammar Calc {
        terminals {
            NUM: _,
            prec OP: _,  // precedence attached to each token
        }
        expr = expr OP expr => binop | NUM => literal;
    }
}
```

One rule. The lexer provides precedence per token:

```rust
'+' => calc::Terminal::Op('+', Precedence::Left(1)),
'*' => calc::Terminal::Op('*', Precedence::Left(2)),
```

This enables **user-defined operators** at runtime - see `examples/c11_calculator.rs`.

### 2. Natural Lexer Feedback

The infamous C typedef problem: is `T * x` a multiplication or pointer declaration? The lexer needs parser state to decide.

Traditional parsers hide the parse loop, requiring globals or hacks. Gazelle uses a push-based API - you drive the loop:

```rust
loop {
    let token = lexer.next(&actions.ctx)?;  // lexer sees current state
    parser.push(token, &mut actions)?;       // actions update state
    // next iteration: lexer sees updated state
}
```

No magic. The lexer and parser share state through `actions`, and you control when each runs. See `examples/c11/` for a complete C11 parser using this for typedef disambiguation.

### 3. Clean Separation of Grammar and Actions

The grammar is pure grammar:

```rust
gazelle! {
    grammar Calc {
        terminals { NUM: _, prec OP: _, LPAREN, RPAREN }

        expr = expr OP expr => binop
             | NUM => literal
             | LPAREN expr RPAREN => paren;
    }
}
```

Actions are split into a `Types` trait and per-node `Reducer` impls:

```rust
impl calc::Types for Evaluator {
    type Error = ParseError;
    type Num = f64;
    type Op = char;
    type Expr = f64;
}

impl gazelle::Reducer<calc::Expr<Self>> for Evaluator {
    fn reduce(&mut self, node: calc::Expr<Self>) -> Result<f64, ParseError> {
        Ok(match node {
            calc::Expr::Binop(left, op, right) => match op {
                '+' => left + right,
                '*' => left * right,
                _ => panic!("unknown op"),
            },
            calc::Expr::Literal(n) => n,
            calc::Expr::Paren(e) => e,
        })
    }
}
```

Reducer methods return `Result` - the error type is declared as `type Error: From<ParseError>` on the `Types` trait.

This gives you:
- Full IDE support in action code (autocomplete, type hints, go-to-definition)
- Compile errors point to your code, not generated code
- Multiple implementations (interpreter, AST builder, pretty-printer)
- Grammars reusable across different backends

### 4. CST to AST — You Choose

Parser generators typically force a choice: either you get a concrete syntax tree (CST) that you walk and transform, or you write untyped action callbacks and build your own result from scratch.

This reflects a real distinction, but not the one people usually describe. The conventional definition — "a CST keeps all tokens, an AST drops some" — confuses a consequence for a cause. A better way to think about it:

- A **CST** is a *concrete* tree: an actual data structure with nodes and children, faithfully representing the grammar's productions.
- An **AST** is the same structural information represented *abstractly* — through a sequence of typed reduction calls rather than a materialized tree.

When the tree is abstract, you only capture what you care about. A calculator folds `Add(2, 3)` straight into `5.0` — the parentheses, the operator token, even the child nodes were never stored because there was never a tree to store them in. The information loss traditionally associated with ASTs isn't a design choice; it's what naturally happens when you only materialize what you need.

Gazelle makes this a smooth continuum rather than a binary choice. The `gazelle!` macro generates an enum for each nonterminal whose variants mirror the grammar's alternatives, but the variant fields are generic — their types come from your `Types` trait implementation:

```rust
// Generated by gazelle! — variants match the grammar, types are yours
pub enum Expr<A: Types> {
    Binop(A::Expr, A::Op, A::Expr),
    Literal(A::Num),
    Paren(A::Expr),
}
```

Set the associated type to the generated enum itself, and a blanket impl kicks in — you get a full CST with zero action code:

```rust
impl calc::Types for CstBuilder {
    type Expr = Box<calc::Expr<Self>>;  // Box for recursive types
    type Num = f64;
    type Op = char;
    // No Reducer impl needed — blanket impl auto-boxes the node
}
```

Set it to `f64`, and you write a reducer that folds values during parsing — a fully custom AST (or no tree at all):

```rust
impl calc::Types for Evaluator {
    type Expr = f64;  // No tree, just a number
    type Num = f64;
    type Op = char;
}
```

Set it to `Ignore`, and the node is discarded — useful for validation-only parsing. Mix and match freely across nonterminals: auto-box some, evaluate others, ignore the rest. The same grammar, the same generated enums — the representation depends entirely on what types you plug in.

### 5. Parser Generator as a Library

Most parser generators are build tools. Gazelle exposes table construction as a library:

```rust
use gazelle::{parse_grammar, ErrorContext};
use gazelle::grammar::GrammarBuilder;
use gazelle::table::CompiledTable;
use gazelle::runtime::{Parser, Token};

// Option 1: Parse grammar from string
let grammar = parse_grammar(r#"
    start expr;
    terminals { NUM, PLUS, STAR }
    expr = expr PLUS term => add | term => term;
    term = term STAR factor => mul | factor => factor;
    factor = NUM => num;
"#).unwrap();

// Option 2: Build programmatically
let mut gb = GrammarBuilder::new();
let num = gb.t("NUM");
let plus = gb.t("PLUS");
let expr = gb.nt("expr");
gb.rule(expr, vec![expr, plus, expr]);
gb.rule(expr, vec![num]);
let grammar = gb.build();

// Build tables and parse
let compiled = CompiledTable::build(&grammar);
let mut parser = Parser::new(compiled.table());

let num_id = compiled.symbol_id("NUM").unwrap();
// ... push tokens with parser.maybe_reduce() and parser.shift()
```

Enables grammar analyzers, conflict debuggers, or parsers for dynamic grammars. See `examples/runtime_grammar.rs` for complete examples.

## Examples

### Calculator with User-Defined Operators

```
$ cargo run --example c11_calculator

> operator ^ pow right 3;
defined: ^ = pow right 3

> 2 ^ 3 ^ 2;
512                        // right-assoc: 2^(3^2) = 2^9

> x = 2 * 3 ^ 2;
x = 18                     // ^ binds tighter than *
```

### C11 Parser

A complete C11 parser demonstrating:
- Lexer feedback for typedef disambiguation (Jourdan's approach)
- Dynamic precedence collapsing 10+ expression levels into one rule
- Full C11 test suite (41 tests)

```
$ cargo test --example c11
```

The expression grammar:
```rust
// Traditional C grammar: 10+ cascading rules
// Gazelle: one rule with prec terminals
assignment_expression = cast_expression
                      | assignment_expression BINOP assignment_expression
                      | assignment_expression STAR assignment_expression
                      | assignment_expression QUESTION expression COLON assignment_expression;
```

## Usage

```rust
use gazelle::{ParseError, Precedence};
use gazelle_macros::gazelle;

gazelle! {
    grammar MyParser {
        start expr;
        terminals {
            NUM: _,        // terminal with payload
            LPAREN, RPAREN,
            prec OP: _,    // precedence terminal with payload
        }

        expr = expr OP expr => binop
             | NUM => num
             | LPAREN expr RPAREN => paren;
    }
}

struct Eval;

impl my_parser::Types for Eval {
    type Error = ParseError;
    type Num = i32;
    type Op = char;
    type Expr = i32;
}

impl gazelle::Reducer<my_parser::Expr<Self>> for Eval {
    fn reduce(&mut self, node: my_parser::Expr<Self>) -> Result<i32, ParseError> {
        Ok(match node {
            my_parser::Expr::Binop(l, op, r) => match op {
                '+' => l + r, '*' => l * r, _ => 0,
            },
            my_parser::Expr::Num(n) => n,
        })
    }
}

fn main() {
    let mut parser = my_parser::Parser::<Eval>::new();
    let mut actions = Eval;

    // Push tokens (you control the loop)
    parser.push(my_parser::Terminal::Num(2), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Op('+', Precedence::Left(1)), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Num(3), &mut actions).unwrap();

    let result = parser.finish(&mut actions).map_err(|(_, e)| e).unwrap();
    assert_eq!(result, 5);
}
```

## When to Use Gazelle

**Good fit:**
- Languages with user-definable operators or precedence
- C-family languages needing lexer feedback (typedef disambiguation)
- Complex expression grammars you want to simplify
- When you want full IDE support in semantic actions

**Consider alternatives for:**
- Simple formats (JSON, TOML) - nom or pest may be simpler
- Maximum ecosystem maturity - lalrpop

## License

MIT