gazelle-parser 0.3.0

# Building a Calculator: From Simple to C11

This tutorial builds a calculator step by step, starting simple and adding features until we have full C11 expression syntax with user-defined operators. Each step introduces a Gazelle feature to solve a real problem.

## Step 1: A Basic Calculator

Let's start with arithmetic: `+`, `-`, `*`, `/`. We need precedence — `*` binds tighter than `+` — so we use the traditional cascading grammar:

```rust
use gazelle_macros::gazelle;

gazelle! {
    grammar Calc {
        start expr;
        terminals {
            NUM: _,
            PLUS, MINUS, STAR, SLASH,
            LPAREN, RPAREN,
        }

        expr = add_expr => add_expr;

        add_expr = add_expr PLUS mul_expr => add
                 | add_expr MINUS mul_expr => sub
                 | mul_expr => mul_expr;

        mul_expr = mul_expr STAR primary => mul
                 | mul_expr SLASH primary => div
                 | primary => primary;

        primary = NUM => num
                | LPAREN expr RPAREN => paren;
    }
}
```

This works. `1 + 2 * 3` parses as `1 + (2 * 3)` because `mul_expr` is lower in the cascade than `add_expr`.

Every alternative has `=> name`, which generates an enum variant. Untyped terminals (like `LPAREN`, `RPAREN`, `PLUS`) are omitted from variant fields — only typed symbols become fields.

### What Gets Generated

The `gazelle!` macro generates a `calc` module containing several things. First, a `Types` trait and per-node enums:

```rust
trait Types: Sized {
    type Error: From<ParseError>;
    type Num: Debug;
    type AddExpr: Debug;
    type MulExpr: Debug;
    type Primary: Debug;
}

enum AddExpr<A: Types> {
    Add(A::AddExpr, A::MulExpr),
    Sub(A::AddExpr, A::MulExpr),
}

enum MulExpr<A: Types> { Mul(A::MulExpr, A::Primary), Div(A::MulExpr, A::Primary) }
enum Primary<A: Types> { Num(A::Num), Paren(A::Expr) }
```

An `Actions` trait is auto-implemented for any type satisfying `Types` + all `Reducer` bounds. You only write `Reducer` impls for nodes with custom logic — identity (CST), `Box<N>` (auto-boxing), and `Ignore` (discard) are handled by blanket impls.

Second, a terminal enum generic over Types:

```rust
enum Terminal<A: Types> {
    Num(A::Num),
    Plus, Minus, Star, Slash, Lparen, Rparen,
}
```

Third, a parser struct `Parser<A: Types>` with `push` and `finish` methods.

### Implementing the Traits

```rust
use gazelle::{ParseError, Reducer};

struct Eval;

impl calc::Types for Eval {
    type Error = ParseError;
    type Num = i64;
    type AddExpr = i64;
    type MulExpr = i64;
    type Primary = i64;
}

impl Reducer<calc::AddExpr<Self>> for Eval {
    fn reduce(&mut self, node: calc::AddExpr<Self>) -> Result<i64, ParseError> {
        Ok(match node {
            calc::AddExpr::Add(l, r) => l + r,
            calc::AddExpr::Sub(l, r) => l - r,
        })
    }
}

impl Reducer<calc::MulExpr<Self>> for Eval {
    fn reduce(&mut self, node: calc::MulExpr<Self>) -> Result<i64, ParseError> {
        Ok(match node {
            calc::MulExpr::Mul(l, r) => l * r,
            calc::MulExpr::Div(l, r) => l / r,
        })
    }
}

impl Reducer<calc::Primary<Self>> for Eval {
    fn reduce(&mut self, node: calc::Primary<Self>) -> Result<i64, ParseError> {
        Ok(match node {
            calc::Primary::Num(n) => n,
            calc::Primary::Paren(e) => e,
        })
    }
}
```

One `Reducer` impl per non-terminal enum.

### The Lexer

Gazelle provides a `Source` type with composable methods for building lexers. We use its methods to read tokens and map them to our terminal enum:

```rust
use gazelle::lexer::Source;

fn tokenize(input: &str) -> Result<Vec<calc::Terminal<Eval>>, String> {
    let mut src = Source::from_str(input);
    let mut tokens = Vec::new();

    loop {
        src.skip_whitespace();
        if src.at_end() { break; }

        if let Some(span) = src.read_number() {
            let s = &input[span.start..span.end];
            tokens.push(calc::Terminal::Num(s.parse().unwrap()));
        } else if let Some(c) = src.peek() {
            src.advance();
            tokens.push(match c {
                '(' => calc::Terminal::Lparen,
                ')' => calc::Terminal::Rparen,
                '+' => calc::Terminal::Plus,
                '-' => calc::Terminal::Minus,
                '*' => calc::Terminal::Star,
                '/' => calc::Terminal::Slash,
                _ => return Err(format!("unexpected char: {}", c)),
            });
        }
    }
    Ok(tokens)
}
```

`Source` provides methods like `read_number()`, `read_ident()`, `skip_whitespace()` that return `Span` values you can use to extract text from the input.

### Running the Parser

```rust
fn run(input: &str) -> Result<i64, String> {
    let tokens = tokenize(input)?;
    let mut parser = calc::Parser::<Eval>::new();
    let mut actions = Eval;

    for token in tokens {
        parser.push(token, &mut actions).map_err(|e| parser.format_error(&e))?;
    }
    parser.finish(&mut actions).map_err(|(p, e)| p.format_error(&e))
}

fn main() {
    let result = run("1 + 2 * 3").unwrap();
    println!("{}", result);  // prints: 7
}
```

The parser is push-based: you feed tokens one at a time. Each `push` may trigger reductions, calling methods on your `actions` impl. When input is exhausted, `finish` returns the final value — the result of reducing the start symbol (`expr`).

## Step 2: Multiple Statements

Let's allow multiple expressions separated by semicolons.

```rust
gazelle! {
    grammar Calc {
        start stmts;
        terminals {
            NUM: _,
            PLUS, MINUS, STAR, SLASH,
            LPAREN, RPAREN,
            SEMI,
        }

        stmts = stmt*;
        stmt = add_expr SEMI => print;

        add_expr = add_expr PLUS mul_expr => add
                 | add_expr MINUS mul_expr => sub
                 | mul_expr => mul_expr;

        mul_expr = mul_expr STAR primary => mul
                 | mul_expr SLASH primary => div
                 | primary => primary;

        primary = NUM => num
                | LPAREN add_expr RPAREN => paren;
    }
}
```

Gazelle supports modifiers on symbols: `*` (zero or more), `+` (one or more), `?` (optional). Here `stmt*` means zero or more statements. Each `stmt` prints an expression and expects a semicolon.

Since `stmts` is untyped, `finish` returns `Result<(), _>`. The `print` action triggers a `Reducer` call:

```rust
struct Eval;

impl calc::Types for Eval {
    type Error = ParseError;
    type Num = i64;
    type AddExpr = i64;
    type MulExpr = i64;
    type Primary = i64;
}

impl Reducer<calc::Stmt<Self>> for Eval {
    fn reduce(&mut self, node: calc::Stmt<Self>) -> Result<(), ParseError> {
        match node {
            calc::Stmt::Print(e) => { println!("{}", e); Ok(()) }
        }
    }
}

impl Reducer<calc::AddExpr<Self>> for Eval {
    fn reduce(&mut self, node: calc::AddExpr<Self>) -> Result<i64, ParseError> {
        Ok(match node {
            calc::AddExpr::Add(l, r) => l + r,
            calc::AddExpr::Sub(l, r) => l - r,
        })
    }
}

// Similar Reducer impls for MulExpr, Primary...

fn run(input: &str) -> Result<(), String> {
    let tokens = tokenize(input)?;
    let mut parser = calc::Parser::<Eval>::new();
    let mut actions = Eval;

    for token in tokens {
        parser.push(token, &mut actions).map_err(|e| parser.format_error(&e))?;
    }
    parser.finish(&mut actions).map_err(|(p, e)| p.format_error(&e))
}
```

Now users can type several calculations:

```
> 1 + 2; 3 * 4; 5;
3
12
5
```

## Step 3: The Cascade Problem

Our grammar handles 2 precedence levels with 2 non-terminals. C11 has 15 levels. That means 15 non-terminals:

```
assignment_expr → conditional_expr → logical_or_expr → logical_and_expr →
bitwise_or_expr → bitwise_xor_expr → bitwise_and_expr → equality_expr →
relational_expr → shift_expr → additive_expr → multiplicative_expr →
cast_expr → unary_expr → postfix_expr → primary_expr
```

Each level has the same pattern: `this_level OP next_level`. Every rule produces the same semantic result — a value. The repetition is painful and obscures the actual structure: binary operation, two operands, one operator.

## Step 4: Runtime Precedence

Gazelle solves this with precedence terminals. Mark operators with `prec` and attach precedence values in the lexer:

```rust
gazelle! {
    grammar Calc {
        start stmts;
        terminals {
            NUM: _,
            LPAREN, RPAREN,
            SEMI,
            prec BINOP: _,
        }

        stmts = stmts SEMI expr => stmt
              | expr => first
              | _;

        expr = expr BINOP expr => binary
             | primary => primary;

        primary = NUM => num
                | LPAREN expr RPAREN => paren;
    }
}
```

One rule for all binary expressions. The lexer provides precedence:

```rust
fn tokenize(op: &str) -> calc::Terminal<Eval> {
    match op {
        "+" => calc::Terminal::Binop(BinOp::Add, Precedence::Left(11)),
        "-" => calc::Terminal::Binop(BinOp::Sub, Precedence::Left(11)),
        "*" => calc::Terminal::Binop(BinOp::Mul, Precedence::Left(12)),
        "/" => calc::Terminal::Binop(BinOp::Div, Precedence::Left(12)),
        // ...
    }
}
```

Higher numbers bind tighter. `Precedence::Left` for left-associative, `Precedence::Right` for right-associative.

The traits simplify:

```rust
impl calc::Types for Eval {
    type Error = ParseError;
    type Num = i64;
    type Binop = BinOp;
    type Expr = i64;
}

impl Reducer<calc::Expr<Self>> for Eval {
    fn reduce(&mut self, node: calc::Expr<Self>) -> Result<i64, ParseError> {
        Ok(match node {
            calc::Expr::Binary(l, op, r) => match op {
                BinOp::Add => l + r,
                BinOp::Sub => l - r,
                BinOp::Mul => l * r,
                BinOp::Div => l / r,
            },
            calc::Expr::Num(n) => n,
            calc::Expr::Paren(e) => e,
        })
    }
}
```

Now we can add all of C's binary operators by extending the `BinOp` enum and the `tokenize` function. The grammar doesn't change.

## Step 5: Variables

Let's add variables. We need to distinguish lvalues (assignable locations) from rvalues (plain values):

```rust
enum Val {
    Rval(i64),
    Lval(usize),  // index into variable storage
}
```

Update the grammar:

```rust
primary = NUM => num
        | IDENT => var
        | LPAREN expr RPAREN => paren;
```

And add assignment to the operators:

```rust
"=" => calc::Terminal::Binop(BinOp::Assign, Precedence::Right(1)),
```

Assignment is right-associative (`x = y = 5` assigns right-to-left) and lowest precedence.

```rust
impl Reducer<calc::Expr<Self>> for Eval {
    fn reduce(&mut self, node: calc::Expr<Self>) -> Result<Val, ParseError> {
        Ok(match node {
            calc::Expr::Binary(l, op, r) => match op {
                BinOp::Assign => {
                    let v = self.get(r);
                    self.store(l, v)
                }
                BinOp::Add => Val::Rval(self.get(l) + self.get(r)),
                // ...
            },
            calc::Expr::Num(n) => Val::Rval(n),
            calc::Expr::Var(name) => self.lookup(&name),
        })
    }
}
```

Now: `x = 10; y = 20; x + y` → `30`

## Step 6: Unary Operators

C has unary `+`, `-`, `!`, `~`, and the dual-role `*` (dereference) and `&` (address-of). Here's the problem: `*` and `&` are also binary operators (multiply and bitwise AND).

If `STAR` is a `prec BINOP`, we can't use it in unary rules:

```rust
unary_expr = STAR expr => deref   // won't work - STAR is BINOP
```

Solution: declare them as separate precedence terminals:

```rust
terminals {
    // ...
    prec STAR,
    prec AMP,
    prec PLUS,
    prec MINUS,
    prec BINOP: _,
}
```

Now we can use them in unary rules:

```rust
unary_expr = STAR unary_expr => deref
           | AMP unary_expr => addr
           | PLUS unary_expr => uplus
           | MINUS unary_expr => uminus
           | BANG unary_expr => lognot
           | TILDE unary_expr => bitnot
           | postfix_expr => postfix_expr;
```

But wait — now binary expressions don't see `STAR` as an operator. We need to collect all binary operators into one place:

```rust
binary_op = BINOP => binop   // BINOP already has the right type
          | STAR => op_mul
          | AMP => op_bitand
          | PLUS => op_add
          | MINUS => op_sub;

expr = expr binary_op expr => binary
     | unary_expr => unary_expr;
```

When `STAR` (precedence 12) reduces to `binary_op`, the non-terminal inherits that precedence. The parser resolves `1 + 2 * 3` correctly — the `binary_op` carrying `STAR`'s precedence wins over `PLUS`.

All alternatives need `=> name`. The `Reducer` for `BinaryOp` maps each variant to a `Binop` value.

## Step 7: Postfix Expressions

Function calls, array indexing, post-increment/decrement:

```rust
postfix_expr = primary => primary
             | postfix_expr LPAREN RPAREN => call0
             | postfix_expr LPAREN args RPAREN => call
             | postfix_expr LBRACK expr RBRACK => index
             | postfix_expr INC => postinc
             | postfix_expr DEC => postdec;

args = expr => arg1
     | args COMMA expr => arg;
```

Implementation handles function calls — we'll support builtins like `pow(2, 10)`:

```rust
impl Reducer<calc::PostfixExpr<Self>> for Eval {
    fn reduce(&mut self, node: calc::PostfixExpr<Self>) -> Result<Val, ParseError> {
        Ok(match node {
            calc::PostfixExpr::Call(func, args) => {
                let name = self.slot_name(func);
                match name.as_str() {
                    "pow" => {
                        let base = self.get(args[0]);
                        let exp = self.get(args[1]);
                        Val::Rval(base.pow(exp as u32))
                    }
                    "min" => Val::Rval(self.get(args[0]).min(self.get(args[1]))),
                    "max" => Val::Rval(self.get(args[0]).max(self.get(args[1]))),
                    _ => panic!("unknown function: {}", name),
                }
            }
            // ...
        })
    }
}
```

## Step 8: User-Defined Operators

The payoff. Let users define new operators:

```
operator @ pow right 13
2 @ 3 @ 2
```

This defines `@` as a right-associative operator at precedence 13, bound to the `pow` function. Then `2 @ 3 @ 2` computes `2^(3^2) = 2^9 = 512`.

Add statements for operator definition:

```rust
terminals {
    // ...
    LEFT, RIGHT,
}

assoc = LEFT => left | RIGHT => right;

stmt = OPERATOR BINOP IDENT assoc NUM => def_op
     | add_expr SEMI => print;
```

`LEFT` and `RIGHT` are unit terminals — no payload. The `Reducer` impls return the precedence constructor:

```rust
type Assoc = fn(u8) -> Precedence;

impl Reducer<calc::Assoc<Self>> for Eval {
    fn reduce(&mut self, node: calc::Assoc<Self>) -> Result<fn(u8) -> Precedence, ParseError> {
        Ok(match node {
            calc::Assoc::Left => Precedence::Left,
            calc::Assoc::Right => Precedence::Right,
        })
    }
}

impl Reducer<calc::Stmt<Self>> for Eval {
    fn reduce(&mut self, node: calc::Stmt<Self>) -> Result<(), ParseError> {
        match node {
            calc::Stmt::DefOp(op, func, assoc, prec) => {
                if let BinOp::Custom(ch) = op {
                    self.custom_ops.insert(ch, OpDef { func, prec: assoc(prec as u8) });
                }
                Ok(())
            }
            calc::Stmt::Print(e) => { println!("{}", e); Ok(()) }
        }
    }
}
```

The terminals distinguish left from right. The actions provide the behavior.

This shows the power of trait-based semantics: `type Assoc = fn(u8) -> Precedence` uses a function type as the associated type. You're not limited to simple data — any Rust type works.

Now the lexer needs to see this table. This is **lexer feedback** — information flowing from parser back to lexer.

The parse loop makes it natural:

```rust
let mut parser = calc::Parser::<Eval>::new();
let mut actions = Eval::new();

loop {
    // Lexer sees the current custom_ops table
    match tokenizer.next(&actions.custom_ops)? {
        Some(tok) => parser.push(tok, &mut actions)?,
        None => break,
    }
}
```

The lexer consults `custom_ops` to get precedence for unknown single-character operators:

```rust
fn next(&mut self, custom_ops: &HashMap<char, OpDef>) -> Option<Token> {
    // ...
    if s.len() == 1 {
        let ch = s.chars().next().unwrap();
        let prec = custom_ops.get(&ch)
            .map(|d| d.prec)
            .unwrap_or(Precedence::Left(0));
        return Some(Token::Binop(BinOp::Custom(ch), prec));
    }
}
```

An unknown operator gets precedence 0 (lowest) until defined. Once defined, subsequent uses get the registered precedence.

## The Complete Picture

We've built:

1. **Basic arithmetic** with cascading grammar
2. **Statements** for interactive use
3. **Runtime precedence** to collapse the cascade
4. **Variables** with lvalue/rvalue distinction
5. **Unary operators** including dual-role `*` and `&`
6. **Precedence-carrying non-terminals** to unify binary operators
7. **Postfix expressions** for calls and indexing
8. **User-defined operators** with lexer feedback

The final grammar is ~40 lines. It handles the full C11 expression syntax plus user-defined operators. The grammar is clean because precedence lives in tokens, not grammar structure. Lexer feedback works because you control the parse loop.

See `examples/c11_calculator.rs` for the complete implementation.