gazelle-parser 0.2.0

# Building a Calculator: From Simple to C11

This tutorial builds a calculator step by step, starting simple and adding features until we have full C11 expression syntax with user-defined operators. Each step introduces a Gazelle feature to solve a real problem.

## Step 1: A Basic Calculator

Let's start with arithmetic: `+`, `-`, `*`, `/`. We need precedence — `*` binds tighter than `+` — so we use the traditional cascading grammar:

```rust
use gazelle_macros::gazelle;

gazelle! {
    grammar Calc {
        start expr;
        terminals {
            NUM: Num,
            PLUS, MINUS, STAR, SLASH,
            LPAREN, RPAREN,
        }

        expr: Expr = add_expr;

        add_expr: Expr = add_expr PLUS mul_expr @add
                       | add_expr MINUS mul_expr @sub
                       | mul_expr;

        mul_expr: Expr = mul_expr STAR primary @mul
                       | mul_expr SLASH primary @div
                       | primary;

        primary: Expr = NUM @num
                      | LPAREN expr RPAREN;  // passthrough - parens don't transform the value
    }
}
```

This works. `1 + 2 * 3` parses as `1 + (2 * 3)` because `mul_expr` is lower in the cascade than `add_expr`.

All expression non-terminals share the same type (`Expr`). Alternatives without `@action` are passthroughs — the value flows through unchanged, no method call needed. Notice `LPAREN expr RPAREN` has no action: parentheses affect parsing (grouping) but don't transform the value, so no method is required.

### What Gets Generated

The `gazelle!` macro generates several things. First, two traits — one for types, one for actions:

```rust
trait CalcTypes {
    type Num;
    type Expr;
}

trait CalcActions<E: From<ParseError> = ParseError>: CalcTypes {
    fn add(&mut self, l: Self::Expr, r: Self::Expr) -> Result<Self::Expr, E>;
    fn sub(&mut self, l: Self::Expr, r: Self::Expr) -> Result<Self::Expr, E>;
    fn mul(&mut self, l: Self::Expr, r: Self::Expr) -> Result<Self::Expr, E>;
    fn div(&mut self, l: Self::Expr, r: Self::Expr) -> Result<Self::Expr, E>;
    fn num(&mut self, n: Self::Num) -> Result<Self::Expr, E>;
}
```

No `paren` method — the parenthesized expression is a passthrough.

Only two associated types: `Num` for the terminal payload, `Expr` for all expression non-terminals. Passthrough alternatives (without `@action`) don't generate methods. Action methods return `Result` — the error type defaults to `ParseError` but can be customized for actions that can fail with domain errors.

Second, a terminal enum generic over the Actions trait:

```rust
enum CalcTerminal<A: CalcActions> {
    Num(A::Num),
    Plus,
    Minus,
    Star,
    Slash,
    Lparen,
    Rparen,
}
```

Third, a parser struct `CalcParser<A: CalcActions>` with `push` and `finish` methods.

### Implementing the Traits

```rust
struct Eval;

impl CalcTypes for Eval {
    type Num = i64;
    type Expr = i64;
}

impl CalcActions for Eval {
    fn add(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l + r) }
    fn sub(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l - r) }
    fn mul(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l * r) }
    fn div(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l / r) }
    fn num(&mut self, n: i64) -> Result<i64, ParseError> { Ok(n) }
}
```

Five methods — only the operations that transform values.

### The Lexer

Gazelle provides a `Source` type with composable methods for building lexers. We use its methods to read tokens and map them to our terminal enum:

```rust
use gazelle::lexer::Source;

fn tokenize(input: &str) -> Result<Vec<CalcTerminal<Eval>>, String> {
    let mut src = Source::from_str(input);
    let mut tokens = Vec::new();

    loop {
        src.skip_whitespace();
        if src.at_end() { break; }

        if let Some(span) = src.read_number() {
            let s = &input[span.start..span.end];
            tokens.push(CalcTerminal::Num(s.parse().unwrap()));
        } else if let Some(c) = src.peek() {
            src.advance();
            tokens.push(match c {
                '(' => CalcTerminal::Lparen,
                ')' => CalcTerminal::Rparen,
                '+' => CalcTerminal::Plus,
                '-' => CalcTerminal::Minus,
                '*' => CalcTerminal::Star,
                '/' => CalcTerminal::Slash,
                _ => return Err(format!("unexpected char: {}", c)),
            });
        }
    }
    Ok(tokens)
}
```

`Source` provides methods like `read_number()`, `read_ident()`, `skip_whitespace()` that return `Span` values you can use to extract text from the input.

### Running the Parser

```rust
fn run(input: &str) -> Result<i64, String> {
    let tokens = tokenize(input)?;
    let mut parser = CalcParser::<Eval>::new();
    let mut actions = Eval;

    for token in tokens {
        parser.push(token, &mut actions).map_err(|e| parser.format_error(&e))?;
    }
    parser.finish(&mut actions).map_err(|(p, e)| p.format_error(&e))
}

fn main() {
    let result = run("1 + 2 * 3").unwrap();
    println!("{}", result);  // prints: 7
}
```

The parser is push-based: you feed tokens one at a time. Each `push` may trigger reductions, calling methods on your `actions` impl. When input is exhausted, `finish` returns the final value — the result of reducing the start symbol (`expr`).

## Step 2: Multiple Statements

Let's allow multiple expressions separated by semicolons. We can simplify: the passthrough `expr = add_expr` is unnecessary since they share the same type.

```rust
gazelle! {
    grammar Calc {
        start stmts;
        terminals {
            NUM: Num,
            PLUS, MINUS, STAR, SLASH,
            LPAREN, RPAREN,
            SEMI,
        }

        stmts = stmt*;
        stmt = add_expr @print SEMI;

        add_expr: Expr = add_expr PLUS mul_expr @add
                       | add_expr MINUS mul_expr @sub
                       | mul_expr;

        mul_expr: Expr = mul_expr STAR primary @mul
                       | mul_expr SLASH primary @div
                       | primary;

        primary: Expr = NUM @num
                      | LPAREN add_expr RPAREN;
    }
}
```

Gazelle supports modifiers on symbols: `*` (zero or more), `+` (one or more), `?` (optional). Here `stmt*` means zero or more statements. Each `stmt` prints an expression and expects a semicolon.

Since `stmts` is untyped, `finish` returns `Result<(), _>`. The `print` action just prints directly:

```rust
struct Eval;

impl CalcTypes for Eval {
    type Num = i64;
    type Expr = i64;
}

impl CalcActions for Eval {
    fn print(&mut self, e: i64) -> Result<(), ParseError> { println!("{}", e); Ok(()) }
    fn add(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l + r) }
    fn sub(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l - r) }
    fn mul(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l * r) }
    fn div(&mut self, l: i64, r: i64) -> Result<i64, ParseError> { Ok(l / r) }
    fn num(&mut self, n: i64) -> Result<i64, ParseError> { Ok(n) }
}

fn run(input: &str) -> Result<(), String> {
    let tokens = tokenize(input)?;
    let mut parser = CalcParser::<Eval>::new();
    let mut actions = Eval;

    for token in tokens {
        parser.push(token, &mut actions).map_err(|e| parser.format_error(&e))?;
    }
    parser.finish(&mut actions).map_err(|(p, e)| p.format_error(&e))
}
```

Now users can type several calculations:

```
> 1 + 2; 3 * 4; 5;
3
12
5
```

## Step 3: The Cascade Problem

Our grammar handles 2 precedence levels with 2 non-terminals. C11 has 15 levels. That means 15 non-terminals:

```
assignment_expr → conditional_expr → logical_or_expr → logical_and_expr →
bitwise_or_expr → bitwise_xor_expr → bitwise_and_expr → equality_expr →
relational_expr → shift_expr → additive_expr → multiplicative_expr →
cast_expr → unary_expr → postfix_expr → primary_expr
```

Each level has the same pattern: `this_level OP next_level`. Every rule produces the same semantic result — a value. The repetition is painful and obscures the actual structure: binary operation, two operands, one operator.

## Step 4: Runtime Precedence

Gazelle solves this with precedence terminals. Mark operators with `prec` and attach precedence values in the lexer:

```rust
gazelle! {
    grammar Calc {
        start stmts;
        terminals {
            NUM: Num,
            LPAREN, RPAREN,
            SEMI,
            prec BINOP: Binop,
        }

        stmts = stmts SEMI expr @stmt
              | expr @first
              | _;

        expr: Expr = expr BINOP expr @binary
                   | primary;

        primary: Expr = NUM @num
                      | LPAREN expr RPAREN;  // passthrough - same type flows through
    }
}
```

One rule for all binary expressions. The lexer provides precedence:

```rust
fn tokenize(op: &str) -> CalcTerminal {
    match op {
        "+" => CalcTerminal::Binop(BinOp::Add, Precedence::Left(11)),
        "-" => CalcTerminal::Binop(BinOp::Sub, Precedence::Left(11)),
        "*" => CalcTerminal::Binop(BinOp::Mul, Precedence::Left(12)),
        "/" => CalcTerminal::Binop(BinOp::Div, Precedence::Left(12)),
        // ...
    }
}
```

Higher numbers bind tighter. `Precedence::Left` for left-associative, `Precedence::Right` for right-associative.

The trait simplifies:

```rust
impl CalcTypes for Eval {
    type Num = i64;
    type Binop = BinOp;
    type Expr = i64;
}

impl CalcActions for Eval {
    fn binary(&mut self, l: i64, op: BinOp, r: i64) -> Result<i64, ParseError> {
        Ok(match op {
            BinOp::Add => l + r,
            BinOp::Sub => l - r,
            BinOp::Mul => l * r,
            BinOp::Div => l / r,
        })
    }
    fn num(&mut self, n: i64) -> Result<i64, ParseError> { Ok(n) }
    // No paren method needed - passthrough!
}
```

Now we can add all of C's binary operators by extending the `BinOp` enum and the `tokenize` function. The grammar doesn't change.

## Step 5: Variables

Let's add variables. We need to distinguish lvalues (assignable locations) from rvalues (plain values):

```rust
enum Val {
    Rval(i64),
    Lval(usize),  // index into variable storage
}
```

Update the grammar:

```rust
primary = NUM @num
        | IDENT @var
        | LPAREN expr RPAREN @paren;
```

And add assignment to the operators:

```rust
"=" => CalcTerminal::Binop(BinOp::Assign, Precedence::right(1)),
```

Assignment is right-associative (`x = y = 5` assigns right-to-left) and lowest precedence.

```rust
fn binary(&mut self, l: Val, op: BinOp, r: Val) -> Result<Val, ParseError> {
    Ok(match op {
        BinOp::Assign => {
            let v = self.get(r);
            self.store(l, v)
        }
        BinOp::Add => Val::Rval(self.get(l) + self.get(r)),
        // ...
    })
}
```

Now: `x = 10; y = 20; x + y` → `30`

## Step 6: Unary Operators

C has unary `+`, `-`, `!`, `~`, and the dual-role `*` (dereference) and `&` (address-of). Here's the problem: `*` and `&` are also binary operators (multiply and bitwise AND).

If `STAR` is a `prec BINOP`, we can't use it in unary rules:

```rust
unary_expr = STAR expr @deref   // won't work - STAR is BINOP
```

Solution: declare them as separate precedence terminals:

```rust
terminals {
    // ...
    prec STAR,
    prec AMP,
    prec PLUS,
    prec MINUS,
    prec BINOP: BinOp,
}
```

Now we can use them in unary rules:

```rust
unary_expr: Expr = STAR unary_expr @deref
                 | AMP unary_expr @addr
                 | PLUS unary_expr @uplus
                 | MINUS unary_expr @uminus
                 | BANG unary_expr @lognot
                 | TILDE unary_expr @bitnot
                 | postfix_expr;
```

But wait — now binary expressions don't see `STAR` as an operator. We need to collect all binary operators into one place:

```rust
binary_op: Binop = BINOP            // passthrough - BINOP already has type Binop
                 | STAR @op_mul
                 | AMP @op_bitand
                 | PLUS @op_add
                 | MINUS @op_sub;

expr: Expr = expr binary_op expr @binary
           | unary_expr;
```

When `STAR` (precedence 12) reduces to `binary_op`, the non-terminal inherits that precedence. The parser resolves `1 + 2 * 3` correctly — the `binary_op` carrying `STAR`'s precedence wins over `PLUS`.

Note that `BINOP` is a passthrough — it already has type `Binop`, so no action method is needed. The other operators (`STAR`, `AMP`, etc.) are untyped precedence terminals, so they need action methods to produce a `Binop` value.

## Step 7: Postfix Expressions

Function calls, array indexing, post-increment/decrement:

```rust
postfix_expr = primary @primary
             | postfix_expr LPAREN RPAREN @call0
             | postfix_expr LPAREN args RPAREN @call
             | postfix_expr LBRACK expr RBRACK @index
             | postfix_expr INC @postinc
             | postfix_expr DEC @postdec;

args = expr @arg1
     | args COMMA expr @arg;
```

Implementation handles function calls — we'll support builtins like `pow(2, 10)`:

```rust
fn call(&mut self, func: Val, args: Vec<Val>) -> Result<Val, ParseError> {
    let name = self.slot_name(func);
    Ok(match name.as_str() {
        "pow" => {
            let base = self.get(args[0]);
            let exp = self.get(args[1]);
            Val::Rval(base.pow(exp as u32))
        }
        "min" => Val::Rval(self.get(args[0]).min(self.get(args[1]))),
        "max" => Val::Rval(self.get(args[0]).max(self.get(args[1]))),
        _ => panic!("unknown function: {}", name),
    })
}
```

## Step 8: User-Defined Operators

The payoff. Let users define new operators:

```
operator @ pow right 13
2 @ 3 @ 2
```

This defines `@` as a right-associative operator at precedence 13, bound to the `pow` function. Then `2 @ 3 @ 2` computes `2^(3^2) = 2^9 = 512`.

Add statements for operator definition:

```rust
terminals {
    // ...
    LEFT, RIGHT,
}

assoc: Assoc = LEFT @left | RIGHT @right;

stmt = OPERATOR BINOP IDENT assoc NUM @def_op
     | add_expr @print SEMI;
```

`LEFT` and `RIGHT` are unit terminals — no payload. The actions return the precedence constructor:

```rust
type Assoc = fn(u8) -> Precedence;

fn left(&mut self) -> Result<fn(u8) -> Precedence, ParseError> { Ok(Precedence::left) }
fn right(&mut self) -> Result<fn(u8) -> Precedence, ParseError> { Ok(Precedence::right) }

fn def_op(&mut self, op: BinOp, func: String, assoc: fn(u8) -> Precedence, prec: i64) -> Result<(), ParseError> {
    if let BinOp::Custom(ch) = op {
        self.custom_ops.insert(ch, OpDef { func, prec: assoc(prec as u8) });
    }
    Ok(())
}
```

The terminals distinguish left from right. The actions provide the behavior.

This shows the power of trait-based semantics: `type Assoc = fn(u8) -> Precedence` uses a function type as the associated type. You're not limited to simple data — any Rust type works.

Now the lexer needs to see this table. This is **lexer feedback** — information flowing from parser back to lexer.

The parse loop makes it natural:

```rust
let mut parser = CalcParser::new();
let mut actions = Eval::new();

loop {
    // Lexer sees the current custom_ops table
    match tokenizer.next(&actions.custom_ops)? {
        Some(tok) => parser.push(tok, &mut actions)?,
        None => break,
    }
}
```

The lexer consults `custom_ops` to get precedence for unknown single-character operators:

```rust
fn next(&mut self, custom_ops: &HashMap<char, OpDef>) -> Option<Token> {
    // ...
    if s.len() == 1 {
        let ch = s.chars().next().unwrap();
        let prec = custom_ops.get(&ch)
            .map(|d| d.prec)
            .unwrap_or(Precedence::left(0));
        return Some(Token::Binop(BinOp::Custom(ch), prec));
    }
}
```

An unknown operator gets precedence 0 (lowest) until defined. Once defined, subsequent uses get the registered precedence.

## The Complete Picture

We've built:

1. **Basic arithmetic** with cascading grammar
2. **Statements** for interactive use
3. **Runtime precedence** to collapse the cascade
4. **Variables** with lvalue/rvalue distinction
5. **Unary operators** including dual-role `*` and `&`
6. **Precedence-carrying non-terminals** to unify binary operators
7. **Postfix expressions** for calls and indexing
8. **User-defined operators** with lexer feedback

The final grammar is ~40 lines. It handles the full C11 expression syntax plus user-defined operators. The grammar is clean because precedence lives in tokens, not grammar structure. Lexer feedback works because you control the parse loop.

See `examples/c11_calculator.rs` for the complete implementation.