Gazelle

An LR parser generator for Rust with runtime operator precedence and natural lexer feedback.

Getting Started

Add both crates to your Cargo.toml:

[dependencies]
gazelle-parser = "0.9"
gazelle-macros = "0.9"

gazelle-parser provides the runtime (parser, tokens, precedence, error handling). gazelle-macros provides the gazelle! proc macro that generates parser code from your grammar. Both are required for typical use.

For runtime-only usage (dynamic grammars without the macro), gazelle-parser alone is sufficient — see Parser Generator as a Library.

What Makes Gazelle Different

1. Runtime Operator Precedence

Traditional grammars encode precedence through structure - one rule per level:

expr: add_expr;
add_expr: add_expr '+' mul_expr | mul_expr;
mul_expr: mul_expr '*' unary_expr | unary_expr;
// ... 10 more levels for a full language

Gazelle's prec terminals carry precedence at runtime:

gazelle! {
    grammar Calc {
        start expr;
        terminals {
            NUM: _,
            prec OP: _,  // precedence attached to each token
        }
        expr = expr OP expr => binop | NUM => literal;
    }
}

One rule. The lexer provides precedence per token:

'+' => calc::Terminal::Op('+', Precedence::Left(1)),
'*' => calc::Terminal::Op('*', Precedence::Left(2)),

This enables user-defined operators at runtime - see examples/c11_calculator.rs.

2. Natural Lexer Feedback

The infamous C typedef problem: is T * x a multiplication or pointer declaration? The lexer needs parser state to decide.

Traditional parsers hide the parse loop, requiring globals or hacks. Gazelle uses a push-based API - you drive the loop:

loop {
    let token = lexer.next(&actions.ctx)?;  // lexer sees current state
    parser.push(token, &mut actions)?;       // actions update state
    // next iteration: lexer sees updated state
}

No magic. The lexer and parser share state through actions, and you control when each runs. See examples/c11/ for a complete C11 parser using this for typedef disambiguation.

3. Clean Separation of Grammar and Actions

The grammar is pure grammar:

gazelle! {
    grammar Calc {
        terminals { NUM: _, prec OP: _, LPAREN, RPAREN }

        expr = expr OP expr => binop
             | NUM => literal
             | LPAREN expr RPAREN => paren;
    }
}

Actions are split into an ErrorType trait, a Types trait, and per-node Action impls:

impl gazelle::ErrorType for Evaluator {
    type Error = String;
}

impl calc::Types for Evaluator {
    type Num = f64;
    type Op = char;
    type Expr = f64;
}

impl gazelle::Action<calc::Expr<Self>> for Evaluator {
    fn build(&mut self, node: calc::Expr<Self>) -> Result<f64, Self::Error> {
        match node {
            calc::Expr::Binop(left, op, right) => match op {
                '+' => Ok(left + right),
                '*' => Ok(left * right),
                _ => Err(format!("unknown op: {op}")),
            },
            calc::Expr::Literal(n) => Ok(n),
            calc::Expr::Paren(e) => Ok(e),
        }
    }
}

Action methods return Result — when actions are infallible, use type Error = core::convert::Infallible.

This gives you:

Full IDE support in action code (autocomplete, type hints, go-to-definition)
Compile errors point to your code, not generated code
Multiple implementations (interpreter, AST builder, pretty-printer)
Grammars reusable across different backends

4. CST to AST — You Choose

Parser generators typically force a choice: either you get a concrete syntax tree (CST) that you walk and transform, or you write untyped action callbacks and build your own result from scratch.

This reflects a real distinction, but not the one people usually describe. The conventional definition — "a CST keeps all tokens, an AST drops some" — confuses a consequence for a cause. A better way to think about it:

A CST is a concrete tree: an actual data structure with nodes and children, faithfully representing the grammar's productions.
An AST is the same structural information represented abstractly — through a sequence of typed reduction calls rather than a materialized tree.

When the tree is abstract, you only capture what you care about. A calculator folds Add(2, 3) straight into 5.0 — the parentheses, the operator token, even the child nodes were never stored because there was never a tree to store them in. The information loss traditionally associated with ASTs isn't a design choice; it's what naturally happens when you only materialize what you need.

Gazelle makes this a smooth continuum rather than a binary choice. The gazelle! macro generates an enum for each nonterminal whose variants mirror the grammar's alternatives, but the variant fields are generic — their types come from your Types trait implementation:

// Generated by gazelle! — variants match the grammar, types are yours
pub enum Expr<A: Types> {
    Binop(A::Expr, A::Op, A::Expr),
    Literal(A::Num),
    Paren(A::Expr),
}

Set the associated type to the generated enum itself, and a blanket impl kicks in — you get a full CST with zero action code:

impl calc::Types for CstBuilder {
    type Expr = Box<calc::Expr<Self>>;  // Box for recursive types
    type Num = f64;
    type Op = char;
    // No Action impl needed — blanket impl auto-boxes the node
}

Set it to f64, and you write a reducer that folds values during parsing — a fully custom AST (or no tree at all):

impl calc::Types for Evaluator {
    type Expr = f64;  // No tree, just a number
    type Num = f64;
    type Op = char;
}

Set it to Ignore, and the node is discarded — useful for validation-only parsing. Mix and match freely across nonterminals: auto-box some, evaluate others, ignore the rest. The same grammar, the same generated enums — the representation depends entirely on what types you plug in.

5. Declarative Conflict Resolution

For grammars with intentional ambiguities, terminal modifiers resolve S/R conflicts at the grammar level:

terminals {
    shift ELSE,         // dangling else: always shift
    prec OP: _,         // operators: runtime precedence
    conflict TOK: _,    // caller decides per-token
}

shift and reduce bake the decision into the parse table — zero runtime cost. prec and conflict defer to per-token resolution at parse time. No %expect hacks — the intent is explicit in the grammar.

6. Conflict Diagnostics with Examples

When a grammar has shift/reduce or reduce/reduce conflicts, Gazelle shows concrete example inputs with two bracketings — one for each parse:

Shift/reduce conflict on 'ELSE':
  Shift:  stmt -> IF COND THEN stmt • ELSE stmt (wins)
  Reduce: stmt -> IF COND THEN stmt •
  Example: IF COND THEN IF COND THEN stmt • ELSE stmt
  Shift:  IF COND THEN IF COND THEN (stmt ELSE stmt)
  Reduce: IF COND THEN (IF COND THEN stmt) ELSE stmt (reduce to stmt)

The example is found by BFS on the raw DFA — shortest viable prefix to the conflict state, then a joint suffix search that drives both parser interpretations to acceptance. The dot marks the conflict point. When the grammar is LR(k>1) rather than ambiguous (no single string has two parses), separate examples are shown for each interpretation.

7. Parser Generator as a Library

Most parser generators are build tools. Gazelle exposes table construction as a library:

use gazelle::{parse_grammar, ErrorContext};
use gazelle::table::CompiledTable;
use gazelle::runtime::{Parser, Token};

let grammar = parse_grammar(r#"
    start expr;
    terminals { NUM, PLUS, STAR }
    expr = expr PLUS term => add | term => term;
    term = term STAR factor => mul | factor => factor;
    factor = NUM => num;
"#).unwrap();

// Build tables and parse
let compiled = CompiledTable::build(&grammar).unwrap();
let mut parser = Parser::new(compiled.table());

let num_id = compiled.symbol_id("NUM").unwrap();
// ... push tokens with parser.maybe_reduce() and parser.shift()

Enables grammar analyzers, conflict debuggers, or parsers for dynamic grammars. See examples/runtime_grammar.rs for complete examples.

Examples

Calculator with User-Defined Operators

$ cargo run --example c11_calculator

> operator ^ pow right 3;
defined: ^ = pow right 3

> 2 ^ 3 ^ 2;
512                        // right-assoc: 2^(3^2) = 2^9

> x = 2 * 3 ^ 2;
x = 18                     // ^ binds tighter than *

C11 Parser

A complete C11 parser demonstrating:

Lexer feedback for typedef disambiguation (Jourdan's approach)
Dynamic precedence collapsing 10+ expression levels into one rule
Full C11 test suite (41 tests)

$ cargo test --example c11

The expression grammar:

// Traditional C grammar: 10+ cascading rules
// Gazelle: one rule with prec terminals
assignment_expression = cast_expression
                      | assignment_expression BINOP assignment_expression
                      | assignment_expression STAR assignment_expression
                      | assignment_expression QUESTION expression COLON assignment_expression;

Usage

use gazelle::Precedence;
use gazelle_macros::gazelle;

gazelle! {
    grammar MyParser {
        start expr;
        terminals {
            NUM: _,        // terminal with payload
            LPAREN, RPAREN,
            prec OP: _,    // precedence terminal with payload
        }

        expr = expr OP expr => binop
             | NUM => num
             | LPAREN expr RPAREN => paren;
    }
}

struct Eval;

impl gazelle::ErrorType for Eval {
    type Error = core::convert::Infallible;
}

impl my_parser::Types for Eval {
    type Num = i32;
    type Op = char;
    type Expr = i32;
}

impl gazelle::Action<my_parser::Expr<Self>> for Eval {
    fn build(&mut self, node: my_parser::Expr<Self>) -> Result<i32, Self::Error> {
        Ok(match node {
            my_parser::Expr::Binop(l, op, r) => match op {
                '+' => l + r, '*' => l * r, _ => 0,
            },
            my_parser::Expr::Num(n) => n,
            my_parser::Expr::Paren(e) => e,
        })
    }
}

fn main() {
    let mut parser = my_parser::Parser::<Eval>::new();
    let mut actions = Eval;

    // Push tokens (you control the loop)
    parser.push(my_parser::Terminal::Num(2), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Op('+', Precedence::Left(1)), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Num(3), &mut actions).unwrap();

    let result = parser.finish(&mut actions).map_err(|(_, e)| e).unwrap();
    assert_eq!(result, 5);
}

When to Use Gazelle

Good fit:

Languages with user-definable operators or precedence
C-family languages needing lexer feedback (typedef disambiguation)
Complex expression grammars you want to simplify
When you want full IDE support in semantic actions

Consider alternatives for:

Simple formats (JSON, TOML) - nom or pest may be simpler
Maximum ecosystem maturity - lalrpop or tree-sitter

License

MIT

gazelle-parser 0.9.3