gazelle-parser 0.9.3

LR parser generator with runtime operator precedence and natural lexer feedback
Documentation

Gazelle

An LR parser generator for Rust with runtime operator precedence and natural lexer feedback.

Getting Started

Add both crates to your Cargo.toml:

[dependencies]
gazelle-parser = "0.9"
gazelle-macros = "0.9"

gazelle-parser provides the runtime (parser, tokens, precedence, error handling). gazelle-macros provides the gazelle! proc macro that generates parser code from your grammar. Both are required for typical use.

For runtime-only usage (dynamic grammars without the macro), gazelle-parser alone is sufficient — see Parser Generator as a Library.

What Makes Gazelle Different

1. Runtime Operator Precedence

Traditional grammars encode precedence through structure - one rule per level:

expr: add_expr;
add_expr: add_expr '+' mul_expr | mul_expr;
mul_expr: mul_expr '*' unary_expr | unary_expr;
// ... 10 more levels for a full language

Gazelle's prec terminals carry precedence at runtime:

gazelle! {
    grammar Calc {
        start expr;
        terminals {
            NUM: _,
            prec OP: _,  // precedence attached to each token
        }
        expr = expr OP expr => binop | NUM => literal;
    }
}

One rule. The lexer provides precedence per token:

'+' => calc::Terminal::Op('+', Precedence::Left(1)),
'*' => calc::Terminal::Op('*', Precedence::Left(2)),

This enables user-defined operators at runtime - see examples/c11_calculator.rs.

2. Natural Lexer Feedback

The infamous C typedef problem: is T * x a multiplication or pointer declaration? The lexer needs parser state to decide.

Traditional parsers hide the parse loop, requiring globals or hacks. Gazelle uses a push-based API - you drive the loop:

loop {
    let token = lexer.next(&actions.ctx)?;  // lexer sees current state
    parser.push(token, &mut actions)?;       // actions update state
    // next iteration: lexer sees updated state
}

No magic. The lexer and parser share state through actions, and you control when each runs. See examples/c11/ for a complete C11 parser using this for typedef disambiguation.

3. Clean Separation of Grammar and Actions

The grammar is pure grammar:

gazelle! {
    grammar Calc {
        terminals { NUM: _, prec OP: _, LPAREN, RPAREN }

        expr = expr OP expr => binop
             | NUM => literal
             | LPAREN expr RPAREN => paren;
    }
}

Actions are split into an ErrorType trait, a Types trait, and per-node Action impls:

impl gazelle::ErrorType for Evaluator {
    type Error = String;
}

impl calc::Types for Evaluator {
    type Num = f64;
    type Op = char;
    type Expr = f64;
}

impl gazelle::Action<calc::Expr<Self>> for Evaluator {
    fn build(&mut self, node: calc::Expr<Self>) -> Result<f64, Self::Error> {
        match node {
            calc::Expr::Binop(left, op, right) => match op {
                '+' => Ok(left + right),
                '*' => Ok(left * right),
                _ => Err(format!("unknown op: {op}")),
            },
            calc::Expr::Literal(n) => Ok(n),
            calc::Expr::Paren(e) => Ok(e),
        }
    }
}

Action methods return Result — when actions are infallible, use type Error = core::convert::Infallible.

This gives you:

  • Full IDE support in action code (autocomplete, type hints, go-to-definition)
  • Compile errors point to your code, not generated code
  • Multiple implementations (interpreter, AST builder, pretty-printer)
  • Grammars reusable across different backends

4. CST to AST — You Choose

Parser generators typically force a choice: either you get a concrete syntax tree (CST) that you walk and transform, or you write untyped action callbacks and build your own result from scratch.

This reflects a real distinction, but not the one people usually describe. The conventional definition — "a CST keeps all tokens, an AST drops some" — confuses a consequence for a cause. A better way to think about it:

  • A CST is a concrete tree: an actual data structure with nodes and children, faithfully representing the grammar's productions.
  • An AST is the same structural information represented abstractly — through a sequence of typed reduction calls rather than a materialized tree.

When the tree is abstract, you only capture what you care about. A calculator folds Add(2, 3) straight into 5.0 — the parentheses, the operator token, even the child nodes were never stored because there was never a tree to store them in. The information loss traditionally associated with ASTs isn't a design choice; it's what naturally happens when you only materialize what you need.

Gazelle makes this a smooth continuum rather than a binary choice. The gazelle! macro generates an enum for each nonterminal whose variants mirror the grammar's alternatives, but the variant fields are generic — their types come from your Types trait implementation:

// Generated by gazelle! — variants match the grammar, types are yours
pub enum Expr<A: Types> {
    Binop(A::Expr, A::Op, A::Expr),
    Literal(A::Num),
    Paren(A::Expr),
}

Set the associated type to the generated enum itself, and a blanket impl kicks in — you get a full CST with zero action code:

impl calc::Types for CstBuilder {
    type Expr = Box<calc::Expr<Self>>;  // Box for recursive types
    type Num = f64;
    type Op = char;
    // No Action impl needed — blanket impl auto-boxes the node
}

Set it to f64, and you write a reducer that folds values during parsing — a fully custom AST (or no tree at all):

impl calc::Types for Evaluator {
    type Expr = f64;  // No tree, just a number
    type Num = f64;
    type Op = char;
}

Set it to Ignore, and the node is discarded — useful for validation-only parsing. Mix and match freely across nonterminals: auto-box some, evaluate others, ignore the rest. The same grammar, the same generated enums — the representation depends entirely on what types you plug in.

5. Declarative Conflict Resolution

For grammars with intentional ambiguities, terminal modifiers resolve S/R conflicts at the grammar level:

terminals {
    shift ELSE,         // dangling else: always shift
    prec OP: _,         // operators: runtime precedence
    conflict TOK: _,    // caller decides per-token
}

shift and reduce bake the decision into the parse table — zero runtime cost. prec and conflict defer to per-token resolution at parse time. No %expect hacks — the intent is explicit in the grammar.

6. Conflict Diagnostics with Examples

When a grammar has shift/reduce or reduce/reduce conflicts, Gazelle shows concrete example inputs with two bracketings — one for each parse:

Shift/reduce conflict on 'ELSE':
  Shift:  stmt -> IF COND THEN stmt • ELSE stmt (wins)
  Reduce: stmt -> IF COND THEN stmt •
  Example: IF COND THEN IF COND THEN stmt • ELSE stmt
  Shift:  IF COND THEN IF COND THEN (stmt ELSE stmt)
  Reduce: IF COND THEN (IF COND THEN stmt) ELSE stmt (reduce to stmt)

The example is found by BFS on the raw DFA — shortest viable prefix to the conflict state, then a joint suffix search that drives both parser interpretations to acceptance. The dot marks the conflict point. When the grammar is LR(k>1) rather than ambiguous (no single string has two parses), separate examples are shown for each interpretation.

7. Parser Generator as a Library

Most parser generators are build tools. Gazelle exposes table construction as a library:

use gazelle::{parse_grammar, ErrorContext};
use gazelle::table::CompiledTable;
use gazelle::runtime::{Parser, Token};

let grammar = parse_grammar(r#"
    start expr;
    terminals { NUM, PLUS, STAR }
    expr = expr PLUS term => add | term => term;
    term = term STAR factor => mul | factor => factor;
    factor = NUM => num;
"#).unwrap();

// Build tables and parse
let compiled = CompiledTable::build(&grammar).unwrap();
let mut parser = Parser::new(compiled.table());

let num_id = compiled.symbol_id("NUM").unwrap();
// ... push tokens with parser.maybe_reduce() and parser.shift()

Enables grammar analyzers, conflict debuggers, or parsers for dynamic grammars. See examples/runtime_grammar.rs for complete examples.

Examples

Calculator with User-Defined Operators

$ cargo run --example c11_calculator

> operator ^ pow right 3;
defined: ^ = pow right 3

> 2 ^ 3 ^ 2;
512                        // right-assoc: 2^(3^2) = 2^9

> x = 2 * 3 ^ 2;
x = 18                     // ^ binds tighter than *

C11 Parser

A complete C11 parser demonstrating:

  • Lexer feedback for typedef disambiguation (Jourdan's approach)
  • Dynamic precedence collapsing 10+ expression levels into one rule
  • Full C11 test suite (41 tests)
$ cargo test --example c11

The expression grammar:

// Traditional C grammar: 10+ cascading rules
// Gazelle: one rule with prec terminals
assignment_expression = cast_expression
                      | assignment_expression BINOP assignment_expression
                      | assignment_expression STAR assignment_expression
                      | assignment_expression QUESTION expression COLON assignment_expression;

Usage

use gazelle::Precedence;
use gazelle_macros::gazelle;

gazelle! {
    grammar MyParser {
        start expr;
        terminals {
            NUM: _,        // terminal with payload
            LPAREN, RPAREN,
            prec OP: _,    // precedence terminal with payload
        }

        expr = expr OP expr => binop
             | NUM => num
             | LPAREN expr RPAREN => paren;
    }
}

struct Eval;

impl gazelle::ErrorType for Eval {
    type Error = core::convert::Infallible;
}

impl my_parser::Types for Eval {
    type Num = i32;
    type Op = char;
    type Expr = i32;
}

impl gazelle::Action<my_parser::Expr<Self>> for Eval {
    fn build(&mut self, node: my_parser::Expr<Self>) -> Result<i32, Self::Error> {
        Ok(match node {
            my_parser::Expr::Binop(l, op, r) => match op {
                '+' => l + r, '*' => l * r, _ => 0,
            },
            my_parser::Expr::Num(n) => n,
            my_parser::Expr::Paren(e) => e,
        })
    }
}

fn main() {
    let mut parser = my_parser::Parser::<Eval>::new();
    let mut actions = Eval;

    // Push tokens (you control the loop)
    parser.push(my_parser::Terminal::Num(2), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Op('+', Precedence::Left(1)), &mut actions).unwrap();
    parser.push(my_parser::Terminal::Num(3), &mut actions).unwrap();

    let result = parser.finish(&mut actions).map_err(|(_, e)| e).unwrap();
    assert_eq!(result, 5);
}

When to Use Gazelle

Good fit:

  • Languages with user-definable operators or precedence
  • C-family languages needing lexer feedback (typedef disambiguation)
  • Complex expression grammars you want to simplify
  • When you want full IDE support in semantic actions

Consider alternatives for:

  • Simple formats (JSON, TOML) - nom or pest may be simpler
  • Maximum ecosystem maturity - lalrpop or tree-sitter

License

MIT