# Gazelle Reference
Complete reference for the Gazelle parser generator.
## Project Status
**Working and tested:**
- Grammar definition (`.gzl` files and `gazelle!` macro)
- Minimal LR table generation
- Type-safe parser generation with `Types`/`Reducer` traits
- Precedence terminals (`prec`) for runtime operator precedence
- Modifiers: `?` (optional), `*` (zero+), `+` (one+), `%` (separated list)
- Expected conflict declarations (`expect N rr/sr`)
- Token range tracking for source spans
- Push-based parsing (you control the loop)
- Detailed error messages with parser state context
- Automatic error recovery (Dijkstra-based minimum-cost repair)
**Not yet implemented:**
- Debug dump via `GAZELLE_DEBUG` env var
- Grammar visualization (FIRST/FOLLOW sets)
**Tested on:**
- Calculator with user-defined operators
- C11 parser (complete grammar, 18 tests)
## Table of Contents
- [How It Works](#how-it-works)
- [Grammar Syntax](#grammar-syntax)
- [The gazelle! Macro](#the-grammar-macro)
- [Generated Types](#generated-types)
- [Using the Parser](#using-the-parser)
- [Advanced Features](#advanced-features)
---
## How It Works
Most parser generators either embed semantic actions in the grammar (yacc-style `$$` / `$1`) or produce a generic, type-erased concrete syntax tree you have to walk later. Gazelle does neither. It produces an abstract syntax tree — abstract because it's not presented as a data structure, but as a **pattern of calls**.
The AST nodes are enum variants that arise naturally from the grammar — one enum per non-terminal, one variant per alternative. During parsing, every time a rule is reduced, Gazelle calls your `Reducer` with a node containing the children's already-reduced values. This sequence of calls *is* a post-order traversal of the parse tree — the tree is abstractly present without ever being materialized.
You provide the (possibly stateful) mapping from each node to its output via the `Reducer` trait. A `Types` trait declares the output type per symbol. What you return is up to you: a computed value, a tree node, or nothing at all.
### Direct evaluation
Since reductions see already-reduced children, you can fold the tree into values as it's parsed — no intermediate tree:
```rust
impl calc::Types for Evaluator {
type Error = ParseError;
type Num = f64;
type Op = char;
type Expr = f64; // expressions reduce to numbers
}
impl Reducer<calc::Expr<Self>> for Evaluator {
fn reduce(&mut self, node: calc::Expr<Self>) -> Result<f64, ParseError> {
// node.0 is f64, node.1 is char, node.2 is f64
Ok(match node {
calc::Expr::Binop(l, op, r) => match op {
'+' => l + r, '*' => l * r, _ => 0.0,
},
calc::Expr::Literal(n) => n,
})
}
}
```
### Materializing a tree
If you *do* want a tree, set associated types to the node enums themselves. Each reduction stores its node, and the tree materializes through the normal reduction flow — no custom `Reducer` needed:
```rust
impl calc::Types for CstBuilder {
type Error = ParseError;
type Num = f64;
type Op = char;
type Expr = Box<calc::Expr<Self>>; // recursive, needs Box
}
// No Reducer impl — blanket handles it
```
Gazelle supports a few blanket reductions that allow you to generate a CST or just validate syntax without writing `Reducer` impls:
- **Identity**: `type Expr = calc::Expr<Self>` — node passes through unchanged (CST)
- **Box**: `type Expr = Box<calc::Expr<Self>>` — node is auto-boxed (CST with recursive types)
- **Ignore**: `type Expr = Ignore` — node is discarded (validation only)
You only write a custom `Reducer` when you need custom logic.
### Why Box is needed for recursive types
The generated enum for a recursive rule like `expr = expr OP expr => binop | NUM => literal` contains itself:
```rust
pub enum Expr<A: Types> {
Binop(A::Expr, A::Op, A::Expr), // contains A::Expr
Literal(A::Num),
}
```
If `type Expr = calc::Expr<Self>`, the type is infinitely sized — `Expr` contains `Expr` contains `Expr`... Rust won't allow this. Wrapping in `Box` breaks the cycle: `type Expr = Box<calc::Expr<Self>>` gives the compiler a known size (one pointer). The auto-box blanket handles the wrapping automatically.
Non-recursive types (like a `Statement` that contains an `Expr` but not another `Statement`) don't need boxing and can use identity directly.
### Mix and match
You can mix strategies in one implementation — evaluate some nodes, build trees for others, ignore the rest:
```rust
impl my_grammar::Types for MyActions {
type Expr = i64; // evaluate
type Statement = Box<my_grammar::Statement<Self>>; // build tree (boxed)
type Comment = Ignore; // discard
// ...
}
// Only need Reducer for Expr (custom logic)
// Statement and Comment are handled by blankets
```
---
## Grammar Syntax
Gazelle grammars can be written in `.gzl` files or inline with the `gazelle!` macro.
### Basic Structure
`.gzl` files contain the grammar directly:
```
start rule_name;
terminals {
// terminal declarations
}
// rule definitions
```
In the `gazelle!` macro, the grammar is wrapped with `grammar Name { ... }`:
```rust
gazelle! {
grammar Name {
start rule_name;
terminals { ... }
// rule definitions
}
}
```
### Terminal Declarations
Terminals are tokens from your lexer. Declare them in the `terminals` block:
```
terminals {
// Simple terminal (no payload, becomes () in generated code)
LPAREN,
RPAREN,
// Terminal with payload — `: _` generates an associated type
// named after the terminal (NUM → type Num, IDENT → type Ident)
NUM: _,
IDENT: _,
// Precedence terminal (for operator precedence parsing)
prec OP: _, // Carries precedence at runtime
prec BINOP, // Prec terminal without payload
}
```
**Terminal naming convention:** Terminals should be UPPERCASE to distinguish from non-terminals.
### Rule Definitions
Rules define the grammar's structure. Every alternative requires `=> name`:
```
// Single alternative
expr = expr PLUS term => add;
| expr MINUS term => sub
| term => term;
```
### Actions and Enum Generation
Each `=> name` generates an enum variant. Untyped symbols (no `: Type`) are omitted from variant fields:
```
expr = expr PLUS term => add;
// Generates enum variant: Expr::Add(A::Expr, A::Term)
// PLUS is untyped, so it's omitted — only typed symbols become fields
```
**Untyped rules** - if the non-terminal has no type annotation, output is `()`. The `Reducer` is still called for side effects:
```
// statement has no type annotation — output is ()
statement = expr SEMI => on_statement;
// Generates: Statement::OnStatement(A::Expr)
// Reducer<Statement<Self>> returns Result<(), Error>
```
### Modifiers
**Optional** (`?`) - zero or one:
```
trailing_comma = COMMA?;
// Generates Option<T> where T is the symbol's type
```
**Repetition** (`*`) - zero or more:
```
args = arg*;
// Generates Vec<T>
```
**One or more** (`+`) - at least one:
```
statements = statement+;
// Generates Vec<T>
```
**Separated list** (`%`) - one or more separated by a delimiter:
```
args = expr % COMMA;
// Generates Vec<T> where T is expr's type
// Parses: expr, expr COMMA expr, expr COMMA expr COMMA expr, ...
```
### Expect Declarations
Declare expected conflicts to suppress errors:
```
start translation_unit;
expect 3 rr; // Expect 3 reduce/reduce conflicts
expect 1 sr; // Expect 1 shift/reduce conflict
// ... rest of grammar
```
Use this for grammars with known ambiguities (like C's typedef or dangling else).
---
## The gazelle! Macro
The `gazelle!` macro generates a type-safe parser at compile time.
### Basic Usage
```rust
use gazelle_macros::gazelle;
use gazelle::Precedence;
gazelle! {
grammar Calc {
start expr;
terminals {
NUM: _,
LPAREN, RPAREN,
prec OP: _,
}
expr = expr OP expr => binop
| NUM => literal
| LPAREN expr RPAREN => paren;
}
}
```
### Visibility
Add `pub` before `grammar` for public visibility:
```rust
gazelle! {
pub grammar MyParser {
// ...
}
}
```
Or with restricted visibility:
```rust
gazelle! {
pub(crate) grammar MyParser {
// ...
}
}
```
---
## Generated Types
The macro generates a module (snake_case of grammar name, e.g., `calc`) containing:
### Types Trait
`Types` declares associated types and the error type:
```rust
pub trait Types: Sized {
type Error: From<ParseError>;
type Num: Debug;
type Op: Debug;
type Expr: Debug;
fn set_token_range(&mut self, start: usize, end: usize) {}
}
```
### Per-Node Enums
Each non-terminal with named alternatives generates an enum generic over `A: Types`:
```rust
pub enum Expr<A: Types> {
Binop(A::Expr, A::Op, A::Expr),
Literal(A::Num),
}
impl<A: Types> AstNode for Expr<A> {
type Output = A::Expr;
type Error = A::Error;
}
```
A blanket `Reducer` impl handles identity (CST), `Box<N>` (auto-boxing), and `Ignore` (discard) automatically. You only write a custom `Reducer` impl when you need custom logic.
### Terminal Enum
Represents input tokens:
```rust
pub enum Terminal<A: Types> {
Num(A::Num),
Lparen,
Rparen,
Op(A::Op, Precedence), // prec terminals include Precedence
}
```
### Parser Struct
```rust
pub struct Parser<A: Types> { /* ... */ }
impl<A: Types + Reducer<Expr<A>>> Parser<A> {
pub fn new() -> Self;
pub fn push(&mut self, terminal: Terminal<A>, actions: &mut A) -> Result<(), A::Error>;
pub fn finish(self, actions: &mut A) -> Result<A::Expr, (Self, A::Error)>;
pub fn state(&self) -> usize;
pub fn format_error(&self, err: &ParseError) -> String;
}
```
Note: `finish` returns `(Self, A::Error)` on error, giving back the parser so you can still call `format_error`.
---
## Using the Parser
### Step 1: Implement Types and Reducers
```rust
use gazelle::{ParseError, Reducer};
struct Evaluator;
impl calc::Types for Evaluator {
type Error = ParseError;
type Num = f64;
type Op = char;
type Expr = f64;
}
impl Reducer<calc::Expr<Self>> for Evaluator {
fn reduce(&mut self, node: calc::Expr<Self>) -> Result<f64, ParseError> {
Ok(match node {
calc::Expr::Binop(left, op, right) => match op {
'+' => left + right,
'-' => left - right,
'*' => left * right,
'/' => left / right,
_ => panic!("unknown operator"),
},
calc::Expr::Literal(n) => n,
})
}
}
```
### Step 2: Create Parser and Push Tokens
```rust
use gazelle::Precedence;
fn parse(input: &str) -> Result<f64, String> {
let mut parser = calc::Parser::<Evaluator>::new();
let mut actions = Evaluator;
// Your lexer loop
for token in lex(input) {
let terminal = match token {
Token::Num(n) => calc::Terminal::Num(n),
Token::Plus => calc::Terminal::Op('+', Precedence::Left(1)),
Token::Star => calc::Terminal::Op('*', Precedence::Left(2)),
Token::LParen => calc::Terminal::Lparen,
Token::RParen => calc::Terminal::Rparen,
};
parser.push(terminal, &mut actions)
.map_err(|e| parser.format_error(&e))?;
}
parser.finish(&mut actions)
.map_err(|(p, e)| p.format_error(&e))
}
```
### Step 3: Handle Errors
```rust
// Push errors - parser is still available for format_error
match parser.push(terminal, &mut actions) {
Ok(()) => { /* continue */ }
Err(e) => {
let msg = parser.format_error(&e);
// msg contains: "unexpected 'X', expected: A, B, C"
// " after: tokens parsed so far"
// " in rule: context"
return Err(msg);
}
}
// Finish errors - parser returned in the error tuple
match parser.finish(&mut actions) {
Ok(result) => { /* use result */ }
Err((parser, e)) => {
let msg = parser.format_error(&e);
return Err(msg);
}
}
```
---
## Advanced Features
### Precedence Terminals
For expression grammars, `prec` terminals carry precedence at runtime instead of encoding it in grammar structure:
```rust
terminals {
prec OP: _,
}
When lexing, attach precedence to each operator:
```rust
'+' => calc::Terminal::Op('+', Precedence::Left(1)), // Lower precedence
'*' => calc::Terminal::Op('*', Precedence::Left(2)), // Higher precedence
'^' => calc::Terminal::Op('^', Precedence::Right(3)), // Right-associative
```
**Precedence values:**
- `Precedence::Left(n)` - left-associative with level n
- `Precedence::Right(n)` - right-associative with level n
- Higher n = tighter binding
This enables user-defined operators at runtime!
### Token Range Tracking (Spans)
Implement `set_token_range` to track source positions:
```rust
impl calc::Types for SpanTracker {
type Error = ParseError;
type Op = char;
type Num = f64;
type Expr = Expr;
fn set_token_range(&mut self, start: usize, end: usize) {
// Called before each reduction with [start, end) token indices
self.current_span = self.token_spans[start].start..self.token_spans[end-1].end;
}
}
impl Reducer<calc::Expr<Self>> for SpanTracker {
fn reduce(&mut self, node: calc::Expr<Self>) -> Result<Expr, ParseError> {
Ok(match node {
calc::Expr::Binop(left, op, right) => Expr {
kind: ExprKind::BinOp(Box::new(left), op, Box::new(right)),
span: self.current_span.clone(),
},
// ...
})
}
}
```
### Lexer Feedback
For languages like C where lexing depends on parse state (typedef disambiguation):
```rust
struct CActions {
typedefs: HashSet<String>,
}
impl Reducer<c11::DeclTypedef<Self>> for CActions {
fn reduce(&mut self, node: c11::DeclTypedef<Self>) -> Result<...> {
// Extract name, register it
self.typedefs.insert(name.clone());
// ...
}
}
// In your parse loop:
loop {
// Lexer sees current typedef set
let token = lexer.next(&actions.typedefs)?;
parser.push(token, &mut actions)?;
}
```
The push-based architecture makes this natural - you control the loop.
### Multiple Implementations
The same grammar can have multiple action implementations:
```rust
// Evaluator
impl calc::Types for Evaluator { type Expr = f64; /* ... */ }
impl Reducer<calc::Expr<Self>> for Evaluator {
fn reduce(&mut self, node: calc::Expr<Self>) -> Result<f64, ParseError> { /* evaluate */ }
}
// CST Builder — Box needed because Expr is recursive
// The blanket Reducer handles it automatically — no impl needed!
impl calc::Types for CstBuilder { type Expr = Box<calc::Expr<Self>>; /* ... */ }
// Pretty Printer
impl calc::Types for Printer { type Expr = String; /* ... */ }
impl Reducer<calc::Expr<Self>> for Printer {
fn reduce(&mut self, node: calc::Expr<Self>) -> Result<String, ParseError> { /* format */ }
}
```
### Runtime Grammar API
For dynamic grammars, use the library API directly:
```rust
use gazelle::{parse_grammar, GrammarBuilder};
use gazelle::table::CompiledTable;
use gazelle::runtime::{CstParser, Token};
// Parse from string
let grammar = parse_grammar(r#"
start expr;
terminals { NUM, PLUS }
expr = expr PLUS expr => add | NUM => num;
"#)?;
// Or build programmatically
let mut gb = GrammarBuilder::new();
let num = gb.t("NUM");
let plus = gb.t("PLUS");
let expr = gb.nt("expr");
gb.rule(expr, vec![expr, plus, expr]);
gb.rule(expr, vec![num]);
let grammar = gb.build();
// Compile and parse
let compiled = CompiledTable::build(&grammar);
let mut parser = CstParser::new(compiled.table());
let num_id = compiled.symbol_id("NUM").unwrap();
let plus_id = compiled.symbol_id("PLUS").unwrap();
parser.push(Token::new(num_id))?; // NUM
parser.push(Token::new(plus_id))?; // PLUS
parser.push(Token::new(num_id))?; // NUM
let tree = parser.finish().map_err(|(p, e)| p.format_error(&e, &compiled))?;
// tree is a Cst: Leaf nodes carry SymbolId + token index,
// Node carry rule index + children
```
`CstParser` mirrors the generated parser's `push`/`finish` pattern. `Cst::Leaf` includes a token index so you can map back to your own token data (values, source positions, etc.).
For building a custom AST instead of a CST, use `Parser` directly with `maybe_reduce`/`shift`:
```rust
use gazelle::runtime::{Parser, Token};
let compiled = CompiledTable::build(&grammar);
let mut parser = Parser::new(compiled.table());
let mut stack: Vec<MyAst> = Vec::new();
let mut iter = tokens.into_iter();
loop {
let token = iter.next();
// Reduce while possible (rule 0 = accept)
while let Some((rule, len, _)) = parser.maybe_reduce(token)? {
if rule == 0 { return stack.pop().unwrap(); }
let children: Vec<MyAst> = stack.drain(stack.len() - len..).collect();
stack.push(build_ast_node(rule, children));
}
let Some(token) = token else { unreachable!() };
stack.push(MyAst::Leaf(token));
parser.shift(token);
}
```
---
## Errors and Recovery
### Parse errors
When `push` or `finish` returns an error, `format_error` produces a message showing what went wrong, what was expected, and where in the grammar the parser was:
```
unexpected 'STAR', expected: NUM, LPAREN
after: expr PLUS
in expr: expr OP • expr
```
The `•` marks the parser's position in the rule — it had seen `expr OP` and expected the right-hand operand.
Errors from `push` leave the parser intact so you can still call `format_error`:
```rust
parser.push(terminal, &mut actions).map_err(|e| parser.format_error(&e))?;
```
Errors from `finish` return the parser in the error tuple for the same reason:
```rust
parser.finish(&mut actions).map_err(|(p, e)| p.format_error(&e))?;
```
For nicer output, `format_error_with` lets you provide display names (mapping internal symbol names to user-facing ones) and the actual token texts for the "after:" context line:
```rust
let display_names = HashMap::from([("PLUS", "+"), ("STAR", "*"), ("LPAREN", "(")]);
let token_texts = vec!["1", "+", "*"]; // the tokens parsed so far
let msg = parser.format_error_with(&e, &display_names, &token_texts);
// unexpected '*', expected: NUM, (
// after: 1 +
// in expr: expr OP • expr
```
### Error recovery
When a parse error occurs, you can call `recover` on the low-level `Parser` to find a minimum-cost repair and continue parsing. Recovery uses Dijkstra search over possible insert/delete edits to find the cheapest way to get the parser back on track.
```rust
use gazelle::runtime::{Parser, Token, Repair};
// Parse tokens, recover on error
let mut pos = 0;
while pos < tokens.len() {
loop {
match parser.maybe_reduce(Some(tokens[pos])) {
Ok(None) => break, // ready to shift
Ok(Some((0, _, _))) => return, // accept
Ok(Some(_)) => continue, // reduce, keep going
Err(_) => {
// Error — recover with remaining tokens
let errors = parser.recover(&tokens[pos..]);
for e in &errors {
let repairs: Vec<_> = e.repairs.iter().map(|r| match r {
Repair::Insert(id) => format!("insert '{}'", ctx.symbol_name(*id)),
Repair::Delete(id) => format!("delete '{}'", ctx.symbol_name(*id)),
Repair::Shift => "shift".to_string(),
}).collect();
eprintln!("error at token {}: {}", e.position, repairs.join(", "));
}
return;
}
}
}
parser.shift(tokens[pos]);
pos += 1;
}
```
Each `RecoveryInfo` contains a position (token index where the error was detected) and a list of `Repair` actions:
- `Repair::Insert(id)` — a missing token was inserted (e.g., a forgotten `;`)
- `Repair::Delete(id)` — an extra token was deleted (e.g., a stray `+`)
- `Repair::Shift` — a token was shifted normally (free cost, not a real edit)
Recovery can find multiple errors in one pass — it repairs and continues until the end of input.
### Conflict errors
During table generation, Gazelle reports shift/reduce and reduce/reduce conflicts with full context:
```
Shift/reduce conflict on 'IDENT':
- Shift (continue parsing)
- Reduce by: __ident_dot_star -> (empty)
Parser state when seeing 'IDENT':
arg -> • IDENT EQ path [shift]
arg -> • path
__ident_dot_star -> • [reduce on IDENT]
```
Use `expect N rr;` / `expect N sr;` in the grammar to acknowledge known conflicts (like C's dangling else or typedef ambiguity).