Expand description
RACC – Rust Another Compiler-Compiler
This is a port of Barkeley YACC to Rust. It runs as a procedural macro, and so allows you to
define grammars directly in Rust source code, rather than calling an external tool or writing
a build.rs
script.
How to write a grammar
Here is a very brief example of how to use RACC. This program evaluates a very limited class of numeric expressions.
In `Cargo.toml:
racc = "0.1.0"
In your code:
#[racc::grammar]
mod grammar {
// This is the list of tokens defined for your grammar. The `Token` type defines the input
// the generated parser.
enum Token {
NUM(i32),
PLUS,
MINUS,
LPAREN,
RPAREN,
}
// Define the rules of your language. The first rule implicitly defines the goal symbol.
// Note the presence of '=x' in the rule definitions. These are name bindings, which RACC
// uses in order to allow your code blocks (which are in { ... } braces) to access the
// values for each symbol. The values come from the value stack in the parser state machine.
// When you call push_token(), you provide both the token code and the "value" for that token.
Expr -> i32 : NUM(x) { x };
Expr : LPAREN Expr(x) RPAREN { x };
Expr : Expr(left) PLUS Expr(right) {
// You can put arbitrary code here.
println!("evaluating: {} + {}", left, right);
// The value of the action block is used as the
// value of the rule (reduction). Note the absence
// of a semi-colon here.
left + right
};
Expr : Expr(left) MINUS Expr(right) {
println!("evaluating: {} - {}", left, right);
left - right
};
}
use Token::*;
fn main() {
// The tokens in our input, and their numeric values.
let tokens = vec![
LPAREN,
NUM(50),
PLUS,
NUM(25),
RPAREN,
MINUS,
NUM(10),
];
match Parser::parse(tokens.into_iter()) {
Ok(value) => println!("Accepted: {}", value),
Err(e) => println!("Syntax error: {:?}", e),
}
}
Generating a parser
The parser generator (the racc::grammar!
macro) analyzes your grammar definition and
generates a parser for your grammar. The parser uses lookup tables and generated code. It also
contains the “actions” that you define in your grammar. Actions are code blocks that you provide,
which run when one of the rules defined in your grammar is matched (is “reduced”).
Invoking the parser
You can invoke the parser in two ways: push or pull.
Pull model
In the pull model, you invoke the Parser::parse()
function and give it an iterator that
produces tokens. This is the easiest way to invoke the parser; everything happens within the
scope of the parse()
function call.
match Parser::parse(vec![FOO, BAR, /* ... more tokens ... */]) {
Ok(result) => println!("Parsing produced: {:?}", result),
Err(e) => println!("Parsing failed: {:?}", e);
}
The parse()
function is generic over its input. It accepts any Iterator
implementation that
produces the Token
values defined by your grammar.
Push model
In the push model, you create an instance of Parser
and then call parser.push(..)
to push
tokens into it. Each call to push()
advances the state of the parser. If the token is not valid
according to the rules defined by your grammar, then push()
will return an Err
value; you must
always check the return value of push()
.
When you are done processing tokens, you can call parser.finish()
to indicate that there are no
more tokens and to receive the result of parsing.
let mut parser = Parser::new();
parser.push(FOO)?;
parser.push(BAR)?;
// ... more tokens ...
match parser.finish() {
Ok(result) => println!("Parsing produced: {:?}", result),
Err(e) => println!("Parsing failed: {:?}", e);
}
If your token source uses async
, then you will need to use the push model, since the
Iterator
trait does not support asynchronous data sources.
Actions
Most grammars will want to execute an action when certain rules are matched. Actions can be
added at the end of a rule by adding a Rust code block, .e.g { ... }
.
The parser generator collects all of the actions and generates a single function which handles running the correct action when a rule fires.
Context: Accessing external data during parsing
Many parsers need to access external state. For example, a parser that implements an interpreter might need to store a table of named variables. RACC supports this by specifying a context type. For example:
#[racc::grammar]
mod grammar {
type Context = MyContext;
enum Token {
FOO,
BAR,
}
G : FOO { context.output.push_str("FOO! "); }
| BAR { context.output.push_str("BAR! "); } ;
}
use Token::*;
#[derive(Default)]
pub struct MyContext {
output: String,
}
let mut context = MyContext::default();
Parser::parse(&mut context, vec![FOO, FOO, BAR]);
print!("output: {}", context.output);
It is often necessary, when imlementing a parser, to access external or “environmental” state, when executing rule actions. In C parsers, this is usually done with global variables. However, this is not an option in Rust (and would be undesirable, even if it were an option).
RACC provides a safe means to access such data. Rules may access an “app context”.
When the app calls push
or finish
, the app also passes a &mut
reference
to an “app context” value. The type of this value can be anything defined by the
application. (It is necessary to specify the type in the grammar!
definition.)
Within the scope of the rule action, this value may be accessed by using the identifier
specified in the grammar definition. In the example above, the identifier is ctx
,
and the type of the context is uint
.
Propagating values through the parsing tree
Rules may optionally produce a value. Rules which consume the output of other rules may use those values. This allows grammars to propagate data through the parsing tree.
To use this feature, you must specify a type for each (non-terminal) symbol that carries data,
by using -> <type>
after the left-hand side symbol when defining a rule. A type can only be
specified once per symbol; redundant definitions are not allowed, even if they specify the same
type. Then, to receive values as input for a rule, specify a binding for each symbol on the
right-hand side of a rule, by specifying <symbol> = <binding>
, where <symbol>
is a symbol on
the right-hand side of a rule (either a terminal or a non-terminal) and <binding>
is the name
of the binding for that value.
Tokens may also carry values. For example, an interpreter might define a LITERAL
token which
contains a literal numeric value.
For example:
#[racc::grammar]
mod grammar {
enum Token {
LITERAL(f64), // This specifies that LITERAL contains a value.
PLUS,
MINUS,
LPAREN,
RPAREN,
}
// `Expr -> f64` specifies that `Expr` returns `f64`.
Expr -> f64 : LITERAL(lit) { lit }
| Expr(a) PLUS Expr(b) { a + b } // Expr=a and Expr=b specify bindings
| Expr(a) MINUS Expr(b) { a - b }
| LPAREN Expr(e) RPAREN { e }
;
}
let result: f64 = Parser::parse(
vec![
Token::LITERAL(10.0),
Token::PLUS,
Token::LITERAL(20.0),
]
).unwrap();
assert_eq!(result, 30.0);
In this code, Expr=left
means “match the symbol Expr
and bind its value to the
name left
within the scope of the action,” and similarly for Expr=right
.
Instead of using $$
for setting the value of the rule action, the value of the rule
action is simply the value of the action, when evaluated using the normal rules of Rust.
This is why the action block in the example ends with left + right
and not left + right;
.
Ending the action with ;
would mean that the rule evaluates to ()
.
Note that all rules must evaluate to a value, even if that value is ()
or None
, and
the type of the value must match the type specified in the grammar. RACC (like Rust) will
not perform any implicit conversions, or insert any implicit None
values.
If you do not wish to propagate values in this way, you can use a symbol value of ()
.
If you do this, then you may have empty rule actions.
Finishing parsing
In Berkeley YACC, the lexer indicates the end of an input stream by reporting a YYEOF
token. Because RACC uses a push model rather than a pull model, a RACC-based parser
indicates the end of the input stream by calling the parser.finish()
method. The
finish
method performs any final reductions that are necessary, and then checks whether
the grammar accepts the input. If the grammar accepts the input, then finish
will
return FinishParseResult::Accept(value)
, where value
is the value of the entire
parse tree.
License
Berkeley YACC is in the public domain. From its README
file:
Berkeley Yacc is in the public domain. The data structures and algorithms
used in Berkeley Yacc are all either taken from documents available to the
general public or are inventions of the author. Anyone may freely distribute
source or binary forms of Berkeley Yacc whether unchanged or modified.
Distributers may charge whatever fees they can obtain for Berkeley Yacc.
Programs generated by Berkeley Yacc may be distributed freely.
RACC is published under the MIT open-source license, which should be compatible with nearly all open source needs and should be compatible with the letter and spirit of Berkeley YACC’s license.
Stability
The ideas implemented in YACC are stable and time-tested. RACC is a port of YACC, and should be considered unstable. The implementation may contain porting bugs, where the behavior of RACC is not faithful to the original YACC. Rust procedural macros and the ecosystem supporting them is also still growing and changing.
So if you build anything using RACC, please be aware that both Rust and RACC are still evolving quickly, and your code may break quickly and without notice.
The original Berkeley YACC contains a strident disclaimer, which is repeated here:
Berkeley Yacc is distributed with no warranty whatever. The author
and any other contributors take no responsibility for the consequences of
its use.
That disclaimer applies to this Rust port, as well. The author (of the Rust port) makes no claim of suitability for any purpose, and takes no responsibility for the consequences of its use.
TODO List
-
Allow grammars to specify precedence and associativity. The underlying code implements support for precedence and associativity, exactly as in Berkeley YACC, but this is not yet exposed.
-
Port a lexical analyzer, too.
Author
RACC was implemented by Arlie Davis arlie.davis@gmail.com
. I did this as an experiment
in porting a well-known (and useful) tool to Rust. I was also intrigued by Rust’s support
for procedural macros, and I wanted to see whether I could implement something interesting
using procedural macros.
Feedback
Feel free to send me any feedback on RACC to arlie.davis@gmail.com
.