RACC -- Rust Another Compiler-Compiler
This is a port of Barkeley YACC to Rust. It runs as a procedural macro, and so allows you to
define grammars directly in Rust source code, rather than calling an external tool or writing
a build.rs
script.
How to write a grammar
Here is a very brief example of how to use RACC. This program evaluates a very limited class of numeric expressions.
In `Cargo.toml:
= "0.1.0"
In your code:
!
*/
grammar
Advancing the parser state machine
Berkeley YACC generates a yyparse
function, as the primary entry point to the parser.
Your code is integrated into yyparse
in several ways. First, yyparse
will call your
yylex
function in order to read the next token from the input. Then yyparse
will
advance the state machine, and when rules have been matched ("reduced"), the action code
that you provided (in { ... }
blocks) will execute.
In this model, the yyparse
method runs until all of the tokens have been processed, or
until an action block prematurely exits the parser. However, this model suffers from
several problems. It puts too much control in the generated code, and requires the
parser generator (YACC / RACC) to call into too many "hook" functions, such as yylex
.
Instead, in RACC I have decided to use a different API model. Instead of generating a
yyparse
function, RACC generates parsing tables and a reduce
function. The reduce
function contains all of the rule action blocks (your code). RACC also generates a
new_parser
method, which returns a new ParsingState
struct which contains references
to the parsing tables and the generated reduce
method. Your app then makes calls
to parser.push_token()
to push tokens into the parser. This inverts the control-flow
model -- your app code is in control, and makes brief calls into the RACC runtime and
generated code in order to advance the state of the parser.
This is simpler and more flexible, and I hope will be a more natural fit for Rust.
This parsing model also works well with Rust's lifetime model. Each parser object
(each instance of ParsingState
) contains only the state necessary to advance the
state machine, and the contents of the "value" stack.
Accessing external data during parsing
It is often necessary, when imlementing a parser, to access external or "environmental" state, when executing rule actions. In C parsers, this is usually done with global variables. However, this is not an option in Rust (and would be undesirable, even if it were an option).
RACC provides a safe means to access such data. Rules may access an "app context".
When the app calls push_token
or finish
, the app also passes a &mut
reference
to an "app context" value. The type of this value can be anything defined by the
application. (It is necessary to specify the type in the grammar!
definition.)
Within the scope of the rule action, this value may be accessed by using the identifier
specified in the grammar definition. In the example above, the identifier is ctx
,
and the type of the context is uint
.
Propagating values through the parsing tree
In Berkeley YACC, the tokenizer stage (lexer) may set the yylval
variable to a value,
in order to specify a "value" for the current token. This value is shifted onto the
value stack, and is accessible to rule actions using the $1
.. $n
syntax. Rules
specify the result value of the rule by assigning the $$
value.
RACC has a similar facility, but the syntax for using it is different. The syntax in
RACC is a more natural fit for Rust. Instead of using $1
bindings, RACC grammars
specify name bindings using = name
after a symbol in a rule definition. For example:
Expr : Expr=left PLUS Expr=right ;
In this code, Expr=left
means "match the symbol Expr
and bind its value to the
name left
within the scope of the action," and similarly for Expr=right
.
Instead of using $$
for setting the value of the rule action, the value of the rule
action is simply the value of the action, when evaluated using the normal rules of Rust.
This is why the action block in the example ends with left + right
and not left + right;
.
Ending the action with ;
would mean that the rule evaluates to ()
.
Note that all rules must evaluate to a value, even if that value is ()
or None
, and
the type of the value must match the type specified in the grammar. RACC (like Rust) will
not perform any implicit conversions, or insert any implicit None
values.
If you do not wish to propagate values in this way, you can use a symbol value of ()
.
If you do this, then you may have empty rule actions.
Finishing parsing
In Berkeley YACC, the lexer indicates the end of an input stream by reporting a YYEOF
token. Because RACC uses a push model rather than a pull model, a RACC-based parser
indicates the end of the input stream by calling the parser.finish()
method. The
finish
method performs any final reductions that are necessary, and then checks whether
the grammar accepts the input. If the grammar accepts the input, then finish
will
return FinishParseResult::Accept(value)
, where value
is the value of the entire
parse tree.
License
Berkeley YACC is in the public domain. From its README
file:
Berkeley Yacc is in the public domain. The data structures and algorithms
used in Berkeley Yacc are all either taken from documents available to the
general public or are inventions of the author. Anyone may freely distribute
source or binary forms of Berkeley Yacc whether unchanged or modified.
Distributers may charge whatever fees they can obtain for Berkeley Yacc.
Programs generated by Berkeley Yacc may be distributed freely.
RACC is published under the MIT open-source license, which should be compatible with nearly all open source needs and should be compatible with the letter and spirit of Berkeley YACC's license.
Stability
The ideas implemented in YACC are stable and time-tested. RACC is a port of YACC, and should be considered unstable. The implementation may contain porting bugs, where the behavior of RACC is not faithful to the original YACC. Rust procedural macros and the ecosystem supporting them is also still growing and changing.
So if you build anything using RACC, please be aware that both Rust and RACC are still evolving quickly, and your code may break quickly and without notice.
The original Berkeley YACC contains a strident disclaimer, which is repeated here:
Berkeley Yacc is distributed with no warranty whatever. The author
and any other contributors take no responsibility for the consequences of
its use.
That disclaimer applies to this Rust port, as well. The author (of the Rust port) makes no claim of suitability for any purpose, and takes no responsibility for the consequences of its use.
TODO List
-
Allow grammars to specify precedence and associativity. The underlying code implements support for precedence and associativity, exactly as in Berkeley YACC, but this is not yet expose.
-
Support reading standalone grammars, either using the Rust parser or something else.
-
Port a lexical analyzer, too.
Author
RACC was implemented by Arlie Davis arlie.davis@gmail.com
. I did this as an experiment
in porting a well-known (and useful) tool to Rust. I was also intrigued by Rust's support
for procedural macros, and I wanted to see whether I could implement something interesting
using procedural macros.
Feedback
Feel free to send me any feedback on RACC to arlie.davis@gmail.com
.