rustlr 0.3.98

LR Parser Generator with Advanced Options
Documentation
<!DOCTYPE html>
    <html>
    <head>
        <meta charset="UTF-8">
        <title>Generating &lbrack;Bump&rbrack;&lbrack;bumpalo&rbrack;-Allocated ASTs</title>
        <style>
/* From extension vscode.github */
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

.vscode-dark img[src$=\#gh-light-mode-only],
.vscode-light img[src$=\#gh-dark-mode-only] {
	display: none;
}

</style>
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.css">
<link href="https://cdn.jsdelivr.net/npm/katex-copytex@latest/dist/katex-copytex.min.css" rel="stylesheet" type="text/css">
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css">
<style>
            body {
                font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif;
                font-size: 14px;
                line-height: 1.6;
            }
        </style>
        <style>
.task-list-item {
    list-style-type: none;
}

.task-list-item-checkbox {
    margin-left: -20px;
    vertical-align: middle;
    pointer-events: none;
}
</style>
        
    </head>
    <body class="vscode-body vscode-light">
        <h2 id="generating-bump-allocated-asts">Generating <a href="https://docs.rs/bumpalo/latest/bumpalo/index.html">Bump</a>-Allocated ASTs</h2>
<p>Abstract syntax trees are usually defined recursively.  This normally means
some form of smart pointer is requried to implement them.  The problem with
this aspect of Rust is that the smart pointers block <em>nested</em> patterns from
matching against AST expressions.  It is possible to define recursive structures
using references (borrows) but all references in an AST structure must have
compatible lifetimes.  An &quot;arena&quot; structure is required to hold the values that
they reference.  While it's possible to define such an arena manually, rustlr
provides support for using the <a href="https://docs.rs/bumpalo/latest/bumpalo/index.html">bumpalo</a> crate.</p>
<p>A grammar can begin with the declaration</p>
<pre><code>auto-bump
</code></pre>
<p>in place of <code>auto</code>, which enables the generation of bump-allocated ASTs.
<strong>It is also necessary to place <code>bumpalo = &quot;3&quot;</code> in your crate dependencies</strong>
(in Cargo.toml). Although user code do not need to reference the crate
directly, the generated parser code does.</p>
<p>The disadvantage of bumpalo is that it bipasses some of the memory
safety checks of Rust. Bump-allocation is not recommended if frequent
changes are made to the allocated structures.  They are appropriate if
the AST remains relatively stable once created, with few changes if
any, until the entire structure can be dropped.</p>
<p>The advantage of bump-allocation, besides an increase in speed, is
that it enables the matching of nested patterns against recursive
types.  Consider the following recursive type that tries to avoid
smart pointers:</p>
<pre><code>enum Expr&lt;'t&gt; {
  Var(&amp;'t str),
  Negative(&amp;'t Expr&lt;'t&gt;),
  Plus(&amp;'t Expr&lt;'t&gt;, &amp;'t Expr&lt;'t&gt;),
  Minus(&amp;'t Expr&lt;'t&gt;, &amp;'t Expr&lt;'t&gt;),
}
</code></pre>
<p>and a function that &quot;pretty-prints&quot; such expressions. The function will print
<code>x</code> instead of <code>--x</code>, <code>x-y</code> instead of <code>x+-y</code>, and <code>x+y</code> instead of <code>x--y</code>.
It also pushes negation inside plus/minus, and only prints parentheses as
required by the non-associative minus symbol.</p>
<pre><code>fn pprint&lt;'t&gt;(expr:&amp;'t Expr&lt;'t&gt;,) {
  use crate::Expr::*;
  match expr {
    Negative(Negative(n)) =&gt; pprint(n),
    Negative(Plus(a,b)) =&gt; pprint(&amp;Minus(&amp;Negative(a),b)),
    Negative(Minus(a,b)) =&gt; pprint(&amp;Plus(&amp;Negative(a),b)),
    Negative(n) =&gt; {print!(&quot;-&quot;); pprint(n)},
    Plus(a,Negative(b)) =&gt; pprint(&amp;Minus(a,b)),
    Minus(a,Negative(b)) =&gt; pprint(&amp;Plus(a,b)),
    Minus(a,p@Minus(_,_)) =&gt; { pprint(a); print!(&quot;-(&quot;); pprint(p); print!(&quot;)&quot;)},
    Minus(a,p@Plus(_,_)) =&gt; { pprint(a); print!(&quot;+(&quot;); pprint(p); print!(&quot;)&quot;)},
    Plus(a,b) =&gt; {pprint(a); print!(&quot;+&quot;); pprint(b);},
    Minus(a,b) =&gt; {pprint(a); print!(&quot;-&quot;); pprint(b)},    
    Var(x) =&gt; print!(&quot;{}&quot;,x),
  }//match expr
}//pprint
</code></pre>
<p>Then given</p>
<pre><code>    let q = Plus(&amp;Negative(&amp;Negative(&amp;Var(&quot;x&quot;))),&amp;Negative(&amp;Var(&quot;y&quot;)));
    pprint(&amp;q);
</code></pre>
<p>will print <code>x-y</code>. Such a &quot;declarative&quot; style of programming is not
possible if the type is defined using Box (or <a href="https://docs.rs/rustlr/latest/rustlr/generic_absyn/struct.LBox.html">LBox</a>).</p>
<p>Notice that the references to new structures created by the function are
passed recursively &quot;up-stack&quot;, and so are safe.
However, allocating such structures temporarily on the stack is impractical.
We would not be able to return references to them.
You may also get the dreaded compiler error <em>&quot;creates a temporary which is freed while still in use&quot;</em> when writing expressions such as <code>Plus(&amp;q,&amp;Negative(&amp;q))</code>.</p>
<p>The <a href="https://docs.rs/bumpalo/latest/bumpalo/index.html">bumpalo</a> crate offers an &quot;arena&quot;, a kind of <em>simulated heap</em>,
that allow us to access these structures within the lifetime of the arena:</p>
<pre><code> let bump = bumpalo::Bump::new(); // creates new bump arena
 let p = bump.alloc(Negative(bump.alloc(Negative(bump.alloc(Var(&quot;z&quot;))))));
</code></pre>
<p>The <a href="https://docs.rs/bumpalo/latest/bumpalo/struct.Bump.html#method.alloc">Bump::alloc</a> function returns a reference to the allocated memory within
the arena.  We can pass a reference to the &quot;bump&quot; to a function, which would
then be able to construct and return bump-allocated structures using references
of the same lifetime.</p>
<p>Rustlr (since version 0.3.93) can generate such structures automatically.
Most of bumpalo's usage is well-encapsulated, although
<code>bumpalo = &quot;3&quot;</code> does need to be added to the crate's
dependencies.  The easiest way to enable bumpalo is through enhancements to
the <a href="https://docs.rs/rustlr/latest/rustlr/lexer_interface/struct.LexSource.html">Lexsource</a> structure.  The following code fragment
demonstrates how to envoke a parser generated from an <code>auto-bump</code> grammar,
<a href="https://cs.hofstra.edu/~cscccl/rustlr_project/bumpcalc/bautocalc.grammar">bautocalc.grammar</a>,</p>
<pre><code>   let srcfile = &quot;test1.txt&quot;; // srcfile names file to parse
   let source=LexSource::with_bump(srcfile).unwrap();
   let mut scanner = bautocalcparser::bautocalclexer::from_source(&amp;source);   
   let mut parser = bautocalcparser::make_parser();
   let result = bautocalcparser::parse_with(&amp;mut parser, &amp;mut scanner);
</code></pre>
<p>A Rustlr Lexsource object containing a <a href="https://docs.rs/bumpalo/latest/bumpalo/struct.Bump.html">Bump</a> is created with <a href="https://docs.rs/rustlr/latest/rustlr/lexer_interface/struct.LexSource.html#method.with_bump">Lexsource::with_bump</a>.
The <code>parse_with</code> function, which is generated for individual parsers, will place
a reference to the bump arena inside the parser.  The automatically
generated semantic actions will call <a href="https://docs.rs/bumpalo/latest/bumpalo/struct.Bump.html#method.alloc">Bump::alloc</a> to create ASTS that will have <strong>the
same lifetime as the Lexsource structure.</strong></p>
<p>The link to the entire sample crate is <a href="https://cs.hofstra.edu/~cscccl/rustlr_project/bumpcalc/">here</a>.</p>
<h4 id="replace-the-lbox"><strong>Replace the LBox</strong></h4>
<p>Although the <a href="https://docs.rs/rustlr/latest/rustlr/generic_absyn/struct.LBox.html">LBox</a> is no longer needed, a device to capture the
lexical position (line/column) information in the AST in a
non-intrusive manner is still re required.  Along with the <code>auto-bump</code>
option is the struct <a href="https://docs.rs/rustlr/latest/rustlr/generic_absyn/struct.LC.html">LC</a>.  This is just a tuple with open fields
for the value, and an inner tuple with line and column numbers. We've
implemented The Deref/DerefMut traits so the value can be exposed in
the manner of a smart pointer, but there's in fact no pointer
underneath.  Inside a grammar, right-hand side labels such as <code>[a]</code>
will automatically bind <code>a</code> to an LC struct and generate AST types
using LC.  For example,</p>
<pre><code>Expr:Div --&gt; Expr:[e1] / Expr:[e2]
</code></pre>
<p>will generate an <code>Expr&lt;'lt&gt;</code> enum with a variant</p>
<pre><code>Div{e1:&amp;'lt LC&lt;Expr&lt;'lt&gt;&gt;,e2:&amp;'lt LC&lt;Expr&lt;'lt&gt;&gt;},
</code></pre>
<p>as well as semantic actions that inserts line/column information into the
AST as LC enclosures.</p>
<p>Note that a lifetime argument is required for all recursive types.</p>
<h4 id="dealing-with-recursive-structs"><strong>Dealing with Recursive Structs</strong></h4>
<p>There is currently one minor limitation with the <code>auto-bump</code> option.  When
a struct is recursive, it may not be possible to generate code that
<code>#[derive(Default)]</code>, which is required for all types in Rustlr, without some
help from the user.  A reference type <code>&amp;A</code> does not have a default and so
a type <code>A</code> that contains a reference to itself also does not have a default. In
such cases, help from the user is required to implement the trait for
these reference types.  For example,</p>
<pre><code>A1 --&gt; B1 int
B1 --&gt; var var A1
flatten B1
</code></pre>
<p>would generate the types</p>
<pre><code>#[derive(Default,Debug)]
pub struct A1&lt;'lt&gt;(pub &amp;'lt str,pub &amp;'lt str,pub &amp;'lt A1&lt;'lt&gt;,pub i64,);

#[derive(Default,Debug)]
pub struct B1&lt;'lt&gt;(pub &amp;'lt str,pub &amp;'lt str,pub &amp;'lt A1&lt;'lt&gt;,);
</code></pre>
<p>but will fail to compile. The following warning will be given by rustlr:</p>
<pre><code>WARNING: Recursive structs may require the manual implementation of the Default trait for reference types, as in
  impl&lt;'lt&gt; Default for &amp;'lt A1&lt;'lt&gt; ...
</code></pre>
<p>The warning should be heeded by including the following in the grammar:</p>
<pre><code>$static A1DEFAULT:A1&lt;'static&gt; = A1(&quot;&quot;,&quot;&quot;,&amp;A1DEFAULT,0);
$impl&lt;'t&gt; Default for &amp;'t A1&lt;'t&gt; { fn default() -&gt; Self { &amp;A1DEFAULT } }
</code></pre>
<p>Lines that begin with <code>$</code> are injected into the <code>_ast.rs</code> file created from the
grammar.  These definitions will allow the <code>#[derive(Default)]</code> tags for the
<code>A1</code> and <code>B1</code> types to be effective.  Currently, such lines are not generated
automatically because of Rust's restrictions to what static definitions can
reference.</p>
<p>This problem can also be avoided by just using enums instead of
structs in such cases: just assign a left-side label to the sole
production rules for <code>A1</code> and <code>B1</code>.  Enums always include a <code>_Nothing</code> variant
so that a default can always be defined.</p>
<h4 id="requisitioning-the-external-state"><strong>Requisitioning the 'External State'</strong></h4>
<p>There is another implemention-related detail that may be of occassional
concern.</p>
<p>To integrate the bumpalo option into Rustlr with minimal disturbance to its
other components, the <a href="https://docs.rs/rustlr/0.3.97/rustlr/generic_absyn/struct.Bumper.html">Bumper</a> struct was introduced.  A grammar with the <code>auto-bump</code>
option will generate a parser with a Bumper struct as its <code>exstate</code> field.
This struct contains procedures to access the encapsulated bump arena.  One
can still declare and use an arbitrary <code>externtype</code>
for a <em>Bumper</em> will also include
a field of this type, which is accessible as a ref mut via the
<a href="https://docs.rs/rustlr/0.3.97/rustlr/generic_absyn/struct.Bumper.html#method.state">Bumper::state</a> procedure.</p>
<p>In most cases, this detail is only required to be understood if the
<code>exstate</code> is to be used for another purpose, in which case all references
to <code>parser.exstate</code> should be replaced by <code>parser.exstate.state()</code>.
In addition to <code>exstate</code> each parser also contains a separate,
reference-counted <code>shared_state</code> field of the same type.<br>
The only other situation that requires understanding of this implementation
detail is when writing semantic actions manually that creates
bump-allocated structures.  They should be created by calling
<code>parser.exstate.make(...)</code>.</p>
<h4 id="yet-another-calculator"><strong>Yet another calculator...</strong></h4>
<p>We conclude this Chapter with a full grammar with the <code>auto-bump</code> option.
Two sample productions with manually written semantic actions are included
after <code>EOF</code> to illustrate accessing the <code>parser.exstate</code> as a &quot;Bumper&quot;.</p>
<pre><code># auto-generate bump-allocated ASTs with refs instead of smart pointers
auto-bump
lifetime 'lt
nonterminals E ES
nonterminals A1 B1 C1 A2
terminals + - * / ( ) = ;
terminals let in
valueterminal int ~ i64 ~ Num(n) ~ n
valueterminal var ~ &amp;'lt str ~ Alphanum(x) ~ x
topsym ES
resync ;

left * 500
left / 500
left + 400
left - 400
nonassoc = 300

lexattribute set_line_comment(&quot;#&quot;)

E:Val --&gt; int
E:Var --&gt; var
E:Plus --&gt; E + E
E:Minus --&gt; E - E
E:Times --&gt; E * E
E:Div --&gt; E:[e1] / E:[e2]
E(600):Neg --&gt; - E:[e]
E:Let --&gt; let var = E in E
E --&gt; ( E )
ES --&gt; E&lt;;+&gt; ;?

A1 --&gt; B1 int
B1 --&gt; var var A1
flatten B1
flatten A2
C1 --&gt; A2 A1 B1 ; E
A2 --&gt; var int

$static A1DEFAULT:A1&lt;'static&gt; = A1(&quot;&quot;,&quot;&quot;,&amp;A1DEFAULT,0);
$static B1DEFAULT:B1&lt;'static&gt; = B1(&quot;&quot;,&quot;&quot;,&amp;A1DEFAULT);
$impl&lt;'t&gt; Default for &amp;'t A1&lt;'t&gt; { fn default() -&gt; Self { &amp;A1DEFAULT } }
$impl&lt;'t&gt; Default for &amp;'t B1&lt;'t&gt; { fn default() -&gt; Self { &amp;B1DEFAULT } }

EOF

#Unused productions for example:
ES ==&gt; E:n ; {
  let bump = &amp;parser.exstate;
  let mut v1 = Vec::new(); /* not bump-allocated */
  v1.push(bump.make(parser.lc(0,n)));
  Seq(v1)
  } &lt;==
  
ES ==&gt; ES:@Seq(mut v)@  E:e ;  {
   v.push(parser.exstate.make(parser.lc(1,e)));
   Seq(v)
   } &lt;==
</code></pre>
<hr>

        <script async src="https://cdn.jsdelivr.net/npm/katex-copytex@latest/dist/katex-copytex.min.js"></script>
        
    </body>
    </html>