Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
About
Peepmatic is a DSL for peephole optimizations and compiler for generating
peephole optimizers from them. The user writes a set of optimizations in the
DSL, and then peepmatic
compiles the set of optimizations into an efficient
peephole optimizer:
DSL peepmatic> Peephole Optimizer
The generated peephole optimizer has all of its optimizations' lefthand sides collapsed into a compact automata that makes matching candidate instruction sequences fast.
The DSL's optimizations may be written by hand or discovered mechanically with a
superoptimizer like Souper. Eventually, peepmatic
should have a verifier
that ensures that the DSL's optimizations are sound, similar to what Alive
does for LLVM optimizations.
Example
This snippet of our DSL describes optimizations for removing redundant bitwiseor instructions that are noops:
(=> (bor $x (bor $x $y))
(bor $x $y))
(=> (bor $y (bor $x $y))
(bor $x $y))
(=> (bor (bor $x $y) $x)
(bor $x $y))
(=> (bor (bor $x $y) $y)
(bor $x $y))
When compiled into a peephole optimizer automaton, they look like this:
A DSL for Optimizations
A single peephole optimization has two parts:
 A lefthand side that describes candidate instruction sequences that the optimization applies to.
 A righthand side that contains the new instruction sequence that replaces old instruction sequences that the lefthand side matched.
A lefthand side may bind subexpressions to variables and the righthand side may contain those bound variables to reuse the subexpressions. The operations inside the lefthand and righthand sides are a subset of clif operations.
Let's take a look at an example:
(=> (imul $x 2)
(ishl $x 1))
As you can see, the DSL uses Sexpressions. (Sexpressions are easy to parse and
we also have a bunch of nice parsing infrastructure for Sexpressions already
for our wat
and wast
crates.)
The lefthand side of this optimization is (imul $x 2)
. It matches integer
multiplication operations where a value is multiplied by the constant two. The
value multiplied by two is bound to the variable $x
.
The righthand side of this optimization is (ishl $x 1)
. It reuses the $x
variable that was bound in the lefthand side.
This optimization replaces expressions of the form x * 2
with x << 1
. This
is sound because multiplication by two is the same as shifting left by one for
binary integers, and it is desirable because a shiftleft instruction executes
in fewer cycles than a multiplication.
The general form of an optimization is:
(=> <lefthandside> <righthandside>)
Variables
Variables begin with a dollar sign and are followed by lowercase letters,
numbers, hyphens, and underscores: $x
, $y
, $myvar
, $operand2
.
Lefthand side patterns may contain variables that match any kind of subexpression and give it a name so that it may be reused in the righthand side.
;; Replace `x + 0` with simply `x`.
(=> (iadd $x 0)
$x)
Within a pattern, every occurrence of a variable with the same name must match
the same value. That is (iadd $x $x)
matches (iadd 1 1)
but does not match
(iadd 1 2)
. This lets us write optimizations such as this:
;; Xor'ing a value with itself is always zero.
(=> (bxor $x $x)
(iconst 0))
Constants
We've already seen specific integer literals and wildcard variables in patterns,
but we can also match any constant. These are written similar to variables, but
use uppercase letters rather than lowercase: $C
, $MYCONST
, and $OPERAND2
.
For example, we can use constant patterns to combine an iconst
and iadd
into
a single iadd_imm
instruction:
(=> (iadd (iconst $C) $x)
(iadd_imm $C $x))
Nested Patterns
Patterns can also match nested operations with their own nesting:
(=> (bor $x (bor $x $y))
(bor $x $y))
Preconditions and Unquoting
Let's reconsider our first example optimization:
(=> (imul $x 2)
(ishl $x 1))
This optimization is a little too specific. Here is another version of this optimization that we'd like to support:
(=> (imul $x 4)
(ishl $x 2))
We don't want to have to write out all instances of this general class of optimizations! That would be a lot of repetition and could also bloat the size of our generated peephole optimizer's matching automata.
Instead, we can generalize this optimization by matching any multiplication by a
power of two constant C
and replacing it with a shift left of log2(C)
.
First, rather than match 2
directly, we want to match any constant variable C
:
(imul $x $C)
Note that variables begin with lowercase letters, while constants begin with
uppercase letters. Both the constant pattern $C
and variable pattern $x
will
match 5
, but only the variable pattern $x
will match a whole subexpression
like (iadd 1 2)
. The constant pattern $C
only matches constant values.
Next, we augment our lefthand side's pattern with a precondition that the
constant $C
must be a power of two. Preconditions are introduced by wrapping
a pattern in the when
form:
;; Our new lefthand side, augmenting a pattern with a precondition!
(when
;; The pattern matching multiplication by a constant value.
(imul $x $C)
;; The precondition that $C must be a power of two.
(ispoweroftwo $C))
In the righthand side, we use unquoting to perform compiletime evaluation
of log2($C)
. Unquoting is done with the $(...)
form:
;; Our new righthand side, using unqouting to do compiletime evaluation of
;; constants that were matched and bound in the lefthand side!
(ishl $x $(log2 $C))
Finally, here is the general optimization putting our new lefthand and righthand sides together:
(=> (when (imul $x $C)
(ispoweroftwo $C))
(ishl $x $(log2 $C)))
Bit Widths
Similar to how Cranelift's instructions are bitwidth polymorphic, peepmatic
optimizations are also bitwidth polymorphic. Unless otherwise specified, a
pattern will match expressions manipulating i32
s just the same as expressions
manipulating i64
s, etc... An optimization that doesn't constrain its pattern's
bit widths must be valid for all bit widths:
 1
 8
 16
 32
 64
 128
To constrain an optimization to only match i32
s, for example, you can use the
bitwidth
precondition:
(=> (when (iadd $x $y)
(bitwidth $x 32)
(bitwidth $y 32))
...)
Alternatively, you can ascribe a type to an operation by putting the type inside curly brackets after the operator, like this:
(=> (when (sextend{i64} (ireduce{i32} $x))
(bitwidth $x 64))
(sshr (ishl $x 32) 32))
Implementation
Peepmatic has roughly four phases:
 Parsing
 Type Checking
 Linearization
 Automatization
(I say "roughly" because there are a couple micropasses that happen after linearization and before automatization. But those are the four main phases.)
Parsing
Parsing transforms the DSL source text into an abstract syntax tree (AST).
We use the wast
crate. It gives us nicely formatted errors with source
context, as well as some other generally nicetohave parsing infrastructure.
Relevant source files:
src/parser.rs
src/ast.rs
Type Checking
Type checking operates on the AST. It checks that types and bit widths in the optimizations are all valid. For example, it ensures that the type and bit width of an optimization's lefthand side is the same as its righthand side, because it doesn't make sense to replace an integer expression with a boolean expression.
After type checking is complete, certain AST nodes are assigned a type and bit width, that are later used in linearization and when matching and applying optimizations.
We walk the AST and gather type constraints. Every constraint is associated with
a span in the source file. We hand these constraints off to Z3. In the case that
there are type errors (i.e. Z3 returns unsat
), we get the constraints that are
in conflict with each other via z3::Solver::get_unsat_core
and report the type
errors to the user, with the source context, thanks to the constraints'
associated spans.
Using Z3 not only makes implementing type checking easier than it otherwise would be, but makes it that much easier to extend type checking with searching for counterexample inputs in the future. That is, inputs for which the RHS is not equivalent to the LHS, implying that the optimization is unsound.
Relevant source files:
src/verify.rs
Linearization
Linearization takes the AST of optimizations and converts each optimization into a linear form. The goal is to make automaton construction easier in the automatization step, as well as simplifying the language to make matching and applying optimizations easier.
Each optimization's lefthand side is converted into a sequence of
 match operation,
 path to the instruction/value/immediate to which the operation is applied, and
 expected result of the operation.
All match operations must have the expected result for the optimization to be applicable to an instruction sequence.
Each optimization's righthand side is converted into a sequence of build actions. These are commands that describe how to construct the righthand side, given that the lefthand side has been matched.
Relevant source files:
src/linearize.rs
src/linear_passes.rs
crates/runtime/src/linear.rs
Automatization
Automatization takes a set of linear optimizations and combines them into a transducer automaton. This automaton is the final, compiled peephole optimizations. The goal is to deduplicate as much as we can from all the linear optimizations, producing as compact and cachefriendly a representation as we can.
Plain automata can tell you whether it matches an input string. It can be thought of as a compact representation of a set of strings. A transducer is a type of automaton that doesn't just match input strings, but can map them to output values. It can be thought of as a compact representation of a dictionary or map. By using transducers, we deduplicate not only the prefix and suffix of the match operations, but also the righthand side build actions.
Each state in the emitted transducers is associated with a match operation and path. The transitions out of that state are over the result of the match operation. Each transition optionally accumulates some RHS build actions. By the time we reach a final state, the RHS build actions are complete and can be interpreted to apply the matched optimization.
The relevant source files for constructing the transducer automaton are:
src/automatize.rs
crates/automata/src/lib.rs
The relevant source files for the runtime that interprets the transducers and applies optimizations are:
crates/runtime/src/optimizations.rs
crates/runtime/src/optimizer.rs
References
I found these resources helpful when designing peepmatic
:

Extending tree pattern matching for application to peephole optimizations by van Oirschot

Interpreted Pattern Match Execution for MLIR by Jeff Niu

Direct Construction of Minimal Acyclic Subsequential Transducers by Mihov and Maurel

Index 1,600,000,000 Keys with Automata and Rust and the
fst
crate by Andrew Gallant
Acknowledgments
Thanks to Jubi Taneja, Dan Gohman, John Regehr, and Nuno Lopes for their input in design discussions and for sharing helpful resources!