Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
haybale
: Symbolic execution of LLVM IR, written in Rust
haybale
is a general-purpose symbolic execution engine written in Rust.
It operates on LLVM IR, which allows it to analyze programs written in C/C++,
Rust, Swift, or any other language which compiles to LLVM IR.
In this way, it may be compared to KLEE, as it has similar goals, except
that haybale
is written in Rust and makes some different design decisions.
That said, haybale
makes no claim of being at feature parity with KLEE.
Okay, but what is a symbolic execution engine?
A symbolic execution engine is a way of reasoning - rigorously and
mathematically - about the behavior of a function or program.
It can reason about all possible inputs to a function without literally
brute-forcing every single one.
For instance, a symbolic execution engine like haybale
can answer questions
like:
- Are there any inputs to (some function) that cause it to return 0? What are they?
- Is it possible for this loop to execute exactly 17 times?
- Can this pointer ever be NULL?
Symbolic execution engines answer these questions by converting each variable in the program or function into a mathematical expression which depends on the function or program inputs. Then they use an SMT solver to answer questions about these expressions, such as the questions listed above.
Getting started
1. Install
haybale
is on crates.io, so you can simply
add it as a dependency in your Cargo.toml
:
[]
= "0.3.1"
haybale
also depends (indirectly) on the LLVM 9 and Boolector libraries, which
must both be available on your system.
See the llvm-sys
or boolector-sys
READMEs for more details and instructions.
2. Acquire bitcode to analyze
Since haybale
operates on LLVM bitcode, you'll need some bitcode to get started.
If the program or function you want to analyze is written in C, you can generate
LLVM bitcode (*.bc
files) with clang
's -c
and -emit-llvm
flags:
For debugging purposes, you may also want LLVM text-format (*.ll
) files, which
you can generate with clang
's -S
and -emit-llvm
flags:
If the program or function you want to analyze is written in Rust, you can likewise
use rustc
's --emit=llvm-bc
and --emit=llvm-ir
flags.
Note that in order for haybale
to print source-location information (e.g.,
source filename and line number) in error messages and backtraces, the LLVM
bitcode will need to include debuginfo.
You can ensure debuginfo is included by passing the -g
flag to clang
,
clang++
, or rustc
when generating bitcode.
3. Create a Project
A haybale
Project
contains all of the code currently being analyzed, which
may be one or more LLVM modules.
To get started, simply create a Project
from a single bitcode file:
let project = from_bc_path?;
For more ways to create Project
s, including analyzing entire libraries, see
the Project
documentation.
4. Use built-in analyses
haybale
currently includes two simple built-in analyses:
get_possible_return_values_of_func()
, which describes all the possible
values a function could return for any input, and find_zero_of_func()
,
which finds a set of inputs to a function such that it returns 0
.
These analyses are provided both because they may be of some use themselves,
but also because they illustrate how to use haybale
.
For an introductory example, let's suppose foo
is the following C function:
int
We can use find_zero_of_func()
to find inputs such that foo
will return 0
:
match find_zero_of_func
Writing custom analyses
haybale
can do much more than just describe possible function return values
and find function zeroes.
In this section, we'll walk through how we could find a zero of the function
foo
above without using the built-in find_zero_of_func()
.
This will illustrate how to write a custom analysis using haybale
.
ExecutionManager
All analyses will use an ExecutionManager
to control the progress of the
symbolic execution.
In the code snippet below, we call symex_function()
to create an
ExecutionManager
which will analyze the function foo
- it will start at
the top of the function, and end when the function returns. In between, it
will also analyze any functions called by foo
, as necessary and depending
on the Config
settings.
let mut em = symex_function;
Here it was necessary to not only specify the default haybale
configuration, as we did when calling find_zero_of_func()
, but also what
"backend" we want to use.
The default BtorBackend
should be fine for most purposes.
Paths
The ExecutionManager
acts like an Iterator
over paths through the function foo
.
Each path is one possible sequence of control-flow decisions (e.g., which direction
do we take at each if
statement) leading to the function returning some value.
The function foo
in this example has two paths, one following the "true" branch and
one following the "false" branch of the if
.
Let's examine the first path through the function:
let result = em.next.expect;
In the common case, result
contains the function return value on this path,
as a Boolector BV
(bitvector) wrapped in the ReturnValue
enum.
Since we know that foo
isn't a void-typed function (and won't throw an
exception or abort), we can simply unwrap the ReturnValue
to get the BV
:
let retval = match result
result
could also be an Err
describing an Error
which was encountered
while processing the path. In this case, we could just ignore the error and
keep calling next()
to try to find paths which didn't have errors. Or we
could get information about the error like this:
...
Err => panic!,
};
This gets information about the error from the program State
, which we'll
discuss next. But for the rest of this tutorial, we'll assume that we got the
Ok
result, and at this point retval
is a BV
representing the function
return value on the first path.
States
For each path, the ExecutionManager
provides not only the final result of
the path (either aReturnValue
or an Error
), but also the final program
State
at the end of that path.
We can get immutable access to the State
with state()
, or mutable access
with mut_state()
.
let state = em.mut_state; // the final program state along this path
To test whether retval
can be equal to 0
in this State
, we can use
state.bvs_can_be_equal()
:
let zero = state.zero; // The 32-bit constant 0
if state.bvs_can_be_equal?
Getting solutions for variables
If retval
can be 0
, let's find what values of the function parameters
would cause that.
First, we'll add a constraint to the State
requiring that the return value
must be 0
:
retval._eq.assert;
and then we'll ask for solutions for each of the parameters, given this constraint:
// Get a possible solution for the first parameter.
// In this case, from looking at the text-format LLVM IR, we know the variable
// we're looking for is variable #0 in the function "foo".
let a = state.get_a_solution_for_irname?
.expect
.as_u64
.expect;
// Likewise the second parameter, which is variable #1 in "foo"
let b = state.get_a_solution_for_irname?
.expect
.as_u64
.expect;
println!;
Alternately, we could also have gotten the parameter BV
s from the ExecutionManager
like this:
let a_bv = em.param_bvs.clone;
let b_bv = em.param_bvs.clone;
let a = em.state.get_a_solution_for_bv?
.expect
.as_u64
.expect;
let b = em.state.get_a_solution_for_bv?
.expect
.as_u64
.expect;
println!;
Documentation
Full documentation for haybale
can be found here,
or of course you can generate local documentation with cargo doc --open
.
Compatibility
Currently, haybale
only supports LLVM 9. A version of haybale
supporting
LLVM 8 is available on the llvm-8
branch of this repo; it is approximately
at feature parity with haybale
0.2.1, and will likely be stuck at that
point indefinitely unless there is demand for additional backported features.
haybale
works on stable Rust, and requires Rust 1.40+.
Under the hood
haybale
is built using the Rust llvm-ir
crate and the Boolector SMT
solver (via the Rust boolector
crate).
Changelog
Version 0.3.1 (Feb 5, 2020)
- Fix some broken links in the README and docs. No functional changes.
Version 0.3.0 (Feb 5, 2020)
Solver timeouts:
- New setting
Config.solver_query_timeout
controls the maximum amount of timehaybale
will spend on a single solver query before returningError::SolverError
. This setting defaults to 300 seconds (5 minutes). The setting can also be disabled entirely, which results in the same behavior as previous versions ofhaybale
(no time limit on solver queries).
Error handling:
- The errors returned by
ExecutionManager.next()
are nowhaybale::Error
s instead ofString
s, allowing callers to more easily handle different kinds of errors different ways. To get a string representation of theError
,.to_string()
gives the short description, whileState.full_error_message_with_context()
gives the full description which previously was returned byExecutionManager.next()
. The usage example in the README has been updated accordingly. - The toplevel function
find_zero_of_func()
now returns aResult
, with the error type beingString
. - New setting
Config.squash_unsats
controls whetherError::Unsat
s are silently squashed (the default behavior, and the behavior of previous versions ofhaybale
), or returned to the user. For more details, see the docs on that setting.
Logging, error messages, backtraces, etc:
haybale
now prints source-location information (e.g., source filename and line number) in error messages and backtraces when it is available. Similarly, theHAYBALE_DUMP_PATH
environment variable now has the optionsLLVM
,SRC
, andBOTH
. For more details on all of this, seeConfig.print_source_info
.- You can also now disable printing the LLVM module name along with LLVM
location info in error messages, backtraces, path dumps, and log messages.
For more details, see
Config.print_module_name
. haybale
will now by default autodetect when C++ or Rust demangling is appropriate for theProject
, unless a different setting is chosen inConfig.demangling
.- Numeric constants representing
BV
values in log messages,HAYBALE_DUMP_VARS
dumps, etc are now all printed in hexadecimal (previously binary, or an inconsistent mix of binary and hexadecimal).
Function hooks and intrinsics:
- Built-in support for LLVM arithmetic-with-overflow intrinsics.
- Built-in support for LLVM saturating-arithmetic intrinsics.
- Built-in support for the
llvm.assume
intrinsic, with an associated settingConfig.trust_llvm_assumes
. - Built-in support for the
llvm.bswap
intrinsic with argument sizes 48 or 64 bits (previously only supported 16 or 32 bits). - Default hooks for a number of Rust standard-library functions which
always panic, such as
core::result::unwrap_failed()
. - New module
hook_utils
contains the implementations ofmemset
andmemcpy
used by the corresponding built-in hooks. These are now publically available for use in custom hooks for other functions.
Changes to data structures and traits:
- The
Location
andPathEntry
structs have been refactored to include source-location information when it is available, to be capable of indicating basic block terminators in addition to normal instructions, and to support some internal refactoring. - The
backend::BV
trait has a new required method,get_solver()
, which returns aSolverRef
of the appropriate type. (This is similar to the same method on thebackend::Memory
trait.) - Saturating-arithmetic methods (signed and unsigned addition and subtraction)
are now available on
backend::BV
, with default implementations in terms of the other trait methods. That means that these come "for free" once the required trait methods are implemented. zero_extend_to_bits()
andsign_extend_to_bits()
are also now available as trait methods onbackend::BV
, with default implementations in terms of the other trait methods. Previously they were private utility functions inhaybale
.- Many other structures have had minor changes and improvements, including some small breaking changes.
Compatibility:
- Updated
boolector
dependency to crate version 0.3.0, which requires Boolector version 3.1.0 (up from 3.0.0). - This version of
haybale
now requires Rust 1.40+, up from 1.36+ for previous versions ofhaybale
.
Version 0.2.1 (Jan 15, 2020)
- New
HAYBALE_DUMP_PATH
andHAYBALE_DUMP_VARS
environment-variable optionsHAYBALE_DUMP_PATH
: if set to1
, then on error,haybale
will print a description of the path to the error: every LLVM basic block touched from the top of the function until the error location, in order.HAYBALE_DUMP_VARS
: if set to1
, then on error,haybale
will print the latest value assigned to each variable in the function containing the error.
- New setting
Config.demangling
allows you to apply C++ or Rust demangling to function names in error messages and backtraces - Support hooking calls to inline assembly, with some limitations inherited
from
llvm-ir
(see comments onFunctionHooks::add_inline_asm_hook()
) - Built-in support for (the most common cases of) the
llvm.bswap
intrinsic - Other tiny tweaks - e.g., downgrade one panic to a warning
Version 0.2.0 (Jan 8, 2020)
- Support LLVM
extractvalue
andinsertvalue
instructions - Support LLVM
invoke
,resume
, andlandingpad
instructions, and thus C++throw
/catch
. Also provide built-in hooks for some related C++ ABI functions such as__cxa_throw()
. This support isn't perfect, particularly surrounding the matching of catch blocks to exceptions:haybale
may explore some additional paths which aren't actually valid. But all actually valid paths should be found and explored correctly. - Since functions can be called not only with the LLVM
call
instruction but also with the LLVMinvoke
instruction, function hooks now receive a&dyn IsCall
object which may represent either acall
orinvoke
instruction. haybale
now uses LLVM 9 rather than LLVM 8. See the "Compatibility" section in the README.- Improvements for
Project
s containing C++ and/or Rust code:- For the function-name arguments to
symex_function()
,get_possible_return_values_of_func()
,find_zero_of_func()
, andProject::get_func_by_name()
, you may now pass either the (mangled) function name as it appears in LLVM (as was supported previously), or the demangled function name. That is, you can pass in"foo::bar"
rather than"_ZN3foo3barE"
. - Likewise, you may add function hooks based on the demangled name of
the hooked function. See
FunctionHooks::add_cpp_demangled()
andFunctionHooks::add_rust_demangled()
. - Also,
llvm-ir
versions 0.3.3 and later contain an important bugfix for parsing LLVM bitcode generated byrustc
.haybale
0.2.0 usesllvm-ir
0.4.1.
- For the function-name arguments to
- The
ReturnValue
enum now has additional optionsThrow
, indicating an uncaught exception, andAbort
, indicating a program abort (e.g. Rust panic, or call to Cexit()
). - Relatedly,
haybale
now has built-in hooks for the Cexit()
function and for Rust panics (and for a few more LLVM intrinsics). haybale
also now contains a built-ingeneric_stub_hook
andabort_hook
which you can supply as hooks for any functions which you want to ignore the implementation of, or which always abort, respectively. See docs on thefunction_hooks
module.Config.initial_mem_watchpoints
is now aHashMap
instead of aHashSet
of pairs.
Version 0.1.3 (Jan 1, 2020)
- Memory watchpoints: specify a range of memory addresses, and get
a log message for any memory operation which reads or writes any data in
that range. See
State::add_mem_watchpoint()
. - Convenience methods on
State
for constructing constant-valuedBV
s (rather than having to use the corresponding methods onBV
and passstate.solver
):bv_from_i32()
,bv_from_u32()
,bv_from_i64()
,bv_from_u64()
,bv_from_bool()
,zero()
,one()
, andones()
. - Some internal code refactoring to prepare for 0.2.0 features
Version 0.1.2 (Dec 18, 2019)
- New method
Project::get_inner_struct_type_from_named()
which handles opaque struct types by searching the entireProject
for a definition of the given struct - Support memory reads of size 1-7 bits (in particular, reads of LLVM
i1
) - Performance optimization: during
State
initialization, global variables are now only allocated, and not initialized until first use (lazy initialization). This gives the SMT solver fewer memory writes to think about, and helps especially for largeProject
s which may contain many global variables that won't actually be used in a given analysis. - Minor bugfixes and improved error messages
Version 0.1.1 (Nov 26, 2019)
Changes to README text only; no functional changes.
Version 0.1.0 (Nov 25, 2019)
Initial release!