Module evmil::analysis

source ·
Expand description

Functionality related to the analysis (or execution) of bytecode contracts. Using this module we can, for example, extract the control-flow graph of a legacy contract. We can also write arbitrary dataflow analyses which operate over bytecode contracts (e.g. for constant propagation).

§Examples

A minimal example is the following, which determines the instruction reachability. Here, an instruction is considered unreachable if there is no path to it through the contract’s control-flow graph, starting from the first instruction.

use evmil::analysis::find_reachable;
use evmil::bytecode::Disassemble;
use evmil::util::FromHexString;

// Convert hex string into bytes
let bytes = "0x600456fe00".from_hex_string().unwrap();
// Disassemble bytes into instructions.
let insns = bytes.disassemble();
// Compute reachability information.
let reachable = find_reachable(&insns);
// Check `INVALID` instruction (`0xfe`) is never executed.
assert_eq!(reachable,vec![true,true,false,true]);

Here, the sequence 0x600456fe00 corresponds to the following program:

.code
   push lab0
   jump
   invalid
lab0:
   stop

Thus, we can see that the invalid instruction can never be executed.

Structs§

  • The next simplest possible implementation of EvmMemory which only manages “concrete” addresses (i.e. it doesn’t perform any symbolic analysis).
  • An implementation of EvmStack which gives a concrete view of the stack. In other words, it represents the stack exactly.
  • An EvmState composed from three distinct (and potentially abstract) components: stack, memory and storage.
  • Identifies the dependency frames for each instruction in a given sequence of instructions.
  • The simplest possible implementation of EvmMemory which simply returns “unknown” for every location. In other words, it doesn’t actually analyse memory at all.
  • The simplest possible implementation of EvmStorage which simply returns “unknown” for every location. In other words, it doesn’t actually analyse storage at all.

Enums§

  • Simplest possible (abstract) word which is either a concrete word or unknown.
  • A very simple abstract word which does not support any operations (e.g. arithmetic or comparators). Thus, it typically becomes top. This is useful in some specific situations where we want to prevent the possibility of infinite ascending chains (i.e. prior to havoc analysis).

Traits§

  • Abstraction of memory within an EVM. This provides the minimal set of operations required to implement the semantics of a given bytecode instruction. For example, reading/writing to memory.
  • Abstraction of the operand stack within an EVM. This provides the minimal set of operations required to implement the semantics of a given bytecode instruction. For example, pushing / popping items from the stack.
  • Describes the state of an EVM at a given point (which could be running or terminated). In essence, this simply packages all the key components (e.g. stack, memory, storage) of the EVM state together.
  • An EvmStateSet represents a set of distinct states at a given point in a bytecode sequence. The assumption is that all states in a given set have the same pc value. The purpose of this is to collect all distinct states encountered during an execution or analysis of a bytecode sequence.
  • Abstraction of peristent storage within an EVM. This provides the minimal set of operations required to implement the semantics of a given bytecode instruction. For example, reading/writing from storage.
  • Represents the fundamental unit of computation within the EVM, namely a word. This is intentially left abstract, so that it could be reused across both concrete and abstract semantics.

Functions§

  • For a given bytecode sequence, identify the dependency frames for all instructions. A dependency frame contains an entry for each operand of the instruction, where each entry identifies a source instruction within the sequence. If there are multiple paths through the sequence to the given instruction, then there may be multiple frames for a given instruction.
  • For a given bytecode sequence, identify all reachable instructions. An instruction is reachable if there exists a path from the first instruction in the sequence to the given instruction. For example, consider this sequence:
  • For a given bytecode sequence, identify and insert havoc instructions to prevent non-termination of more precise analyses. A havoc instruction is needed anywhere the value of a given stack location depends upon itself. This can arise, for example, in loops with code of the form x := x + 1. To understand this, consider the following example:

Type Aliases§

  • A block graph is a directed graph over the basic blocks of a bytecode sequence.