thumb2-stack-size 0.1.1

# thumb2-stack-size

A tool for estimating Thumb/Thumb2 binary stack usage.

Microcontroller development often requires explicit management of stacks.
Microcontrollers by and large do not have virtual addressing capabilities, thus
their stacks must be pre-allocated entirely in advance. They also mostly
feature only small quantities of RAM, thus stacks on microcontrollers are
usually small. This present a bit of a dilemma: enough stack space must be
allocated to ensure that no overruns occur, but as little memory as possible
should be wasted on stacks.

This tool can estimate how much stack space is used by a given Thumb or Thumb2 binary.

# Prerequisited

thumb2-stack-size supports only 32 bit statically linked ARM executable ELF
files at this time. The ELF files must have symbols information, debug
information is not necessary. The symbol table are used to reconstruct
functions from the input binary.

# Estimation algorithm

The input file is read and split into functions using the symbol table to
locate functions and their sizes. Each function, at this point a stream of
instructions, is then parsed instruction for instruction. If any instructions
are encountered that are not defined by the ARM ISA an error is raised and
estimation is aborted. ("Defined" includes the permanently undefined
instructions `udf` and `udf.w`).

The resulting functions are run through a virtual machine that tracks stacks
manipulations done by each instruction. Stack manipulations can be either
static (eg push, pop, reserving large stack chunks at once) or dynamic (eg
alloca). Traces can also be static or dynamic, but may also be *divergent*: if
two traces lead to the same instruction, but reach that instruction with
different stack pointer values, the function cannot be analyzed fully. The tool
will attempt a best-effort analysis and calculate a lower bound on the
functions stack requirements.

All possible traces through the function are simulated to give three
possible results for a given function:

  - stack usage is fully static. In this case the result of the estimation is exact.
  - stack usage is dynamic, but not provably divergent. This is often caused by
    indirect branches in the code, eg by calling function pointers or methods
    on trait objects.
  - stack usage is provably divergent. Mutually recursive functions are a likely
    cause for this result, or functions that call `alloca` in a loop.

This tool notable does not support analysis of table branches (ie the `tbb` and
`tbh` instructions). Table branches are often emitted by the Rust formatting
code. Whenever such a branch is discovered a warning is issued. The binary can
be recompile with `-C llvm-args=-min-jump-table-entries=99999` and reanalyzed;
the `-min-jump-table-entries` option is set to such a high value here that LLVM
will not allowed to emit table branches. The resulting binary is functionally
identical with the same stack properties, but likely larger.

# Output format

When run without options the tool will output all warnings, followed by a table
that contains one line for each function. The table lists

  - the address of the function
  - the stack requirements of that function
  - whether the function is static, dynamic or divergent
  - the Rust demangled name of the function, or the raw symbol name if demangling fails

Each warning is annotated with the (demangled) name of the symbol that caused
the warning and its address. The address is useful to disambiguate functions
that have the same demangled name, as is common for generic functions that are
called with different type signatures.

A small program that contains indirect branches (in this case calls through a
function pointer) could produce the following output:

```
warning: function __reset @ 8000040 cannot be statically analyzed
 * indirect branches found
warning: function hcl::timer::TimerSet::tick @ 80001cc cannot be statically analyzed
 * indirect branches found
warning: function timer::systick @ 8000080 cannot be statically analyzed
 * calls to dynamic functions found: 80001cc,

 8000040  0 dynamic            __reset
 8000080  48 dynamic           timer::systick
 8000084  128 static           timer::main
 8000134  136 static           __hcl_entry
 800013e  0 static             hcl::qlist::QlistData::unlink
 800015c  0 static             hcl::qlist::QlistData::pop_front
 8000168  0 static             <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
 80001cc  48 dynamic           hcl::timer::TimerSet::tick
 800023c  32 static            hcl::timer::TimerSet::add_stub
 8000272  0 static             hcl::timer::invoke_indirect
 8000296  16 static            hcl::timer::invoke_indirect
 80002b6  32 static            hcl::timer::invoke_indirect
```

Additionally to the table the tool can also output the call graph of all
functions. The call graph may be used to track down indirect function calls if
the developer has knowledge about certain functions that is not available to
the estimation algorithm.

The call graph is split into newline-separated sections, one section for each
function in the input. Each function is traversed in depth-first order, all
callees of the function are listed recursively in the process. Mutually
recursive functions are also printed, but the call graph is truncated with an
ellipsis marker at the first function that appears twice in the graph.

Each line of a call graph contains the address of the called function,
beginning with the function for which the call graph is printed. For each
function the analysis result is printed as well, followed by the function name.

The same binary as above produces the following call graph:

```
 8000040  0 dynamic             __reset

 8000080  48 dynamic            timer::systick
 80001cc  48 dynamic             └ hcl::timer::TimerSet::tick
 800015c  0 static                 ├ hcl::qlist::QlistData::pop_front
 800013e  0 static                 │ └ hcl::qlist::QlistData::unlink
 8000168  0 static                 └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 8000084  128 static            timer::main
 800023c  32 static              └ hcl::timer::TimerSet::add_stub
 8000168  0 static                 └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 8000134  136 static            __hcl_entry
 8000084  128 static             └ timer::main
 800023c  32 static                └ hcl::timer::TimerSet::add_stub
 8000168  0 static                   └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 800013e  0 static              hcl::qlist::QlistData::unlink

 800015c  0 static              hcl::qlist::QlistData::pop_front
 800013e  0 static               └ hcl::qlist::QlistData::unlink

 8000168  0 static              <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 80001cc  48 dynamic            hcl::timer::TimerSet::tick
 800015c  0 static               ├ hcl::qlist::QlistData::pop_front
 800013e  0 static               │ └ hcl::qlist::QlistData::unlink
 8000168  0 static               └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 800023c  32 static             hcl::timer::TimerSet::add_stub
 8000168  0 static               └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 8000272  0 static              hcl::timer::invoke_indirect

 8000296  16 static             hcl::timer::invoke_indirect
 800013e  0 static               └ hcl::qlist::QlistData::unlink

 80002b6  32 static             hcl::timer::invoke_indirect
 800013e  0 static               ├ hcl::qlist::QlistData::unlink
 8000168  0 static               └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before

 8000040  0 dynamic            __reset
 8000080  48 dynamic           timer::systick
 8000084  128 static           timer::main
 8000134  136 static           __hcl_entry
 800013e  0 static             hcl::qlist::QlistData::unlink
 800015c  0 static             hcl::qlist::QlistData::pop_front
 8000168  0 static             <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
 80001cc  48 dynamic           hcl::timer::TimerSet::tick
 800023c  32 static            hcl::timer::TimerSet::add_stub
 8000272  0 static             hcl::timer::invoke_indirect
 8000296  16 static            hcl::timer::invoke_indirect
 80002b6  32 static            hcl::timer::invoke_indirect
```

Note that `hcl::timer::invoke_indirect` appears multiple times, bit with different addresses.

This view may be used to deduce more information about stack usage. In the
example case it is known that `systick` calls `hcl::timer::TimerSet::tick`,
which in turn may call any of the `hcl::timer::invoke_indirect` functions. Here
`tick` uses at least 48 bytes, plus at most 32 bytes for any of the
`invoke_indirect` functions. With this we can deduce that `tick` will not
require more than 80 bytes of stack space.