# thumb2-stack-size
A tool for estimating Thumb/Thumb2 binary stack usage.
Microcontroller development often requires explicit management of stacks.
Microcontrollers by and large do not have virtual addressing capabilities, thus
their stacks must be pre-allocated entirely in advance. They also mostly
feature only small quantities of RAM, thus stacks on microcontrollers are
usually small. This present a bit of a dilemma: enough stack space must be
allocated to ensure that no overruns occur, but as little memory as possible
should be wasted on stacks.
This tool can estimate how much stack space is used by a given Thumb or Thumb2 binary.
# Prerequisited
thumb2-stack-size supports only 32 bit statically linked ARM executable ELF
files at this time. The ELF files must have symbols information, debug
information is not necessary. The symbol table are used to reconstruct
functions from the input binary.
# Estimation algorithm
The input file is read and split into functions using the symbol table to
locate functions and their sizes. Each function, at this point a stream of
instructions, is then parsed instruction for instruction. If any instructions
are encountered that are not defined by the ARM ISA an error is raised and
estimation is aborted. ("Defined" includes the permanently undefined
instructions `udf` and `udf.w`).
The resulting functions are run through a virtual machine that tracks stacks
manipulations done by each instruction. Stack manipulations can be either
static (eg push, pop, reserving large stack chunks at once) or dynamic (eg
alloca). Traces can also be static or dynamic, but may also be *divergent*: if
two traces lead to the same instruction, but reach that instruction with
different stack pointer values, the function cannot be analyzed fully. The tool
will attempt a best-effort analysis and calculate a lower bound on the
functions stack requirements.
All possible traces through the function are simulated to give three
possible results for a given function:
- stack usage is fully static. In this case the result of the estimation is exact.
- stack usage is dynamic, but not provably divergent. This is often caused by
indirect branches in the code, eg by calling function pointers or methods
on trait objects.
- stack usage is provably divergent. Mutually recursive functions are a likely
cause for this result, or functions that call `alloca` in a loop.
This tool notable does not support analysis of table branches (ie the `tbb` and
`tbh` instructions). Table branches are often emitted by the Rust formatting
code. Whenever such a branch is discovered a warning is issued. The binary can
be recompile with `-C llvm-args=-min-jump-table-entries=99999` and reanalyzed;
the `-min-jump-table-entries` option is set to such a high value here that LLVM
will not allowed to emit table branches. The resulting binary is functionally
identical with the same stack properties, but likely larger.
# Output format
When run without options the tool will output all warnings, followed by a table
that contains one line for each function. The table lists
- the address of the function
- the stack requirements of that function
- whether the function is static, dynamic or divergent
- the Rust demangled name of the function, or the raw symbol name if demangling fails
Each warning is annotated with the (demangled) name of the symbol that caused
the warning and its address. The address is useful to disambiguate functions
that have the same demangled name, as is common for generic functions that are
called with different type signatures.
A small program that contains indirect branches (in this case calls through a
function pointer) could produce the following output:
```
warning: function __reset @ 8000040 cannot be statically analyzed
* indirect branches found
warning: function hcl::timer::TimerSet::tick @ 80001cc cannot be statically analyzed
* indirect branches found
warning: function timer::systick @ 8000080 cannot be statically analyzed
* calls to dynamic functions found: 80001cc,
8000040 0 dynamic __reset
8000080 48 dynamic timer::systick
8000084 128 static timer::main
8000134 136 static __hcl_entry
800013e 0 static hcl::qlist::QlistData::unlink
800015c 0 static hcl::qlist::QlistData::pop_front
8000168 0 static <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
80001cc 48 dynamic hcl::timer::TimerSet::tick
800023c 32 static hcl::timer::TimerSet::add_stub
8000272 0 static hcl::timer::invoke_indirect
8000296 16 static hcl::timer::invoke_indirect
80002b6 32 static hcl::timer::invoke_indirect
```
Additionally to the table the tool can also output the call graph of all
functions. The call graph may be used to track down indirect function calls if
the developer has knowledge about certain functions that is not available to
the estimation algorithm.
The call graph is split into newline-separated sections, one section for each
function in the input. Each function is traversed in depth-first order, all
callees of the function are listed recursively in the process. Mutually
recursive functions are also printed, but the call graph is truncated with an
ellipsis marker at the first function that appears twice in the graph.
Each line of a call graph contains the address of the called function,
beginning with the function for which the call graph is printed. For each
function the analysis result is printed as well, followed by the function name.
The same binary as above produces the following call graph:
```
8000040 0 dynamic __reset
8000080 48 dynamic timer::systick
80001cc 48 dynamic └ hcl::timer::TimerSet::tick
800015c 0 static ├ hcl::qlist::QlistData::pop_front
800013e 0 static │ └ hcl::qlist::QlistData::unlink
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
8000084 128 static timer::main
800023c 32 static └ hcl::timer::TimerSet::add_stub
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
8000134 136 static __hcl_entry
8000084 128 static └ timer::main
800023c 32 static └ hcl::timer::TimerSet::add_stub
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
800013e 0 static hcl::qlist::QlistData::unlink
800015c 0 static hcl::qlist::QlistData::pop_front
800013e 0 static └ hcl::qlist::QlistData::unlink
8000168 0 static <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
80001cc 48 dynamic hcl::timer::TimerSet::tick
800015c 0 static ├ hcl::qlist::QlistData::pop_front
800013e 0 static │ └ hcl::qlist::QlistData::unlink
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
800023c 32 static hcl::timer::TimerSet::add_stub
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
8000272 0 static hcl::timer::invoke_indirect
8000296 16 static hcl::timer::invoke_indirect
800013e 0 static └ hcl::qlist::QlistData::unlink
80002b6 32 static hcl::timer::invoke_indirect
800013e 0 static ├ hcl::qlist::QlistData::unlink
8000168 0 static └ <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
8000040 0 dynamic __reset
8000080 48 dynamic timer::systick
8000084 128 static timer::main
8000134 136 static __hcl_entry
800013e 0 static hcl::qlist::QlistData::unlink
800015c 0 static hcl::qlist::QlistData::pop_front
8000168 0 static <hcl::qlist::QlistLock<'q, A>>::unsafe_insert_before
80001cc 48 dynamic hcl::timer::TimerSet::tick
800023c 32 static hcl::timer::TimerSet::add_stub
8000272 0 static hcl::timer::invoke_indirect
8000296 16 static hcl::timer::invoke_indirect
80002b6 32 static hcl::timer::invoke_indirect
```
Note that `hcl::timer::invoke_indirect` appears multiple times, bit with different addresses.
This view may be used to deduce more information about stack usage. In the
example case it is known that `systick` calls `hcl::timer::TimerSet::tick`,
which in turn may call any of the `hcl::timer::invoke_indirect` functions. Here
`tick` uses at least 48 bytes, plus at most 32 bytes for any of the
`invoke_indirect` functions. With this we can deduce that `tick` will not
require more than 80 bytes of stack space.