smda
A minimalist recursive x86 / x64 disassembler library, optimized for accurate Control Flow Graph (CFG) recovery from PE / ELF binaries and arbitrary memory dumps.
The output is a collection of functions, basic blocks, and instructions with their respective edges (block-to-block, function-to-function). Optionally, references to the Windows API can be inferred via the ApiScout method.
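As a rough mental model of that output, the sketch below shows one way functions, basic blocks, and the two kinds of edges could be laid out. All type and method names here (`SketchReport`, `SketchFunction`, `SketchBlock`, `call_edges`, `block_edges`) are hypothetical and exist only for illustration; the crate's real report types may be shaped differently.

```rust
use std::collections::BTreeMap;

// Hypothetical types for illustration only; not smda's actual report API.

/// A basic block: its instruction addresses and outgoing block-to-block edges.
struct SketchBlock {
    instructions: Vec<u64>, // addresses of the instructions in this block
    successors: Vec<u64>,   // start addresses of the blocks this one can reach
}

/// A function: its blocks plus outgoing function-to-function edges.
struct SketchFunction {
    blocks: BTreeMap<u64, SketchBlock>, // keyed by block start address
    calls: Vec<u64>,                    // entry addresses of called functions
}

/// The whole result: every discovered function, keyed by entry address.
struct SketchReport {
    functions: BTreeMap<u64, SketchFunction>,
}

impl SketchReport {
    /// All (caller, callee) pairs, i.e. the call graph.
    fn call_edges(&self) -> Vec<(u64, u64)> {
        self.functions
            .iter()
            .flat_map(|(&entry, f)| f.calls.iter().map(move |&callee| (entry, callee)))
            .collect()
    }

    /// All (from_block, to_block) pairs of one function, i.e. its CFG.
    fn block_edges(&self, entry: u64) -> Vec<(u64, u64)> {
        match self.functions.get(&entry) {
            Some(f) => f
                .blocks
                .iter()
                .flat_map(|(&start, b)| b.successors.iter().map(move |&to| (start, to)))
                .collect(),
            None => Vec::new(),
        }
    }

    /// Total number of instructions across all functions.
    fn instruction_count(&self) -> usize {
        self.functions
            .values()
            .flat_map(|f| f.blocks.values())
            .map(|b| b.instructions.len())
            .sum()
    }
}
```

Keying both maps by address keeps iteration in address order, which makes the edge lists deterministic.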
smda-rs is a Rust port of danielplohmann/smda (Python). It powers capa-rs, the Rust port of Mandiant's capability extractor.
Features
- Input formats: PE (32 / 64-bit), ELF (32 / 64-bit).
- Function discovery: prologue scan (MSVC + GCC / clang endbr64 family, 0.4.1+), call-target propagation, PE exception-handler (.pdata) seeding, PE export-table $
- Per-function output: basic blocks, in / out references, API calls (ApiScout, with embedded Win7 + WinXP DBs), stack-string refs (0.4.1+), block-to-block edges, `is_exp$
- Report-level: oep (0.4.1+), find_function_by_offset / find_block_by_offset lookups (0.4.1+), per-disassembly timeout via parse_with_timeout (0.4.1+).
- Architecture: x86 / x86_64.
- Zero-copy disassembly. BinaryInfo<'a> borrows the input bytes directly. No mapped-image allocation, no per-instruction byte clone, no DisassemblyReport.buffer$
- Modern Linux ELF coverage: added GCC / clang endbr64 (F3 0F 1E FA) plus the extended GCC AMD64 prologue family (48 89 5C 24 ??, 48 83 EC ??, 41 57 41 56). On CET-enabled binaries (most modern distros) function discovery improves dramatically: one test ELF went from 3280 → 10106 functions. MSVC samples are unchanged.
- Linux exit-syscall recognition: mov eax, 60; syscall (and exit_group / int 0x80 equivalents) now end the containing function correctly.
- PE exports as candidate seeds: the export RVA list, previously only surfaced in the public report, now seeds the function-candidate scanner. Free coverage win on stripped DLLs.
- New report fields: report.oep (original entry point VA), function.is_exported (PE only), function.stringrefs (VAs of stack-string writes; wires up the existing Instruction::get_printable_len).
- New lookups: report.find_function_by_offset(addr) / find_block_by_offset(addr).
- Timeout support: Disassembler::parse_with_timeout(..., Duration) + new Error::AnalysisTimeout for batch processors of untrusted samples.
- Section-table abstraction. Byte access goes through binary_info.bytes_at(va, len) -> Result<&[u8]>, which looks up the VA in a small per-binary SectionMap table and returns a borrowed slice into the input. Replaces the old contiguous mapped image.
- Instruction slimmed down. The 0.3.x per-instruction mnemonic: String, operands: Option<String>, and bytes: String (hex) fields are gone. Use the typed iced accessors (mnemonic_enum(), op_kind(), flow_control(), …) for hot paths, or format_mnemonic() / format_operands() / bytes_in(&binary_info) for on-demand formatting.
- Decoder still iced-x86 (no C/C++ build dep, ~2–3× faster than capstone).
- Same security guards. All the checked-arithmetic, allocation caps, and bounds checks added in 0.3.0 are preserved: the pe::map_binary and elf::map_binary rewrites kept every defensive check, just changed the return type from Vec<u8> to Vec<SectionMap>.
- Rust 2024 edition, MSRV 1.95.
- Same dependencies (iced-x86 1, goblin 0.10, thiserror 2, itertools 0.14, hex 0.4, regex 1, sha2 0.10, serde 1, maplit 1).
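To make the section-table and zero-copy items above more concrete, here is a minimal, std-only sketch of how a VA-to-slice lookup with checked arithmetic might work. It is not the crate's implementation: `Image`, `AccessError`, and the `SectionMap` field names are assumptions made for illustration, and only the general shape of `bytes_at(va, len)` is taken from the list above.

```rust
// Hypothetical sketch: `Image`, `AccessError`, and the `SectionMap` fields are
// illustrative assumptions, not smda's actual definitions.

/// One mapped region: where it sits in virtual memory and in the raw input.
#[derive(Debug, Clone, Copy)]
struct SectionMap {
    va: u64,            // virtual address of the section's first byte
    file_offset: usize, // offset of that byte inside the raw input
    size: usize,        // number of bytes backed by the input
}

#[derive(Debug, PartialEq)]
enum AccessError {
    Unmapped,    // no section contains the requested VA
    OutOfBounds, // the range leaves the section or the input buffer
}

/// Borrows the raw input; no contiguous mapped image is allocated.
struct Image<'a> {
    raw: &'a [u8],
    sections: Vec<SectionMap>,
}

impl<'a> Image<'a> {
    /// Returns a slice borrowed from the input for `len` bytes starting at `va`.
    fn bytes_at(&self, va: u64, len: usize) -> Result<&'a [u8], AccessError> {
        // Find the section whose VA range contains `va`.
        let sec = self
            .sections
            .iter()
            .find(|s| va >= s.va && va - s.va < s.size as u64)
            .ok_or(AccessError::Unmapped)?;

        // `delta` is smaller than `sec.size`, so it fits in usize.
        let delta = (va - sec.va) as usize;

        // The requested range must stay inside the section (checked, no wraparound).
        let end_in_section = delta.checked_add(len).ok_or(AccessError::OutOfBounds)?;
        if end_in_section > sec.size {
            return Err(AccessError::OutOfBounds);
        }

        // Translate to file offsets, then let `get` enforce the input bounds.
        let start = sec.file_offset.checked_add(delta).ok_or(AccessError::OutOfBounds)?;
        let end = start.checked_add(len).ok_or(AccessError::OutOfBounds)?;
        self.raw.get(start..end).ok_or(AccessError::OutOfBounds)
    }
}
```

Because the returned slice carries the input's lifetime `'a`, callers can hold on to it without copying, and a malformed section header can at worst produce an error, never an out-of-bounds read.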
Quick start
Add to your Cargo.toml:
[dependencies]
smda = "0.4"
Then disassemble a file:
use smda::Disassembler;
Typed iced accessors
Each Instruction carries the fully-decoded iced_x86::Instruction (40 bytes, Copy) and exposes typed accessors. New code should prefer these over the on-demand string formatters: they need no allocation and no string parsing.
use Instruction;
use BinaryInfo;
use ;
Requirements
- Rust 1.95 or newer (2024 edition).
- No C/C++ toolchain required — pure Rust.
Why a Rust port?
smda-rs exists to give capa-rs and other Rust-side static-analysis tools a fast, dependency-light recursive disassembler without pulling in capstone, vivisect, or a Python runtime.
Used by
- capa-rs — static capability extractor for PE / ELF / shellcode / .NET binaries.
License
Licensed under the MIT License.
Acknowledgements
- danielplohmann/smda — original Python implementation by Daniel Plohmann and Steffen Enders.
- iced-x86 — the Rust decoder powering the disassembler backend.