smda
A minimalist recursive x86 / x64 disassembler library, optimized for accurate Control Flow Graph (CFG) recovery from PE / ELF binaries and arbitrary memory dumps.
The output is a collection of functions, basic blocks, and instructions with their respective edges (block-to-block, function-to-function). Optionally, references to the Windows API can be inferred via the ApiScout method.
smda-rs is a Rust port of danielplohmann/smda (Python). It powers capa-rs, the Rust port of Mandiant's capability extractor.
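To make the output shape described above concrete, here is a minimal, std-only sketch of how functions, basic blocks, instructions, and the two kinds of edges might be modeled. The names `Insn`, `Function`, `block_edges`, `calls`, and `sample_function` are hypothetical and chosen for illustration; they are not the crate's actual types.

```rust
use std::collections::BTreeMap;

/// Hypothetical instruction record: address and encoded length only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Insn {
    addr: u64,
    len: u8,
}

/// Hypothetical function record (not the crate's real type).
#[derive(Debug, Default)]
struct Function {
    /// Basic blocks keyed by start address.
    blocks: BTreeMap<u64, Vec<Insn>>,
    /// Block-to-block edges: block start -> successor block starts.
    block_edges: BTreeMap<u64, Vec<u64>>,
    /// Function-to-function edges: addresses this function calls.
    calls: Vec<u64>,
}

impl Function {
    /// Successor blocks of the block starting at `block` (empty if none).
    fn successors(&self, block: u64) -> &[u64] {
        self.block_edges
            .get(&block)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }

    /// Total number of instructions across all blocks.
    fn instruction_count(&self) -> usize {
        self.blocks.values().map(|b| b.len()).sum()
    }
}

/// Build a tiny three-block function by hand:
///   0x1000: push ebp; jz 0x1008   -> falls through to 0x1003 or jumps to 0x1008
///   0x1003: call 0x2000           -> falls through to 0x1008
///   0x1008: ret
fn sample_function() -> Function {
    let mut f = Function::default();
    f.blocks.insert(
        0x1000,
        vec![Insn { addr: 0x1000, len: 1 }, Insn { addr: 0x1001, len: 2 }],
    );
    f.blocks.insert(0x1003, vec![Insn { addr: 0x1003, len: 5 }]);
    f.blocks.insert(0x1008, vec![Insn { addr: 0x1008, len: 1 }]);
    f.block_edges.insert(0x1000, vec![0x1003, 0x1008]);
    f.block_edges.insert(0x1003, vec![0x1008]);
    f.calls.push(0x2000);
    f
}
```

Calling `sample_function().successors(0x1000)` yields the fall-through and branch-target blocks; a block that ends in `ret` has no successors.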
What changed in 0.4.0
0.4.0 lands the full zero-copy refactor that 0.3.0 deferred. Combined with the iced-x86 decoder swap and the security hardening that landed in 0.3.0, this completes the full Path X scope in a single major release.
- Zero-copy disassembly. `BinaryInfo<'a>` borrows the input bytes directly. No mapped-image allocation, no per-instruction byte clone, no `DisassemblyReport.buffer` clone. For a 10 MB binary with ~100k instructions, peak memory dropped from ~3× input size to ~1.05×.
- Section-table abstraction. Byte access goes through `binary_info.bytes_at(va, len) -> Result<&[u8]>`, which looks up the VA in a small per-binary `SectionMap` table and returns a borrowed slice into the input. Replaces the old contiguous mapped image.
- `Instruction` slimmed down. The 0.3.x per-instruction `mnemonic: String`, `operands: Option<String>`, and `bytes: String` (hex) fields are gone. Use the typed iced accessors (`mnemonic_enum()`, `op_kind()`, `flow_control()`, …) for hot paths, or `format_mnemonic()` / `format_operands()` / `bytes_in(&binary_info)` for on-demand formatting.
- Decoder still iced-x86 (no C/C++ build dep, ~2–3× faster than capstone).
- Same security guards. All the checked-arithmetic, allocation caps, and bounds checks added in 0.3.0 are preserved: the `pe::map_binary` and `elf::map_binary` rewrites kept every defensive check and only changed the return type from `Vec<u8>` to `Vec<SectionMap>`.
- Rust 2024 edition, MSRV 1.95.
- Same dependencies (`iced-x86 1`, `goblin 0.10`, `thiserror 2`, `itertools 0.14`, `hex 0.4`, `regex 1`, `sha2 0.10`, `serde 1`, `maplit 1`).
See CHANGELOG.md for the full list of breaking changes and the migration guide. 0.3.0 is superseded by 0.4.0; consumers should migrate directly from 0.2.x to 0.4.0.
Quick start
Add to your Cargo.toml:
[dependencies]
smda = "0.4"
Then disassemble a file:
use Disassembler;
Typed iced accessors
Each Instruction carries the fully-decoded iced_x86::Instruction (16 bytes, Copy) and exposes typed accessors. New code should prefer these over the on-demand string formatters — no allocation, no string parsing.
use Instruction;
use BinaryInfo;
use ;
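As a rough illustration of why typed accessors are preferable to re-parsing formatted text, the std-only sketch below compares an enum-based block-terminator check with a string-based one. `Flow` is a hypothetical, trimmed-down stand-in for a flow-control classification, and `ends_block` / `ends_block_by_text` are illustrative helpers; none of them is part of smda or iced-x86.

```rust
/// Hypothetical, trimmed-down flow-control classification (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flow {
    Next,
    UnconditionalBranch,
    ConditionalBranch,
    Call,
    Return,
}

/// Typed check: a plain enum match, no allocation and no string parsing.
fn ends_block(flow: Flow) -> bool {
    matches!(
        flow,
        Flow::UnconditionalBranch | Flow::ConditionalBranch | Flow::Return
    )
}

/// String-based check: needs formatted text first and has to guess at
/// spellings. This naive version treats "ret" and anything starting with
/// 'j' as a terminator, so it misses branch mnemonics such as "loop".
fn ends_block_by_text(mnemonic: &str) -> bool {
    mnemonic == "ret" || mnemonic.starts_with('j')
}
```

The typed version cannot drift out of sync with mnemonic spellings, whereas the text version silently misclassifies any branch whose name it did not anticipate.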
Zero-copy
0.4.0 is zero-copy in the strict sense: no copies of the input bytes are made between Disassembler::parse(&buf, …) and the returned DisassemblyReport. The only allocations during disassembly are the iced instruction Vec, the section-map table (tiny), the function CFG metadata, and on-demand formatted strings.
- `BinaryInfo<'a>` borrows the input via `raw_data: &'a [u8]`.
- The PE / ELF mapped image is replaced by `section_maps: Vec<SectionMap>`, a small descriptor table (typically < 10 entries) that maps virtual-address ranges to file-offset ranges.
- Byte access goes through `bytes_at(va, len) -> Result<&[u8]>`, which does a section lookup and slices into the borrowed input. Per-byte cost is one section-table scan (linear, < 10 entries, cache-friendly).
- `Instruction` is `{ offset, length, iced }`: no per-instruction `String` or `Vec<u8>` storage. `DecodedInsn` is `Copy` (16 bytes).
- `DisassemblyReport<'a>` carries the `BinaryInfo<'a>` for downstream `bytes_at` lookups.
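The section-table lookup described above can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the crate's implementation: `BinaryView`, the `SectionMap` field names (`va_start`, `file_offset`, `size`), and the `String` error type are all hypothetical, and the sketch ignores sections whose virtual size exceeds their on-disk size.

```rust
/// Hypothetical descriptor mapping a virtual-address range to a file-offset range.
#[derive(Debug, Clone, Copy)]
struct SectionMap {
    va_start: u64,
    file_offset: usize,
    size: usize,
}

/// Hypothetical stand-in for a view that borrows the input bytes.
struct BinaryView<'a> {
    raw_data: &'a [u8],
    section_maps: Vec<SectionMap>,
}

impl<'a> BinaryView<'a> {
    /// Return `len` bytes at virtual address `va` as a slice borrowed from
    /// the original input. No bytes are copied.
    fn bytes_at(&self, va: u64, len: usize) -> Result<&'a [u8], String> {
        let data: &'a [u8] = self.raw_data;
        // Linear scan: the table is expected to hold only a handful of entries.
        for s in &self.section_maps {
            // Checked arithmetic so a hostile header cannot overflow the range.
            let Some(va_end) = s.va_start.checked_add(s.size as u64) else {
                continue;
            };
            if va < s.va_start || va >= va_end {
                continue;
            }
            let delta = (va - s.va_start) as usize; // delta < s.size here
            if len > s.size - delta {
                return Err(format!("read of {len} bytes at {va:#x} crosses the section end"));
            }
            let start = s.file_offset.checked_add(delta).ok_or("file offset overflow")?;
            let end = start.checked_add(len).ok_or("file offset overflow")?;
            // `get` bounds-checks against the actual input length.
            return data
                .get(start..end)
                .ok_or_else(|| format!("section at {:#x} extends past the input", s.va_start));
        }
        Err(format!("{va:#x} is not mapped by any section"))
    }
}
```

Because the returned slice carries the input's lifetime `'a` rather than the lifetime of `&self`, callers can keep the bytes after the view itself is dropped, as long as the input buffer is still alive.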
Feature coverage
- Input formats: PE (32 / 64-bit), ELF (32 / 64-bit), raw memory dumps with optional base address.
- Function discovery: prologue scan, call-target propagation, indirect-call analysis, jump-table recovery, tail-call analysis, alignment / NOP-gap walking, mnemonic TF-IDF confidence scoring.
- Per-function output: basic blocks, in / out references, API calls (ApiScout-style), block-to-block edges.
- Architecture: x86 / x86_64.
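Two of the function-discovery heuristics listed above, prologue scanning and call-target computation, can be illustrated with a small std-only sketch. It is deliberately simplified and is not the crate's algorithm: the pattern table covers only the classic frame-setup sequences, and `PROLOGUES`, `scan_prologues`, and `rel32_call_target` are hypothetical names.

```rust
/// Byte patterns for classic frame-setup prologues (a small illustrative subset).
const PROLOGUES: &[&[u8]] = &[
    &[0x55, 0x8B, 0xEC],       // push ebp; mov ebp, esp (8B /r encoding)
    &[0x55, 0x89, 0xE5],       // push ebp; mov ebp, esp (89 /r encoding)
    &[0x55, 0x48, 0x8B, 0xEC], // push rbp; mov rbp, rsp (8B /r encoding)
    &[0x55, 0x48, 0x89, 0xE5], // push rbp; mov rbp, rsp (89 /r encoding)
];

/// Return the virtual address of every offset in `code` where a known
/// prologue pattern starts. `base` is the address of `code[0]`.
fn scan_prologues(code: &[u8], base: u64) -> Vec<u64> {
    let mut hits = Vec::new();
    for offset in 0..code.len() {
        if PROLOGUES.iter().any(|p| code[offset..].starts_with(*p)) {
            hits.push(base + offset as u64);
        }
    }
    hits
}

/// Target of a 5-byte `call rel32` (opcode E8) located at `insn_addr`:
/// the displacement is relative to the address of the next instruction.
fn rel32_call_target(insn_addr: u64, rel32: i32) -> u64 {
    insn_addr
        .wrapping_add(5)
        .wrapping_add(rel32 as i64 as u64)
}
```

A pattern hit is only a candidate: byte sequences like these also occur inside data and in the middle of other instructions, which is why candidates are typically confirmed by disassembling from them and scoring the result.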
Not currently implemented (vs. upstream Python smda; planned for 0.3.1):
- 64-bit GCC `endbr64`-style prologue scans.
- Exception-handler-based candidate seeding (Python `IntelInstructionEscaper` §2.4.7).
- Delphi VMT scanning.
Requirements
- Rust 1.95 or newer (2024 edition).
- No C/C++ toolchain required — pure Rust.
Compatibility note (for capa-rs users)
The Instruction::mnemonic and Instruction::operands strings are formatted through a configured iced IntelFormatter (capstone_compat_formatter) that matches capstone's output byte-for-byte (lowercase, 0x prefix, spaces around memory +, full memory-size annotations). Existing regex-based capa rules continue to match. New consumers should prefer the typed iced accessors instead of re-parsing strings.
Why a Rust port?
smda-rs exists to give capa-rs and other Rust-side static-analysis tools a fast, dependency-light recursive disassembler without pulling in capstone, vivisect, or a Python runtime.
Used by
- capa-rs — static capability extractor for PE / ELF / shellcode / .NET binaries.
License
Licensed under the MIT License.
Acknowledgements
- danielplohmann/smda — original Python implementation by Daniel Plohmann and Steffen Enders.
- iced-x86 — the Rust decoder powering the disassembler backend.