smda 0.3.0

Recursive x86/x64 disassembler library for control-flow recovery from memory dumps. Iced-x86-backed; zero-copy.

smda

A minimalist recursive x86 / x64 disassembler library, optimized for accurate Control Flow Graph (CFG) recovery from PE / ELF binaries and arbitrary memory dumps.

The output is a collection of functions, basic blocks, and instructions with their respective edges (block-to-block, function-to-function). Optionally, references to the Windows API can be inferred via the ApiScout method.
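As a rough illustration of what "recursive" means here, the sketch below follows control flow from an entry point over a toy instruction stream and collects basic-block start positions. The `Flow` enum and `block_starts` function are hypothetical names invented for this example; they are not part of the smda API, and real x86 recovery has to deal with variable-length encodings, indirect branches, and calls.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Toy control-flow classes (hypothetical, not smda's or iced-x86's types).
#[derive(Clone, Copy)]
enum Flow {
    Next,            // falls through to the next instruction
    Jump(usize),     // unconditional jump to an instruction index
    CondJump(usize), // conditional jump: target plus fall-through
    Return,          // ends the current path
}

/// Follow control flow from `entry` and return every reachable block start.
fn block_starts(program: &[Flow], entry: usize) -> BTreeSet<usize> {
    let mut starts = BTreeSet::new();
    let mut visited = vec![false; program.len()];
    let mut work = VecDeque::from([entry]);

    while let Some(start) = work.pop_front() {
        // Skip out-of-range targets and block starts we already queued once.
        if start >= program.len() || !starts.insert(start) {
            continue;
        }
        let mut pc = start;
        // Walk linearly until the block ends or we reach decoded code.
        while pc < program.len() && !visited[pc] {
            visited[pc] = true;
            match program[pc] {
                Flow::Next => pc += 1,
                Flow::Jump(target) => {
                    work.push_back(target);
                    break;
                }
                Flow::CondJump(target) => {
                    work.push_back(target);
                    work.push_back(pc + 1);
                    break;
                }
                Flow::Return => break,
            }
        }
    }
    starts
}
```

Instructions that are never reached from the entry point are never visited. That is the main difference from a linear sweep, and it is why recursive traversal tends to suit buffers that mix code and data.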

smda-rs is a Rust port of danielplohmann/smda (Python). It powers capa-rs, the Rust port of Mandiant's capability extractor.

What changed in 0.3.0

This is a substantial overhaul of the disassembly backend:

  • Decoder swap: capstone → iced-x86. The new decoder is pure Rust, roughly 2–3× faster than capstone, and gives every consumer typed Mnemonic / OpKind / Register / FlowControl enums without re-parsing strings. The old text-based output is preserved byte-for-byte via capstone_compat_formatter, so capa-rs and any other downstream consumers that regex-match operand strings keep working unchanged.
  • No more C/C++ build dependency (capstone-sys is gone). Builds on Linux / macOS / Windows with stock rustup.
  • Rust 2024 edition, MSRV 1.95.
  • Lighter SHA-256. Switched from ring to sha2 for the buffer hash — drops a large C dependency for a single hash.
  • All dependencies on latest major versions (iced-x86 1, goblin 0.10, thiserror 2, itertools 0.14, hex 0.4, regex 1, serde 1).

See CHANGELOG.md for the full list of breaking changes.

Quick start

Add to your Cargo.toml:

[dependencies]
smda = "0.3"

Then disassemble a file:

use smda::Disassembler;

fn main() -> smda::Result<()> {
    // disassemble_file(path, high_accuracy, resolve_tailcalls, optional_buffer)
    let report = Disassembler::disassemble_file(
        "Sample.exe",
        false,  // high-accuracy heuristics (slower)
        false,  // tail-call resolution
        None,
    )?;

    println!("format       : {:?}", report.format);
    println!("architecture : {:?}", report.architecture);
    println!("bitness      : {}", report.bitness);
    println!("base addr    : 0x{:x}", report.base_addr);
    println!("functions    : {}", report.functions.len());

    for (addr, func) in report.get_functions()?.iter().take(5) {
        let blocks = func.get_blocks()?;
        let insns  = func.get_num_instructions()?;
        println!("  0x{:08x}  {} blocks, {} insns", addr, blocks.len(), insns);
    }
    Ok(())
}

Typed iced accessors

Every Instruction exposes both the legacy capstone-shaped fields and the fully-decoded iced instruction, so you can pick whichever interface is more ergonomic.

use smda::function::Instruction;
use iced_x86::{FlowControl, Mnemonic, OpKind};

fn classify(ins: &Instruction) {
    // Legacy fields (preserved for backward compat with capa-rs)
    println!(
        "{:08x}  {:7} {}",
        ins.offset,
        ins.mnemonic,
        ins.operands.as_deref().unwrap_or(""),
    );

    // New typed accessors — no string parsing
    if ins.is_call() {
        println!("  -> call");
    }
    if ins.is_conditional_jmp() {
        println!("  -> Jcc to 0x{:x}", ins.near_branch_target());
    }
    if ins.mnemonic_enum() == Mnemonic::Xor
        && ins.op_count() == 2
        && ins.op_kind(0) == OpKind::Register
        && ins.op_kind(1) == OpKind::Register
        && ins.op_register(0) == ins.op_register(1)
    {
        println!("  -> register clear ({:?})", ins.op_register(0));
    }
    if ins.flow_control() == FlowControl::Return {
        println!("  -> return");
    }
}

Feature coverage

  • Input formats: PE (32 / 64-bit), ELF (32 / 64-bit), raw memory dumps with optional base address.
  • Function discovery: prologue scan, call-target propagation, indirect-call analysis, jump-table recovery, tail-call analysis, alignment / NOP-gap walking, mnemonic TF-IDF confidence scoring.
  • Per-function output: basic blocks, in / out references, API calls (ApiScout-style), block-to-block edges.
  • Architecture: x86 / x86_64.
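To give a feel for the simplest discovery heuristic in the list above, here is a minimal prologue-scan sketch. It only looks for the two common encodings of the 32-bit frame setup `push ebp; mov ebp, esp`. The function name `scan_prologues` and its signature are assumptions made for this example; the crate's own scanner is not shown here and presumably uses a larger pattern set together with the other heuristics listed.

```rust
/// Return candidate function start addresses in `buf`, assuming the buffer
/// is mapped at `base_addr`. Hypothetical helper, not part of the smda API.
fn scan_prologues(buf: &[u8], base_addr: u64) -> Vec<u64> {
    // `push ebp; mov ebp, esp` in its two common encodings:
    // 55 8B EC and 55 89 E5.
    const PATTERNS: [[u8; 3]; 2] = [[0x55, 0x8B, 0xEC], [0x55, 0x89, 0xE5]];

    let mut candidates = Vec::new();
    for (offset, window) in buf.windows(3).enumerate() {
        if PATTERNS.iter().any(|p| window == &p[..]) {
            candidates.push(base_addr + offset as u64);
        }
    }
    candidates
}
```

A byte-pattern hit is only a candidate: the same bytes can occur inside data or in the middle of a longer instruction, which is why a scan like this is normally combined with call-target propagation and some form of confidence scoring.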

Not currently implemented (vs. upstream Python smda; planned for 0.3.1):

  • 64-bit GCC endbr64-style prologue scans.
  • Exception-handler-based candidate seeding (Python IntelInstructionEscaper §2.4.7).
  • Delphi VMT scanning.

Requirements

  • Rust 1.95 or newer (2024 edition).
  • No C/C++ toolchain required — pure Rust.

Compatibility note (for capa-rs users)

The Instruction::mnemonic and Instruction::operands strings are formatted through a configured iced IntelFormatter (capstone_compat_formatter) that matches capstone's output byte-for-byte (lowercase, 0x prefix, spaces around memory +, full memory-size annotations). Existing regex-based capa rules continue to match. New consumers should prefer the typed iced accessors instead of re-parsing strings.
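The conventions named above (lowercase, 0x prefix, spaces around the memory +, explicit memory-size annotation) can be illustrated with a small stand-alone sketch. `format_mem_operand` is a hypothetical helper written for this example; it is not the crate's capstone_compat_formatter and does not attempt to reproduce every capstone formatting rule.

```rust
/// Format a simple `[base + disp]` memory operand in a capstone-like style.
/// Returns None for access sizes this sketch does not model.
fn format_mem_operand(size_bytes: u8, base: &str, disp: i64) -> Option<String> {
    // Memory-size annotation, e.g. "dword ptr".
    let size = match size_bytes {
        1 => "byte",
        2 => "word",
        4 => "dword",
        8 => "qword",
        _ => return None,
    };
    // Register names are emitted in lowercase.
    let base = base.to_ascii_lowercase();
    // Hex displacement with a 0x prefix and spaces around the sign.
    let body = if disp == 0 {
        base
    } else if disp > 0 {
        format!("{base} + 0x{disp:x}")
    } else {
        format!("{base} - 0x{:x}", disp.unsigned_abs())
    };
    Some(format!("{size} ptr [{body}]"))
}
```

Under these assumptions, `format_mem_operand(4, "EBP", 0x10)` yields `dword ptr [ebp + 0x10]`. A rule that matches on that exact spelling depends on every one of these details, which is why preserving the legacy string format matters for existing regex-based consumers.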

Why a Rust port?

smda-rs exists to give capa-rs and other Rust-side static-analysis tools a fast, dependency-light recursive disassembler without pulling in capstone, vivisect, or a Python runtime.

Used by

  • capa-rs — static capability extractor for PE / ELF / shellcode / .NET binaries.

License

Licensed under the MIT License.

Acknowledgements

  • danielplohmann/smda — original Python implementation by Daniel Plohmann and Steffen Enders.
  • iced-x86 — the Rust decoder powering the disassembler backend.