pub enum Stmt {
    Asm {
        text: String,
        bytes: Vec<u8>,
    },
    Comment(String),
    Return {
        value: u64,
        bytes: Vec<u8>,
    },
    Prologue {
        kind: String,
        params: Option<PrologueParams>,
        bytes: Vec<u8>,
    },
    Epilogue {
        kind: String,
        params: Option<EpilogueParams>,
        bytes: Vec<u8>,
    },
    Save {
        reg: String,
        bytes: Vec<u8>,
    },
    Restore {
        reg: String,
        bytes: Vec<u8>,
    },
    IfReturn {
        cond_text: String,
        value_text: String,
        target_addr: u64,
        cmp_bytes: Vec<u8>,
        cond_code: u8,
        wide: bool,
    },
    Label {
        addr: u64,
    },
    Goto {
        target_addr: u64,
        wide: bool,
    },
    IfGoto {
        cond_text: String,
        target_addr: u64,
        cmp_bytes: Vec<u8>,
        cond_code: u8,
        wide: bool,
    },
    Switch {
        selector: String,
        cases: Vec<u64>,
        default_addr: u64,
        dispatch: String,
        table_va: u64,
    },
    SehInstall {
        bytes: Vec<u8>,
    },
    SehRestore {
        bytes: Vec<u8>,
    },
    ReturnExpr {
        text: String,
        bytes: Vec<u8>,
    },
    ArgSpill {
        arg_index: u32,
        bytes: Vec<u8>,
    },
    Call {
        name: String,
        args: Vec<String>,
        bytes: Vec<u8>,
        direct_target: Option<u64>,
    },
    IfBranch {
        cond_text: String,
        cond_bytes: Vec<u8>,
        attrs: Vec<Attribute>,
        pre_body: Vec<Stmt>,
        then_body: Vec<Stmt>,
        else_body: Option<Vec<Stmt>>,
    },
    LocalSet {
        slot: i64,
        value: i64,
        bytes: Vec<u8>,
    },
    LocalArith {
        slot: i64,
        op: String,
        value: i64,
        bytes: Vec<u8>,
    },
    LocalCompound {
        dst: i64,
        op: String,
        src: i64,
        bytes: Vec<u8>,
    },
    Move {
        dst: String,
        src: String,
        bytes: Vec<u8>,
    },
    Inc16 {
        lo: String,
        hi: String,
        bytes: Vec<u8>,
    },
    Loop {
        cond_text: String,
        entry_jmp_bytes: Option<Vec<u8>>,
        tail_bytes: Vec<u8>,
        body: Vec<Stmt>,
    },
    IfBlock {
        cond_text: String,
        cond_bytes: Vec<u8>,
        then_body: Vec<Stmt>,
        then_tail_jmp: Vec<u8>,
        else_body: Vec<Stmt>,
    },
    WhileBlock {
        cond_text: String,
        entry_bytes: Vec<u8>,
        tail_bytes: Vec<u8>,
        body: Vec<Stmt>,
    },
    RegArith {
        dst: String,
        op: String,
        src: String,
        bytes: Vec<u8>,
    },
}
A statement inside a function body.
Variants
Asm
@asm("text") or @asm("text", [bytes]) — an instruction.
text is the human-readable assembly. bytes pins the exact
encoded bytes; when non-empty, they are the ground truth for
recompilation, and the assembler’s job is to verify that
assembling text produces matching bytes (with directive-pinned
encoding choices, once those land).
bytes may be empty: a future assembler will then derive them
from the text alone. v0 always populates bytes because we
don’t yet ship a text assembler that produces byte-identical
output for non-canonical encodings.
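A minimal sketch of the lowering rule above. Since v0 ships no text assembler, the assembler here is a closure parameter (`assemble`, hypothetical), and `AsmLowering` / `lower_asm` are illustrative names rather than this crate's API. Pinned bytes always win; when an assembler can handle the text, it must reproduce them exactly.

```rust
/// Outcome of lowering one `@asm` statement (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum AsmLowering {
    /// Emit these bytes.
    Emit(Vec<u8>),
    /// The assembler's output disagreed with the pinned bytes.
    Mismatch { pinned: Vec<u8>, assembled: Vec<u8> },
    /// No pinned bytes, and the assembler could not encode the text.
    Unresolvable,
}

/// Lowers `@asm(text, bytes)`. `assemble` stands in for the future text
/// assembler; it returns `None` when it cannot encode `text`.
pub fn lower_asm<F>(text: &str, bytes: &[u8], assemble: F) -> AsmLowering
where
    F: Fn(&str) -> Option<Vec<u8>>,
{
    match (bytes.is_empty(), assemble(text)) {
        // Pinned bytes are ground truth: report a disagreement when we can check.
        (false, Some(assembled)) if assembled != bytes => AsmLowering::Mismatch {
            pinned: bytes.to_vec(),
            assembled,
        },
        // Pinned bytes, either verified or unverifiable: emit them as-is.
        (false, _) => AsmLowering::Emit(bytes.to_vec()),
        // No pinned bytes: derive them from the text alone.
        (true, Some(assembled)) => AsmLowering::Emit(assembled),
        (true, None) => AsmLowering::Unresolvable,
    }
}
```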
Comment(String)
// … line. Used by the decompiler to surface block boundaries
and direct-branch targets without committing to a structural
syntax for them yet.
Return
@return(value, [bytes]) — a recognised return-with-literal
pattern at the tail of a function. Lifted from sequences like
mov eax, N; [pop rbp;] ret or xor eax, eax; [pop rbp;] ret.
bytes carries every encoded byte of those instructions
concatenated, so the lower path just emits the bytes.
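A minimal recogniser for the two x86 shapes named above, returning the literal that ends up in eax. It only knows `mov eax, imm32` (B8 id), both encodings of `xor eax, eax` (31 C0 / 33 C0), an optional `pop rbp` (5D), and `ret` (C3). The function name is hypothetical, and the real decompiler likely accepts more variants.

```rust
/// Recognises `mov eax, imm32; [pop rbp;] ret` and
/// `xor eax, eax; [pop rbp;] ret`, returning the value left in eax.
pub fn match_return_literal(bytes: &[u8]) -> Option<u64> {
    // Split off the value-producing instruction.
    let (value, rest) = match bytes {
        // mov eax, imm32: B8 id (zero-extends into rax)
        [0xB8, a, b, c, d, rest @ ..] => (u32::from_le_bytes([*a, *b, *c, *d]) as u64, rest),
        // xor eax, eax: accept both the 31 /r and 33 /r encodings
        [0x31 | 0x33, 0xC0, rest @ ..] => (0, rest),
        _ => return None,
    };
    // An optional `pop rbp` (5D), then `ret` (C3), and nothing after it.
    match rest {
        [0xC3] | [0x5D, 0xC3] => Some(value),
        _ => None,
    }
}
```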
Prologue
@prologue("kind", [bytes]) — a recognised function prologue,
typically endbr64; push rbp; mov rbp, rsp; sub rsp, IMM or
a close variant. kind is a descriptive label
("std" / "std-no-cf" / "std-noframe"); bytes carries
every encoded byte for round-trip.
params carries the structured breakdown (saves list,
frame flag, sub_esp value, cf_protect) when the prologue’s
bytes round-trip through the canonical codec. This lets the
emitter render @prologue(saves: [ebx, esi, edi], frame, sub: 0x40) without the byte list. It is None for handwritten
or non-canonical prologues, where bytes are the source of
truth.
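A rough sketch of what a canonical codec might do with the structured fields, encoding an x86-32 prologue from them. The real `PrologueParams` fields are not shown on this page, so `PrologueSketch` is a hypothetical stand-in. The instruction order (push ebp; mov ebp, esp; sub esp, N; then the saves) is an assumption modelled on common MSVC output, as is the `8B EC` form of `mov ebp, esp`. GCC typically emits `89 E5` instead.

```rust
/// Hypothetical stand-in for `PrologueParams`.
pub struct PrologueSketch<'a> {
    pub cf_protect: bool,
    pub frame: bool,
    pub sub_esp: u32,
    pub saves: &'a [&'a str],
}

/// `push r32` is the one-byte opcode 0x50 + register number.
fn push_r32(reg: &str) -> Option<u8> {
    let n = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
        .iter()
        .position(|r| *r == reg)?;
    Some(0x50 + n as u8)
}

/// Encodes [endbr32;] [push ebp; mov ebp, esp;] [sub esp, N;] push <saves>...
pub fn encode_prologue(p: &PrologueSketch) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    if p.cf_protect {
        out.extend_from_slice(&[0xF3, 0x0F, 0x1E, 0xFB]); // endbr32
    }
    if p.frame {
        out.extend_from_slice(&[0x55, 0x8B, 0xEC]); // push ebp; mov ebp, esp
    }
    match p.sub_esp {
        0 => {}
        // sub esp, imm8 (83 /5 ib); imm8 is sign-extended, so stay <= 0x7F
        n @ 1..=0x7F => out.extend_from_slice(&[0x83, 0xEC, n as u8]),
        n => {
            out.extend_from_slice(&[0x81, 0xEC]); // sub esp, imm32 (81 /5 id)
            out.extend_from_slice(&n.to_le_bytes());
        }
    }
    for reg in p.saves {
        out.push(push_r32(reg)?); // unknown register name: give up
    }
    Some(out)
}
```

With `saves: [ebx, esi, edi], frame, sub: 0x40`, this produces the nine bytes `55 8B EC 83 EC 40 53 56 57`.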
Epilogue
@epilogue("kind", [bytes]) — a recognised function epilogue,
typically leave; ret or pop rbp; ret. Used at the tail of
the last block when no Stmt::Return consumed those bytes
(e.g. the return value was computed in an earlier block).
Save
@save("REG", [bytes]) — a mid-function callee-saved register
save. Pairs LIFO with a matching Stmt::Restore elsewhere in
the body; together they bracket a region where the function
borrows an extra register the prologue didn’t reserve. Bytes
are exactly the push REG encoding.
Restore
@restore("REG", [bytes]) — the matching restore for a prior
Stmt::Save. Bytes are exactly the pop REG encoding.
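The LIFO pairing between Save and Restore can be checked with a stack. This is a sketch over a hypothetical simplified statement view (`Op`), not the real `Stmt`. Every restore must pop the most recent unmatched save of the same register, and nothing may remain saved at the end.

```rust
/// Simplified view of the statements that matter for pairing (hypothetical).
pub enum Op<'a> {
    Save(&'a str),
    Restore(&'a str),
    Other,
}

/// True when saves and restores nest LIFO and all saves are restored.
pub fn saves_are_lifo(body: &[Op<'_>]) -> bool {
    let mut stack: Vec<&str> = Vec::new();
    for op in body {
        match op {
            Op::Save(reg) => stack.push(*reg),
            // The restore must match the innermost open save.
            Op::Restore(reg) => {
                if stack.pop() != Some(*reg) {
                    return false;
                }
            }
            Op::Other => {}
        }
    }
    stack.is_empty()
}
```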
IfReturn
@if_return("cond", "value", [bytes]): an early-return
pattern, i.e. a test/cmp + jcc whose taken target is a
return-shaped block elsewhere in the function. The bytes
are the original cmp/test encoding (cmp_bytes); the actual return
happens at the target block (whose bytes remain in place).
Renders as if (cond) return value; to convey the intent,
even though the jcc semantically transfers control to a
shared cleanup tail.
value_text is the literal/expression the target block returns,
when statically known; it is empty when the target’s return value
can’t be folded.
Same shape as IfGoto: the jcc tail re-encodes from
target_addr, cond_code, and wide. target_addr is the
return block’s position, captured at decompile time from the
cmp_bytes length plus the jcc’s rel displacement. cmp_bytes
stays pinned until the text assembler lands.
Label
label_XXXX: — a zero-byte marker for a jump target. The
addr is the run-time virtual address the label represents
(rendered as label_<hex>). Labels carry no bytes; they
occupy a position in the source so a Stmt::Goto or
Stmt::IfGoto elsewhere in the function can point at
them by name. Round-trip neutral.
Goto
goto label_XXXX; (or goto label_XXXX #[wide];) — an
unconditional jmp to a label somewhere in the function
body. No pinned bytes: the lower path picks the encoding
from target_addr, the cursor position, and the wide
flag:
- wide=false and the displacement fits in i8: jmp rel8 (2 bytes).
- otherwise: jmp rel32 (5 bytes).
The wide flag captures encoding choices the compiler
made that don’t follow the “always shortest” rule —
occasional, but real (some MSVC paths emit jmp rel32
even when jmp rel8 would fit). Editing the function so
a label moves auto-promotes wide=false → wide=true
when the displacement no longer fits in i8.
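The encoding choice above, as a sketch. `encode_goto` is a hypothetical name, and `cursor` is the address where the jmp will be placed. x86 jump displacements are measured from the end of the jump instruction, which is why the rel8 and rel32 computations subtract different lengths. Falling through to rel32 when rel8 doesn't fit is the auto-promotion described above.

```rust
/// Encodes `goto label` as an x86 jmp placed at `cursor`.
pub fn encode_goto(cursor: u64, target_addr: u64, wide: bool) -> Vec<u8> {
    // jmp rel8 is 2 bytes, so its displacement is relative to cursor + 2.
    let rel8 = target_addr as i64 - (cursor as i64 + 2);
    if !wide && i8::try_from(rel8).is_ok() {
        return vec![0xEB, rel8 as i8 as u8]; // jmp rel8: EB cb
    }
    // jmp rel32 is 5 bytes: E9 cd, relative to cursor + 5.
    let rel32 = (target_addr as i64 - (cursor as i64 + 5)) as i32;
    let mut out = vec![0xE9];
    out.extend_from_slice(&rel32.to_le_bytes());
    out
}
```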
IfGoto
if (cond) goto label_XXXX; — a conditional jump folded
from cmp/test …; jcc …. The jcc tail is no longer
pinned in source: the lower path re-encodes
jcc rel8/rel32 from target_addr, cond_code, and
wide. cmp_bytes carries the cmp/test prefix (empty
when the source is a bare flag check); it stays pinned
until the text assembler can re-encode it from
cond_text.
Editing a label so that its position changes flows through to
the rebuilt binary. Until the assembler lands, keeping
cmp_bytes and cond_text consistent when editing either is
the user’s job.
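A sketch of the IfGoto lower path. It emits the pinned cmp/test prefix, then re-encodes the jcc from `target_addr`, `cond_code`, and `wide`. This assumes `cond_code` holds the 4-bit x86 condition nibble (for example 0x4 for je/jz, 0x5 for jne/jnz), which this page doesn't state. The function name is hypothetical. The jcc's own address is `cursor` plus the prefix length.

```rust
/// Lowers `if (cond) goto label;` placed at `cursor`.
pub fn lower_if_goto(
    cursor: u64,
    cmp_bytes: &[u8],
    target_addr: u64,
    cond_code: u8,
    wide: bool,
) -> Vec<u8> {
    let mut out = cmp_bytes.to_vec();
    // The jcc starts right after the pinned prefix.
    let jcc_at = cursor as i64 + cmp_bytes.len() as i64;
    let rel8 = target_addr as i64 - (jcc_at + 2);
    if !wide && i8::try_from(rel8).is_ok() {
        // Short form: 70+cc cb (2 bytes)
        out.extend_from_slice(&[0x70 | (cond_code & 0xF), rel8 as i8 as u8]);
    } else {
        // Near form: 0F 80+cc cd (6 bytes)
        let rel32 = (target_addr as i64 - (jcc_at + 6)) as i32;
        out.extend_from_slice(&[0x0F, 0x80 | (cond_code & 0xF)]);
        out.extend_from_slice(&rel32.to_le_bytes());
    }
    out
}
```

For example, with `test eax, eax` (85 C0) as the prefix, moving the label moves only the jcc bytes. The pinned prefix is untouched.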
Switch
switch (selector) #[dispatch="…", table_va=…] { case N: goto … }
— a structured switch whose dispatch bytes are not pinned
to the source. The lower path regenerates cmp REG,MAX; ja DEFAULT; jmp dword ptr [REG*4+TABLE_VA] from the structured
fields, validating that the case/default/selector data
re-encodes to a correct dispatch sequence.
dispatch names the encoding shape (currently only
"msvc-jmp-table" is recognised). table_va is the
absolute address of the jump-table data the indirect jmp
reads — the table contents themselves still ride in a
@raw block under the appropriate data section.
Editing the source is the whole point: adding a case here,
changing default_addr, or renaming the selector all flow
through to the rebuilt binary via the lower-side encoder,
without any pinned bytes to silently invalidate.
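A sketch of regenerating the `msvc-jmp-table` dispatch from the structured fields on x86-32. Several points here are assumptions: the table is dense and indexed from 0 (so MAX = `cases.len() - 1`), the selector is a 32-bit general register, and `table_va` fits in a disp32. The real encoder may also handle a bias subtraction or other shapes. The function names are hypothetical.

```rust
/// x86-32 register number usable as a SIB index (esp cannot be an index).
fn index_reg(name: &str) -> Option<u8> {
    match name {
        "eax" => Some(0),
        "ecx" => Some(1),
        "edx" => Some(2),
        "ebx" => Some(3),
        "ebp" => Some(5),
        "esi" => Some(6),
        "edi" => Some(7),
        _ => None,
    }
}

/// Regenerates `cmp REG, MAX; ja DEFAULT; jmp dword ptr [REG*4+TABLE_VA]` at `cursor`.
pub fn encode_msvc_switch(
    cursor: u64,
    selector: &str,
    cases: &[u64],
    default_addr: u64,
    table_va: u64,
) -> Option<Vec<u8>> {
    let r = index_reg(selector)?;
    let max = u32::try_from(cases.len().checked_sub(1)?).ok()?;
    let table = u32::try_from(table_va).ok()?;
    let mut out = Vec::new();
    // cmp r32, imm8 (83 /7 ib, sign-extended) or cmp r32, imm32 (81 /7 id)
    if max <= 0x7F {
        out.extend_from_slice(&[0x83, 0xF8 | r, max as u8]);
    } else {
        out.extend_from_slice(&[0x81, 0xF8 | r]);
        out.extend_from_slice(&max.to_le_bytes());
    }
    // ja DEFAULT (unsigned bounds check): 77 cb if it reaches, else 0F 87 cd
    let ja_at = cursor as i64 + out.len() as i64;
    if let Ok(d) = i8::try_from(default_addr as i64 - (ja_at + 2)) {
        out.extend_from_slice(&[0x77, d as u8]);
    } else {
        let rel32 = (default_addr as i64 - (ja_at + 6)) as i32;
        out.extend_from_slice(&[0x0F, 0x87]);
        out.extend_from_slice(&rel32.to_le_bytes());
    }
    // jmp dword ptr [r*4 + disp32]: FF /4, ModRM 0x24 (SIB follows),
    // SIB scale=4 (0x80), index=r, base=101 (disp32, no base register)
    out.extend_from_slice(&[0xFF, 0x24, 0x80 | (r << 3) | 0x05]);
    out.extend_from_slice(&table.to_le_bytes());
    Some(out)
}
```

Adding a case only changes the `cmp` immediate, and moving the default target only changes the `ja` displacement. Neither edit can leave stale pinned bytes behind.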
SehInstall
@seh_install([bytes]) — MSVC’s Structured Exception
Handling frame install: mov fs:[0], esp after pushing
the handler-frame fields. Bytes are exactly the
mov fs:[0], esp encoding (7 bytes on x86-32).
SehRestore
@seh_restore([bytes]) — pops the SEH chain back to the
previously installed handler. Bytes encode
mov reg, [ebp-N]; mov fs:[0], reg (or similar pop
sequence). Pairs LIFO with a prior Stmt::SehInstall.
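A sketch of recognising the two SEH byte shapes. `mov dword ptr fs:[0], r32` encodes as the FS override (64), then `89 /r` with ModRM mod=00, rm=101 (absolute disp32), then a zero disp32, which gives 7 bytes. For the restore, only one common MSVC form is handled here: `mov ecx, [ebp+disp8]; mov fs:[0], ecx`. The doc allows other sequences, so treat this as illustrative. All names are hypothetical.

```rust
/// Encodes `mov dword ptr fs:[0], r32` for register number `reg`.
fn mov_fs0_from(reg: u8) -> [u8; 7] {
    [0x64, 0x89, (reg << 3) | 0x05, 0x00, 0x00, 0x00, 0x00]
}

const ECX: u8 = 1;
const ESP: u8 = 4;

/// The `@seh_install` shape: mov fs:[0], esp (64 89 25 00 00 00 00).
pub fn is_seh_install(bytes: &[u8]) -> bool {
    bytes == &mov_fs0_from(ESP)[..]
}

/// One `@seh_restore` shape: mov ecx, [ebp+disp8]; mov fs:[0], ecx.
/// Returns the frame displacement of the saved chain pointer.
pub fn seh_restore_disp(bytes: &[u8]) -> Option<i8> {
    match bytes {
        // 8B /r with ModRM mod=01 reg=ecx rm=ebp is 0x4D, followed by disp8
        [0x8B, 0x4D, disp, tail @ ..] if tail == &mov_fs0_from(ECX)[..] => Some(*disp as i8),
        _ => None,
    }
}
```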
ReturnExpr
@return_expr("text", [bytes]) — a recognised
“compute-a-value-and-fall-through-to-the-epilogue” block whose
contents have been lifted into a single human-readable
expression. The expression text is informational; the pinned
bytes are the lower path’s source of truth, so the original
instruction stream re-emits exactly even if the expression is
edited.
ArgSpill
@arg_spill(N, [bytes]) — a recognised SysV-x64 argument
spill: mov [rbp+disp], REG_N where REG_N is the integer or
XMM register holding argument N at function entry. The slot
displacement is recoverable from the pinned bytes, so it
doesn’t appear in the directive shape.
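The claim that the slot is recoverable from the pinned bytes can be illustrated with a small decoder. This sketch covers only integer spills of the form `mov [rbp+disp8/disp32], r32/r64` (`89 /r`, optional REX prefix) and maps the source register back to its SysV argument position (rdi, rsi, rdx, rcx, r8, r9). XMM spills (movss/movsd) are left out, and the function name is hypothetical.

```rust
/// SysV x86-64 integer argument registers as register numbers, in argument order:
/// rdi=7, rsi=6, rdx=2, rcx=1, r8=8, r9=9.
const INT_ARG_REGS: [u8; 6] = [7, 6, 2, 1, 8, 9];

/// Decodes `mov [rbp+disp], reg` into (argument index, rbp displacement).
pub fn decode_int_arg_spill(bytes: &[u8]) -> Option<(u32, i32)> {
    // Optional REX prefix (0x40..=0x4F); REX.R (bit 2) extends ModRM.reg.
    let (rex, rest) = match bytes {
        [p @ 0x40..=0x4F, rest @ ..] => (*p, rest),
        _ => (0, bytes),
    };
    let (modrm, disp_bytes) = match rest {
        [0x89, modrm, disp @ ..] => (*modrm, disp),
        _ => return None,
    };
    // rm must be 101 (rbp); mod selects disp8 (01) or disp32 (10).
    if modrm & 0x07 != 0x05 {
        return None;
    }
    let disp = match (modrm >> 6, disp_bytes) {
        (0b01, [d]) => *d as i8 as i32,
        (0b10, [a, b, c, d]) => i32::from_le_bytes([*a, *b, *c, *d]),
        _ => return None,
    };
    let reg = ((modrm >> 3) & 0x07) | ((rex & 0x04) << 1);
    let index = INT_ARG_REGS.iter().position(|&r| r == reg)?;
    Some((index as u32, disp))
}
```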
Call
@call("name", [args], [bytes]) — a recognised direct-call
site whose preceding mov reg, … / lea reg, … instructions
have been folded into the args list. Each arg is a
human-readable rendering (string literal, integer constant,
global address, &function reference, or result for a
previous call’s return value); the pinned bytes cover both …
name(args): a function call (direct or indirect).
bytes pins the arg-setup prefix (pushes, movs, etc.).
For indirect calls (call dword ptr [imm] etc.) the
call instruction itself rides at the end of bytes
because we don’t yet re-encode arbitrary memory operands.
For direct calls (call rel32) the trailing 5 bytes
are stripped from bytes and direct_target carries the
callee’s IP. The lower path encodes call rel32 against
the current cursor + direct_target, so editing a
function’s position automatically re-resolves every
caller’s relative offset.
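A sketch of the call lowering described above. It emits the pinned arg-setup prefix verbatim and, for a direct call, appends `call rel32` (E8 cd). The displacement is measured from the end of the 5-byte call, whose address is the cursor plus the prefix length. The function name is hypothetical.

```rust
/// Lowers a call site placed at `cursor`.
pub fn lower_call(cursor: u64, bytes: &[u8], direct_target: Option<u64>) -> Vec<u8> {
    let mut out = bytes.to_vec();
    if let Some(target) = direct_target {
        // The call follows the arg-setup prefix and is 5 bytes long.
        let call_end = cursor as i64 + bytes.len() as i64 + 5;
        let rel32 = (target as i64 - call_end) as i32;
        out.push(0xE8);
        out.extend_from_slice(&rel32.to_le_bytes());
    }
    // Indirect calls: the call instruction already rides at the end of `bytes`.
    out
}
```

Because rel32 is recomputed from `cursor` on every lower, moving either the caller or the callee changes only these four displacement bytes.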
IfBranch
A structured cmp/test + jcc head plus its branches:
@if_branch("cond text", [cond bytes]) {
    @then { …fallthrough body… }
    @else { …taken body… } // optional
}
else_body == None means the source-language if has no
else clause — the jcc-taken side jumps directly to whatever
code follows the @if_branch in source order. With Some,
both arms are real branches that converge somewhere later.
Bytes layout, exactly preserved on lower (in source order):
1. attrs["head_bytes"], if present (the cmp/test bytes that live before the intervening insns the compiler reordered between the comparison and the conditional branch),
2. pre_body statement bytes (the “intervening” insns between cmp and jcc; empty for the adjacent-cmp case),
3. cond_bytes (the jcc when head_bytes is present; the full cmp+jcc when it isn’t),
4. then_body statement bytes,
5. else_body statement bytes, if present.
Fields
attrs: Vec<Attribute>
Free-form metadata. Recognised keys today: head_bytes
(load-bearing; see byte layout above).
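The byte layout amounts to a fixed concatenation order. This sketch uses a hypothetical flattened view, `IfBranchBytes`, in which the nested statements have already been lowered to bytes. It is not the crate's type.

```rust
/// Hypothetical flattened view of an `IfBranch` whose nested
/// statements have already been lowered to bytes.
pub struct IfBranchBytes<'a> {
    pub head_bytes: Option<&'a [u8]>, // attrs["head_bytes"], if present
    pub pre_body: &'a [u8],
    pub cond_bytes: &'a [u8],
    pub then_body: &'a [u8],
    pub else_body: Option<&'a [u8]>,
}

/// Concatenates the pieces in the documented source order.
pub fn lower_if_branch(b: &IfBranchBytes<'_>) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b.head_bytes.unwrap_or(&[]));
    out.extend_from_slice(b.pre_body);
    out.extend_from_slice(b.cond_bytes);
    out.extend_from_slice(b.then_body);
    out.extend_from_slice(b.else_body.unwrap_or(&[]));
    out
}
```

The example in the test is a reordered comparison. `cmp eax, 5` (head_bytes), an unrelated `mov ecx, edx` (pre_body), then the `jne` alone in cond_bytes.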
LocalSet
@local_set(slot, value, [bytes]) — a recognised
mov dword/qword ptr [rbp+disp], IMM (or analogous on i386
[ebp+disp]) where the destination is a stack-frame local.
Lifts the common “initialise a local with a literal” pattern.
slot is the signed displacement from the frame pointer
(e.g. -8 for [rbp-8]); value is the immediate, signed.
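A sketch of the inverse direction, encoding `@local_set(slot, value)` as `mov dword/qword ptr [rbp+disp8], imm32`. This is `C7 /0` with ModRM 0x45 (mod=01, reg=/0, rm=rbp), and REX.W (48) selects the qword form, whose imm32 is sign-extended. The i386 `[ebp+disp]` form uses the same bytes without the REX prefix. The sketch rejects slots or values that would need wider encodings. The function name is hypothetical.

```rust
/// Encodes `mov dword/qword ptr [rbp+disp8], imm32`.
pub fn encode_local_set(slot: i64, value: i64, qword: bool) -> Option<Vec<u8>> {
    let disp = i8::try_from(slot).ok()?; // disp8 only in this sketch
    let imm = i32::try_from(value).ok()?; // imm32 (sign-extended for qword)
    let mut out = Vec::new();
    if qword {
        out.push(0x48); // REX.W
    }
    out.extend_from_slice(&[0xC7, 0x45, disp as u8]);
    out.extend_from_slice(&imm.to_le_bytes());
    Some(out)
}
```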
LocalArith
@local_arith(slot, op, value, [bytes]) — a recognised
add/sub dword/qword ptr [rbp+disp], IMM pattern. Lifts
the loop-counter / accumulator-update idiom.
op is the arithmetic operation ("+=" or "-="); value
is the immediate, signed.
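A sketch of lifting the common imm8 form, `add/sub dword ptr [rbp+disp8], imm8` (`83 /0 ib` or `83 /5 ib`), into `(slot, op, value)`. A typical `i++` at -O0 encodes as `83 45 FC 01`. Other widths (qword, imm32) are left out, and the function name is hypothetical.

```rust
/// Decodes `add/sub dword ptr [rbp+disp8], imm8` into (slot, op, value).
pub fn decode_local_arith(bytes: &[u8]) -> Option<(i64, &'static str, i64)> {
    match bytes {
        // ModRM 0x45: mod=01 reg=/0 (add) rm=rbp; 0x6D: reg=/5 (sub)
        [0x83, modrm @ (0x45 | 0x6D), disp, imm] => {
            let op = if *modrm == 0x45 { "+=" } else { "-=" };
            // Both disp8 and imm8 are signed.
            Some((*disp as i8 as i64, op, *imm as i8 as i64))
        }
        _ => None,
    }
}
```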
LocalCompound
@local_compound(dst, op, src, [bytes]) — a multi-instruction
pattern of the shape [rbp+dst] op= [rbp+src]. Either:
- 2-insn form: mov reg, [rbp+src]; <op> [rbp+dst], reg for ops with a memory-destination form (add, sub, and, or, xor),
- 3-insn form: mov reg, [rbp+dst]; <op> reg, [rbp+src]; mov [rbp+dst], reg for ops without one (imul).
The pinned bytes cover the whole sequence; the lower path
re-emits them verbatim.
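The two shapes can be spelled out as the instruction text a recogniser would match. This sketch rests on three assumptions: `op` uses C-style compound operators (the page doesn't list LocalCompound's op strings), the scratch register is eax (compilers may pick another), and the operands are dword. The function name is hypothetical.

```rust
/// Expands `[rbp+dst] op= [rbp+src]` into the 2- or 3-instruction shape.
pub fn compound_shape(dst: i64, op: &str, src: i64) -> Option<Vec<String>> {
    let mem = |d: i64| format!("dword ptr [rbp{:+}]", d);
    let mnemonic = match op {
        "+=" => "add",
        "-=" => "sub",
        "&=" => "and",
        "|=" => "or",
        "^=" => "xor",
        "*=" => "imul",
        _ => return None,
    };
    if mnemonic == "imul" {
        // No memory-destination form: load dst, multiply, store back.
        Some(vec![
            format!("mov eax, {}", mem(dst)),
            format!("imul eax, {}", mem(src)),
            format!("mov {}, eax", mem(dst)),
        ])
    } else {
        // Memory-destination form exists: load src, then op into dst.
        Some(vec![format!("mov eax, {}", mem(src)), format!("{} {}, eax", mnemonic, mem(dst))])
    }
}
```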
Move
@move("dst", "src", [bytes]) — an arch-agnostic
“dst := src” data move whose lowering is pinned by bytes.
The 6502 decompiler emits this for LDA src; STA dst pairs;
the dst and src strings are operand text from the
instruction stream (e.g. "IN,Y" and "KBD").
Round-trip: the source-language text is purely informational,
bytes is what the lower path emits.
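A sketch of recognising an `LDA src; STA dst` pair and producing the `(dst, src)` operand text. The real decompiler renders symbolic names like "IN,Y" and "KBD", which needs a symbol table, so this sketch prints raw hex instead. It handles only a few addressing modes: LDA #imm/zp/abs/abs,X/abs,Y (A9/A5/AD/BD/B9) and STA zp/abs/abs,X/abs,Y (85/8D/9D/99). All names are hypothetical.

```rust
enum Mode { Imm, Zp, Abs, AbsX, AbsY }

fn lda_mode(op: u8) -> Option<Mode> {
    match op {
        0xA9 => Some(Mode::Imm),
        0xA5 => Some(Mode::Zp),
        0xAD => Some(Mode::Abs),
        0xBD => Some(Mode::AbsX),
        0xB9 => Some(Mode::AbsY),
        _ => None,
    }
}

fn sta_mode(op: u8) -> Option<Mode> {
    match op {
        0x85 => Some(Mode::Zp),
        0x8D => Some(Mode::Abs),
        0x9D => Some(Mode::AbsX),
        0x99 => Some(Mode::AbsY),
        _ => None,
    }
}

/// Renders the operand at the start of `b`; returns its text and byte length.
fn operand(mode: Mode, b: &[u8]) -> Option<(String, usize)> {
    // 6502 absolute addresses are little-endian.
    let word = |s: &[u8]| -> Option<u16> { Some(u16::from_le_bytes([*s.first()?, *s.get(1)?])) };
    Some(match mode {
        Mode::Imm => (format!("#${:02X}", b.first()?), 1),
        Mode::Zp => (format!("${:02X}", b.first()?), 1),
        Mode::Abs => (format!("${:04X}", word(b)?), 2),
        Mode::AbsX => (format!("${:04X},X", word(b)?), 2),
        Mode::AbsY => (format!("${:04X},Y", word(b)?), 2),
    })
}

/// Recognises `LDA src; STA dst` and returns (dst, src).
pub fn decode_move(bytes: &[u8]) -> Option<(String, String)> {
    let (&lda, rest) = bytes.split_first()?;
    let (src, n) = operand(lda_mode(lda)?, rest)?;
    let (&sta, rest) = rest.get(n..)?.split_first()?;
    let (dst, m) = operand(sta_mode(sta)?, rest)?;
    if rest.len() != m {
        return None; // nothing may trail the STA
    }
    Some((dst, src))
}
```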
Inc16
@inc16("lo", "hi", [bytes]) — a 16-bit increment composed
of INC lo; BNE +2; INC hi (with the BNE skipping the
high-byte INC unless the low byte just rolled over). The
canonical 6502 idiom for advancing a 16-bit pointer.
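A sketch of the zero-page form of the idiom, `E6 lo; D0 02; E6 hi`. BNE skips exactly 2 bytes because the zero-page INC it jumps over is 2 bytes long. An absolute-address INC (EE) would need `BNE +3`, and it is not handled here. The sketch also models what the idiom computes, a little-endian 16-bit increment. Treating `lo`/`hi` as zero-page addresses rather than operand text is an assumption, and the names are hypothetical.

```rust
/// Recognises `INC lo; BNE +2; INC hi` with zero-page operands.
pub fn decode_inc16(bytes: &[u8]) -> Option<(u8, u8)> {
    match bytes {
        [0xE6, lo, 0xD0, 0x02, 0xE6, hi] => Some((*lo, *hi)),
        _ => None,
    }
}

/// What the idiom does to zero page: a 16-bit little-endian increment.
pub fn inc16(mem: &mut [u8; 256], lo: u8, hi: u8) {
    mem[lo as usize] = mem[lo as usize].wrapping_add(1);
    if mem[lo as usize] == 0 {
        // BNE not taken: the low byte rolled over, so carry into the high byte.
        mem[hi as usize] = mem[hi as usize].wrapping_add(1);
    }
}
```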
Loop
A structured loop with the test at the bottom. Canonical gcc -O0 shape:
@loop(entry_jmp=[bytes], "cond text", [tail bytes]) {
    …body stmts…
}
Lifted from a CFG triple where:
- a body block falls through to a tail block,
- the tail block ends with a conditional branch whose
taken target is the body block (i.e. a back-edge).
entry_jmp_bytes is the pre-header jmp that enters the
loop at the tail (gcc’s “skip body on first iteration” idiom).
When detected, those bytes are folded into the directive so
no @asm line is left behind for them.
Lower-path byte order: entry_jmp_bytes (if any) → body
bytes → tail_bytes. The @loop itself contributes nothing
before entry_jmp_bytes — its placement in the function body
determines where the bytes land.
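A sketch of the CFG match described above, over a hypothetical `Block` summary (not the crate's CFG type). It checks the body-to-tail fallthrough and the tail's back-edge. It also reports whether a pre-header enters the loop at the tail with an unconditional jmp, which is the case where those bytes get folded into `entry_jmp_bytes`.

```rust
/// Hypothetical basic-block summary with only the fields the pattern needs.
pub struct Block {
    pub start: u64,
    /// Address reached by falling off the end of the block, if any.
    pub fallthrough: Option<u64>,
    /// Taken target of the terminating branch, if any.
    pub taken: Option<u64>,
    /// True when the terminator is a conditional branch.
    pub ends_in_jcc: bool,
}

/// Returns None if `body`/`tail` don't form a bottom-tested loop;
/// otherwise Some(has_entry_jmp).
pub fn match_loop(pre_header: Option<&Block>, body: &Block, tail: &Block) -> Option<bool> {
    let is_loop = body.fallthrough == Some(tail.start)
        && tail.ends_in_jcc
        && tail.taken == Some(body.start);
    if !is_loop {
        return None;
    }
    // gcc -O0 "skip body on first iteration": an unconditional jmp to the tail.
    let has_entry_jmp = pre_header.map_or(false, |p| {
        !p.ends_in_jcc && p.fallthrough.is_none() && p.taken == Some(tail.start)
    });
    Some(has_entry_jmp)
}
```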
IfBlock
if (cond_text) #[bytes=[…]] { … } [else { … }] —
structured if/else recovered from a forward conditional
jump.
cond_bytes carries the jcc instruction itself (8 bytes
on BPF, 2–6 on x86). then_body is whatever fell through
the conditional; else_body is whatever sat past
the unconditional jump at the end of then_body
(when present).
Lower order: cond_bytes → walk then_body Stmts → if
else_body is non-empty: then_tail_jmp (the unconditional
branch skipping the else) → walk else_body Stmts.
Round-trip preservation: every encoded byte rides
somewhere in cond_bytes / then_tail_jmp / a nested
@asm. Editing cond_text is purely cosmetic until an
arch-side text re-encoder lands.
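Because the jumps stay pinned, nothing re-derives their offsets on lower. A validator can still check that they agree with the layout. This BPF sketch relies on the instruction format: each instruction is 8 bytes (opcode, registers, a little-endian i16 offset, a little-endian i32 immediate), and a jump offset counts instructions from the one after the jump. `then_len` and `else_len` are hypothetical parameters holding the lowered byte lengths of the two bodies. The sketch also assumes cond_bytes is a single 8-byte jcc.

```rust
/// Reads the signed 16-bit jump offset (bytes 2..4) of a BPF instruction.
fn bpf_off(insn: &[u8]) -> Option<i16> {
    Some(i16::from_le_bytes([*insn.get(2)?, *insn.get(3)?]))
}

/// The jcc must land just past then_body (plus then_tail_jmp, if any),
/// and then_tail_jmp, if present, must land just past else_body.
pub fn bpf_if_block_consistent(
    cond_bytes: &[u8],
    then_len: usize,
    then_tail_jmp: &[u8],
    else_len: usize,
) -> bool {
    let insns = |n: usize| (n / 8) as i64;
    let cond_ok =
        bpf_off(cond_bytes).map(i64::from) == Some(insns(then_len + then_tail_jmp.len()));
    let tail_ok = then_tail_jmp.is_empty()
        || bpf_off(then_tail_jmp).map(i64::from) == Some(insns(else_len));
    cond_bytes.len() == 8 && cond_ok && tail_ok
}
```

In the test, `0x55` is `jne dst, imm` (BPF_JMP | BPF_JNE | BPF_K) and `0x05` is `ja`.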
WhileBlock
while (cond_text) #[bytes=[…]] { … } — a top-checked
loop. entry_bytes pins the loop-header jcc that skips
the body when the condition is false on first entry;
tail_bytes pins the unconditional jump at the end of
the body that branches back to the header.
Lower order: entry_bytes → walk body Stmts →
tail_bytes.
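On BPF the two pinned jumps follow directly from the body length. A BPF jump lands at `pc + 1 + off`, measured in instructions. With the header jcc at index 0, n body instructions, and the tail `ja` at index n + 1, the offsets are as in this sketch. The function name is hypothetical.

```rust
/// Returns (header jcc offset, tail ja offset) for a top-checked BPF loop
/// laid out as [jcc][body: n insns][ja].
pub fn bpf_while_offsets(body_insns: i16) -> (i16, i16) {
    // Exit lands after the tail ja at index n + 2: 0 + 1 + off = n + 2.
    let exit = body_insns + 1;
    // Back-edge from index n + 1 to the header: (n + 1) + 1 + off = 0.
    let back = -(body_insns + 2);
    (exit, back)
}
```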
RegArith
dst op src; — a compound-assignment arithmetic stmt
where dst is a register name, op is a C-style
compound operator ("+=", "-=", "*=", "/=",
"%=", "|=", "&=", "^=", "<<=", ">>="), and
src is a register or immediate text.
Lifted from arch ALU instructions whose register-only
shape lets the codec round-trip via encode_arith.
On BPF that’s the 64-bit ALU class (add64, lsh64,
or64, etc.); the framework lets other arches plug in
the same way.
Round-trip: bytes rides pinned by default; the
decompile-side byte-drop clears bytes when
arch.encode_arith(dst, op, src) reproduces them, and
the lower path regenerates from the textual fields.
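A sketch of the byte-drop round trip on BPF. The real `arch.encode_arith` signature isn't shown here, so `encode_arith_bpf` and `maybe_drop_bytes` are standalone hypothetical functions. They use the ALU64 encoding: opcode = operation | source flag (K = 0x00 immediate, X = 0x08 register) | class 0x07. The registers byte holds src in the high nibble and dst in the low nibble, and the imm32 sits in bytes 4..8. Mapping `>>=` to the logical shift (rsh, not arsh) is an assumption.

```rust
/// Parses `r0`..`r10`.
fn reg_num(name: &str) -> Option<u8> {
    let n: u8 = name.strip_prefix('r')?.parse().ok()?;
    (n <= 10).then_some(n)
}

/// Encodes `dst op= src` as one 8-byte BPF ALU64 instruction.
pub fn encode_arith_bpf(dst: &str, op: &str, src: &str) -> Option<[u8; 8]> {
    let operation: u8 = match op {
        "+=" => 0x00, "-=" => 0x10, "*=" => 0x20, "/=" => 0x30,
        "|=" => 0x40, "&=" => 0x50, "<<=" => 0x60, ">>=" => 0x70, // logical shift
        "%=" => 0x90, "^=" => 0xA0,
        _ => return None,
    };
    let d = reg_num(dst)?;
    let mut insn = [0u8; 8];
    if let Some(s) = reg_num(src) {
        insn[0] = operation | 0x08 | 0x07; // BPF_X | BPF_ALU64
        insn[1] = (s << 4) | d;
    } else {
        let imm: i32 = src.parse().ok()?;
        insn[0] = operation | 0x07; // BPF_K | BPF_ALU64
        insn[1] = d;
        insn[4..].copy_from_slice(&imm.to_le_bytes());
    }
    Some(insn)
}

/// Decompile-side byte drop: clear `bytes` only when the encoder reproduces them.
pub fn maybe_drop_bytes(dst: &str, op: &str, src: &str, bytes: &mut Vec<u8>) {
    if encode_arith_bpf(dst, op, src).map_or(false, |e| e[..] == bytes[..]) {
        bytes.clear();
    }
}
```

Keeping the bytes whenever the re-encoding differs (for example, a non-zero offset field the text can't express) is what keeps the round trip byte-exact.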