x86 architecture backend.
Phase 1 scope: decode an x86 byte sequence into structured instructions
(via iced_x86), and provide two distinct emission paths:
- `emit_preserved`: concatenate each instruction’s original bytes captured at decode time. This is byte-identical by construction and is what the round-trip contract is built on.
- `reencode_via_iced`: feed the structured `Instruction`s back through `BlockEncoder`. This is not byte-identical for all real inputs: iced canonicalizes redundant prefixes (e.g. drops a `66` data16 override on a NOP that doesn’t need it), so for compiler-emitted alignment NOPs and `.plt` padding the bytes will differ. Useful for “I edited an instruction” workflows in later phases, not for round-trip.
16- and 32-bit modes are exposed through Bitness and the same
API; the round-trip property is identical.
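The preserved-bytes path can be illustrated with a minimal, standard-library-only sketch. `SketchInsn`, `split_by_lengths` and `emit_preserved_sketch` are hypothetical names invented for this illustration; they are not this crate's types or signatures, and the length-based splitter merely stands in for a real decoder such as iced_x86.

```rust
/// Hypothetical stand-in for a decoded instruction. It carries only the
/// fields this sketch needs and is NOT the crate's actual `DecodedInsn`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchInsn {
    /// Virtual address of the instruction.
    pub ip: u64,
    /// Exact bytes the instruction occupied in the source buffer.
    pub bytes: Vec<u8>,
}

/// Stand-in for a decoder: cut `input` into instructions of the given
/// lengths, capturing each instruction's original bytes. Returns `None`
/// if the lengths do not cover the input exactly.
pub fn split_by_lengths(input: &[u8], rip: u64, lens: &[usize]) -> Option<Vec<SketchInsn>> {
    let mut out = Vec::new();
    let mut off = 0usize;
    for &len in lens {
        let end = off.checked_add(len)?;
        let bytes = input.get(off..end)?.to_vec();
        out.push(SketchInsn { ip: rip + off as u64, bytes });
        off = end;
    }
    if off == input.len() {
        Some(out)
    } else {
        None
    }
}

/// Concatenate the preserved bytes of every instruction, in order.
/// Because nothing is re-encoded, the output equals the decoded input.
pub fn emit_preserved_sketch(insns: &[SketchInsn]) -> Vec<u8> {
    insns.iter().flat_map(|i| i.bytes.iter().copied()).collect()
}
```

Feeding this the bytes `66 90 55 48 89 e5 c3` (a data16-prefixed NOP, `push rbp`, `mov rbp, rsp`, `ret`) with lengths `[2, 1, 3, 1]` returns the same seven bytes, redundant `66` prefix included. A canonicalizing re-encoder would be free to drop that prefix, which is why the round-trip contract rests on the preserved path.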
Structs
- `CallSite` - One detected direct-call site, including the index range of the instructions that should be folded into the resulting `@call`.
- `DecodedInsn` - A single decoded instruction together with the exact bytes it occupied in the source buffer.
- `ExprRenderCtx` - Read-only context for rendering `ValueExpr`.
- `Instruction` - A 16/32/64-bit x86 instruction. Created by `Decoder`, by `CodeAssembler` or by `Instruction::with*()` methods.
- `InstructionInfoFactory` - Creates `InstructionInfo`s.
- `LiftedEpilogue` - One recognised SysV-x64 epilogue at the tail of a function or shared between branches.
- `LiftedIfBranchHead` - One recognised `cmp/test + jcc` pair at the tail of a basic block, suitable for lifting into a structured `if/else` directive.
- `LiftedPrologue` - One recognised SysV-x64 prologue at the entry of a function.
- `LiftedReturn` - One recognised return-with-literal pattern at the tail of a function: how many trailing instructions matched, and the literal integer value the function returns.
- `LiftedValueBlock` - Lifted value-block: the expression sitting in EAX/RAX after the block plus the bytes the lift consumed.
- `PostCallSpill` - One recognised post-call result spill: the instruction(s) that move the call’s return value into a stack slot.
- `ProfileInputs` - Compiler-profile inputs the default-prologue computation uses. Today these are all derived from the function’s body at decompile time (callee-saved registers it writes, stack it reserves, args it expects) and from the function’s `abi` attribute at parse time.
- `StructuredEpilogue` - Structured epilogue, the mirror of `StructuredPrologue`.
- `StructuredPrologue` - Structured prologue. Every field defaults to “absent”: an empty `saves` list + `frame: false` + `sub_esp: 0` + `cf_protect: false` means an empty prologue, which won’t match any real function entry.
- `UsedRegister` - A register used by an instruction
- `X86Codec` - One codec per bitness. Cheap to construct, no state.
Enums
- `ArgValue` - Per-arg value computed by the analyzer. The renderer (the decompiler) turns this into a human-readable string using per-binary context (`.rodata` strings, function names, etc.).
- `AssembleError` - Errors raised by `assemble_intel`.
- `Bitness` - Bitness of an x86 decode/encode pass.
- `CodeSize` - The code size (16/32/64) that was used when an instruction was decoded
- `CodecBits` - Bit-width assumed by the codec. 32-bit and 64-bit have slightly different encodings for the frame setup (REX prefix) and the endbr instruction (`endbr32` vs `endbr64`).
- `Error` - Errors produced by decode / encode / round-trip helpers.
- `FlowControl` - Control flow
- `JumpEncodeError` - Errors emitted by `encode_jmp`.
- `LiftError` - Errors specific to the lifter.
- `MemorySize` - Size of a memory reference
- `Mnemonic` - Mnemonic
- `OpAccess` - Operand, register and memory access
- `OpKind` - Instruction operand kind
- `Register` - A register
- `SwitchEncodeError` - Errors emitted by `encode_msvc_jmp_table_dispatch`.
- `ValueExpr` - Tiny expression IR. See module docs for what’s intentionally not represented.
- `VerifyAsm` - Result of `verify_intel_text`.
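The encoding differences that `CodecBits` refers to can be made concrete with a small sketch. `SketchBits`, `endbr_bytes` and `frame_setup_bytes` are hypothetical names for this illustration only. The byte values are the standard x86 encodings, but the choice of the `89 /r` form for `mov ebp, esp` is an assumption: an equivalent `8b ec` encoding also exists, and this page does not say which one the codec emits.

```rust
/// Hypothetical stand-in for the codec's bit-width selector
/// (not the crate's actual `CodecBits` definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchBits {
    B32,
    B64,
}

/// Bytes of the CET landing-pad instruction for the given width.
/// `endbr32` and `endbr64` differ only in the final byte.
pub fn endbr_bytes(bits: SketchBits) -> [u8; 4] {
    match bits {
        SketchBits::B32 => [0xf3, 0x0f, 0x1e, 0xfb], // endbr32
        SketchBits::B64 => [0xf3, 0x0f, 0x1e, 0xfa], // endbr64
    }
}

/// Bytes of the classic frame setup: `push ebp; mov ebp, esp` in 32-bit
/// mode, `push rbp; mov rbp, rsp` in 64-bit mode.
/// Assumption: the `89 /r` form of `mov` is used.
pub fn frame_setup_bytes(bits: SketchBits) -> Vec<u8> {
    // `push ebp` / `push rbp` share the single-byte opcode 0x55.
    let mut out = vec![0x55];
    if bits == SketchBits::B64 {
        // REX.W prefix widens the following `mov` to 64-bit operands.
        out.push(0x48);
    }
    // mov ebp, esp (or mov rbp, rsp when preceded by REX.W).
    out.extend_from_slice(&[0x89, 0xe5]);
    out
}
```

The only width-dependent pieces are the last byte of `endbr` and the single REX.W prefix in front of the `mov`, so one extra byte separates the 64-bit frame setup from the 32-bit one.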
Functions
- `arg_spill_index` - If `insn` is an “argument spill”, that is `mov [rbp+disp], REG` (or `movss/movsd [rbp+disp], xmm`) where `REG` is one of the SysV x86-64 argument-passing registers, return that argument’s 0-based index.
- `assemble_intel` - Parse `text` as a single x86 instruction in canonical Intel syntax and encode it to bytes at `rip`. Returns the encoded bytes when the form is recognised, or `AssembleError::Unsupported` when it isn’t.
- `compute_sp_delta_table` - Compute SP delta at every instruction in `insns`. The first instruction sees delta = 0 (function entry: SP points at the return address). Subsequent deltas accumulate stack effects from `push`/`pop`/`enter`/`leave` (via iced’s built-in `stack_pointer_increment`) plus the arithmetic forms `sub esp/rsp, IMM` / `add esp/rsp, IMM` which iced doesn’t model.
- `decode` - Decode `bytes` as a contiguous x86 instruction stream starting at virtual address `rip`. Captures each instruction’s exact bytes for later byte-faithful re-emission.
- `decode_epilogue` - Decode an epilogue. Same Option-fallback convention as `decode_prologue`.
- `decode_prologue` - Decode a byte sequence as a prologue. Returns `None` when the bytes don’t match the recognised templates (handwritten prologue, patched code, etc.); the caller falls back to the opaque byte list.
- `decode_tolerant` - Like `decode` but tolerates a decoder failure mid-stream: on hitting an invalid byte the walk stops and returns every instruction successfully decoded up to that point plus the failure offset. Used by function-discovery passes that scan past data-in-code regions (e.g. jump-table embedded inside a `.text` section).
- `default_epilogue` - Compute the canonical-default epilogue paired with `default_prologue`. Same MSVC-style choices:
- `default_prologue` - Compute the canonical-default prologue for a function with the given profile inputs. Mirrors what MSVC’s x86 codegen would emit for a function that:
- `detect_post_call_spill` - If the instructions starting at `after_idx` form a recognised “spill the call’s return value to a local stack slot” sequence, return the displacement of that slot.
- `direct_call_target` - If `insn` is a direct (relative) `call` whose target is statically known, return that target’s virtual address. Indirect calls (`call rax`, `call [rip+…]`) and non-call instructions return `None`.
- `direct_lea_rip_target` - If `insn` is a `lea reg, [rip+disp]` (compiler-typical “load address of a global / string-constant”), return the absolute virtual address of the target. Returns `None` for non-`lea`s, or for `lea`s with a non-RIP base or a non-trivial index.
- `direct_unconditional_branch_target` - Like `direct_call_target`, but for unconditional direct branches (`jmp rel32` / `jmp short rel8`). Useful for spotting tail calls when the target lives in another discovered function.
- `emit_preserved` - Re-emit a decoded instruction stream using each instruction’s preserved original bytes. Byte-identical by construction.
- `encode_call_rel32` - Encode `call rel32` from `source_ip` to `target`: 5 bytes (`0xe8` + i32). Direct near calls on x86 are always rel32 in 32- and 64-bit modes, so there’s no narrow/wide choice here.
- `encode_cmp_or_test` - Encode a `cmp`/`test` operand text (no jcc, no semicolon, no leading mnemonic) to bytes. Returns `None` for unrecognised shapes.
- `encode_epilogue` - Encode an epilogue. Mirror of `decode_epilogue`.
- `encode_head_from_cond_text` - Convenience: pull the head out of a full cond text and encode it.
- `encode_jcc` - Encode a conditional `jcc` from `source_ip` (the address of the jcc instruction itself) to `target`.
- `encode_jmp` - Encode an unconditional `jmp` from `source_ip` (the address of the jmp instruction itself) to `target`.
- `encode_msvc_jmp_table_dispatch` - Encode an MSVC-style switch dispatch sequence:
- `encode_prologue` - Encode a structured prologue back to bytes. Mirror of `decode_prologue`. The encoder picks the canonical encoding for each piece: small `sub esp, IMM` (≤127) uses the 3-byte form, larger values the 6-byte form.
- `encoded_jcc_size` - Pre-computed byte size of `encode_jcc`’s output.
- `encoded_jmp_size` - Pre-computed byte size of `encode_jmp`’s output.
- `epilogue_roundtrips`
- `format_intel` - Format `insn` as Intel-syntax assembly text, suitable for embedding inside an `@asm("...")` directive in a `.ud` file.
- `identify_call_sites` - Walk `insns` forward, returning every direct call site whose arg-setup we could resolve.
- `is_function_terminator` - Heuristic: does this instruction terminate a function? Returns, unconditional branches (direct or indirect), and the rare hardware traps (`int`, `ud2`, etc.) all qualify. Conditional branches don’t, because they can still flow into later instructions in the same function.
- `jcc_cond_code_from_bytes` - Extract the jcc condition code (0..=15) from an already-decoded jcc’s opcode bytes. Returns `None` when `bytes` isn’t a recognised jcc encoding.
- `jcc_cond_code_from_name` - Inverse of `jcc_cond_name`.
- `jcc_cond_name` - Symbolic name for a jcc condition code, lowercase.
- `lift_function` - Build a `Function` CFG from a contiguous stream of decoded x86 instructions starting at `entry_addr`.
- `match_local_arith_immediate` - If `insn` is `add/sub dword/qword ptr [rbp/ebp+disp], IMM` (a stack-frame local being incremented or decremented by a literal), return `(slot, op, value)` where `op` is `"+="` for `add` and `"-="` for `sub`.
- `match_local_compound` - If the instruction window starts with a recognised compound stack-slot pattern (`[rbp+dst] op= [rbp+src]`), return `(consumed, dst, op, src)`.
- `match_local_set_immediate` - If `insn` is a `mov [rbp/ebp+disp], IMM` (a stack-frame local being initialised or assigned a literal), return the signed displacement and the immediate value. The displacement honours 32-bit addressing mode (so `[ebp-0x8]` round-trips as `-8`, not `0xffff_fff8`).
- `prologue_roundtrips` - Verify a structured-form’s round-trip: decode → encode → compare. Returns `true` when the structured form encodes to exactly the original bytes, meaning the emitter can drop the explicit byte list.
- `reencode_via_iced` - Re-encode the structured instructions through iced’s `BlockEncoder`.
- `register` - Register the x86 codec factory with `ud_arch_codec::registry`.
- `rename_ebp_slot` - Rename a frame-pointer memory operand to its source-language slot name. `[ebp+N]` becomes `arg_<hex>` and `[ebp-N]` becomes `var_<hex>`; the offset (always positive) is the hex part of the name, matching the Ghidra/IDA convention. Bare `[ebp]` (offset 0) renders as `var_0`. Anything else (indexed addressing, non-EBP bases, non-memory operands, …) returns `None` so the caller can fall back to the raw text.
- `rename_operand_if_slot` - Apply `rename_ebp_slot` when it matches; otherwise return the input unchanged. Useful as a one-call helper for places that want to rename operands without branching on the result.
- `rename_operand_in_ctx` - Apply `rename_operand_with_ctx` (which handles both `[ebp+N]` and `[esp+N]` with the supplied SP delta) and return the renamed text, or the original input unchanged when no rename rule matches.
- `rename_operand_with_ctx` - Rename a memory operand to its source-language slot name, handling both frame-pointer (`[ebp+N]`) and stack-pointer (`[esp+N]`, with an optional SP delta context) forms.
- `render_cond_source` - Render a `cmp`/`test` + `jcc` pair as a C-style relational expression evaluated against the source-language `if`’s body.
- `roundtrip_bytes` - Decode `bytes` and re-emit via `emit_preserved`; verify the result equals the input. This is the format-agnostic round-trip property for the x86 backend, and it must hold for every byte sequence we claim to support.
- `signed_memory_displacement` - Read the memory operand’s displacement as a signed 64-bit value, honouring the addressing mode’s actual width.
- `sp_change_for` - Per-instruction stack-pointer change. Uses iced’s intrinsic `stack_pointer_increment` for push / pop / call / ret / enter / leave; falls back to operand inspection for `sub esp, IMM` and `add esp, IMM` which iced doesn’t compute.
- `try_lift_epilogue_pattern` - Try to recognize the trailing instructions of `insns` as a stack-frame-tearing-down epilogue:
- `try_lift_if_branch_head` - Try to recognise the trailing two instructions of `insns` as a `cmp` (or `test`) followed by a direct conditional jump.
- `try_lift_prologue_pattern` - Try to recognize the leading instructions of `insns` as a canonical SysV-x64 prologue at function entry.
- `try_lift_return_pattern` - Try to recognize the trailing instructions of `insns` as a return with a known integer literal: the canonical SysV-x64 epilogue patterns gcc emits at `-O0`.
- `try_lift_return_via_jmp` - Try to recognize the trailing instructions of `insns` as a “return with literal, then jump to a shared epilogue” pattern.
- `try_lift_value_block` - Try to lift the entire instruction sequence as a value-producing block whose final state is “EAX = some expression.”
- `verify_intel_text` - Decode `bytes` as a single x86 instruction at `rip`, format it via `format_intel`, and compare against `text` (after a light normalization that ignores case and folds whitespace runs).
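Two of the fixed-shape encodings described in the list above are simple enough to sketch with the standard library alone: the 5-byte `call rel32` and the 3-byte versus 6-byte `sub esp, IMM` choice. The function names, signatures and `Option`-based error handling below are assumptions made for illustration and are not the crate's actual API. The sketch also assumes that `source_ip` is the address of the `call` instruction itself, as the list states for `encode_jcc` and `encode_jmp`.

```rust
/// Sketch of a `call rel32` encoder: opcode 0xe8 followed by a
/// little-endian i32 displacement measured from the END of the
/// instruction (source_ip + 5). Returns `None` when the target is
/// out of rel32 range.
/// Assumption: `source_ip` is the address of the call itself.
pub fn encode_call_rel32_sketch(source_ip: u64, target: u64) -> Option<[u8; 5]> {
    let next_ip = source_ip.wrapping_add(5);
    // Reinterpret the wrapped difference as signed to allow backward calls.
    let delta = target.wrapping_sub(next_ip) as i64;
    let rel = i32::try_from(delta).ok()?;
    let mut out = [0u8; 5];
    out[0] = 0xe8;
    out[1..].copy_from_slice(&rel.to_le_bytes());
    Some(out)
}

/// Sketch of the `sub esp, IMM` size choice in 32-bit mode:
/// values up to 127 fit the sign-extended imm8 form (83 ec ib, 3 bytes),
/// larger values need the imm32 form (81 ec id, 6 bytes).
pub fn encode_sub_esp_sketch(imm: u32) -> Vec<u8> {
    if imm <= 127 {
        vec![0x83, 0xec, imm as u8]
    } else {
        let mut out = vec![0x81, 0xec];
        out.extend_from_slice(&imm.to_le_bytes());
        out
    }
}
```

For instance, a call at `0x1000` targeting `0x1010` has a displacement of `0x1010 - 0x1005 = 0x0b` and encodes as `e8 0b 00 00 00`. The imm8 threshold is 127 rather than 255 because the `83` form sign-extends its immediate, so a byte of `0x80` or above would be read as a negative adjustment.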