
Crate ud_arch_x86


x86 architecture backend.

Phase 1 scope: decode an x86 byte sequence into structured instructions (via iced_x86), and provide two distinct emission paths:

  • emit_preserved: concatenate each instruction’s original bytes captured at decode time. This is byte-identical by construction and is what the round-trip contract is built on.
  • reencode_via_iced: feed the structured Instructions back through BlockEncoder. This is not byte-identical for all real inputs: iced canonicalizes redundant prefixes (e.g. it drops a 66 data16 override on a NOP that doesn’t need it), so for compiler-emitted alignment NOPs and .plt padding the bytes will differ. Useful for “I edited an instruction” workflows in later phases, not for round-trip.
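
The preserved-bytes path can be illustrated without the crate itself. The following is a minimal sketch, assuming a hypothetical PreservedInsn type that stands in for DecodedInsn and keeps only the captured bytes; the real struct's fields and the real emit_preserved signature are not shown on this page.

```rust
/// Hypothetical stand-in for the crate's DecodedInsn: only the bytes
/// captured at decode time matter for this sketch.
struct PreservedInsn {
    /// Exact bytes the instruction occupied in the source buffer.
    bytes: Vec<u8>,
}

/// Concatenate each instruction's original bytes, in stream order.
fn emit_preserved_sketch(insns: &[PreservedInsn]) -> Vec<u8> {
    let total: usize = insns.iter().map(|i| i.bytes.len()).sum();
    let mut out = Vec::with_capacity(total);
    for insn in insns {
        out.extend_from_slice(&insn.bytes);
    }
    out
}

fn main() {
    // 66 90 is a NOP carrying a redundant data16 prefix; c3 is ret.
    let insns = vec![
        PreservedInsn { bytes: vec![0x66, 0x90] },
        PreservedInsn { bytes: vec![0xc3] },
    ];
    // The redundant prefix survives because nothing is re-encoded.
    assert_eq!(emit_preserved_sketch(&insns), vec![0x66, 0x90, 0xc3]);
}
```

Because the output is only a concatenation of stored slices, no canonicalization can occur, which is why this path, rather than re-encoding, underpins the round-trip contract.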

16- and 32-bit modes are exposed through Bitness and the same API; the round-trip property is identical.

Structs§

CallSite
One detected direct-call site, including the index range of the instructions that should be folded into the resulting @call.
DecodedInsn
A single decoded instruction together with the exact bytes it occupied in the source buffer.
ExprRenderCtx
Read-only context for rendering ValueExpr.
Instruction
A 16/32/64-bit x86 instruction. Created by Decoder, by CodeAssembler or by Instruction::with*() methods.
InstructionInfoFactory
Creates InstructionInfos.
LiftedEpilogue
One recognised SysV-x64 epilogue at the tail of a function or shared between branches.
LiftedIfBranchHead
One recognised cmp/test + jcc pair at the tail of a basic block, suitable for lifting into a structured if/else directive.
LiftedPrologue
One recognised SysV-x64 prologue at the entry of a function.
LiftedReturn
One recognised return-with-literal pattern at the tail of a function: how many trailing instructions matched, and the literal integer value the function returns.
LiftedValueBlock
Lifted value-block: the expression sitting in EAX/RAX after the block plus the bytes the lift consumed.
PostCallSpill
One recognised post-call result spill: the instruction(s) that move the call’s return value into a stack slot.
ProfileInputs
Compiler-profile inputs the default-prologue computation uses. Today these are all derived from the function’s body at decompile time (callee-saved registers it writes, stack it reserves, args it expects) and from the function’s abi attribute at parse time.
StructuredEpilogue
Structured epilogue, the mirror of StructuredPrologue.
StructuredPrologue
Structured prologue. Every field defaults to “absent” — an empty saves list + frame: false + sub_esp: 0 + cf_protect: false means an empty prologue, which won’t match any real function entry.
UsedRegister
A register used by an instruction
X86Codec
One codec per bitness. Cheap to construct, no state.

Enums§

ArgValue
Per-arg value computed by the analyzer. The renderer (the decompiler) turns this into a human-readable string using per-binary context (.rodata strings, function names, etc.).
AssembleError
Errors raised by assemble_intel.
Bitness
Bitness of an x86 decode/encode pass.
CodeSize
The code size (16/32/64) that was used when an instruction was decoded
CodecBits
Bit-width assumed by the codec. 32-bit and 64-bit have slightly different encodings for the frame setup (REX prefix) and the endbr instruction (endbr32 vs endbr64).
Error
Errors produced by decode / encode / round-trip helpers.
FlowControl
Control flow
JumpEncodeError
Errors emitted by encode_jmp.
LiftError
Errors specific to the lifter.
MemorySize
Size of a memory reference
Mnemonic
Mnemonic
OpAccess
Operand, register and memory access
OpKind
Instruction operand kind
Register
A register
SwitchEncodeError
Errors emitted by encode_msvc_jmp_table_dispatch.
ValueExpr
Tiny expression IR. See module docs for what’s intentionally not represented.
VerifyAsm
Result of verify_intel_text.

Functions§

arg_spill_index
If insn is an “argument spill” — mov [rbp+disp], REG (or movss/movsd [rbp+disp], xmm) where REG is one of the SysV x86-64 argument-passing registers — return that argument’s 0-based index.
assemble_intel
Parse text as a single x86 instruction in canonical Intel syntax and encode it to bytes at rip. Returns the encoded bytes when the form is recognised, or AssembleError::Unsupported when it isn’t.
compute_sp_delta_table
Compute SP delta at every instruction in insns. The first instruction sees delta = 0 (function entry: SP points at the return address). Subsequent deltas accumulate stack effects from push / pop / enter / leave (via iced’s built-in stack_pointer_increment) plus the arithmetic forms sub esp/rsp, IMM / add esp/rsp, IMM which iced doesn’t model.
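
The accumulation described above can be sketched as a running sum. This is an illustration only: the function name and the plain i64 input (one per-instruction SP change in bytes, negative when the stack grows) are assumptions, not the crate's actual types.

```rust
/// Sketch: given each instruction's SP change in bytes, return the SP
/// delta each instruction *sees* before it executes.
fn sp_delta_table_sketch(changes: &[i64]) -> Vec<i64> {
    let mut table = Vec::with_capacity(changes.len());
    let mut delta = 0i64; // function entry: delta is 0
    for change in changes {
        table.push(delta); // value observed before this instruction runs
        delta += change;
    }
    table
}

fn main() {
    // 32-bit frame setup: push ebp; mov ebp, esp; sub esp, 0x10; push ebx
    let changes: [i64; 4] = [-4, 0, -16, -4];
    assert_eq!(sp_delta_table_sketch(&changes), vec![0, -4, -4, -20]);
}
```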
decode
Decode bytes as a contiguous x86 instruction stream starting at virtual address rip. Captures each instruction’s exact bytes for later byte-faithful re-emission.
decode_epilogue
Decode an epilogue. Same Option-fallback convention as decode_prologue.
decode_prologue
Decode a byte sequence as a prologue. Returns None when the bytes don’t match the recognised templates (handwritten prologue, patched code, etc.) — caller falls back to the opaque byte list.
decode_tolerant
Like decode but tolerates a decoder failure mid-stream: on hitting an invalid byte the walk stops and returns every instruction successfully decoded up to that point plus the failure offset. Used by function-discovery passes that scan past data-in-code regions (e.g. jump-table embedded inside a .text section).
default_epilogue
Compute the canonical-default epilogue paired with default_prologue. Same MSVC-style choices:
default_prologue
Compute the canonical-default prologue for a function with the given profile inputs. Mirrors what MSVC’s x86 codegen would emit for a function that:
detect_post_call_spill
If the instructions starting at after_idx form a recognised “spill the call’s return value to a local stack slot” sequence, return the displacement of that slot.
direct_call_target
If insn is a direct (relative) call whose target is statically known, return that target’s virtual address. Indirect calls (call rax, call [rip+…]) and non-call instructions return None.
direct_lea_rip_target
If insn is a lea reg, [rip+disp] (compiler-typical “load address of a global / string-constant”), return the absolute virtual address of the target. Returns None for non-leas, or for leas with a non-RIP base or a non-trivial index.
direct_unconditional_branch_target
Like direct_call_target, but for unconditional direct branches (jmp rel32 / jmp short rel8). Useful for spotting tail calls when the target lives in another discovered function.
emit_preserved
Re-emit a decoded instruction stream using each instruction’s preserved original bytes. Byte-identical by construction.
encode_call_rel32
Encode call rel32 from source_ip to target — 5 bytes (0xe8 + i32). Direct near calls on x86 are always rel32 in 32- and 64-bit modes, so there’s no narrow/wide choice here.
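
As an illustration of the 5-byte layout, here is a minimal sketch. The function name, the Option return type, and the use of flat 64-bit address arithmetic are assumptions; the crate's real signature and error handling are not shown on this page.

```rust
/// Sketch: encode `call rel32`. Returns None when the displacement
/// does not fit in a signed 32-bit value.
fn call_rel32_sketch(source_ip: u64, target: u64) -> Option<[u8; 5]> {
    // rel32 is measured from the address of the *next* instruction,
    // i.e. source_ip plus the 5-byte length of the call itself.
    let next_ip = source_ip.wrapping_add(5);
    let disp = target.wrapping_sub(next_ip) as i64;
    let rel = i32::try_from(disp).ok()?;
    let r = rel.to_le_bytes();
    Some([0xe8, r[0], r[1], r[2], r[3]])
}

fn main() {
    // Calling the very next instruction yields a zero displacement.
    assert_eq!(call_rel32_sketch(0x1000, 0x1005), Some([0xe8, 0, 0, 0, 0]));
    // Calling itself yields rel32 = -5.
    assert_eq!(
        call_rel32_sketch(0x1000, 0x1000),
        Some([0xe8, 0xfb, 0xff, 0xff, 0xff])
    );
}
```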
encode_cmp_or_test
Encode a cmp/test operand text (no jcc, no semicolon, no leading mnemonic) to bytes. Returns None for unrecognised shapes.
encode_epilogue
Encode an epilogue. Mirror of decode_epilogue.
encode_head_from_cond_text
Convenience: pull the head out of a full cond text and encode it.
encode_jcc
Encode a conditional jcc from source_ip (the address of the jcc instruction itself) to target.
encode_jmp
Encode an unconditional jmp from source_ip (the address of the jmp instruction itself) to target.
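
Unlike call, a direct jmp has both a short (EB rel8, 2 bytes) and a near (E9 rel32, 5 bytes in 32- and 64-bit modes) encoding. The sketch below assumes a "prefer the short form when it fits" policy; this page does not state which policy encode_jmp actually uses, and the function name and return type here are hypothetical.

```rust
/// Sketch: encode an unconditional direct jmp, preferring the short form.
fn encode_jmp_sketch(source_ip: u64, target: u64) -> Option<Vec<u8>> {
    // Short form: displacement is measured from the end of the 2-byte insn.
    let short = target.wrapping_sub(source_ip.wrapping_add(2)) as i64;
    if let Ok(rel8) = i8::try_from(short) {
        return Some(vec![0xeb, rel8 as u8]);
    }
    // Near form: displacement is measured from the end of the 5-byte insn.
    let near = target.wrapping_sub(source_ip.wrapping_add(5)) as i64;
    let rel32 = i32::try_from(near).ok()?;
    let mut out = vec![0xe9];
    out.extend_from_slice(&rel32.to_le_bytes());
    Some(out)
}

fn main() {
    // A jump to itself fits in rel8 (-2).
    assert_eq!(encode_jmp_sketch(0x1000, 0x1000), Some(vec![0xeb, 0xfe]));
    // A far-away target needs the 5-byte form.
    assert_eq!(
        encode_jmp_sketch(0x1000, 0x2000),
        Some(vec![0xe9, 0xfb, 0x0f, 0x00, 0x00])
    );
}
```

Since the displacement depends on the instruction's own length, the size has to be settled before the bytes are produced, which is presumably why a separate size helper such as encoded_jmp_size is useful when laying out code.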
encode_msvc_jmp_table_dispatch
Encode an MSVC-style switch dispatch sequence:
encode_prologue
Encode a structured prologue back to bytes. Mirror of decode_prologue. The encoder picks the canonical encoding for each piece — small sub esp, IMM (≤127) uses the 3-byte form, larger values the 6-byte form.
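
The 3-byte versus 6-byte choice for sub esp, IMM can be sketched as follows. The helper name is hypothetical and the sketch covers only the 32-bit esp form (no REX prefix); the opcode bytes are the standard 83 /5 ib and 81 /5 id encodings.

```rust
/// Sketch: encode `sub esp, imm` in 32-bit mode using the shortest form.
fn encode_sub_esp_sketch(imm: u32) -> Vec<u8> {
    if imm <= 127 {
        // 83 EC ib: sign-extended 8-bit immediate, 3 bytes total.
        vec![0x83, 0xec, imm as u8]
    } else {
        // 81 EC id: full 32-bit immediate, 6 bytes total.
        let mut out = vec![0x81, 0xec];
        out.extend_from_slice(&imm.to_le_bytes());
        out
    }
}

fn main() {
    assert_eq!(encode_sub_esp_sketch(0x10), vec![0x83, 0xec, 0x10]);
    assert_eq!(
        encode_sub_esp_sketch(0x100),
        vec![0x81, 0xec, 0x00, 0x01, 0x00, 0x00]
    );
}
```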
encoded_jcc_size
Pre-computed byte size of encode_jcc’s output.
encoded_jmp_size
Pre-computed byte size of encode_jmp’s output.
epilogue_roundtrips
format_intel
Format insn as Intel-syntax assembly text, suitable for embedding inside an @asm("...") directive in a .ud file.
identify_call_sites
Walk insns forward, returning every direct call site whose arg-setup we could resolve.
is_function_terminator
Heuristic: does this instruction terminate a function? Returns, unconditional branches (direct or indirect), and the rare hardware traps (int, ud2, etc.) all qualify. Conditional branches don’t — they can still flow into later instructions in the same function.
jcc_cond_code_from_bytes
Extract the jcc condition code (0..=15) from an already-decoded jcc’s opcode bytes. Returns None when bytes isn’t a recognised jcc encoding.
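
On x86 the condition code is the low nibble of the opcode in both jcc encodings: 70..7F for the short rel8 form and 0F 80..0F 8F for the near form. A minimal sketch under that fact follows; the function name is hypothetical, and it ignores any leading prefixes (such as branch hints) that the real helper may or may not handle.

```rust
/// Sketch: extract the condition code from a jcc's opcode bytes.
fn jcc_cond_code_sketch(bytes: &[u8]) -> Option<u8> {
    match bytes {
        // Short form: 70+cc rel8
        [op, ..] if (0x70..=0x7f).contains(op) => Some(op & 0x0f),
        // Near form: 0F 80+cc rel16/32
        [0x0f, op, ..] if (0x80..=0x8f).contains(op) => Some(op & 0x0f),
        _ => None,
    }
}

fn main() {
    assert_eq!(jcc_cond_code_sketch(&[0x74, 0x05]), Some(4)); // je short
    assert_eq!(jcc_cond_code_sketch(&[0x0f, 0x85, 0, 0, 0, 0]), Some(5)); // jne near
    assert_eq!(jcc_cond_code_sketch(&[0xe9, 0, 0, 0, 0]), None); // jmp, not a jcc
}
```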
jcc_cond_code_from_name
Inverse of jcc_cond_name.
jcc_cond_name
Symbolic name for a jcc condition code, lowercase.
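
The name lookup and its inverse can be sketched with a 16-entry table indexed by condition code. The table below uses one common spelling per code (for example "e" rather than "z", "b" rather than "c"); which aliases the crate actually emits or accepts is an assumption, as are the function names.

```rust
/// One conventional lowercase suffix per condition code 0..=15.
const JCC_NAMES: [&str; 16] = [
    "o", "no", "b", "ae", "e", "ne", "be", "a",
    "s", "ns", "p", "np", "l", "ge", "le", "g",
];

/// Sketch: condition code to symbolic name.
fn cond_name_sketch(code: u8) -> Option<&'static str> {
    JCC_NAMES.get(code as usize).copied()
}

/// Sketch: symbolic name back to condition code.
fn cond_code_sketch(name: &str) -> Option<u8> {
    JCC_NAMES.iter().position(|n| *n == name).map(|i| i as u8)
}

fn main() {
    assert_eq!(cond_name_sketch(4), Some("e"));
    assert_eq!(cond_code_sketch("ne"), Some(5));
    // The two functions are inverses over the valid range.
    for code in 0u8..16 {
        let name = cond_name_sketch(code).unwrap();
        assert_eq!(cond_code_sketch(name), Some(code));
    }
}
```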
lift_function
Build a Function CFG from a contiguous stream of decoded x86 instructions starting at entry_addr.
match_local_arith_immediate
If insn is add/sub dword/qword ptr [rbp/ebp+disp], IMM — a stack-frame local being incremented or decremented by a literal — return (slot, op, value) where op is "+=" for add and "-=" for sub.
match_local_compound
If the instruction window starts with a recognised compound stack-slot pattern ([rbp+dst] op= [rbp+src]), return (consumed, dst, op, src).
match_local_set_immediate
If insn is a mov [rbp/ebp+disp], IMM — a stack-frame local being initialised or assigned a literal — return the signed displacement and the immediate value. The displacement honours 32-bit addressing mode (so [ebp-0x8] round-trips as -8, not 0xffff_fff8).
prologue_roundtrips
Verify a structured-form’s round-trip: decode → encode → compare. Returns true when the structured form encodes to exactly the original bytes, meaning the emitter can drop the explicit byte list.
reencode_via_iced
Re-encode the structured instructions through iced’s BlockEncoder.
register
Register the x86 codec factory with ud_arch_codec::registry.
rename_ebp_slot
Rename a frame-pointer memory operand to its source-language slot name. [ebp+N] becomes arg_<hex> and [ebp-N] becomes var_<hex> — the offset (always positive) is the hex part of the name, matching the Ghidra/IDA convention. Bare [ebp] (offset 0) renders as var_0. Anything else (indexed addressing, non-EBP bases, non-memory operands, …) returns None so the caller can fall back to the raw text.
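
The naming rule can be sketched as a small text transformation. The input shape assumed here (a bare bracketed operand such as "[ebp+0x8]" with a 0x-prefixed hex offset and no size keyword) and the function name are assumptions; the real function may accept other operand spellings.

```rust
/// Sketch: map "[ebp+0xN]" to "arg_N", "[ebp-0xN]" to "var_N",
/// and bare "[ebp]" to "var_0". Anything else yields None.
fn rename_ebp_slot_sketch(operand: &str) -> Option<String> {
    let inner = operand.trim().strip_prefix('[')?.strip_suffix(']')?;
    let rest = inner.strip_prefix("ebp")?;
    if rest.is_empty() {
        return Some("var_0".to_string());
    }
    let (prefix, digits) = if let Some(d) = rest.strip_prefix('+') {
        ("arg", d)
    } else if let Some(d) = rest.strip_prefix('-') {
        ("var", d)
    } else {
        return None;
    };
    let hex = digits.strip_prefix("0x")?;
    // Reject indexed forms and anything that is not a plain hex literal.
    if hex.is_empty() || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let offset = u32::from_str_radix(hex, 16).ok()?;
    Some(format!("{prefix}_{offset:x}"))
}

fn main() {
    assert_eq!(rename_ebp_slot_sketch("[ebp+0x8]").as_deref(), Some("arg_8"));
    assert_eq!(rename_ebp_slot_sketch("[ebp-0x10]").as_deref(), Some("var_10"));
    assert_eq!(rename_ebp_slot_sketch("[ebp]").as_deref(), Some("var_0"));
    assert_eq!(rename_ebp_slot_sketch("[eax+0x8]"), None);
}
```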
rename_operand_if_slot
Apply rename_ebp_slot when it matches; otherwise return the input unchanged. Useful as a one-call helper for places that want to rename operands without branching on the result.
rename_operand_in_ctx
Apply rename_operand_with_ctx (which handles both [ebp+N] and [esp+N] with the supplied SP delta) and return the renamed text — or the original input unchanged when no rename rule matches.
rename_operand_with_ctx
Rename a memory operand to its source-language slot name — handling both frame-pointer ([ebp+N]) and stack-pointer ([esp+N], with an optional SP delta context) forms.
render_cond_source
Render a cmp / test + jcc pair as a C-style relational expression evaluated against the source-language if’s body.
roundtrip_bytes
Decode bytes and re-emit via emit_preserved; verify the result equals the input. This is the format-agnostic round-trip property for the x86 backend, and it must hold for every byte sequence we claim to support.
signed_memory_displacement
Read the memory operand’s displacement as a signed 64-bit value, honouring the addressing mode’s actual width.
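
Honouring the addressing width amounts to sign-extending from that width. A minimal sketch, assuming the raw displacement arrives zero-extended in a u64 and the address size is passed as a bit count (both the function name and this parameter shape are hypothetical):

```rust
/// Sketch: sign-extend a raw displacement from the addressing width.
fn signed_disp_sketch(raw: u64, addr_bits: u32) -> i64 {
    match addr_bits {
        16 => raw as u16 as i16 as i64,
        32 => raw as u32 as i32 as i64,
        _ => raw as i64,
    }
}

fn main() {
    // [ebp-0x8] in 32-bit addressing: 0xffff_fff8 reads back as -8.
    assert_eq!(signed_disp_sketch(0xffff_fff8, 32), -8);
    // The same bit pattern under 64-bit addressing is a large positive value.
    assert_eq!(signed_disp_sketch(0xffff_fff8, 64), 0xffff_fff8);
}
```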
sp_change_for
Per-instruction stack-pointer change. Uses iced’s intrinsic stack_pointer_increment for push / pop / call / ret / enter / leave; falls back to operand inspection for sub esp, IMM and add esp, IMM which iced doesn’t compute.
try_lift_epilogue_pattern
Try to recognize the trailing instructions of insns as a stack-frame-tearing-down epilogue:
try_lift_if_branch_head
Try to recognise the trailing two instructions of insns as a cmp (or test) followed by a direct conditional jump.
try_lift_prologue_pattern
Try to recognize the leading instructions of insns as a canonical SysV-x64 prologue at function entry.
try_lift_return_pattern
Try to recognize the trailing instructions of insns as a return with a known integer literal: the canonical SysV-x64 epilogue patterns gcc emits at -O0.
try_lift_return_via_jmp
Try to recognize the trailing instructions of insns as a “return with literal, then jump to a shared epilogue” pattern.
try_lift_value_block
Try to lift the entire instruction sequence as a value-producing block whose final state is “EAX = some expression.”
verify_intel_text
Decode bytes as a single x86 instruction at rip, format it via format_intel, and compare against text (after a light normalization that ignores case and folds whitespace runs).
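
The light normalization mentioned above might look like the following sketch. The function name is hypothetical, and the exact rules the crate applies (for instance, whether leading and trailing whitespace is trimmed, as it is here) are an assumption.

```rust
/// Sketch: fold whitespace runs to single spaces and ignore ASCII case.
fn normalize_asm_sketch(text: &str) -> String {
    text.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_lowercase()
}

fn main() {
    let formatted = "mov eax, 1";
    let user_text = "  MOV   eax,\t1 ";
    // Both sides are normalized before comparison.
    assert_eq!(normalize_asm_sketch(user_text), normalize_asm_sketch(formatted));
}
```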

Type Aliases§

Result