Linux eBPF + Solana SBF (sBPFv1 / sBPFv2) decoder + minimal lifter.
Every BPF “slot” is 8 bytes: a 1-byte opcode, a 1-byte
pair of dst/src nibbles, a signed le16 offset, and a
signed le32 immediate. One special instruction — lddw
(load 64-bit immediate, opcode 0x18) — takes two
consecutive slots: the first carries bits [31:0] in imm,
the second has opcode 0 and bits [63:32] in its imm.
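The slot layout above can be sketched as a small standalone parser. This is only an illustration, not the crate's decoder: `RawSlot`, `parse_slot`, and `lddw_imm` are hypothetical names, and the nibble order (dst in the low nibble, src in the high nibble) follows the little-endian Linux eBPF layout.

```rust
/// Size of one BPF instruction slot in bytes.
const SLOT_SIZE: usize = 8;

/// Hypothetical raw view of one slot (not the crate's `DecodedInsn`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawSlot {
    opcode: u8,
    dst: u8,
    src: u8,
    off: i16,
    imm: i32,
}

/// Split one 8-byte slot into its fields.
fn parse_slot(b: &[u8; SLOT_SIZE]) -> RawSlot {
    RawSlot {
        opcode: b[0],
        dst: b[1] & 0x0f, // low nibble: destination register
        src: b[1] >> 4,   // high nibble: source register
        off: i16::from_le_bytes([b[2], b[3]]),
        imm: i32::from_le_bytes([b[4], b[5], b[6], b[7]]),
    }
}

/// Combine the two slots of an `lddw` into its 64-bit immediate.
/// Returns `None` if the pair is not shaped like an `lddw`.
fn lddw_imm(first: &RawSlot, second: &RawSlot) -> Option<u64> {
    if first.opcode != 0x18 || second.opcode != 0 {
        return None;
    }
    // Go through u32 first so a negative imm is not sign-extended.
    let lo = first.imm as u32 as u64;
    let hi = second.imm as u32 as u64;
    Some((hi << 32) | lo)
}
```

The casts through `u32` matter: widening a negative `i32` straight to `u64` would smear sign bits into the upper half of the immediate.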
Solana SBF (classic / sBPFv1) and Agave sBPFv2 reuse the same encoding with a handful of extra opcodes:
- CALL_REG (0x8d): register-indexed dynamic call (added in sBPFv1).
- UDIV/SDIV/UREM/SREM PQR variants: sBPFv2 dedicated division/remainder ops (the Linux eBPF opcodes for these slots mean different things or are absent).
- Explicit sign-extends (SXH/SXW/SXD): sBPFv2.
The decoder is variant-gated. Opcodes we know the mnemonic
for in the configured variant emit InsnKind::* with a
readable text rendering; opcodes we don’t recognise emit
InsnKind::Unknown and the raw 8 bytes are preserved
verbatim — the round-trip property holds via byte identity
regardless of whether we can name the instruction.
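A minimal sketch of what variant gating with byte-identity round-tripping can look like. Every name here (`Variant`, `Kind`, `Slot`, `classify_slot`, `encode_slot`) is hypothetical and far smaller than the real `BpfVariant` / `InsnKind` / `DecodedInsn`. It assumes `CALL_REG` (0x8d) is named only for the two Solana variants and that `ja` (0x05) is named everywhere.

```rust
/// Hypothetical variant selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Variant {
    LinuxEbpf,
    SbpfV1,
    SbpfV2,
}

/// Hypothetical coarse classification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Ja,
    CallReg,
    Unknown,
}

/// A classified slot that always keeps its raw bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Slot {
    kind: Kind,
    raw: [u8; 8],
}

/// Name the opcode if the configured variant knows it, else `Unknown`.
fn classify_slot(raw: [u8; 8], variant: Variant) -> Slot {
    let kind = match (raw[0], variant) {
        (0x05, _) => Kind::Ja,
        (0x8d, Variant::SbpfV1) | (0x8d, Variant::SbpfV2) => Kind::CallReg,
        _ => Kind::Unknown,
    };
    Slot { kind, raw }
}

/// Re-encoding is byte identity, so it cannot fail on unknown opcodes.
fn encode_slot(slot: &Slot) -> [u8; 8] {
    slot.raw
}
```

Because the raw bytes travel with every slot, re-encoding never depends on the classification being right, which is what lets the round-trip hold for opcodes the variant cannot name.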
References:
- Linux Kernel — eBPF Instruction Set, v6.5 docs.
- solana_rbpf — text format and SBF-specific opcode set.
Structs
- BpfCodec
- One codec per BPF variant.
- DecodedInsn
- One decoded BPF slot.
Enums
- AssembleError
- Errors the assembler surfaces. Each one points at the specific shape that failed to parse / encode, so the decompile-time byte-drop pass can keep bytes pinned for the lines we can’t yet handle (typically symbolic forms).
- BpfVariant
- Variant selector. The bytes for shared opcodes are identical across variants; the variant only changes which opcodes we know the mnemonic for and which ones are legal per the runtime that consumes the bytecode.
- Error
- Errors specific to the BPF backend.
- InsnKind
- Coarse classification, enough to drive CFG construction and to pick the text rendering. The variant-specific mnemonic (e.g. udiv64 vs udiv32 for sBPFv2) is derived from the raw opcode byte at format time; we don’t carry a separate mnemonic field on DecodedInsn.
Constants
- EM_BPF
- EM_BPF from the ELF spec (Linux eBPF).
- EM_SBF
- EM_SBF from Solana’s ELF extension (sBPFv1 / sBPFv2; variant distinction needs e_flags).
- INSN_SIZE
- On-disk size of one BPF instruction slot.
Functions
- assemble_bpf
- Assemble one BPF instruction text into its 8-byte slot encoding. The address argument is currently unused: BPF branch offsets are encoded as slot-relative i16 values taken directly from the text, so the assembler doesn’t need to know where in the function it lives.
- assemble_bpf_ifblock_cond
- Like [parse_int] but accepts a leading - so callers can pass signed slot counts (used by the desymbolised call_internal form, whose imm may be negative when calling a function earlier in the section). Encode the jcc instruction that drives an ifblock/whileblock’s framing.
- assemble_bpf_ja
- Convenience: encode ja +offset / ja -offset. Used for then_tail_jmp (jumps over an else body) and tail_bytes (back-edge of a while loop). Always 8 bytes.
- call_target
- Compute the absolute byte-address target of a call <imm> instruction for a local call. The imm field on a BPF call is a signed slot offset relative to the next slot.
- classify
- Pure re-classifier: re-derives kind from opcode + the configured variant. Useful when something wants to re-walk a slice of decoded slots after the fact (matches the classify contract from other arch crates).
- decode
- Decode bytes as a BPF instruction stream starting at virtual address start. Buffer length must be a multiple of INSN_SIZE. The decoder recognises lddw (opcode 0x18) and emits two DecodedInsns for it (one Lddw carrying the 64-bit immediate, plus a LddwSecondHalf continuation) so each output DecodedInsn still has exactly 8 bytes.
- desymbolize_bpf_text
- Convert a symbolic BPF @asm text (the form crates/ud-translate/src/decompile/bpf.rs produces after applying label_<hex> and sub_<hex> rewrites) into the numeric form assemble_bpf accepts.
- format_insn
- Render a decoded instruction as text. Matches the solana_rbpf / llvm-objdump dialect closely enough that a reader who knows BPF will recognise everything.
- jump_target
- Compute the absolute byte-address target of a relative jump. BPF offsets are in slots (8 bytes each) and apply to the instruction after this one.
- lift_function
- Lift a decoded instruction stream into a CFG.
- register
- Register the BPF codec factory with ud_arch_codec::registry.
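The offset conventions described for assemble_bpf, assemble_bpf_ja, jump_target, and call_target can be sketched together. This is a rough illustration under stated assumptions, not the crate's code: `encode_fields`, `assemble_ja_text`, and `relative_target` are hypothetical helpers, the text parser only handles the `ja +N` / `ja -N` shape, and 0x05 is the standard `ja` opcode.

```rust
/// Bytes per instruction slot.
const SLOT_BYTES: i64 = 8;

/// Pack the five slot fields into their 8-byte little-endian encoding.
fn encode_fields(opcode: u8, dst: u8, src: u8, off: i16, imm: i32) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[0] = opcode;
    out[1] = (src << 4) | (dst & 0x0f); // src in high nibble, dst in low
    out[2..4].copy_from_slice(&off.to_le_bytes());
    out[4..8].copy_from_slice(&imm.to_le_bytes());
    out
}

/// Encode `ja +N` / `ja -N`. The offset comes straight from the text,
/// so no instruction address is needed.
fn assemble_ja_text(text: &str) -> Option<[u8; 8]> {
    let rest = text.trim().strip_prefix("ja")?;
    // Rust's integer parser accepts an optional leading `+` or `-`.
    let off: i16 = rest.trim().parse().ok()?;
    Some(encode_fields(0x05, 0, 0, off, 0))
}

/// Absolute byte address reached by a slot-relative offset. The offset
/// counts slots from the instruction *after* the one at `insn_addr`.
fn relative_target(insn_addr: u64, slots: i64) -> u64 {
    let delta = (slots + 1) * SLOT_BYTES;
    insn_addr.wrapping_add(delta as u64)
}
```

A jump would pass its i16 offset and a local call its i32 imm, both widened with `i64::from`. An offset of -1 targets the instruction itself, and 0 falls through to the next slot.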