# superh
Parser for the SuperH (SH) instruction set, inspired by unarm. It currently supports the following versions:
- SH1
- SH2
- SH3
- SH4 (including FPU)
## About
- Most of the parser is generated from `isa.yaml` by the `/generator/` module.
- It accepts all 2^16 (65,536) possible SuperH instruction words without errors.
- Unrecognised encodings are returned as `Ins::Word(u16)` and displayed as `.word 0xXXXX`.
- No promises that the output is 100% correct.
  - Some illegal instructions may not be parsed as illegal.
  - Some instructions may not stringify correctly.
  - (more, probably)
- `no_std` compatible: the core crate depends only on `alloc`.
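To illustrate why an exhaustive sweep cannot fail, here is a minimal sketch of the fallback pattern. It is not the crate's generated parser: `Decoded`, `decode` and `count_fallbacks` are hypothetical names, and this toy recognises only `nop` (0x0009) and `rts` (0x000B). Because the `match` ends in a catch-all arm, every one of the 65,536 words maps to some value instead of an error.

```rust
use core::fmt;

/// Hypothetical decoded form; the real crate's `Ins` enum is much larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decoded {
    Nop,
    Rts,
    /// Fallback for any encoding this toy decoder does not recognise.
    Word(u16),
}

/// Total decoder: every `u16` maps to some `Decoded`, never an error.
pub fn decode(word: u16) -> Decoded {
    match word {
        0x0009 => Decoded::Nop,
        0x000B => Decoded::Rts,
        other => Decoded::Word(other),
    }
}

impl fmt::Display for Decoded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Decoded::Nop => f.write_str("nop"),
            Decoded::Rts => f.write_str("rts"),
            // Unknown words render as a data directive, zero-padded to 4 hex digits.
            Decoded::Word(w) => write!(f, ".word {:#06x}", w),
        }
    }
}

/// Decode every possible 16-bit word and count how many hit the fallback.
pub fn count_fallbacks() -> usize {
    (0..=u16::MAX)
        .map(decode)
        .filter(|d| matches!(d, Decoded::Word(_)))
        .count()
}
```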
## Performance
Tested on all 2^16 (65,536) SH instruction words using the `/fuzz/` module.
Flags: `-t <threads>`, `-n <iterations>`, `--pc <hex_addr>` (repeatable).
A differential test against the sh4dis Python reference is also provided:
## Usage
### Parsing one instruction
use ;
let pc = 0;
let options = default;
let ins = parse;
assert_eq!;
assert_eq!;
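The crate's parser is generated from the ISA description, but the core step for a register-to-register form is plain nibble extraction. The sketch below hand-decodes two such forms, `mov Rm,Rn` (`0110nnnnmmmm0011`) and `add Rm,Rn` (`0011nnnnmmmm1100`); `nibbles` and `disasm_rr` are hypothetical helpers, not part of the crate's API.

```rust
/// Split a 16-bit SH word into its four nibbles, high to low.
fn nibbles(word: u16) -> [u8; 4] {
    [
        (word >> 12) as u8 & 0xF,
        (word >> 8) as u8 & 0xF,
        (word >> 4) as u8 & 0xF,
        word as u8 & 0xF,
    ]
}

/// Hypothetical decoder for two register-to-register forms.
/// Layout: [opcode][n = destination][m = source][sub-opcode].
pub fn disasm_rr(word: u16) -> Option<String> {
    let [op, n, m, sub] = nibbles(word);
    let mnemonic = match (op, sub) {
        (0x6, 0x3) => "mov",
        (0x3, 0xC) => "add",
        _ => return None,
    };
    // SH assembly lists the source register first: `mov Rm, Rn`.
    Some(format!("{} r{}, r{}", mnemonic, m, n))
}
```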
PC-relative instructions use the pc argument to compute the effective display address:
use ;
// mov.l @(0x4, pc), r0 at pc = 0x1000
// EA = 0*4 + (0x1000 & !3) + 4 = 0x1004; display offset = EA - pc = 4
let ins = parse;
assert_eq!;
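The effective-address rule from the comment above can be written out directly. This sketch assumes the standard SH encodings `mov.l @(disp,PC),Rn` = `1101nnnndddddddd` and `mov.w @(disp,PC),Rn` = `1001nnnndddddddd`; the function names are hypothetical. Note that `mov.l` masks the low two bits of the PC while `mov.w` does not.

```rust
/// Effective address of `mov.l @(disp,PC),Rn`: round the PC down to a
/// multiple of 4, then add 4 + disp * 4.
pub fn mov_l_pc_ea(pc: u32, word: u16) -> u32 {
    let disp = (word & 0x00FF) as u32;
    (pc & !3).wrapping_add(4).wrapping_add(disp * 4)
}

/// Effective address of `mov.w @(disp,PC),Rn`: no alignment mask,
/// and the displacement is scaled by 2.
pub fn mov_w_pc_ea(pc: u32, word: u16) -> u32 {
    let disp = (word & 0x00FF) as u32;
    pc.wrapping_add(4).wrapping_add(disp * 2)
}

/// Offset a disassembler could print relative to the instruction's own PC.
pub fn display_offset(pc: u32, ea: u32) -> u32 {
    ea.wrapping_sub(pc)
}
```

For example, an instruction at 0x1002 with a zero displacement still loads from 0x1004, so its display offset is 2 rather than 4.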
Unrecognised encodings are returned as Ins::Word rather than an error:
use ;
let ins = parse;
assert_eq!;
assert_eq!;
### Streaming disassembly with `Parser`
`Parser<'a>` is an iterator over `Ins` values. It reads two bytes at a time,
advances the program counter automatically, and handles both big-endian and
little-endian byte order.
use ;
let bytes: & = &; // mov #1, r0 / mov r2, r3
let opts = default;
let mut parser = new;
parser.set_pc;
// parser.pc() points past the last decoded instruction, so subtract 2 to get its address.
while let Some = parser.next
// 8c010000: mov #0x1, r0
// 8c010002: mov r2, r3
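The byte-order handling can be sketched without the crate. `WordStream` and `Endian` below are hypothetical and are not the crate's `Parser`; this sketch yields each word together with its own address, which avoids the subtract-2 step. The sample bytes encode `mov #1, r0` (0xE001) and `mov r2, r3` (0x6323).

```rust
/// Byte order of the instruction stream.
#[derive(Clone, Copy)]
pub enum Endian {
    Big,
    Little,
}

/// Hypothetical stand-in for a streaming parser: yields (address, word)
/// pairs, reading two bytes at a time and advancing the PC by 2.
pub struct WordStream<'a> {
    bytes: &'a [u8],
    pc: u32,
    endian: Endian,
}

impl<'a> WordStream<'a> {
    pub fn new(bytes: &'a [u8], pc: u32, endian: Endian) -> Self {
        Self { bytes, pc, endian }
    }
}

impl<'a> Iterator for WordStream<'a> {
    type Item = (u32, u16);

    fn next(&mut self) -> Option<Self::Item> {
        // Stop when fewer than two bytes remain; a trailing odd byte is ignored.
        if self.bytes.len() < 2 {
            return None;
        }
        let pair = [self.bytes[0], self.bytes[1]];
        self.bytes = &self.bytes[2..];
        let word = match self.endian {
            Endian::Big => u16::from_be_bytes(pair),
            Endian::Little => u16::from_le_bytes(pair),
        };
        let addr = self.pc;
        self.pc = self.pc.wrapping_add(2);
        Some((addr, word))
    }
}
```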
### Register def/use analysis
Every decoded instruction exposes the set of registers it defines (writes) and uses (reads). This is useful for dataflow analysis, JIT compilers, and decompilers.
use ;
let opts = default;
let ins = parse; // add r1, r2
let defs: = ins.defs.iter.copied.collect;
let uses: = ins.uses.iter.copied.collect;
assert!; // r2 is written
assert!; // r1 is read
assert!; // r2 is also read
The `AnyReg` enum covers every register file:

```rust
AnyReg::Gp(Reg)      // r0–r15
AnyReg::Fr(FReg)     // fr0–fr15 (sh4 feature)
AnyReg::Dr(DReg)     // dr0/dr2/…/dr14 (sh4 feature)
AnyReg::Fv(VecReg)   // fv0/fv4/fv8/fv12 (sh4 feature)
AnyReg::Sys(SysReg)  // Sr, Gbr, Vbr, Ssr, Spc, Sgr, Dbr, Pr, Mach, Macl, Fpul, Fpscr, T
```
`SysReg::T` is the condition bit. Every instruction that writes T has it in
`defs()`; every consumer (`bt`, `bf`, `movt`, `addc`, …) has it in `uses()`.
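A cut-down version of this analysis can be sketched for two arithmetic instructions. `Reg` and `def_use` are hypothetical and cover only `add Rm,Rn` (`0011nnnnmmmm1100`) and `addc Rm,Rn` (`0011nnnnmmmm1110`). `addc` shows how the T bit lands in both sets: it adds T as the carry in and writes the carry out back to T.

```rust
use std::collections::BTreeSet;

/// Hypothetical, cut-down register type; the crate's `AnyReg` is richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reg {
    Gp(u8), // r0..r15
    T,      // condition bit in SR
}

/// Returns (defs, uses) for `add` and `addc`, or None for anything else.
pub fn def_use(word: u16) -> Option<(BTreeSet<Reg>, BTreeSet<Reg>)> {
    let n = ((word >> 8) & 0xF) as u8; // destination
    let m = ((word >> 4) & 0xF) as u8; // source
    match (word >> 12, word & 0xF) {
        // add: Rn = Rn + Rm, so Rn is both read and written.
        (0x3, 0xC) => Some((
            BTreeSet::from([Reg::Gp(n)]),
            BTreeSet::from([Reg::Gp(m), Reg::Gp(n)]),
        )),
        // addc: Rn = Rn + Rm + T, and T receives the carry out.
        (0x3, 0xE) => Some((
            BTreeSet::from([Reg::Gp(n), Reg::T]),
            BTreeSet::from([Reg::Gp(m), Reg::Gp(n), Reg::T]),
        )),
        _ => None,
    }
}
```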
### Branch delay slots
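SuperH has delayed branches: the instruction immediately after the branch (the delay slot) always executes before control transfers. This applies to `bra`, `bsr`, `braf`, `bsrf`, `jmp`, `jsr`, `rts`, `rte`, `bt/s` and `bf/s`; plain `bt` and `bf` have no delay slot.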
use ;
let opts = default;
assert!; // bra
assert!; // jmp @r0
assert!; // bt (no delay slot)
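A raw-encoding version of this check might look like the following. The function name is hypothetical and it works on 16-bit words rather than the crate's decoded `Ins`; the bit patterns are the standard SH encodings of the delayed branches.

```rust
/// Returns true if `word` is one of the SH delayed-branch instructions.
/// Encodings (m = register nibble, d = displacement):
///   bra  1010dddddddddddd   bsr  1011dddddddddddd
///   jmp  0100mmmm00101011   jsr  0100mmmm00001011
///   braf 0000mmmm00100011   bsrf 0000mmmm00000011  (SH2 and later)
///   rts  0000000000001011   rte  0000000000101011
///   bt/s 10001101dddddddd   bf/s 10001111dddddddd
/// Plain bt (10001001...) and bf (10001011...) are not delayed.
pub fn has_delay_slot(word: u16) -> bool {
    matches!(word & 0xF000, 0xA000 | 0xB000)
        || matches!(word & 0xF0FF, 0x402B | 0x400B | 0x0023 | 0x0003)
        || matches!(word, 0x000B | 0x002B)
        || matches!(word & 0xFF00, 0x8D00 | 0x8F00)
}
```

With these masks, 0xA000 (`bra`) and 0x402B (`jmp @r0`) report a delay slot, while 0x8900 (`bt`) does not.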
### The `FormatIns` trait
`FormatIns` lets you customise how an instruction is rendered. Implement it on
any type that also implements `core::fmt::Write`.
use Write as _;
use ;
let mut formatter = MyFormatter ;
let ins = parse;
formatter.write_ins.unwrap;
println!; // "mov REG:r2, REG:r3"
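The same pattern can be sketched in plain `core::fmt` terms. `FormatOperands`, `Plain` and `Tagged` are hypothetical; the crate's `FormatIns` methods may be named and shaped differently. The idea carries over: default methods produce the standard text, and an implementor overrides only the hook it cares about (here the register hook) to get output like `mov REG:r2, REG:r3`.

```rust
use core::fmt::{self, Write};

/// Hypothetical formatting trait: default methods render plain operands,
/// and implementors override only the hooks they care about.
pub trait FormatOperands: Write {
    /// Hook for general-purpose registers.
    fn write_gp(&mut self, n: u8) -> fmt::Result {
        write!(self, "r{}", n)
    }

    /// Renders `mov Rm,Rn`, delegating each operand to `write_gp`.
    fn write_mov_rr(&mut self, m: u8, n: u8) -> fmt::Result {
        self.write_str("mov ")?;
        self.write_gp(m)?;
        self.write_str(", ")?;
        self.write_gp(n)
    }
}

/// Uses the default hooks unchanged.
pub struct Plain(pub String);

impl Write for Plain {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.push_str(s);
        Ok(())
    }
}

impl FormatOperands for Plain {}

/// Overrides only the register hook, tagging every register.
pub struct Tagged(pub String);

impl Write for Tagged {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.push_str(s);
        Ok(())
    }
}

impl FormatOperands for Tagged {
    fn write_gp(&mut self, n: u8) -> fmt::Result {
        write!(self, "REG:r{}", n)
    }
}
```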
## Feature flags
SH versions are additive: enabling a higher version implies all lower ones.
| Feature | Enables |
|---|---|
| `sh1` | SH1 instructions (always available) |
| `sh2` | SH1 + SH2 |
| `sh3` | SH1 + SH2 + SH3 |
| `sh4` | SH1 + SH2 + SH3 + SH4 (FPU included) |
The default feature set enables all four. To build a minimal SH1-only binary:
```toml
[dependencies]
superh = { version = "*", default-features = false, features = ["sh1"] }
```
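Because each version implies all lower ones, the versions form a total order, and an availability check reduces to a comparison. `ShVersion`, `available` and `min_version` are hypothetical (the crate selects versions at compile time through Cargo features, not through a runtime enum); `min_version` covers only `nop` (SH1), `braf Rm` (SH2) and `ldtlb` (SH3).

```rust
/// SH versions in ascending order; deriving `Ord` makes
/// "a higher version implies all lower ones" a simple comparison.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ShVersion {
    Sh1,
    Sh2,
    Sh3,
    Sh4,
}

/// Is an instruction introduced in `required` available when `enabled`
/// is the highest version switched on?
pub fn available(enabled: ShVersion, required: ShVersion) -> bool {
    required <= enabled
}

/// Minimum version for a few encodings; None means "not covered here".
pub fn min_version(word: u16) -> Option<ShVersion> {
    match word {
        0x0009 => Some(ShVersion::Sh1),                    // nop
        0x0038 => Some(ShVersion::Sh3),                    // ldtlb
        w if w & 0xF0FF == 0x0023 => Some(ShVersion::Sh2), // braf Rm
        _ => None,
    }
}
```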