§superh
Parser for the SuperH (SH) instruction set, inspired by unarm. It currently supports the following versions:
- SH1
- SH2
- SH3
- SH4 (including FPU)
§About
- Most of the parser is generated from isa.yaml by the /generator/ module.
- It accepts all 2^16 (65536) possible SuperH instruction words without errors.
- Unrecognised encodings are returned as Ins::Word(u16) and displayed as .word 0xXXXX.
- No promises that the output is 100% correct.
  - Some illegal instructions may not be parsed as illegal.
  - Some instructions may not stringify correctly.
  - (more, probably)
- no_std compatible: the core crate depends only on alloc.
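The "every word decodes to something" property above can be illustrated with a toy decoder. This is a minimal sketch, not the crate's implementation: ToyIns, toy_decode and count_fallbacks are hypothetical names, and the toy recognises only nop (0x0009), sending every other word to the fallback variant.

```rust
use std::fmt;

/// Toy instruction type for illustration only; not the crate's `Ins`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyIns {
    /// `nop`, encoded as 0x0009 on SuperH.
    Nop,
    /// Fallback for every encoding this toy decoder does not recognise.
    Word(u16),
}

/// Total decoder: every `u16` maps to some `ToyIns`, so there is no error path.
pub fn toy_decode(word: u16) -> ToyIns {
    match word {
        0x0009 => ToyIns::Nop,
        other => ToyIns::Word(other),
    }
}

impl fmt::Display for ToyIns {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyIns::Nop => f.write_str("nop"),
            // Unrecognised words are shown as raw data, four hex digits wide.
            ToyIns::Word(w) => write!(f, ".word 0x{:04x}", w),
        }
    }
}

/// Decode all 2^16 words and count how many hit the fallback variant.
pub fn count_fallbacks() -> usize {
    (0..=u16::MAX)
        .filter(|&w| matches!(toy_decode(w), ToyIns::Word(_)))
        .count()
}
```

Because the return type has no error case, an exhaustive loop over 0..=u16::MAX is enough to exercise every possible input; in this toy, count_fallbacks() is 65535 since only one encoding is recognised.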
§Performance
Tested on all 2^16 SH instruction words using the /fuzz/ module.
cargo run -p superh-fuzz --release -- parse # exhaustive parse
cargo run -p superh-fuzz --release -- parse_random # random instruction words
cargo run -p superh-fuzz --release -- display # parse + stringify
cargo run -p superh-fuzz --release -- reparse # determinism check
cargo run -p superh-fuzz --release -- defs # def/use analysis
cargo run -p superh-fuzz --release -- uses # use analysis
cargo run -p superh-fuzz --release -- dump # dump all 65536 results to stdout
Flags: -t <threads>, -n <iterations>, --pc <hex_addr> (repeatable).
A differential test against the sh4dis Python reference is also provided:
pip install sh4dis
python3 fuzz/diff_sh4dis.py
§Usage
§Parsing one instruction
use superh::{parse, Ins, Options, Reg};
let pc = 0;
let options = Options::default();
let ins = parse(0x6323, pc, &options);
assert_eq!(
ins,
Ins::MovRmRn {
rn: Reg::R3,
rm: Reg::R2,
}
);
assert_eq!(ins.display(&options).to_string(), "mov r2, r3");
PC-relative instructions use the pc argument to compute the effective display address:
use superh::{parse, Ins, Options};
// mov.l @(0x4, pc), r0 at pc = 0x1000
// EA = 0*4 + (0x1000 & !3) + 4 = 0x1004; display offset = EA - pc = 4
let ins = parse(0xd000, 0x1000, &Options::default());
assert_eq!(ins.display(&Options::default()).to_string(), "mov.l @(0x4, pc), r0");
Unrecognised encodings are returned as Ins::Word rather than an error:
use superh::{parse, Ins, Options};
let ins = parse(0xffff, 0, &Options::default());
assert_eq!(ins, Ins::Word(0xffff));
assert_eq!(ins.display(&Options::default()).to_string(), ".word 0xffff");
§Streaming disassembly with Parser
Parser<'a> is an iterator over Ins values. It reads two bytes at a time,
advances the program counter automatically, and handles both big-endian and
little-endian byte order.
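The two-bytes-at-a-time reading and automatic PC advance can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: ByteOrder and words_with_pc are hypothetical names, and ignoring a trailing odd byte is a choice made for this sketch, not a documented behaviour of Parser.

```rust
/// Hypothetical byte-order selector; the crate's own type is `ParseEndian`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Big,
    Little,
}

/// Split `bytes` into (address, word) pairs, starting at `start_pc`.
/// Each instruction word is 2 bytes, so the address advances by 2 per word.
/// A trailing odd byte is ignored in this sketch.
pub fn words_with_pc(bytes: &[u8], order: ByteOrder, start_pc: u32) -> Vec<(u32, u16)> {
    bytes
        .chunks_exact(2)
        .enumerate()
        .map(|(i, pair)| {
            let raw = [pair[0], pair[1]];
            let word = match order {
                ByteOrder::Big => u16::from_be_bytes(raw),
                ByteOrder::Little => u16::from_le_bytes(raw),
            };
            // Wrapping arithmetic avoids a panic near the top of the address space.
            let addr = start_pc.wrapping_add((i as u32).wrapping_mul(2));
            (addr, word)
        })
        .collect()
}
```

With the bytes [0xe0, 0x01, 0x63, 0x23], big-endian order and a start address of 0x8c01_0000, this yields the words 0xe001 and 0x6323 at addresses 0x8c010000 and 0x8c010002.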
use superh::{Options, ParseEndian, ParseMode, Parser};
let bytes: &[u8] = &[0xe0, 0x01, 0x63, 0x23]; // mov #1, r0 / mov r2, r3
let opts = Options::default();
let mut parser = Parser::new(bytes, ParseMode::Instruction, ParseEndian::Big, opts.clone());
parser.set_pc(0x8c01_0000);
// parser.pc() points past the last decoded instruction, so subtract 2 to get its address.
while let Some(ins) = parser.next() {
println!("{:08x}: {}", parser.pc() - 2, ins.display(&opts));
}
// 8c010000: mov #0x1, r0
// 8c010002: mov r2, r3
§Register def/use analysis
Every decoded instruction exposes the set of registers it defines (writes) and uses (reads). This is useful for dataflow analysis, JIT compilers, and decompilers.
use superh::{parse, AnyReg, Options, Reg};
let opts = Options::default();
let ins = parse(0x321c, 0, &opts); // add r1, r2
let defs: Vec<AnyReg> = ins.defs().iter().copied().collect();
let uses: Vec<AnyReg> = ins.uses().iter().copied().collect();
assert!(defs.contains(&AnyReg::Gp(Reg::R2))); // r2 is written
assert!(uses.contains(&AnyReg::Gp(Reg::R1))); // r1 is read
assert!(uses.contains(&AnyReg::Gp(Reg::R2))); // r2 is also read
The AnyReg enum covers every register file:
AnyReg::Gp(Reg) // r0–r15
AnyReg::Fr(FReg) // fr0–fr15 (sh4 feature)
AnyReg::Dr(DReg) // dr0/dr2/…/dr14 (sh4 feature)
AnyReg::Fv(VecReg) // fv0/fv4/fv8/fv12 (sh4 feature)
AnyReg::Sys(SysReg) // Sr, Gbr, Vbr, Ssr, Spc, Sgr, Dbr, Pr, Mach, Macl, Fpul, Fpscr, T
SysReg::T is the condition bit. Every instruction that writes T has it in
defs(); every consumer (bt, bf, movt, addc, …) has it in uses().
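As one example of the dataflow analysis that def/use sets enable, the sketch below runs backward liveness over straight-line code. It does not use the crate: DefUse, T_BIT, gp and live_in_sets are hypothetical names, and registers are modelled as bits in a u32 (bits 0 to 15 for r0 to r15, bit 16 for the T bit) purely for illustration.

```rust
/// Hypothetical per-instruction summary: bit i set means register i.
/// Bits 0..=15 stand for r0..r15 and bit 16 for the T bit in this sketch.
#[derive(Debug, Clone, Copy)]
pub struct DefUse {
    pub defs: u32,
    pub uses: u32,
}

/// Mask for the T (condition) bit in this sketch's encoding.
pub const T_BIT: u32 = 1 << 16;

/// Mask for general-purpose register `rN` (N in 0..=15).
pub const fn gp(n: u32) -> u32 {
    1 << n
}

/// Backward liveness over straight-line code (no branches followed).
/// Returns the set of registers live immediately before each instruction,
/// given the set that is live after the last one.
pub fn live_in_sets(code: &[DefUse], live_out: u32) -> Vec<u32> {
    let mut live = live_out;
    let mut result = vec![0; code.len()];
    for (i, ins) in code.iter().enumerate().rev() {
        // A definition kills liveness; a use (re)creates it.
        live = (live & !ins.defs) | ins.uses;
        result[i] = live;
    }
    result
}
```

For add r1, r2 modelled as defs = gp(2) and uses = gp(1) | gp(2), both r1 and r2 are live before the instruction whatever is live after it, because r2 is read before it is overwritten.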
§Branch delay slots
Most SuperH branches have a delay slot: the instruction immediately after the branch always executes before the branch takes effect. The plain conditional branches bt and bf are the exception and have no delay slot.
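To make the execution order concrete, here is a toy trace of a program containing one delayed branch. ToyOp and execution_order are hypothetical and unrelated to the crate's types; the model covers only unconditional delayed branches and does not handle a branch placed inside a delay slot.

```rust
/// Toy instruction model for illustration only; not the crate's `Ins`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyOp {
    /// Any non-branch instruction.
    Other,
    /// Unconditional delayed branch to the instruction at this index.
    DelayedBranch(usize),
    /// Stop the trace.
    Halt,
}

/// Return the indices of the instructions in the order they execute.
/// `max_steps` bounds the trace so that loops terminate.
pub fn execution_order(program: &[ToyOp], max_steps: usize) -> Vec<usize> {
    let mut order = Vec::new();
    let mut pc = 0;
    // Target of a branch whose delay slot has not finished yet.
    let mut pending: Option<usize> = None;
    while pc < program.len() && order.len() < max_steps {
        order.push(pc);
        let op = program[pc];
        if op == ToyOp::Halt {
            break;
        }
        if let Some(target) = pending.take() {
            // This instruction was the delay slot; the branch takes effect now.
            pc = target;
        } else if let ToyOp::DelayedBranch(target) = op {
            // Remember the target, but fall through to the delay slot first.
            pending = Some(target);
            pc += 1;
        } else {
            pc += 1;
        }
    }
    order
}
```

For the program [Other, DelayedBranch(4), Other, Other, Halt] the trace is [0, 1, 2, 4]: index 2 runs as the delay slot even though the branch at index 1 jumps over it, and index 3 is skipped.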
use superh::{parse, Options};
let opts = Options::default();
assert!(parse(0xa000, 0, &opts).is_delayed_branch()); // bra
assert!(parse(0x402b, 0, &opts).is_delayed_branch()); // jmp @r0
assert!(!parse(0x8900, 0, &opts).is_delayed_branch()); // bt (no delay slot)
§The FormatIns trait
FormatIns lets you customise how an instruction is rendered. Implement it on
any type that also implements core::fmt::Write.
use std::fmt::Write as _;
use superh::{parse, FormatIns, Options, Reg};
pub struct MyFormatter {
buf: String,
options: Options,
}
impl std::fmt::Write for MyFormatter {
fn write_str(&mut self, s: &str) -> std::fmt::Result {
self.buf.push_str(s);
Ok(())
}
}
impl FormatIns for MyFormatter {
fn options(&self) -> &Options {
&self.options
}
// Override register formatting
fn write_reg(&mut self, reg: Reg) -> core::fmt::Result {
self.write_str("REG:")?;
self.write_str(reg.name())
}
}
let mut formatter = MyFormatter { buf: String::new(), options: Options::default() };
let ins = parse(0x6323, 0, &formatter.options);
formatter.write_ins(&ins).unwrap();
println!("{}", formatter.buf); // "mov REG:r2, REG:r3"
§Feature flags
SH versions are additive: enabling a higher version implies all lower ones.
| Feature | Enables |
|---|---|
| sh1 | SH1 instructions (always available) |
| sh2 | SH1 + SH2 |
| sh3 | SH1 + SH2 + SH3 |
| sh4 | SH1 + SH2 + SH3 + SH4 (FPU included) |
The default feature set enables all four. To build a minimal SH1-only binary:
[dependencies]
superh = { version = "*", default-features = false, features = ["sh1"] }
Structs§
- BranchTarget: The resolved absolute address of a direct (PC-relative) branch instruction.
- DefsUses: Set of registers that an instruction either defines or uses, see crate::Ins::defs and crate::Ins::uses. Each register appears at most once, even when the same register fills two operand slots (e.g. mov.b r0, @(disp, r0)).
- DefsUsesIntoIter
- DisplayIns
- Formatter
- Options: Runtime display/decode options for the disassembler.
- Parser: A streaming SuperH disassembler that implements Iterator<Item = Ins>.
- StringFormatter
- Versions: A bitmask set of Version values, used for runtime instruction filtering.
Enums§
- AnyReg: A register from any register file: GP, FPU, or system.
- DReg
- FReg
- Ins: A decoded SuperH instruction.
- ParseEndian: Byte order for reading instruction words.
- ParseMode: Whether the Parser should decode instructions or raw data.
- Reg
- SysReg: Identifies a system / control / multiply-accumulate register.
- VecReg
- Version: The SuperH version to decode for. Controls which instructions are recognised at runtime.
Traits§
Functions§
- parse: Decode a single 16-bit SuperH instruction word.
- parse_with_discriminant: Re-parse ins as the variant identified by discriminant.