Crate superh

§superh

Parser for the SuperH (SH) instruction set, inspired by unarm. It currently supports the following versions:

  • SH1
  • SH2
  • SH3
  • SH4 (including FPU)

§About

  • Most of the parser is generated from isa.yaml by the /generator/ module.
  • It accepts all 2^16 (65,536) possible SuperH instruction words without errors.
  • Unrecognised encodings are returned as Ins::Word(u16) and displayed as .word 0xXXXX.
  • No promises that the output is 100% correct.
    • Some illegal instructions may not be parsed as illegal.
    • Some instructions may not stringify correctly.
    • (more, probably)
  • no_std compatible — the core crate depends only on alloc.

§Performance

Tested on all 2^16 (65,536) SH instruction words using the /fuzz/ module.

cargo run -p superh-fuzz --release -- parse         # exhaustive parse
cargo run -p superh-fuzz --release -- parse_random  # random instruction words
cargo run -p superh-fuzz --release -- display       # parse + stringify
cargo run -p superh-fuzz --release -- reparse       # determinism check
cargo run -p superh-fuzz --release -- defs          # def/use analysis
cargo run -p superh-fuzz --release -- uses          # use analysis
cargo run -p superh-fuzz --release -- dump          # dump all 65536 results to stdout

Flags: -t <threads>, -n <iterations>, --pc <hex_addr> (repeatable).
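
Because the instruction space is only 65,536 words, an exhaustive check is cheap. The following is a minimal sketch of a determinism check in that spirit. It is not the fuzz module's implementation: `decode_stub` and `count_deterministic` are hypothetical names, and the stub simply renders every word as raw data so that the example has no dependency on the crate.

```rust
/// Hypothetical stand-in for a real decoder: renders every word as raw data.
fn decode_stub(word: u16) -> String {
    format!(".word 0x{:04x}", word)
}

/// Decodes every 16-bit word twice and counts the words whose two
/// renderings agree. A deterministic decoder yields 65,536.
fn count_deterministic<F: Fn(u16) -> String>(decode: F) -> usize {
    (0..=u16::MAX)
        .filter(|&word| decode(word) == decode(word))
        .count()
}
```

Passing a closure keeps the harness independent of any particular decoder, so the stub could be swapped for a function that calls a real parser and stringifies its result.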

A differential test against the sh4dis Python reference is also provided:

pip install sh4dis
python3 fuzz/diff_sh4dis.py

§Usage

§Parsing one instruction

use superh::{parse, Ins, Options, Reg};

let pc = 0;
let options = Options::default();
let ins = parse(0x6323, pc, &options);
assert_eq!(
    ins,
    Ins::MovRmRn {
        rn: Reg::R3,
        rm: Reg::R2,
    }
);
assert_eq!(ins.display(&options).to_string(), "mov r2, r3");

PC-relative instructions use the pc argument to compute the effective display address:

use superh::{parse, Ins, Options};

// mov.l @(0x4, pc), r0  at pc = 0x1000
// EA = 0*4 + (0x1000 & !3) + 4 = 0x1004; display offset = EA - pc = 4
let ins = parse(0xd000, 0x1000, &Options::default());
assert_eq!(ins.display(&Options::default()).to_string(), "mov.l @(0x4, pc), r0");
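
The arithmetic in the comment above can be written out as a standalone function. This is a sketch that follows the formula as stated there (with `pc` being the instruction's own address), not the crate's internal code; `movl_pc_effective_address` is a hypothetical name.

```rust
/// Effective address of `mov.l @(disp, pc), rn` (encoding 1101nnnndddddddd):
/// EA = disp * 4 + (pc & !3) + 4, where `pc` is the instruction's address.
/// Returns None if `word` is not a PC-relative `mov.l`.
fn movl_pc_effective_address(word: u16, pc: u32) -> Option<u32> {
    if word >> 12 != 0xd {
        return None;
    }
    let disp = u32::from(word & 0x00ff); // low 8 bits hold the displacement
    Some((disp * 4).wrapping_add(pc & !3).wrapping_add(4))
}
```

Masking `pc` with `!3` aligns the base to a 4-byte boundary, so an instruction at 0x1002 with a displacement of 1 resolves to 0x1008 rather than 0x100a.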

Unrecognised encodings are returned as Ins::Word rather than an error:

use superh::{parse, Ins, Options};

let ins = parse(0xffff, 0, &Options::default());
assert_eq!(ins, Ins::Word(0xffff));
assert_eq!(ins.display(&Options::default()).to_string(), ".word 0xffff");

§Streaming disassembly with Parser

Parser<'a> is an iterator over Ins values. It reads two bytes at a time, advances the program counter automatically, and handles both big-endian and little-endian byte order.

use superh::{Options, ParseEndian, ParseMode, Parser};

let bytes: &[u8] = &[0xe0, 0x01, 0x63, 0x23]; // mov #1, r0 / mov r2, r3
let opts = Options::default();
let mut parser = Parser::new(bytes, ParseMode::Instruction, ParseEndian::Big, opts.clone());
parser.set_pc(0x8c01_0000);

// parser.pc() points past the last decoded instruction, so subtract 2 to get its address.
while let Some(ins) = parser.next() {
    println!("{:08x}: {}", parser.pc() - 2, ins.display(&opts));
}
// 8c010000: mov #0x1, r0
// 8c010002: mov r2, r3
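
To make the byte-order and program-counter handling concrete, here is a small standalone sketch of the word-reading step. It does not use the crate; `Endian` and `read_words` are hypothetical names chosen to avoid clashing with ParseEndian and Parser, and the handling of a trailing odd byte is an assumption.

```rust
#[derive(Clone, Copy)]
enum Endian {
    Big,
    Little,
}

/// Splits `bytes` into (address, word) pairs, starting at `base_pc` and
/// advancing by 2 per word. A trailing odd byte is ignored (an assumption).
fn read_words(bytes: &[u8], endian: Endian, base_pc: u32) -> Vec<(u32, u16)> {
    bytes
        .chunks_exact(2)
        .enumerate()
        .map(|(i, pair)| {
            let raw = [pair[0], pair[1]];
            let word = match endian {
                Endian::Big => u16::from_be_bytes(raw),
                Endian::Little => u16::from_le_bytes(raw),
            };
            (base_pc.wrapping_add(2 * i as u32), word)
        })
        .collect()
}
```

Here each word is paired with its own address, so no subtraction is needed when printing.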

§Register def/use analysis

Every decoded instruction exposes the set of registers it defines (writes) and uses (reads). This is useful for dataflow analysis, JIT compilers, and decompilers.

use superh::{parse, AnyReg, Options, Reg};

let opts = Options::default();
let ins = parse(0x321c, 0, &opts); // add r1, r2

let defs: Vec<AnyReg> = ins.defs().iter().copied().collect();
let uses: Vec<AnyReg> = ins.uses().iter().copied().collect();

assert!(defs.contains(&AnyReg::Gp(Reg::R2))); // r2 is written
assert!(uses.contains(&AnyReg::Gp(Reg::R1))); // r1 is read
assert!(uses.contains(&AnyReg::Gp(Reg::R2))); // r2 is also read

The AnyReg enum covers every register file:

AnyReg::Gp(Reg)       // r0–r15
AnyReg::Fr(FReg)      // fr0–fr15  (sh4 feature)
AnyReg::Dr(DReg)      // dr0/dr2/…/dr14  (sh4 feature)
AnyReg::Fv(VecReg)    // fv0/fv4/fv8/fv12  (sh4 feature)
AnyReg::Sys(SysReg)   // Sr, Gbr, Vbr, Ssr, Spc, Sgr, Dbr, Pr, Mach, Macl, Fpul, Fpscr, T

SysReg::T is the condition bit. Every instruction that writes T has it in defs(); every consumer (bt, bf, movt, addc, …) has it in uses().
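
As an illustration of what def/use sets enable, the sketch below runs a backward liveness pass over a straight-line block. It is self-contained and does not use the crate's types: `RegSet`, `DefUse`, `reg`, and `live_in` are hypothetical stand-ins, and only r0 to r15 are modelled, as a 16-bit mask.

```rust
/// Registers r0..r15 as a bitmask: bit n set means rn is in the set.
type RegSet = u16;

/// Def/use summary of one instruction (hypothetical stand-in type).
struct DefUse {
    defs: RegSet,
    uses: RegSet,
}

/// Single-register set for rn.
fn reg(n: u8) -> RegSet {
    1u16 << n
}

/// Backward liveness over a straight-line block, applied from the last
/// instruction to the first: live_in = uses | (live_out & !defs).
fn live_in(block: &[DefUse], live_out: RegSet) -> RegSet {
    block
        .iter()
        .rev()
        .fold(live_out, |live, ins| ins.uses | (live & !ins.defs))
}
```

For the block `add r1, r2` followed by `mov r2, r3`, with r3 live on exit, the pass reports r1 and r2 as live on entry: r3 is killed by the `mov`, and r2 is both read and written by the `add`.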

§Branch delay slots

Most SuperH branches have a delay slot: the instruction immediately after the branch always executes before the branch takes effect. Some branches, such as bt, have no delay slot.

use superh::{parse, Options};

let opts = Options::default();
assert!(parse(0xa000, 0, &opts).is_delayed_branch()); // bra
assert!(parse(0x402b, 0, &opts).is_delayed_branch()); // jmp @r0
assert!(!parse(0x8900, 0, &opts).is_delayed_branch()); // bt (no delay slot)
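
For readers who want to see which encodings are involved, the following standalone sketch classifies a raw instruction word by its bit pattern. It is based on the published SuperH encoding tables, not on the crate's source, so it may differ from is_delayed_branch() in edge cases; `has_delay_slot` is a hypothetical name.

```rust
/// Returns true if `word` encodes a branch that has a delay slot.
/// Independent sketch from the SuperH encoding tables; it ignores which
/// SH version introduced each instruction.
fn has_delay_slot(word: u16) -> bool {
    match word >> 12 {
        0xa | 0xb => true,                           // bra, bsr
        0x8 => matches!(word >> 8, 0x8d | 0x8f),     // bt/s, bf/s
        0x4 => matches!(word & 0x00ff, 0x2b | 0x0b), // jmp @rn, jsr @rn
        0x0 => {
            matches!(word & 0x00ff, 0x23 | 0x03)     // braf rn, bsrf rn
                || matches!(word, 0x000b | 0x002b)   // rts, rte
        }
        _ => false,
    }
}
```

Plain bt (0x89xx) and bf (0x8bxx) fall through to `false`, which matches the assertions above.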

§The FormatIns trait

FormatIns lets you customise how an instruction is rendered. Implement it on any type that also implements core::fmt::Write.

use std::fmt::Write as _;
use superh::{parse, FormatIns, Options, Reg};

pub struct MyFormatter {
    buf: String,
    options: Options,
}

impl std::fmt::Write for MyFormatter {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.buf.push_str(s);
        Ok(())
    }
}

impl FormatIns for MyFormatter {
    fn options(&self) -> &Options {
        &self.options
    }

    // Override register formatting
    fn write_reg(&mut self, reg: Reg) -> core::fmt::Result {
        self.write_str("REG:")?;
        self.write_str(reg.name())
    }
}

let mut formatter = MyFormatter { buf: String::new(), options: Options::default() };
let ins = parse(0x6323, 0, &formatter.options);
formatter.write_ins(&ins).unwrap();
println!("{}", formatter.buf); // "mov REG:r2, REG:r3"

§Feature flags

SH versions are additive: enabling a higher version implies all lower ones.

Feature   Enables
sh1       SH1 instructions (always available)
sh2       SH1 + SH2
sh3       SH1 + SH2 + SH3
sh4       SH1 + SH2 + SH3 + SH4 (FPU included)

The default feature set enables all four. To build a minimal SH1-only binary:

[dependencies]
superh = { version = "*", default-features = false, features = ["sh1"] }

Structs§

BranchTarget
The resolved absolute address of a direct (PC-relative) branch instruction.
DefsUses
Set of registers that an instruction either defines or uses, see crate::Ins::defs and crate::Ins::uses. Each register appears at most once, even when the same register fills two operand slots (e.g. mov.b r0, @(disp, r0)).
DefsUsesIntoIter
DisplayIns
Formatter
Options
Runtime display/decode options for the disassembler.
Parser
A streaming SuperH disassembler that implements Iterator<Item = Ins>.
StringFormatter
Versions
A bitmask set of Version values, used for runtime instruction filtering.

Enums§

AnyReg
A register from any register file — GP, FPU, or system.
DReg
FReg
Ins
A decoded SuperH instruction.
ParseEndian
Byte order for reading instruction words.
ParseMode
Whether the Parser should decode instructions or raw data.
Reg
SysReg
Identifies a system / control / multiply-accumulate register.
VecReg
Version
The SuperH version to decode for. Controls which instructions are recognised at runtime.

Traits§

FormatIns
FormatValue

Functions§

parse
Decode a single 16-bit SuperH instruction word.
parse_with_discriminant
Re-parse ins as the variant identified by discriminant.