chipi 0.1.0

A declarative instruction set decoder and disassembler generator

chipi

A declarative instruction decoder generator built around a custom DSL. Define your CPU's instruction encoding in a .chipi file, and chipi generates a decoder and disassembler for you. It interacts seamlessly with Rust types.

An example disassembler for the GameCube CPU and DSP can be found here.

Usage

Add to your Cargo.toml:

[build-dependencies]
chipi = "0.1"

In build.rs:

use std::env;
use std::path::PathBuf;

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    chipi::generate("ppc.chipi", out_dir.join("ppc.rs").to_str().unwrap())
        .expect("failed to generate decoder");
    println!("cargo:rerun-if-changed=ppc.chipi");
}

Then use it:

mod ppc {
    include!(concat!(env!("OUT_DIR"), "/ppc.rs"));
}

match ppc::PpcInstruction::decode(raw) {
    Some(i) => println!("{}", i),    // uses generated Display impl
    None => println!(".long {:#010x}", raw),
};

Note: The signature of the generated decode() function depends on how many units an instruction can span (for example, whether the architecture has variable-length instructions):

  • Single-unit: pub fn decode(opcode: u16) -> Option<Self>
    • Returns instruction only
  • Multi-unit: pub fn decode(units: &[u16]) -> Option<(Self, usize)>
    • Returns an (instruction, unit_count) tuple, where unit_count is the number of units consumed
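
To illustrate how the multi-unit return shape affects calling code, here is a minimal sketch of a disassembly loop. ToyInstr and its decode() are hypothetical, hand-written stand-ins for this example (they are not chipi output); only the Option<(Self, usize)> shape is taken from the signature above. Skipping one unit on a failed decode is also just one possible policy.

```rust
/// Hypothetical hand-written stand-in for a generated multi-unit decoder.
#[derive(Debug, PartialEq)]
enum ToyInstr {
    Nop,       // one unit:  0x0000
    Call(u16), // two units: 0x02BF followed by the target address
}

impl ToyInstr {
    /// Same shape as the multi-unit signature: instruction plus units consumed.
    fn decode(units: &[u16]) -> Option<(Self, usize)> {
        match *units.first()? {
            0x0000 => Some((ToyInstr::Nop, 1)),
            0x02BF => Some((ToyInstr::Call(*units.get(1)?), 2)),
            _ => None,
        }
    }
}

/// Walks `code`, advancing by the reported unit count. Units that do not
/// decode are recorded as `None` and skipped one at a time.
fn decode_all(code: &[u16]) -> Vec<Option<ToyInstr>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < code.len() {
        match ToyInstr::decode(&code[pos..]) {
            Some((instr, used)) => {
                out.push(Some(instr));
                pos += used;
            }
            None => {
                out.push(None);
                pos += 1;
            }
        }
    }
    out
}
```

With a single-unit decoder the loop would simply advance by one unit per instruction, since there is no unit count to consult.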

The generated Display impl uses format lines defined in the DSL. You can also override formatting per-instruction by implementing the generated trait:

struct MyFormat;
impl ppc::PpcFormat for MyFormat {
    fn fmt_bx(li: i32, aa: bool, lk: bool, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "BRANCH {:#x}", li)
    }
}

// Use custom formatting
println!("{}", instr.display::<MyFormat>());

DSL

Create a .chipi file describing your instruction set:

import crate::cpu::Register

decoder Ppc {
    width = 32
    bit_order = msb0
}

type reg = u8 as Register
type simm16 = i32 { sign_extend(16) }
type simm24 = i32 { sign_extend(24), shift_left(2) }

# Branch
bx      [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
        | "b{lk ? l}{aa ? a} {li:#x}"

# Arithmetic
addi    [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
        | "addi {rd}, {ra}, {simm}"

  • The decoder block sets the instruction width (in bits) and bit ordering
  • Each line defines an instruction: a name, fixed-bit patterns for matching, and named fields to extract
  • Fields have a name, a type (u8, u16, ...), and a bit range
  • Fixed bits use [range]=value syntax
  • Format lines start with | and define disassembly output
  • Comments start with #

Bit ordering

  • msb0: position 0 is the most significant bit
  • lsb0: position 0 is the least significant bit
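
As a rough illustration of the two numbering schemes, the sketch below extracts the same fields from a 32-bit word under each convention. The helper names are made up for this example and do not reflect chipi's generated code. The lsb0 helper takes explicit lo/hi arguments because the lsb0 range syntax is not shown in this document.

```rust
/// Extracts bits `start..=end` of `word` using msb0 numbering
/// (position 0 is the most significant bit of the 32-bit word).
fn extract_msb0(word: u32, start: u32, end: u32) -> u32 {
    assert!(start <= end && end < 32);
    let len = end - start + 1;
    let shift = 31 - end; // distance of the field's last bit from bit 31
    let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
    (word >> shift) & mask
}

/// Extracts bits `lo..=hi` of `word` using lsb0 numbering
/// (position 0 is the least significant bit).
fn extract_lsb0(word: u32, lo: u32, hi: u32) -> u32 {
    assert!(lo <= hi && hi < 32);
    let len = hi - lo + 1;
    let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
    (word >> lo) & mask
}
```

For the word 0x38610010, extract_msb0(word, 0, 5) and extract_lsb0(word, 26, 31) both select the top six bits, 001110, which is the fixed pattern of the addi definition above.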

Variable-Length Instructions

chipi supports variable-length instructions. When a bit position exceeds width - 1, it implicitly refers to a subsequent unit (one unit is width bits wide). The unit index is computed automatically as bit_position / width, using integer division.

decoder GcDsp {
    width = 16
    bit_order = msb0
    max_units = 2       # optional safety guard
}

# 1-unit instruction: all bits within [0:15]
nop     [0:15]=0000000000000000
        | "nop"

# 2-unit instruction: bits [16:31] are in the second unit
lri     [0:10]=00000010000 rd:u5[11:15] imm:u16[16:31]
        | "lri r{rd}, #0x{imm:04x}"

call    [0:15]=0000001010111111 addr:u16[16:31]
        | "call 0x{addr:04x}"
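
The unit-index rule can be sketched in plain Rust as follows. This is an illustrative model only, assuming 16-bit units and msb0 ordering as in the GcDsp decoder above; unit_index and extract_bits are hypothetical names, not part of chipi's generated API.

```rust
/// Index of the unit that holds `bit_position` (bit_position / width).
fn unit_index(bit_position: usize, width: usize) -> usize {
    bit_position / width
}

/// Reads msb0 bits `start..=end` (at most 32 bits) from a stream of
/// 16-bit units. Returns `None` if the range runs past the end of `units`.
fn extract_bits(units: &[u16], start: usize, end: usize) -> Option<u32> {
    const WIDTH: usize = 16;
    let mut value = 0u32;
    for bit in start..=end {
        let unit = *units.get(unit_index(bit, WIDTH))?;
        let offset = bit % WIDTH; // msb0 offset inside the unit
        let b = (unit >> (WIDTH - 1 - offset)) & 1;
        value = (value << 1) | u32::from(b);
    }
    Some(value)
}
```

For the units [0x0203, 0x1234], bits [11:15] yield 3 and bits [16:31] yield 0x1234, matching the rd and imm fields of lri. If the second unit is missing, the extraction returns None, which mirrors why a multi-unit decode can fail on a truncated buffer.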

Optional Safety Guard: max_units

The max_units decoder option acts as a compile-time safety net:

decoder GcDsp {
    width = 16
    bit_order = msb0
    max_units = 2       # enforce maximum instruction length
}

It checks at compile time that no bit range extends beyond max_units * width bits, which helps catch typos in bit positions.
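
The bound being enforced can be written out as a small check. This is only a sketch of the arithmetic, under the assumption that valid bit positions run from 0 to max_units * width - 1; check_range is a hypothetical helper and says nothing about how chipi implements or reports the check.

```rust
/// Returns an error if an inclusive bit range ending at `end_bit` would
/// reach past the last bit allowed by `max_units` units of `width` bits.
fn check_range(end_bit: u32, width: u32, max_units: u32) -> Result<(), String> {
    let limit = max_units * width; // first bit position that is out of bounds
    if end_bit < limit {
        Ok(())
    } else {
        Err(format!(
            "bit {} is outside the {} bits allowed by max_units = {}",
            end_bit, limit, max_units
        ))
    }
}
```

With width = 16 and max_units = 2, a range such as [16:31] passes, while a mistyped [16:41] would be rejected because bit 41 falls in a third unit.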

Custom types

Use type to create type aliases with optional transformations or wrappers:

# Simple alias
type byte = u8

# With transformation
type simm16 = i32 { sign_extend(16) }

# With custom wrapper (must be imported)
type reg = u8 as Register

# Multiple transformations (comma-separated)
type addr = u32 { shift_left(2), zero_extend(32) }

# With display format hint
type simm16 = i32 { sign_extend(16), display(signed_hex) }
type uimm = u16 { display(hex) }

Builtin types:

  • bool: Converts extracted bit to bool
  • u1 through u7: Extracted as u8 in Rust
  • u8, u16, u32: Unsigned integer types
  • i8, i16, i32: Signed integer types

Builtin transformations:

  • sign_extend(n): Sign-extends the extracted value from n bits
  • zero_extend(n): Zero-extends the extracted value from n bits
  • shift_left(n): Shifts the value left by n bits
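
To make the transformations concrete, here is a minimal sketch of sign extension followed by a left shift, as used by the simm24 type. The function names are illustrative, and applying the transformations in the order they are listed is an assumption of this example.

```rust
/// Sign-extends the low `bits` bits of `raw` to a full i32.
fn sign_extend(raw: u32, bits: u32) -> i32 {
    assert!((1..=32).contains(&bits));
    let shift = 32 - bits;
    // Move the field's sign bit into bit 31, then shift back arithmetically.
    ((raw << shift) as i32) >> shift
}

/// Models `type simm24 = i32 { sign_extend(24), shift_left(2) }`,
/// assuming the transformations run left to right.
fn simm24(raw: u32) -> i32 {
    sign_extend(raw, 24) << 2
}
```

A raw 24-bit field of 0x000040 becomes 0x100, and a raw field of 0xFFFFFF (all ones) becomes -4.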

Display formats:

  • display(signed_hex): Formats as signed hex (0x1A, -0x1A, 0)
  • display(hex): Formats as unsigned hex (0x1A, 0)
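
The output shapes listed above could be produced by helpers along these lines. This is a sketch that reproduces the documented examples (0x1A, -0x1A, 0); the helper names are hypothetical and the generated code may differ.

```rust
/// Formats like display(signed_hex): "0x1A", "-0x1A", or "0".
fn signed_hex(v: i32) -> String {
    if v == 0 {
        "0".to_string()
    } else if v < 0 {
        // unsigned_abs avoids overflow for i32::MIN
        format!("-0x{:X}", v.unsigned_abs())
    } else {
        format!("0x{:X}", v)
    }
}

/// Formats like display(hex): "0x1A" or "0".
fn hex(v: u16) -> String {
    if v == 0 {
        "0".to_string()
    } else {
        format!("0x{:X}", v)
    }
}
```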

Format lines

Format lines follow an instruction definition and control how it is displayed. They start with | and contain a quoted format string:

bx  [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
    | "b{lk ? l}{aa ? a} {li:#x}"

This produces b 0x100, bl 0x100, ba 0x100, or bla 0x100 depending on the flag fields.

Field references: {field} inserts a field value. Add a format specifier with {field:#x} (hex), {field:#b} (binary), etc.

Ternary expressions: {field ? text} emits text if the field is nonzero, nothing otherwise. {field ? yes : no} provides an else branch.
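
As a rough mental model, the bx format string "b{lk ? l}{aa ? a} {li:#x}" behaves like the hand-written Rust below. format_bx is a hypothetical function written for this example, not the code chipi emits.

```rust
/// Hand-written equivalent of the format line `"b{lk ? l}{aa ? a} {li:#x}"`.
fn format_bx(li: i32, aa: bool, lk: bool) -> String {
    format!(
        "b{}{} {:#x}",
        if lk { "l" } else { "" }, // {lk ? l}: emit "l" only when lk is set
        if aa { "a" } else { "" }, // {aa ? a}: emit "a" only when aa is set
        li
    )
}
```

Calling format_bx(0x100, false, true) gives "bl 0x100", and setting both flags gives "bla 0x100".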

Arithmetic: {a + b * 4} evaluates inline arithmetic (+, -, *, /, %).

Unary negation: {-field} negates a field value.

addi  [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
      | simm < 0 : "subi {rd}, {ra}, {-simm}"
      | "addi {rd}, {ra}, {simm}"

Builtin functions: {rotate_right(val, amt)} and {rotate_left(val, amt)}.

Guards: Multiple format lines can be used with guard conditions to select different output based on field values:

addi  [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
      | ra == 0: "li {rd}, {simm}"
      | "addi {rd}, {ra}, {simm}"

Guard conditions support ==, !=, <, <=, >, >= and can be joined with , or &&. Guard operands can be field names, integer literals, or arithmetic expressions (sh == 32 - mb). The last format line may omit the guard, in which case it acts as the default.
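
Guarded format lines can be thought of as an if/else chain. The sketch below merges the two addi examples from this section into one hypothetical three-line definition and assumes that guards are tried top to bottom with the first match winning. Registers are modeled as plain u8 values here rather than the Register wrapper.

```rust
/// Hand-written model of:
///   | ra == 0:  "li {rd}, {simm}"
///   | simm < 0: "subi {rd}, {ra}, {-simm}"
///   | "addi {rd}, {ra}, {simm}"
fn format_addi(rd: u8, ra: u8, simm: i32) -> String {
    if ra == 0 {
        format!("li {}, {}", rd, simm)
    } else if simm < 0 {
        // Widen before negating so the negation itself cannot overflow.
        format!("subi {}, {}, {}", rd, ra, -i64::from(simm))
    } else {
        // Unguarded last line: the default.
        format!("addi {}, {}, {}", rd, ra, simm)
    }
}
```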

Escapes: Use \{, \}, \?, \: to emit literal characters.

Maps

Maps define lookup tables for use in format strings:

map spr_name(spr) {
    1 => "xer"
    8 => "lr"
    9 => "ctr"
    _ => "???"
}

mtspr  [0:5]=011111 rs:reg[6:10] spr:u16[11:20] [21:30]=0111010011 [31]=0
       | "mtspr {spr_name(spr)}, {rs}"

Map outputs can also interpolate parameters with {param}, in which case the map returns a String instead of &'static str:

map ea(mode, reg) {
    0, _ => "d{reg}"
    1, _ => "a{reg}"
    _ => "???"
}
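
The difference between the two return types can be seen in hand-written Rust equivalents of the two maps. These functions are an illustration of the idea, not chipi's actual output, and the u16/u8 parameter types are assumptions.

```rust
/// Literal-only outputs: every arm is a string literal, so a
/// `&'static str` is enough.
fn spr_name(spr: u16) -> &'static str {
    match spr {
        1 => "xer",
        8 => "lr",
        9 => "ctr",
        _ => "???",
    }
}

/// Interpolated outputs: the text depends on `reg`, so an owned
/// `String` has to be built at runtime.
fn ea(mode: u8, reg: u8) -> String {
    match (mode, reg) {
        (0, _) => format!("d{}", reg),
        (1, _) => format!("a{}", reg),
        _ => "???".to_string(),
    }
}
```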

Formatting trait

chipi generates a trait (e.g. PpcFormat) with one method per instruction. Each method has a default implementation from the format lines. To override specific instructions, implement the trait on your own struct:

struct MyFormat;
impl ppc::PpcFormat for MyFormat {
    // Override just this one; all others keep their defaults
    fn fmt_addi(rd: &Register, ra: &Register, simm: i32,
                f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "ADDI r{}, r{}, {}", rd, ra, simm)
    }
}

println!("{}", instr.display::<MyFormat>());

Instructions without format lines get a raw fallback: instr_name field1, field2, ....
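
For readers curious how a pattern like display::<MyFormat>() can work, here is a self-contained sketch of a trait with default methods and a display adapter that is generic over the formatter. Every name in it (Instr, Format, DefaultFormat, Shouty, DisplayWith) is hypothetical; it shows the general technique and is not a copy of what chipi generates.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical stand-in for a generated instruction enum.
enum Instr {
    Addi { rd: u8, ra: u8, simm: i32 },
    Nop,
}

/// One method per instruction, each with a default body.
trait Format {
    fn fmt_addi(rd: u8, ra: u8, simm: i32, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "addi r{}, r{}, {}", rd, ra, simm)
    }
    fn fmt_nop(f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "nop")
    }
}

/// Uses every default.
struct DefaultFormat;
impl Format for DefaultFormat {}

/// Overrides a single instruction; fmt_nop keeps its default.
struct Shouty;
impl Format for Shouty {
    fn fmt_addi(rd: u8, ra: u8, simm: i32, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "ADDI r{}, r{}, {}", rd, ra, simm)
    }
}

/// Adapter that remembers which formatter type to dispatch to.
struct DisplayWith<'a, F: Format> {
    instr: &'a Instr,
    _fmt: PhantomData<F>,
}

impl Instr {
    fn display<F: Format>(&self) -> DisplayWith<'_, F> {
        DisplayWith { instr: self, _fmt: PhantomData }
    }
}

impl<F: Format> fmt::Display for DisplayWith<'_, F> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.instr {
            Instr::Addi { rd, ra, simm } => F::fmt_addi(*rd, *ra, *simm, f),
            Instr::Nop => F::fmt_nop(f),
        }
    }
}
```

Because the formatter is a type parameter, the choice is resolved at compile time and an override costs nothing at runtime compared with the default.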

Syntax Highlighting

vscode