chipi
A declarative instruction decoder generator using a custom DSL. Define your CPU's instruction encoding in a .chipi file, and chipi generates a decoder and disassembler for you. Seamless interaction with Rust types.
An example disassembler for GameCube CPU and DSP can be found here.
Usage
Add to your Cargo.toml:
[build-dependencies]
chipi = "0.1.1"
In build.rs:
use std::env;
use std::path::PathBuf;
Then use it:
match decode ;
Note: The signature of the generated decode() function depends on the number of units an instruction can span (e.g. when the architecture has variable-length instructions). Example:
- Single-unit: pub fn decode(opcode: u16) -> Option<Self>
  - Returns the instruction only
- Multi-unit: pub fn decode(units: &[u16]) -> Option<(Self, usize)>
  - Returns an (instruction, unit_count) tuple, where unit_count is the number of units consumed
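The difference matters most in a disassembly loop: with the multi-unit signature, the caller advances by the returned unit count. A minimal sketch, assuming a hand-written stand-in enum `Instr` (hypothetical, not chipi's generated code) that handles only the `nop` and `call` encodings from the GcDsp example in this document:

```rust
// Hypothetical stand-in for a generated multi-unit decoder; illustrative only.
#[derive(Debug, PartialEq)]
enum Instr {
    Nop,
    Call { addr: u16 },
}

impl Instr {
    // Mirrors the multi-unit signature: returns the instruction and the units consumed.
    fn decode(units: &[u16]) -> Option<(Self, usize)> {
        match *units.first()? {
            0x0000 => Some((Instr::Nop, 1)),
            // 0000001010111111 = 0x02BF; the address lives in the second unit.
            0x02BF => Some((Instr::Call { addr: *units.get(1)? }, 2)),
            _ => None,
        }
    }
}

// Decode a whole buffer, advancing by the consumed unit count each time.
// Stops at the first unknown or truncated instruction.
fn decode_all(mut units: &[u16]) -> Vec<Instr> {
    let mut out = Vec::new();
    while let Some((instr, used)) = Instr::decode(units) {
        out.push(instr);
        units = &units[used..];
    }
    out
}
```

With a single-unit decoder the same loop would simply advance by one unit per instruction.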
The generated Display impl uses format lines defined in the DSL. You can also override formatting per-instruction by implementing the generated trait:
;
// Use custom formatting
println!;
DSL
Create a .chipi file describing your instruction set:
import crate::cpu::Register
decoder Ppc {
width = 32
bit_order = msb0
}
type reg = u8 as Register
type simm16 = i32 { sign_extend(16) }
type simm24 = i32 { sign_extend(24), shift_left(2) }
# Branch
bx [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
| "b{lk ? l}{aa ? a} {li:#x}"
# Arithmetic
addi [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
| "addi {rd}, {ra}, {simm}"
- The decoder block sets the instruction width (in bits) and bit ordering
- Each line defines an instruction: a name, fixed-bit patterns for matching, and named fields to extract
- Fields have a name, a type (u8, u16, ...), and a bit range
- Fixed bits use [range]=value syntax
- Format lines start with | and define disassembly output
- Comments start with #
Bit ordering
- msb0: position 0 is the most significant bit
- lsb0: position 0 is the least significant bit
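To make the two numbering schemes concrete, here is a rough sketch of extracting an inclusive bit range from a 32-bit word under each convention. The helper names are hypothetical, and writing lsb0 ranges as [start:end] with start <= end is an assumption made for this illustration, not something the DSL is confirmed to require:

```rust
// Extract bits [start:end] (inclusive) using msb0 numbering:
// position 0 is the most significant bit of the 32-bit word.
fn extract_msb0(word: u32, start: u32, end: u32) -> u32 {
    let len = end - start + 1;
    let shift = 31 - end; // distance of the field's last bit from bit 0 (LSB)
    let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
    (word >> shift) & mask
}

// Extract bits [start:end] (inclusive) using lsb0 numbering:
// position 0 is the least significant bit of the 32-bit word.
fn extract_lsb0(word: u32, start: u32, end: u32) -> u32 {
    let len = end - start + 1;
    let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
    (word >> start) & mask
}
```

For the PowerPC word 0x38600001, `extract_msb0(word, 0, 5)` and `extract_lsb0(word, 26, 31)` both select the same six opcode bits.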
Variable-Length Instructions
chipi supports variable-length instructions. When a bit position exceeds width - 1, it implicitly references a subsequent unit (1 unit = width bits). The unit index is computed automatically as bit_position / width (integer division).
decoder GcDsp {
width = 16
bit_order = msb0
max_units = 2 # optional safety guard
}
# 1-unit instruction: all bits within [0:15]
nop [0:15]=0000000000000000
| "nop"
# 2-unit instruction: bits [16:31] are in the second unit
lri [0:10]=00000010000 rd:u5[11:15] imm:u16[16:31]
| "lri r{rd}, #0x{imm:04x}"
call [0:15]=0000001010111111 addr:u16[16:31]
| "call 0x{addr:04x}"
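The unit-index rule can be sketched in Rust as follows. This is a simplified bit-by-bit illustration for width = 16 and msb0 ordering, with hypothetical helper names; it is not how the generated decoder is necessarily implemented:

```rust
const WIDTH: usize = 16;

// Read one bit at a global msb0 position; the unit index is pos / WIDTH.
// Returns None when the position lies beyond the available units.
fn bit_at(units: &[u16], pos: usize) -> Option<u32> {
    let unit = *units.get(pos / WIDTH)?;
    let offset = pos % WIDTH; // msb0 offset inside that unit
    Some(((unit >> (WIDTH - 1 - offset)) & 1) as u32)
}

// Extract the inclusive msb0 range [start:end], which may reach into later units.
// Intended for fields of at most 32 bits.
fn extract(units: &[u16], start: usize, end: usize) -> Option<u32> {
    let mut value = 0u32;
    for pos in start..=end {
        value = (value << 1) | bit_at(units, pos)?;
    }
    Some(value)
}
```

For `lri r3, #0x1234`, encoded as the units `[0x0203, 0x1234]`, the range [11:15] comes from unit 0 and [16:31] from unit 1.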
Optional Safety Guard: max_units
The max_units decoder option acts as a compile-time safety net:
decoder GcDsp {
width = 16
bit_order = msb0
max_units = 2 # enforce maximum instruction length
}
It ensures at compile time that no bit range extends beyond max_units * width bits, which helps catch typos in bit positions.
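The check amounts to a single comparison. A hedged sketch, assuming the last valid bit position is max_units * width - 1; the function name and error text are hypothetical and do not reflect chipi's actual diagnostics:

```rust
// Hypothetical check: does the inclusive bit range [start:end] fit into
// `max_units` units of `width` bits each?
fn check_range(start: u32, end: u32, width: u32, max_units: u32) -> Result<(), String> {
    let limit = max_units * width; // total number of addressable bits
    if end >= limit {
        return Err(format!(
            "bit range [{start}:{end}] exceeds {max_units} unit(s) of {width} bits"
        ));
    }
    Ok(())
}
```

With width = 16 and max_units = 2, the range [16:31] passes, while a mistyped [16:32] is rejected.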
Custom types
Use type to create type aliases with optional transformations or wrappers:
# Simple alias
type byte = u8
# With transformation
type simm16 = i32 { sign_extend(16) }
# With custom wrapper (must be imported)
type reg = u8 as Register
# Multiple transformations (comma-separated)
type addr = u32 { shift_left(2), zero_extend(32) }
# With display format hint
type simm16 = i32 { sign_extend(16), display(signed_hex) }
type uimm = u16 { display(hex) }
Builtin types:
- bool: Converts the extracted bit to bool
- u1 through u7: Extracted as u8 in Rust
- u8, u16, u32: Unsigned integer types
- i8, i16, i32: Signed integer types
Builtin transformations:
- sign_extend(n): Sign-extends the extracted value from n bits
- zero_extend(n): Zero-extends the extracted value from n bits
- shift_left(n): Shifts the value left by n bits
Display formats:
- display(signed_hex): Formats as signed hex (0x1A, -0x1A, 0)
- display(hex): Formats as unsigned hex (0x1A, 0)
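For readers unfamiliar with these operations, the following sketch shows what sign_extend, a simm24-style transformation chain, and the signed_hex display amount to in plain Rust. The function names are hypothetical, and applying the transformations left to right is an assumption:

```rust
// Sign-extend the low `bits` bits of `value` to an i32 (valid for bits in 1..=32).
fn sign_extend(value: u32, bits: u32) -> i32 {
    let shift = 32 - bits;
    // Move the field's sign bit to bit 31, then shift back arithmetically.
    ((value << shift) as i32) >> shift
}

// A simm24-style chain: sign_extend(24) followed by shift_left(2).
fn simm24(raw: u32) -> i32 {
    sign_extend(raw, 24) << 2
}

// Format in the style of the signed_hex examples: 0x1A, -0x1A, 0.
fn signed_hex(v: i32) -> String {
    match v {
        0 => "0".to_string(),
        v if v < 0 => format!("-0x{:X}", v.unsigned_abs()),
        v => format!("0x{:X}", v),
    }
}
```

A raw 16-bit field of 0xFFFF sign-extends to -1, and a raw 24-bit branch field of 0x40 becomes the byte offset 0x100 after the shift.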
Format lines
Format lines follow an instruction definition and control how it is displayed. They start with | and contain a quoted format string:
bx [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
| "b{lk ? l}{aa ? a} {li:#x}"
This produces b 0x100, bl 0x100, ba 0x100, or bla 0x100 depending on the flag fields.
Field references: {field} inserts a field value. Add a format specifier with {field:#x} (hex), {field:#b} (binary), etc.
Ternary expressions: {field ? text} emits text if the field is nonzero, nothing otherwise. {field ? yes : no} provides an else branch.
Arithmetic: {a + b * 4} evaluates inline arithmetic (+, -, *, /, %).
Unary negation: {-field} negates a field value.
addi [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
| simm < 0 : "subi {rd}, {ra}, {-simm}"
| "addi {rd}, {ra}, {simm}"
Builtin functions: {rotate_right(val, amt)} and {rotate_left(val, amt)}.
Guards: Multiple format lines can be used with guard conditions to select different output based on field values:
addi [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
| ra == 0: "li {rd}, {simm}"
| "addi {rd}, {ra}, {simm}"
Guard conditions support ==, !=, <, <=, >, >= and can be joined with , or &&. Guard operands can be field names, integer literals, or arithmetic expressions (sh == 32 - mb). The last format line may omit the guard (acts as the default).
Escapes: Use \{, \}, \?, \: to emit literal characters.
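The ternary and guard features described above can be read as ordinary conditionals. Below is a hand-written Rust approximation, not the code chipi generates. It assumes guards are tried top to bottom with the first match winning, combines the two addi examples into one function, and prints registers as plain numbers with an "r" prefix in place of the Register type:

```rust
// "b{lk ? l}{aa ? a} {li:#x}": each ternary emits its text only when the flag is set.
fn format_bx(li: i32, aa: bool, lk: bool) -> String {
    format!(
        "b{}{} {:#x}",
        if lk { "l" } else { "" },
        if aa { "a" } else { "" },
        li
    )
}

// Guarded format lines for addi, tried in order; the unguarded line is the default.
fn format_addi(rd: u8, ra: u8, simm: i32) -> String {
    if ra == 0 {
        // | ra == 0: "li {rd}, {simm}"
        format!("li r{rd}, {simm}")
    } else if simm < 0 {
        // | simm < 0 : "subi {rd}, {ra}, {-simm}"
        format!("subi r{rd}, r{ra}, {}", -simm)
    } else {
        // | "addi {rd}, {ra}, {simm}"
        format!("addi r{rd}, r{ra}, {simm}")
    }
}
```

Here `format_bx(0x100, false, true)` yields the `bl 0x100` form mentioned earlier in this section.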
Maps
Maps define lookup tables for use in format strings:
map spr_name(spr) {
1 => "xer"
8 => "lr"
9 => "ctr"
_ => "???"
}
mtspr [0:5]=011111 rs:reg[6:10] spr:u16[11:20] [21:30]=0111010011 [31]=0
| "mtspr {spr_name(spr)}, {rs}"
Map outputs can also interpolate the map's parameters using {param}, in which case the map returns a String instead of &'static str:
map ea(mode, reg) {
0, _ => "d{reg}"
1, _ => "a{reg}"
_ => "???"
}
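Conceptually, each map corresponds to a Rust match. The functions below are a hand-written approximation of the two maps above, meant to illustrate the difference in return type; the parameter types are assumptions and the code chipi actually generates may differ:

```rust
// No interpolation in any arm, so a static string slice suffices.
fn spr_name(spr: u16) -> &'static str {
    match spr {
        1 => "xer",
        8 => "lr",
        9 => "ctr",
        _ => "???",
    }
}

// Arms interpolate {reg}, so the result has to be an owned String.
fn ea(mode: u8, reg: u8) -> String {
    match (mode, reg) {
        (0, _) => format!("d{reg}"),
        (1, _) => format!("a{reg}"),
        _ => "???".to_string(),
    }
}
```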
Formatting trait
chipi generates a trait (e.g. PpcFormat) with one method per instruction. Each method has a default implementation from the format lines. To override specific instructions, implement the trait on your own struct:
;
println!;
Instructions without format lines get a raw fallback: instr_name field1, field2, ....
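The override mechanism relies on Rust's default trait methods. The sketch below uses a hand-written stand-in trait to show the pattern. The method names, parameters, and String return type are assumptions made for illustration and do not reflect the real generated PpcFormat signatures:

```rust
// Hypothetical stand-in for a generated formatting trait: one method per
// instruction, each with a default body derived from the format lines.
trait PpcFormat {
    fn addi(&self, rd: u8, ra: u8, simm: i32) -> String {
        format!("addi r{rd}, r{ra}, {simm}")
    }
    fn nop(&self) -> String {
        "nop".to_string()
    }
}

// Uses every default implementation unchanged.
struct DefaultFormat;
impl PpcFormat for DefaultFormat {}

// Overrides only addi; nop still falls back to the default.
struct UpperFormat;
impl PpcFormat for UpperFormat {
    fn addi(&self, rd: u8, ra: u8, simm: i32) -> String {
        format!("ADDI R{rd}, R{ra}, {simm}")
    }
}
```

Only the overridden method changes behavior, so a custom formatter can stay small even for a large instruction set.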