# chipi


A declarative instruction decoder generator using a custom DSL. Define your CPU's instruction encoding in a `.chipi` file, and chipi generates a decoder and disassembler for you. Decoded fields integrate seamlessly with your own Rust types.

For an example disassembler targeting the GameCube CPU and DSP, see [chipi-gekko](https://github.com/ioncodes/chipi-gekko).

## Usage


Add to your `Cargo.toml`:

```toml
[build-dependencies]
chipi = "0.1.1"
```

In `build.rs`:

```rs
use std::env;
use std::path::PathBuf;

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    chipi::generate("ppc.chipi", out_dir.join("ppc.rs").to_str().unwrap())
        .expect("failed to generate decoder");
    println!("cargo:rerun-if-changed=ppc.chipi");
}
```

Then use it:

```rs
mod ppc {
    include!(concat!(env!("OUT_DIR"), "/ppc.rs"));
}

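fn main() {
    // Example instruction word; in practice this is read from your binary.
    // Assumes the decoder for `width = 32` takes a `u32`.
    let raw: u32 = 0x4800_0101;

    match ppc::PpcInstruction::decode(raw) {
        Some(i) => println!("{}", i),    // uses generated Display impl
        None => println!(".long {:#010x}", raw), // unknown encoding: emit raw data
    }
}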
```

Note: The signature of the generated `decode()` function depends on how many units an instruction can span (for example, when the architecture has variable-length instructions). Example for 16-bit units:
- Single-unit: `pub fn decode(opcode: u16) -> Option<Self>`
  - Returns the `instruction` only
- Multi-unit: `pub fn decode(units: &[u16]) -> Option<(Self, usize)>`
  - Returns an `(instruction, unit_count)` tuple, where `unit_count` is the number of units consumed
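
The multi-unit signature lends itself to a simple disassembly loop. The sketch below is self-contained and does not use chipi: `ToyInstruction` and its hand-written `decode` are hypothetical stand-ins that only mimic the shape of the generated API, and skipping one unit on failure is an assumption about how you might want to recover.

```rust
/// Stand-in for a generated multi-unit instruction type (hypothetical;
/// the real type is generated from your `.chipi` file).
#[derive(Debug, PartialEq)]
enum ToyInstruction {
    Nop,
    Lri { rd: u8, imm: u16 },
}

impl ToyInstruction {
    /// Same shape as the generated multi-unit `decode`: returns the
    /// instruction plus the number of units it consumed.
    fn decode(units: &[u16]) -> Option<(Self, usize)> {
        let first = *units.first()?;
        if first == 0x0000 {
            return Some((ToyInstruction::Nop, 1));
        }
        // The upper 11 bits 00000010000 select `lri`; the low 5 bits are `rd`.
        if first >> 5 == 0b000_0001_0000 {
            let imm = *units.get(1)?; // `lri` needs a second unit
            let rd = (first & 0x1f) as u8;
            return Some((ToyInstruction::Lri { rd, imm }, 2));
        }
        None
    }
}

/// Walk a buffer, advancing by `unit_count` after each successful decode
/// and by a single unit when decoding fails.
fn disassemble(units: &[u16]) -> Vec<Option<ToyInstruction>> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < units.len() {
        match ToyInstruction::decode(&units[pc..]) {
            Some((instr, unit_count)) => {
                out.push(Some(instr));
                pc += unit_count;
            }
            None => {
                out.push(None);
                pc += 1;
            }
        }
    }
    out
}
```

With a real generated decoder you would replace `ToyInstruction::decode` with the generated function and print each entry instead of collecting it.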

The generated `Display` impl uses format lines defined in the DSL. You can also override formatting per-instruction by implementing the generated trait:

```rs
struct MyFormat;
impl ppc::PpcFormat for MyFormat {
    fn fmt_bx(li: i32, aa: bool, lk: bool, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "BRANCH {:#x}", li)
    }
}

// Use custom formatting
println!("{}", instr.display::<MyFormat>());
```

## DSL


Create a `.chipi` file describing your instruction set:

```chipi
import crate::cpu::Register

decoder Ppc {
    width = 32
    bit_order = msb0
}

type reg = u8 as Register
type simm16 = i32 { sign_extend(16) }
type simm24 = i32 { sign_extend(24), shift_left(2) }

# Branch

bx      [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
        | "b{lk ? l}{aa ? a} {li:#x}"

# Arithmetic

addi    [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
        | "addi {rd}, {ra}, {simm}"
```

- The `decoder` block sets the instruction width (in bits) and the bit ordering
- Each line defines an instruction: a name, fixed-bit patterns for matching, and named fields to extract
- Fields are written as `name:type[range]`: a name, a type (`u8`, `u16`, a custom type, ...), and a bit range
- Fixed bits use `[range]=value` syntax
- Format lines start with `|` and define the disassembly output
- Comments start with `#`

### Bit ordering


- `msb0`: position 0 is the most significant bit
- `lsb0`: position 0 is the least significant bit
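
To make the difference concrete, here is a minimal Rust sketch of field extraction under both conventions for a 32-bit word. The helper names are hypothetical and this is not the code chipi generates; in particular, passing the `lsb0` range as (low bit, high bit) is an assumption made for this sketch.

```rust
/// Build a mask with the low `len` bits set (`len` in 1..=32).
fn low_mask(len: u32) -> u32 {
    if len == 32 { u32::MAX } else { (1u32 << len) - 1 }
}

/// Extract bits `start..=end` using msb0 numbering
/// (position 0 is the most significant bit).
fn extract_msb0(word: u32, start: u32, end: u32) -> u32 {
    let len = end - start + 1;
    let shift = 31 - end; // distance of the field's last bit from bit 31
    (word >> shift) & low_mask(len)
}

/// Extract bits `low..=high` using lsb0 numbering
/// (position 0 is the least significant bit).
fn extract_lsb0(word: u32, low: u32, high: u32) -> u32 {
    let len = high - low + 1;
    (word >> low) & low_mask(len)
}
```

For the word `0x48000101`, `extract_msb0(word, 0, 5)` and `extract_lsb0(word, 26, 31)` both select the same six top bits, `0b010010`.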

### Variable-Length Instructions


chipi supports variable-length instructions. When a bit position exceeds `width - 1`, it implicitly refers to a subsequent unit (1 unit = `width` bits). The unit index is computed automatically as `bit_position / width` (integer division).

```chipi
decoder GcDsp {
    width = 16
    bit_order = msb0
    max_units = 2       # optional safety guard
}

# 1-unit instruction: all bits within [0:15]

nop     [0:15]=0000000000000000
        | "nop"

# 2-unit instruction: bits [16:31] are in the second unit

lri     [0:10]=00000010000 rd:u5[11:15] imm:u16[16:31]
        | "lri r{rd}, #0x{imm:04x}"

call    [0:15]=0000001010111111 addr:u16[16:31]
        | "call 0x{addr:04x}"
```
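
The unit arithmetic can be sketched in plain Rust. The helper below is hypothetical (not part of chipi) and assumes 16-bit units with `msb0` ordering, as in the `GcDsp` example above.

```rust
/// Read one bit from a multi-unit instruction, msb0 ordering, 16-bit units.
/// Returns `None` when the position lies past the end of `units`.
fn bit_msb0(units: &[u16], bit_position: usize) -> Option<bool> {
    const WIDTH: usize = 16;
    // The unit index is bit_position / width ...
    let unit = *units.get(bit_position / WIDTH)?;
    // ... and the remainder is the position inside that unit.
    let offset = bit_position % WIDTH;
    Some((unit >> (WIDTH - 1 - offset)) & 1 == 1)
}
```

With `units = [0x0203, 0x8001]`, position 16 is the first (most significant) bit of the second unit and position 31 is its last bit.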

### Optional Safety Guard: `max_units`


The `max_units` decoder option acts as a compile-time safety net:

```chipi
decoder GcDsp {
    width = 16
    bit_order = msb0
    max_units = 2       # enforce maximum instruction length
}
```

It ensures at compile time that no bit range extends past `max_units * width` bits, which helps catch typos in bit positions.
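
Conceptually the check is a single comparison. The function below is only an illustration with a hypothetical name and error message; chipi's actual diagnostics may look different.

```rust
/// Reject a bit range whose last position falls outside `max_units` units.
fn check_bit_range(end_bit: u32, width: u32, max_units: u32) -> Result<(), String> {
    let limit = width * max_units; // total number of addressable bits
    if end_bit < limit {
        Ok(())
    } else {
        Err(format!(
            "bit {} is outside the {} bits allowed by max_units = {}",
            end_bit, limit, max_units
        ))
    }
}
```

With `width = 16` and `max_units = 2`, a range ending at bit 31 passes, while a mistyped `[16:32]` is rejected.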

### Custom types


Use `type` to create type aliases with optional transformations or wrappers:

```chipi
# Simple alias

type byte = u8

# With transformation

type simm16 = i32 { sign_extend(16) }

# With custom wrapper (must be imported)

type reg = u8 as Register

# Multiple transformations (comma-separated)

type addr = u32 { shift_left(2), zero_extend(32) }

# With display format hint

type simm16 = i32 { sign_extend(16), display(signed_hex) }
type uimm = u16 { display(hex) }
```

**Builtin types:**
- `bool`: Converts extracted bit to `bool`
- `u1` through `u7`: Extracted as `u8` in Rust
- `u8`, `u16`, `u32`: Unsigned integer types
- `i8`, `i16`, `i32`: Signed integer types

**Builtin transformations:**
- `sign_extend(n)`: Sign-extends the extracted value from n bits
- `zero_extend(n)`: Zero-extends the extracted value from n bits
- `shift_left(n)`: Shifts the value left by n bits
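
As a rough Rust equivalent of these transformations (a sketch, not the generated code), assuming they are applied in the order listed and that `zero_extend(n)` keeps only the low `n` bits:

```rust
/// Sign-extend the low `bits` bits of `value` (1..=32) to a full `i32`.
fn sign_extend(value: u32, bits: u32) -> i32 {
    assert!((1..=32).contains(&bits));
    let shift = 32 - bits;
    // Move the field's sign bit to bit 31, then shift back arithmetically.
    ((value << shift) as i32) >> shift
}

/// Keep only the low `bits` bits of `value`; upper bits become zero.
fn zero_extend(value: u32, bits: u32) -> u32 {
    if bits >= 32 { value } else { value & ((1u32 << bits) - 1) }
}

/// Mirrors `type simm24 = i32 { sign_extend(24), shift_left(2) }`.
fn decode_simm24(raw: u32) -> i32 {
    sign_extend(raw, 24) << 2
}
```

For example, a raw 24-bit field of `0x40` becomes `0x100`, and an all-ones field (`0xFFFFFF`) becomes `-4`.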

**Display formats:**
- `display(signed_hex)`: Formats as signed hex (`0x1A`, `-0x1A`, `0`)
- `display(hex)`: Formats as unsigned hex (`0x1A`, `0`)
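
The output shapes listed above can be reproduced with a few lines of Rust. These helpers are hypothetical and only mirror the examples shown (uppercase digits, bare `0` for zero); they are not taken from chipi's implementation.

```rust
/// Format like `display(signed_hex)`: `0x1A`, `-0x1A`, or `0`.
fn signed_hex(value: i32) -> String {
    if value == 0 {
        "0".to_string()
    } else if value < 0 {
        // `unsigned_abs` avoids overflow for `i32::MIN`.
        format!("-0x{:X}", value.unsigned_abs())
    } else {
        format!("0x{:X}", value)
    }
}

/// Format like `display(hex)`: `0x1A` or `0`.
fn hex(value: u32) -> String {
    if value == 0 {
        "0".to_string()
    } else {
        format!("0x{:X}", value)
    }
}
```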

### Format lines


Format lines follow an instruction definition and control how it is displayed. They start with `|` and contain a quoted format string:

```chipi
bx  [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
    | "b{lk ? l}{aa ? a} {li:#x}"
```

This produces `b 0x100`, `bl 0x100`, `ba 0x100`, or `bla 0x100` depending on the flag fields.

**Field references:** `{field}` inserts a field value. Add a format specifier with `{field:#x}` (hex), `{field:#b}` (binary), etc.

**Ternary expressions:** `{field ? text}` emits `text` if the field is nonzero, nothing otherwise. `{field ? yes : no}` provides an else branch.
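
In Rust terms, the `bx` format line above behaves roughly like the following hand-written function. This is a sketch of the semantics, not the generated code, and it returns a `String` for simplicity:

```rust
/// Hand-written equivalent of `"b{lk ? l}{aa ? a} {li:#x}"`.
fn fmt_bx(li: i32, aa: bool, lk: bool) -> String {
    format!(
        "b{}{} {:#x}",
        if lk { "l" } else { "" }, // {lk ? l}
        if aa { "a" } else { "" }, // {aa ? a}
        li
    )
}
```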

**Arithmetic:** `{a + b * 4}` evaluates inline arithmetic (`+`, `-`, `*`, `/`, `%`).

**Unary negation:** `{-field}` negates a field value.

```chipi
addi  [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
      | simm < 0 : "subi {rd}, {ra}, {-simm}"
      | "addi {rd}, {ra}, {simm}"
```

**Builtin functions:** `{rotate_right(val, amt)}` and `{rotate_left(val, amt)}`.

**Guards:** Multiple format lines can be used with guard conditions to select different output based on field values:

```chipi
addi  [0:5]=001110 rd:reg[6:10] ra:reg[11:15] simm:simm16[16:31]
      | ra == 0: "li {rd}, {simm}"
      | "addi {rd}, {ra}, {simm}"
```

Guard conditions support `==`, `!=`, `<`, `<=`, `>`, `>=` and can be joined with `,` or `&&`. Guard operands can be field names, integer literals, or arithmetic expressions (`sh == 32 - mb`). The last format line may omit the guard (acts as the default).
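
Assuming guards are tried top to bottom and the first match wins, the `addi` example corresponds to an ordinary `if`/`else` chain. The sketch below uses plain `u8` registers printed with an `r` prefix, which is an assumption made here to keep it self-contained (the real output depends on your `Register` type's `Display` impl):

```rust
/// Hand-written equivalent of the guarded `addi` format lines.
fn fmt_addi(rd: u8, ra: u8, simm: i32) -> String {
    if ra == 0 {
        // | ra == 0: "li {rd}, {simm}"
        format!("li r{}, {}", rd, simm)
    } else {
        // | "addi {rd}, {ra}, {simm}"   (unguarded default)
        format!("addi r{}, r{}, {}", rd, ra, simm)
    }
}
```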

**Escapes:** Use `\{`, `\}`, `\?`, `\:` to emit literal characters.

### Maps


Maps define lookup tables for use in format strings:

```chipi
map spr_name(spr) {
    1 => "xer"
    8 => "lr"
    9 => "ctr"
    _ => "???"
}

mtspr  [0:5]=011111 rs:reg[6:10] spr:u16[11:20] [21:30]=0111010011 [31]=0
       | "mtspr {spr_name(spr)}, {rs}"
```

Map outputs can also interpolate the map's parameters with `{param}`. In that case the map returns a `String` instead of a `&'static str`:

```chipi
map ea(mode, reg) {
    0, _ => "d{reg}"
    1, _ => "a{reg}"
    _ => "???"
}
```
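
One way to picture the two kinds of maps is as plain `match` functions. This is a sketch of what they could lower to, with parameter types chosen arbitrarily for illustration; the generated code may differ.

```rust
/// A map with only literal outputs can return `&'static str`.
fn spr_name(spr: u16) -> &'static str {
    match spr {
        1 => "xer",
        8 => "lr",
        9 => "ctr",
        _ => "???",
    }
}

/// A map that interpolates a parameter has to build a `String`.
fn ea(mode: u8, reg: u8) -> String {
    match (mode, reg) {
        (0, _) => format!("d{}", reg),
        (1, _) => format!("a{}", reg),
        _ => "???".to_string(),
    }
}
```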

### Formatting trait


chipi generates a trait (e.g. `PpcFormat`) with one method per instruction. Each method has a default implementation from the format lines. To override specific instructions, implement the trait on your own struct:

```rs
struct MyFormat;
impl ppc::PpcFormat for MyFormat {
    // Override just this one; all others keep their defaults
    fn fmt_addi(rd: &Register, ra: &Register, simm: i32,
                f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "ADDI r{}, r{}, {}", rd, ra, simm)
    }
}

println!("{}", instr.display::<MyFormat>());
```

Instructions without format lines get a raw fallback: `instr_name field1, field2, ...`.
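
If you are curious how per-instruction overrides can work, the pattern is a trait with default methods plus a small `Display` wrapper that is generic over the formatter. The following is a self-contained toy version; every name in it (`Instruction`, `Format`, `DisplayWith`, `DefaultFormat`, `Upper`) is hypothetical and the code chipi actually generates may be structured differently.

```rust
use std::fmt;
use std::marker::PhantomData;

enum Instruction {
    Addi { rd: u8, ra: u8, simm: i32 },
    Nop,
}

/// One method per instruction, each with a default implementation.
trait Format {
    fn fmt_addi(rd: u8, ra: u8, simm: i32, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "addi {}, {}, {}", rd, ra, simm)
    }
    fn fmt_nop(f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "nop")
    }
}

/// Borrowed wrapper that picks the formatter `F` at compile time.
struct DisplayWith<'a, F: Format> {
    instr: &'a Instruction,
    _format: PhantomData<F>,
}

impl Instruction {
    fn display<F: Format>(&self) -> DisplayWith<'_, F> {
        DisplayWith { instr: self, _format: PhantomData }
    }
}

impl<F: Format> fmt::Display for DisplayWith<'_, F> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Dispatch to the trait method for the decoded variant.
        match *self.instr {
            Instruction::Addi { rd, ra, simm } => F::fmt_addi(rd, ra, simm, f),
            Instruction::Nop => F::fmt_nop(f),
        }
    }
}

/// Uses every default.
struct DefaultFormat;
impl Format for DefaultFormat {}

/// Overrides `addi` only; `nop` keeps its default.
struct Upper;
impl Format for Upper {
    fn fmt_addi(rd: u8, ra: u8, simm: i32, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "ADDI r{}, r{}, {}", rd, ra, simm)
    }
}
```

Because the formatter is a type parameter, choosing a different output style costs nothing at runtime and does not require storing a formatter object.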

## Syntax Highlighting


A VS Code extension for `.chipi` files is available: [chipi-vscode](https://github.com/ioncodes/chipi-vscode).