Crate chipi

§chipi

Generate instruction decoders and disassemblers from .chipi files.

Write your CPU instruction encoding in a simple DSL, and chipi generates the Rust decoder and formatting code for you.

§Usage

Add to Cargo.toml:

[build-dependencies]
chipi = "0.5.3"

Create build.rs:

use std::env;
use std::path::PathBuf;

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    chipi::generate("cpu.chipi", out_dir.join("cpu.rs").to_str().unwrap())
        .expect("failed to generate decoder");
    println!("cargo:rerun-if-changed=cpu.chipi");
}

Use the generated decoder:

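mod cpu {
    include!(concat!(env!("OUT_DIR"), "/cpu.rs"));
}

fn disassemble(data: &[u8]) {
    let mut offset = 0;
    while offset < data.len() {
        // decode() always takes &[u8] and returns (instruction, bytes_consumed)
        match cpu::CpuInstruction::decode(&data[offset..]) {
            Some((instr, bytes)) => {
                println!("{}", instr);
                offset += bytes;
            }
            None => {
                // Stop rather than retrying the same bytes forever
                println!("invalid instruction at offset {offset}");
                break;
            }
        }
    }
}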

§Example .chipi file

decoder Cpu {
    width = 32
    bit_order = msb0
    endian = big
}

type simm16 = i32 { sign_extend(16) }
type simm24 = i32 { sign_extend(24), shift_left(2) }

bx   [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
     | "b{lk ? l}{aa ? a} {li:#x}"

addi [0:5]=001110 rd:u8[6:10] ra:u8[11:15] simm:simm16[16:31]
     | ra == 0: "li {rd}, {simm}"
     | "addi {rd}, {ra}, {simm}"
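To make the guarded format lines concrete, here is a hand-written sketch (not chipi's generated code) of what the addi line means: read a big-endian 32-bit word, extract msb0 bit ranges, sign-extend simm, and pick the li form when ra == 0. The function names bits_msb0 and disasm_addi are hypothetical.

```rust
/// Extract the inclusive bit range [start:end] from a 32-bit word,
/// numbering bits msb0 (bit 0 is the most significant bit).
fn bits_msb0(word: u32, start: u32, end: u32) -> u32 {
    let len = end - start + 1;
    let mask = ((1u64 << len) - 1) as u32;
    (word >> (31 - end)) & mask
}

/// Hand-written equivalent of the addi line and its two format lines.
fn disasm_addi(bytes: [u8; 4]) -> Option<String> {
    let word = u32::from_be_bytes(bytes); // endian = big
    if bits_msb0(word, 0, 5) != 0b001110 {
        return None; // fixed bits [0:5] must be 001110
    }
    let rd = bits_msb0(word, 6, 10) as u8;
    let ra = bits_msb0(word, 11, 15) as u8;
    // simm16 = i32 { sign_extend(16) }
    let simm = bits_msb0(word, 16, 31) as u16 as i16 as i32;
    // Guards are tried in order; the unguarded line is the fallback.
    if ra == 0 {
        Some(format!("li {rd}, {simm}"))
    } else {
        Some(format!("addi {rd}, {ra}, {simm}"))
    }
}

fn main() {
    // rd = 3, ra = 0, simm = -1
    println!("{:?}", disasm_addi(0x3860FFFFu32.to_be_bytes()));
    // rd = 1, ra = 1, simm = -8
    println!("{:?}", disasm_addi(0x3821FFF8u32.to_be_bytes()));
}
```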

§Syntax

§Decoder block

decoder Name {
    width = 32        # 8, 16, or 32 bits
    bit_order = msb0  # msb0 or lsb0
    endian = big      # big or little (default: big)
    max_units = 4     # optional: safety guard (validates bit ranges)
}
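bit_order decides which end of the unit is bit 0. The sketch below shows the same 16-bit value read under both numberings. It assumes lsb0 ranges are written high-to-low, as in the [15:8] wildcard example further down; the helper names are hypothetical.

```rust
fn mask(len: u32) -> u32 {
    ((1u64 << len) - 1) as u32
}

/// msb0: bit 0 is the most significant bit of a `width`-bit unit.
fn extract_msb0(word: u32, width: u32, start: u32, end: u32) -> u32 {
    (word >> (width - 1 - end)) & mask(end - start + 1)
}

/// lsb0: bit 0 is the least significant bit; range written [hi:lo].
fn extract_lsb0(word: u32, hi: u32, lo: u32) -> u32 {
    (word >> lo) & mask(hi - lo + 1)
}

fn main() {
    let unit = 0x8C42;
    // The top byte is [0:7] in msb0 but [15:8] in lsb0.
    println!("{:#x} {:#x}", extract_msb0(unit, 16, 0, 7), extract_lsb0(unit, 15, 8));
    // The bottom byte is [8:15] in msb0 but [7:0] in lsb0.
    println!("{:#x} {:#x}", extract_msb0(unit, 16, 8, 15), extract_lsb0(unit, 7, 0));
}
```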
§Variable-Length Instructions

chipi automatically generates variable-length decoders when you use bit positions beyond width-1. Simply reference subsequent units in your bit ranges:

decoder Dsp {
    width = 16
    bit_order = msb0
    endian = big
    max_units = 2     # Optional safety check: bit positions must stay below 32 (width * max_units)
}

nop    [0:15]=0000000000000000        # 1 unit (16 bits)
lri    [0:10]=00000010000 rd:u5[11:15] imm:u16[16:31]  # 2 units (32 bits)

The generated decode always has the signature: pub fn decode(data: &[u8]) -> Option<(Self, usize)>

It accepts raw bytes and returns the decoded instruction along with the number of bytes consumed.
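A rough hand-written sketch of what a variable-length decoder for the two Dsp lines has to do: read the first big-endian 16-bit unit, decide whether a second unit is needed, and report 2 or 4 bytes consumed. The enum and function here are hypothetical stand-ins, not chipi's output.

```rust
#[derive(Debug, PartialEq)]
enum DspInstruction {
    Nop,
    Lri { rd: u8, imm: u16 },
}

fn decode(data: &[u8]) -> Option<(DspInstruction, usize)> {
    // First unit; returns None if fewer than 2 bytes remain.
    let unit0 = u16::from_be_bytes([*data.get(0)?, *data.get(1)?]);
    if unit0 == 0 {
        return Some((DspInstruction::Nop, 2)); // nop: one unit
    }
    // lri: msb0 bits [0:10] = 00000010000 are the top 11 bits of unit0.
    if unit0 >> 5 == 0b00000010000 {
        // Needs a second unit; returns None if the input is too short.
        let unit1 = u16::from_be_bytes([*data.get(2)?, *data.get(3)?]);
        let rd = (unit0 & 0x1f) as u8; // bits [11:15]
        return Some((DspInstruction::Lri { rd, imm: unit1 }, 4)); // bits [16:31]
    }
    None
}

fn main() {
    let stream: &[u8] = &[0x00, 0x00, 0x02, 0x03, 0x12, 0x34];
    let mut offset = 0;
    while let Some((instr, len)) = decode(&stream[offset..]) {
        println!("{offset}: {instr:?}");
        offset += len;
    }
}
```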

§Instructions

Each instruction is one line with a name, fixed bit patterns, and fields:

add [0:5]=011111 rd:u8[6:10] ra:u8[11:15]

Fixed bits use [range]=pattern. Fields use name:type[range].

§Wildcard Bits

Use ? in bit patterns for bits that can be any value:

# Match when bits [15:8] are 0x8c, bits [7:0] can be anything
clr15   [15:0]=10001100????????
        | "CLR15"

# Mix wildcards with specific bits
nop     [7:4]=0000 [3:0]=????
        | "nop"

Wildcard bits are excluded from the matching mask, so instructions match regardless of the values in those positions. This is useful for reserved or architecturally undefined bits.
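One common way to implement this, sketched here as an illustration rather than chipi's actual code, is to compile each pattern into a (mask, value) pair where ? clears the mask bit, so matching is a single AND and compare.

```rust
/// Turn a pattern such as "10001100????????" into (mask, value).
/// Wildcard bits are 0 in both, so they never affect the comparison.
fn compile_pattern(pattern: &str) -> (u32, u32) {
    let (mut mask, mut value) = (0u32, 0u32);
    for ch in pattern.chars() {
        mask <<= 1;
        value <<= 1;
        match ch {
            '0' => mask |= 1,
            '1' => {
                mask |= 1;
                value |= 1;
            }
            '?' => {} // wildcard: excluded from the mask
            other => panic!("unexpected pattern character {other:?}"),
        }
    }
    (mask, value)
}

fn is_match(word: u32, (mask, value): (u32, u32)) -> bool {
    (word & mask) == value
}

fn main() {
    let clr15 = compile_pattern("10001100????????");
    for word in [0x8C00u32, 0x8CFF, 0x8D00] {
        println!("{word:#06x}: {}", is_match(word, clr15));
    }
}
```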

§Overlapping Patterns

chipi supports overlapping instruction patterns where one pattern is a subset of another. More specific patterns (with more fixed bits) are checked first:

# Generic instruction - matches 0x1X (any value in bits 4-7)
load  [0:3]=0001 reg:u4[4:7]
      | "load r{reg}"

# Specific instruction - matches only 0x1F
load_max [0:3]=0001 [4:7]=1111
         | "load rmax"

The decoder checks load_max first (all eight bits fixed), then falls back to load, whose bits 4-7 are the reg field and can hold any value. This ordering also applies across all units in variable-length decoders.
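The priority rule can be illustrated with a simple linear scan that tries patterns in order of how many bits they fix. This is only a sketch of the ordering; chipi itself builds a decision tree, and the types below are hypothetical.

```rust
struct Pattern {
    name: &'static str,
    mask: u8,  // 1 = fixed bit
    value: u8, // required value of the fixed bits
}

/// Order patterns so the one with the most fixed bits is tried first.
fn load_patterns() -> Vec<Pattern> {
    let mut patterns = vec![
        // load: [0:3]=0001, bits 4-7 are the reg field
        Pattern { name: "load", mask: 0xF0, value: 0x10 },
        // load_max: all 8 bits fixed
        Pattern { name: "load_max", mask: 0xFF, value: 0x1F },
    ];
    patterns.sort_by_key(|p| std::cmp::Reverse(p.mask.count_ones()));
    patterns
}

fn decode(byte: u8, ordered: &[Pattern]) -> Option<&'static str> {
    ordered.iter().find(|p| (byte & p.mask) == p.value).map(|p| p.name)
}

fn main() {
    let ordered = load_patterns();
    for byte in [0x13u8, 0x1F, 0x20] {
        println!("{byte:#04x} -> {:?}", decode(byte, &ordered));
    }
}
```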

§Types

Builtin types:

  • bool (converts bit to true/false)
  • u1 to u7 (maps to u8)
  • u8, u16, u32
  • i8, i16, i32

Custom types:

type simm = i32 { sign_extend(16) }
type reg = u8 as Register

Available transformations:

  • sign_extend(n) - sign extend from n bits
  • zero_extend(n) - zero extend from n bits
  • shift_left(n) - shift left by n bits

Display format hints (controls how the field is printed in format strings):

  • display(signed_hex) - signed hex: 0x1A, -0x1A, 0
  • display(hex) - unsigned hex: 0x1A, 0
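A sketch that reproduces the examples listed for the two display hints, including the bare 0 for zero. The uppercase digits follow the examples above; whether chipi uses upper- or lowercase is an assumption here.

```rust
/// display(signed_hex): 0x1A, -0x1A, 0
fn signed_hex(v: i32) -> String {
    match v {
        0 => "0".to_string(),
        v if v < 0 => format!("-{:#X}", v.unsigned_abs()),
        v => format!("{v:#X}"),
    }
}

/// display(hex): 0x1A, 0
fn hex(v: u32) -> String {
    if v == 0 { "0".to_string() } else { format!("{v:#X}") }
}

fn main() {
    println!("{} {} {}", signed_hex(26), signed_hex(-26), signed_hex(0));
    println!("{} {}", hex(26), hex(0));
}
```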

§Imports

Import Rust types to wrap extracted values:

import crate::cpu::Register
import std::num::Wrapping

§Format lines

Format lines follow an instruction and define its disassembly output:

bx [0:5]=010010 li:simm24[6:29] aa:bool[30] lk:bool[31]
   | "b{lk ? l}{aa ? a} {li:#x}"

Features:

  • {field} - insert field value, with optional format spec: {field:#x}
  • {field ? text} - emit text if the field is nonzero; {field ? yes : no} adds an else branch
  • {a + b * 4} - inline arithmetic (+, -, *, /, %)
  • {-field} - unary negation
  • {map_name(arg)} - call a map lookup
  • {rotate_right(val, amt)} - builtin functions
  • Guards: | ra == 0: "li {rd}, {simm}" - conditional format selection
  • Guard arithmetic: | sh == 32 - mb : "srwi ..." - arithmetic in guard operands
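For reference, the bx format line "b{lk ? l}{aa ? a} {li:#x}" behaves roughly like this hand-written Display impl. The Bx struct is hypothetical, and the sketch assumes {li:#x} follows Rust's own {:#x}, which prints negative i32 values as two's complement.

```rust
use std::fmt;

/// Hypothetical decoded bx fields (li already sign-extended and shifted).
struct Bx {
    li: i32,
    aa: bool,
    lk: bool,
}

impl fmt::Display for Bx {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "b")?;
        if self.lk {
            write!(f, "l")?; // {lk ? l}
        }
        if self.aa {
            write!(f, "a")?; // {aa ? a}
        }
        write!(f, " {:#x}", self.li) // {li:#x}
    }
}

fn main() {
    println!("{}", Bx { li: 0x100, aa: false, lk: true });
    println!("{}", Bx { li: -4, aa: false, lk: false });
}
```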

§Maps

Lookup tables for use in format strings:

map spr_name(spr) {
    1 => "xer"
    8 => "lr"
    9 => "ctr"
    _ => "???"
}
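Conceptually, a map is a match with the _ entry as the catch-all arm. A hand-written sketch of spr_name; a format string such as "mtspr {spr_name(spr)}, {rs}" (hypothetical) would then insert its result.

```rust
/// Hand-written equivalent of `map spr_name(spr)`.
fn spr_name(spr: u32) -> &'static str {
    match spr {
        1 => "xer",
        8 => "lr",
        9 => "ctr",
        _ => "???",
    }
}

fn main() {
    for spr in [1, 8, 9, 272] {
        println!("{spr} -> {}", spr_name(spr));
    }
}
```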

§Formatting trait

chipi generates a {Name}Format trait with one method per instruction. Default implementations come from format lines. Override selectively:

struct MyFormat;
impl cpu::CpuFormat for MyFormat {
    // Override only bx; every other instruction keeps its default format
    fn fmt_bx(li: i32, _aa: bool, _lk: bool,
              f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "BRANCH {:#x}", li)
    }
}

println!("{}", instr.display::<MyFormat>());
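The override pattern works because each trait method has a default body and the formatter is chosen by a type parameter. The self-contained sketch below mirrors that shape; the trait body, DefaultFormat and the Formatted adapter are hypothetical stand-ins, not chipi's generated code.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Stand-in for a generated {Name}Format trait: one method per instruction.
trait CpuFormat {
    fn fmt_bx(li: i32, aa: bool, lk: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let l = if lk { "l" } else { "" };
        let a = if aa { "a" } else { "" };
        write!(f, "b{l}{a} {li:#x}")
    }
}

struct DefaultFormat;
impl CpuFormat for DefaultFormat {}

struct MyFormat;
impl CpuFormat for MyFormat {
    fn fmt_bx(li: i32, _aa: bool, _lk: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "BRANCH {:#x}", li)
    }
}

enum Instr {
    Bx { li: i32, aa: bool, lk: bool },
}

/// Adapter returned by display::<F>(): formats through F's methods.
struct Formatted<'a, F> {
    instr: &'a Instr,
    _fmt: PhantomData<F>,
}

impl Instr {
    fn display<F: CpuFormat>(&self) -> Formatted<'_, F> {
        Formatted { instr: self, _fmt: PhantomData }
    }
}

impl<F: CpuFormat> fmt::Display for Formatted<'_, F> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self.instr {
            Instr::Bx { li, aa, lk } => F::fmt_bx(li, aa, lk, f),
        }
    }
}

fn main() {
    let instr = Instr::Bx { li: 0x40, aa: false, lk: true };
    println!("{}", instr.display::<DefaultFormat>());
    println!("{}", instr.display::<MyFormat>());
}
```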

§Emulator LUT

chipi can generate a function-pointer lookup table for emulator dispatch. Each opcode is routed directly to a handler function via static [Handler; N] arrays derived from the same decision tree.
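The dispatch idea in its simplest form: a static array of function pointers indexed by the primary opcode. This single-level sketch is an illustration only; chipi derives its tables from the decision tree (and may use more than one level, e.g. for extended opcodes), and the Cpu type and handlers here are hypothetical.

```rust
/// Hypothetical emulator context.
struct Cpu {
    gpr: [u32; 32],
}

type Handler = fn(&mut Cpu, u32);

/// addi rD, rA, SIMM: rD = (rA|0) + sign-extended SIMM
fn addi(cpu: &mut Cpu, opcode: u32) {
    let rd = ((opcode >> 21) & 0x1f) as usize;
    let ra = ((opcode >> 16) & 0x1f) as usize;
    let simm = (opcode & 0xffff) as u16 as i16 as i32;
    let base = if ra == 0 { 0 } else { cpu.gpr[ra] };
    cpu.gpr[rd] = base.wrapping_add(simm as u32);
}

fn illegal(_cpu: &mut Cpu, opcode: u32) {
    panic!("illegal opcode {opcode:#010x}");
}

/// Indexed by msb0 bits [0:5], i.e. the top 6 bits of the word.
static PRIMARY: [Handler; 64] = {
    let mut table = [illegal as Handler; 64];
    table[14] = addi;
    table
};

fn dispatch(cpu: &mut Cpu, opcode: u32) {
    PRIMARY[(opcode >> 26) as usize](cpu, opcode)
}

fn main() {
    let mut cpu = Cpu { gpr: [0; 32] };
    cpu.gpr[1] = 100;
    dispatch(&mut cpu, 0x3821FFF8); // addi r1, r1, -8
    println!("r1 = {}", cpu.gpr[1]);
}
```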

§build.rs

Use LutBuilder to configure and emit both the LUT and the handler stubs:

use std::env;
use std::path::PathBuf;

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let manifest = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    let spec = "cpu.chipi";

    let builder = chipi::LutBuilder::new(spec)
        .handler_mod("crate::cpu::interpreter")
        .ctx_type("crate::Cpu");

    // Regenerated every build, stays in sync with the spec
    builder
        .build_lut(out_dir.join("cpu_lut.rs").to_str().unwrap())
        .expect("failed to generate LUT");

    // Written once, hand-edits are never overwritten
    let stubs = manifest.join("src/cpu/interpreter.rs");
    if !stubs.exists() {
        builder.build_stubs(stubs.to_str().unwrap())
            .expect("failed to generate stubs");
    }

    println!("cargo:rerun-if-changed={spec}");
}

§Include and dispatch

// src/cpu.rs
#[allow(dead_code, non_upper_case_globals)]
pub mod lut {
    include!(concat!(env!("OUT_DIR"), "/cpu_lut.rs"));
}

// fetch-decode-execute, where `cpu: &mut crate::Cpu` is the handler context
// and `mem` is your memory bus
let opcode = mem.read_u32(cpu.pc);
cpu.pc = cpu.pc.wrapping_add(4);
crate::cpu::lut::dispatch(cpu, opcode);

§Handler stubs

On the first build, build_stubs writes src/cpu/interpreter.rs with todo!() bodies. Replace each todo!() as you go; the file is never regenerated so hand-edits are safe.

The second parameter type is derived from the spec’s width: u8 (8-bit), u16 (16-bit), or u32 (32-bit).

pub fn addi(_ctx: &mut crate::Cpu, _opcode: u32) { todo!("addi") }
pub fn lwz(_ctx: &mut crate::Cpu, _opcode: u32) { todo!("lwz") }
// ... one fn per instruction

§Grouped handlers with const generics

Use .group() to fold multiple instructions into one handler via a const OP: u32 generic parameter. Each LUT entry is a separate monomorphization.

Provide .lut_mod() so that generated stubs can use the OP_* constants:

chipi::LutBuilder::new("cpu.chipi")
    .handler_mod("crate::cpu::interpreter")
    .ctx_type("crate::Cpu")
    .lut_mod("crate::cpu::lut")
    .group("alu", ["addi", "addis", "ori", "oris"])
    .build_lut(out_dir.join("cpu_lut.rs").to_str().unwrap())?;
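A sketch of the const-generic pattern behind .group(): one generic handler, with the table storing a distinct instantiation per opcode so `match OP` is resolved at compile time. The OP_* values, the Cpu type and the table layout are hypothetical; in chipi the constants come from the module named by .lut_mod().

```rust
struct Cpu {
    gpr: [u32; 32],
}

type Handler = fn(&mut Cpu, u32);

// Hypothetical group constants.
const OP_ADDI: u32 = 0;
const OP_ORI: u32 = 1;

fn alu<const OP: u32>(cpu: &mut Cpu, opcode: u32) {
    let f1 = ((opcode >> 21) & 0x1f) as usize; // bits [6:10]
    let f2 = ((opcode >> 16) & 0x1f) as usize; // bits [11:15]
    let imm = opcode & 0xffff;                 // bits [16:31]
    match OP {
        // addi rD, rA, SIMM: rD = (rA|0) + sign-extended SIMM
        OP_ADDI => {
            let base = if f2 == 0 { 0 } else { cpu.gpr[f2] };
            cpu.gpr[f1] = base.wrapping_add(imm as u16 as i16 as i32 as u32);
        }
        // ori rA, rS, UIMM: rA = rS | zero-extended UIMM
        OP_ORI => cpu.gpr[f2] = cpu.gpr[f1] | imm,
        _ => unreachable!("not in the alu group"),
    }
}

fn illegal(_cpu: &mut Cpu, opcode: u32) {
    panic!("illegal opcode {opcode:#010x}");
}

static PRIMARY: [Handler; 64] = {
    let mut t = [illegal as Handler; 64];
    t[14] = alu::<OP_ADDI>; // addi
    t[24] = alu::<OP_ORI>;  // ori
    t
};

fn dispatch(cpu: &mut Cpu, opcode: u32) {
    PRIMARY[(opcode >> 26) as usize](cpu, opcode)
}

fn main() {
    let mut cpu = Cpu { gpr: [0; 32] };
    cpu.gpr[3] = 0x0F00;
    dispatch(&mut cpu, 0x606400F0); // ori r4, r3, 0xF0
    println!("r4 = {:#x}", cpu.gpr[4]);
}
```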

§Custom instruction wrapper type

Use .instr_type() to replace the raw integer with a richer type. chipi uses it in the generated Handler alias and all stub signatures. .raw_expr() tells chipi how to extract the underlying integer for table indexing; it defaults to "instr.0" for newtype wrappers.

chipi::LutBuilder::new("cpu.chipi")
    .handler_mod("crate::cpu::interpreter")
    .ctx_type("crate::Cpu")
    .instr_type("crate::cpu::Instruction")  // struct Instruction(pub u32)
    // .raw_expr("instr.0")                 // default for newtype wrappers
    .build_lut(out_dir.join("cpu_lut.rs").to_str().unwrap())?;

Generated Handler type and stub signature:

pub type Handler = fn(&mut crate::Cpu, crate::cpu::Instruction);

pub fn addi(_ctx: &mut crate::Cpu, _instr: crate::cpu::Instruction) { todo!("addi") }

§Instruction Type Generation

chipi can auto-generate the instruction newtype with field accessor methods, eliminating the need to hand-write bit extraction code. This is useful when a thin wrapper around the raw instruction word is preferred over full decoding (e.g. in emulators).

§build.rs

Add .build_instr_type() to your LutBuilder chain:

chipi::LutBuilder::new("cpu.chipi")
    .instr_type("crate::cpu::Instruction")
    .build_instr_type(out_dir.join("instruction.rs").to_str().unwrap())?;

§Generated output

Creates a newtype with #[inline] accessor methods for every unique field:

pub struct Instruction(pub u32);

#[rustfmt::skip]
impl Instruction {
    #[inline] pub fn rd(&self) -> u8 { ((self.0 >> 21) & 0x1f) as u8 }
    #[inline] pub fn ra(&self) -> u8 { ((self.0 >> 16) & 0x1f) as u8 }
    #[inline] pub fn simm(&self) -> i32 { ((((self.0 >> 0) & 0xffff) as i32) << 16) >> 16 }
    #[inline] pub fn rc(&self) -> bool { (self.0 & 0x1) != 0 }
    // ... one accessor per unique field across all instructions
}

§Usage

Include the generated file and optionally add custom methods:

// src/cpu/semantics.rs
include!(concat!(env!("OUT_DIR"), "/instruction.rs"));

// Add custom accessors not derivable from the spec
impl Instruction {
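    /// SPR number: the encoding stores the two 5-bit halves of the
    /// spr field swapped (PowerPC), so swap them back
    pub fn spr_decoded(&self) -> u32 {
        let raw = self.spr() as u32; // cast in case spr is declared narrower than u32
        (raw >> 5) | ((raw & 0x1f) << 5)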
    }
}

§Conflict handling

Fields with the same name but different bit ranges across instructions generate separate accessors with bit range suffixes (e.g., d_15_0() and d_11_0()). You can add convenience aliases in a separate impl block if needed.
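For illustration, here are hand-written stand-ins for two conflicting d accessors plus an alias in a separate impl block. The sketch assumes both fields are sign-extended displacements and that the suffix names the lsb0 bit range (d_15_0 is the low 16 bits, d_11_0 the low 12); the d() alias is hypothetical.

```rust
pub struct Instruction(pub u32);

impl Instruction {
    /// 16-bit displacement in bits 15..0, sign-extended.
    #[inline]
    pub fn d_15_0(&self) -> i32 {
        (self.0 & 0xffff) as u16 as i16 as i32
    }

    /// 12-bit displacement in bits 11..0, sign-extended.
    #[inline]
    pub fn d_11_0(&self) -> i32 {
        (((self.0 & 0xfff) as i32) << 20) >> 20
    }
}

// Convenience alias, kept in its own impl block.
impl Instruction {
    #[inline]
    pub fn d(&self) -> i32 {
        self.d_15_0()
    }
}

fn main() {
    let instr = Instruction(0x8061_FFF8); // lwz r3, -8(r1)
    println!("d = {}", instr.d());
}
```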

§API

// Parse and generate decoder from file
chipi::generate("cpu.chipi", "out.rs")?;

// Generate decoder from source string
let code = chipi::generate_from_str(source, "cpu.chipi")?;

// Step-by-step
let def = chipi::parse("cpu.chipi")?;
chipi::emit(&def, "out.rs")?;

// Emulator LUT, simple
// (instr type auto-derived from spec width: u8 / u16 / u32)
chipi::generate_lut("cpu.chipi", "out/lut.rs", "crate::interp", "crate::Cpu")?;
chipi::generate_stubs("cpu.chipi", "src/interp.rs", "crate::Cpu")?; // once only

// Instruction type generation
chipi::generate_instr_type("cpu.chipi", "out/instruction.rs", "Instruction")?;

// Emulator LUT, full control via LutBuilder
chipi::LutBuilder::new("cpu.chipi")
    .handler_mod("crate::cpu::interpreter")
    .ctx_type("crate::Cpu")
    .lut_mod("crate::cpu::lut")              // needed when using groups
    .group("alu", ["addi", "addis"])         // const-generic shared handler
    .instr_type("crate::cpu::Instruction")   // optional wrapper type
    .build_lut("out/lut.rs")?
    .build_instr_type("out/instruction.rs")?;  // generate instruction type

§Modules

  • codegen - Rust code generation from validated definitions and decision trees.
  • error - Error types and reporting for parsing and validation.
  • format_parser - Character-level parser for format string internals.
  • instr_gen - Instruction type generation: produces a newtype with field accessor methods.
  • lut_gen - Function-pointer LUT generation from a validated definition and dispatch tree.
  • parser - DSL parsing for instruction definitions.
  • tree - Decision tree construction for optimal instruction dispatch.
  • types - Core type definitions for the intermediate representation.
  • validate - Semantic validation for instruction definitions.

§Structs

  • LutBuilder - Builder for generating a function-pointer LUT and handler stubs, with optional grouping of instructions under shared const-generic handlers.

§Functions

  • emit - Validate a parsed definition and write generated Rust code to a file.
  • generate - Full pipeline: parse a .chipi file and generate a Rust decoder.
  • generate_from_str - Parse, validate, and generate code from source text. Returns the generated Rust code as a String.
  • generate_instr_type - Generate an instruction newtype with field accessor methods from a .chipi spec.
  • generate_lut - Generate a function-pointer LUT from a .chipi spec file.
  • generate_stubs - Generate handler stub functions for every instruction in a .chipi spec.
  • parse - Parse a .chipi file from a file path and return the decoder definition.
  • parse_str - Parse source text directly without reading from a file.