moongen 0.0.1

moonsharp bytecode types, assembler, disassembler, and static analyzer

moongen provides both

  • a command line program for assembling/disassembling/analyzing moonsharp bytecode
  • a library for interacting with it

CLI

there are three commands

  • moongen asm <path> assembles the assembly format into a bytecode dump
  • moongen disasm <path> disassembles a bytecode dump into the assembly format
  • moongen analyze <path> analyzes a bytecode dump and prints diagnostics for any rules it violates, along with the full path taken to reach each offending instruction

all three

  • accept - as their path, indicating they should read data from stdin
  • emit their results to stdout
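The `-` convention can be sketched in a few lines of Rust. This is a minimal sketch that assumes nothing about moongen's internals; `read_input` is a hypothetical helper, not part of moongen's library API.

```rust
use std::fs;
use std::io::{self, Read};

/// Hypothetical helper (not moongen's API): reads all bytes from stdin
/// when `path` is "-", otherwise reads the named file.
fn read_input(path: &str) -> io::Result<Vec<u8>> {
    if path == "-" {
        let mut buf = Vec::new();
        io::stdin().read_to_end(&mut buf)?;
        Ok(buf)
    } else {
        fs::read(path)
    }
}

fn main() -> io::Result<()> {
    // Usage (hypothetical binary name): `reader -` reads stdin, `reader dump.bin` reads a file.
    match std::env::args().nth(1) {
        Some(path) => {
            let data = read_input(&path)?;
            // Results go to stdout, matching the convention described above.
            println!("read {} bytes from {}", data.len(), path);
        }
        None => eprintln!("usage: reader <path|->"),
    }
    Ok(())
}
```

Treating `-` as a path rather than a separate flag lets the commands be chained in a shell pipeline, e.g. piping `moongen asm` output into `moongen analyze -`.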

assembly format

for an instruction reference, review the Inst documentation

syntax is defined by grammar.pest and has the following format

  • each line may start with a label definition: @ident:
  • each line may have one instruction
    • an instruction name (ident)
    • if the instruction takes addr, one of the following:
      • an integer specifying the instruction address relative to the start of the chunk
      • ~, followed by an integer specifying the instruction address relative to the current instruction
      • @, followed by an ident referring to a label
    • if the instruction takes arg1, an integer
    • if the instruction takes arg2, an integer
    • if the instruction takes name, a string
    • if the instruction takes value, an =, followed by one of the following:
      • null
      • nil
      • void
      • true
      • false
      • a float
      • a string
      • {} (creates an empty table)
    • if the instruction takes symbol, a symbol
    • if the instruction takes symbol_list, [, comma-separated symbols, ]
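The three addr forms differ only in what the integer is measured from. The sketch below models them with a hypothetical `Addr` enum and `resolve` function (neither is moongen's actual API), and assumes a ~ offset is simply added to the address of the instruction that uses it.

```rust
use std::collections::HashMap;

/// Hypothetical model of the three `addr` operand forms (not moongen's types).
#[derive(Debug)]
enum Addr {
    /// `16`: relative to the start of the chunk
    Absolute(i64),
    /// `~11` or `~-3`: relative to the current instruction
    Relative(i64),
    /// `@name`: a label defined elsewhere with `@name:`
    Label(String),
}

/// Resolves an operand to an absolute address, given the address of the
/// instruction that uses it and a table of label addresses.
fn resolve(addr: &Addr, current: i64, labels: &HashMap<String, i64>) -> Result<i64, String> {
    match addr {
        Addr::Absolute(n) => Ok(*n),
        Addr::Relative(off) => Ok(current + off),
        Addr::Label(name) => labels
            .get(name)
            .copied()
            .ok_or_else(|| format!("undefined label @{name}")),
    }
}

fn main() {
    let mut labels = HashMap::new();
    labels.insert("over_greet".to_string(), 16);

    // An instruction at address 5 pointing at address 16, written three ways.
    for a in [Addr::Absolute(16), Addr::Relative(11), Addr::Label("over_greet".into())] {
        println!("{:?} -> {:?}", a, resolve(&a, 5, &labels));
    }
}
```

Keeping labels in a separate table is what allows a label to be referenced before it is defined, as `jmp @over_greet` does in the demonstration below.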

terminology

  • idents follow the regex /[a-zA-Z_][a-zA-Z0-9_]*/
  • integers follow the regex /-?(?:0|[1-9][0-9]*)/
  • floats follow the regex /-?(?:0|[1-9][0-9]*)(?:\.[0-9]*)/
  • strings are either
    • JSON-escaped content wrapped in quotes ("this is a string with \"embedded\" quotes")
    • base64-encoded content wrapped in quotes and prefixed with b (b"dGhpcyBpcyBhIHN0cmluZyB3aXRoICJlbWJlZGRlZCIgcXVvdGVz", useful for binary data)
  • symbols are one of the following:
    • &, symbol name (local name), :, integer (local index)
    • ^, symbol name (upvalue name), :, integer (upvalue index)
    • %, symbol name (global name), :, symbol (global _ENV)
    • env (_ENV symbol)
    • nullref (null symbol)
  • symbol names are one of the following:
    • an ident (name)
    • an ident, @, integer (name + disambiguation)
    • ... (vararg)
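Assuming each regex must match a whole token, the ident, integer, and float rules can be checked without a regex engine, as in this std-only sketch (the function names are hypothetical). Because the float regex's fractional group is not optional, a float always contains a `.`, so `1` is an integer while `1.` and `1.5` are floats; a token with leading zeros such as `007` matches neither.

```rust
/// Whole-token match for [a-zA-Z_][a-zA-Z0-9_]*
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Consumes -?(?:0|[1-9][0-9]*) from the front of `s` and returns the
/// remainder, or None if `s` does not start with an integer.
fn strip_int(s: &str) -> Option<&str> {
    let s = s.strip_prefix('-').unwrap_or(s);
    let bytes = s.as_bytes();
    match bytes.first() {
        // A lone 0 is allowed, but it cannot be followed by more digits.
        Some(b'0') => Some(&s[1..]),
        Some(b'1'..=b'9') => {
            let digits = bytes.iter().take_while(|b| b.is_ascii_digit()).count();
            Some(&s[digits..])
        }
        _ => None,
    }
}

/// Whole-token match for -?(?:0|[1-9][0-9]*)
fn is_integer(s: &str) -> bool {
    strip_int(s) == Some("")
}

/// Whole-token match for -?(?:0|[1-9][0-9]*)(?:\.[0-9]*)
fn is_float(s: &str) -> bool {
    match strip_int(s).and_then(|rest| rest.strip_prefix('.')) {
        // Zero or more digits may follow the dot.
        Some(frac) => frac.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

fn main() {
    for tok in ["greet", "_ENV", "1x", "0", "-12", "007", "1.", "-0.25", ".5"] {
        println!(
            "{tok:>6}: ident={} integer={} float={}",
            is_ident(tok),
            is_integer(tok),
            is_float(tok)
        );
    }
}
```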

full demonstration

#![has_env]
// useful for debugging purposes
meta 25 1 "greeter" =null
// does nothing but is in the function header anyways
fn 0 -1 []

closure @greet []
upv.ld ^_ENV:0
// %greet:^_ENV:0 isn't necessary, but moonsharp emits it anyways
// you can use nullref in its place for index.set
index.set 0 0 ="greet" %greet:^_ENV:0

// moonsharp likes to generate closures by emitting their instructions and jumping over them
// you don't have to do it this way, though (Not Doing That also saves an instruction)
// but this example will do it moonsharp's way
jmp @over_greet
	@greet:
	meta 9 1 "greet" =null
	fn 1 0 [&who:0]
	args [&who:0]

	lit ="hello "
	loc.ld &who:0
	lit ="!"
	op.concat
	op.concat

	ret 1
	// moonsharp also generates unreachable `ret 0`s even when the last instruction in a function is a `ret 1`...
	ret 0
@over_greet:

// indentation isn't significant either way! lay it out however makes the most sense to you
			upv.ld ^_ENV:0
		index ="print"
				upv.ld ^_ENV:0
			index ="greet"
			lit ="dolly"
		call 1 "calling greet"
	call 1 "calling print"
pop 1

ret 0
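To see how labels like @greet and @over_greet could become addresses, the sketch below runs a hypothetical first assembler pass over the demonstration's instructions (comments removed). It assumes every line other than blanks, // comments, #! attributes, and bare label definitions holds exactly one instruction, and that a label on a line of its own names the next instruction; moongen's real assembler may work differently.

```rust
use std::collections::HashMap;

// The demonstration's instructions with comments removed.
const DEMO: &str = r#"
#![has_env]
meta 25 1 "greeter" =null
fn 0 -1 []
closure @greet []
upv.ld ^_ENV:0
index.set 0 0 ="greet" %greet:^_ENV:0
jmp @over_greet
	@greet:
	meta 9 1 "greet" =null
	fn 1 0 [&who:0]
	args [&who:0]
	lit ="hello "
	loc.ld &who:0
	lit ="!"
	op.concat
	op.concat
	ret 1
	ret 0
@over_greet:
upv.ld ^_ENV:0
index ="print"
upv.ld ^_ENV:0
index ="greet"
lit ="dolly"
call 1 "calling greet"
call 1 "calling print"
pop 1
ret 0
"#;

/// Hypothetical first pass: maps each label to the address of the instruction
/// it precedes and returns the total instruction count. Assumes one
/// instruction per line and comments only on lines of their own.
fn label_addresses(src: &str) -> (HashMap<String, usize>, usize) {
    let mut labels = HashMap::new();
    let mut addr = 0;
    for raw in src.lines() {
        let mut line = raw.trim();
        if line.is_empty() || line.starts_with("//") || line.starts_with("#!") {
            continue;
        }
        // A label definition may open the line: `@ident:`
        if let Some(rest) = line.strip_prefix('@') {
            if let Some((name, after)) = rest.split_once(':') {
                labels.insert(name.to_string(), addr);
                line = after.trim();
            }
        }
        if !line.is_empty() {
            addr += 1; // one instruction on this line
        }
    }
    (labels, addr)
}

fn main() {
    let (labels, count) = label_addresses(DEMO);
    println!(
        "@greet = {}, @over_greet = {}, {} instructions",
        labels["greet"], labels["over_greet"], count
    );
}
```

Under those assumptions @greet is address 6 and @over_greet is 16, so if ~ offsets are measured from the jump itself, `jmp @over_greet` (at address 5) could equally be written `jmp 16` or `jmp ~11`.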