moongen provides both:
- a command line program for assembling/disassembling/analyzing moonsharp bytecode
- a library for interacting with it
## installing
### as a CLI
any of the following:
- Nix:
  - `nix run git+https://code.dolls.today/voidstella/moongen -- asm test.txt` to run directly
  - `nix shell git+https://code.dolls.today/voidstella/moongen` to get in a shell
- NixOS:
  - add `moongen.url = "git+https://code.dolls.today/voidstella/moongen"` to flake inputs
  - then get the package from `inputs.moongen.packages.${system}.default`
- cargo:
  - `cargo install moongen`
### as a library
`cargo add moongen`
## CLI usage
there are three commands:
- `moongen asm <path>` assembles the assembly format into a bytecode dump
- `moongen disasm <path>` disassembles a bytecode dump into the assembly format
- `moongen analyze <path>` analyzes a bytecode dump and prints a diagnostic for each rule it violates, along with the full path taken to reach the instruction in question
all three:
- accept `-` as their path, indicating they should read data from stdin
- emit their results to stdout
## assembly format
for an instruction reference, review the [`Inst` documentation](https://docs.rs/moongen/latest/moongen/inst/enum.Inst.html)
the syntax is defined by `grammar.pest` and has the following format:
- each line *may* start with a label definition: `@ident:`
- each line *may* have one instruction
  - an instruction name (ident)
  - if the instruction takes `addr`, one of the following:
    - an integer specifying the instruction address relative to the start of the chunk
    - `~`, followed by an integer specifying the instruction address relative to the current instruction
    - `@`, followed by an ident referring to a label
  - if the instruction takes `arg1`, an integer
  - if the instruction takes `arg2`, an integer
  - if the instruction takes `name`, a string
  - if the instruction takes `value`, an `=`, followed by one of the following:
    - `null`
    - `nil`
    - `void`
    - `true`
    - `false`
    - a float
    - a string
    - `{}` (creates an empty table)
  - if the instruction takes `symbol`, a symbol
  - if the instruction takes `symbol_list`, `[`, comma-separated symbols, `]`
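
the three `addr` forms above can be sketched as a small resolver. this is an illustration in python, not moongen's actual implementation: `resolve_addr` and the `labels` table are hypothetical, and it assumes a relative address is simply added to the current instruction's index.

```python
def resolve_addr(operand: str, current: int, labels: dict) -> int:
    """Turn an `addr` operand into an absolute instruction index.

    operand: the text after the instruction name, e.g. "12", "~-2", "@greet"
    current: index of the instruction holding the operand (hypothetical input)
    labels:  label name -> instruction index (hypothetical table)
    """
    if operand.startswith("@"):
        # `@ident`: look the label up by name
        return labels[operand[1:]]
    if operand.startswith("~"):
        # `~int`: relative to the current instruction
        # (assumption: target = current index + offset)
        return current + int(operand[1:])
    # plain integer: relative to the start of the chunk
    return int(operand)


# made-up label positions, purely for illustration
labels = {"greet": 5, "over_greet": 16}

print(resolve_addr("12", 3, labels))      # absolute form
print(resolve_addr("~-2", 3, labels))     # relative form, two instructions back
print(resolve_addr("@greet", 3, labels))  # label form
```

all three forms end up naming an instruction index, so which one to write is mostly a readability choice.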
terminology:
- idents follow the [regex](https://regexr.com/) `/[a-zA-Z_][a-zA-Z0-9_]*/`
- integers follow the regex `/-?(?:0|[1-9][0-9]*)/`
- floats follow the regex `/-?(?:0|[1-9][0-9]*)(?:\.[0-9]*)/`
- strings are either
  - JSON-escaped content wrapped in quotes (`"this is a string with \"embedded\" quotes"`)
  - base64-encoded content wrapped in quotes and prefixed with `b` (`b"dGhpcyBpcyBhIHN0cmluZyB3aXRoICJlbWJlZGRlZCIgcXVvdGVz"`, useful for binary data)
- symbols are one of the following:
  - `&`, symbol name (local name), `:`, integer (local index)
  - `^`, symbol name (upvalue name), `:`, integer (upvalue index)
  - `%`, symbol name (global name), `:`, symbol (global `_ENV`)
  - `env` (`_ENV` symbol)
  - `nullref` (null symbol)
- symbol names are one of the following:
  - an ident (name)
  - an ident, `@`, integer (name + disambiguation)
  - `...` (vararg)
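
a quick way to sanity-check tokens against the rules above. this python sketch reuses the three regexes exactly as written and decodes both string forms; `classify` and `decode_string` are hypothetical helpers (not part of moongen), and treating quoted strings as UTF-8 is an assumption.

```python
import base64
import json
import re

# the regexes from the terminology list, copied verbatim
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
INTEGER = re.compile(r"-?(?:0|[1-9][0-9]*)")
FLOAT = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]*)")


def classify(token: str) -> str:
    """Report which of the three token kinds matches the whole token."""
    if IDENT.fullmatch(token):
        return "ident"
    if INTEGER.fullmatch(token):
        return "integer"
    if FLOAT.fullmatch(token):
        return "float"
    return "unknown"


def decode_string(literal: str) -> bytes:
    """Decode a string literal, quotes included, into raw bytes."""
    if literal.startswith('b"'):
        # b"...": base64 payload between the quotes
        return base64.b64decode(literal[2:-1])
    # "...": JSON-escaped text (assumption: encoded as UTF-8)
    return json.loads(literal).encode("utf-8")


for token in ["_ENV", "42", "-7", "1.", "1.5", "01", "9lives"]:
    print(token, "->", classify(token))

plain = decode_string(r'"this is a string with \"embedded\" quotes"')
encoded = decode_string('b"dGhpcyBpcyBhIHN0cmluZyB3aXRoICJlbWJlZGRlZCIgcXVvdGVz"')
print(plain == encoded)  # the two example literals hold the same bytes
```

note that, as written, the float regex requires a `.`, so `1` is an integer while `1.` and `1.5` are floats.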
full demonstration
```text
#![has_env]
// useful for debugging purposes
meta 25 1 "greeter" =null
// does nothing but is in the function header anyways
fn 0 -1 []
closure @greet []
upv.ld ^_ENV:0
// %greet:^_ENV:0 isn't necessary, but moonsharp emits it anyways
// you can use nullref for index.set
index.set 0 0 ="greet" %greet:^_ENV:0
// moonsharp likes to generate closures by emitting their instructions and jumping over them
// you don't have to do it this way though (it also saves an instruction to Not Do That)
// but this example will do it moonsharp's way
jmp @over_greet
@greet:
meta 9 1 "greet" =null
fn 1 0 [&who:0]
args [&who:0]
lit ="hello "
loc.ld &who:0
lit ="!"
op.concat
op.concat
ret 1
// moonsharp also generates unreachable `ret 0`s even when the last instruction in a function is a `ret 1`...
ret 0
@over_greet:
// indentation isn't forced either way! lay it out in a way that makes more sense if you'd like
upv.ld ^_ENV:0
index ="print"
upv.ld ^_ENV:0
index ="greet"
lit ="dolly"
call 1 "calling greet"
call 1 "calling print"
pop 1
ret 0
```