= Iridium VM Specification
:toc:
:author: Fletcher Haynes
:email: fletcher@subnetzero.io
== 1.0 Introduction
Hi there! This document contains the formal specification for the Iridium VM; it should be considered authoritative.
=== 1.1 An Overview of the VM
The *Iridium VM* project provides a platform for writing, deploying, and managing applications that are extremely resilient and low-latency. In terms of discrete components, the four main logical components are:
1. A language VM that executes Iridium-specific bytecode
2. An assembler that translates the Iridium Assembly Language to Iridium bytecode
3. A compiler for a simple, Python-esque language
4. A JIT
A modern, common workflow might involve writing an application, getting it to work in a Docker container, and figuring out how to deploy it to a platform somewhere. All of these activities are heavily segregated; my Python app doesn't know it's running in a Docker container on a Kubernetes cluster.
Iridium attempts to accomplish the following, in order of priority:
1. Be an educational tool for people interested in language VMs and interpreters
2. Provide a foundation for experimentation in the areas of application deployment and management in distributed systems
3. Be production-quality
This is not to say that these goals are incompatible, but it is helpful to establish priorities.
=== 1.2 Similar Work
I know of only two platforms that have done something similar. I won't detail all of their features here.
==== BEAM VM
This is the Erlang VM. It and the Erlang/OTP library have been a major source of inspiration for this project.
==== Lua
Later versions of Lua use a register-based virtual machine. There is also a project, LuaJIT, that showcases what a good JIT can do.
=== 1.3 Feedback
You can join any of our chat channels, or e-mail iridium@subnetzero.io.
== 2.0 Iridium VM Internals
This section covers the internals of the Iridium VM: the number and type of registers, opcodes, instructions, and so on.
=== 2.1 Registers
Iridium is a _register-based_ virtual machine. Details are:
==== i32 Registers
. There are 32 registers, each holding one `i32` value
. Registers are numbered 0-31
==== f64 Registers
. There are 32 registers, each holding one `f64` value
. Registers are numbered 0-31
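As a rough illustration of the two register banks, the sketch below stores them as fixed-size Rust arrays and rejects indices outside 0-31. The `Registers` type and its accessor methods are hypothetical names chosen for this example, not part of the VM's documented interface.

[source,rust]
----
/// Number of registers in each bank, numbered 0-31.
const REGISTER_COUNT: usize = 32;

/// Hypothetical register files: one bank of i32 registers and one of f64 registers.
struct Registers {
    int: [i32; REGISTER_COUNT],
    float: [f64; REGISTER_COUNT],
}

impl Registers {
    /// Creates both banks with every register set to zero.
    fn new() -> Self {
        Registers { int: [0; REGISTER_COUNT], float: [0.0; REGISTER_COUNT] }
    }

    /// Reads an i32 register; returns None if the index is not 0-31.
    fn get_i32(&self, index: u8) -> Option<i32> {
        self.int.get(usize::from(index)).copied()
    }

    /// Writes an i32 register; returns false if the index is not 0-31.
    fn set_i32(&mut self, index: u8, value: i32) -> bool {
        match self.int.get_mut(usize::from(index)) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }

    /// Reads an f64 register; returns None if the index is not 0-31.
    fn get_f64(&self, index: u8) -> Option<f64> {
        self.float.get(usize::from(index)).copied()
    }

    /// Writes an f64 register; returns false if the index is not 0-31.
    fn set_f64(&mut self, index: u8, value: f64) -> bool {
        match self.float.get_mut(usize::from(index)) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}
----

Taking the index as a `u8` mirrors the one-byte register operands in a 4-byte instruction. A byte can encode values up to 255, so the bounds check is still needed.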
=== 2.2 Program Counter
The VM data structure has a Rust `usize` field that holds the index of the byte currently being evaluated by the VM. This is the `pc`, or the `Program Counter`.
=== 2.3 Special-purpose Registers
The VM data structure has attributes that take the place of special-purpose registers:
.Special register types
[width="40%", options="header"]
|=========================================================================
| Field Name | Size | Purpose
| `remainder` | `usize` | Stores the remainder from integer division (`DIV`)
| `equal_flag`| `bool` | Stores the result of comparison instructions
|=========================================================================
Opcodes know how and when to access these fields.
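To make these side effects concrete, here is a sketch of how `DIV` and the comparison instructions might update the two fields. `SpecialRegisters` and its methods are hypothetical. Using Euclidean division, which keeps the remainder non-negative so it always fits in a `usize`, is an assumption: the specification does not say how negative operands are handled.

[source,rust]
----
/// Hypothetical stand-in for the special-purpose fields of the VM data structure.
#[derive(Debug, Default)]
struct SpecialRegisters {
    /// Remainder left by the most recent DIV.
    remainder: usize,
    /// Result of the most recent comparison instruction.
    equal_flag: bool,
}

impl SpecialRegisters {
    /// DIV: returns the quotient and stores the remainder.
    /// Euclidean division keeps the remainder non-negative (an assumption;
    /// the spec does not cover negative operands).
    /// Returns None on division by zero or overflow, leaving `remainder` untouched.
    fn div(&mut self, a: i32, b: i32) -> Option<i32> {
        let quotient = a.checked_div_euclid(b)?;
        let remainder = a.checked_rem_euclid(b)?;
        self.remainder = remainder as usize;
        Some(quotient)
    }

    /// EQ: the flag is true when the operands are equal.
    fn cmp_eq(&mut self, a: i32, b: i32) {
        self.equal_flag = a == b;
    }

    /// NEQ: the flag is true when the operands differ.
    fn cmp_neq(&mut self, a: i32, b: i32) {
        self.equal_flag = a != b;
    }

    /// GT: the flag is true when a > b (GTE, LT and LTE follow the same pattern).
    fn cmp_gt(&mut self, a: i32, b: i32) {
        self.equal_flag = a > b;
    }
}
----

Euclidean division keeps `quotient * b + remainder == a` for negative dividends too: -7 divided by 2 gives a quotient of -4 and a remainder of 1.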
=== 2.4 Bytecode
At the lowest level, bytecode is stored in a `Vec<u8>`. The program counter described above tracks the current index into that vector.
Bytecode is generated by the assembler component, which is covered in a different document.
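The relationship between the bytecode vector and the program counter can be sketched as a pair of fetch helpers. This is a minimal sketch, not the VM's actual code: `Fetcher`, `next_8_bits`, and `next_16_bits` are hypothetical names, and reading 16-bit values high byte first (as a `LOAD` operand would need) is an assumption, since the specification does not state a byte order.

[source,rust]
----
/// Hypothetical fetch state: the bytecode and the program counter indexing it.
struct Fetcher {
    program: Vec<u8>,
    pc: usize,
}

impl Fetcher {
    /// Starts fetching at the first byte.
    fn new(program: Vec<u8>) -> Self {
        Fetcher { program, pc: 0 }
    }

    /// Returns the byte at `pc` and advances `pc` by one, or None at the end.
    fn next_8_bits(&mut self) -> Option<u8> {
        let byte = *self.program.get(self.pc)?;
        self.pc += 1;
        Some(byte)
    }

    /// Reads two bytes as a u16, high byte first (assumed byte order).
    /// Leaves `pc` unchanged if fewer than two bytes remain.
    fn next_16_bits(&mut self) -> Option<u16> {
        let bytes = self.program.get(self.pc..self.pc + 2)?;
        let value = u16::from_be_bytes([bytes[0], bytes[1]]);
        self.pc += 2;
        Some(value)
    }
}
----

For example, under a hypothetical encoding where `0x00` is `LOAD`, the bytes `[0x00, 0x01, 0x01, 0xF4]` would be fetched as the opcode, register 1, and the 16-bit number 500.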
=== 2.5 Bytecode Header
After assembly, the output is a file containing a lot of 1s and 0s. The first 64 bytes are the `Header` for an Iridium bytecode file. The Linux `ELF` format operates in a similar way.
As with `ELF`, the first four bytes of the header are a "magic number", written in hexadecimal as `[0x45, 0x50, 0x49, 0x45]`. For the curious, this spells out `EPIE` in ASCII.
Bytes 5-64 are not used and are reserved for future use.
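A minimal sketch of producing and checking the 64-byte header follows. It assumes the magic number is the four bytes 0x45 0x50 0x49 0x45 (the ASCII codes for `EPIE`) and that the reserved bytes are zero-filled; `build_header` and `has_valid_header` are hypothetical helper names.

[source,rust]
----
/// Length of the Iridium bytecode header in bytes.
const HEADER_LENGTH: usize = 64;
/// Magic number: the ASCII bytes for "EPIE", i.e. 0x45 0x50 0x49 0x45.
const MAGIC: [u8; 4] = *b"EPIE";

/// Builds a header: the magic number followed by zeroed reserved bytes (assumed zero-filled).
fn build_header() -> Vec<u8> {
    let mut header = vec![0u8; HEADER_LENGTH];
    header[..MAGIC.len()].copy_from_slice(&MAGIC);
    header
}

/// Returns true if `file` is long enough to hold a header and starts with the magic number.
fn has_valid_header(file: &[u8]) -> bool {
    file.len() >= HEADER_LENGTH && file.starts_with(&MAGIC)
}
----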
=== 2.6 Read-Only Section
After the header comes the read-only data section of the bytecode. This stores constants found by the assembler.
In the VM data structure, this section is a `Vec<u8>` and may be of arbitrary length.
The offset at which the read-only section _ends_ is encoded in the first four bytes after the header, i.e. bytes 65-68. All bytes after the end of the read-only section are executable bytecode.
=== 2.7 Heap Memory
When values cannot be stored in registers, they can be moved to the `Heap`. In the VM, the heap is represented as, you guessed it, a `Vec<u8>`. At startup, the VM pre-allocates 2048 bytes and expands the heap as needed.
=== 2.8 Instruction Width
Iridium VM uses a fixed-width instruction format: every instruction is 32 bits wide, and each iteration of the execution loop consumes 32 bits. Opcodes that do not need all 32 bits are padded by the assembler.
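Taken together, the file layout and the fixed instruction width suggest a simple loader: read the read-only section's end offset from the four bytes after the 64-byte header, slice out the read-only data and the code, then walk the code four bytes at a time. In this sketch the offset is assumed to be a little-endian `u32` measured from the start of the file, and the read-only data is assumed to begin immediately after that offset field; the specification does not pin down either detail. `Program`, `load`, and `instructions` are hypothetical names.

[source,rust]
----
/// Length of the header in bytes.
const HEADER_LENGTH: usize = 64;
/// Size of the field after the header holding the read-only section's end offset.
const RO_END_FIELD: usize = 4;
/// Every instruction is 32 bits wide.
const INSTRUCTION_WIDTH: usize = 4;

/// Hypothetical view of a loaded file, borrowing from the raw bytes.
#[derive(Debug)]
struct Program<'a> {
    read_only: &'a [u8],
    code: &'a [u8],
}

/// Splits a bytecode file into read-only data and code.
/// Returns None if the file is too short or the end offset is inconsistent.
fn load(file: &[u8]) -> Option<Program<'_>> {
    let field = file.get(HEADER_LENGTH..HEADER_LENGTH + RO_END_FIELD)?;
    // Assumption: little-endian, absolute offset from the start of the file.
    let ro_end = u32::from_le_bytes([field[0], field[1], field[2], field[3]]) as usize;
    let ro_start = HEADER_LENGTH + RO_END_FIELD;
    if ro_end < ro_start || ro_end > file.len() {
        return None;
    }
    Some(Program {
        read_only: &file[ro_start..ro_end],
        code: &file[ro_end..],
    })
}

/// Yields each instruction as [opcode, operand 1, operand 2, operand 3].
fn instructions(code: &[u8]) -> impl Iterator<Item = [u8; INSTRUCTION_WIDTH]> + '_ {
    code.chunks_exact(INSTRUCTION_WIDTH)
        .map(|c| [c[0], c[1], c[2], c[3]])
}
----

`chunks_exact` silently ignores a trailing partial instruction; a stricter loader might reject code whose length is not a multiple of four.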
== 3.0 Opcodes
The first byte of a 4-byte wide instruction is the Opcode. The following Opcodes are supported:
.Opcodes
[width="100%", options="header", cols="5*^.^"]
|=========================================================================
| Opcode | Operand 1 | Operand 2 | Operand 3 | Summary
| LOAD | Register 2+| Number to Load | Combines the second and third operand fields into a u16 which is then loaded into the register.
| LOADM | Register | Register | Unused | Loads 32 bits from the heap into the first register starting at the offset supplied in the second register
| ADD | Register | Register | Register | Adds the contents of registers specified in operand 1 and 2 and places the result in register 3.
| SUB | Register | Register | Register | Subtracts register 2 from register 1 and places the result in register 3.
| MUL | Register | Register | Register | Multiplies the contents of registers specified in operand 1 and 2 and places the result in register 3.
| DIV | Register | Register | Register | Divides register 1 by register 2 and places the quotient in register 3. The remainder goes in the VM's `remainder` field.
| HLT 3+| Unused | Halts execution of the program
| IGL 3+| Unused | Represents an illegal opcode found in the bytecode
| JMP | Register 2+| Unused | Jumps directly to the address stored in the specified register
| JMPF | Register 2+| Unused | Relative jump forward by the number in the register
| JMPB | Register 2+| Unused | Relative jump backward by the number in the register
| EQ | Register | Register | Unused | Sets the VM's `equal_flag` to true if the values in registers 1 and 2 are equal, false otherwise
| NEQ | Register | Register | Unused | Sets the VM's `equal_flag` to true if the values in registers 1 and 2 are not equal, false otherwise
| GT | Register | Register | Unused | Sets the VM's `equal_flag` to true if register 1 is greater than register 2, false otherwise
| GTE | Register | Register | Unused | Sets the VM's `equal_flag` to true if register 1 is greater than or equal to register 2, false otherwise
| LT | Register | Register | Unused | Sets the VM's `equal_flag` to true if register 1 is less than register 2, false otherwise
| LTE | Register | Register | Unused | Sets the VM's `equal_flag` to true if register 1 is less than or equal to register 2, false otherwise
| JMPE | Register 2+| Unused | Jumps directly to the address stored in the register if the VM's `equal_flag` is true
| NOP 3+| Unused | Does nothing; is a no-op.
| ALOC | Register 2+| Unused | Increases the size of the heap by the number of bytes stored in the register
| INC | Register 2+| Unused | Increments the number in the register by 1
| DEC | Register 2+| Unused | Decrements the number in the register by 1
| DJMPE 2+| Destination | Unused | Direct jump to the value specified _in the assembly_ if the VM's equal_flag is true. Does not use registers.
| PRTS 2+| Offset | Unused | Takes an offset into the read-only section and prints a string that starts at that offset
| SETM | Register | Register | Unused | Takes an offset into the heap in the first register and writes the data in the second register to it
|=========================================================================
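The table does not assign numeric values to the opcodes, so the sketch below picks hypothetical ones purely for illustration. It shows how an execution loop might decode a handful of the instructions (`LOAD`, `ADD`, `ALOC`, `SETM`, `LOADM`, `HLT`), treating any unknown byte as `IGL`. It assumes the `LOAD` immediate is stored high byte first, heap words are little-endian, and the heap starts at 2048 bytes as described in the heap memory section; `Vm`, `Stop`, and `heap_range` are hypothetical names, and error handling is simplified.

[source,rust]
----
use std::convert::TryFrom;
use std::ops::Range;

// Hypothetical opcode numbering: the specification does not assign byte values.
const LOAD: u8 = 0;
const ADD: u8 = 1;
const HLT: u8 = 2;
const ALOC: u8 = 3;
const SETM: u8 = 4;
const LOADM: u8 = 5;

/// Why execution stopped.
#[derive(Debug, PartialEq)]
enum Stop {
    Halted,
    EndOfProgram,
    Illegal(u8),
    HeapFault,
}

/// A stripped-down VM: integer registers, heap, program counter, and bytecode.
struct Vm {
    registers: [i32; 32],
    heap: Vec<u8>,
    pc: usize,
    program: Vec<u8>,
}

impl Vm {
    fn new(program: Vec<u8>) -> Self {
        // Start with 2048 bytes of heap.
        Vm { registers: [0; 32], heap: vec![0; 2048], pc: 0, program }
    }

    /// Validates a 4-byte heap access starting at `offset`.
    fn heap_range(&self, offset: i32) -> Option<Range<usize>> {
        let start = usize::try_from(offset).ok()?;
        let end = start.checked_add(4)?;
        if end <= self.heap.len() { Some(start..end) } else { None }
    }

    /// Executes 32-bit instructions until HLT, an illegal opcode, a bad heap
    /// access, or the end of the program. Register operands are assumed to be
    /// valid indices (0-31); an out-of-range index panics in this sketch.
    fn run(&mut self) -> Stop {
        while let Some(ins) = self.program.get(self.pc..self.pc + 4) {
            let (op, a, b, c) = (ins[0], ins[1], ins[2], ins[3]);
            self.pc += 4;
            let (ra, rb, rc) = (usize::from(a), usize::from(b), usize::from(c));
            match op {
                // Operands 2 and 3 form a u16, high byte first (assumed).
                LOAD => self.registers[ra] = i32::from(u16::from_be_bytes([b, c])),
                // Overflow wraps here; the spec does not define overflow behavior.
                ADD => self.registers[rc] = self.registers[ra].wrapping_add(self.registers[rb]),
                // Grow the heap by the byte count in the register (negative counts ignored).
                ALOC => {
                    let extra = usize::try_from(self.registers[ra]).unwrap_or(0);
                    let new_len = self.heap.len() + extra;
                    self.heap.resize(new_len, 0);
                }
                // Store register 2 at the heap offset in register 1 (little-endian, assumed).
                SETM => match self.heap_range(self.registers[ra]) {
                    Some(r) => self.heap[r].copy_from_slice(&self.registers[rb].to_le_bytes()),
                    None => return Stop::HeapFault,
                },
                // Load 32 bits from the heap offset in register 2 into register 1.
                LOADM => match self.heap_range(self.registers[rb]) {
                    Some(r) => {
                        let w = &self.heap[r];
                        self.registers[ra] = i32::from_le_bytes([w[0], w[1], w[2], w[3]]);
                    }
                    None => return Stop::HeapFault,
                },
                HLT => return Stop::Halted,
                // Any other byte is treated as IGL.
                other => return Stop::Illegal(other),
            }
        }
        Stop::EndOfProgram
    }
}
----

Returning a `Stop` value rather than panicking keeps malformed bytecode from crashing the host; a fuller VM would presumably dispatch every opcode in the table the same way.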
== 4.0 Shell Environment
Iridium provides a shell environment that can be accessed locally or remotely via SSH. A REPL (read-eval-print loop, or interactive interpreter) is built into this shell.
=== 4.1 Invocation
The Iridium shell can be invoked by running the `iridium` executable without a path argument. If the `iridium` executable is started in server mode, then it will listen on the configured interface and port for SSH traffic. When operating in REPL mode, a default VM is created to execute code.
=== 4.2 Commands
The shell has commands meant to manage running Iridium programs and VMs. These are meant to provide command-and-control functionality for applications running in the VM. Every command is prefaced with the command character, which is currently: `!`.
=== 4.3 Executing Code
Any user input that does not begin with the command character is treated as code to be executed by the default VM.
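The dispatch rule from 4.2 and 4.3 amounts to a single prefix check. This is a minimal sketch, assuming each line of input is classified independently and trailing whitespace is insignificant; `Input` and `classify` are hypothetical names, and the shell's actual command set is not modeled.

[source,rust]
----
/// The shell's command character.
const COMMAND_CHAR: char = '!';

/// How the shell should treat one line of user input.
#[derive(Debug, PartialEq)]
enum Input<'a> {
    /// A shell command, with the command character stripped off.
    Command(&'a str),
    /// Code to be executed by the default VM.
    Code(&'a str),
}

/// Classifies a line: a leading `!` marks a command; anything else is code.
fn classify(line: &str) -> Input<'_> {
    // Drop the trailing newline (and other trailing whitespace) left by the read.
    let line = line.trim_end();
    match line.strip_prefix(COMMAND_CHAR) {
        Some(rest) => Input::Command(rest),
        None => Input::Code(line),
    }
}
----

Only the very first character is checked, so a line with leading spaces before `!` is treated as code, matching the "does not begin with" wording above.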
== 5.0 Calling Convention
Iridium has two opcodes related to functions: `CALL` and `RET`. This section describes how functions are called and how return values are passed back. It also describes how to pass arguments to the called function.
The entity invoking `CALL` is referred to as the `caller`, and the invoked code is referred to as the `callee`.
=== 5.1 Call
`CALL` differs from `JMP` in that when `CALL` is invoked, the address of the _next_ instruction is pushed onto the stack. The PC is then set to the destination of `CALL`.
=== 5.2 Return
When the code hits a `RET` instruction, the most recent return address is popped off the stack, and the program counter is set to its value.
=== 5.3 Push
=== 5.4 Pop
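The `CALL` and `RET` behavior described in sections 5.1 and 5.2 can be sketched with a stack of return addresses. This assumes, as in a typical fetch-decode loop, that `pc` has already advanced past the `CALL` instruction when it executes, so the value pushed is simply the current `pc`. `CallState`, `call`, and `ret` are hypothetical names, and whether the return addresses share a stack with `PUSH`/`POP` data is not modeled.

[source,rust]
----
/// Hypothetical model of the state CALL and RET touch: the program counter
/// and a stack of return addresses.
struct CallState {
    pc: usize,
    return_stack: Vec<usize>,
}

impl CallState {
    /// CALL: push the address of the next instruction, then jump.
    /// Assumes `pc` already points past the CALL instruction.
    fn call(&mut self, destination: usize) {
        self.return_stack.push(self.pc);
        self.pc = destination;
    }

    /// RET: pop the most recent return address into `pc`.
    /// Returns false (leaving `pc` unchanged) if there is nothing to return to.
    fn ret(&mut self) -> bool {
        match self.return_stack.pop() {
            Some(address) => {
                self.pc = address;
                true
            }
            None => false,
        }
    }
}
----

Because the stack is last-in, first-out, nested calls unwind in reverse order: a call made from inside a callee returns to that callee before the outer caller resumes.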