███████╗██╗ ██╗███████╗███████╗██╗ ██╗███╗ ███╗
██╔════╝██║ ██║██╔════╝██╔════╝██║ ██║████╗ ████║
█████╗ ██║ ██║███████╗█████╗ ██║ ██║██╔████╔██║
██╔══╝ ██║ ██║╚════██║██╔══╝ ╚██╗ ██╔╝██║╚██╔╝██║
██║ ╚██████╔╝███████║███████╗ ╚████╔╝ ██║ ╚═╝ ██║
╚═╝ ╚═════╝ ╚══════╝╚══════╝ ╚═══╝ ╚═╝ ╚═╝
[LANGUAGE-AGNOSTIC BYTECODE VM WITH FUSED SUPERINSTRUCTIONS]
"One VM to run them all."
A language-agnostic bytecode virtual machine with fused superinstructions and a Cranelift JIT. Any language frontend compiles to fusevm opcodes and gets fused hot-loop dispatch, extension opcode tables, stack-based execution with slot-indexed fast paths, and native code compilation via Cranelift — for free. 127 opcodes across 10 categories. Cranelift 0.130 sits behind the `jit` feature flag.
Docs · API Reference · Crates.io · strykelang · zshrs
Table of Contents
- [0x00] Overview
- [0x01] Install
- [0x02] Usage
- [0x03] Architecture
- [0x04] Fused Superinstructions
- [0x05] Op Categories
- [0x06] Extension Mechanism
- [0x07] JIT Compilation
- [0x08] Value Representation
- [0x09] Benchmarks
- [0xFF] License
[0x00] OVERVIEW
fusevm is the shared execution engine behind strykelang, zshrs, and awkrs. All three compile to the same Op enum. The VM doesn't care which language produced the bytecodes.
stryke source ──► stryke compiler ──┐
│
zshrs source ──► shell compiler ──┼──► fusevm::Op ──► VM::run()
│ │
awkrs source ──► awk compiler ──┘ JitCompiler::try_run_linear()
│
Cranelift 0.130
native x86-64 / aarch64
- Fused superinstructions — the compiler detects hot patterns and emits single ops instead of multi-op sequences
- Extension dispatch — language-specific opcodes via `Extended(u16, u8)` with registered handler tables
- Stack + slots — stack-based execution with slot-indexed fast paths for locals
- Cranelift JIT — eligibility analysis and compilation for hot chunks
- Zero-clone dispatch — ops borrowed from the chunk, in-place array/hash mutation, `Cow<str>` string coercion
- Zero runtime dependencies — pure Rust, no allocator tricks, no unsafe
[0x01] INSTALL
```sh
cargo add fusevm

# or from source
git clone <repo-url> && cd fusevm && cargo build --release
```
[0x02] USAGE
```rust
use fusevm::{ChunkBuilder, Op, VM};

// Build a chunk computing 1 + 2 (opcode arguments are illustrative).
let mut b = ChunkBuilder::new();
b.emit(Op::LoadInt(1));
b.emit(Op::LoadInt(2));
b.emit(Op::Add);

let mut vm = VM::new();
match vm.run(&b.build()) {
    Ok(value) => println!("{value:?}"),
    Err(e) => eprintln!("runtime error: {e}"),
}
```
[0x03] ARCHITECTURE
┌──────────────────────────────────┐
│ Language Frontend │
│ (stryke, zshrs, or your own) │
└──────────────┬───────────────────┘
│ compile
▼
┌──────────────────────────────────┐
│ ChunkBuilder::emit() │
│ Op enum ──► Chunk (bytecodes) │
└──────────────┬───────────────────┘
│
┌────────────┴────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────────┐
│ VM::run() │ │ JitCompiler │
│ match-dispatch │ │ Cranelift codegen │
│ interpreter │ │ (eligible chunks) │
└─────────────────┘ └─────────────────────┘
[0x04] FUSED SUPERINSTRUCTIONS
The performance secret. The compiler detects hot patterns and emits single ops instead of multi-op sequences:
| Fused Op | Replaces | Effect |
|---|---|---|
| `AccumSumLoop(sum, i, limit)` | `GetSlot` + `GetSlot` + `Add` + `SetSlot` + `PreInc` + `NumLt` + `JumpIfFalse` | Entire counted sum loop in one dispatch |
| `SlotIncLtIntJumpBack(slot, limit, target)` | `PreIncSlot` + `SlotLtIntJumpIfFalse` | Loop backedge in one dispatch |
| `ConcatConstLoop(const, s, i, limit)` | `LoadConst` + `ConcatAppendSlot` + `SlotIncLtIntJumpBack` | String append loop in one dispatch |
| `PushIntRangeLoop(arr, i, limit)` | `GetSlot` + `PushArray` + `ArrayLen` + `Pop` + `SlotIncLtIntJumpBack` | Array push loop in one dispatch |
Each fused op eliminates N-1 dispatch cycles, stack pushes, and branch mispredictions from the hot path.
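The effect can be illustrated with a toy dispatch loop (a sketch with a hypothetical op subset, not fusevm's real `Op` enum): the same counted sum either re-enters the dispatcher on every op or runs entirely inside one fused match arm.

```rust
// Toy dispatch loop illustrating fusion. Hypothetical op subset.
#[derive(Clone, Copy)]
enum Op {
    AddSlots(usize, usize),      // slots[dst] += slots[src]
    PreInc(usize),               // slots[s] += 1
    JumpIfLt(usize, i64, usize), // if slots[s] < limit, jump to target
    AccumSumLoop { sum: usize, i: usize, limit: i64 }, // fused sum loop
    Halt,
}

// Returns the number of dispatch cycles executed.
fn run(ops: &[Op], slots: &mut [i64]) -> u64 {
    let mut pc = 0;
    let mut dispatches = 0;
    loop {
        dispatches += 1;
        match ops[pc] {
            Op::AddSlots(dst, src) => {
                let v = slots[src];
                slots[dst] += v;
            }
            Op::PreInc(s) => slots[s] += 1,
            Op::JumpIfLt(s, limit, target) => {
                if slots[s] < limit {
                    pc = target;
                    continue;
                }
            }
            // Fused: the whole loop runs without re-entering the dispatcher.
            Op::AccumSumLoop { sum, i, limit } => {
                while slots[i] < limit {
                    let v = slots[i];
                    slots[sum] += v;
                    slots[i] += 1;
                }
            }
            Op::Halt => return dispatches,
        }
        pc += 1;
    }
}

fn main() {
    // Unfused: sum 0..5 with per-op dispatch (slot 0 = sum, slot 1 = i).
    let unfused = [
        Op::AddSlots(0, 1),
        Op::PreInc(1),
        Op::JumpIfLt(1, 5, 0),
        Op::Halt,
    ];
    let mut s1 = [0i64, 0];
    let d1 = run(&unfused, &mut s1);

    // Fused: same loop in one dispatch.
    let fused = [Op::AccumSumLoop { sum: 0, i: 1, limit: 5 }, Op::Halt];
    let mut s2 = [0i64, 0];
    let d2 = run(&fused, &mut s2);

    assert_eq!((s1[0], s2[0]), (10, 10)); // same result
    assert_eq!((d1, d2), (16, 2));        // 16 dispatch cycles vs 2
}
```

Every avoided dispatch cycle is also an avoided bounds check, stack round-trip, and potential branch mispredict, which is where the fused-vs-unfused gap in the benchmark section comes from.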
[0x05] OP CATEGORIES
127 opcodes across 10 categories:
| Category | Count | Examples |
|---|---|---|
| Constants & Stack | ~12 | LoadInt, LoadFloat, Pop, Dup, Swap |
| Variables | ~8 | GetVar, SetVar, GetSlot, SetSlot |
| Arrays & Hashes | ~25 | ArrayPush, HashGet, MakeArray, HashKeys |
| Arithmetic | ~9 | Add, Sub, Mul, Div, Pow |
| Comparison | ~14 | NumEq, StrLt, Spaceship |
| Control Flow | ~5 | Jump, JumpIfFalse, JumpIfTrueKeep |
| Functions | ~3 | Call, Return, PushFrame |
| Shell Ops | ~24 | Exec, PipelineBegin, Redirect, Glob, TestFile |
| Fused | ~8 | AccumSumLoop, SlotIncLtIntJumpBack |
| Extension | 2 | Extended(u16, u8), ExtendedWide(u16, usize) |
[0x06] EXTENSION MECHANISM
Language-specific opcodes use `Extended(u16, u8)`, which dispatches through a handler table registered by the frontend:
```rust
let mut vm = VM::new();
// register this frontend's handler table (handler body elided)
vm.set_extension_handler(/* ... */);
```
stryke registers ~450 extended ops. zshrs registers ~20. awkrs registers ~95. They don't conflict — each frontend owns its own ID space.
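A minimal sketch of the handler-table idea (the `Value` alias, handler signature, and registration API here are assumptions, not fusevm's actual interface):

```rust
use std::collections::HashMap;

// Hypothetical value and handler types; fusevm's real signatures differ.
type Value = i64;
type ExtHandler = fn(arg: u8, stack: &mut Vec<Value>);

struct Vm {
    handlers: HashMap<u16, ExtHandler>,
    stack: Vec<Value>,
}

impl Vm {
    fn new() -> Self {
        Vm { handlers: HashMap::new(), stack: Vec::new() }
    }

    // Each frontend registers handlers under its own IDs; since only
    // one frontend feeds a given VM, ID spaces never collide.
    fn set_extension_handler(&mut self, id: u16, handler: ExtHandler) {
        self.handlers.insert(id, handler);
    }

    // Dispatch for an Extended(id, arg) op.
    fn run_extended(&mut self, id: u16, arg: u8) {
        let handler = self.handlers[&id];
        handler(arg, &mut self.stack);
    }
}

fn main() {
    let mut vm = Vm::new();
    // A made-up frontend op: push arg doubled.
    vm.set_extension_handler(0x0001, |arg, stack| stack.push(arg as i64 * 2));
    vm.run_extended(0x0001, 21);
    assert_eq!(vm.stack, vec![42]);
}
```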
[0x07] JIT COMPILATION
The `JitCompiler` compiles eligible chunks to native code via Cranelift 0.130. Enable with `cargo add fusevm --features jit`.
```rust
use fusevm::{ChunkBuilder, JitCompiler, Op};

// Build a chunk (opcode arguments are illustrative).
let mut b = ChunkBuilder::new();
b.emit(Op::LoadInt(1));
b.emit(Op::LoadInt(2));
b.emit(Op::Add);
let chunk = b.build();

let jit = JitCompiler::new();
if jit.is_linear_eligible(&chunk) {
    // compile and run natively (see JitCompiler::try_run_linear)
}
```
Linear JIT — eligible ops
| Category | JIT'd Ops |
|---|---|
| Constants | LoadInt, LoadFloat, LoadConst (int/float), LoadTrue, LoadFalse |
| Arithmetic | Add, Sub, Mul, Div, Mod, Pow, Negate, Inc, Dec |
| Comparison | NumEq/Ne/Lt/Gt/Le/Ge, Spaceship |
| Bitwise | BitAnd/Or/Xor/Not, Shl, Shr |
| Logic | LogNot |
| Stack | Pop, Dup, Swap, Rot |
| Slots | GetSlot, SetSlot, PreIncSlot, PreIncSlotVoid, AddAssignSlotVoid |
Int/float promotion: when either operand is float, both are promoted to f64. Cranelift emits iadd/fadd/fcvt_from_sint as needed. Runtime helpers for Pow (wrapping integer + f64::powf) and Mod (float fmod).
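The promotion rule can be sketched in plain Rust (a toy `Num` type standing in for the JIT's value handling, not the JIT itself):

```rust
// Numeric promotion: int op int stays integer (wrapping), anything
// involving a float promotes both sides to f64 — the role Cranelift's
// fcvt_from_sint plays in the generated code.
#[derive(Debug, PartialEq)]
enum Num {
    Int(i64),
    Float(f64),
}

fn to_f64(n: Num) -> f64 {
    match n {
        Num::Int(i) => i as f64,
        Num::Float(f) => f,
    }
}

fn add(a: Num, b: Num) -> Num {
    match (a, b) {
        // Pure-integer fast path: wrapping, never traps on overflow.
        (Num::Int(x), Num::Int(y)) => Num::Int(x.wrapping_add(y)),
        // Mixed or float operands: promote both to f64.
        (x, y) => Num::Float(to_f64(x) + to_f64(y)),
    }
}

fn main() {
    assert_eq!(add(Num::Int(2), Num::Int(3)), Num::Int(5));
    assert_eq!(add(Num::Int(2), Num::Float(0.5)), Num::Float(2.5));
    // Integer overflow wraps rather than trapping.
    assert_eq!(add(Num::Int(i64::MAX), Num::Int(1)), Num::Int(i64::MIN));
}
```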
[0x08] VALUE REPRESENTATION
Value is a tagged enum with fast-path immediates:
| Variant | Representation | Size |
|---|---|---|
| `Undef` | Tag only | 0 bytes payload |
| `Int(i64)` | Inline | 8 bytes |
| `Float(f64)` | Inline | 8 bytes |
| `Bool(bool)` | Inline | 1 byte |
| `Str(Arc<String>)` | Heap | pointer |
| `Array(Vec<Value>)` | Heap, in-place mutation | 3 words |
| `Hash(HashMap<String, Value>)` | Heap, in-place mutation | 7 words |
| `Status(i32)` | Inline | 4 bytes |
| `Ref(Box<Value>)` | Heap | pointer |
| `NativeFn(u16)` | Inline | 2 bytes |
String coercion returns Cow<str> via as_str_cow() — borrows the inner Arc<String> for Str variants, avoiding allocation on string comparisons, concatenation, hash key lookup, and I/O.
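The borrow-first idea behind as_str_cow() can be sketched with a simplified two-variant value type (not the real enum):

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Simplified Value; fusevm's real enum has more variants.
enum Value {
    Int(i64),
    Str(Arc<String>),
}

impl Value {
    // Str borrows the Arc's contents for free; only non-string
    // variants pay for an allocation to format themselves.
    fn as_str_cow(&self) -> Cow<'_, str> {
        match self {
            Value::Str(s) => Cow::Borrowed(s.as_str()),
            Value::Int(i) => Cow::Owned(i.to_string()),
        }
    }
}

fn main() {
    let s = Value::Str(Arc::new("hello".to_string()));
    let n = Value::Int(42);
    assert!(matches!(s.as_str_cow(), Cow::Borrowed(_))); // no allocation
    assert!(matches!(n.as_str_cow(), Cow::Owned(_)));    // formats on demand
    assert_eq!(s.as_str_cow(), "hello");
    assert_eq!(n.as_str_cow(), "42");
}
```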
Array and hash mutations (ArrayPush, ArrayPop, ArrayShift, ArraySet, HashSet, HashDelete) operate in-place on globals — no clone-modify-writeback cycle. Read-only access (ArrayGet, ArrayLen, HashGet, HashExists, HashKeys, HashValues) borrows directly from the globals vector.
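A sketch of the in-place pattern (simplified value type and globals layout, not fusevm's real one):

```rust
// Simplified Value and globals; not fusevm's real layout.
enum Value {
    Int(i64),
    Array(Vec<Value>),
}

// Mutation borrows the stored Vec and pushes in place: no clone of
// the array, no write-back of a modified copy.
fn array_push(globals: &mut [Value], slot: usize, v: Value) {
    if let Value::Array(arr) = &mut globals[slot] {
        arr.push(v);
    }
}

// Read-only access borrows directly from the globals vector.
fn array_len(globals: &[Value], slot: usize) -> usize {
    match &globals[slot] {
        Value::Array(arr) => arr.len(),
        _ => 0,
    }
}

fn main() {
    let mut globals = vec![Value::Array(Vec::new())];
    for i in 0..3 {
        array_push(&mut globals, 0, Value::Int(i));
    }
    assert_eq!(array_len(&globals, 0), 3);
}
```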
[0x09] BENCHMARKS
All benchmarks run via criterion on Apple M-series hardware: `cargo bench` for all, `cargo bench --features jit --bench jit_vs_interp` for the JIT comparisons. HTML report at `target/criterion/report/index.html`.
Classic algorithms
| Benchmark | Time | Ops/sec |
|---|---|---|
| `fib_iterative(35)` | 2.7 µs | 374k |
| `fib_recursive(20)` — 21,891 calls | 1.28 ms | 783 |
| `ackermann(3,4)` — 10,547 calls | 774 µs | 1.3k |
| `sum(1..1M)` fused `AccumSumLoop` | 142 ns | 7.0M |
| `sum(1..1M)` unfused loop ops | 31.0 ms | 32 |
| `nested_loop(100×100)` | 352 µs | 2.8k |
| `dispatch_nop_1M` — raw dispatch overhead | 819 µs | 1.22 Gops/sec |
| `string_build(10k)` via `ConcatConstLoop` | 11.9 µs | 84k |
Interpreter vs Cranelift JIT vs native Rust
Slot-based inputs prevent constant folding — honest apples-to-apples comparison:
| Workload | Interpreter | JIT (cached) | Native Rust | JIT vs interp | JIT vs native |
|---|---|---|---|---|---|
| `slot_mixed` × 100 | 2.2 µs | 75 ns | 42 ns | 29x faster | 1.8x slower |
| `slot_bitwise` × 200 | 6.6 µs | 130 ns | 74 ns | 51x faster | 1.8x slower |
| `slot_float` × 200 | 3.1 µs | 246 ns | 137 ns | 13x faster | 1.8x slower |
JIT cache lookup is O(1) — the chunk hash is precomputed at build time (24 ns overhead). The JIT is consistently ~1.8x slower than LLVM -O3 on real computation and 13–51x faster than the interpreter. Being within 2x of LLVM is strong for a single-pass Cranelift JIT.
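The caching scheme can be sketched as a map keyed by the precomputed hash (the types and hash function here are illustrative assumptions, not fusevm's internals):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// The chunk hash is computed once at build time, so a JIT cache
// lookup is a single HashMap probe, not a re-hash of the bytecode.
struct Chunk {
    ops: Vec<u8>, // stand-in for real ops
    precomputed_hash: u64,
}

impl Chunk {
    fn build(ops: Vec<u8>) -> Self {
        let mut h = DefaultHasher::new();
        ops.hash(&mut h);
        let precomputed_hash = h.finish();
        Chunk { ops, precomputed_hash }
    }
}

struct JitCache {
    compiled: HashMap<u64, i64>, // hash -> "compiled result" (stands in for a fn pointer)
    compiles: u32,
}

impl JitCache {
    fn run(&mut self, chunk: &Chunk) -> i64 {
        if let Some(&code) = self.compiled.get(&chunk.precomputed_hash) {
            return code; // fast path: O(1) probe, no recompilation
        }
        self.compiles += 1; // slow path: compile once per unique chunk
        let code: i64 = chunk.ops.iter().map(|&b| b as i64).sum();
        self.compiled.insert(chunk.precomputed_hash, code);
        code
    }
}

fn main() {
    let mut cache = JitCache { compiled: HashMap::new(), compiles: 0 };
    let chunk = Chunk::build(vec![1, 2, 3]);
    assert_eq!(cache.run(&chunk), 6);
    assert_eq!(cache.run(&chunk), 6); // cache hit
    assert_eq!(cache.compiles, 1);    // compiled exactly once
}
```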
Tracking improvements
```sh
cargo bench -- --save-baseline before
# ... make changes ...
cargo bench -- --baseline before
```
[0xFF] LICENSE
MIT — Copyright (c) 2026 MenkeTechnologies