WASM-PVM: WebAssembly to PolkaVM Recompiler
WARNING: This project is largely vibe-coded. It was built iteratively with heavy AI assistance (Claude). While it has 412 passing integration tests and produces working PVM bytecode, the internals may contain unconventional patterns, over-engineering in some places, and under-engineering in others. Use at your own risk. Contributions and proper engineering reviews are very welcome!
A Rust compiler that translates WebAssembly (WASM) bytecode into PolkaVM (PVM) bytecode for execution on the JAM (Join-Accumulate Machine) protocol. Write your JAM programs in AssemblyScript (TypeScript-like), hand-written WAT, or any language that compiles to WASM — and run them on PVM.
```
WASM ──► LLVM IR ──► PVM bytecode ──► JAM program (.jam)
         inkwell     Rust backend
         mem2reg
```
Getting Started
Prerequisites
- Rust (stable, edition 2024)
- LLVM 18: the compiler uses inkwell (LLVM 18 bindings)
  - macOS: `brew install llvm@18`, then `export LLVM_SYS_181_PREFIX=/opt/homebrew/opt/llvm@18`
  - Ubuntu: `apt install llvm-18-dev`
- Bun (for running integration tests and the JAM runner): bun.sh
Build
Hello World: Compile & Run
Create a simple WAT program that adds two numbers:
```wat
;; add.wat
(module
  (memory 1)
  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    ;; Read two i32 args, add them, write the result to memory at address 0
    (i32.store (i32.const 0)
      (i32.add
        (i32.load (local.get $args_ptr))
        (i32.load (i32.add (local.get $args_ptr) (i32.const 4)))))
    (i64.const 17179869184))) ;; packed: ptr=0 (low 32 bits), len=4 (high 32 bits)
```
Compile it to a JAM blob and run it:
# Compile WAT → JAM
# Run with two u32 arguments: 5 and 7 (little-endian hex)
# Output: 0c000000 (12 in little-endian)
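The argument and result bytes are plain little-endian `u32` values. The sketch below shows the encoding, assuming the runner takes all arguments as one concatenated hex string; `encode_args_le_hex` and `decode_u32_le_hex` are hypothetical helper names, not part of the CLI.

```rust
/// Hypothetical helper: concatenate u32 arguments as little-endian hex.
fn encode_args_le_hex(args: &[u32]) -> String {
    args.iter()
        .flat_map(|a| a.to_le_bytes())
        .map(|b| format!("{:02x}", b))
        .collect()
}

/// Hypothetical helper: decode an 8-digit little-endian hex string into a u32.
fn decode_u32_le_hex(hex: &str) -> Option<u32> {
    if hex.len() != 8 || !hex.is_ascii() {
        return None;
    }
    let mut bytes = [0u8; 4];
    for (i, byte) in bytes.iter_mut().enumerate() {
        // Each byte is two hex digits, least significant byte first.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(u32::from_le_bytes(bytes))
}

fn main() {
    println!("args:   {}", encode_args_le_hex(&[5, 7])); // 0500000007000000
    println!("result: {:?}", decode_u32_le_hex("0c000000")); // Some(12)
}
```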
Inspect the Output
Upload the resulting .jam file to the PVM Debugger for step-by-step execution, disassembly, register inspection, and gas metering visualization.
AssemblyScript Example
You can also write programs in AssemblyScript:
```ts
// fibonacci.ts
export function main(args_ptr: i32, args_len: i32): i64 {
  const buf = heap.alloc(256);
  let n = load<i32>(args_ptr);
  let a: i32 = 0;
  let b: i32 = 1;
  while (n > 0) {
    b = a + b;
    a = b - a;
    n = n - 1;
  }
  store<i32>(buf, a);
  return (buf as i64) | ((4 as i64) << 32); // packed: ptr in low 32 bits, len = 4 in high 32 bits
}
```
Compile via the AssemblyScript compiler to WASM, then use wasm-pvm-cli to produce a JAM blob. See the tests/fixtures/assembly/ directory for more examples.
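When checking the PVM output of the AssemblyScript example, a host-side reference model can help. The Rust function below is a hypothetical helper (not part of the crate) that mirrors the loop using wrapping `i32` arithmetic, matching WASM `i32.add`/`i32.sub`, and renders the 4 bytes stored at `buf` as little-endian hex, the same form as the `0c000000` output of the WAT example.

```rust
/// Hypothetical reference model of the AssemblyScript `main`.
/// Wrapping arithmetic is used because WASM i32 add/sub wrap on overflow.
fn fib_reference(mut n: i32) -> i32 {
    let (mut a, mut b) = (0i32, 1i32);
    while n > 0 {
        b = a.wrapping_add(b);
        a = b.wrapping_sub(a); // a takes the previous value of b
        n -= 1;
    }
    a
}

/// The 4 result bytes written by `store<i32>(buf, a)`, as little-endian hex.
fn expected_output_hex(n: i32) -> String {
    fib_reference(n)
        .to_le_bytes()
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}

fn main() {
    println!("{}", expected_output_hex(10)); // 37000000 (55)
}
```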
How It Works
The compiler pipeline:
- Adapter merge (optional): merges a WAT adapter module into the WASM binary, replacing matching imports with adapter function bodies
- WASM → LLVM IR: translates WASM opcodes to LLVM IR using inkwell (LLVM 18 bindings), with PVM-specific intrinsics for memory operations
- LLVM optimization passes: `mem2reg` (SSA promotion), `instcombine`, `simplifycfg`, `gvn`, `dce`, and optional function inlining
- LLVM IR → PVM bytecode: a custom Rust backend reads LLVM IR and emits PVM instructions with per-block register caching (store-load forwarding)
- SPI assembly: packages the bytecode into a JAM/SPI program blob with entry headers, jump tables, and data sections

Entry functions use a unified ABI: `main(args_ptr: i32, args_len: i32) -> i64`, where the return value packs the result pointer in the lower 32 bits and the result length in the upper 32 bits. The compiler unpacks this into PVM's SPI convention (r7 = start address, r8 = end address).
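The entry ABI packs the result pointer into the low 32 bits of the `i64` return value and the length into the high 32 bits. The sketch below models that packing and the r7/r8 split. It is an illustration, not the compiler's code: `pack_result`, `unpack_result`, and `spi_registers` are hypothetical names, and the `mem_base` parameter assumes the returned pointer is a WASM linear-memory offset that gets rebased into PVM address space (the exact translation is not described here).

```rust
/// Hypothetical helper: low 32 bits = result pointer, high 32 bits = length.
fn pack_result(ptr: u32, len: u32) -> i64 {
    ((u64::from(len) << 32) | u64::from(ptr)) as i64
}

/// Hypothetical inverse of `pack_result`.
fn unpack_result(packed: i64) -> (u32, u32) {
    let bits = packed as u64; // reinterpret the bits; no sign handling needed
    ((bits & 0xFFFF_FFFF) as u32, (bits >> 32) as u32)
}

/// Assumption: `mem_base` is where WASM linear memory starts in PVM address space.
fn spi_registers(packed: i64, mem_base: u64) -> (u64, u64) {
    let (ptr, len) = unpack_result(packed);
    let r7 = mem_base + u64::from(ptr); // start address
    let r8 = r7 + u64::from(len); // end address (assumed to be start + len)
    (r7, r8)
}

fn main() {
    // add.wat returns (i64.const 17179869184), i.e. ptr = 0, len = 4.
    let (ptr, len) = unpack_result(17_179_869_184);
    println!("ptr = {ptr}, len = {len}");
}
```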
Key Design Decisions
- Stack-slot approach with register allocation: every SSA value gets a dedicated 8-byte slot at a fixed offset from SP. A linear-scan register allocator then assigns high-use values to the callee-saved registers r9-r12, skipping registers that hold this function's incoming parameters and, in non-leaf functions, reserving the r9+ registers needed for outgoing call arguments. Keeping these values in registers eliminates redundant memory traffic across block boundaries and loops.
- Per-block register cache: eliminates redundant loads when a value is reused shortly after being computed (~50% gas reduction)
- No `unsafe` code: `deny(unsafe_code)` is enforced at the workspace level
- No floating point: PVM lacks FP support; WASM floats are rejected at compile time
- All optimizations are toggleable: `--no-llvm-passes`, `--no-peephole`, `--no-register-cache`, `--no-icmp-fusion`, `--no-shrink-wrap`, `--no-dead-store-elim`, `--no-const-prop`, `--no-inline`, `--no-cross-block-cache`, `--no-register-alloc`, `--no-fallthrough-jumps`
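To make the per-block register cache concrete, here is a toy model of store-load forwarding inside one basic block. Everything in it is hypothetical (the `RegCache` type, slot offsets as `i32`, registers as `u8`, the `Emit` enum) and it does not model cross-block caching. It only shows the idea: after a value is stored to its stack slot, a later load of that slot can reuse the register (or a register move) instead of reading memory, and the cache is cleared at block boundaries.

```rust
use std::collections::HashMap;

/// What the toy backend would emit for each request.
#[derive(Debug, PartialEq)]
enum Emit {
    Store { slot: i32, reg: u8 },
    Load { slot: i32, reg: u8 },
    Move { dst: u8, src: u8 },
    Nothing,
}

/// Hypothetical per-block cache: stack-slot offset (from SP) -> register holding its value.
#[derive(Default)]
struct RegCache {
    slot_to_reg: HashMap<i32, u8>,
}

impl RegCache {
    /// Call whenever `reg` is overwritten: forget every slot it mirrored.
    fn clobber(&mut self, reg: u8) {
        self.slot_to_reg.retain(|_, r| *r != reg);
    }

    /// Store `reg` into `slot`; afterwards `reg` mirrors that slot.
    fn store(&mut self, slot: i32, reg: u8) -> Emit {
        self.slot_to_reg.insert(slot, reg);
        Emit::Store { slot, reg }
    }

    /// Load `slot` into `dst`, forwarding from a cached register when possible.
    fn load(&mut self, slot: i32, dst: u8) -> Emit {
        match self.slot_to_reg.get(&slot).copied() {
            Some(src) if src == dst => Emit::Nothing, // value already in place
            Some(src) => {
                self.clobber(dst); // dst is about to be overwritten by the move
                Emit::Move { dst, src }
            }
            None => {
                self.clobber(dst);
                self.slot_to_reg.insert(slot, dst);
                Emit::Load { slot, reg: dst }
            }
        }
    }

    /// Control flow may enter the next block from elsewhere: drop everything.
    fn end_block(&mut self) {
        self.slot_to_reg.clear();
    }
}

fn main() {
    let mut cache = RegCache::default();
    println!("{:?}", cache.store(8, 2));
    println!("{:?}", cache.load(8, 2)); // Nothing: forwarded, no memory access
    println!("{:?}", cache.load(8, 3)); // Move { dst: 3, src: 2 }
    cache.end_block();
    println!("{:?}", cache.load(8, 3)); // Load { slot: 8, reg: 3 }
}
```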
Benchmark: Optimizations Impact
All PVM-level optimizations enabled (default):
| Benchmark | WASM size | JAM size | Code size | Gas Used |
|---|---|---|---|---|
| add(5,7) | 68 B | 201 B | 130 B | 39 |
| fib(20) | 110 B | 270 B | 186 B | 612 |
| factorial(10) | 102 B | 242 B | 161 B | 269 |
| is_prime(25) | 162 B | 328 B | 239 B | 80 |
| AS fib(10) | 234 B | 708 B | 572 B | 324 |
| AS factorial(7) | 233 B | 697 B | 562 B | 281 |
| AS gcd(2017,200) | 228 B | 686 B | 558 B | 190 |
| AS decoder | 1.5 KB | 20.8 KB | 6.8 KB | 721 |
| AS array | 1.4 KB | 19.9 KB | 6.0 KB | 623 |
| aslan-fib accumulate | 7.8 KB | 37.1 KB | 17.6 KB | 15,968 |
| anan-as PVM interpreter | 57.7 KB | 180.2 KB | 127.8 KB | - |
PVM-in-PVM: programs executed inside the anan-as PVM interpreter (outer gas cost):
| Benchmark | JAM Size | Code Size | Outer Gas | Direct Gas | Overhead (outer / direct) |
|---|---|---|---|---|---|
| TRAP (interpreter overhead) | 21 B | 1 B | 80,577 | - | - |
| add(5,7) | 201 B | 130 B | 1,238,302 | 39 | 31,751x |
| AS fib(10) | 708 B | 572 B | 1,753,546 | 324 | 5,412x |
| JAM-SDK fib(10)* | 25.4 KB | 16.2 KB | 7,230,603 | 42 | 172,157x |
| Jambrains fib(10)* | 61.1 KB | - | 6,373,683 | 1 | 6,373,683x |
| JADE fib(10)* | 67.3 KB | 45.7 KB | 19,555,955 | 504 | 38,801x |
| aslan-fib accumulate* | 37.1 KB | 17.6 KB | 10,511,413 | 15,968 | 658x |
*JAM-SDK fib(10), Jambrains fib(10), JADE fib(10), and aslan-fib accumulate exit on unhandled host calls (ecalli). The gas cost reflects program parsing/loading plus partial execution up to the first unhandled ecalli.
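The Overhead column appears to be the outer gas divided by the direct gas, truncated to an integer; under that reading the published values can be reproduced from the table rows. The `overhead` function is a hypothetical helper written for this check.

```rust
/// Overhead = outer gas (inside anan-as) / direct gas, truncated.
/// Returns None when direct gas is zero.
fn overhead(outer_gas: u64, direct_gas: u64) -> Option<u64> {
    outer_gas.checked_div(direct_gas)
}

fn main() {
    // (benchmark, outer gas, direct gas), copied from the PVM-in-PVM table.
    let rows = [
        ("add(5,7)", 1_238_302u64, 39u64),
        ("AS fib(10)", 1_753_546, 324),
        ("JADE fib(10)", 19_555_955, 504),
        ("aslan-fib accumulate", 10_511_413, 15_968),
    ];
    for (name, outer, direct) in rows {
        println!("{name}: {}x", overhead(outer, direct).unwrap());
    }
}
```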
Memory layout summary
The JAM blob reserves separate ranges for RO data, a guard gap, globals/overflow metadata, and the WASM heap; see the Architecture docs for the full breakdown, including GLOBAL_MEMORY_BASE, PARAM_OVERFLOW_BASE, SPILLED_LOCALS_BASE, and how wasm_memory_base is computed.
The SPI rw_data section is a contiguous copy of every byte from GLOBAL_MEMORY_BASE up to the highest initialized heap address. Because the encoder must preserve the absolute addresses of the data segments, the zero-filled stretch between the globals and the first initialized heap byte is encoded verbatim. This is why stub AssemblyScript fixtures such as decoder-test/array-test emit ~13 KB of RW data even though only a handful of bytes are non-zero. Short of redesigning SPI, the only ways to shrink those blobs are keeping globals/data near the heap base or introducing sparse RW descriptors (future work).
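A toy model shows why the zero gap is costly: the image length is set by the highest initialized byte, not by how many bytes are non-zero. `build_rw_data` is a hypothetical name, and the base address and segment layout below are illustrative, not the real GLOBAL_MEMORY_BASE or fixture layout.

```rust
/// Hypothetical sketch: build a contiguous rw_data image starting at `base`.
/// Each segment is (absolute_address, bytes); assumes every address >= base.
fn build_rw_data(base: u32, segments: &[(u32, &[u8])]) -> Vec<u8> {
    // The image must extend to the highest initialized byte.
    let end = segments
        .iter()
        .map(|&(addr, bytes)| addr + bytes.len() as u32)
        .max()
        .unwrap_or(base);
    // Everything not covered by a segment stays zero but is still encoded.
    let mut image = vec![0u8; (end - base) as usize];
    for &(addr, bytes) in segments {
        let off = (addr - base) as usize;
        image[off..off + bytes.len()].copy_from_slice(bytes);
    }
    image
}

fn main() {
    let base = 0x1_0000u32; // illustrative base address
    let globals = [1u8, 0, 0, 0];
    let data = [0xAAu8; 4];
    let image = build_rw_data(base, &[(base, &globals[..]), (base + 13_000, &data[..])]);
    let nonzero = image.iter().filter(|&&b| b != 0).count();
    println!("{} bytes encoded, {} non-zero", image.len(), nonzero); // 13004 bytes encoded, 5 non-zero
}
```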
Supported WASM Features
| Category | Operations |
|---|---|
| Arithmetic (i32 & i64) | add, sub, mul, div_u/s, rem_u/s, all comparisons, clz, ctz, popcnt, rotl, rotr, bitwise ops |
| Control flow | block, loop, if/else, br, br_if, br_table, return, unreachable, block results |
| Memory | load/store (all widths), memory.size, memory.grow, memory.fill, memory.copy, globals, data sections |
| Functions | call, call_indirect (with signature validation), recursion, stack overflow detection |
| Type conversions | wrap, extend_s/u, sign extensions (i32/i64 extend8/16/32_s) |
| Imports | Text-based import maps (--imports) and WAT adapter files (--adapter) |
Not supported: floating point (by design — PVM has no FP instructions).
CLI Usage
# Compile WAT or WASM to JAM
# With import resolution
# Disable specific optimizations
# Disable all optimizations
See the Import Handling section for details on resolving WASM imports.
Using as a Library
The wasm-pvm crate can be used as a Rust dependency. It supports two modes:
```toml
# Full compiler (default): requires LLVM 18
wasm-pvm = "0.5.2"

# PVM types only: no LLVM dependency, compiles to wasm32-unknown-unknown
wasm-pvm = { version = "0.5.2", default-features = false }
```
With default-features = false, only the PVM type definitions are available: Instruction, Opcode, ProgramBlob, SpiProgram, abi::*, memory_layout::*, and Error. This is useful for downstream tools that need to work with PVM bytecode (interpreters, debuggers, analyzers) without requiring the full LLVM compiler toolchain.
| Feature | Default | Description |
|---|---|---|
| `compiler` | Yes | Full WASM-to-PVM compiler (inkwell, wasmparser, wasm-encoder) |
| `test-harness` | Yes | Test utilities for unit testing (implies `compiler`) |
Project Structure
```
crates/
  wasm-pvm/                # Core library
    src/
      pvm/                 # PVM instruction definitions (always available)
      memory_layout.rs     # PVM memory address constants (always available)
      spi.rs               # JAM/SPI format encoder (always available)
      abi.rs               # Register & frame layout constants (always available)
      llvm_frontend/       # WASM → LLVM IR translation (feature = "compiler")
      llvm_backend/        # LLVM IR → PVM bytecode lowering (feature = "compiler")
      translate/           # Compilation orchestration & SPI assembly (feature = "compiler")
  wasm-pvm-cli/            # Command-line interface
tests/                     # 412 integration tests (TypeScript/Bun)
  fixtures/
    wat/                   # WAT test programs
    assembly/              # AssemblyScript examples
    imports/               # Import maps & adapter files
vendor/
  anan-as/                 # PVM interpreter (submodule)
```
Testing
# Rust unit tests
# Lint
# Integration tests (builds artifacts, then runs all layers)
# Quick validation (Layer 1 smoke tests only)
The test suite is organized into layers:
- Layer 1: Core/smoke tests (~50 tests) — fast, run during development
- Layer 2: Feature tests (~140 tests)
- Layer 3: Regression/edge cases (~220 tests)
- Layer 4-5: PVM-in-PVM tests — the PVM interpreter itself compiled to PVM, running the test suite inside PVM
Import Handling
WASM modules that import external functions need those imports resolved before compilation. Two mechanisms are available:
Import Map (--imports)
A text file mapping import names to simple actions:
```
# my-imports.txt
abort = trap # emit unreachable (panic)
console.log = nop # do nothing, return zero
```
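A parser for this format fits in a few lines. This is a hypothetical sketch, not the CLI's implementation: it recognizes only the two actions that appear in the example (`trap`, `nop`), treats `#` as the start of a comment, and rejects anything else. The real tool may accept more actions or syntax.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    Trap, // emit unreachable
    Nop,  // do nothing, return zero
}

/// Hypothetical parser for `name = action  # comment` lines.
fn parse_import_map(text: &str) -> Result<Vec<(String, Action)>, String> {
    let mut entries = Vec::new();
    for (i, raw) in text.lines().enumerate() {
        // Drop the trailing `#` comment and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (name, action) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `name = action`", i + 1))?;
        let name = name.trim();
        if name.is_empty() {
            return Err(format!("line {}: missing import name", i + 1));
        }
        let action = match action.trim() {
            "trap" => Action::Trap,
            "nop" => Action::Nop,
            other => return Err(format!("line {}: unknown action `{}`", i + 1, other)),
        };
        entries.push((name.to_string(), action));
    }
    Ok(entries)
}

fn main() {
    let text = "# my-imports.txt\nabort = trap # emit unreachable (panic)\nconsole.log = nop # do nothing, return zero\n";
    println!("{:?}", parse_import_map(text));
}
```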
Adapter WAT (--adapter)
A WAT module whose exports replace matching imports, enabling arbitrary logic for import resolution (pointer conversion, memory reads, host calls):
```wat
(module
  (import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
  (import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))
  (func (export "console.log") (param i32)
    (drop (call $host_call_5
      (i64.const 100)                                   ;; ecalli index
      (i64.const 3)                                     ;; log level
      (i64.const 0) (i64.const 0)                       ;; target ptr/len
      (call $pvm_ptr (i64.extend_i32_u (local.get 0)))  ;; message ptr
      (i64.extend_i32_u (i32.load offset=0
        (i32.sub (local.get 0) (i32.const 4))))))))     ;; message len
)
```
When both --imports and --adapter are provided, the adapter runs first, then the import map handles remaining unresolved imports. All imports must be resolved or compilation fails.
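The resolution order (adapter first, then the import map, then failure for anything left) can be modeled as below. This is a hypothetical sketch: `resolve_imports` and `Resolution` are illustrative names, and import names are treated as plain strings such as `console.log` without modeling how the real tool keys module/field pairs.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum Resolution {
    Adapter,        // replaced by an adapter function body
    Mapped(String), // handled by an import-map action such as "trap" or "nop"
}

/// Hypothetical model: adapter exports win, then the import map;
/// any import left over is reported as an error.
fn resolve_imports(
    imports: &[&str],
    adapter_exports: &HashSet<&str>,
    import_map: &HashMap<&str, &str>,
) -> Result<Vec<(String, Resolution)>, Vec<String>> {
    let mut resolved = Vec::new();
    let mut missing = Vec::new();
    for &name in imports {
        if adapter_exports.contains(name) {
            resolved.push((name.to_string(), Resolution::Adapter));
        } else if let Some(action) = import_map.get(name) {
            resolved.push((name.to_string(), Resolution::Mapped(action.to_string())));
        } else {
            missing.push(name.to_string());
        }
    }
    if missing.is_empty() { Ok(resolved) } else { Err(missing) }
}

fn main() {
    let adapter: HashSet<&str> = ["console.log"].into_iter().collect();
    let map: HashMap<&str, &str> = [("abort", "trap"), ("console.log", "nop")].into_iter().collect();
    println!("{:?}", resolve_imports(&["console.log", "abort"], &adapter, &map));
    println!("{:?}", resolve_imports(&["console.log", "abort", "seed"], &adapter, &map));
}
```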
Resources
- PVM Debugger: upload `.jam` files for disassembly, step-by-step execution, and register/gas inspection
- PVM Decompiler: decompile PVM bytecode back to human-readable form
- ananas (anan-as): PVM interpreter written in AssemblyScript, compiled to PVM itself for PVM-in-PVM execution
- as-lan: example AssemblyScript project compiled from WASM to PVM using this tool
- JAM Gray Paper: the JAM protocol specification (PVM is defined in Appendix A)
- AssemblyScript: TypeScript-like language that compiles to WASM
- Documentation Book: full compiler docs (run `mdbook serve docs` to browse locally)
License
Contributing
Contributions are welcome! See AGENTS.md for coding guidelines, project conventions, and a map of the codebase.