# WASM-PVM: WebAssembly to PolkaVM Recompiler
> **WARNING: This project is largely vibe-coded.**
> It was built iteratively with heavy AI assistance (Claude). While it has 412 passing integration tests and
> produces working PVM bytecode, the internals may contain unconventional patterns, over-engineering in some
> places, and under-engineering in others. Use at your own risk. Contributions and proper engineering reviews
> are very welcome!
A Rust compiler that translates WebAssembly (WASM) bytecode into [PolkaVM](https://github.com/paritytech/polkavm) (PVM) bytecode for execution on the [JAM](https://graypaper.com/) (Join-Accumulate Machine) protocol. Write your JAM programs in [AssemblyScript](https://www.assemblyscript.org/) (TypeScript-like), hand-written WAT, or any language that compiles to WASM — and run them on PVM.
```text
WASM ──────────► LLVM IR ──────────► PVM bytecode ──► JAM program (.jam)
      inkwell    mem2reg Rust backend
```
## Getting Started
### Prerequisites
- **Rust** (stable, edition 2024)
- **LLVM 18** — the compiler uses [inkwell](https://github.com/TheDan64/inkwell) (LLVM 18 bindings)
  - macOS: `brew install llvm@18` then `export LLVM_SYS_181_PREFIX=/opt/homebrew/opt/llvm@18`
  - Ubuntu: `apt install llvm-18-dev`
- **Bun** (for running integration tests and the JAM runner) — [bun.sh](https://bun.sh)
### Build
```bash
git clone https://github.com/tomusdrw/wasm-pvm.git
cd wasm-pvm
cargo build --release
```
### Hello World: Compile & Run
Create a simple WAT program that adds two numbers:
```wat
;; add.wat
(module
  (memory 1)
  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    ;; Read two i32 args, add them, write result to memory
    (i32.store (i32.const 0)
      (i32.add
        (i32.load (local.get $args_ptr))
        (i32.load (i32.add (local.get $args_ptr) (i32.const 4)))))
    (i64.const 17179869184))) ;; packed ptr=0, len=4
```
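The constant `17179869184` returned by `main` is simply `4 << 32`: the result length (4) in the upper 32 bits and the result pointer (0) in the lower 32 bits. A minimal Rust sketch of that packing follows; `pack_result` is a hypothetical helper name used for illustration, not part of the `wasm-pvm` API.

```rust
/// Hypothetical helper: pack a result pointer and length into the i64
/// returned by `main` (pointer in the lower 32 bits, length in the upper 32).
pub fn pack_result(ptr: u32, len: u32) -> i64 {
    ((u64::from(len) << 32) | u64::from(ptr)) as i64
}
```

For the program above, `pack_result(0, 4)` evaluates to `17179869184`.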
Compile it to a JAM blob and run it:
```bash
# Compile WAT → JAM
cargo run -p wasm-pvm-cli -- compile add.wat -o add.jam
# Run with two u32 arguments: 5 and 7 (little-endian hex)
npx @fluffylabs/anan-as run add.jam 0500000007000000
# Output: 0c000000 (12 in little-endian)
```
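The argument string passed to the runner is the raw argument buffer in hex: each `u32` is written as four little-endian bytes. The sketch below shows one way to build that string and to decode the 4-byte result; `encode_args` and `decode_u32` are hypothetical helper names, not part of any tool mentioned here.

```rust
/// Hypothetical helper: encode u32 arguments as a little-endian hex string.
pub fn encode_args(values: &[u32]) -> String {
    values
        .iter()
        .flat_map(|v| v.to_le_bytes()) // 4 bytes per value, least significant first
        .map(|b| format!("{b:02x}"))
        .collect()
}

/// Hypothetical helper: decode an 8-digit little-endian hex string into a u32.
pub fn decode_u32(hex: &str) -> Option<u32> {
    if hex.len() != 8 || !hex.is_ascii() {
        return None;
    }
    let mut bytes = [0u8; 4];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(u32::from_le_bytes(bytes))
}
```

With these helpers, `encode_args(&[5, 7])` gives `0500000007000000`, and `decode_u32("0c000000")` gives `Some(12)`.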
### Inspect the Output
Upload the resulting `.jam` file to the [**PVM Debugger**](https://github.com/fluffylabs/pvm-debugger) for step-by-step execution, disassembly, register inspection, and gas metering visualization.
### AssemblyScript Example
You can also write programs in AssemblyScript:
```typescript
// fibonacci.ts
export function main(args_ptr: i32, args_len: i32): i64 {
  const buf = heap.alloc(256);
  let n = load<i32>(args_ptr);
  let a: i32 = 0;
  let b: i32 = 1;
  while (n > 0) {
    b = a + b;
    a = b - a;
    n = n - 1;
  }
  store<i32>(buf, a);
  return (buf as i64) | ((4 as i64) << 32); // packed ptr + len
}
```
Compile the source to WASM with the AssemblyScript compiler, then use `wasm-pvm-cli` to produce a JAM blob. See the `tests/fixtures/assembly/` directory for more examples.
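The loop in `fibonacci.ts` advances the pair `(a, b)` without a temporary: after `b = a + b`, the old `b` is recovered as `b - a`. A Rust mirror of the same loop can be handy for working out expected outputs when testing a compiled blob. This is only a reference sketch; it uses wrapping arithmetic to match 32-bit WASM integer semantics.

```rust
/// Reference mirror of the loop in `fibonacci.ts`: returns the n-th
/// Fibonacci number (fib(0) = 0, fib(1) = 1), wrapping on i32 overflow.
pub fn fib(mut n: i32) -> i32 {
    let (mut a, mut b) = (0i32, 1i32);
    while n > 0 {
        b = a.wrapping_add(b); // new b = old a + old b
        a = b.wrapping_sub(a); // new a = old b
        n -= 1;
    }
    a
}
```

For example, `fib(10)` is 55 and `fib(20)` is 6765.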
## How It Works
Entry functions use a unified ABI: `main(args_ptr: i32, args_len: i32) -> i64`, where the return value packs the result pointer in the lower 32 bits and the result length in the upper 32 bits. The compiler unpacks this into PVM's SPI convention (`r7` = start address, `r8` = end address).

The compiler pipeline:
1. **Adapter merge** (optional) — merges a WAT adapter module into the WASM binary, replacing matching imports with adapter function bodies
2. **WASM → LLVM IR** — translates WASM opcodes to LLVM IR using [inkwell](https://github.com/TheDan64/inkwell) (LLVM 18 bindings), with PVM-specific intrinsics for memory operations
3. **LLVM optimization passes** — `mem2reg` (SSA promotion), `instcombine`, `simplifycfg`, `gvn`, `dce`, and optional function inlining
4. **LLVM IR → PVM bytecode** — a custom Rust backend reads LLVM IR and emits PVM instructions with per-block register caching (store-load forwarding)
5. **SPI assembly** — packages the bytecode into a JAM/SPI program blob with entry headers, jump tables, and data sections
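The entry ABI described in this section can be illustrated from the consumer's side: split the packed `i64` into pointer and length, then derive a start/end pair like the one placed in `r7`/`r8`. The sketch below makes two assumptions that the text does not confirm: the end address is `start + len`, and no base address is added to the WASM pointer. The function names are hypothetical.

```rust
/// Split the packed return value: pointer in the lower 32 bits,
/// length in the upper 32 bits.
pub fn unpack_result(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    let ptr = bits as u32; // truncation keeps the lower 32 bits
    let len = (bits >> 32) as u32;
    (ptr, len)
}

/// Assumption: end = start + len, with no relocation of the WASM pointer.
/// The real compiler may translate the pointer into a PVM address first.
pub fn to_address_range(packed: i64) -> (u64, u64) {
    let (ptr, len) = unpack_result(packed);
    let start = u64::from(ptr);
    (start, start + u64::from(len))
}
```

Under those assumptions, the packed value `17179869184` (pointer 0, length 4) maps to the range `(0, 4)`.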
### Key Design Decisions
- **Stack-slot approach with register allocation**: every SSA value gets a dedicated 8-byte stack slot at a fixed offset from SP. A **linear-scan register allocator** then assigns high-use values to the callee-saved registers r9-r12, which eliminates redundant memory traffic across block boundaries and loops. A register is available only when it does not hold one of the function's incoming parameters; in non-leaf functions, the registers from r9 upward that are needed for outgoing call arguments are reserved as well
- **Per-block register cache**: eliminates redundant loads when a value is reused shortly after being computed (~50% gas reduction)
- **No `unsafe` code**: `deny(unsafe_code)` enforced at workspace level
- **No floating point**: PVM lacks FP support; WASM floats are rejected at compile time
- **All optimizations are toggleable**: `--no-llvm-passes`, `--no-peephole`, `--no-register-cache`, `--no-icmp-fusion`, `--no-shrink-wrap`, `--no-dead-store-elim`, `--no-const-prop`, `--no-inline`, `--no-cross-block-cache`, `--no-register-alloc`, `--no-fallthrough-jumps`
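To make the per-block register cache concrete, here is a toy model of store-load forwarding within a single basic block. It is a sketch under simplifying assumptions (no calls, no aliasing stores, one register per slot) and is not the compiler's actual data structure; `Op` and `forward_loads` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    /// Compute a new value into `reg`, overwriting its previous contents.
    Def { reg: u8 },
    /// Write `reg` to the stack slot `slot`.
    Store { slot: u32, reg: u8 },
    /// Read the stack slot `slot` into `reg`.
    Load { reg: u8, slot: u32 },
}

/// Drop loads whose value is already known to be in the target register.
pub fn forward_loads(block: &[Op]) -> Vec<Op> {
    // Maps a stack slot to the register currently holding the same value.
    let mut cache: HashMap<u32, u8> = HashMap::new();
    let mut out = Vec::with_capacity(block.len());
    for &op in block {
        match op {
            Op::Def { reg } => {
                // The register is overwritten, so it no longer mirrors any slot.
                cache.retain(|_, r| *r != reg);
                out.push(op);
            }
            Op::Store { slot, reg } => {
                // After the store, the slot and the register hold the same value.
                cache.insert(slot, reg);
                out.push(op);
            }
            Op::Load { reg, slot } => {
                if cache.get(&slot) == Some(&reg) {
                    continue; // redundant load: the value is already in `reg`
                }
                cache.retain(|_, r| *r != reg);
                cache.insert(slot, reg);
                out.push(op);
            }
        }
    }
    out
}
```

In this model, a `Load` that directly follows a `Store` of the same register to the same slot is removed, while a `Load` after the register has been redefined is kept.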
### Benchmark: Optimizations Impact
All PVM-level optimizations enabled (default):
| Benchmark | WASM size | JAM size | Code size | Gas Used |
|-----------|----------|----------|-----------|----------|
| add(5,7) | 68 B | 165 B | 99 B | 28 |
| fib(20) | 110 B | 227 B | 148 B | 409 |
| factorial(10) | 102 B | 199 B | 124 B | 156 |
| is_prime(25) | 162 B | 286 B | 201 B | 62 |
| AS fib(10) | 235 B | 631 B | 504 B | 245 |
| AS factorial(7) | 234 B | 616 B | 490 B | 207 |
| AS gcd(2017,200) | 229 B | 640 B | 517 B | 174 |
| AS decoder | 1.5 KB | 20.8 KB | 6,469 B | 635 |
| AS array | 1.4 KB | 19.9 KB | 5,740 B | 551 |
| regalloc two loops | 252 B | 588 B | 461 B | 16,769 |
| host-call-log | 171 B | 12.5 KB | 112 B | 42 |
| aslan-fib accumulate | - | 38.5 KB | 18,042 B | 11,044 |
| anan-as PVM interpreter | 54.6 KB | 155.5 KB | 106,577 B | - |
PVM-in-PVM: programs executed inside the anan-as PVM interpreter (outer gas cost):
| Benchmark | JAM Size | Outer Gas | Direct Gas | Overhead |
|-----------|----------|-----------|------------|----------|
| TRAP (interpreter overhead) | 21 B | 66,864 | - | - |
| add(5,7) | 165 B | 1,145,302 | 28 | 40,904x |
| host-call-log | 12.5 KB | 2,476,470 | 42 | 58,964x |
| AS fib(10) | 631 B | 1,476,903 | 245 | 6,028x |
| JAM-SDK fib(10)\* | 26.0 KB | 8,839,149 | - | - |
| Jambrains fib(10)\* | 62.6 KB | 6,365,026 | - | - |
| JADE fib(10)\* | 68.9 KB | 18,606,692 | - | - |
| aslan-fib accumulate\* | 38.5 KB | 17,726,490 | 11,044 | 1,605x |
\*JAM-SDK fib(10), Jambrains fib(10), JADE fib(10), and aslan-fib accumulate exit on unhandled host calls (ecalli). The gas cost reflects program parsing/loading plus partial execution up to the first unhandled ecalli.
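The Overhead column in the PVM-in-PVM table appears to be the ratio of outer gas to direct gas, rounded to the nearest integer. A small sketch of that calculation, where `overhead` is a hypothetical helper and the rounding rule is inferred from the table rather than stated by the project:

```rust
/// Ratio of outer gas to direct gas, rounded to the nearest integer.
/// Returns None when no direct measurement exists (shown as "-" in the table).
pub fn overhead(outer_gas: u64, direct_gas: u64) -> Option<u64> {
    if direct_gas == 0 {
        return None;
    }
    Some((outer_gas + direct_gas / 2) / direct_gas)
}
```

For the `add(5,7)` row, `overhead(1_145_302, 28)` gives `Some(40904)`; for `AS fib(10)`, `overhead(1_476_903, 245)` gives `Some(6028)`.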
## Memory Layout Summary
The JAM blob reserves separate ranges for RO data, a guard gap, globals/overflow metadata, and the WASM heap; see the [Architecture docs](docs/src/architecture.md#memory-layout) for the full breakdown, including `GLOBAL_MEMORY_BASE`, `PARAM_OVERFLOW_BASE`, `SPILLED_LOCALS_BASE`, and how `wasm_memory_base` is computed.
The SPI `rw_data` section is a contiguous copy of every byte from `GLOBAL_MEMORY_BASE` up to the highest initialized heap address. This is why stub AssemblyScript fixtures such as `decoder-test` and `array-test` emit ~13 KB of RW data even though only a handful of bytes are non-zero: the encoder must preserve the absolute addresses of the data segments, so the run of zeros between the globals and the first heap byte is encoded verbatim. Short of redesigning SPI, the only ways to shrink those blobs are to keep globals and data near the heap base or to introduce sparse RW descriptors (future work).
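The size effect can be sketched numerically. The function below computes the length of a contiguous `rw_data` image from a base address and a list of initialized segments. The base value and segment addresses used in the example are made up for illustration (the real `GLOBAL_MEMORY_BASE` is defined in the architecture docs), and `rw_data_len` is a hypothetical name.

```rust
/// Length of a contiguous RW image that starts at `global_memory_base` and
/// ends at the highest initialized byte. Segments are `(address, length)`.
pub fn rw_data_len(global_memory_base: u32, segments: &[(u32, u32)]) -> u32 {
    segments
        .iter()
        .map(|&(addr, len)| addr.saturating_add(len)) // end address of each segment
        .max()
        .map_or(0, |end| end.saturating_sub(global_memory_base))
}
```

With a hypothetical base of `0x3_0000`, 8 bytes of globals at the base, and a 16-byte heap segment 13,000 bytes later, `rw_data_len(0x3_0000, &[(0x3_0000, 8), (0x3_0000 + 13_000, 16)])` is 13,016 bytes, although only 24 bytes are initialized.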
## Supported WASM Features
| Category | Operations |
|----------|-----------|
| **Arithmetic** (i32 & i64) | add, sub, mul, div_u/s, rem_u/s, all comparisons, clz, ctz, popcnt, rotl, rotr, bitwise ops |
| **Control flow** | block, loop, if/else, br, br_if, br_table, return, unreachable, block results |
| **Memory** | load/store (all widths), memory.size, memory.grow, memory.fill, memory.copy, globals, data sections |
| **Functions** | call, call_indirect (with signature validation), recursion, stack overflow detection |
| **Type conversions** | wrap, extend_s/u, sign extensions (i32/i64 extend8/16/32_s) |
| **Imports** | Text-based import maps (`--imports`) and WAT adapter files (`--adapter`) |
**Not supported**: floating point (by design — PVM has no FP instructions).
## CLI Usage
```bash
# Compile WAT or WASM to JAM
wasm-pvm compile input.wat -o output.jam
wasm-pvm compile input.wasm -o output.jam
# With import resolution
wasm-pvm compile input.wasm -o output.jam \
  --imports imports.txt \
  --adapter adapter.wat
# Disable specific optimizations
wasm-pvm compile input.wasm -o output.jam --no-inline --no-peephole
# Disable all optimizations
wasm-pvm compile input.wasm -o output.jam \
  --no-llvm-passes --no-peephole --no-register-cache \
  --no-icmp-fusion --no-shrink-wrap --no-dead-store-elim \
  --no-const-prop --no-inline --no-cross-block-cache \
  --no-register-alloc --no-fallthrough-jumps
```
See the [Import Handling](#import-handling) section for details on resolving WASM imports.
## Using as a Library
The `wasm-pvm` crate can be used as a Rust dependency. It supports two modes:
```toml
# Cargo.toml: pick ONE of the two entries below
[dependencies]
# Full compiler (default); requires LLVM 18
wasm-pvm = "0.5.2"
# PVM types only; no LLVM dependency, compiles to wasm32-unknown-unknown
# wasm-pvm = { version = "0.5.2", default-features = false }
```
With `default-features = false`, only the PVM type definitions are available: `Instruction`, `Opcode`, `ProgramBlob`, `SpiProgram`, `abi::*`, `memory_layout::*`, and `Error`. This is useful for downstream tools that need to work with PVM bytecode (interpreters, debuggers, analyzers) without requiring the full LLVM compiler toolchain.
| Feature | Default | Description |
|---------|---------|-------------|
| `compiler` | Yes | Full WASM-to-PVM compiler (inkwell, wasmparser, wasm-encoder) |
| `test-harness` | Yes | Test utilities for unit testing (implies `compiler`) |
## Project Structure
```text
crates/
  wasm-pvm/              # Core library
    src/
      pvm/               # PVM instruction definitions (always available)
      memory_layout.rs   # PVM memory address constants (always available)
      spi.rs             # JAM/SPI format encoder (always available)
      abi.rs             # Register & frame layout constants (always available)
      llvm_frontend/     # WASM → LLVM IR translation (feature = "compiler")
      llvm_backend/      # LLVM IR → PVM bytecode lowering (feature = "compiler")
      translate/         # Compilation orchestration & SPI assembly (feature = "compiler")
  wasm-pvm-cli/          # Command-line interface
tests/                   # 412 integration tests (TypeScript/Bun)
  fixtures/
    wat/                 # WAT test programs
    assembly/            # AssemblyScript examples
    imports/             # Import maps & adapter files
vendor/
  anan-as/               # PVM interpreter (submodule)
```
## Testing
```bash
# Rust unit tests
cargo test
# Lint
cargo clippy -- -D warnings
# Integration tests (builds artifacts, then runs all layers)
cd tests && bun run test
# Quick validation (Layer 1 smoke tests only)
cd tests && bun test layer1/
```
The test suite is organized into layers:
- **Layer 1**: Core/smoke tests (~50 tests) — fast, run during development
- **Layer 2**: Feature tests (~140 tests)
- **Layer 3**: Regression/edge cases (~220 tests)
- **Layers 4-5**: PVM-in-PVM tests, in which the PVM interpreter itself is compiled to PVM and runs the test suite inside PVM
## Import Handling
WASM modules that import external functions need those imports resolved before compilation. Two mechanisms are available:
### Import Map (`--imports`)
A text file mapping import names to simple actions:
```text
# my-imports.txt
abort = trap # emit unreachable (panic)
console.log = nop # do nothing, return zero
```
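For readers who want to generate or validate such files, here is a minimal parser sketch for the format shown above. It assumes that `#` starts a comment, that each entry is `name = action`, and that `trap` and `nop` are the only actions. These are assumptions drawn from the example, and the real tool may accept more. `Action` and `parse_import_map` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Trap, // emit unreachable
    Nop,  // do nothing, return zero
}

/// Parse `name = action` lines, ignoring blank lines and `#` comments.
pub fn parse_import_map(text: &str) -> Result<HashMap<String, Action>, String> {
    let mut map = HashMap::new();
    for (idx, raw) in text.lines().enumerate() {
        // Keep only the part before the first `#`.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (name, action) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `name = action`", idx + 1))?;
        let action = match action.trim() {
            "trap" => Action::Trap,
            "nop" => Action::Nop,
            other => return Err(format!("line {}: unknown action `{other}`", idx + 1)),
        };
        map.insert(name.trim().to_string(), action);
    }
    Ok(map)
}
```

Parsing the example file yields two entries: `abort` mapped to `Action::Trap` and `console.log` mapped to `Action::Nop`.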
### Adapter WAT (`--adapter`)
A WAT module whose exports replace matching imports, enabling arbitrary logic for import resolution (pointer conversion, memory reads, host calls):
```wat
(module
  (import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
  (import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))
  (func (export "console.log") (param i32)
    (drop (call $host_call_5
      (i64.const 100)                                   ;; ecalli index
      (i64.const 3)                                     ;; log level
      (i64.const 0) (i64.const 0)                       ;; target ptr/len
      (call $pvm_ptr (i64.extend_i32_u (local.get 0)))  ;; message ptr
      (i64.extend_i32_u (i32.load offset=0
        (i32.sub (local.get 0) (i32.const 4)))))))      ;; message len
)
```
When both `--imports` and `--adapter` are provided, the adapter runs first, then the import map handles remaining unresolved imports. All imports must be resolved or compilation fails.
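The resolution order can be sketched as a small function: try the adapter first, fall back to the import map, and report anything left over as an error. This is an illustrative model only. It matches imports by name alone, which is an assumption, and `Resolution` and `resolve_imports` are hypothetical names rather than the compiler's API.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Adapter,   // replaced by an adapter export
    ImportMap, // handled by an import-map action
}

/// Resolve each import, preferring the adapter over the import map.
/// Returns the list of unresolved import names as the error.
pub fn resolve_imports<'a>(
    imports: &[&'a str],
    adapter_exports: &HashSet<&str>,
    import_map: &HashSet<&str>,
) -> Result<Vec<(&'a str, Resolution)>, Vec<&'a str>> {
    let mut resolved = Vec::new();
    let mut unresolved = Vec::new();
    for &name in imports {
        if adapter_exports.contains(name) {
            resolved.push((name, Resolution::Adapter));
        } else if import_map.contains(name) {
            resolved.push((name, Resolution::ImportMap));
        } else {
            unresolved.push(name);
        }
    }
    if unresolved.is_empty() {
        Ok(resolved)
    } else {
        Err(unresolved)
    }
}
```

If `console.log` appears in both the adapter and the import map, the adapter wins; an import found in neither makes the whole resolution fail, mirroring the compile error described above.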
## Resources
- **[PVM Debugger](https://github.com/fluffylabs/pvm-debugger)** — upload `.jam` files for disassembly, step-by-step execution, and register/gas inspection
- **[PVM Decompiler](https://github.com/tomusdrw/pvm-decompiler)** — decompile PVM bytecode back to human-readable form
- **[ananas (anan-as)](https://github.com/tomusdrw/anan-as)** — PVM interpreter written in AssemblyScript, compiled to PVM itself for PVM-in-PVM execution
- **[as-lan](https://github.com/tomusdrw/as-lan)** — example AssemblyScript project compiled from WASM to PVM using this tool
- **[JAM Gray Paper](https://graypaper.com/)** — the JAM protocol specification (PVM is defined in Appendix A)
- **[AssemblyScript](https://www.assemblyscript.org/)** — TypeScript-like language that compiles to WASM
- **[Documentation Book](docs/src/SUMMARY.md)** — full compiler docs (run `mdbook serve docs` to browse locally)
## License
[MIT](./LICENSE)
## Contributing
Contributions are welcome! See [AGENTS.md](./AGENTS.md) for coding guidelines, project conventions, and a map of the codebase.