kataan 0.0.4

A high-performance JavaScript engine written in pure Rust. Library, C FFI, and CLI.
# Kataan Roadmap — the road to a complete JS + WASM engine

Kataan is a JavaScript engine written in pure Rust, usable three ways — a
standalone binary, a Rust library, and a C library. This roadmap describes **the
work that remains** to reach a fully complete, conformant, high-performance
JS+WASM engine. Finished foundations are summarized once (below) and not
re-litigated; everything after §1 is forward-looking.

---

## 0. Non-negotiables (unchanged)

- **Pure Rust, no foreign code on the critical path.** Crypto/TLS via
  [`purecrypto`](https://github.com/KarpelesLab/purecrypto), HTTP via
  [`rsurl`](https://github.com/KarpelesLab/rsurl); regex, Intl-lite collation, GC,
  and the WASM engine are all in-house.
- **`unsafe` is quarantined.** `unsafe_code = "deny"` (not `forbid`); only the
  `ffi` module and a small, audited set of VM/JIT/mmap hot-path primitives opt
  back in with a scoped `#[allow(unsafe_code)]` + safety comment.
- **Specification fidelity.** Conformance is measured against Test262 (JS) and
  the upstream WebAssembly spec suite (WASM); correctness is never knowingly
  traded for speed without a flag.
- **Deployable, host-native bytecode + heap.** Compiled bytecode — and an
  initialized heap — are first-class serializable artifacts: exportable,
  content-addressed, and `mmap`-reloadable zero-copy on a matching host, with an
  on-demand byte-swap conversion for a mismatched one (never a re-encode tax on
  the common path). See §6.
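
The `unsafe` quarantine above can be illustrated with lint attributes alone. The following is a minimal sketch, not Kataan's code: the module name `engine` and both functions are hypothetical, and the lint is applied as a module-level attribute so the example compiles as a single file (the roadmap's `unsafe_code = "deny"` setting presumably lives in the Cargo manifest). It shows why `deny` is used rather than `forbid`: a narrower `#[allow(unsafe_code)]` can override `deny`, but never `forbid`.

```rust
// Hypothetical module layout; only the lint attributes mirror the roadmap.
#[deny(unsafe_code)]
mod engine {
    /// Ordinary safe code: adding an `unsafe` block here fails to compile.
    pub fn checksum(bytes: &[u8]) -> u32 {
        bytes.iter().fold(0u32, |acc, b| acc.wrapping_add(*b as u32))
    }

    /// One audited primitive that opts back in, with a safety comment.
    #[allow(unsafe_code)]
    pub fn read_u32_native(bytes: &[u8], offset: usize) -> Option<u32> {
        let end = offset.checked_add(4)?;
        if end > bytes.len() {
            return None;
        }
        // SAFETY: `offset..offset + 4` was bounds-checked above, and
        // `read_unaligned` places no alignment requirement on the pointer.
        Some(unsafe { bytes.as_ptr().add(offset).cast::<u32>().read_unaligned() })
    }
}
```

Removing the `#[allow(unsafe_code)]` line turns the `unsafe` block into a compile error, which is the property the quarantine relies on.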

---

## 1. Where we are now (the finished base)

Treat this as done; build on it.

- **Front end:** complete lexer + full ECMAScript parser + AST.
- **Two execution engines that agree on every test:** a tree-walking interpreter
  (`nbexec`, the corpus/default engine) and a **register bytecode VM** (`nbvm`,
  the primary path for `kataan run` / `eval` / the C ABI), compiling nearly all of
  the common language. A dual-path conformance corpus (**510/510**, a curated
  Test262-style suite) runs on both.
- **Object model in production:** NaN-boxed values, hidden classes/shapes +
  transition tree, shape+slots objects, interned atoms, rope strings, inline-cache
  slots, a generational handle table, and a tracing GC — all live behind `Realm`,
  not a side experiment.
- **Bytecode codec (`KTBC`)** with a host-native header (byte order + pointer
  width), dedup constant pool, version/magic/truncation checks; survives
  serialize → reload → run.
- **D′ snapshot tier:** a verified codec that captures the live heap and
  `mmap`-reloads it over the **moving/compacting GC**, across eleven reference
  cell kinds (string, array, object, date, bigint, closure, collection, promise,
  proxy, regexp, symbol), including cross-kind cycles, with insertion order
  preserved.
- **Machine-code JIT (x86-64 / Linux, `jit`):** W^X executable memory via raw
  syscalls; an **optimizing integer path** (four-pass optimizer + register
  allocator) and a **float path** covering `+ - * / %`, comparisons, control flow,
  and the SSE-expressible `Math` intrinsics; each op has an `eval_*` oracle for
  differential testing.
- **WASM engine (`no_std`):** binary decode + validate + stack interpreter for
  the MVP plus sign-extension, saturating conversion, bulk-memory, multi-value,
  and typed structured control; a `.wast`/WAT spec harness drives a spec-derived
  corpus; the `WebAssembly` builtin exists.
- **Stdlib breadth:** Object/Array/String/Number/Math/JSON, Map/Set/WeakMap,
  Symbol, BigInt, Promise + (eager) async, Proxy/Reflect, partial typed arrays,
  Date, RegExp, `Function`, `new.target`. C ABI `kt_eval` runs scripts end-to-end.
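
As an illustration of the NaN-boxed values mentioned in the object-model bullet, here is a minimal sketch of the general technique. The tag layout, the type name `Value`, and the two non-double kinds (a 32-bit integer and a 32-bit heap handle) are assumptions made for the example; Kataan's actual encoding is not described in this document.

```rust
/// A 64-bit value that is either an IEEE-754 double or a tagged payload
/// hidden inside the NaN space. The layout is hypothetical.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Value(u64);

const CANONICAL_NAN: u64 = 0x7ff8_0000_0000_0000;
const TAG_MASK: u64 = 0xffff_0000_0000_0000;
const INT_TAG: u64 = 0x7ff9_0000_0000_0000; // a NaN bit pattern, reused as a tag
const HANDLE_TAG: u64 = 0x7ffa_0000_0000_0000; // likewise

impl Value {
    pub fn from_f64(n: f64) -> Self {
        // Canonicalize NaNs so a real NaN can never look like a tagged value.
        if n.is_nan() { Value(CANONICAL_NAN) } else { Value(n.to_bits()) }
    }

    pub fn from_i32(i: i32) -> Self {
        Value(INT_TAG | (i as u32 as u64))
    }

    pub fn from_handle(h: u32) -> Self {
        Value(HANDLE_TAG | h as u64)
    }

    pub fn as_f64(self) -> Option<f64> {
        match self.0 & TAG_MASK {
            INT_TAG | HANDLE_TAG => None,
            _ => Some(f64::from_bits(self.0)),
        }
    }

    pub fn as_i32(self) -> Option<i32> {
        ((self.0 & TAG_MASK) == INT_TAG).then(|| self.0 as u32 as i32)
    }

    pub fn as_handle(self) -> Option<u32> {
        ((self.0 & TAG_MASK) == HANDLE_TAG).then(|| self.0 as u32)
    }
}
```

Every value stays 8 bytes wide and doubles need no unboxing, which is the usual motivation for the scheme.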

The rest of this document is the gap between that base and "complete."

---

## 2. The three headline deliverables

These are the load-bearing items that make Kataan more than a fast interpreter.

### 2.1 Complete the machine-code JIT (for the *whole* VM, not just numbers)

Today the JIT compiles integer/float numeric functions and bails to the
interpreter for everything else. "Complete" means hot functions JIT regardless of
what they touch.

- **The re-entrancy + GC-safety substrate (the hard core).** Native code must be
  able to call back into `Realm` primitives (property get/set, allocation, string
  ops) and survive a moving GC mid-call. Concretely:
  - a calling convention and **stack maps** so the GC can find/relocate live
    object references held in JIT frames and registers across a safepoint;
  - safepoints at allocation/call sites; pinning or re-loading of relocatable
    handles after any call that can collect;
  - a sound `&mut Realm` re-entry path (no aliasing UB when native code re-enters
    the VM) — the current blocker that keeps object/string ops interpreter-only.
- **Non-numeric op lowering:** property access through inline-cache slots (shape
  guard → slot load, deopt on miss), array element load/store with the
  element-kind fast paths, string/rope ops, closure/upvalue access, `this`/scope.
- **Calls in JITed code:** direct + polymorphic call sites, native-builtin calls,
  argument/return marshaling, exception propagation across native frames.
- **Tiering & deopt:** a **baseline tier** (fast copy-and-patch / template
  compile of bytecode, no optimization, with on-stack replacement for hot loops)
  feeding a **profiling** layer; **deoptimization** back to bytecode when a
  speculative guard fails (the mechanism that makes speculative type
  specialization safe).
- **Shared native backend:** factor codegen into a Cranelift-style backend
  (register allocation, executable-memory mgmt, relocations) that both the JS JIT
  and the WASM compiler (Track 2.2) lower into. Designed to target **native or
  WASM output** (the sandbox fallback — JIT-by-emitting-WASM when hosted inside a
  WASM sandbox that forbids native codegen).
- **Portability of the JIT itself:** the current path is x86-64/Linux; the
  backend abstraction is what makes aarch64 and other OSes additive.
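
The "shape guard → slot load, deopt on miss" pattern from the non-numeric lowering bullet can be sketched in plain Rust. This is an illustration under stated assumptions: `ShapeId`, `Shape`, `Object`, `InlineCache`, and `get_property` are hypothetical names, slots hold `f64` for brevity, and the miss path here is a generic lookup standing in for a real deoptimization back to bytecode.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ShapeId(pub u32);

/// A hidden class: maps property names to slot indices (hypothetical layout).
pub struct Shape {
    pub slots: HashMap<&'static str, usize>,
}

pub struct Object {
    pub shape: ShapeId,
    pub slots: Vec<f64>,
}

/// One cache per property-access site, so the property name is fixed per cache.
#[derive(Default)]
pub struct InlineCache {
    pub entry: Option<(ShapeId, usize)>,
    pub misses: u32,
}

pub fn get_property(
    shapes: &[Shape],
    ic: &mut InlineCache,
    obj: &Object,
    name: &str,
) -> Option<f64> {
    // Fast path: one shape comparison, then a direct slot load.
    if let Some((shape, slot)) = ic.entry {
        if shape == obj.shape {
            return Some(obj.slots[slot]);
        }
    }
    // Miss: fall back to the generic lookup and re-prime the cache.
    ic.misses += 1;
    let slot = *shapes[obj.shape.0 as usize].slots.get(name)?;
    ic.entry = Some((obj.shape, slot));
    Some(obj.slots[slot])
}
```

In JITed code the fast path would be a compare and a load emitted inline; the fallback is where a speculative guard failure would hand control back to the interpreter.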

Exit criteria: a hot function doing object/array/string work runs as native code,
verified by differential execution against the interpreter and by the conformance
corpus passing with the JIT forced on; deopt round-trips proven; no GC-safety
holes under stress/fuzz.

### 2.2 A conformant (non-numeric) WASM engine + full upstream suite

Today the WASM engine passes a spec-*derived* corpus over the numeric/control
core. "Complete" means the full standardized feature set and the **upstream
WebAssembly spec test suite**.

- **Reference types & tables:** `funcref`/`externref`, `table.*` ops, multiple
  tables, `call_indirect` type checks and traps, element segments (active /
  passive / declarative), `ref.func`/`ref.null`/`ref.is_null`.
- **The JS ↔ WASM boundary (the "non-numeric" surface):**
  - *Done:* `validate`, `instantiate` (→ `{module, instance}`), the
    `Module`/`Instance` constructors, `compile` (→ `Promise<Module>`),
    host-function imports (JS callable from WASM), and **stateful instances**
    (mutable globals and linear memory persist across export calls, each instance
    independent).
  - *Done:* the **`WebAssembly.Global`** object (typed value cell with a `.value`
    accessor, ToInt32/`BigInt` coercion, immutability enforced),
    **`WebAssembly.Memory`** (`.buffer` ArrayBuffer + `byteLength`, `grow(delta)`
    page accounting with content copy, `maximum` → RangeError), and
    **`WebAssembly.Table`** (`.length`, `get`/`set` of function refs,
    `grow(delta, init?)` → prior length, out-of-bounds + `maximum` → RangeError).
  - *Done:* **imported globals** are wired (`importObject[m][f]` from a
    `WebAssembly.Global` or a Number/BigInt, coerced to the declared type), and a
    module's **exported globals** are exposed as `WebAssembly.Global` objects
    (value + mutability; `instanceof` works for
    `Global`/`Memory`/`Table`/`Module`). Traps already surface as JS exceptions.
  - *Remaining:* making `Memory.buffer` a *live shared view* with typed
    arrays/DataView (blocked on the separate typed-array ↔ ArrayBuffer
    backing-buffer work; today typed-array views don't share an ArrayBuffer's
    bytes).
  - *Remaining* imports/exports wiring: imported memories/tables, exposing a
    module's exported `Memory`/`Table`, `compileStreaming`/`instantiateStreaming`,
    and the `externref` bridge.
- **Post-MVP proposals (prioritized):** SIMD (`v128`), threads + `Atomics` on
  shared memory, multi-memory, tail calls, extended-const, then GC types and
  exception handling as they stabilize.
- **Compilation, not just interpretation:** lower validated WASM through the
  shared native backend (Track 2.1) for a real baseline/optimizing WASM tier, on
  the **shared GC and heap** (linear memory as a GC-tracked byte array;
  future WASM-GC objects on the same heap).
- **The test gate:** wire the **official `wabt`/spec `.wast` suite** into CI (not
  only the in-house corpus), tracking a pass-rate per proposal; fuzz the decoder
  and validator.
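
The `grow(delta)` page accounting described for `WebAssembly.Memory` can be sketched as follows. This is a minimal illustration, not the engine's implementation: `LinearMemory` and `GrowError` are hypothetical names, and `GrowError` stands in for the JS `RangeError`. The 64 KiB page size and the 65,536-page ceiling for 32-bit memories come from the WebAssembly specification.

```rust
/// WebAssembly page size: 64 KiB.
pub const PAGE_SIZE: usize = 65_536;
/// Hard ceiling for a 32-bit linear memory (4 GiB), in pages.
pub const MAX_PAGES: u32 = 65_536;

/// Stands in for the JS `RangeError` raised when growth is refused.
#[derive(Debug, PartialEq)]
pub struct GrowError;

pub struct LinearMemory {
    pub bytes: Vec<u8>,
    pub max_pages: Option<u32>,
}

impl LinearMemory {
    pub fn new(initial_pages: u32, max_pages: Option<u32>) -> Self {
        LinearMemory { bytes: vec![0; initial_pages as usize * PAGE_SIZE], max_pages }
    }

    pub fn pages(&self) -> u32 {
        (self.bytes.len() / PAGE_SIZE) as u32
    }

    /// Grows by `delta` pages and returns the size before growth, in pages.
    pub fn grow(&mut self, delta: u32) -> Result<u32, GrowError> {
        let old = self.pages();
        let new = old.checked_add(delta).ok_or(GrowError)?;
        if new > self.max_pages.unwrap_or(MAX_PAGES) {
            return Err(GrowError);
        }
        // `resize` keeps the existing contents and zero-fills the new pages.
        self.bytes.resize(new as usize * PAGE_SIZE, 0);
        Ok(old)
    }
}
```

A refused `grow` leaves the memory untouched, so callers can treat the error as recoverable.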

Exit criteria: the upstream spec suite passes for MVP + reference-types +
bulk-memory + multi-value + sign-ext + sat-conversion (and a tracked, growing set
of post-MVP proposals); JS↔WASM interop round-trips real modules; validator
rejects malformed modules safely under fuzzing.

### 2.3 The mmap-able zero-copy D′ layout, fully complete

Today the snapshot codec round-trips eleven cell kinds and `mmap`-reloads over the
moving GC. "Complete" means *any* live heap snapshots, reloads zero-copy, and
**executes** — and the bytecode code-cache it shares is production-grade.

- **Hidden-state cells.** *Done:* object capture now records **all own data
  properties** — non-enumerable ones and the engine's internal `\0`-prefixed slots
  — with a per-property hidden flag, so bound-function target/this/args, typed-array
  kind, error `stack`, and the `constructor` back-reference round-trip
  (`snapshots_preserve_non_enumerable_and_hidden_slots`), an object's
  `[[Prototype]]` link round-trips so `Object.create(p)` / inheritance chains
  resolve through the restored chain (`snapshots_preserve_prototype_links`), and
  accessor (getter/setter) properties round-trip with their functions and
  enumerability (`snapshots_preserve_accessor_properties`) — so object-cell fidelity
  is now complete (data + non-enumerable + internal slots + accessors + prototype).
  *Remaining:* generator/async suspension state (depends on Track 3 lazy frames);
  the `ArrayBuffer` backing-buffer *identity* shared across typed-array views; and
  an audit of each remaining non-object `Cell` variant.
- **End-to-end restore-and-execute.** *Done:* a restored closure runs and carries
  its snapshotted captured state, both in place
  (`snapshot_restores_an_executable_closure`) and **across runtimes** — snapshot in
  one interpreter, serialize to bytes, drop it, then restore into a *fresh*
  interpreter holding the same code and call the closure, which resumes from the
  snapshotted state independent of the new runtime's own program instance
  (`snapshot_reloads_into_a_fresh_runtime`) — the load → evict → reload scenario.
  A **public library API** exposes this: `Interp::snapshot(&[roots]) -> bytes`
  and `Interp::restore_snapshot(&bytes) -> Vec<Value>`, tested cross-runtime through
  the supported surface alone (`public_snapshot_api_round_trips_across_runtimes`),
  with a malformed snapshot rejected rather than panicked on. A **C-ABI binding** of
  the same two calls is in — `kt_snapshot(source) -> bytes` and `kt_restore(bytes)
  -> string` (panic-guarded, length-convention, header-documented; round-trip
  tested through the C entry points). *Remaining:* persisting the bytes through the
  content-addressed artifact store below.
- **The shared, versioned, mmap-able artifact store (code-cache, §6):** *Started:*
  a content-addressed store keys serialized snapshots by an FNV-1a hash of their
  own bytes, so identical artifacts deduplicate and a fetch can re-verify the bytes
  still hash to the requested address (`snapshot::store::{ArtifactStore,
  content_address, address_hex}`; end-to-end capture → serialize → store → fetch →
  deserialize → restore tested). **Source-hash + host-tag keying** is in
  (`host_tag` encodes pointer width / byte order / arch; `host_keyed_address` and
  `ArtifactStore::{put,get}_for_host` resolve a source to a per-host artifact, so a
  host-native artifact never aliases another host's). *Remaining:* lazy
  per-function bodies faulted in on first call; **module-local atom remap** on
  load; IC slots reset on load; the
  **on-demand byte-swap conversion** path for a mismatched host (convert once,
  re-cache, zero-copy thereafter); **read-only pages shareable across many
  concurrent contexts/processes** (the "hundreds of bases" model — immutable
  bytecode shared, each context owns its mutable heap/globals).
- **Untrusted-load verifier.** Loading bytecode *is* executing it: a verifier
  (bounds-checked jumps, constant/atom/register indices, stack-depth invariants)
  for untrusted artifacts; trusted-cache fast path relies on version tag +
  checksum. Safe-Rust means a bad index is a clean rejection, never UB.
- **Heap-snapshot startup.** Skip initialization (not just parsing) by booting
  from a snapshot of an initialized realm — the natural payoff once the above and
  the moving GC's pointer-relocation pass are solid.
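
The content-addressed store described above (FNV-1a keying, deduplication, re-verification on fetch) can be sketched in a few lines. This is an assumption-laden illustration: `fnv1a_64` and `SketchStore` are hypothetical names chosen to avoid implying the real `snapshot::store` signatures, and the 64-bit FNV-1a variant is assumed because the roadmap does not state the hash width.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical in-memory store keyed by the hash of each blob's own bytes.
#[derive(Default)]
pub struct SketchStore {
    pub blobs: HashMap<u64, Vec<u8>>,
}

impl SketchStore {
    /// Stores `bytes` and returns their address; identical blobs share one entry.
    pub fn put(&mut self, bytes: &[u8]) -> u64 {
        let address = fnv1a_64(bytes);
        self.blobs.entry(address).or_insert_with(|| bytes.to_vec());
        address
    }

    /// Fetches a blob, re-checking that it still hashes to its address.
    pub fn get(&self, address: u64) -> Option<&[u8]> {
        let bytes = self.blobs.get(&address)?;
        (fnv1a_64(bytes) == address).then_some(bytes.as_slice())
    }
}
```

FNV-1a is not a cryptographic hash, so the re-check on fetch detects accidental corruption rather than deliberate collisions; untrusted artifacts still need the verifier described in this section.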

Exit criteria: an arbitrary initialized heap snapshots, reloads zero-copy on a
matching host (and via one-time conversion on a mismatched one), and executes;
the code-cache passes the load → run → evict → reload churn + cross-tenant dedup
scenario; the verifier rejects malformed blobs under fuzzing.
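
To make the verifier requirement concrete, here is a minimal sketch of a pre-execution check over a toy register bytecode. The instruction set (`Op`), the error type, and the `verify` signature are all hypothetical and far smaller than any real format; the sketch covers only jump targets, constant indices, and register indices, and adds a simple "must end in a terminator" rule in place of fuller control-flow invariants.

```rust
/// A toy register-machine instruction set (hypothetical, for illustration).
#[derive(Debug, Clone, Copy)]
pub enum Op {
    LoadConst { dst: u8, index: u16 },
    Add { dst: u8, lhs: u8, rhs: u8 },
    Jump { target: u32 },
    Return { src: u8 },
}

/// Each variant carries the offending instruction index where applicable.
#[derive(Debug, PartialEq)]
pub enum VerifyError {
    BadRegister(usize),
    BadConstant(usize),
    BadJump(usize),
    MissingTerminator,
}

pub fn verify(code: &[Op], const_count: usize, register_count: usize) -> Result<(), VerifyError> {
    let reg_ok = |r: u8| (r as usize) < register_count;
    for (pc, op) in code.iter().enumerate() {
        match *op {
            Op::LoadConst { dst, index } => {
                if !reg_ok(dst) {
                    return Err(VerifyError::BadRegister(pc));
                }
                if index as usize >= const_count {
                    return Err(VerifyError::BadConstant(pc));
                }
            }
            Op::Add { dst, lhs, rhs } => {
                if ![dst, lhs, rhs].iter().all(|&r| reg_ok(r)) {
                    return Err(VerifyError::BadRegister(pc));
                }
            }
            Op::Jump { target } => {
                if target as usize >= code.len() {
                    return Err(VerifyError::BadJump(pc));
                }
            }
            Op::Return { src } => {
                if !reg_ok(src) {
                    return Err(VerifyError::BadRegister(pc));
                }
            }
        }
    }
    // Execution must not be able to run off the end of the code.
    match code.last() {
        Some(Op::Jump { .. }) | Some(Op::Return { .. }) => Ok(()),
        _ => Err(VerifyError::MissingTerminator),
    }
}
```

Because every index is checked before the first instruction runs, the interpreter loop can treat a verified blob as trusted, and a malformed one is rejected with an error rather than executed.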

---

## 3. Language conformance to "fully complete"

The corpus is curated (510 tests); completeness means the **full upstream
Test262** for the language (non-Intl) at a high, tracked pass-rate, with the known
semantic gaps closed.

- **Sparse arrays / holes.** The array model stores holes as `undefined`; a real
  empty-slot representation (sentinel or presence bitmap) must thread through
  literal construction, every iteration method, `in`/`hasOwnProperty`,
  `Object.keys`/`entries`, and `join`/`toString`. (Currently `Object.keys` on an
  array with holes and `forEach`/`in` over holes are non-conformant.)
- **Lazy, suspendable generators.** Generators run *eagerly* into a value buffer:
  `.next(value)` injection and `.throw()` into a `try{ yield }catch` are
  unsupported, and the suspension model can't interleave. Replace with a real
  coroutine/state-machine so `yield` truly suspends.
- **Lazy async / correct microtask interleaving.** `await` runs its continuation
  synchronously instead of yielding to the caller as a microtask (values are
  correct; ordering is not). Same suspendable-frame work as generators.
- **First-class prototype methods.** `Array.prototype.map.call(arrayLike)` and
  extracting a method as a value (`[].slice`) don't work — methods are built-in
  dispatch, not function objects on real `….prototype` objects. `Array`/`Object`
  are namespace objects, so `typeof Array === "object"` (should be `"function"`)
  and `Array instanceof Function` is false. Needs real builtin prototype objects
  with installed, callable method functions and `this`-generic algorithms.
- **Remaining builtins / edges:** `Atomics` (+ `wait`/`notify`),
  `SharedArrayBuffer`, resizable/growable `ArrayBuffer`, complete `%TypedArray%`
  method set + `TypedArray.from`/`subarray`, `WeakRef`/`FinalizationRegistry` GC
  cooperation, `Symbol.toPrimitive`/well-known-symbol coverage, tagged-template
  and regex `d`/`v`-flag/named-group/lookaround completeness, JSON source-text
  access.
- **Intl-lite (`intl`):** flesh out `Intl.Collator`/`NumberFormat`/`DateTimeFormat`
  /`PluralRules`/`Segmenter` + locale negotiation over the embedded trimmed CLDR
  data.
- **The gate:** run the **full upstream Test262** in CI with a tracked pass-rate
  (target >95% of the non-Intl suite); fuzz parser, regex, JSON, and the VM.
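
The sparse-array item above turns on distinguishing an empty slot from a stored `undefined`. A minimal sketch of the sentinel approach, using `Option` as the presence marker, is shown below. The types `Value` and `JsArray` are hypothetical and heavily simplified; the behaviours modelled (`in`, indexed read, `Object.keys`, `forEach`) follow standard ECMAScript semantics for the array `[1, , undefined]`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Undefined,
    Number(f64),
}

/// `None` is a hole (no own property); `Some(Value::Undefined)` is a real
/// element whose value happens to be `undefined`.
pub struct JsArray {
    pub slots: Vec<Option<Value>>,
}

impl JsArray {
    /// `i in arr`: true only for present elements.
    pub fn has_index(&self, i: usize) -> bool {
        matches!(self.slots.get(i), Some(Some(_)))
    }

    /// `arr[i]`: a hole still reads as `undefined`.
    pub fn get(&self, i: usize) -> Value {
        self.slots.get(i).cloned().flatten().unwrap_or(Value::Undefined)
    }

    /// `Object.keys(arr)`: indices of present elements only.
    pub fn keys(&self) -> Vec<usize> {
        self.slots
            .iter()
            .enumerate()
            .filter(|(_, slot)| slot.is_some())
            .map(|(i, _)| i)
            .collect()
    }

    /// `arr.forEach(f)`: the callback is never invoked for holes.
    pub fn for_each(&self, mut f: impl FnMut(usize, &Value)) {
        for (i, slot) in self.slots.iter().enumerate() {
            if let Some(value) = slot {
                f(i, value);
            }
        }
    }
}
```

With holes stored as plain `undefined`, `has_index`, `keys`, and `for_each` cannot tell index 1 from index 2 in `[1, , undefined]`, which is the non-conformance the roadmap item describes.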

---

## 4. Host runtime to a real runtime (`host`)

- **Event loop & timers:** a complete in-house loop (mio-style readiness or std
  threads), `setTimeout`/`setInterval`/`setImmediate` + `clear*`,
  `process.nextTick`, microtask checkpoint integration.
- **Modules:** ESM (static + dynamic `import`, `import.meta`, top-level `await`),
  CommonJS interop, JSON modules, import maps / resolution.
- **Web platform:** `TextEncoder`/`Decoder`, `atob`/`btoa`, `Buffer`,
  `URL`/`URLSearchParams`, WHATWG streams, `structuredClone`,
  `performance.now`/marks.
- **`fetch`** (+ `Headers`/`Request`/`Response`/`Blob`/`FormData`) over `rsurl`
  with `purecrypto` TLS; **`crypto`** (`getRandomValues`/`randomUUID`/`subtle`)
  over `purecrypto`.
- **Node-compat subset:** `fs`/`path`/`os`/`net`/`http(s)`/`events`/`stream`/
  `util`/`process`/`Buffer` — a useful subset, gaps documented.
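
The interaction between timers and the microtask checkpoint in the event-loop item can be sketched with a virtual clock. This is a simplified model under stated assumptions: `EventLoop` and its methods are hypothetical names, there is no I/O readiness or real sleeping, and jobs are plain Rust closures rather than JS callbacks. What it does show is the ordering rule: microtasks drain fully after each task before the next timer fires.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap, VecDeque};

type Job = Box<dyn FnOnce(&mut EventLoop)>;

pub struct EventLoop {
    now_ms: u64,
    next_id: u64,
    timers: BinaryHeap<Reverse<(u64, u64)>>, // min-heap of (due time, timer id)
    timer_jobs: HashMap<u64, Job>,           // removing an entry cancels the timer
    microtasks: VecDeque<Job>,
    pub log: Vec<String>,
}

impl EventLoop {
    pub fn new() -> Self {
        EventLoop {
            now_ms: 0,
            next_id: 0,
            timers: BinaryHeap::new(),
            timer_jobs: HashMap::new(),
            microtasks: VecDeque::new(),
            log: Vec::new(),
        }
    }

    pub fn set_timeout(&mut self, delay_ms: u64, job: impl FnOnce(&mut EventLoop) + 'static) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.timers.push(Reverse((self.now_ms + delay_ms, id)));
        self.timer_jobs.insert(id, Box::new(job));
        id
    }

    pub fn clear_timeout(&mut self, id: u64) {
        self.timer_jobs.remove(&id);
    }

    pub fn queue_microtask(&mut self, job: impl FnOnce(&mut EventLoop) + 'static) {
        self.microtasks.push_back(Box::new(job));
    }

    /// The microtask checkpoint: run until the queue is empty, including
    /// microtasks queued by other microtasks.
    fn drain_microtasks(&mut self) {
        while let Some(job) = self.microtasks.pop_front() {
            job(&mut *self);
        }
    }

    pub fn run(&mut self) {
        self.drain_microtasks();
        while let Some(Reverse((due, id))) = self.timers.pop() {
            // A cancelled timer has no job left and is skipped.
            if let Some(job) = self.timer_jobs.remove(&id) {
                self.now_ms = self.now_ms.max(due); // advance the virtual clock
                job(&mut *self);
                self.drain_microtasks();
            }
        }
    }
}
```

Keying the heap on `(due time, id)` makes timers with equal deadlines fire in registration order, and keeping the jobs in a side map turns cancellation into a single removal.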

---

## 5. Performance frontier

- **Interpreter pass:** IC tuning, array element-kind fast paths, string-rope
  tuning, the generational/moving GC upgrades that bump-allocation `new` relies
  on (the moving GC exists for D′; finish its generational nursery + write
  barriers for throughput).
- **Optimizing JIT (after the baseline tier in 2.1):** an SSA IR with inlining,
  escape analysis, range/redundancy elimination, type-feedback-driven speculation
  with guard-based deopt — lowering through the shared backend. The point where we
  contend with V8 on compute-bound code.
- **Benchmarks:** SunSpider/Kraken-style microbenchmarks + realistic scripts;
  cold-start vs `node -e`; code-cache load/evict/reload throughput; per-object
  memory vs equivalent V8 heaps.

---

## 6. Design invariants that still constrain remaining work

Kept from the original architecture because Tracks 2.1–2.3 must honor them.

- **Own JS bytecode + VM; WASM is a *peer* engine sharing the backend, not a
  compile target.** Routing JS through WASM would forfeit guard-based deopt — the
  mechanism that makes a JS optimizing JIT fast. The two engines share the **GC +
  heap, the native backend, the value/interop boundary, and the host runtime**;
  the two bytecodes stay distinct (JS = dynamic/deopt-friendly, WASM = statically
  typed). The one place emitting WASM *for JS* is correct: the sandbox fallback in
  2.1.
- **Serializable, host-native artifacts (the code-cache + heap snapshot).** A
  serialized unit is position-independent (cross-referenced by index, never by
  live pointer), with host-native integer encoding and an explicit, cheap,
  on-demand byte-swap conversion for a mismatched host — *not* a slow canonical
  encoding everyone pays for, and *not* a recompile. Versioned + integrity-checked
  with two distinct mismatch paths: version/flags mismatch or bad checksum →
  recompile; host-encoding mismatch, same version → convert. Atoms are
  module-local and remapped on load; IC slots and shapes are runtime state, never
  serialized; function bodies are lazy. The JS bytecode cache and the compiled-WASM
  cache share **one** artifact store.
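
The two mismatch paths in the artifact invariant reduce to a small decision function. The sketch below is illustrative only: `Header`, `LoadPlan`, and `plan_load` are hypothetical names, the header is reduced to three fields, and treating a pointer-width difference as a convertible host-encoding mismatch is an assumption, since the roadmap only spells out the byte-swap case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Little,
    Big,
}

/// A reduced, hypothetical artifact header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub version: u32,
    pub byte_order: ByteOrder,
    pub pointer_width: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoadPlan {
    /// Matching host: map the bytes and use them in place.
    MapZeroCopy,
    /// Same version, different host encoding: convert once, then re-cache.
    ConvertThenCache,
    /// Version mismatch or failed integrity check: discard and recompile.
    Recompile,
}

pub fn plan_load(artifact: &Header, host: &Header, checksum_ok: bool) -> LoadPlan {
    if artifact.version != host.version || !checksum_ok {
        return LoadPlan::Recompile;
    }
    if artifact.byte_order != host.byte_order || artifact.pointer_width != host.pointer_width {
        return LoadPlan::ConvertThenCache;
    }
    LoadPlan::MapZeroCopy
}

/// The conversion itself is a per-field byte swap, shown here for 32-bit words.
pub fn swap_words(words: &mut [u32]) {
    for w in words.iter_mut() {
        *w = w.swap_bytes();
    }
}
```

Checking version and integrity first means a stale or corrupt artifact is never converted, only recompiled, which keeps the conversion path limited to well-formed input.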

---

## 7. Definition of done

Kataan is "complete" when:

1. **JS:** the full upstream Test262 (non-Intl) passes at a high tracked rate on
   both engines; the documented semantic gaps (holes, lazy generators/async,
   first-class prototypes) are closed; `Atomics`/`SharedArrayBuffer`/typed-array
   completeness and the host runtime land.
2. **JIT:** hot functions JIT regardless of object/string/closure content, with a
   baseline + optimizing tier, type-feedback speculation, sound deopt, and a
   GC-safe native re-entry substrate — proven against the interpreter and under
   fuzz.
3. **WASM:** the upstream WebAssembly spec suite passes for MVP + reference-types
   + the implemented proposals, with full JS↔WASM interop and a compiling tier on
   the shared backend.
4. **D′ / code-cache:** any initialized heap snapshots, `mmap`-reloads zero-copy
   (or via one-time conversion), and executes; the content-addressed,
   verifier-guarded, cross-context-shareable artifact store passes the
   hundreds-of-bases churn scenario.
5. **Always:** pure Rust, `unsafe` quarantined and audited; CI green across the
   feature matrix; embeddable in <30 lines via the Rust and C APIs.

---

## 8. Reused Karpelès Lab crates

- **`purecrypto`** — `crypto.subtle`/WebCrypto, `getRandomValues`, `randomUUID`,
  TLS. No foreign crypto.
- **`rsurl`** — HTTP/HTTPS behind `fetch` and the Node `http(s)` compat layer.

Patterns shared: tri-modal lib/CLI/C-FFI packaging, `unsafe` quarantine,
feature-gated layered modules, sans-I/O core, cargo-fuzz harnesses.