# Adapter generation — architecture

Low-level map of the code that produces a tier-1 adapter component.
Companion doc: [`adapter-components.md`](./adapter-components.md) is the
user-facing explainer; this file is for contributors working on the
generator itself.

## Mission

Given

- the target interface (cviz `InterfaceType::Instance`),
- a split `.wasm` that imports or exports it,
- the set of `splicer:tier1/*` interfaces the middleware exports,

emit a WebAssembly Component binary that:

- Re-exports the target interface unchanged (drop-in replacement for
  the upstream caller).
- Imports the target interface from a handler-providing component.
- Imports the middleware's tier-1 hooks (`before`, `after`, `blocking`).
- For each function in the target interface, wraps it with the hooks'
  before/after/blocking phases, handling the canonical-ABI lift/lower
  and async machinery transparently.

## Design thesis

**Splicer does not implement the Component Model's canonical ABI.** It
consumes one. The adapter generator's job is to:

1. Know what *shape* the adapter should have (which hooks fire, which
   handler gets called, how the phases sequence).
2. Emit the wasm to *drive* a canonical-ABI implementation someone else
   owns.

The canonical-ABI authority lives in two upstream crates:

- [`wit-parser`] — type model, `SizeAlign` for canonical-ABI layout
  (size / align / field offsets / variant payload offsets),
  `Resolve::push_flat` for flattening.
- [`wit-bindgen-core::abi`] — instruction-level codegen via the
  `Bindgen` trait. Walks a type, emits an abstract instruction stream
  (`I32Load { offset }`, `VariantLift { … }`, `RecordLift { … }`,
  `FixedLengthListLiftFromMemory { … }`, etc.).

Splicer implements `Bindgen` against `wasm-encoder::Instruction`. Every
canonical-ABI decision — walk order, offsets, discriminant widths,
joined flat shapes, widening rules — comes from upstream. Splicer's
implementation is a transcriber: abstract `Instruction` → concrete wasm
opcode.

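When upstream adds a new canonical-ABI feature, splicer picks it up
via `cargo update` with at most one new `emit` match arm. When the
upstream types grow a variant splicer doesn't handle, the failure is
loud: exhaustive matches fail at compile time, and the matches that
end in a fallback arm (`WasmEncoderBindgen::emit`,
`abi/compat.rs::cast`) panic the first time the new case is exercised.
See "How canonical-ABI evolution affects the code" for the breakdown.
**Loud at build or on first run, silent never.**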

[`wit-parser`]: https://docs.rs/wit-parser
[`wit-bindgen-core::abi`]: https://docs.rs/wit-bindgen-core
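
The transcriber role can be pictured with a toy model. The sketch
below is not splicer's code: `AbiInstSketch` and `WasmOpSketch` are
hypothetical stand-ins for `wit_bindgen_core::abi::Instruction` and
`wasm_encoder::Instruction`, and only three load instructions are
modeled. It shows the one-to-one mapping from an abstract instruction
to a concrete opcode, with the offset supplied by the abstract side.

```rust
/// Hypothetical stand-in for upstream's abstract instruction stream.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AbiInstSketch {
    I32Load { offset: u32 },
    I64Load { offset: u32 },
    I32Load8U { offset: u32 },
}

/// Hypothetical stand-in for a concrete wasm opcode.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WasmOpSketch {
    LocalGet(u32),
    I32Load { offset: u32 },
    I64Load { offset: u32 },
    I32Load8U { offset: u32 },
}

/// Transcribe one abstract load into concrete opcodes. The offset is
/// copied through unchanged; the only thing added here is the
/// `local.get` that supplies the base address.
fn transcribe(inst: AbiInstSketch, addr_local: u32) -> Vec<WasmOpSketch> {
    // No wildcard arm in this sketch, so adding a variant to
    // `AbiInstSketch` would be a compile error here.
    let load = match inst {
        AbiInstSketch::I32Load { offset } => WasmOpSketch::I32Load { offset },
        AbiInstSketch::I64Load { offset } => WasmOpSketch::I64Load { offset },
        AbiInstSketch::I32Load8U { offset } => WasmOpSketch::I32Load8U { offset },
    };
    vec![WasmOpSketch::LocalGet(addr_local), load]
}
```

The sketch makes no layout decision of its own: which load to use and
at which offset both arrive in the abstract instruction.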

## Module layout

```
src/adapter/
├── abi/                  — canonical-ABI abstraction
│   ├── bindgen.rs        — WasmEncoderBindgen (Bindgen impl)
│   ├── bridge.rs         — WitBridge (cviz → wit-parser translator)
│   └── compat.rs         — verbatim cast / flat_types (pending [PR #1597])
├── build/                — wasm binary emission
│   ├── component.rs      — build_adapter_bytes (outer Component, 13 phases)
│   ├── dispatch.rs       — inner dispatch core module
│   ├── encoders.rs       — component-level type-section encoders
│   ├── mem_layout.rs     — MemoryLayoutBuilder (scratch-memory allocator)
│   └── ty.rs             — prim_cv, val_type_byte_size, align_to_val
├── filter/               — closure-based split dep walker + raw-sections re-encoder
├── func.rs               — AdapterFunc value object
├── indices.rs            — ComponentIndices / DispatchIndices / FunctionIndices
├── names.rs              — stable import/export name strings
├── tests.rs              — integration tests
└── mod.rs                — generate_tier1_adapter entry
```

Two layers (`abi/`, `build/`) plus three cross-cutting root files
(`func.rs`, `indices.rs`, `names.rs`) and a `filter/` module that
stands on its own.

### Layer responsibilities

**`abi/` — spec-consuming.** Encodes knowledge *of* the canonical ABI
by importing `wit-parser` and `wit-bindgen-core`. Never touches
`wasm-encoder` section builders directly except for individual
`Instruction` opcodes inside the `Bindgen` impl. Nothing in `abi/`
knows about "the adapter's shape" — it's generic lift-from-memory
machinery.

**`build/` — wasm-emitting.** Knows about the adapter's shape (13
phases, hook sequencing, nested core modules, name conventions) and
uses `wasm-encoder` sections to assemble the final binary. Consumes
`abi/` for the lift-from-memory instruction bytes at the one spot
that needs them (`task.return` load).

**Cross-cutting (root).** `AdapterFunc` is a per-function value object
produced by `func.rs::extract_adapter_funcs`. `indices.rs` holds three
running-index allocators (one per namespace: outer component, dispatch
core module, wasm function locals). `names.rs` centralizes string
constants. All three files are used by both layers.

## Type-flow: cviz arena → emitted wasm

```
cviz::TypeArena                        src/adapter/mod.rs entry
      │
      │ (splicer's internal type model, populated from the split)
      ▼
WitBridge::from_cviz(&arena)          abi/bridge.rs
      │
      │ Walks every ValueTypeId in the arena, allocates a wit_parser
      │ TypeDef per compound type, records a HashMap<ValueTypeId, Type>.
      │ Types insert children-first so Resolve::types stays topologically
      │ ordered (SizeAlign::fill requires this).
      ▼
wit_parser::Resolve + SizeAlign       (owned by WitBridge)
      │
      │ Every canonical-ABI query now goes through this. WitBridge
      │ exposes `size_bytes`, `flat_types`, `has_strings`, `has_lists`
      │ wrappers so splicer consumers don't import wit-parser directly.
      ▼
AdapterFunc list                      func.rs::extract_adapter_funcs
      │
      │ Per-function resolution: param/result type ids, core-wasm flat
      │ signature (via push_flat), result buffer size, has-strings /
      │ has-lists predicates. Also allocates the initial bytes of the
      │ dispatch module's scratch memory (function-name blob +
      │ per-function result buffers).
      ▼
build_adapter_bytes                   build/component.rs
      │
      │ The 13 phases assemble the outer Component: type / import /
      │ alias sections, handler instance type, canon lift/lower,
      │ embed mem module + dispatch module, wire instances, export.
      ▼
build_dispatch_module                 build/dispatch.rs
      │
      │ Emits the inner core-wasm module: per-function wrapper bodies
      │ with hook phases + async wait-loops + task.return. For async
      │ funcs with a result, pre-runs WasmEncoderBindgen on the result
      │ type to get the instruction sequence that task.return needs.
      ▼
Final .wasm bytes
```
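
The children-first insertion in the `WitBridge::from_cviz` step is a
memoized post-order walk. The sketch below illustrates that idea on a
toy arena. `SrcType`, `visit`, and `insert_children_first` are
hypothetical names, not cviz or splicer APIs, and as a simplification
every type (including primitives) gets an output slot. It assumes the
arena is acyclic.

```rust
use std::collections::HashMap;

/// Hypothetical source type: compound types refer to other arena
/// entries by index, in no particular order.
enum SrcType {
    U32,
    Str,
    List(usize),
    Record(Vec<usize>),
}

/// Post-order visit: translate every child before the parent.
fn visit(
    id: usize,
    arena: &[SrcType],
    slot_of: &mut HashMap<usize, usize>,
    order: &mut Vec<usize>,
) {
    if slot_of.contains_key(&id) {
        return; // already inserted via another parent
    }
    match &arena[id] {
        SrcType::U32 | SrcType::Str => {}
        SrcType::List(elem) => visit(*elem, arena, slot_of, order),
        SrcType::Record(fields) => {
            for &field in fields {
                visit(field, arena, slot_of, order);
            }
        }
    }
    // All children now have slots, so the parent lands after them.
    slot_of.insert(id, order.len());
    order.push(id);
}

/// Returns the source-id to output-slot map and the insertion order.
fn insert_children_first(arena: &[SrcType]) -> (HashMap<usize, usize>, Vec<usize>) {
    let mut slot_of = HashMap::new();
    let mut order = Vec::new();
    for id in 0..arena.len() {
        visit(id, arena, &mut slot_of, &mut order);
    }
    (slot_of, order)
}
```

Because a parent is pushed only after its children, any pass that
walks the output table front to back sees each child's layout before
the parent needs it.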

### Where the Bindgen actually fires

Just one spot: `build/dispatch.rs::build_task_return_loads`. For each
async function whose result is non-void, we:

1. Allocate an i32 local via `FunctionIndices::alloc_local` to hold
   the result buffer's base address.
2. Emit `I32Const(result_ptr); LocalSet(addr_local)` to stash it.
3. Construct a `WasmEncoderBindgen` over the `&bridge.sizes` and
   `&mut indices` (so any locals the bindgen needs for variant
   dispatch / fixed-size-list iteration are allocated into the same
   function-local space).
4. Call `wit_bindgen_core::abi::lift_from_memory(&bridge.resolve,
   &mut bindgen, (), &result_type)`.
5. Call `bindgen.into_instructions()` to get a
   `Vec<wasm_encoder::Instruction>` that, when flushed into the
   function body, leaves the joined flat representation of the result
   on the wasm value stack, ready for the `task.return` call that
   follows.

All of the canonical-ABI heavy lifting — walking the type, picking
load widths, computing offsets, dispatching variant arms, widening
arm flats to the joined flat, unrolling fixed-size-list iteration —
happens inside upstream's `read_from_memory` via our `Bindgen::emit`.

## `WasmEncoderBindgen` — design notes

Key invariants:

- **`Operand = ()`**. The wasm value stack is the source of truth.
  The generator's internal operand stack tracks *counts*, not
  identities. Splicer's emit arms pop/push placeholders to match each
  `Instruction` variant's declared arity.

- **Address handling by local.** The base address lives in
  `addr_local` (or a per-iteration `iter_addr_local` for fixed-size
  lists); every load emit funnels through `emit_load`, which emits
  `local.get $addr; <load> offset=N`. The generator's abstract
  address operand can be cloned freely because we never pop a wasm
  value for it — each load re-reads from the local.

- **Block-capture IR.** `push_block` / `finish_block` redirect emits
  into an `ActiveBlock` buffer; `finish_block` stashes the buffer in
  `completed_blocks` for the variant / fixed-size-list lift to
  consume. Variant emits splice captured arm bodies inside a
  `block ... br_table ... end` structure; fixed-size-list emits
  replay the single element-read body N times with the address local
  advanced by `elem_size` each iteration.

- **Local allocation is shared with the outer function.** The
  `Bindgen` borrows `&mut FunctionIndices` from the caller, so every
  local it allocates (disc locals for variants, payload locals for
  widening stash, iter address locals for fixed-size lists) lands in
  the *same* contiguous local-index space as the dispatch module's
  own locals (subtask / waitable-set). The caller calls
  `indices.into_locals()` once when constructing the `Function`.

See the module docstring in `src/adapter/abi/bindgen.rs` for the full
treatment, including the block-capture rationale and the fixed-size
vs dynamic list table.
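
The block-capture and replay behavior for fixed-size lists can be
sketched in a few lines. This is a simplified model, not the code in
`abi/bindgen.rs`: `Op`, `EmitterSketch`, and `replay_fixed_list` are
hypothetical, and advancing the address only between iterations (not
after the last one) is an assumption of the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Op {
    LocalGet(u32),
    LocalSet(u32),
    I32Const(i32),
    I32Add,
    I32Load { offset: u32 },
}

#[derive(Default)]
struct EmitterSketch {
    main: Vec<Op>,           // the function body being built
    active: Vec<Vec<Op>>,    // blocks currently being captured
    completed: Vec<Vec<Op>>, // captured blocks awaiting a consumer
}

impl EmitterSketch {
    /// Emits go to the innermost active block, or to the body.
    fn emit(&mut self, op: Op) {
        match self.active.last_mut() {
            Some(buf) => buf.push(op),
            None => self.main.push(op),
        }
    }

    fn push_block(&mut self) {
        self.active.push(Vec::new());
    }

    fn finish_block(&mut self) {
        let buf = self.active.pop().expect("finish_block without push_block");
        self.completed.push(buf);
    }

    /// Replay the captured element body `n` times, advancing the
    /// iteration address local by `elem_size` between iterations.
    fn replay_fixed_list(&mut self, iter_local: u32, elem_size: i32, n: usize) {
        let body = self.completed.pop().expect("no captured element body");
        for i in 0..n {
            for op in &body {
                self.emit(op.clone());
            }
            if i + 1 < n {
                self.emit(Op::LocalGet(iter_local));
                self.emit(Op::I32Const(elem_size));
                self.emit(Op::I32Add);
                self.emit(Op::LocalSet(iter_local));
            }
        }
    }
}
```

Capturing the element body once and replaying it keeps the upstream
walk single-pass while still producing an unrolled instruction
sequence.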

## Heterogeneous variants and joined flat

Variant / option / result arms can have different flat shapes. E.g.:

- `result<u8, u64>` — ok arm flats to `[i32]`, err arm flats to
  `[i64]`. Joined payload: `[i64]`. Ok arm's load must be widened via
  `i64.extend_i32_u`.

- `result<string, u64>` — ok flats to `[Pointer, Length]`, err flats
  to `[I64]`. Joined payload: `[PointerOrI64, Length]`. Ok arm's
  Pointer at position 0 is i32 at the wasm level; PointerOrI64 is
  i64. Widening: `i64.extend_i32_u`.

The widening table lives in `abi/compat.rs::cast` (verbatim copy of
`wit-bindgen-core`'s private `fn cast`). The `Bindgen`'s
`emit_bitcast` maps each `Bitcast` variant to its wasm opcode. **Key
subtlety on wasm32**: `Pointer` and `Length` collapse to `i32` but
`PointerOrI64` collapses to `i64`, so the four cross-boundary casts
(`PToP64`, `LToI64`, `P64ToP`, `I64ToL`) need `i64.extend_i32_u` /
`i32.wrap_i64` — not no-ops. Tested by
`lift_result_string_u64_widens_pointer_to_pointer_or_i64` and
`lift_result_list_u64_widens_pointer_to_pointer_or_i64` in
`abi/bindgen.rs`.
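
The wasm32 subtlety reduces to comparing core value types on either
side of the cast. The sketch below is a simplified model, not the
copied `cast` table: `Flat`, `Core`, `core_wasm32`, and `cast_op` are
hypothetical names, and float types (which need reinterpret opcodes)
are left out.

```rust
/// Hypothetical subset of the abstract flat types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Flat {
    I32,
    I64,
    Pointer,
    Length,
    PointerOrI64,
}

/// Core wasm value types the flat types collapse to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Core {
    I32,
    I64,
}

/// On wasm32, Pointer and Length are i32; PointerOrI64 is i64.
fn core_wasm32(t: Flat) -> Core {
    match t {
        Flat::I32 | Flat::Pointer | Flat::Length => Core::I32,
        Flat::I64 | Flat::PointerOrI64 => Core::I64,
    }
}

/// Opcode needed to move a value from an arm's flat type to the
/// joined flat type, or `None` when the core types already agree.
fn cast_op(from: Flat, to: Flat) -> Option<&'static str> {
    match (core_wasm32(from), core_wasm32(to)) {
        (Core::I32, Core::I64) => Some("i64.extend_i32_u"),
        (Core::I64, Core::I32) => Some("i32.wrap_i64"),
        _ => None,
    }
}
```

For `result<string, u64>`, the ok arm's `Pointer` goes through
`cast_op(Flat::Pointer, Flat::PointerOrI64)`, which yields
`i64.extend_i32_u`, while `Length` to `Length` needs no opcode.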

## Dispatch module — what splicer still owns

The `abi/` layer is generic lift-from-memory. The *shape* of the
adapter — which is splicer's unique value-add — lives in
`build/dispatch.rs`:

- Per-function wrapper body, sequencing five phases: before,
  blocking, handler call, after, return.
- Async wait-loop emission (`waitable-set.wait` blocks) for hook
  subtasks and async handler calls.
- `task.return` wiring: custom wasm function types when the result
  flattens to multiple values, shared `void → ()` / `(i32) → ()`
  types for common cases, per-func import aliases.
- Name-blob data segment + function-name hook invocations.
- Nested core module 0 (memory provider, optionally with a bump
  realloc) whose exports `mem` / `realloc` are aliased out and used
  as canon-lift/lower options.

None of this is canonical-ABI logic — it's adapter policy and
component-model plumbing. Splicer owns it because it's the *shape*
nobody else generates.
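
The scratch memory that holds the name blob and the per-function
result buffers can be thought of as a bump layout with alignment. The
sketch below shows that technique only. `LayoutSketch` and its methods
are hypothetical, and the real `MemoryLayoutBuilder` API is not
described in this document.

```rust
/// Hypothetical bump allocator over a linear scratch region.
struct LayoutSketch {
    next: u32, // first unallocated byte
}

impl LayoutSketch {
    fn new() -> Self {
        Self { next: 0 }
    }

    /// Reserve `size` bytes at the next address that is a multiple of
    /// `align` (a power of two) and return the base address.
    fn alloc(&mut self, size: u32, align: u32) -> u32 {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let base = (self.next + align - 1) & !(align - 1);
        self.next = base + size;
        base
    }

    /// Total bytes the region needs so far.
    fn end(&self) -> u32 {
        self.next
    }
}
```

Under these assumptions, a 6-byte name blob followed by a 16-byte
result buffer aligned to 8 puts the buffer at address 8, with 2 bytes
of padding in between.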

## `abi/compat.rs` — a temporary borrow

`wit-bindgen-core`'s private helpers `cast(WasmType, WasmType) ->
Bitcast` and `flat_types(&Resolve, &Type, Option<usize>) ->
Option<Vec<WasmType>>` aren't part of its public API. Splicer's
variant widening needs both. `abi/compat.rs` contains verbatim copies
of those two functions (plus the `MAX_FLAT_PARAMS` constant).

Visibility-flip PR filed upstream:
<https://github.com/bytecodealliance/wit-bindgen/pull/1597>. When it
merges, delete `abi/compat.rs` and change `abi/bindgen.rs` to import
`wit_bindgen_core::abi::{cast, flat_types}` directly. The two
functions reference only already-public types (`WasmType`, `Bitcast`,
`Resolve`, `Type`), so the flip is semantically trivial.

## Supported WIT types for async results

Everything except Map. Specifically:

- All primitives (bool, s8..s64, u8..u64, f32, f64, char, string,
  error-context)
- Records, tuples at any nesting
- Enums, flags (any case/flag count)
- Resources (Own / Borrow)
- Futures, Streams (as i32 handles)
- Dynamic lists, strings
- Variants, options, results — including heterogeneous arms with
  `Pointer`/`Length` ↔ `PointerOrI64` widening
- Fixed-size lists — unrolled N-element reads with per-iteration
  address advancement
- Any nesting of the above

**`Map<K, V>`**: `wit-bindgen-core`'s `read_from_memory` currently
has `TypeDefKind::Map(..) => todo!()`. An async result type
containing a Map would panic at lift time. Workaround: WIT authors
can use `list<tuple<K, V>>` instead. Fixing this requires a
`wit-bindgen-core` patch.

The sync function path and function parameter handling flow through
component-model `canon lift` / `canon lower` opcodes, which are the
runtime's responsibility. Splicer never emits instruction-level
lift/lower for those — it just declares the lift/lower operations
via `CanonicalFunctionSection`, and the runtime does the rest. So
those paths have always handled whatever types the component model
supports.

## How canonical-ABI evolution affects the code

Three failure modes, in order of frequency:

1. **New `TypeDefKind` upstream.** `WitBridge::translate` has an
   exhaustive match over cviz's `ValueType`, so cviz evolution forces
   a compile error. The wit-parser side uses upstream-provided
   `push_flat` / `SizeAlign` behavior, which absorbs most new type
   kinds without our code changing. If upstream adds a
   `TypeDefKind` splicer genuinely can't express (because cviz
   doesn't have a matching `ValueType`), the bridge needs a new
   translation arm.

2. **New `Instruction` variant in `wit-bindgen-core::abi`.**
   `WasmEncoderBindgen::emit`'s match over `AbiInst` is NOT
   exhaustive — the fallback is `unimplemented!()`. When upstream
   adds a new instruction (say, a new async bookkeeping op), we
   won't notice at compile time, but the first run that exercises it
   panics with a clear message. Add a new emit arm.

3. **Bitcast table expansion.** `abi/compat.rs::cast` has a
   non-exhaustive match (it ends with `unreachable!()` for
   bitcast pairs the canonical ABI doesn't allow). If upstream adds a
   new `WasmType` variant or changes the allowed join pairs, we'd
   need to update the copied table. This is one of the reasons to
   prefer upstream's version once `cast` is made public.

None of these are silent.
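
Failure mode 2 can be illustrated with a toy match that ends in a
panicking fallback. `InstSketch` and `emit_sketch` are hypothetical
names, not the upstream `Instruction` enum or splicer's `emit`. The
sketch only shows why a new variant compiles cleanly yet fails loudly
the first time it is reached.

```rust
/// Hypothetical instruction enum. `NewUpstreamOp` plays the role of
/// a variant added upstream after the match below was written.
#[derive(Debug)]
enum InstSketch {
    I32Load,
    I64Load,
    NewUpstreamOp,
}

/// Match with a wildcard fallback: the compiler accepts it even when
/// the enum grows, so the gap shows up as a panic on first use.
fn emit_sketch(inst: &InstSketch) -> &'static str {
    match inst {
        InstSketch::I32Load => "i32.load",
        InstSketch::I64Load => "i64.load",
        other => unimplemented!("no emit arm for {:?}", other),
    }
}
```

The panic message names the unhandled variant, which is what makes
the fix (adding one arm) easy to locate.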

## Index spaces

Three separate counter allocators, one per namespace:

| Struct | Namespace | Scope |
|---|---|---|
| `ComponentIndices` | outer Component types / instances / funcs / core instances / core funcs | one per adapter component |
| `DispatchIndices` | dispatch core module types / funcs | one per dispatch module |
| `FunctionIndices` | wasm function locals | one per emitted wasm function |

Keeping them separate makes the "different index spaces" explicit —
the dispatch core module's type table has no relationship to the
outer component's, and a function's locals are disjoint from both.

`FunctionIndices` is the cross-cutting one: both the dispatch module
(for subtask / waitable-set locals) and the Bindgen (for iter
address, disc, and payload stash locals) allocate into the same
instance. The caller constructs it, pre-allocates anything it knows
about, then threads a `&mut FunctionIndices` into the
`WasmEncoderBindgen` constructor. Once the bindgen has been consumed
and its borrow has ended, the caller calls `into_locals()` and feeds
the result into `Function::new_with_locals_types`.
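
The shared-allocator pattern looks roughly like the following. It is
a sketch under stated assumptions: `FunctionIndicesSketch` and
`BindgenSketch` are hypothetical simplifications, and numbering locals
after the function's parameters is assumed from how wasm indexes
locals, not taken from splicer's source.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
}

/// Hypothetical per-function local allocator.
struct FunctionIndicesSketch {
    param_count: u32,
    locals: Vec<ValType>,
}

impl FunctionIndicesSketch {
    fn new(param_count: u32) -> Self {
        Self { param_count, locals: Vec::new() }
    }

    /// Allocate one local and return its index. Wasm numbers locals
    /// after the parameters, so indices start at `param_count`.
    fn alloc_local(&mut self, ty: ValType) -> u32 {
        let index = self.param_count + self.locals.len() as u32;
        self.locals.push(ty);
        index
    }

    /// Consume the allocator and yield the local types in index order.
    fn into_locals(self) -> Vec<ValType> {
        self.locals
    }
}

/// Hypothetical bindgen that borrows the caller's allocator.
struct BindgenSketch<'a> {
    indices: &'a mut FunctionIndicesSketch,
}

impl<'a> BindgenSketch<'a> {
    fn new(indices: &'a mut FunctionIndicesSketch) -> Self {
        Self { indices }
    }

    /// A discriminant local lands in the caller's index space.
    fn alloc_disc_local(&mut self) -> u32 {
        self.indices.alloc_local(ValType::I32)
    }
}
```

Because the bindgen only borrows the allocator, locals allocated
before, during, and after its lifetime form one contiguous run, and
`into_locals` can only be called once that borrow has ended.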

## Testing

Three layers of test coverage:

- **Unit tests in `abi/bindgen.rs`** (~11 tests): emit-level
  assertions — "loading a u32 emits one `i32.load`", "heterogeneous
  variant emits one `i64.extend_i32_u`", "option's None arm pads
  with `i32.const 0`". These catch bitcast / widening regressions at
  `cargo test` time.

- **In-process adapter validation in `src/adapter/tests.rs`** (~60
  tests): run the full adapter generator for various interface
  shapes, then validate the emitted binary with wasmparser. Catches
  structural bugs but not runtime behavior.

- **End-to-end composition in `tests/component-interposition/`**: run
  `./run.sh __testme` to build every configuration (single middleware
  / chain / fan-in / nested / …), compose with real handler
  components, and execute the result through a wasmtime runner. This
  is the gold standard for "does the adapter actually work."

Any non-trivial change should clear all three. The bitcast widening
bug we fixed in Stage 2 was invisible to the first two — it only
surfaced when `__testme` tried to compose the real `wasi:http`
error-code variant.

## References

- [`CanonicalABI.md`](https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md)
  — the spec.
- [`definitions.py`](https://github.com/WebAssembly/component-model/blob/main/design/mvp/canonical-abi/definitions.py)
  — precise reference semantics.
- `docs/TODO/investigate-canon-abi.md` — the decision / migration
  doc that drove the Bindgen adoption.
- `docs/TODO/adapter-comp-planning.md` — broader planning notes on the
  tier-1 adapter.