bytesandbrains 0.3.4

# IR_AND_DSL.md — BytesAndBrains as an ONNX extension

A focused mapping of every BytesAndBrains (BB) concept to its canonical
representation in the ONNX intermediate representation (IR). BB does
not invent a parallel schema; it adds three vendor opsets, a
type-denotation namespace, and a small set of `metadata_props` keys —
nothing more. The ModelProto is the BB program.

See `crates/bytesandbrains-old/src/proto/onnx-ml.proto` for the
canonical schema this document references throughout. Field numbers
and message-member names cited here exist verbatim in that file.

---

## Part 1 — Thesis

ONNX defines three things: an extensible computation graph model,
standard data types, and built-in operators. BB extends the graph
model by registering vendor opsets in the `ai.bytesandbrains.*`
domain, extends the data types via `TypeProto.Opaque`, and supplies
its own built-in operators that complement (rather than replace)
`ai.onnx`. A BB Node is conceptually an ONNX runtime that:

- Dispatches `ai.onnx` ops to a bound backend (Burn, ONNX Runtime,
  TFLite, custom).
- Dispatches `ai.bytesandbrains.syscall` ops to the framework's
  built-in scheduler / bus / lifecycle machinery.
- Dispatches `ai.bytesandbrains.wire` ops to the bound wire runtime,
  surfacing typed envelopes for transport.
- Dispatches `ai.bytesandbrains.role.*` ops in one of two modes —
  graph-inlined or opaque Rust call — chosen per op by the bound
  runtime impl.

Everything BB needs to load, validate, snapshot, restore, and execute
a graph lives in canonical ONNX messages: `FunctionProto` for module
composition, `AttributeProto` for slot attributes and inlined
sub-graphs, `TypeProto.Opaque` for vendor types, `GraphProto.initializer`
for weights, `ModelProto.opset_import` for version negotiation, and
`NodeProto.metadata_props` for the few extension annotations the
framework requires. No parallel schema, no out-of-band binding tables,
no metadata-as-bytes workarounds.

**Role ops dispatch atomically**: every role-DSL call site emits a
NodeProto stamped with `(required_trait, slot_id)` metadata. The
engine routes by `(domain, op_type, instance)` against the
per-Node atomic dispatch table to the bound impl's
`dispatch_atomic`. Backend, Index, Aggregator, Model, Codec,
DataSource, PeerSelector, and Protocol impls all share the same
atomic-dispatch contract; the role-method-as-subgraph splice path is
reserved for non-default-path overrides at higher abstraction
levels.

---

## Part 2 — Concept-to-proto mapping

A single dense reference for every BB concept's canonical ONNX home.
Each row cites the ONNX message and field the concept rides on.

| BB concept | ONNX representation |
|---|---|
| **Node program** | `ModelProto` (proto §444) |
| **Module** | `FunctionProto` (proto §933) with `(domain, name, overload)` identity, registered in `ModelProto.functions` (proto §516) |
| **Module body** | `FunctionProto.node: repeated NodeProto` (proto §963) |
| **Module bootstrap** | A sibling `FunctionProto` named `"<module>__bootstrap"`, stamped `metadata_props["ai.bytesandbrains.module_phase"] = "bootstrap"`. Recorded by `Module::bootstrap(&self, g)` next to `Module::body`. Install registers it on `BootstrapState::install_order` (`bb-runtime/src/engine/bootstrap.rs:376-392`) without arming the queue; the host kicks it off via `Node::run_bootstrap(BootstrapTarget)` (`bb-runtime/src/node/mod.rs:723-749`). The variant selects the targets: `BootstrapTarget::All` drives every install-order target, `ModuleNames(&[&str])` / `ModuleRequests(&[BootstrapRequest])` drive specific Module targets (the latter also staging input formals), and `Slots(&[&str])` drives Component bootstraps. The engine seeds bootstrap bodies onto the frontier under a fresh `ExecId`; the per-component `is_op_locked` gate (`bb-runtime/src/engine/core.rs:1762-1806`) parks body ops that touch any in-flight bootstrap's `ComponentRef` touch set until that bootstrap drains. See ENGINE.md §6.8. |
| **Module typed I/O** | `FunctionProto.input/output: repeated string` (proto §949–950) + `FunctionProto.value_info: repeated ValueInfoProto` (proto §994) |
| **Sub-module call** | `NodeProto { op_type: <function_name>, domain: <function_domain> }` in parent body; the runtime resolves `(domain, op_type)` against `ModelProto.functions` per ONNX's standard model-local-function rule (proto §502–516) |
| **Generic component placeholder** (`Backend`, `Model`, …) | `FunctionProto.attribute: repeated string` (proto §954) — required attribute name; the framework requires a binding at load |
| **Concrete component impl** (`BurnModel(configs)`, …) | `FunctionProto.attribute_proto: repeated AttributeProto` (proto §960) — attribute with a default value; the `AttributeProto` carries the impl's construction config in `.s` (bytes), `.t` (TensorProto for embedded weights/state), `.g` (sub-graph if construction itself is graph-shaped), or `.tp` (TypeProto for type-parameterised impls) |
| **Component slot identity** | The string name in `FunctionProto.attribute` / `attribute_proto.name` (e.g. `"backend"`, `"model"`, `"teacher"`, `"student"`) |
| **Per-node slot binding** | `NodeProto.metadata_props["ai.bytesandbrains.slot"] = "<function_attr_name>"` — points at the FunctionProto attribute that owns this node |
| **Tensor (live runtime value)** | `SlotValue` trait (Rust runtime, out-of-IR); on-wire / on-disk = `TensorProto` (proto §602) |
| **Tensor type declaration** | `TypeProto.Tensor { elem_type, shape: TensorShapeProto }` (proto §824) on `ValueInfoProto.type` (proto §204) |
| **Tensor memory ownership** | Backend-owned. `Backend::Tensor` is an `Arc`-shared handle around a backend-managed buffer (e.g. `CpuTensor(Arc<CpuBackendBuffer>)` at `bb-ops/src/backends/cpu/tensor.rs:44-65`); `Clone` is `Arc::clone`. Wire-receive of a tensor slot routes through `Backend::materialize_from_wire(type_hash, bytes: Vec<u8>) -> Result<Self::Tensor, _>` (`bb-runtime/src/contracts/backend.rs:497-522`) — the framework moves `fill.payload` into the call by value, the backend chooses pool / fresh / zero-copy adoption. Engine wraps the result in `BackendTensorCarrier` (`bb-runtime/src/slot_value.rs:43-174`) for slot residency. See [ROLES.md §Backend-owned tensor memory](ROLES.md#backend-owned-tensor-memory). |
| **Model weights / parameters** | `GraphProto.initializer: repeated TensorProto` (proto §570) — named tensors referenced by `NodeProto.input` |
| **Sparse weights** | `GraphProto.sparse_initializer: repeated SparseTensorProto` (proto §573) |
| **BB scalar types** (`Trigger`, `PeerId`, `RequestId`, `WireRequestId`, `CommandId`, `Timestamp`, `EventKind`, `CorrelationToken`) | `TypeProto.Opaque { domain: "ai.bytesandbrains", name: "<TypeName>" }` (proto §867) |
| **BB collection types** (`Vec<PeerId>`, `ResponseBatch`) | `TypeProto.Sequence { elem_type: TypeProto }` (proto §833) wrapping the canonical element type |
| **Opset declaration** | `ModelProto.opset_import: repeated OperatorSetIdProto` (proto §457) and `FunctionProto.opset_import` (proto §980) — each entry is `OperatorSetIdProto { domain, version }` (proto §915) |
| **Sub-graph carried on an op** (If/Loop branches; future role-method bodies) | `AttributeProto.g: GraphProto` (proto §182) for single sub-graph, `AttributeProto.graphs: repeated GraphProto` (proto §192) for multiple |
| **Op constant config** | `NodeProto.attribute: repeated AttributeProto` (proto §234) — typed via `AttributeProto.type` enum (proto §138), payload in one of `.f/.i/.s/.t/.g/.tp/.sparse_tensor` or their repeated variants |
| **Symbolic shape dimension** | `TensorShapeProto.Dimension { dim_param: "batch" }` (proto §807) |
| **Standard dimension semantics** | `TensorShapeProto.Dimension.denotation` (proto §814) — e.g. `"DATA_BATCH"`, `"DATA_CHANNEL"` |
| **Standard type semantics** | `TypeProto.denotation` (proto §909) — string-keyed standard semantic description |
| **Cross-Node type-identity hash** (the per-Node decoder dispatch key) | Computed at runtime from the value's `TypeProto.denotation` + the version from the relevant `OperatorSetIdProto`. Not stored in ONNX — the receiving Node computes and looks up. |
| **Quantization config** (Codec codebooks, scale/zero-point) | `GraphProto.quantization_annotation: repeated TensorAnnotation` (proto §590) |
| **Multi-device sharding hints** | `NodeProto.device_configurations: repeated NodeDeviceConfigurationProto` (proto §243); model-level config in `ModelProto.configuration` (proto §520) |
| **Training step semantics** (optional ONNX-native path) | `ModelProto.training_info: repeated TrainingInfoProto` (proto §498). BB defaults to recording training as plain ops in the inference graph; ONNX-Runtime-compatible export is opt-in (sets `update_binding`s for the trainable initializers). |
| **Role-op original-op trace** | `NodeProto.metadata_props["ai.bytesandbrains.original_op"]` — telemetry tag carrying the source `<role>:<op>` for trace-back. Routing is by `(domain, op_type, instance)` lookup in the per-Node atomic dispatch table. |
| **Module-instance identity** (for descriptive partition naming) | `NodeProto.metadata_props["ai.bytesandbrains.module_instance"]`: the composition-hierarchy chain (`<parent>_<child>_<grandchild>`) stamped by `Graph::with_module(name, \|g\| { ... })` scope helpers. The partition pass uses this only to *name* each wire-op-bounded partition; it is NOT the partition boundary itself (wire ops are). Distinct from the per-component `instance` key below. |
| **Concrete component type tag** | `NodeProto.metadata_props["ai.bytesandbrains.concrete_type"]` — the `ConcreteComponent::TYPE_NAME` of the component whose DSL method recorded this op. Absent for ops emitted by generic placeholders. Stamped at DSL recording time via `Graph::register_concrete::<T>(&T)`. |
| **Per-op instance disambiguator** | `NodeProto.metadata_props["ai.bytesandbrains.instance"]` — monotonic integer assigned at DSL recording time from `Graph`'s pointer-identity index. Multiple DSL calls from the same `&instance` share an `instance` value; two distinct concrete instances of the same TYPE_NAME get different values. The `partition_by_wire_ops` pass propagates it through every NodeProto it splices, merges, or moves. Distinct from `module_instance`. |
| **Generic placeholder slot tag** | `NodeProto.metadata_props["ai.bytesandbrains.required_trait"]` + `["ai.bytesandbrains.slot_id"]` — stamped at DSL recording time via `Graph::register_generic(ptr, trait)`. Identifies a slot that must be filled at Node.build() via the user's `with_<role>(impl)` chain call. |
| **Snapshot** | `ModelProto` bytes of the resolved-state graph (every slot already filled, every `attribute_proto` populated) PLUS framework-side `TransientSnapshot` (out-of-ONNX) for in-flight engine state |
| **Wire envelope** | NOT in ONNX. Lives in `proto/bb_envelope.proto`. Payload may carry ONNX-shaped values (TensorProto bytes, Opaque-typed bincode) but the envelope itself is the transport plane, separate from the IR. |
| **Function bodies (hoisted sub-Modules + backend subgraphs)** | `ModelProto.functions[]` per ONNX (proto §516). The Node holds ONE canonical `ModelProto` — every registered Module's main partition function + every hoisted/collapsed sub-function is one entry in `functions[]`, deduped by `(domain, name, overload)` at register time (linker ODR check). |
| **Function call** | A plain `NodeProto` whose `(op_type, domain, overload)` matches a registered `FunctionProto`'s `(name, domain, overload)`. Per the ONNX spec, this is the canonical call mechanism. No special call op_type — same NodeProto shape as any other. |
| **Hoisted sub-Module domain** | `ai.bytesandbrains.module`. `FunctionProto.name` is `Hoist_<chain>_<body_hash>` where `<chain>` is the joined `with_module` scope chain and `<body_hash>` is a hex hash over the canonicalized body (positional formals `__hoist_in_<i>`, `__hoist_out_<j>`, `__hoist_v_<n>`). Identical bodies — whether from N invocations in one Module or one body shared across N registered Modules — converge on the same name and dedupe at link time. |
| **Function-call overload convention** | Always empty string. Multi-instance disambiguation rides on the function `name` (`<type>#<instance>` for concrete bindings, the full scope chain for hoist), so `overload` is unused. |

That's the entire BB-to-ONNX mapping in a single page. Everything that
follows elaborates on these rows; nothing introduces a row not in this
table.
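The function-identity rows above (call resolution by `(domain, op_type, overload)` and dedupe at register time) can be sketched as a small lookup table. This is a minimal sketch under stated assumptions: `Function`, `FunctionTable`, and `LinkError` are hypothetical stand-ins rather than the framework's types, and the ODR check is assumed to mean "same identity with a different body is an error".

```rust
use std::collections::HashMap;

/// Identity triple for a model-local function: (domain, name, overload).
type FnKey = (String, String, String);

/// Hypothetical stand-in for `FunctionProto`; only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Function {
    domain: String,
    name: String,
    overload: String,
    body_ops: Vec<String>, // op_types of the body nodes, in order
}

#[derive(Debug, PartialEq)]
enum LinkError {
    /// Assumed ODR rule: same identity registered with a different body.
    ConflictingDefinition(FnKey),
}

#[derive(Default)]
struct FunctionTable {
    functions: HashMap<FnKey, Function>,
}

impl FunctionTable {
    /// Register a function; an identical re-registration dedupes silently.
    fn register(&mut self, f: Function) -> Result<(), LinkError> {
        let key = (f.domain.clone(), f.name.clone(), f.overload.clone());
        if let Some(existing) = self.functions.get(&key) {
            return if *existing == f {
                Ok(())
            } else {
                Err(LinkError::ConflictingDefinition(key))
            };
        }
        self.functions.insert(key, f);
        Ok(())
    }

    /// Resolve a call node's `(domain, op_type, overload)` to a function.
    fn resolve(&self, domain: &str, op_type: &str, overload: &str) -> Option<&Function> {
        let key = (domain.to_string(), op_type.to_string(), overload.to_string());
        self.functions.get(&key)
    }
}
```

A call node that finds no entry would fall through to ordinary operator dispatch; the sketch leaves that step out.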

---

## Part 3 — Generic vs concrete components via FunctionProto attributes

ONNX `FunctionProto` already has the exact distinction BB needs
between "slot to be filled at load" and "slot with a default already
specified":

- `FunctionProto.attribute: repeated string` (proto §954) — names of
  attributes the function REQUIRES from its caller. No default. A
  caller that does not supply one is malformed.
- `FunctionProto.attribute_proto: repeated AttributeProto` (proto §960)
  — attributes with a default `AttributeProto` payload. The caller MAY
  override; if not, the default is used.

BB uses these two lists, with NO INVENTION of a parallel schema, to
distinguish generic placeholders from concrete impls:

### Generic placeholder slot — required attribute, no default

```
Module struct in Rust:
  struct MyModule {
      backend: Backend,   // unit-struct placeholder
      // …
  }

Recorded in FunctionProto for MyModule:
  function.attribute = ["backend"]
  // "backend" appears in `function.attribute` but NOT in
  // `function.attribute_proto`. Required, no default.
```

At load:

- The framework walks `function.attribute`. Each entry is a slot
  needing a runtime impl. The user supplies bindings via the chained
  Node API (`with_backend(impl)`); the framework verifies the bound
  impl satisfies the trait implied by the slot's name + opset.
- If any required attribute lacks a binding, load fails with
  `LoadError::UnboundGenericSlot { slot_name }`.

### Concrete impl slot — defaulted attribute carrying construction config

```
Module struct in Rust:
  struct MyModule {
      model: BurnModel,    // concrete with configs
      // …
  }
  let m = MyModule {
      model: BurnModel::new(config_0, config_1),
      …
  };

Recorded in FunctionProto for MyModule:
  function.attribute_proto = [
      AttributeProto {
          name: "model",
          type: STRING,        // or TENSOR / GRAPH / TYPE_PROTO,
                               // depending on the impl's construction shape
          s: <serialized BurnModel construction state, bincode bytes>,
          metadata_props: [
              ("ai.bytesandbrains.concrete_type", "burn_integration::BurnModel"),
          ],
      },
      …
  ]
```

The `AttributeProto` is fully expressive:

- `.s: bytes` — opaque serialized state (bincode/serde) for impls
  whose construction is "give me these bytes and I'll deserialize"
- `.t: TensorProto` — for impls whose construction is "give me these
  weights" (e.g. a `LoadedMlp` initialized from a TensorProto)
- `.g: GraphProto` — for impls whose construction itself is graph-
  shaped (e.g. a "model defined by this ONNX function")
- `.tp: TypeProto` — for impls parameterised by type metadata
- `.metadata_props["ai.bytesandbrains.concrete_type"]` — the Rust type
  identifier (or Python class name); the framework looks up the
  registered deserializer for that type and reconstructs the impl

At load:

- The framework walks `function.attribute_proto`. Each entry is a
  concrete slot with construction state baked in. The framework looks
  up the registered deserializer for the type (registered at process
  startup via `Engine::register_concrete_type::<T>()`) and instantiates
  the impl from the AttributeProto.
- If the deserializer is not registered for a `concrete_type`, load
  fails with `LoadError::UnregisteredConcreteType { type_name }`.
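The two load-time walks (required attributes, then defaulted attributes) can be sketched together. The sketch assumes simplified stand-ins: `ConcreteSlot`, the `Deserializer` signature, and `resolve_slots` are hypothetical, and a resolved impl is represented by a plain label string. Only the two `LoadError` variants come from this document.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum LoadError {
    UnboundGenericSlot { slot_name: String },
    UnregisteredConcreteType { type_name: String },
}

/// Hypothetical stand-in for one `attribute_proto` entry.
struct ConcreteSlot {
    name: String,          // slot name, e.g. "model"
    concrete_type: String, // value of the concrete_type metadata key
    config: Vec<u8>,       // the `.s` construction bytes
}

/// Hypothetical deserializer: config bytes in, a label for the impl out.
type Deserializer = fn(&[u8]) -> String;

fn resolve_slots(
    required: &[String],                      // function.attribute
    defaulted: &[ConcreteSlot],               // function.attribute_proto
    bindings: &HashSet<String>,               // slots bound via with_<role>(impl)
    registry: &HashMap<String, Deserializer>, // concrete_type -> deserializer
) -> Result<Vec<(String, String)>, LoadError> {
    let mut resolved = Vec::new();
    // Generic placeholders: every required attribute needs a host binding.
    for slot in required {
        if !bindings.contains(slot) {
            return Err(LoadError::UnboundGenericSlot { slot_name: slot.clone() });
        }
        resolved.push((slot.clone(), "bound by host".to_string()));
    }
    // Concrete impls: rebuild each one from its baked-in construction bytes.
    for slot in defaulted {
        let de = registry.get(&slot.concrete_type).copied().ok_or_else(|| {
            LoadError::UnregisteredConcreteType { type_name: slot.concrete_type.clone() }
        })?;
        resolved.push((slot.name.clone(), de(slot.config.as_slice())));
    }
    Ok(resolved)
}
```

The sketch stops at the first failure; whether the real loader collects every unbound slot before failing is not specified here.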

### Multi-instance per role

`function.attribute_proto` carries multiple entries with distinct
names. A Module with two `BurnModel` fields — one named `teacher`, one
named `student` — produces two entries in `attribute_proto` with names
`"teacher"` and `"student"`, both with `concrete_type =
"burn_integration::BurnModel"` but distinct construction bytes (and
therefore distinct deserialized instances at load). The NodeProtos
emitted by `teacher.forward(…)` carry
`metadata_props["ai.bytesandbrains.slot"] = "teacher"`;
`student.forward(…)` emissions carry `slot = "student"`.
Disambiguation is by attribute name throughout.

### What this gives us

- **The framework needs no `components()` accessor on the Module
  trait**: the slot list is exactly `function.attribute +
  function.attribute_proto`. The Rust struct fields are the authoring
  surface; the FunctionProto's attribute lists are the runtime
  surface; both describe the same thing through the DSL recording.
- **Cross-language**: Python's `onnx` library knows FunctionProto
  natively. Python-side BB walks the same attribute lists. A
  ModelProto produced from Rust loads in Python (or vice versa) with
  identical slot resolution semantics.
- **Snapshot is free**: every concrete impl's construction state is
  already in the ModelProto's FunctionProto. A ModelProto round-trip
  is a snapshot round-trip.

---

## Part 4 — TypeProto.Opaque for BB-domain types

ONNX provides `TypeProto.Opaque { domain, name }` (proto §867) for
vendor-defined types whose internal layout only the vendor
understands. This is exactly the right fit for every non-tensor BB
type. We register every BB scalar / non-tensor type as an Opaque
under the `ai.bytesandbrains` domain:

```
Trigger          → Opaque { domain: "ai.bytesandbrains", name: "Trigger" }
PeerId           → Opaque { domain: "ai.bytesandbrains", name: "PeerId" }
Address          → Opaque { domain: "ai.bytesandbrains", name: "Multiaddress" }
RequestId        → Opaque { domain: "ai.bytesandbrains", name: "RequestId" }
WireRequestId    → Opaque { domain: "ai.bytesandbrains", name: "WireRequestId" }
CommandId        → Opaque { domain: "ai.bytesandbrains", name: "CommandId" }
Timestamp        → Opaque { domain: "ai.bytesandbrains", name: "Timestamp" }
EventKind        → Opaque { domain: "ai.bytesandbrains", name: "EventKind" }
CorrelationToken → Opaque { domain: "ai.bytesandbrains", name: "CorrelationToken" }
ResponseBatch    → Opaque { domain: "ai.bytesandbrains", name: "ResponseBatch" }
```

Collection types compose canonically:

```
Vec<PeerId>      → Sequence { elem_type: Opaque { ai.bytesandbrains, PeerId } }
Vec<Address>     → Opaque { domain: "ai.bytesandbrains", name: "address_vec" }
```

`Vec<Address>` rides on the concrete leaf `TYPE_ADDRESS_VEC`
(`bb-ir/src/types/builtins.rs:306-318`) rather than a generic
`Sequence` wrapper so the wire-hash (`0x0303`) distinguishes it
from a single `TYPE_MULTIADDRESS` on the wire. The carrier is
`AddressVecValue` (`bb-runtime/src/syscall/values.rs:67-68`),
populated by `AddressBook::Lookup` outputs and by `wire.Send`'s
`src_peer_addresses` envelope stamp.

Tensor types stay canonical:

```
Dense<f32>       → Tensor { elem_type: FLOAT,   shape: [dynamic] }
Dense<f64>       → Tensor { elem_type: DOUBLE,  shape: [dynamic] }
Dense<i32>       → Tensor { elem_type: INT32,   shape: [dynamic] }
Dense<i64>       → Tensor { elem_type: INT64,   shape: [dynamic] }
```

For consumers that need a stable cross-process type-identity key (the
per-Node decoder dispatch hash):

```
hash = compute_wire_hash(opaque.name, opset_version)
     = FNV-1a-64 of format!("{}@{}", opaque.name, opset_version)
```

The hash is NOT stored anywhere in ONNX. The sender computes it from
its outgoing value's TypeProto + the opset version declared in
`opset_import`; the receiver computes the same hash from its loaded
type expectation; both end up at the same `u64` and route to the
same decoder. Pure function of stable inputs; no registry needed.
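The formula above can be written out directly. This is a sketch rather than the crate's actual `compute_wire_hash`: it uses the standard 64-bit FNV-1a offset basis and prime, assumes the version is the `i64` carried by `OperatorSetIdProto`, and does not model builtin leaves that this document cites with a fixed wire-hash (such as `0x0303` for `TYPE_ADDRESS_VEC`).

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hash of "<name>@<opset_version>", as in the formula above.
fn compute_wire_hash(name: &str, opset_version: i64) -> u64 {
    fnv1a_64(format!("{}@{}", name, opset_version).as_bytes())
}
```

Because the inputs are just the type name and the opset version, sender and receiver arrive at the same `u64` without exchanging any registry state.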

### Why Opaque (not Tensor) for scalars

A naive mapping might cast `PeerId` (a `Multihash<64>`) as
`Tensor { elem_type: UINT8, shape: [N] }`. This works for byte-
level round-trip but loses type identity at the schema level:
every other byte-vector scalar in the graph collapses to the same
ONNX type, defeating the framework's typed-input-port validation.
`Opaque { domain, name }` keeps each BB scalar distinct in the IR
and in the eyes of every ONNX consumer.
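The collapse described above is easy to see with a toy type model. `TypeDecl` and `port_accepts` are hypothetical, and port validation is reduced to plain type equality for the purpose of the sketch.

```rust
/// Toy mirror of the two `TypeProto` shapes under discussion.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TypeDecl {
    /// Like `TypeProto.Tensor`, reduced to element type and rank.
    Tensor { elem_type: &'static str, rank: usize },
    /// Like `TypeProto.Opaque { domain, name }`.
    Opaque { domain: &'static str, name: &'static str },
}

/// The naive mapping: any byte-vector scalar becomes a rank-1 UINT8 tensor.
fn byte_tensor() -> TypeDecl {
    TypeDecl::Tensor { elem_type: "UINT8", rank: 1 }
}

/// The mapping used in this document: one Opaque per BB scalar.
fn opaque(name: &'static str) -> TypeDecl {
    TypeDecl::Opaque { domain: "ai.bytesandbrains", name }
}

/// Typed-input-port validation, reduced to type equality.
fn port_accepts(expected: &TypeDecl, actual: &TypeDecl) -> bool {
    expected == actual
}
```

With the tensor mapping, a port that expects one byte-vector scalar accepts any other, since both sides are `byte_tensor()`. With the Opaque mapping, `port_accepts(&opaque("PeerId"), &opaque("RequestId"))` is `false`.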

---

## Part 5 — Opset catalogs

### Part 5a — `ai.bytesandbrains.syscall v1`

Framework primitives. Domain: `ai.bytesandbrains.syscall`. Version:
`1`. Dispatch: all stateless framework dispatch (the
`DispatchEntry::Stateless` variant in [ENGINE.md §8.1](ENGINE.md));
each op runs in-engine via the built-in framework Components.

| op_type | inputs | outputs | attributes | semantics |
|---|---|---|---|---|
| `Pulse` | – | `trigger: Opaque<Trigger>` | – | One-shot at bootstrap |
| `OnTrigger` | `trigger: Opaque<Trigger>` | `trigger: Opaque<Trigger>` | – | Re-fires on each input arrival |
| `Threshold` | `inputs: variadic` | `trigger: Opaque<Trigger>` | `n: int` | Fires after N inputs arrive |
| `Interval` | – | `tick: Opaque<Timestamp>` | `period_ns: int` | Periodic timer |
| `EventSource` | – | `event: Opaque<EventKind>` | `kind: int` | Fires on bus event of given kind |
| `After` | `trigger: Opaque<Trigger>` | `trigger: Opaque<Trigger>` | `delay_ns: int` | Delays trigger |
| `Limit.Acquire` | `trigger: Opaque<Trigger>` | `trigger: Opaque<Trigger>` | `name: string, n: int` | Semaphore acquire |
| `Limit.Release` | `trigger: Opaque<Trigger>` | – | `name: string` | Semaphore release |
| `Any` | `inputs: variadic` | `value: <first-arrival type>` | `group: string` | First-arrival group |
| `Gate` | `value: any, trigger: Opaque<Trigger>` | `value: any` | – | Host-controlled gate |
| `Serialize.Enqueue` | `value: any` | `trigger: Opaque<Trigger>` | `queue: string` | FIFO enqueue |
| `Serialize.Dequeue` | `trigger: Opaque<Trigger>` | `value: any` | `queue: string` | FIFO dequeue |
| `CorrelateTag` | `trigger: Opaque<Trigger>` | `token: Opaque<CorrelationToken>` | – | Mints a fresh correlation token |
| `Hold.Stash` | `value: any` | – | `slot: string` | Buffers value |
| `Hold.Flush` | `trigger: Opaque<Trigger>` | `value: any` | `slot: string` | Releases held value |
| `AppEmit` | `value: any` | – | `name: string` | Surfaces `EngineStep::AppEvent { topic: name }` to host |
| `AppNotify` | `trigger: Opaque<Trigger>` | – | `name: string` | Marker `EngineStep::AppEvent` |
| `Record` | `value: any` | – | `name: string` | Push to per-Node ring buffer |
| `IncrMetric` | `trigger: Opaque<Trigger>` | – | `name: string, delta: int` | Counter increment |
| `LifecyclePhase` | – | `trigger: Opaque<Trigger>` | `phase: int` (Shutdown=1, Snapshot=2) | Fires on `Engine::fire_lifecycle(phase)`. Bootstrap is not a lifecycle phase — see `Module::bootstrap` below. |
| `GateDispatch` | `value: any` | `value: any` | (compiler-inserted) | Edge-gate inserted by augmentation pass |
| `MintDispatch` | `trigger: Opaque<Trigger>` | `token: Opaque<CorrelationToken>` | (compiler-inserted) | Token mint inserted by augmentation pass |
| `GateManyDispatch` | `value: any, gates: variadic` | `value: any` | (compiler-inserted) | Multi-edge gate |
| `Clock` | `trigger: Opaque<Trigger>` | `now: Opaque<Timestamp>` | – | Reads system clock |
| `RngU64` | `trigger: Opaque<Trigger>` | `value: u64` | – | PRNG output |
| `Sleep` | `trigger: Opaque<Trigger>` | `trigger: Opaque<Trigger>` | `duration_ns: int` | Async timer |
| `DeadlineMatch` | `then: Opaque<Trigger>, timeout: Opaque<Trigger>` | `winner: Opaque<Trigger>` | – | First-to-fire selector |
| `PassThrough` | `value: any` | `value: any` | – | Identity |
| `Tee` | `value: any` | `outputs: variadic` | `fanout: int` | Duplicate input N ways |
| `Constant` | – | `value: any` | `value: AttributeProto` | Emit a constant at boot (value carried in the attribute) |

All syscall ops are framework-internal stateless dispatch (the
`DispatchEntry::Stateless` variant in ENGINE.md §8.1; not routed
through the atomic dispatch table). They run on the framework's
built-in dispatch through `RuntimeResourceRef`'s scheduler /
event_source / bus / outbound_queue.
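To make the catalog concrete, here is how two of its rows might be recorded as nodes. `Node`, `AttrValue`, and the builder functions are hypothetical stand-ins for `NodeProto` / `AttributeProto` and the DSL; only the domain string, op names, port shapes, and attribute names are taken from the table.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Int(i64),
}

/// Hypothetical stand-in for `NodeProto`.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    domain: String,
    op_type: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
    attributes: Vec<(String, AttrValue)>,
}

const SYSCALL_DOMAIN: &str = "ai.bytesandbrains.syscall";

/// `Interval`: no inputs, one `tick` output, `period_ns` attribute.
fn interval(tick: &str, period_ns: i64) -> Node {
    Node {
        domain: SYSCALL_DOMAIN.to_string(),
        op_type: "Interval".to_string(),
        inputs: vec![],
        outputs: vec![tick.to_string()],
        attributes: vec![("period_ns".to_string(), AttrValue::Int(period_ns))],
    }
}

/// `Threshold`: variadic inputs, one trigger output, `n` attribute.
fn threshold(inputs: &[&str], trigger: &str, n: i64) -> Node {
    Node {
        domain: SYSCALL_DOMAIN.to_string(),
        op_type: "Threshold".to_string(),
        inputs: inputs.iter().map(|s| s.to_string()).collect(),
        outputs: vec![trigger.to_string()],
        attributes: vec![("n".to_string(), AttrValue::Int(n))],
    }
}

/// Distinct domains used by `nodes`, each paired with version 1
/// (an assumption that holds for the BB opsets in this document).
fn opset_imports(nodes: &[Node]) -> Vec<(String, i64)> {
    let domains: BTreeSet<&str> = nodes.iter().map(|n| n.domain.as_str()).collect();
    domains.into_iter().map(|d| (d.to_string(), 1)).collect()
}
```

A graph built only from these two nodes would need a single `opset_import` entry for `ai.bytesandbrains.syscall`.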

### Engine event channels — function signature vs. AppEmit

There are TWO complementary paths that surface `EngineStep::AppEvent`
to the host; both coexist and both produce the same step variant.

**(a) Top-level Module's function signature.** The Module the host
binds with `bb::install(peer, addrs, compiled, &[target], Config::new())`
exposes its `function.input` ports as ingress trigger sites and its
`function.output` ports as engine-observable result sites. When a
value lands at one of those output sites AND no downstream consumer
in the function reads it, the engine emits
`EngineStep::AppEvent { topic: <output port name> }`. Sub-Module
outputs do NOT take this path — their outputs always have a
downstream consumer in the parent's body.

**(b) Explicit syscall ops.** `AppEmit` / `AppNotify` can be placed
anywhere in the graph, including inside deeply nested sub-Modules.
They fire mid-cycle, push to `framework.pending_app_events`, and
Phase 8 drains them into `EngineStep::AppEvent`. Use this channel
for intermittent reporting / progress events that don't fit the
single-final-output shape.
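The condition in path (a), an output port that no node in the function body reads, can be checked statically. The sketch below assumes a hypothetical `BodyNode` carrying only input edge names; the engine itself decides at runtime when a value lands.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a body `NodeProto`; only its inputs matter here.
struct BodyNode {
    inputs: Vec<String>,
}

/// Output ports of the top-level function that no body node consumes.
/// Each would surface as `EngineStep::AppEvent { topic: <port name> }`.
fn app_event_topics(function_outputs: &[String], body: &[BodyNode]) -> Vec<String> {
    let consumed: HashSet<&str> = body
        .iter()
        .flat_map(|n| n.inputs.iter().map(String::as_str))
        .collect();
    function_outputs
        .iter()
        .filter(|o| !consumed.contains(o.as_str()))
        .cloned()
        .collect()
}
```

For a function with outputs `loss` and `ack` whose body also feeds `loss` into another node, only `ack` is returned as a topic.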

### Part 5a.1 — `ai.bytesandbrains.address_book v1`

DAG-mutable `AddressBook` ops. Domain: `ai.bytesandbrains.address_book`.
Version: `1`. Dispatch: custom ops registered via `bb::register_op!`
in `bb-ops/src/syscalls/peers/`; the engine routes through the
shared atomic dispatch path. Carriers: `TYPE_PEER_ID`,
`TYPE_MULTIADDRESS`, `TYPE_ADDRESS_VEC`.

| op_type | inputs | outputs | attributes | semantics |
|---|---|---|---|---|
| `Insert` | `peer: PeerId, address: Multiaddress` | – | – | New peer → `add_peer(peer, vec![addr])`; known peer → `register_address(peer, addr)`. Errors on empty list / `Full`. (`bb-ops/src/syscalls/peers/insert.rs`) |
| `InsertMany` | `peer: PeerId, addresses: AddressVec` | – | – | New peer → `add_peer(peer, addrs)`; known peer → one `register_address` per address. Errors on empty input / `Full`. (`bb-ops/src/syscalls/peers/insert_many.rs:33-67`) |
| `Lookup` | `peer: PeerId` | `addresses: AddressVec` | – | Full ordered slice via `AddressBook::lookup`. Errors on unknown peer / empty list. (`bb-ops/src/syscalls/peers/lookup.rs:29-49`) |

The `AddressVec` output type lands on `TYPE_ADDRESS_VEC`
(`ai.bytesandbrains.address_vec`, wire-hash `0x0303`,
`bb-ir/src/types/builtins.rs:306-318`). The receiver-side merge
inside `Engine::poll` (`bb-runtime/src/engine/poll.rs:1005-1062`)
calls the underlying `AddressBook` methods directly rather than
recording syscalls — the syscall surface exists for discovery
protocols that compile address propagation into a graph.

DSL helpers live at `bb-dsl/src/syscalls.rs:55-83`
(`address_book_insert_many`, `address_book_lookup`); the
single-address `Insert` path is runtime-internal.

### Part 5b — `ai.bytesandbrains.wire v1`

Network endpoint ops. Domain: `ai.bytesandbrains.wire`. Version: `1`.
Dispatch: the engine registers `Send` and `Recv` as stateless
syscalls at construction (`src/syscall/wire.rs`). There is no
`WireRuntime` binding — wire is engine-native infrastructure.

| op_type | inputs | outputs | attributes | semantics |
|---|---|---|---|---|
| `Send` | `data: any, dest: Address` (multiaddr) | – | – | Fire-and-forget broadcast. N typed `data` inputs are packed as N `SlotFill`s in one envelope to `dest`. |
| `SendReqBatched` | `data: any, dest: Address` | `req_id: Opaque<RequestId>, responses: Opaque<ResponseBatch>` | – | Batched request/response; `responses` fires ONCE when cohort completes |
| `SendResp` | `data: any, dest: Address, req_id: Opaque<RequestId>` | – | – | Reply to an inbound request |
| `Recv` | – | `trigger: Opaque<Trigger>, payload: any` | `payload_type: TypeProto (via attribute_proto.tp)` | Declare inbound type acceptance. The Recv's `NodeSiteId` becomes the routable destination; senders construct `/site/<id>` suffixes for it. Inbound payload bytes materialise into a typed `SlotValue` via the shared `wire_decoder_registry` per [WIRE.md §5.4](WIRE.md#typed-receive) — the same registry the `CompositeValue` codec consults, symmetric with Bundle's wire encode. |
| `RecvReq` | – | `trigger: Opaque<Trigger>, payload: any, req_id: Opaque<RequestId>` | `payload_type: TypeProto` | Declare inbound request acceptance |
| `RecvRespBatched` | `req_id: Opaque<RequestId>` | `trigger: Opaque<Trigger>, responses: Sequence<any>` | – | Receiver-side batched-response collector |

Per [ADDRESSING.md](ADDRESSING.md), `dest` is a multiaddr (`Address`),
not a `PeerId`: it encodes both the transport target and the per-slot
suffix that identifies the destination Recv site or component op.

**Correlation modeling.** Every inbound/outbound wire NodeProto
carries `metadata_props["ai.bytesandbrains.wire_correlation"]` with
one of `"none"`, `"request"`, `"response"`. The wire envelope's
proto-level `WireCorrelation` field (in `bb_envelope.proto`, not
ONNX) is the runtime echo of this static annotation.

**TriggerOnly classification.** Cross-Node edges carry
`metadata_props["ai.bytesandbrains.wire_transport"]` with `"data"` or
`"trigger_only"`. Set by the compiler's partition pass after walking
consumer types; the engine reads it at send-time to skip payload
encoding for trigger-only fills.

**Validator pairing.** Every `SendReqBatched` node MUST be paired with
exactly one `SendResp` node whose `req_id` input traces back to the
`SendReqBatched`'s `req_id` output. Unpaired requests fail validation
(`ValidationError::UnpairedWireRequest`).
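A simplified version of the pairing check might look like the following. It assumes that "traces back" means a direct edge (the real validator presumably follows pass-through ops), that `req_id` is the first output of `SendReqBatched` as listed in the catalog, and that the error carries the offending edge name. `WireNode` and `node` are hypothetical helpers.

```rust
#[derive(Debug, PartialEq)]
enum ValidationError {
    /// The `req_id` payload field is an illustrative addition.
    UnpairedWireRequest { req_id: String },
}

/// Hypothetical stand-in for a wire `NodeProto`.
struct WireNode {
    op_type: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

fn node(op_type: &str, inputs: &[&str], outputs: &[&str]) -> WireNode {
    WireNode {
        op_type: op_type.to_string(),
        inputs: inputs.iter().map(|s| s.to_string()).collect(),
        outputs: outputs.iter().map(|s| s.to_string()).collect(),
    }
}

/// Every `SendReqBatched` needs exactly one `SendResp` reading its `req_id`.
fn validate_pairing(nodes: &[WireNode]) -> Result<(), ValidationError> {
    for req in nodes.iter().filter(|n| n.op_type == "SendReqBatched") {
        // Per the catalog, `req_id` is the first output of `SendReqBatched`.
        let req_id = req.outputs.first().cloned().unwrap_or_default();
        let responders = nodes
            .iter()
            .filter(|n| n.op_type == "SendResp" && n.inputs.contains(&req_id))
            .count();
        if responders != 1 {
            return Err(ValidationError::UnpairedWireRequest { req_id });
        }
    }
    Ok(())
}
```

Both zero responders and more than one responder are rejected, matching the "exactly one" wording.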

**Streaming variants are intentionally absent.** Use `SendReqBatched`
with cohort sizing for fanout patterns.

**Allocation path (Send + Recv).** `Send` invokes
`SlotValue::to_wire_bytes` (bincode for the framework-carrier
shape; `BackendTensorCarrier::wire_encode_fn` for the
backend-mediated shape) and builds a `SlotFill { dest_suffix,
payload: Vec<u8>, trigger_only }`. The `Vec<u8>` is
framework-owned for the lifetime of the outbound envelope. `Recv`
delivers via `decode_typed_fill`
(`bb-runtime/src/engine/poll.rs:996-1083`): the framework charges
`fill.payload.len()` against
`NodeConfig::ingress_byte_budget`, branches on whether the
destination slot binds a `Backend` role
(`Engine::slot_id_to_role_ref` —
`bb-runtime/src/engine/core.rs:236`), and either `mem::take`s
the bytes into `Backend::materialize_from_wire` (tensor path,
zero memcpy on the framework side) or runs the global
`wire_decoder_registry` decoder against `&fill.payload`
(framework-carrier path). Per-fill failures (`AllocationFailed`,
`BudgetExceeded`, `BackendMaterializeFailed`, `TypeMismatch`,
`UnknownTypeHash`, `DecodeFailed`) surface as
`InfraEvent::WireReceiveError` and continue iterating sibling
fills (partial-delivery semantics). See [WIRE.md §5.4](WIRE.md#54-wire-eligibility-and-typed-receive)
for the full failure-mode catalog.
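The receive side of this path (budget charge, backend-versus-decoder branch, and partial delivery) can be sketched in a few lines. The types here are hypothetical reductions, not the engine's: the `Backend` role check is a boolean on the fill, the decoder registry is a closure, and a fill that exceeds the remaining budget is assumed not to be charged.

```rust
#[derive(Debug, PartialEq)]
enum ReceiveError {
    BudgetExceeded,
    UnknownTypeHash,
}

/// Hypothetical reduction of an inbound `SlotFill`.
struct SlotFill {
    type_hash: u64,
    payload: Vec<u8>,
    tensor_slot: bool, // stands in for "destination slot binds a Backend role"
}

#[derive(Debug, PartialEq)]
enum Delivered {
    /// Bytes moved by value, as they would be into the backend.
    Tensor(Vec<u8>),
    /// Value produced by a decoder that only borrows the payload.
    Carrier(String),
}

fn deliver_fills(
    fills: &mut [SlotFill],
    mut budget: usize,
    decode: impl Fn(u64, &[u8]) -> Option<String>,
) -> (Vec<Delivered>, Vec<ReceiveError>) {
    let (mut ok, mut errors) = (Vec::new(), Vec::new());
    for fill in fills.iter_mut() {
        if fill.payload.len() > budget {
            errors.push(ReceiveError::BudgetExceeded);
            continue; // partial delivery: sibling fills are still processed
        }
        budget -= fill.payload.len();
        if fill.tensor_slot {
            // Tensor path: take ownership of the buffer, no copy.
            ok.push(Delivered::Tensor(std::mem::take(&mut fill.payload)));
        } else {
            // Framework-carrier path: decode from a borrowed slice.
            match decode(fill.type_hash, &fill.payload) {
                Some(value) => ok.push(Delivered::Carrier(value)),
                None => errors.push(ReceiveError::UnknownTypeHash),
            }
        }
    }
    (ok, errors)
}
```

After the call, a tensor fill's `payload` is left empty because `mem::take` swapped in a fresh `Vec`, which is the "zero memcpy on the framework side" property in miniature.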

### Part 5c — `ai.bytesandbrains.role.* v1`

Six role opsets, one per role trait. Domains:
`ai.bytesandbrains.role.index`, `ai.bytesandbrains.role.model`,
`ai.bytesandbrains.role.aggregator`,
`ai.bytesandbrains.role.compressor`,
`ai.bytesandbrains.role.data_loader`,
`ai.bytesandbrains.role.peer_selector`. Version: `1` for all.

#### Part 5c.1 — Role-op dispatch: graph-returning trait methods + atomic-op opsets

Every `ai.bytesandbrains.role.*` op enters the IR as a NodeProto
stamped with `(required_trait, slot_id)` metadata. The engine
routes by `(domain, op_type, instance)` against the per-Node atomic
dispatch table to the bound impl's `dispatch_atomic`. Role methods
ARE the contract surface; there is no separate "role method returns
a GraphProto" path in the production pipeline.

##### Atomic-op opset (current pipeline)

Each `<Role>Runtime::atomic_opset()` declares the impl's per-op
domain + the typed input/output shape of each op. The DSL records
NodeProtos under that domain; the engine resolves
`(domain, op_type, instance)` → bound impl → `dispatch_atomic` at
install time. This is the canonical path for everything role-shaped:
Index ops, Aggregator ops, Backend per-op kernels, Model forward /
backward / step, Codec encode / decode, DataSource next_batch,
PeerSelector sample / current_view, Protocol custom opsets.

##### Future — role-method-returns-graph

The architecture reserves space for `<Role>Runtime::<method>` to
return a `Result<GraphProto, Self::Error>` so the compiler can
splice the body into the parent graph (enabling backend-portable
role definitions that decompose into `ai.onnx v1` math). The
splicing pipeline is not in the production compiler today; the
extension is future work for `ai.onnx`-decomposable roles.

##### Mixing per op

A single `ModelRuntime` impl can mix both body shapes per op
(Shape 1: a decomposable `ai.onnx` body; Shape 2: a single atomic
node):

```rust
impl ModelRuntime for BurnModel {
    fn forward(&self) -> Result<GraphProto, Self::Error> {
        // Shape 1 — decomposable Gemm + ReLU + Gemm body.
        Ok(self.build_forward_graph_ai_onnx())
    }
    fn backward(&self) -> Result<GraphProto, Self::Error> {
        // Shape 2 — single atomic node referencing this impl's opset.
        Ok(single_node_graph(
            "bb-burn.BurnModel.atomic", "Backward", &["grad"], &["cmd"]
        ))
    }
    fn step(&self) -> Result<GraphProto, Self::Error> {
        // Shape 2 — optimizer state mutation; can't be a graph.
        Ok(single_node_graph(
            "bb-burn.BurnModel.atomic", "Step", &["grads"], &["cmd"]
        ))
    }

    fn atomic_opset(&self) -> AtomicOpsetDecl {
        todo!("register Backward, Step, …")
    }
    fn dispatch_atomic(
        &mut self,
        op_type: &str,
        inputs: &[(&str, &dyn SlotValue)],
    ) -> Result<DispatchResult, Self::Error> {
        match op_type {
            "Backward" => todo!("run autograd backward; return CommandId"),
            "Step" => todo!("mutate optimizer state; return CommandId"),
            // Only ops declared in `atomic_opset` are expected here.
            _ => unreachable!("op_type not declared in atomic_opset"),
        }
    }
    }
}
```

The trait's role methods are called once per Node load by the
compiler; `dispatch_atomic` is called repeatedly at execution. See
[ROLES.md §2](ROLES.md) for the full runtime-trait contract.

##### Cross-runtime portability

- **Shape-1 bodies are backend-portable.** A Module whose
  `model.forward` returns a Shape-1 body runs on any `bb::Backend`
  Contract impl declaring the opsets the body uses. Swap Burn for
  ONNX Runtime without changing the Module.
- **Shape-2 bodies pin to a specific impl.** A Module whose
  `model.backward` returns a Shape-2 body referencing
  `bb-burn.BurnModel.atomic::Backward` only runs on `BurnModel`'s
  `ModelRuntime` impl (or another impl declaring the same atomic
  opset). The IR carries the requirement ("`bb-burn.BurnModel.atomic`
  must be bound in the atomic dispatch table"); the binding answers
  with the registered impl.

##### NodeProto schema

Each role-op NodeProto stays under the impl's atomic opset:

```
NodeProto {
    op_type: "Backward",
    domain: "bb-burn.BurnModel.atomic",
    input: [<input_value_name>],
    output: [<output_value_name>],
    metadata_props: [
        ("ai.bytesandbrains.concrete_type", "bb-burn::BurnModel"),
        ("ai.bytesandbrains.instance",      "0"),
        ("ai.bytesandbrains.original_op",
         "ai.bytesandbrains.role.model::Backward"),
    ],
}
```

Routing is by `(domain, op_type, instance)` lookup in the per-Node
atomic dispatch table. The `original_op` metadata is retained for
telemetry and trace-back.
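
The `(domain, op_type, instance)` lookup can be pictured with a small
table like the one below. This is a sketch only: `AtomicDispatchTable`,
its methods, and the `negate` handler are hypothetical names, and
handlers are plain function pointers over `f32` slices rather than the
real `dispatch_atomic` signature.

```rust
use std::collections::HashMap;

/// Key shape taken from the text: (domain, op_type, instance).
type DispatchKey = (String, String, u32);
type Handler = fn(&[f32]) -> Vec<f32>;

/// Hypothetical per-Node table mapping a key to a bound handler.
pub struct AtomicDispatchTable {
    entries: HashMap<DispatchKey, Handler>,
}

impl AtomicDispatchTable {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Registers the handler bound for one (domain, op_type, instance).
    pub fn bind(&mut self, domain: &str, op_type: &str, instance: u32, handler: Handler) {
        self.entries
            .insert((domain.to_string(), op_type.to_string(), instance), handler);
    }

    /// Returns None when no impl is bound for the key.
    pub fn dispatch(
        &self,
        domain: &str,
        op_type: &str,
        instance: u32,
        input: &[f32],
    ) -> Option<Vec<f32>> {
        let key = (domain.to_string(), op_type.to_string(), instance);
        self.entries.get(&key).map(|handler| handler(input))
    }
}

/// Toy handler standing in for a real `Backward` kernel.
pub fn negate(input: &[f32]) -> Vec<f32> {
    input.iter().map(|x| -x).collect()
}
```

Because `instance` is part of the key, two instances of the same concrete
type resolve to separate entries even when domain and op_type match.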

#### Part 5c.2 — Op-by-op tables

Each role op below has fixed inputs / outputs / attributes — the
contract the runtime trait's role method (`<Role>Runtime::<op>`) must
match in the GraphProto it returns. The **canonical body** column
indicates the typical shape (Shape 1 = decomposable; Shape 2 =
single-atomic per §5c.1). Concrete impls are free to choose either
shape per op — the contract is the IO signature, not the body.

##### `ai.bytesandbrains.role.index v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `Add` | `vec: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (stateful) |
| `Search` | `query: Tensor` | `results: Sequence<Tuple<Tensor, FLOAT>>` | `k: int` | Shape 2 (typically); Shape 1 for in-memory flat indexes |
| `Remove` | `id: Tensor (UINT64)` | `cmd: Opaque<CommandId>` | – | Shape 2 (stateful) |

##### `ai.bytesandbrains.role.model v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `Forward` | `input: Tensor` | `output: Tensor` | – | Shape 1 (decomposable; fuses with surrounding `ai.onnx` math) |
| `Backward` | `grad: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (autograd internals) |
| `Step` | `grads: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (optimizer state mutation) |
| `Evaluate` | `input: Tensor, target: Tensor` | `loss: Tensor` | – | Shape 1 (decomposable) |
| `ApplyDelta` | `delta: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (parameter mutation) |
| `LoadParameters` | `params: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (parameter mutation) |
| `Params` | – | `params: Tensor` | – | Shape 2 (snapshot read) |

##### `ai.bytesandbrains.role.aggregator v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `Contribute` | `contribution: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (buffer write) |
| `Aggregate` | `trigger: Opaque<Trigger>` | `result: Tensor` | – | Shape 1 (mean / weighted-sum / replace expressible in `ai.onnx`) |
| `CurrentTensor` | `trigger: Opaque<Trigger>` | `tensor: Tensor` | – | Shape 2 (state read) |

##### `ai.bytesandbrains.role.compressor v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `TrainCodebook` | `training: Tensor` | `cmd: Opaque<CommandId>` | – | Shape 2 (codebook mutation) |
| `Compress` | `t: Tensor` | `code: Tensor` | – | Shape 2 (impl-specific nearest-codeword search) |
| `Decompress` | `code: Tensor` | `t: Tensor` | – | Shape 1 (`ai.onnx::Gather` over the codebook) |

##### `ai.bytesandbrains.role.data_loader v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `NextBatch` | – | `batch: Tensor, labels: Optional<Tensor>` | – | Shape 2 (data source has side effects) |
| `Reset` | `trigger: Opaque<Trigger>` | `trigger: Opaque<Trigger>` | – | Shape 2 |
| `OnDataLoaded` | – | `trigger: Opaque<Trigger>` | – | Shape 2 |

##### `ai.bytesandbrains.role.peer_selector v1`

| op_type | inputs | outputs | attributes | canonical body |
|---|---|---|---|---|
| `Sample` | – | `peers: Sequence<Opaque<PeerId>>` | `n: int` | Shape 2 (state-dependent sampling) |
| `CurrentView` | – | `view: Sequence<Opaque<PeerId>>` | – | Shape 2 (state read) |

### Part 5d — `ai.onnx v1` (the minimum-viable required subset)

A `bb::Backend` Contract impl declaring `ai.onnx v1` MUST support
the 48 op types listed below. Semantics are canonical ONNX; backends
executing them follow the standard ONNX spec. BB does NOT redefine
semantics; it only specifies the required subset for compatibility.

**Arithmetic:** `Add`, `Sub`, `Mul`, `Div`, `Neg`, `Abs`, `Sqrt`,
`Exp`, `Log`, `Pow`.

**Linear algebra:** `MatMul`, `Gemm`, `Dot`.

**Activations:** `Relu`, `Sigmoid`, `Tanh`, `Softmax`, `LeakyRelu`,
`Gelu`.

**Shape / structural:** `Reshape`, `Transpose`, `Concat`, `Split`,
`Slice`, `Squeeze`, `Unsqueeze`, `Identity`, `Cast`.

**Reductions:** `ReduceSum`, `ReduceMean`, `ReduceMax`, `ReduceMin`.

**Comparison:** `Equal`, `Greater`, `Less`.

**Normalization:** `BatchNormalization`, `LayerNormalization`.

**Conv / Pool:** `Conv`, `MaxPool`, `AveragePool`, `GlobalAveragePool`.

**Creation:** `Zeros`, `Ones`, `Constant`.

**Indexing:** `Gather`, `Scatter`.

**Control flow:** `If`, `Loop`.

Backends supporting a superset (e.g. ONNX Runtime, Burn) trivially
pass the load pre-flight check. Backends supporting only a subset
fail with `LoadError::UnsupportedOps`, which lists the missing
op_types and surfaces before any execution.
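
The subset check amounts to a set difference. A minimal sketch, assuming
op names are compared as plain strings; `missing_ops` is a hypothetical
helper, not the framework's API:

```rust
use std::collections::BTreeSet;

/// Returns the required op_types the backend does not support, sorted and
/// de-duplicated. An empty result means the pre-flight check passes.
pub fn missing_ops(required: &[&str], supported: &[&str]) -> Vec<String> {
    let supported: BTreeSet<&str> = supported.iter().copied().collect();
    let missing: BTreeSet<&str> = required
        .iter()
        .copied()
        .filter(|op| !supported.contains(op))
        .collect();
    missing.into_iter().map(String::from).collect()
}
```

A non-empty result corresponds to the list of missing op_types that the
load error reports.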

---

## Part 6 — DSL → NodeProto records

Every DSL method materializes into one or more NodeProtos. The
pattern is mechanical:

```rust
// DSL call:
self.backend.matmul(g, a, b)

// Recorded NodeProto (for a call on a concrete ConcreteComponent impl):
NodeProto {
    op_type: "MatMul",
    domain: "ai.onnx",
    input: vec![a.name.clone(), b.name.clone()],
    output: vec![g.next_site_name()],
    attribute: vec![],
    metadata_props: vec![
        StringStringEntryProto {
            key: "ai.bytesandbrains.concrete_type".into(),
            value: "bb-burn::BurnBackend".into(),  // = T::TYPE_NAME
        },
        StringStringEntryProto {
            key: "ai.bytesandbrains.instance".into(),
            value: "0".into(),  // = instance_id from Graph::register_concrete
        },
    ],
    name: "",
    doc_string: "",
    overload: "",
    device_configurations: vec![],
}
```

The DSL's contract:

- Method name maps to `op_type` via standard CamelCase (`matmul` →
  `"MatMul"`, `recv_req` → `"RecvReq"`, `forward` → `"Forward"`,
  `next_batch` → `"NextBatch"`).
- The component handle's opset (looked up from the trait it
  satisfies) maps to `domain`.
- `Output` arguments contribute their `name` strings to `input`.
- Newly-created sites get fresh names via `Graph::next_site_name()`,
  populated into `output` and returned as new `Output` handles.
- Op-specific config arguments populate `attribute` as
  `AttributeProto`s — `axis: i64` → `AttributeProto { type: INT, i:
  axis, name: "axis" }`, etc.
- Identity metadata goes into the `concrete_type` + `instance` keys
  (for ConcreteComponent impls) or the `required_trait` + `slot_id`
  keys (for generic placeholder unit structs). The DSL method
  calls `g.register_concrete::<Self>(self)` or
  `g.register_generic(self as *const _, REQUIRED_TRAIT)` at the top of
  its body; the Graph tracks pointer-identity and assigns the
  per-instance id or per-slot id, returning the values for the DSL
  method to stamp into the NodeProto.
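
The name-mapping rule in the first bullet can be sketched as a
snake_case to PascalCase conversion. Note that underscores alone yield
`"Matmul"` for `matmul`, so this sketch assumes a small override table
for names whose canonical ONNX spelling differs; `op_type_for` and the
override mechanism are hypothetical, not the DSL's actual code.

```rust
/// Converts a snake_case DSL method name to a PascalCase op_type.
/// `overrides` covers names whose canonical spelling cannot be derived
/// from underscores alone (an assumption of this sketch).
pub fn op_type_for(method: &str, overrides: &[(&str, &str)]) -> String {
    if let Some((_, op)) = overrides.iter().find(|(name, _)| *name == method) {
        return op.to_string();
    }
    method
        .split('_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                // Upper-case the first character, keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}
```

With `[("matmul", "MatMul")]` as the override table, the four examples
from the bullet map as stated.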

The Output return shape mirrors the canonical ONNX op signature:

- One `Output` for single-output ops.
- A `(Output, Output, …)` tuple for multi-output ops.
- The output's `TypeNode` is statically known from the DSL method
  signature; the Graph populates `value_info` with the
  matching `ValueInfoProto.type` for downstream type checking.

There is no implicit type erasure: every Output carries its
canonical `TypeProto.denotation` so the validator can match
producer/consumer types.

---

## Part 7 — Graph identity, opset_import, version negotiation

A loaded `ModelProto` declares its opsets in `opset_import`:

```
model.opset_import = [
    OperatorSetIdProto { domain: "ai.onnx",                      version: 17 },
    OperatorSetIdProto { domain: "ai.bytesandbrains.syscall",    version: 1 },
    OperatorSetIdProto { domain: "ai.bytesandbrains.wire",       version: 1 },
    OperatorSetIdProto { domain: "ai.bytesandbrains.role.model", version: 1 },
    OperatorSetIdProto { domain: "ai.bytesandbrains.role.aggregator", version: 1 },
]
```

A `FunctionProto` body carries its own `opset_import` declaring the
opsets its inlined nodes use. This lets sub-modules import additional
opsets the parent doesn't directly use.

**Per ONNX semantics, each op_type binds to the most recent operator
definition whose `since_version` does not exceed the opset version
imported for its domain.** BB follows this rule verbatim. A graph
importing `ai.onnx v18` gets v18 semantics for any op redefined in
v18; ops not redefined in v18 keep the definition they had at v17.
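
ONNX tooling resolves an op to the newest definition whose
`since_version` is at or below the imported opset version. A minimal
sketch of that lookup, with `resolve_since_version` as a hypothetical
helper and the version lists supplied by the caller:

```rust
/// Picks the operator definition a graph binds to: the newest
/// `since_version` that does not exceed the opset version imported for
/// the op's domain. `since_versions` lists every version at which the
/// op's definition changed.
pub fn resolve_since_version(since_versions: &[i64], imported: i64) -> Option<i64> {
    since_versions
        .iter()
        .copied()
        .filter(|v| *v <= imported)
        .max()
}
```

A `None` result means the op did not exist yet at the imported version,
which is a load-time error rather than something to fall back from.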

**Pre-flight check at load.** The framework walks `opset_import` and
verifies the bound runtime impls cover each opset's required ops:

- For `ai.onnx v<n>`: the bound backend's `supported_ops()` covers
  every `ai.onnx` op_type appearing in the graph.
- For `ai.bytesandbrains.role.<role> v<n>`: the bound role runtime's
  `supported_ops()` covers every role op_type appearing.
- For `ai.bytesandbrains.syscall v<n>`: framework-built-in; no binding
  required (always supported).
- For `ai.bytesandbrains.wire v<n>`: framework-built-in; the engine
  registers `Send` and `Recv` as stateless syscalls, no binding
  required (always supported).

Failure produces `LoadError::IncompatibleRuntime { opset, missing_ops }`.

---

## Part 8 — Wire envelope (out-of-IR, coherent with it)

The wire envelope is **transport, not IR**. It lives in a separate
proto file (`proto/bb_envelope.proto`) and is NOT part of the ONNX
schema. The envelope's job is to carry an opaque payload between
Nodes; the IR (loaded ModelProtos on both ends) defines what the
payload means.

Envelope schema (per [ADDRESSING.md](ADDRESSING.md) — addresses
route themselves):

```proto
syntax = "proto3";
package bb.core;

enum CorrelationKind { NONE = 0; REQUEST = 1; RESPONSE = 2; }

message WireCorrelation {
  CorrelationKind kind = 1;
  uint64 wire_req_id = 2;
}

message WireEnvelope {
  repeated bytes dest_peer_addresses = 1;  // resolved address list from
                                           // AddressBook::lookup(peer);
                                           // transport picks one entry.
                                           // Lookup miss → no envelope
                                           // (EngineStep::PeerResolveFailed
                                           // surfaces instead).
  repeated SlotFill fills = 2;             // batched fills
  WireCorrelation correlation = 3;         // request/response pairing
  // ... fields 4-7: deadline propagation + RTT piggyback + ...
  repeated bytes src_peer_addresses = 8;   // sender's local-address bag
                                           // (snapshot of `ctx.local_addresses()`
                                           // at send time); receiver merges
                                           // into AddressBook entry for the
                                           // sender. Capped at decode time
                                           // via EnvelopeCaps.
}

message SlotFill {
  bytes dest_suffix = 1;   // per-slot multiaddr suffix (intra-node):
                           //   /site/<NodeSiteId>           — data plane
                           //   /component/<cref>/op/<name>  — control plane
  bytes payload = 2;       // wire-encoded bytes; empty when trigger_only
  bool  trigger_only = 3;
}
```

Peer routing uses the resolved `dest_peer_addresses` list: the wire
syscall populates it from `AddressBook::lookup(peer)`, and the
transport adapter picks one entry by capability. Intra-node routing
uses each fill's `dest_suffix`; receivers parse the suffix segments
to dispatch (see [ADDRESSING.md](ADDRESSING.md) for the canonical
reference, including the DAG-mutable `peers/` syscall ops and the
`PeerResolveFailed` lifecycle event):

- `SlotFill.dest_suffix` ending in `/site/<NodeSiteId>` identifies
  the slot inside the receiver's installed graph. The slot's
  declared `TypeNode` (looked up from `ValueInfoProto.type` via the
  installed graph's `site_names` map) tells the receiver which
  decoder to use.
- `SlotFill.dest_suffix` ending in `/component/<ComponentRef>/op/<name>`
  routes directly to `components[cref].dispatch_atomic(name, ...)`
  for control-plane components. The component owns its payload
  encoding.
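
The two suffix shapes can be told apart by their trailing segments. The
sketch below assumes a textual, slash-separated rendering of the suffix
(the wire field itself is `bytes`); `FillRoute` and `parse_dest_suffix`
are hypothetical names used only for illustration.

```rust
/// Where a fill is headed inside the receiving Node.
#[derive(Debug, PartialEq)]
pub enum FillRoute {
    /// Data plane: `/site/<NodeSiteId>`.
    Site(String),
    /// Control plane: `/component/<cref>/op/<name>`.
    ComponentOp { component: String, op: String },
}

/// Parses the trailing segments of a textual suffix. Leading segments are
/// ignored, so only the ending decides the route.
pub fn parse_dest_suffix(suffix: &str) -> Option<FillRoute> {
    let segments: Vec<&str> = suffix.split('/').filter(|s| !s.is_empty()).collect();
    match segments.as_slice() {
        [.., "component", component, "op", op] => Some(FillRoute::ComponentOp {
            component: component.to_string(),
            op: op.to_string(),
        }),
        [.., "site", site] => Some(FillRoute::Site(site.to_string())),
        _ => None,
    }
}
```

Anything that matches neither ending yields `None`, leaving the caller to
decide how to report an unroutable fill.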

The envelope plane and the IR plane never collide: the envelope is
how Nodes exchange bytes; the IR (graphs + type-meta + addresses)
is what makes those bytes meaningful on the receiver side.

---

## Part 9 — Worked example: canonical SplitLearning Module

Source Rust:

```rust
struct SplitLearning {
    backend: Backend,                            // generic
    network_server: NoBarrierOneShot,            // concrete
    network_client: BarrierNetworkReqResp,       // concrete
    model: BurnModel,                            // concrete
    codec: ProductQuantization,                  // concrete
    gossip: Cyclone,                             // concrete
    aggregator: WeightAggregator,                // concrete
}

impl Module for SplitLearning {
    fn name(&self) -> &str { "SplitLearning" }

    fn op(&self, g: &mut Graph, _inputs: &[Output]) -> Vec<Output> {
        let (_t1, enc_in) = self.network_server.recv(g);
        let dec_in = self.codec.decompress(g, enc_in);
        let dec_out = self.model.forward(g, dec_in);
        let enc_out = self.codec.compress(g, dec_out);

        let peers = self.gossip.sample(g, 5);
        let (req_id, _ack) = self.network_client.send_req_batched(g, enc_out, peers);

        let (_t2, batched_grads) = self.network_client.recv_responses(g, req_id);
        let dec_grads = self.codec.decompress(g, batched_grads);
        let avg_grad = self.aggregator.aggregate(g, dec_grads);
        let _ = self.model.step(g, avg_grad);
        let _ = self.model.backward(g, avg_grad);
        vec![]  // no top-level outputs
    }
}

// Application entry point.
let modules = SplitLearning { /* ... */ }.build()?;
```

Produced `ModelProto`:

```proto
ModelProto {
  ir_version: 12,
  producer_name: "bytesandbrains",
  producer_version: "0.9.0",
  domain: "user.app",
  model_version: 1,
  opset_import: [
    {domain: "ai.onnx",                         version: 17},
    {domain: "ai.bytesandbrains.syscall",       version: 1},
    {domain: "ai.bytesandbrains.wire",          version: 1},
    {domain: "ai.bytesandbrains.role.model",    version: 1},
    {domain: "ai.bytesandbrains.role.aggregator", version: 1},
    {domain: "ai.bytesandbrains.role.compressor",  version: 1},
    {domain: "ai.bytesandbrains.role.peer_selector", version: 1},
  ],
  graph: GraphProto {
    name: "SplitLearning",
    node: [
      NodeProto {
        op_type: "SplitLearning",
        domain: "user.app",
        // The top-level graph just calls the SplitLearning function:
        input: [],
        output: [],
        metadata_props: [{
          key: "ai.bytesandbrains.module_instance",
          value: "SplitLearning#0",
        }],
      },
    ],
  },
  functions: [
    FunctionProto {
      name: "SplitLearning",
      domain: "user.app",
      // Generic placeholders (required, no default):
      attribute: ["backend"],
      // Concrete impls (defaulted; payload carries construction config):
      attribute_proto: [
        AttributeProto {
          name: "network_server",
          type: STRING,
          s: <bincode: NoBarrierOneShot construction state>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "framework_wire::NoBarrierOneShot",
          }],
        },
        AttributeProto {
          name: "network_client",
          type: STRING,
          s: <bincode: BarrierNetworkReqResp construction state>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "framework_wire::BarrierNetworkReqResp",
          }],
        },
        AttributeProto {
          name: "model",
          type: STRING,
          s: <bincode: BurnModel construction state + weights references>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "burn_integration::BurnModel",
          }],
        },
        AttributeProto {
          name: "codec",
          type: STRING,
          s: <bincode: ProductQuantization{M, N} state>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "framework_compressors::ProductQuantization",
          }],
        },
        AttributeProto {
          name: "gossip",
          type: STRING,
          s: <bincode: Cyclone{C, H, S} state>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "framework_peer_selector::Cyclone",
          }],
        },
        AttributeProto {
          name: "aggregator",
          type: STRING,
          s: <bincode: WeightAggregator state>,
          metadata_props: [{
            key: "ai.bytesandbrains.concrete_type",
            value: "framework_aggregators::WeightAggregator",
          }],
        },
      ],
      input: [],   // SplitLearning takes no graph inputs at top level
      output: [],  // and produces no top-level outputs (effects via wire + step)
      opset_import: [...],  // mirrors ModelProto.opset_import
      node: [
        NodeProto {
          op_type: "Recv",
          domain: "ai.bytesandbrains.wire",
          input: [],
          output: ["site_1", "site_2"],  // trigger, encoded_input
          attribute: [
            AttributeProto {
              name: "payload_type",
              type: TYPE_PROTO,
              tp: TypeProto.Tensor { elem_type: FLOAT, shape: <dynamic> },
            },
          ],
          metadata_props: [
            {key: "ai.bytesandbrains.concrete_type", value: "framework_wire::NoBarrierOneShot"},
            {key: "ai.bytesandbrains.instance",      value: "0"},
          ],
        },
        NodeProto {
          op_type: "Decompress",
          domain: "ai.bytesandbrains.role.compressor",
          input: ["site_2"],
          output: ["site_3"],
          metadata_props: [
            {key: "ai.bytesandbrains.concrete_type", value: "framework_compressors::ProductQuantization"},
            {key: "ai.bytesandbrains.instance",      value: "0"},
          ],
        },
        NodeProto {
          op_type: "Forward",
          domain: "ai.bytesandbrains.role.model",
          input: ["site_3"],
          output: ["site_4"],
          metadata_props: [
            {key: "ai.bytesandbrains.concrete_type", value: "burn_integration::BurnModel"},
            {key: "ai.bytesandbrains.instance",      value: "0"},
          ],
        },
        NodeProto {
          op_type: "Compress",
          domain: "ai.bytesandbrains.role.compressor",
          input: ["site_4"],
          output: ["site_5"],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_compressors::ProductQuantization"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "Sample",
          domain: "ai.bytesandbrains.role.peer_selector",
          input: [],
          output: ["site_6"],
          attribute: [
            AttributeProto { name: "n", type: INT, i: 5 },
          ],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_peer_selector::Cyclone"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "SendReqBatched",
          domain: "ai.bytesandbrains.wire",
          input: ["site_5", "site_6"],   // encoded_output, peers
          output: ["site_7", "site_8"],  // req_id, ack
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_wire::BarrierNetworkReqResp"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "RecvRespBatched",
          domain: "ai.bytesandbrains.wire",
          input: ["site_7"],
          output: ["site_9", "site_10"],  // trigger, batched_grads
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_wire::BarrierNetworkReqResp"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "Decompress",
          domain: "ai.bytesandbrains.role.compressor",
          input: ["site_10"],
          output: ["site_11"],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_compressors::ProductQuantization"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "Aggregate",
          domain: "ai.bytesandbrains.role.aggregator",
          input: ["site_11"],
          output: ["site_12"],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "framework_aggregators::WeightAggregator"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "Step",
          domain: "ai.bytesandbrains.role.model",
          input: ["site_12"],
          output: ["site_13"],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "burn_integration::BurnModel"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
        NodeProto {
          op_type: "Backward",
          domain: "ai.bytesandbrains.role.model",
          input: ["site_12"],
          output: ["site_14"],
          metadata_props: [{key: "ai.bytesandbrains.concrete_type", value: "burn_integration::BurnModel"},
                           {key: "ai.bytesandbrains.instance", value: "0"}],
        },
      ],
      value_info: [
        ValueInfoProto {
          name: "site_2",
          type: TypeProto.Tensor { elem_type: FLOAT, shape: <dynamic> },
        },
        // ... one per intermediate value, optional but useful for validation
      ],
    },
  ],
}
```

Everything the framework needs to load, validate, snapshot, and
execute this graph is in the `ModelProto`. The Rust struct declared
the components; the DSL methods recorded the `FunctionProto.attribute`
+ `attribute_proto` + `node` lists; the concrete impls' construction
state is baked into `attribute_proto.s`. At load:

1. The framework walks `function.attribute = ["backend"]` — the user
   must supply a `bb::Backend` Contract binding via the chained Node
   API; `#[derive(bb::Backend)]` generates the runtime bridge.
2. The framework walks `function.attribute_proto` — for each entry,
   look up the registered deserializer for `concrete_type`,
   instantiate from `.s` (or `.t` / `.g` / `.tp`).
3. The compiler runs: validates the recorded NodeProtos, infers
   peer classes, partitions by wire ops, and inserts the deadline /
   dedup / backoff / peer-health gate ops on every wire path. Role
   NodeProtos stay atomic-opset entries and dispatch at runtime via
   the per-Node atomic dispatch table.
4. Pre-flight: every used op_type in every opset has a covering
   binding. Failure surfaces typed errors before any execution.
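
Step 2 of the load sequence (look up a deserializer by `concrete_type`,
then build from the payload) might look roughly like this. All names
here (`Component`, `DeserializerRegistry`, `cyclone_from_bytes`) are
hypothetical, and the payload is treated as opaque bytes rather than a
real bincode stream.

```rust
use std::collections::HashMap;

/// Hypothetical component rebuilt from an attribute payload.
#[derive(Debug, PartialEq)]
pub struct Component {
    pub type_name: String,
    pub state_len: usize,
}

type Deserializer = fn(&[u8]) -> Result<Component, String>;

/// Maps a `concrete_type` string to the function that rebuilds it.
pub struct DeserializerRegistry {
    by_type: HashMap<String, Deserializer>,
}

impl DeserializerRegistry {
    pub fn new() -> Self {
        Self { by_type: HashMap::new() }
    }

    pub fn register(&mut self, concrete_type: &str, f: Deserializer) {
        self.by_type.insert(concrete_type.to_string(), f);
    }

    /// Looks up the deserializer, then instantiates from the payload bytes.
    pub fn instantiate(&self, concrete_type: &str, payload: &[u8]) -> Result<Component, String> {
        let f = self
            .by_type
            .get(concrete_type)
            .ok_or_else(|| format!("no deserializer registered for {}", concrete_type))?;
        f(payload)
    }
}

/// Toy deserializer: rejects empty state, otherwise records its length.
pub fn cyclone_from_bytes(payload: &[u8]) -> Result<Component, String> {
    if payload.is_empty() {
        return Err("empty construction state".to_string());
    }
    Ok(Component { type_name: "Cyclone".to_string(), state_len: payload.len() })
}
```

An unregistered `concrete_type` fails at lookup, before any payload bytes
are interpreted.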

---

## Part 10 — What ONNX gives us free

By riding inside canonical ONNX messages, BB inherits without code:

- **Netron, onnxruntime, Burn's loader, TFLite's converter, the
  Python `onnx` package** all read framework graphs natively. The
  vendor opsets show as namespaced ops; the rest is just ONNX.
- **Snapshot = ModelProto bytes.** Any ONNX-aware tool opens a BB
  snapshot. Diffing, lineage analysis, visualization — all free.
- **FunctionProto-based composition is how ONNX itself models reusable
  graphs.** Inlining, parameter substitution, multi-instance — all
  spec-defined behaviors.
- **`opset_import` solves version negotiation.** The same mechanism
  used between PyTorch and ONNX Runtime works for BB graphs across
  Nodes and across framework versions.
- **`GraphProto.initializer`-based weights round-trip** without any
  serializer code on our side. A `BurnModel` whose construction state
  references initializer names exports a graph any ONNX consumer can
  load with weights intact.
- **`TensorAnnotation` for quantization** is canonical. PQ codebooks,
  scale/zero-point pairs, etc. live where every consumer expects them.
- **`TypeProto.Opaque`** is the right primitive for our domain types
  without inventing custom representations. Python's `onnx` library
  knows Opaque types as opaque (it preserves the `domain` + `name`
  without trying to interpret them); the framework's deserializer
  registry interprets them where they need interpretation.
- **Role-op bodies decompose into shared opsets or terminate at a
  single atomic-op NodeProto**, mirroring ONNX's standard-op vs
  vendor-extension distinction. Same conceptual shape applied at the
  role boundary. Toolchain knowledge transfers directly.

---

## Part 11 — The Rust-dispatch boundary (closing principle)

> **Graph decomposition stops at Rust dispatch.**
>
> Every Op in a loaded ModelProto is either (a) graph-expressible —
> its body is recoverable as a sub-`GraphProto` and the compiler may
> inline it; or (b) Rust-dispatched — the BB engine calls a Rust
> function the bound runtime supplied, and from that point the op is
> opaque to the IR. There is no third mode.
>
> This is what makes a BB Node an ONNX runtime: standard ops
> (`ai.onnx`) and graph-expressible role ops are dispatched through
> normal graph-execution machinery; opaque role ops + framework
> primitives (`ai.bytesandbrains.syscall`, `ai.bytesandbrains.wire`)
> are the vendor-specified dispatch surface.

Everything above the Rust dispatch boundary is graph-traversable:
inlineable, collapsible, partitionable, snapshottable, exportable to
any ONNX consumer. Everything below is opaque: dispatched only by the
bound runtime's Rust function, never further decomposed.

A backend's `execute_subgraph` is also a Rust-dispatch terminal: once
the BB engine hands the GraphProto to the backend, what the backend
does internally (JIT compile, fuse kernels, dispatch to ONNX Runtime,
hand to a GPU) is invisible to the IR. The IR's contract is
`(inputs, GraphProto) -> outputs`; the implementation is the
vendor's.

This invariant is what lets BB cleanly compose: graphs flow through
graphs (composition, inlining, collapse, partition); Rust runs Rust
(backends, role impls, framework primitives). The two layers don't
leak into each other. The transition is explicit and observable in
the IR via the `(domain, op_type)` pair of each NodeProto: anything
under a registered atomic-op opset is Rust dispatch, anything else is
graph-level composition.

## Update — M-phase additions

This section reflects the M1–M11 + Phase D landings.

### Module ports

Ports are declared in the Module body recording surface: `g.input(name)`
for local inputs, `g.output(name, value)` for local outputs,
`g.net_out(port, peers, value)` for network outputs, and
`g.lookup_output(port)` to pull a value the compiler has wired in from
a network input. The compiler infers the port set from the recorded
body.

### Module::bootstrap recording

`Module::bootstrap(&self, g: &mut Graph)` is the author entry
point for pre-body initialization. The trait method defaults to a
no-op (`bb-dsl/src/module.rs`); authors override it alongside
`Module::body`:

```rust
impl Module for VectorStore {
    fn bootstrap(&self, g: &mut Graph) {
        // Stage initial inputs via `g.input(name)` — same recorder
        // call body uses for top-level formals. Each input
        // becomes a declared formal on the emitted
        // `"<module>__bootstrap"` FunctionProto, addressable from
        // the host via `BootstrapRequest::inputs`.
        let seed_corpus = g.input("seed_corpus");
        let _ = self.index.train(g, seed_corpus);
    }

    fn body(&self, g: &mut Graph) {
        let query = g.input("query");
        let _ = self.index.search(g, query, 10);
    }
}
```

`Module::build()` emits the bootstrap recording as a sibling
`FunctionProto` named `"<module>__bootstrap"` stamped with
`metadata_props["ai.bytesandbrains.module_phase"] = "bootstrap"`
(see Part 2). The bootstrap function's `function.input` list holds
the names passed to `g.input(name)` inside the bootstrap recording,
in call order.

The host stages bytes for each declared formal via the F5
immediate-fire entry point:

```rust
node.run_bootstrap(bb::engine::BootstrapTarget::ModuleRequests(&[
    bb::engine::BootstrapRequest {
        target: "VectorStore",
        inputs: &[("seed_corpus", corpus_bytes.as_slice())],
    },
]))?;
node.poll(cx); // drives the bootstrap body to quiescence
```

The engine validates `inputs` against the target's declared
formals at the boundary (`bb-runtime/src/engine/core.rs:1488-1528`):
`UnknownInput` rejects extras, `MissingInput` rejects gaps,
`UnknownTarget` rejects unknown names — all before any bytes
stage. Validated requests follow the Principle 1a copy
(`try_charge → try_reserve_exact → extend_from_slice`,
`bb-runtime/src/engine/core.rs:1534-1567`) and the framework-owned
`BytesValue` carriers land in the bootstrap's slot table entries
at the body's fresh `ExecId`. The caller's borrowed `&[u8]` slices
may be dropped the moment `run_bootstrap` returns.
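
The boundary check described above can be sketched in isolation. This is a minimal illustration, not the engine code: `BootstrapError`, `Request`, and `validate` are hypothetical stand-ins, and the order in which the three rejections are tested is an assumption. The point it shows is that every request is checked against the declared formals before any bytes are copied.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type; the variant names mirror the rejections above.
#[derive(Debug, PartialEq, Eq)]
pub enum BootstrapError {
    UnknownTarget(String),
    UnknownInput { target: String, name: String },
    MissingInput { target: String, name: String },
}

/// Hypothetical stand-in for `BootstrapRequest`: a target plus borrowed bytes.
pub struct Request<'a> {
    pub target: &'a str,
    pub inputs: &'a [(&'a str, &'a [u8])],
}

/// Checks every request against the declared formals. No bytes are copied
/// here; staging would only start after this returns `Ok(())`.
pub fn validate(
    formals: &HashMap<&str, Vec<&str>>,
    requests: &[Request<'_>],
) -> Result<(), BootstrapError> {
    for req in requests {
        // Unknown target names are rejected first (assumed ordering).
        let declared = formals
            .get(req.target)
            .ok_or_else(|| BootstrapError::UnknownTarget(req.target.to_string()))?;
        // Extras: supplied names the bootstrap function never declared.
        for (name, _) in req.inputs {
            if !declared.contains(name) {
                return Err(BootstrapError::UnknownInput {
                    target: req.target.to_string(),
                    name: name.to_string(),
                });
            }
        }
        // Gaps: declared formals the host did not supply.
        let supplied: HashSet<&str> = req.inputs.iter().map(|(n, _)| *n).collect();
        for name in declared {
            if !supplied.contains(name) {
                return Err(BootstrapError::MissingInput {
                    target: req.target.to_string(),
                    name: name.to_string(),
                });
            }
        }
    }
    Ok(())
}
```

Because validation completes for the whole batch before staging begins, a rejected request leaves no partially staged inputs behind.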

A bootstrap that takes no formals records zero `g.input` calls;
the host kicks it via `Node::run_bootstrap(BootstrapTarget::All)`
(every install-order bootstrap, no inputs needed) or
`Node::run_bootstrap(BootstrapTarget::ModuleNames(&["<target>"]))`
(sugar for an empty-input batch). Component bootstraps fire via
`BootstrapTarget::Slots(&["<slot>"])`.

### Composition API

The canonical composition shape:

```rust
let cell_out = self.cell.call()
    .input("query", q)
    .input("incoming_grad", grad)
    .build(g);                              // returns ModuleOutputs<'_>
let response = cell_out.output("response"); // by name, not position
```

### Network primitives at module boundaries

`g.net_out(name, peers, value)` is the single-slot network-sink
primitive on the recorder. It emits a `wire.Send` NodeProto
and registers `name` as a network-typed output port on the
current function. `peers` must be a `Vec<PeerId>` output;
`value` is the payload. The compiler's `partition_by_wire_ops`
cuts the graph at `wire.Send` boundaries; `synthesize_wire_recvs`
materializes the matching `wire.Recv` NodeProto on every
consumer-side partition that reads the named port. `wire.Recv`
is compiler-synthesized and does not appear in user-authored
Module bodies. The receive site's type is inferred by the
TypeSolver from the matching `wire.Send` payload type.

### Composition: bundling typed Outputs

`g.bundle(parts: &[Output]) → Output` packs N typed Outputs into
ONE composite Output for transmission through a single port; the
matching `g.unbundle(composite, &[&TypeNode, …]) → Vec<Output>`
decomposes the envelope back into N typed children on the
receiver. The composite envelope rides `TYPE_COMPOSITE` (a new
concrete leaf under `Any`); the wire infrastructure already
supports any wire-eligible value through `wire.Send`, so the
single composite hop reuses the existing `net_out` machinery
verbatim.

The recorded NodeProto shapes:

- **Bundle** (`domain = "ai.bytesandbrains.composite"`,
  `op_type = "Bundle"`): variable-arity input `[parts[0].name,
  parts[1].name, …, parts[N-1].name]`; single output port carrying
  the assembled `CompositeValue`. Stamps
  `ai.bytesandbrains.composite.child_count` (INT) and
  `ai.bytesandbrains.composite.child_types` (comma-joined
  TypeNode denotations).
- **Unbundle** (same domain, `op_type = "Unbundle"`): single input
  `[composite.name]`; N outputs named `child_0..child_{N-1}` with
  `ValueInfoProto.denotation` stamped from the corresponding
  `part_types[i].denotation`. Each child output is the original
  concrete `SlotValue` carrier the sender bundled (`PeerIdValue`,
  `CpuTensor`, …), not a `BytesValue`. Downstream consumers
  downcast directly via
  `as_any().downcast_ref::<T>()` against the declared denotation.

#### Type-fidelity story

`CompositeValue` is in-process typed: its `children` field carries
`Vec<Box<dyn SlotValue>>` (`bb-runtime/src/syscall/values.rs:80-85`),
not a `Vec<(u64, Vec<u8>)>` bag. Bundle's invoke clones each input
via `SlotValue::clone_boxed`
(`bb-ops/src/syscalls/composite/bundle.rs:43-46`); Unbundle's
invoke emits each child via `clone_boxed`
(`bb-ops/src/syscalls/composite/unbundle.rs:61-64`). In-process
forwarding pays one `clone_boxed` per child — no bincode encode,
no decode, no opaque `BytesValue` hop.
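
The in-process path can be illustrated with a small self-contained model. Everything below is a simplified sketch: the `SlotValue` trait here has only the two methods the text mentions, `TensorValue` is a hypothetical carrier, and `bundle` / `unbundle` are free functions standing in for the op invokes rather than the actual `bb-ops` code.

```rust
use std::any::Any;

/// Simplified stand-in for the framework trait: only the two methods
/// the surrounding text relies on.
pub trait SlotValue: Any {
    fn clone_boxed(&self) -> Box<dyn SlotValue>;
    fn as_any(&self) -> &dyn Any;
}

#[derive(Clone, Debug, PartialEq)]
pub struct PeerIdValue(pub u64);

/// Hypothetical tensor-like carrier used only for this illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct TensorValue(pub Vec<f32>);

impl SlotValue for PeerIdValue {
    fn clone_boxed(&self) -> Box<dyn SlotValue> { Box::new(self.clone()) }
    fn as_any(&self) -> &dyn Any { self }
}

impl SlotValue for TensorValue {
    fn clone_boxed(&self) -> Box<dyn SlotValue> { Box::new(self.clone()) }
    fn as_any(&self) -> &dyn Any { self }
}

/// Children stay typed trait objects; there is no byte encoding in-process.
pub struct CompositeValue {
    pub children: Vec<Box<dyn SlotValue>>,
}

/// One `clone_boxed` per input, mirroring the Bundle description above.
pub fn bundle(parts: &[&dyn SlotValue]) -> CompositeValue {
    assert!(!parts.is_empty(), "bundle of zero values");
    CompositeValue { children: parts.iter().map(|p| p.clone_boxed()).collect() }
}

/// One `clone_boxed` per child, mirroring the Unbundle description above.
pub fn unbundle(composite: &CompositeValue) -> Vec<Box<dyn SlotValue>> {
    composite.children.iter().map(|c| c.clone_boxed()).collect()
}
```

A consumer then recovers the concrete carrier with `child.as_any().downcast_ref::<PeerIdValue>()`, which returns `None` if the declared type does not match.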

At the wire boundary `SlotValue::to_wire_bytes` invokes
`CompositeValue`'s hand-rolled `Serialize`
(`bb-runtime/src/syscall/values.rs:114-131`), which encodes each
child as a `(type_hash, child.to_wire_bytes())` tuple. The
receiver's `Deserialize`
(`bb-runtime/src/syscall/values.rs:133-165`) reads each
`(type_hash, bytes)` pair, looks the hash up in
`wire_decoder_registry()` (`bb-ir/src/slot_value.rs:199-212`), and
materialises a typed `Box<dyn SlotValue>` carrier — so Unbundle
on the receiver downcasts to `T` even after a cross-Node hop.

The decoder registry is populated automatically by every
`register_type_node!(MyValue, &TYPE_X)` invocation
(`bb-ir/src/slot_value.rs:237-256`); a peer running a build that
does not know a given carrier's `type_hash` surfaces a typed
`SlotValueError::DecodeFailed` on receive rather than crashing.
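
A hash-keyed decoder lookup of this kind might look like the sketch below. It is an illustration under stated assumptions: `Registry`, `Decoded`, `DecodeError`, the two hash constants, and the byte layouts are all hypothetical, and `DecodeError::UnknownTypeHash` merely stands in for the typed `SlotValueError::DecodeFailed` the text names.

```rust
use std::collections::HashMap;

/// Hypothetical decoded carriers; the real registry yields `Box<dyn SlotValue>`.
#[derive(Debug, PartialEq)]
pub enum Decoded {
    PeerId(u64),
    Text(String),
}

/// Hypothetical error; `UnknownTypeHash` plays the role of a typed decode failure.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnknownTypeHash(u64),
    Malformed(u64),
}

type Decoder = fn(&[u8]) -> Option<Decoded>;

pub struct Registry {
    decoders: HashMap<u64, Decoder>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { decoders: HashMap::new() }
    }

    /// In the framework this step is done by a registration macro.
    pub fn register(&mut self, type_hash: u64, decoder: Decoder) {
        self.decoders.insert(type_hash, decoder);
    }

    /// Decode each `(type_hash, bytes)` pair; an unknown hash is an error
    /// value, not a panic.
    pub fn decode_children(&self, pairs: &[(u64, Vec<u8>)]) -> Result<Vec<Decoded>, DecodeError> {
        let mut out = Vec::with_capacity(pairs.len());
        for (hash, bytes) in pairs {
            let decode = self
                .decoders
                .get(hash)
                .copied()
                .ok_or(DecodeError::UnknownTypeHash(*hash))?;
            out.push(decode(bytes.as_slice()).ok_or(DecodeError::Malformed(*hash))?);
        }
        Ok(out)
    }
}

// Arbitrary illustrative hashes, not real `type_hash` values.
pub const HASH_PEER_ID: u64 = 1;
pub const HASH_TEXT: u64 = 2;

pub fn decode_peer_id(bytes: &[u8]) -> Option<Decoded> {
    if bytes.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    Some(Decoded::PeerId(u64::from_le_bytes(raw)))
}

pub fn decode_text(bytes: &[u8]) -> Option<Decoded> {
    String::from_utf8(bytes.to_vec()).ok().map(Decoded::Text)
}
```

Keying on the hash rather than on position lets a receiver reject exactly the child it cannot decode while reporting which `type_hash` was missing.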

The intended pattern: pack `(params, metadata)` once with
`g.bundle`, ship through a single `net_out`, unpack on the
receiver with `g.unbundle`. Single-port DAG semantics hold
because the bundle/unbundle pair traverses one Output between
peers; `synthesize_wire_recvs` keeps its single-port
cross-partition resolution.

Empty `parts` (Bundle) or empty `part_types` (Unbundle) panic at
recording time — composition of zero values has no semantic
meaning and is almost certainly an author bug.

### PeerSelector + SelectParams

`bb::contracts::PeerSelector::select(ctx, params, completion)` is
the generic peer-selection surface (see [ROLES.md](ROLES.md) for
the canonical `ctx` / `completion` shape every Contract method
follows). `SelectParams` carries:

- `Random { n }` — sample N peers uniformly.
- `NearKey { key, n }` — closest N peers under the selector's
  metric.
- `All` — every peer in the current view.

Concrete impls handle the variants they support and fail the
unsupported ones via `ContractResponse::Now(Err(...))`. Built-in
selectors: `GlobalRegistryServer` (centralized peer registry),
`ConstantView` (fixed peer list).
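
The "handle what you support, fail the rest" convention can be sketched with a toy selector. The code below is hypothetical throughout: `FixedView` is not the built-in `ConstantView`, the `u64` peer and key types and the XOR metric are assumptions, and a plain `Result` replaces the `ctx` / `completion` / `ContractResponse` plumbing of the real contract.

```rust
/// The three variants listed above; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum SelectParams {
    Random { n: usize },
    NearKey { key: u64, n: usize },
    All,
}

/// Hypothetical error for a variant the selector does not implement.
#[derive(Debug, PartialEq)]
pub enum SelectError {
    Unsupported(&'static str),
}

/// Toy selector over a fixed peer list (peers modelled as `u64` ids).
pub struct FixedView {
    pub peers: Vec<u64>,
}

impl FixedView {
    pub fn select(&self, params: &SelectParams) -> Result<Vec<u64>, SelectError> {
        match params {
            // Every peer in the current view, in stored order.
            SelectParams::All => Ok(self.peers.clone()),
            // Closest `n` peers under an XOR metric (assumed for illustration).
            SelectParams::NearKey { key, n } => {
                let mut sorted = self.peers.clone();
                sorted.sort_by_key(|p| p ^ key);
                sorted.truncate(*n);
                Ok(sorted)
            }
            // This toy selector has no randomness source, so it fails the
            // variant explicitly instead of approximating it.
            SelectParams::Random { .. } => Err(SelectError::Unsupported("Random")),
        }
    }
}
```

Failing an unsupported variant with a typed error keeps the caller in control: it can fall back to another selector instead of silently receiving a differently shaped result.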

### Wire op cardinality

`extract_dest_peers` accepts ONLY `PeerIdVecValue` at position 1.

### RecordedModule.module_tree

Every recorded module carries a `module_tree: Vec<ModuleTreeNode>`
with port declarations + parent/child relationships. The
`partition_by_module_boundary` pass walks this tree and emits
one partition per module + a NetworkEdge per matching
`g.net_out` → `g.lookup_output` pair.

### Multi-target compile + entry-point semantics

`Compiler::compile(module) → ModelProto` emits a single
`ModelProto` whose `functions[]` carries **every partition** produced
by `partition_by_wire_ops`. One compile call → one proto, regardless
of partition count. A federated module that partitions into `Client`
+ `Server` emits both as sibling `FunctionProto`s under
`model.functions`; sub-Module bodies and the synthesized helpers
(gate carriers, lifecycle containers) ride alongside in the same
list. The compilation passport (`ai.bytesandbrains.compiled = "v1"`)
and the per-target binding metadata
(`ai.bytesandbrains.binding.<target>.<slot> =
"<role>|<TYPE_NAME>|<slot_id>"`) are stamped onto
`model.metadata_props`, keyed by partition name, so the same proto
carries every target's binding spec without collisions.

`bb::install(peer_id, addresses, model, targets: &[&str], config)`
(`src/install.rs:235-338`) takes an ordered slice of target names
and installs **all** of them onto one Node. The host picks which
partitions live on each peer by passing different `targets` slices
to `install` on different peers; the proto is the same artifact
across the deployment. A peer hosting both halves of a federated
round receives `&["Client", "Server"]`; a single-Node demo passes
`&["MyModule"]`. The order is observable: bootstrap functions fire
in slice order — `BootstrapState::install_order`
(`bb-runtime/src/engine/bootstrap.rs:256-296`) is the append-only
queue the seeder walks front-to-back. See ENGINE.md §6.8.

Per-target lookup uses exact-match against `model.functions[].name`
first, then falls back to the compiler's content-hash suffix
(`<target>#<hash>`) — the partition pass stamps the hash so two
modules emitting partitions named `Client` from different
authoring crates don't collide
(`src/install.rs:356-373`).
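
The two-step lookup can be sketched as a pure function over the function names. This is an illustration only: `resolve_target` is a hypothetical name, and returning the first hash-suffixed match when several exist is an assumption the text does not confirm.

```rust
/// Resolve `target` against the function names in a compiled model:
/// exact match first, then the `<target>#<hash>` fallback.
pub fn resolve_target<'a>(function_names: &'a [String], target: &str) -> Option<&'a str> {
    // Step 1: an exact name match always wins.
    if let Some(name) = function_names.iter().find(|n| n.as_str() == target) {
        return Some(name.as_str());
    }
    // Step 2: fall back to a content-hash-suffixed name.
    // Assumption: the first match in list order is taken.
    let prefix = format!("{}#", target);
    function_names
        .iter()
        .map(String::as_str)
        .find(|n| n.starts_with(prefix.as_str()))
}
```

Matching on the `#` separator rather than on a bare prefix keeps a sibling such as `Client__bootstrap` from being mistaken for the `Client` partition.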

The compiled `ModelProto` is shareable across targets at the Node
layer: `bb::install` wraps it in `Arc<ModelProto>` once via
`Node::set_model` and shares the handle across every
`Node::register_module` call so the proto bytes live on the Node
exactly once
(`src/install.rs:332-335`, `bb-runtime/src/node/mod.rs:55-65`,
`530-548`).