agent-phone 0.1.0

Minimal sync RPC between two AI agents (Rust port of @p-vbordei/agent-phone). Self-custody keys, Noise-framework handshake, DID-bound WebSocket.
# Architecture — agent-phone (Rust)

## Goal

Port the [`agent-phone` v0.1 spec](../SPEC.md) to idiomatic Rust while staying **wire-format-identical** with the [TypeScript reference](https://github.com/p-vbordei/agent-phone). "Identical" here means: given the same DIDs, prologue, and ephemeral keys, every byte the Rust implementation puts on the wire — Noise handshake messages, post-handshake transport frames, JCS-canonical envelopes — matches what the TS reference produces. The shared C4 hex vector enforces the envelope side of that contract in CI on both sides.

## Module map

The Rust crate mirrors the TS reference module-for-module, so a change on one side is easy to mirror on the other.

| Rust (`src/`) | TS reference (`src/`) | Responsibility |
|---|---|---|
| `envelope.rs` | `envelope.ts` | JCS-canonical encode/decode (`serde_jcs`) + envelope validation. |
| `noise.rs` | `noise.ts` | `Noise_XK_25519_ChaChaPoly_BLAKE2s` state machine + primitives. |
| `frame.rs` | `frame.ts` | Post-handshake `seal`/`open` over the Noise CipherStates. |
| `did.rs` | `did.ts` | `did:key` codec + Ed25519↔X25519 conversion. |
| `session.rs` | `session.ts` | Stream multiplexing, unary RPC, credit-based backpressure, cancel. |
| `client.rs` | `client.ts` | WebSocket dial + handshake + Session wiring. |
| `server.rs` | `server.ts` | WebSocket accept (with `?caller=<did>` query) + handshake + Session per peer. |
| `error.rs` | — | `thiserror`-based error enum. |

## Dependency choices

**Crypto: hand-rolled Noise XK, not `snow` / `dissononce`.** The wire-format contract is byte-for-byte equality with TS, including specifics like the BLAKE2s-only HMAC, the nonce layout (`0x00000000 || u64 LE counter`), and the HKDF-2 split orientation per role. The mature Noise libraries can do XK, but each has its own framing quirks and they're hard to pin to "exactly what `@noble/curves` does." So Noise is reimplemented directly from the protocol description:

- **`chacha20poly1305`** (RustCrypto) for the AEAD.
- **`blake2`** (RustCrypto) for `Blake2s256`.
- **`curve25519-dalek`**'s `MontgomeryPoint` + `clamp_integer` for the `dh()` operations. We avoid `x25519-dalek` because we want explicit control over the clamping: Noise XK passes X25519 private keys to `dh()` in both the ephemeral (`e_priv`) and static (`static_priv`) roles, and we want to know exactly which clamping rule applies where. `curve25519-dalek` exposes that primitive directly.
- **`ed25519-dalek`** + **`sha2`** for the DID side: Ed25519 keypair generation + the SHA-512 path used in `ed25519_priv_to_x25519`.
- **`bs58`** for the `did:key` multibase encoding.
- **`serde_json` (with `preserve_order`) + `serde_jcs`** for RFC 8785 canonical-JSON bytes. `preserve_order` is required so envelope round-trips don't reorder keys before JCS even sees them.
- **`tokio-tungstenite`** for client and server WebSockets.
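
As a minimal sketch of the nonce layout mentioned above (`0x00000000 || u64 LE counter`), the 12-byte AEAD nonce can be built with the standard library alone. The function name `nonce_bytes` is illustrative and not taken from the crate.

```rust
/// Build the 12-byte AEAD nonce: four zero bytes followed by the
/// 64-bit counter in little-endian order.
/// `nonce_bytes` is a hypothetical name used for illustration only.
fn nonce_bytes(counter: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    // Bytes 0..4 stay zero; bytes 4..12 carry the counter.
    nonce[4..].copy_from_slice(&counter.to_le_bytes());
    nonce
}
```

For a counter of `1`, this gives four zero bytes, then `0x01`, then seven more zero bytes.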

### Ed25519 → X25519 derivation: not BIP-32

Both ports derive a static X25519 keypair from each agent's Ed25519 signing key, so the DID alone determines the Noise static key. The derivation is **not** any sort of HD wallet path:

- **Public key:** Montgomery `u = (1 + y) / (1 − y) mod p` from the Edwards `y`-coordinate (with the sign bit cleared). `ed25519_pub_to_x25519` uses `curve25519-dalek`'s `EdwardsPoint::to_montgomery`. Matches `@noble/curves`' `edwardsToMontgomeryPub`.
- **Private key:** `SHA-512(seed)[0..32]`, then RFC 7748 X25519 clamping (`h[0] &= 248; h[31] &= 127; h[31] |= 64`). Matches `edwardsToMontgomeryPriv`. Implemented directly in `did.rs::ed25519_priv_to_x25519` — eight lines, no abstractions.

The temptation when you see "derive an X25519 key from an Ed25519 key" is to reach for HKDF or BIP-32. Doing that here would silently break interop with the TS reference. The SHA-512-then-clamp construction is the only thing that produces the same Montgomery scalar the TS path produces, which is the only thing that produces the same `es`/`se` shared secrets, which is the only thing that lets the handshake succeed at all.
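
The clamping step of the private-key derivation can be sketched with the standard library alone. The sketch assumes the caller has already computed `SHA-512(seed)` and passes in its first 32 bytes; the name `clamp_x25519` is illustrative and is not the crate's actual function.

```rust
/// Apply RFC 7748 X25519 clamping to the first 32 bytes of SHA-512(seed).
/// Assumption: the caller has already hashed the Ed25519 seed; hashing is
/// left out so this sketch needs no external crates.
/// `clamp_x25519` is a hypothetical name used for illustration only.
fn clamp_x25519(mut h: [u8; 32]) -> [u8; 32] {
    h[0] &= 248; // clear the three lowest bits
    h[31] &= 127; // clear the highest bit
    h[31] |= 64; // set the second-highest bit
    h
}
```

Only the first and last bytes change; the 30 bytes in between pass through untouched.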

## Byte-determinism invariants

Three things must agree across all ports for a session to be established at all, let alone carry interoperable traffic:

1. **Noise handshake bytes.** Prologue layout (`"agent-phone/1" || u16-be(len(init_did)) || init_did || u16-be(len(resp_did)) || resp_did`), `mixHash` of the responder's pre-known static at start, exact nonce format, HKDF-2 output order per side at split. Pinned by `tests/noise.rs`.
2. **Frame bytes.** A WebSocket binary message *is* one Noise transport frame: ChaChaPoly ciphertext of the JCS envelope, 16-byte tag appended. The WS message boundary is the frame boundary; no extra length prefix. Pinned by `tests/frame.rs`.
3. **Envelope JCS bytes.** RFC 8785 key ordering (sorted by UTF-16 code units), no insignificant whitespace, and numbers in their shortest ECMAScript serialization. Pinned by `tests/conformance.rs::c4_frame_determinism` against `vectors/c4.json`, the same vector the TS suite verifies.
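
The prologue layout from item 1 can be sketched as follows. This assumes the length fields count the UTF-8 bytes of each DID, and that a DID longer than 65,535 bytes is rejected; both the rejection behaviour and the name `build_prologue` are assumptions for illustration, not taken from the crate.

```rust
/// Build `"agent-phone/1" || u16-be(len(init_did)) || init_did
///        || u16-be(len(resp_did)) || resp_did`.
/// Assumptions: lengths are UTF-8 byte counts, and a DID that does not
/// fit in a u16 length field yields `None`.
/// `build_prologue` is a hypothetical name used for illustration only.
fn build_prologue(init_did: &str, resp_did: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    out.extend_from_slice(b"agent-phone/1");
    for did in [init_did, resp_did] {
        if did.len() > u16::MAX as usize {
            return None;
        }
        // Big-endian two-byte length prefix, then the DID bytes.
        out.extend_from_slice(&(did.len() as u16).to_be_bytes());
        out.extend_from_slice(did.as_bytes());
    }
    Some(out)
}
```

Both sides must feed the DIDs in the same order (initiator first), since the prologue is hashed into the handshake state.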

## WebSockets quirk: capturing `?caller=<did>` before the upgrade

The spec requires the initiator to send its own DID as a `?caller=<did>` query parameter so the responder can build the matching Noise prologue *before* the handshake reads message 1. With `tokio-tungstenite`, the only place the inbound HTTP request is visible is the callback passed to `accept_hdr_async` (or `process_request` on the unstable API).

Server-side flow (`server.rs::handle_connection`):

1. Create a shared `Mutex<Option<String>>` slot and pass a closure into `accept_hdr_async` that parses `req.uri().query()`, extracts `caller=…`, URL-decodes it, and stores the DID in the slot.
2. After the upgrade resolves, lock the slot and pull the DID out.
3. If no DID was supplied, close the connection with a handshake error before any Noise bytes flow.

The closure also injects `Sec-WebSocket-Protocol: agent-phone.v1` into the response headers so the negotiated subprotocol is set.
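
The query-parsing and slot-filling parts of the flow above can be sketched with the standard library alone. The sketch leaves out the `tokio-tungstenite` callback itself; the helper names (`percent_decode`, `caller_from_query`, `store_caller`) are hypothetical, and the decoder handles only `%XX` escapes (it does not treat `+` as a space).

```rust
use std::sync::Mutex;

/// Decode `%XX` escapes. Returns `None` on a malformed escape or
/// if the decoded bytes are not valid UTF-8.
fn percent_decode(s: &str) -> Option<String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hi = (*bytes.get(i + 1)? as char).to_digit(16)?;
            let lo = (*bytes.get(i + 2)? as char).to_digit(16)?;
            out.push((hi * 16 + lo) as u8);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Extract and decode the `caller` parameter from a raw query string.
/// An empty value is treated as missing.
fn caller_from_query(query: &str) -> Option<String> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "caller")
        .and_then(|(_, value)| percent_decode(value))
        .filter(|did| !did.is_empty())
}

/// Stand-in for the body of the handshake callback: store the caller
/// DID (if any) into the shared slot for use after the upgrade.
fn store_caller(slot: &Mutex<Option<String>>, query: Option<&str>) {
    *slot.lock().unwrap() = query.and_then(caller_from_query);
}
```

After the upgrade resolves, the server side would call `slot.lock().unwrap().take()` and treat `None` as the "no DID supplied" case from step 3.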

## Testing strategy

`cargo test` runs 21 tests in well under a second:

- Unit tests per module: `did`, `envelope`, `frame`, `noise` (each in `tests/*.rs`).
- End-to-end loopback in `tests/e2e.rs`: ephemeral-port WS server + client + round-trip + streaming + cancel.
- Conformance C1–C4. C1/C2/C3 live alongside the e2e tests; C4 is its own file because it reads the shared hex vector.

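The end-to-end tests are the strongest guard against accidental drift between the client and server code paths, because any byte-level disagreement between the two surfaces as a handshake or AEAD failure during the test. Drift that changes both roles in the same way is invisible to a loopback test and is caught only by fixed vectors such as C4.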