x0x 0.19.21

Agent-to-agent gossip network for AI systems — no winners, no losers, just cooperation
# x0x Comprehensive Test Suite Guide

**x0x version:** 0.19.17
**Last updated:** 2026-05-01

This document describes the production test architecture for x0x — Rust
unit/integration tests, end-to-end shell harnesses, GUI parity checks, and
the cross-surface parity proofs against Communitas (Dioxus + Apple).

The capability source of truth is [`docs/parity-matrix.md`](docs/parity-matrix.md):
every capability in x0x must be reachable — and behave identically — from
every supported surface (REST, CLI, embedded GUI, Communitas Dioxus,
Communitas Apple). Each row in the matrix is backed by a test in this
guide.

---

## Test Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│  tests/e2e_proof_runner.sh --all     (single-command release proof)  │
└──────────────────────────────────────────────────────────────────────┘
       ├── --rust-tests       cargo nextest (52 integration files, 1006+ tests)
       ├── --comprehensive    tests/e2e_comprehensive.sh          (3 local daemons)
       ├── --stress           tests/e2e_stress_gossip.sh          (drop detection)
       ├── --chrome           tests/e2e_gui_chrome.mjs            (Playwright GUI)
       ├── --dioxus           tests/e2e_communitas_dioxus.sh      (Dioxus IPC)
       ├── --xcuitest         CommunitasGoldenPathsUITests.swift  (Apple UI)
       ├── --vps              tests/e2e_vps.sh                    (6-region matrix, SSH-per-call, legacy)
       ├── --vps-mesh         tests/e2e_vps_mesh.py               (6-region matrix, mesh-relay)
       ├── --vps-groups       tests/e2e_vps_groups.py             (6-region groups + contacts dogfood)
       ├── --dogfood-local    tests/e2e_dogfood_local.sh          (2-instance ~5 s smoke)
       ├── --dogfood-groups   tests/e2e_dogfood_groups.sh         (3-instance groups dogfood)
       └── --lan              tests/e2e_lan.sh                    (Mac Studios)
```

> **Dogfood harness family — Phases A/B/C/D.** A coordinated set of
> harnesses that exercise x0x via x0x's own primitives (DMs, named
> groups, group messages) instead of curl-from-outside. They share a
> single Phase-A wire protocol (`x0xtest|cmd|`/`res|`/`hop|` payload
> prefixes) implemented by `tests/runners/x0x_test_runner.py`, which is
> deployed as a systemd service on every VPS. The Mac harness opens
> **one** SSH tunnel to an anchor node; every assertion thereafter is a
> real protocol round-trip.
>
> | Phase | Harness | Use |
> |---|---|---|
> | A | `e2e_vps_mesh.py` | All-pairs DM matrix (§7b) |
> | B | `e2e_vps_groups.py` / `e2e_dogfood_groups.sh` | Groups + contacts (§7c) |
> | C | `e2e_deploy.sh --mesh-verify` | Deploy + integrated mesh verification (§7d) |
> | D | `e2e_dogfood_local.sh` | Fast 2-instance pre-commit smoke, ~5 s (§7e) |

Every phase writes proof artefacts under `proofs/<timestamp>/` so a release
can be replayed and audited after the fact.

---

## 1. Rust Unit + Integration Tests

**Runner:** `cargo nextest run --all-features --workspace`

**Scope:** 52 integration files in `tests/`, plus inline `#[cfg(test)]`
modules. ~1,006 tests at last release-blocking run.

Highlights (full inventory in `tests/`):

| File | Coverage |
|------|----------|
| `identity_integration.rs` | Three-layer identity, keypair management, certificates |
| `identity_unification_test.rs` | `MachineId == ant-quic PeerId`, announcement key derivation |
| `trust_evaluation_test.rs` | TrustEvaluator decisions, machine pinning, ContactStore mutations |
| `announcement_test.rs` | Announcement round-trips, NAT fields, discovery cache, reachability |
| `connectivity_test.rs` | ReachabilityInfo heuristics, ConnectOutcome, `connect_to_agent()` |
| `peer_lifecycle_integration.rs` | ant-quic 0.27.x lifecycle bus events |
| `crdt_integration.rs` / `crdt_convergence_concurrent.rs` / `crdt_partition_tolerance.rs` | TaskList CRUD, CRDT convergence, partition recovery |
| `kv_store_integration.rs` | KV CRUD, access policies, CRDT sync |
| `mls_integration.rs` | Group encryption, key rotation |
| `named_group_integration.rs` + `named_group_*` | Named groups, invites, policy, public messages, state-commit, C2 live, D4 apply, E live |
| `direct_messaging_integration.rs` | Direct send/receive, connection lifecycle |
| `exec_acl_unit.rs` + inline `src/exec/service.rs` tests | Tier-1 exec ACL parsing, strict argv templates, shell metachar rejection, output cap/drain state, duration cap, concurrency slots, frame prefix routing |
| `file_transfer_integration.rs` | Send / accept / reject / progress |
| `presence_*` | Beacons, FOAF, adaptive failure detection |
| `nat_traversal_integration.rs` | NAT hole-punching |
| `bootstrap_cache_integration.rs` | Cache persistence, quality scoring |
| `gossip_cache_adapter_integration.rs` | Gossip cache adapter wrapping bootstrap cache |
| `rendezvous_integration.rs` | Rendezvous shard discovery |
| `upgrade_integration.rs` | Self-update manifest signing, verification, rollout |
| `vps_e2e_integration.rs` | VPS bootstrap node smoke |
| `api_coverage.rs` + `api_manifest.rs` + `parity_cli.rs` | REST/CLI parity (every endpoint has a CLI command) |
| `gui_smoke.rs` + `gui_named_group_parity.rs` | Embedded GUI smoke + named-group parity |
| `ant_quic_0272_surface.rs` | Pass-through smoke for new ant-quic 0.27.x surfaces |
| `proptest_*` | Property-based tests for connectivity, CRDT, files, groups, KV, direct-msg |

```bash
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo nextest run --all-features --workspace
```

CI builds enforce `RUSTDOCFLAGS="-D warnings"` on `cargo doc --all-features --no-deps`.

---

## 2. Local End-to-End — `e2e_comprehensive.sh`

**Path:** `tests/e2e_comprehensive.sh`
**Scope:** 3 local daemons (`alice`, `bob`, `charlie`) on isolated ports +
identity dirs, exercising **all 75+ REST endpoints** across 19 categories.

What it covers:

- Contacts lifecycle (add / block / trust / forget)
- Machine pinning enforcement
- Trust evaluator — all 5 decision paths
- MLS group full lifecycle (add / remove / re-add / encrypt / decrypt)
- Named groups (invite validation, leave / rejoin, policy, roles, bans)
- KV stores (multi-key, update, access control)
- Presence — every endpoint (`/presence/online`, `/foaf`, `/find/:id`,
  `/status/:id`, `/events` SSE)
- Direct messaging round-trip
- Pub/sub publish + subscribe + WebSocket live feed
- File transfer offer / accept / reject
- Self-update apply (`POST /upgrade/apply` concurrency)
- Diagnostics endpoints (`/diagnostics/connectivity`, `/diagnostics/gossip`, `/diagnostics/dm`, `/diagnostics/exec`)
- Seedless (`charlie` with `--no-hard-coded-bootstrap`) bootstrap

```bash
cargo build --release
bash tests/e2e_comprehensive.sh                  # ~2 min
```

---

## 3. Local Exec End-to-End — `e2e_exec.sh`

**Path:** `tests/e2e_exec.sh`
**Scope:** 2 local daemons with restart-loaded exec ACLs. This is the
Tier-1 SSH-free remote-exec acceptance harness.

What it covers:

- Stable agent/machine identity capture before ACL generation
- Explicit `--exec-acl <PATH>` startup on both daemons
- Trusted card exchange and mesh/gossip-DM delivery
- Successful allowlisted argv over `POST /exec/run`
- Structured `argv_not_allowed` denial for a mismatched argv
- `stdin_b64` to `/bin/cat` with stdout cap truncation and warning frames
- `/exec/sessions`, `/diagnostics/exec`, and JSONL audit events for
  request, denial, warning, and truncated exit

```bash
cargo build --release --bin x0xd
bash tests/e2e_exec.sh
```

---

## 4. Gossip Stress / Drop Detection — `e2e_stress_gossip.sh`

**Path:** `tests/e2e_stress_gossip.sh`
**Scope:** N-daemon stress harness that asserts the delivery claim it
documents. Strictly enforces
`delivered_to_subscriber >= MESSAGES * MIN_DELIVERY_RATIO`
(default 1.0) — i.e. **zero drops on every subscriber**, not just zero
drops on the publisher.

Powered by the `GET /diagnostics/gossip` endpoint introduced in v0.18.0,
which exposes atomic counters at every stage of the pipeline:

```
publish → incoming → decoded → delivered → subscriber-channel-closed
```

The harness fails fast if any subscriber's `decoded → delivered`
delta is non-zero, isolating drops above the wire and below the app.
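
The two checks above (the delivery-ratio threshold and the `decoded → delivered` delta) can be sketched in Python. This is a minimal illustration and not the harness's own code: the function names and the counter keys `decoded` and `delivered` are assumptions chosen to mirror the pipeline stages, and the actual `/diagnostics/gossip` response shape may differ.

```python
def meets_delivery_ratio(delivered: int, messages: int, min_ratio: float = 1.0) -> bool:
    """Return True when delivered >= messages * min_ratio."""
    return delivered >= messages * min_ratio


def dropped_between_stages(counters: dict) -> int:
    """Messages that were decoded but never delivered to the subscriber.

    The keys "decoded" and "delivered" are hypothetical names for this sketch.
    """
    return counters["decoded"] - counters["delivered"]


def check_subscriber(counters: dict, messages: int, min_ratio: float = 1.0) -> str:
    """Apply the fail-fast delta check first, then the ratio threshold."""
    drops = dropped_between_stages(counters)
    if drops != 0:
        return f"FAIL: {drops} message(s) lost between decoded and delivered"
    if not meets_delivery_ratio(counters["delivered"], messages, min_ratio):
        return "FAIL: delivery ratio below threshold"
    return "PASS"
```

With the default ratio of 1.0, a subscriber passes only when it delivered every published message and lost nothing between the two stages.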

```bash
MESSAGES=500 SETTLE_SECS=15 PUBLISH_DELAY_MS=20 \
  bash tests/e2e_stress_gossip.sh
```

Related load-isolation harnesses in the same family:

- `tests/e2e_hunt12c_pubsub_load_isolation.sh` — pubsub under load
- `tests/e2e_hunt12e_release_manifest_storm.sh` — release-manifest flood
- `tests/e2e_slow_consumer.sh` — back-pressure handling
- `tests/e2e_soak_3node.sh` — long-running 3-node soak
- `tests/leak_hunt_idle.sh` / `tests/leak_hunt_publisher.sh` — memory leak hunts

---

## 5. GUI Parity — Chrome / Playwright

**Path:** `tests/e2e_gui_chrome.mjs` (driver) + `tests/e2e_gui_chrome.sh`
(wrapper)

Drives `src/gui/x0x-gui.html` via real Chrome and asserts every capability
in [`docs/parity-matrix.md`](docs/parity-matrix.md) round-trips against the
live `x0xd` daemon — same-origin via the daemon's `/gui` route.

Captures rich proof artefacts:

| Artefact | Purpose |
|----------|---------|
| `chrome-gui.har` | Full network HAR |
| `chrome-gui.console.jsonl` | Console log stream |
| `chrome-gui.screenshot.png` | Final-state screenshot |
| `gui-parity-report.json` | Per-capability pass/fail matrix |

Recent runs (e.g. `proofs/chrome-20260421-v0182/`) verify 13/13 GUI
capabilities including live pubsub round-trip, named-group invite/join,
KV CRUD, presence FOAF, and self-upgrade.

```bash
# Prereq (one-off)
npx playwright install chromium

# Run (daemon must be up on http://127.0.0.1:12700)
node tests/e2e_gui_chrome.mjs --proof-dir proofs/chrome-$(date +%s)
```

A complementary fast smoke variant lives in `tests/gui_smoke.rs` and
`tests/gui_named_group_parity.rs` (pure Rust, runs under nextest).

---

## 6. Communitas Dioxus Parity

**Top-level harness:** `tests/e2e_communitas_dioxus.sh` (in this repo)
**Detailed harness:** `../communitas/communitas-dioxus/tests/e2e/` +
`../communitas/communitas-dioxus/tests/e2e.sh`

The Dioxus desktop app consumes `communitas-x0x-client` directly. The
e2e harness drives it with `COMMUNITAS_TEST_MODE=1` and exercises the
golden paths via the app's built-in JSON IPC test hooks, asserting each
capability round-trips against a live `x0xd` daemon.

Per-feature E2E test modules in `communitas-dioxus/tests/e2e/`:

- `identity.rs` — agent ID / card, import, export
- `connectivity.rs` — connect, probe, health snapshot, peer lifecycle
- `groups.rs` — create, invite, join, policy, leave
- `kv_store.rs` — CRUD, access policies
- `presence.rs` — online, FOAF, find, status, SSE
- `trust_contacts.rs` — add / block / trust + machine pinning
- `upgrade.rs` — self-update apply

```bash
# From x0x repo root (daemon must be running on 12700)
bash tests/e2e_communitas_dioxus.sh                # quick smoke

# Full Dioxus parity sweep with proof bundle
cd ../communitas/communitas-dioxus
bash tests/e2e.sh                                  # writes proofs/dioxus-parity-YYYYMMDD/
```

---

## 7. Communitas Apple Parity — XCUITest

**Path:** `../communitas/communitas-apple/Tests/CommunitasUITests/CommunitasGoldenPathsUITests.swift`

UI-level golden-path tests that drive the full macOS app via
`XCUIApplication` and verify every capability in the parity matrix is
reachable from the Apple surface. Intentionally narrow but real — each
test walks one end-to-end flow and asserts on observable UI state, not
private APIs.

**16 golden paths** at v0.19.x:

1. App launches and shows identity
2. Direct-message composer surfaces send result
3. Publish + subscribe topic
4. Create + join named group
5. KV store round-trip
6. Identity export surface reachable
7. Connect-agent surface reachable
8. Discover-agents list present
9. Four-word bootstrap input present
10. Live feed reachable
11. File-transfer send button present
12. Group policy surface reachable
13. Group discover surface reachable
14. Presence FOAF button present
15. Presence status surface reachable
16. Presence SSE toast wiring

```bash
# Prereq: x0xd running on 127.0.0.1:12700, app signed (or ad-hoc) so
# XCUITest can launch it.
cd ../communitas/communitas-apple
xcodebuild \
  -scheme Communitas \
  -destination 'platform=macOS' \
  -only-testing:CommunitasUITests \
  test
```

CI machines without a macOS runner can set `XCUITEST_SKIP=1` to fast-pass.

A complementary live-daemon Swift unit-test layer lives in
`Tests/X0xClientTests/` with `DaemonFixture` (`X0X_LIVE_TESTS=1 swift test`)
covering identity / trust / KV wire-shape decoding.

---

## 8. Multi-Region VPS Test — `e2e_vps.sh`

**Path:** `tests/e2e_vps.sh`
**Scope:** 6 production bootstrap nodes, all-pairs matrix.

| Node | IP | Location | Provider | saorsa host |
|------|-------------|----------|----------|--------|
| NYC | 142.93.199.50 | New York, US | DigitalOcean | saorsa-2 |
| SFO | 147.182.234.192 | San Francisco, US | DigitalOcean | saorsa-3 |
| Helsinki | 65.21.157.229 | Helsinki, FI | Hetzner | saorsa-6 |
| Nuremberg | 116.203.101.172 | Nuremberg, DE | Hetzner | saorsa-7 |
| Singapore | 152.42.210.67 | Singapore, SG | DigitalOcean | saorsa-8 |
| Sydney | 170.64.176.102 | Sydney, AU | DigitalOcean | saorsa-9 |

What it asserts (~102 assertions):

- Health, identity, mesh state on all 6 nodes
- All-pairs direct messaging matrix (**30 directed pairs**)
- Three independent surface proofs per pair: REST API, CLI, GUI (WebSocket)
- MLS group encryption across continents
- Named groups, KV stores, task lists, file transfer
- Presence (FOAF, online, find, status)
- Contacts & trust lifecycle
- Constitution serving, self-upgrade, WebSocket session lifecycle

Every assertion either echoes actual API data or verifies a round-trip with
a unique `PROOF_TOKEN` — no hallucinated test results.
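
The figure of 30 directed pairs follows from 6 nodes, each messaging the 5 others. A short Python sketch of that enumeration, using the node names from the table above (the helper name `directed_pairs` is illustrative and not taken from the harness):

```python
from itertools import permutations

NODES = ["NYC", "SFO", "Helsinki", "Nuremberg", "Singapore", "Sydney"]


def directed_pairs(nodes):
    """Every ordered (source, destination) pair with source != destination."""
    return list(permutations(nodes, 2))


pairs = directed_pairs(NODES)
print(len(pairs))  # 30, i.e. 6 * 5
```

Because the pairs are ordered, `("NYC", "Sydney")` and `("Sydney", "NYC")` are tested separately, so each direction of a link gets its own assertion.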

```bash
# 1. Cross-compile + deploy + collect tokens (writes tests/.vps-tokens.env)
bash tests/e2e_deploy.sh                           # ~5 min

# 2. Run multi-region matrix (SSH-per-call; legacy harness)
bash tests/e2e_vps.sh                              # ~4 min, SSH-bound
```

### VPS Port Configuration

| Port | Protocol | Purpose | Binding |
|------|----------|---------|---------|
| **5483** | UDP/QUIC | Transport (gossip network) | `[::]:5483` or `0.0.0.0:5483` |
| **12600** | TCP/HTTP | REST API on VPS nodes | `127.0.0.1:12600` (`/etc/x0x/config.toml`) |
| **12700** | TCP/HTTP | REST API local-dev default | `127.0.0.1:12700` |

API tokens live at `/root/.local/share/x0x/api-token` on the VPS nodes;
`e2e_deploy.sh` collects them into `tests/.vps-tokens.env`.

### SSH Notes for macOS

Sequential multi-host SSH on macOS needs
`-o ControlMaster=no -o ControlPath=none -o BatchMode=yes` to avoid
multiplexing hangs. The harness already passes these flags. Even with
those flags, the legacy `e2e_vps.sh` issues 60+ SSH+curl pairs in tight
loops — Sydney/Singapore have ~4 s SSH RTT from a US/EU laptop, so the
test is dominated by harness startup cost rather than daemon latency.
Use the mesh harness in §7b for clean cross-region results.

### Why send/receive failures in `e2e_vps.sh` are usually harness noise

If a run reports `{"error":"curl_failed"}` on Singapore- or Sydney-targeted
calls, the failure happened at the SSH/curl layer **before** the daemon
ever saw the request. Confirm with a manual probe:

```bash
time ssh -o ControlMaster=no -o ControlPath=none -o BatchMode=yes \
  root@<singapore_ip> "curl -sf http://127.0.0.1:12600/health"
```

A 4 s+ wall-clock here matches the failure pattern. Switch to
`e2e_vps_mesh.py` (§7b) to remove SSH from the per-assertion path.

---

## 7b. Mesh-Driven VPS Test — `e2e_vps_mesh.py` *(recommended)*

**Path:** `tests/e2e_vps_mesh.py` (orchestrator) + `tests/runners/x0x_test_runner.py`
(per-node service) + `tests/runners/x0x-test-runner.service` (systemd unit)

**Scope:** same all-pairs DM matrix as `e2e_vps.sh`, but drives every
remote action through x0x's own messaging (direct DMs plus one pubsub
discover announcement) instead of through SSH.

### Architecture

```
Mac orchestrator ──── 1 SSH tunnel ───► NYC daemon ──── QUIC mesh ────► all 6 nodes
       │                                    │
       │ /publish        x0x.test.control.v1│
       │ /events SSE     x0x.test.results.v1│
       │                                    │
       └── publishes commands ──┐           ├── runner on each node:
                                │           │    • subscribes to control topic
                                │           │    • subscribes to /direct/events
                                │           │    • executes targeted commands
                                │           │    • publishes results
                                ▼           ▼
                               <every result/receipt arrives via the same SSE>
```

The orchestrator opens **one** SSH connection (a port-forward), subscribes
to the results topic, fans out 30 directed-pair `send_dm` commands on the
control topic, and tabulates the responses as they stream back. Every
remote action — including the `/direct/send` call on the source node and
the `/direct/events` SSE on the destination node — happens *inside* the
fleet, with no further SSH involved.

### Protocol — Phase A (direct-DM control plane)

Pubsub is used **once**, for the orchestrator's discover announcement.
Every subsequent command and every result envelope flows as a direct
DM. Three payload prefixes keep the routing stateless:

| Prefix | Direction | Payload |
|---|---|---|
| `x0xtest\|cmd\|<b64-json>` | orchestrator → runner | command envelope `{command_id, target_node, action, anchor_aid, params}` |
| `x0xtest\|res\|<b64-json>` | runner → orchestrator | result envelope `{command_id, request_id, node, kind, outcome, agent_id, machine_id, digest_marker, details, ts_ms}` |
| `x0xtest\|hop\|<rid>\|<digest>\|<anchor_aid>\|<payload>` | runner → runner | actual matrix test traffic; receiver DMs a `res` `received_dm` back to the embedded `anchor_aid` |
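
As a rough illustration of the three prefixes in the table above, the following Python sketch encodes and parses such payloads. It assumes standard (not URL-safe) base64 over compact UTF-8 JSON, and the function names and returned field names are hypothetical; the actual runner in `tests/runners/x0x_test_runner.py` may differ in these details.

```python
import base64
import json

PREFIX = "x0xtest"


def encode_envelope(kind: str, envelope: dict) -> str:
    """Build an 'x0xtest|cmd|<b64-json>' or 'x0xtest|res|<b64-json>' payload."""
    if kind not in ("cmd", "res"):
        raise ValueError(f"unsupported kind: {kind}")
    raw = json.dumps(envelope, separators=(",", ":")).encode("utf-8")
    return f"{PREFIX}|{kind}|{base64.b64encode(raw).decode('ascii')}"


def decode_payload(payload: str):
    """Return (kind, data) for a cmd, res, or hop payload."""
    parts = payload.split("|", 2)
    if len(parts) != 3 or parts[0] != PREFIX:
        raise ValueError("not an x0xtest payload")
    kind, rest = parts[1], parts[2]
    if kind in ("cmd", "res"):
        return kind, json.loads(base64.b64decode(rest))
    if kind == "hop":
        # Split at most 3 times so a '|' inside the user payload survives.
        fields = rest.split("|", 3)
        if len(fields) != 4:
            raise ValueError("malformed hop payload")
        rid, digest, anchor_aid, body = fields
        return kind, {
            "request_id": rid,
            "digest": digest,
            "anchor_aid": anchor_aid,
            "payload": body,
        }
    raise ValueError(f"unknown kind: {kind}")
```

Routing stays stateless because the receiver needs only the prefix to decide how to handle a DM: a `hop` message carries the `anchor_aid` it should report back to, so no per-run session table is required.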

One-shot pubsub topic:

| Topic | Use |
|---|---|
| `x0x.test.discover.v1` | orchestrator publishes one envelope per harness run carrying the anchor's `agent_id`; runners reply via DM |

Legacy compatibility:

| Topic | Use |
|---|---|
| `x0x.test.control.v1` | runners still subscribed; the orchestrator publishes here when sending a command to its own collocated runner (a self-DM would be refused by the daemon) |
| `x0x.test.results.v1` | the runner falls back to publishing here if a result DM fails irretrievably; the orchestrator subscribes opportunistically |

Actions: `discover`, `send_dm`, `noop_ack`. Result kinds:
`runner_ready`, `discover_reply`, `send_result`, `received_dm`, `ack`,
`error`.

`digest_marker` is a BLAKE3 prefix of the user payload — identical on
the sender and receiver — so the orchestrator can pair every
`send_result` with its `received_dm` independent of timing.
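
The pairing logic can be sketched as below. Two things here are assumptions rather than facts about the harness: BLAKE3 is not in Python's standard library, so the sketch substitutes `hashlib.blake2b` purely to stay self-contained, and the 16-hex-character prefix length is an arbitrary choice. The function names are likewise illustrative.

```python
import hashlib


def digest_marker(payload: bytes, hex_len: int = 16) -> str:
    """Short digest prefix of the user payload.

    Stand-in only: uses BLAKE2b, NOT the BLAKE3 digest the harness uses.
    """
    return hashlib.blake2b(payload).hexdigest()[:hex_len]


def pair_results(results):
    """Group send_result / received_dm envelopes by their digest_marker."""
    paired = {}
    for env in results:
        if env["kind"] not in ("send_result", "received_dm"):
            continue  # ignore acks, errors, and other result kinds
        slot = paired.setdefault(
            env["digest_marker"], {"send_result": None, "received_dm": None}
        )
        slot[env["kind"]] = env
    return paired


def unmatched(paired):
    """Markers that are missing either the send or the receive half."""
    return sorted(
        marker
        for marker, slot in paired.items()
        if slot["send_result"] is None or slot["received_dm"] is None
    )
```

Since both sides compute the marker from the same payload bytes, the envelopes can arrive in any order and still be matched, which is what makes the pairing independent of timing.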

Command, result, and test-hop DMs intentionally **do not** request
`raw_quic_acked` by default — they ride the daemon's default path
(gossip-inbox first, with one retry) so brief raw-QUIC supersedes do not
drop harness control/result traffic. The harness's `send_result` and
`received_dm` envelopes are the application-level delivery proof.

### Deployment

Runners are installed automatically by `e2e_deploy.sh` (after the binary
upload):

```bash
bash tests/e2e_deploy.sh                           # also pushes:
#   /usr/local/bin/x0x-test-runner.py
#   /etc/systemd/system/x0x-test-runner.service
#   /etc/x0x-test-runner.env  (NODE_NAME=…, X0X_API_TOKEN=…)
# and runs:
#   systemctl daemon-reload && systemctl enable --now x0x-test-runner
```

Confirm the runner is healthy on every node:

```bash
for ip in 142.93.199.50 147.182.234.192 65.21.157.229 \
          116.203.101.172 152.42.210.67 170.64.176.102; do
  out=$(ssh -o BatchMode=yes root@$ip \
    "systemctl is-active x0x-test-runner; cat /etc/x0x-test-runner.env" \
    | tr '\n' ' ')
  echo "$ip: $out"
done
# Expect each line to start with "active NODE_NAME=…"
```

### Running the harness

```bash
# Live fleet (any node can be the anchor):
python3 tests/e2e_vps_mesh.py --anchor nyc --discover-secs 30 --settle-secs 60
python3 tests/e2e_vps_mesh.py --anchor sydney --local-port 22601

# Local 3-node smoke (no SSH, no VPS):
bash tests/e2e_local_mesh.sh
```

Reference Phase-A runs (v0.19.17 fleet, fresh deploy):

| Run | Anchor | Sent | Received | Send fails | Receive misses | Wall-clock |
|---|---|---|---|---|---|---|
| 1 | NYC | 29/30 | **30/30** | 1 (real `peer_disconnected`) | 0 | ~70 s |
| 2 | NYC | 29/30 | **30/30** | 1 (real `peer_disconnected`) | 0 | ~70 s |
| 3 | NYC | **30/30** | **30/30** | 0 | 0 | ~28 s |

Phase A's defining property: discover is bulletproof (6/6 every run,
including back-to-back) and **receives are 100%**. The only sends that
ever fail now are those mapped to a real cross-region QUIC supersede;
they surface as the structured `peer_disconnected` error from §6 of
[`docs/design/p2p-timeout-elimination.md`](docs/design/p2p-timeout-elimination.md),
not as harness flakes.

Criterion #1 of
[`docs/design/p2p-timeout-elimination.md`](docs/design/p2p-timeout-elimination.md)
requires "0/30 send fails and 0/30 receive misses on the live 6-VPS fleet,
with no harness timeout changes". All three back-to-back runs meet the
receive half (0/30 misses) with no harness flakes; run 3 also meets the send
half, while runs 1 and 2 each record one real `peer_disconnected` send
failure. The same fleet under `e2e_vps.sh` reported 11/30 send fails +
14/30 receive misses purely from SSH-layer noise.

### When to use which

| Scenario | Use |
|---|---|
| Release proof for cross-region DM correctness | **`e2e_vps_mesh.py`** |
| Proving REST/CLI/GUI surfaces all reach every endpoint on the live fleet | `e2e_vps.sh` (covers contacts, MLS, named groups, KV, presence, file transfer, constitution, upgrade — `e2e_vps_mesh.py` only covers the DM matrix at this writing) |
| `/loop`-able recurring fleet health probe | **`e2e_vps_mesh.py`** (~16 s, single SSH tunnel) |
| Investigating SSH-layer / harness flakes themselves | `e2e_vps.sh` |

### Local smoke

`tests/e2e_local_mesh.sh` boots three local daemons (`alice` / `bob` /
`charlie`), spawns a runner per daemon, and runs the orchestrator with
`--no-tunnel` against `alice`'s API. Useful for proving the protocol
without touching the VPS — the full 6-pair matrix completes in ~1 s.

### Extending the protocol

Add new actions in three places:

1. **`tests/runners/x0x_test_runner.py`** — handle the new `action` value
   in `_dispatch_command()` and publish a result with a new `kind`.
2. **`tests/e2e_vps_mesh.py`** — add a queue / route in `ResultsBus` and a
   collector method in the orchestrator.
3. **`docs/parity-matrix.md`** — link the new mesh assertion to its REST
   row so we can see at a glance which capabilities are mesh-tested.

Keep payloads small: every command/result envelope rides the gossip
fabric and counts toward the same drop-detection counters as application
traffic. Tests that need to push large payloads should use
`e2e_stress_gossip.sh` (§4) instead.

---

## 7c. Group + Contacts Dogfood — `e2e_vps_groups.py` / `e2e_dogfood_groups.sh`

**Path:** `tests/e2e_vps_groups.py` (live fleet) +
`tests/e2e_dogfood_groups.sh` (3-instance local) +
`tests/e2e_dogfood_groups.py` (orchestrator shared by both)

Phase B of the dogfood family. Where Phase A (§7b) tests the DM matrix,
Phase B tests **named groups + contacts** entirely through x0x's own
primitives. Every assertion is the result of:

- a direct DM round-trip (orchestrator → runner → orchestrator), or
- a group-message round-trip (anchor posts in a group, members reply
  in the same group, anchor reads `/groups/:id/messages`)

### Scenarios

| Scenario | Assertions per runner |
|---|---|
| Contacts lifecycle | add → list-contains → Trusted → Blocked → remove → list-no-longer-contains (4 assertions) |
| Group create / invite | anchor creates `public_open` group, mints one one-time `x0x://invite/...` link per joiner |
| Group join | each runner joins via its own invite (1/runner) |
| Local roster | each member's own `/groups/:id/members` shows themselves (1/runner) |
| Owner roster convergence | anchor's `/groups/:id/members` includes every joined runner before replies are sent |
| Group send | anchor posts kickoff, each runner posts reply (1+N) |
| Local/owner message cache | each member sees their own body; anchor sees every runner reply |
| Group leave | leaver's `/groups` no longer lists the group (1) |

With 6 fleet runners this adds up to **50+ blocking assertions per run**;
the exact count scales with fleet size.

### Cross-member convergence — hard gate

The owner-side convergence check is now blocking. Joiners publish a signed
`MemberJoined` request, the original inviter consumes the one-time invite and
publishes an authority-signed `MemberAdded` commit, and the harness waits for
the anchor roster to converge before replies are sent. The anchor must then see
each member's reply in `/groups/:id/messages`.
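
The "wait for the anchor roster to converge" step amounts to polling with a deadline. A minimal Python sketch, assuming a caller-supplied `fetch_members` callable (hypothetical) that would wrap a `GET /groups/:id/members` request and return the current list of member agent IDs:

```python
import time


def wait_for_roster(fetch_members, expected, timeout_secs=30.0, poll_secs=0.5):
    """Poll until every expected agent ID is in the roster, or time out.

    Returns the set of IDs still missing (empty on success).
    fetch_members is a hypothetical zero-argument callable for this sketch.
    """
    deadline = time.monotonic() + timeout_secs
    expected = set(expected)
    while True:
        missing = expected - set(fetch_members())
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll_secs)
```

A blocking gate would then fail the run whenever the returned set is non-empty, and only let members send their replies once it is empty.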

### Running

```bash
# Local 3-instance smoke (alice + bob + charlie)
bash tests/e2e_dogfood_groups.sh                   # ~5 s

# Live 6-VPS fleet (after e2e_deploy.sh has installed the runner)
python3 tests/e2e_vps_groups.py --anchor nyc --discover-secs 45
```

### Resilience

Release mode is strict: every expected runner must be discovered and join.
For operational resilience drills, pass `--allow-skips` to validate the
reachable subset while logging skipped nodes distinctly in the JSON report.
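
The difference between strict release mode and `--allow-skips` can be sketched as a small planning step. The function name and return shape below are illustrative assumptions, not the orchestrator's actual interface:

```python
def plan_run(expected, discovered, allow_skips=False):
    """Split expected runners into participants and skipped nodes.

    In strict mode (allow_skips=False) any undiscovered runner is an error.
    """
    found = set(discovered)
    participants = [node for node in expected if node in found]
    skipped = [node for node in expected if node not in found]
    if skipped and not allow_skips:
        raise RuntimeError(
            f"strict mode: undiscovered runners: {', '.join(skipped)}"
        )
    return participants, skipped
```

Keeping `skipped` as a separate list is what lets a report distinguish a node that was never reached from one that joined and then failed an assertion.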

---

## 7d. Deploy + Mesh Verification — `e2e_deploy.sh --mesh-verify`

**Path:** `tests/e2e_deploy.sh` (extended with the `--mesh-verify` flag
or `MESH_VERIFY=1` env)

Phase C of the dogfood family. After cross-compiling, uploading the
new `x0xd` binary, restarting the service, and running the existing
24 SSH+curl post-deploy checks, the script optionally fans out into
**both** mesh harnesses sharing a single SSH tunnel:

1. `e2e_vps_mesh.py` — Phase-A 30-pair DM matrix
2. `e2e_vps_groups.py` — Phase-B groups + contacts dogfood

The mesh-verify exit code is added to the deploy fail count, so a
deploy that succeeded at the SSH layer but produces matrix failures
(real cross-region churn) flips the overall result to non-zero.

```bash
# Deploy + integrated mesh verification
bash tests/e2e_deploy.sh --mesh-verify

# Or with a different anchor
MESH_ANCHOR=sydney bash tests/e2e_deploy.sh --mesh-verify

# Skip mesh-verify (default; legacy SSH-only verification)
bash tests/e2e_deploy.sh
```

### What this gives you

- Reduces the deploy verification surface from `4 metrics × 6 nodes = 24
  SSH+curl pairs` to **one** SSH tunnel + protocol DMs
- Turns deploy verification into a real cross-protocol round-trip — DMs,
  named-group create/invite/join/post, contacts CRUD — exercised on the
  freshly-deployed binary
- Surfaces real cross-region issues (e.g. a Helsinki↔Sydney supersede
  burst at deploy time) as the mesh-verify failure rather than as silent
  drift

### What it doesn't yet cover

The binary push itself still needs SSH (cold-start). True
gossip-coordinated rolling deploy is documented in
[`docs/design/x0x-self-update-deploy.md`](docs/design/x0x-self-update-deploy.md)
as a deferred follow-up — it requires daemon-side work (test-mode
trust-key support + an `x0x upgrade publish` CLI verb).

---

## 7e. Fast Pre-Commit Smoke — `e2e_dogfood_local.sh`

**Path:** `tests/e2e_dogfood_local.sh` + `tests/e2e_dogfood_local.py`

Phase D of the dogfood family. This is the fastest end-to-end protocol
test x0x has: it boots **two** local daemons (alice + bob), starts one
runner on bob, and drives every assertion as a DM via the Phase-A
protocol. It targets a ~5 s wall-clock budget so it can run on every
commit without slowing the dev loop.

### Coverage in 19 assertions

- Identity: anchor `/agent` returns 64-hex agent_id
- Contacts: add → list → Trusted → Blocked → remove → list (7 assertions)
- DM round-trip: hop DM `x0xtest|hop|...` from anchor → bob's runner
  echoes `received_dm` back via DM with `digest_marker` preserved
  (2 assertions)
- Named group: create + invite + join + each member posts + each
  member sees own message in cache + leave + list-no-longer-lists
  (10 assertions)
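
As an illustration of the identity assertion above, the check that `/agent` returns a 64-hex `agent_id` can be sketched as follows. This is a minimal sketch, not the harness's actual code: the function name `is_valid_agent_id` is hypothetical, and accepting both lower- and upper-case hex is an assumption, since the casing is not stated here.

```python
import re

# Assumption: the identity assertion only requires "exactly 64 hex characters".
# The casing emitted by the daemon is not stated, so both cases are accepted.
_AGENT_ID_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_agent_id(value: object) -> bool:
    """Return True if value is a string of exactly 64 hexadecimal characters."""
    return isinstance(value, str) and _AGENT_ID_RE.fullmatch(value) is not None
```

`fullmatch` is used rather than `match` so that a trailing newline or extra characters after the 64th digit are rejected.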

### Running

```bash
# Build + run (pre-commit: cargo build --release && tests/e2e_dogfood_local.sh)
cargo build --release --bin x0xd
bash tests/e2e_dogfood_local.sh                    # ~5 s
```

### Why "Phase D" specifically

The legacy local smoke (`e2e_comprehensive.sh`, §2) takes ~2 minutes
because it walks **every** REST endpoint over curl. Phase D takes ~5 s
because it walks the **protocol** end-to-end with structured DMs and
group operations — the same coverage class real apps exercise. It's
the canonical "did I break the protocol" first-line test.

---

## 9. Live Network Test — `e2e_live_network.sh`

**Path:** `tests/e2e_live_network.sh`
**Scope:** Local node joins the real bootstrap mesh and exercises
bidirectional flows with VPS members (~66 assertions).

Covers:

- Direct messaging local ↔ VPS in both directions
- Pub/sub across the live mesh
- MLS groups with VPS members
- Named-group invites across the network
- Presence discovery from local through VPS

```bash
bash tests/e2e_live_network.sh                     # ~3 min (needs VPS up)
```

---

## 10. LAN Test — `e2e_lan.sh`

**Path:** `tests/e2e_lan.sh`
**Scope:** Two M3 Ultra Mac Studios with RDMA link, used for LAN /
mDNS / cross-host parity testing under realistic-but-controlled conditions.

```bash
bash tests/e2e_lan.sh                              # requires Mac Studio fleet
```

---

## 11. Master Orchestrator — `e2e_proof_runner.sh`

Single-command release proof. Each phase is opt-out-able; `--all`
runs the full battery and produces one machine-readable
`proofs/<timestamp>/proof-report.json` rolling up per-phase status.

```bash
# Full release proof (Mac with VPS + Studios access)
bash tests/e2e_proof_runner.sh --all

# Quick local-only sweep
bash tests/e2e_proof_runner.sh \
  --rust-tests --comprehensive --stress --chrome
```
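
Because the report is machine-readable, a follow-up script can gate on it. The sketch below lists failed phases under an assumed layout; the real schema of `proof-report.json` is not documented in this section, so the `phases` and `status` keys and both function names are hypothetical and should be adjusted to the actual file.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a proof report from disk and parse it as JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_phases(report: dict) -> list[str]:
    """Return the sorted names of phases whose status is "fail".

    Assumed (hypothetical) layout:
        {"phases": {"<phase-name>": {"status": "pass" | "fail" | "skipped"}}}
    """
    phases = report.get("phases", {})
    return sorted(
        name for name, info in phases.items() if info.get("status") == "fail"
    )
```

A wrapper could exit non-zero whenever `failed_phases(load_report(path))` is non-empty.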

Phases:

| Flag | Phase |
|------|-------|
| `--rust-tests` | `cargo nextest` workspace |
| `--comprehensive` | `e2e_comprehensive.sh` |
| `--dogfood-local` | `e2e_dogfood_local.sh` (~5 s, §7e) — pre-commit smoke |
| `--dogfood-groups` | `e2e_dogfood_groups.sh` (3-instance, §7c) |
| `--stress` | `e2e_stress_gossip.sh` |
| `--chrome` | `e2e_gui_chrome.mjs` |
| `--dioxus` | `e2e_communitas_dioxus.sh` |
| `--xcuitest` | `xcodebuild ... CommunitasUITests` (macOS only) |
| `--vps` | `e2e_vps.sh` (legacy SSH-per-call) |
| `--vps-mesh` | `e2e_vps_mesh.py` (mesh-relay, §7b — **recommended**) |
| `--vps-groups` | `e2e_vps_groups.py` (mesh groups + contacts, §7c) |
| `--lan` | `e2e_lan.sh` |
| `--all` | everything above |

> VPS phases require deployed runners and `tests/.vps-tokens.env` (or
> `X0X_TOKENS_FILE`). `e2e_vps_groups.py` is strict by default; pass
> `--allow-skips` only for resilience drills where validating a reachable
> subset is intentional.
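
The token-file lookup described in the note can be sketched as below. That `X0X_TOKENS_FILE` takes precedence over the default path is an assumption about how the harness resolves the two, and `resolve_tokens_file` is a hypothetical helper name.

```python
import os
from pathlib import Path

# Default location named in the note above.
DEFAULT_TOKENS_FILE = Path("tests/.vps-tokens.env")


def resolve_tokens_file(env=None) -> Path:
    """Pick the tokens file: X0X_TOKENS_FILE if set and non-empty, else the default.

    Assumption: the environment variable overrides the default path.
    """
    env = os.environ if env is None else env
    override = env.get("X0X_TOKENS_FILE")
    return Path(override) if override else DEFAULT_TOKENS_FILE
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real environment.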

---

## Health Checks (Quick Status)

```bash
# Quick VPS health
bash .deployment/health-check.sh                   # basic
bash .deployment/health-check.sh --extended        # with peer counts
```

---

## Currently Implemented Capabilities (Tested)

All capabilities below have round-trip coverage in the matrix; see
[`docs/parity-matrix.md`](docs/parity-matrix.md) for per-surface status.

**Network layer**
- QUIC transport (ant-quic 0.27.3 / 0.27.x, ML-DSA-65 / ML-KEM-768)
- ant-quic native first-party LAN discovery + UPnP
- NAT traversal via QUIC extension frames (`draft-seemann-quic-nat-traversal-02`),
  PUNCH_ME_NOW peer-ID hole-punching through coordinator
- MASQUE relay (RFC 9484)
- Address discovery (QUIC extension frames)
- Connection-supersede + lifecycle bus (`/peers/events`)

**Identity**
- MachineID (machine-bound; equals ant-quic PeerId)
- AgentID (portable, importable)
- UserID (optional, opt-in human identity)
- AgentCertificate binding agent ↔ user
- 4-word speakable identities (`four-word-networking`)
- `GET /introduction` with trust-gated service visibility

**Trust & contacts**
- ContactStore with `TrustLevel` and `IdentityType`
- TrustEvaluator (5 decision paths including Pinned)
- Machine pinning enforcement on every announcement

**Bootstrap**
- 6 hardcoded global nodes (port 5483)
- 3-round retry with exponential backoff
- Bootstrap cache enrichment from connections + presence beacons
- Quality-scored cache persistence
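
For readers unfamiliar with the pattern, a three-round retry with exponential backoff looks roughly like the sketch below. It is illustrative only and is not x0x's bootstrap code: the one-second base delay, the doubling factor, and the `retry_with_backoff` name are all assumptions.

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], bool],
    rounds: int = 3,
    base_seconds: float = 1.0,  # assumed base delay, not taken from x0x
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() up to `rounds` times, doubling the wait between rounds.

    Returns True as soon as an attempt succeeds, False if every round fails.
    No sleep happens after the final failed round.
    """
    for round_index in range(rounds):
        if attempt():
            return True
        if round_index < rounds - 1:
            sleep(base_seconds * 2 ** round_index)
    return False
```
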

**Health & diagnostics**
- `GET /health`, `GET /agent`, `GET /agent/card`
- `GET /diagnostics/connectivity`
- `GET /diagnostics/gossip` (drop-detection counters at every pipeline stage)
- `GET /diagnostics/dm` (DM send/receive counters + per-peer RTT / path / lag state; added in this release)
- `/peers/events` SSE — connection lifecycle bus (Established / Replaced / Closing / Closed / ReaderExited)
- `dm.trace` correlation log (sender + receiver lines share a BLAKE3 `digest` field)
- 60-second NodeStatus journal snapshots

**Gossip**
- Pub/sub via epidemic broadcast
- CRDT task lists (OR-Set + LWW + RGA)
- CRDT KV stores with access control
- Presence beacons + FOAF discovery (Phi-Accrual lite, trust-scoped)
- Anti-entropy sync

**Encrypted groups**
- MLS group create / add / remove / re-add
- ChaCha20-Poly1305 encrypt / decrypt
- Welcome messages for new members

**Named groups**
- Create / invite / join / leave / rejoin
- Display names
- Policy (roles, bans)
- DHT-free discovery (social, tag shards, presence-social browsing)

**File transfer**
- Send / accept / reject offers
- Progress reporting

**Self-update**
- ML-DSA-65-signed release manifests
- Symmetric gossip propagation on `x0x/releases` topic
- GitHub fallback poll
- Atomic binary replacement with rollback
- Staged deterministic rollout

---

## Future Test Areas

These are **planned**, not yet wired into the proof runner:

- **Performance benchmarks** — message throughput, cross-continent latency,
  CRDT convergence time, memory under load
- **Stress amplification** — 1000s of concurrent tasks, 100s of agents
- **Chaos engineering** — random node failures, latency injection, packet
  loss, clock skew
- **Security testing** — explicit ML-DSA forgery / ML-KEM tamper /
  replay / Sybil suites (currently relies on `cargo audit` + crypto unit
  tests)

---

## Troubleshooting

### Service not running
```bash
ssh root@<IP> 'systemctl status x0xd'
ssh root@<IP> 'journalctl -u x0xd -n 50'
```

### Health endpoint unreachable
```bash
ssh root@<IP> 'curl http://127.0.0.1:12600/health'
ssh root@<IP> 'ss -tlpn | grep 12600'
```

### QUIC port not bound
```bash
ssh root@<IP> 'ss -ulpn | grep 5483'
ssh root@<IP> 'journalctl -u x0xd | grep "Bind address"'
```

### No peer connections
```bash
ssh root@<IP> 'journalctl -u x0xd --since "10 minutes ago" | grep -i connect'
```

### Drop detection
If `e2e_stress_gossip.sh` reports drops, query the live counter directly:

```bash
curl -s -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:12700/diagnostics/gossip | jq .
```

The `decode_to_delivery_drops` field localises drops to the
network-recv → subscriber-channel hop. Per-pid logs are produced when
`X0X_LOG_DIR` is set.
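
When scripting this check, the counter can be pulled out of the JSON response without relying on its exact nesting. The sketch below searches the decoded payload recursively; `find_counter` is a hypothetical helper, and the response layout is deliberately not assumed.

```python
def find_counter(payload, key):
    """Depth-first search for `key` in nested dicts/lists decoded from JSON.

    Returns the first value found, or None if the key is absent.
    """
    if isinstance(payload, dict):
        if key in payload:
            return payload[key]
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_counter(child, key)
        if found is not None:
            return found
    return None
```

For example, `find_counter(response, "decode_to_delivery_drops")` on the decoded `/diagnostics/gossip` body yields the drop count wherever the field sits in the document.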

For DM-specific issues (matrix-receive misses, unexplained timeouts) query
`/diagnostics/dm` instead — it exposes per-peer counters
(`outgoing_send_total`, `outgoing_send_failed`, `subscriber_channel_lagged`,
`subscriber_channel_closed`) plus per-peer state (`avg_rtt_ms`,
`last_send_ms_ago`, `preferred_path`):

```bash
curl -s -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:12700/diagnostics/dm | jq .
# Or via CLI:
x0x diagnostics dm
```
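
To triage the output quickly, the per-peer counters can be scanned for non-zero failure or lag values. The sketch below assumes the per-peer section has already been extracted as a mapping from peer ID to a dictionary of the counters named above; that shape and the `peers_needing_attention` name are assumptions, not the documented response format.

```python
def peers_needing_attention(peers: dict) -> dict:
    """Map each peer ID to the reasons it looks unhealthy; healthy peers are omitted.

    Assumed input shape (hypothetical): {"<peer-id>": {"outgoing_send_failed": int, ...}}
    Missing counters are treated as zero.
    """
    checks = [
        ("outgoing_send_failed", "send failures"),
        ("subscriber_channel_lagged", "subscriber lag"),
        ("subscriber_channel_closed", "subscriber closed"),
    ]
    flagged = {}
    for peer_id, stats in peers.items():
        reasons = [label for field, label in checks if stats.get(field, 0) > 0]
        if reasons:
            flagged[peer_id] = reasons
    return flagged
```
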

### Mesh harness troubleshooting

If `e2e_vps_mesh.py` reports `discover missing: [...]`, the runner is not
publishing on the results topic. Check, in order:

```bash
# 1. Is the runner alive?
ssh root@<node_ip> 'systemctl is-active x0x-test-runner'

# 2. Is its config pointing at a readable token?
ssh root@<node_ip> 'cat /etc/x0x-test-runner.env'

# 3. Has the runner subscribed to the control topic?
ssh root@<node_ip> 'journalctl -u x0x-test-runner -n 30 --no-pager'
# Expect: "subscribed to x0x.test.control.v1"

# 4. Is gossip flowing?
curl -s -H "Authorization: Bearer $TOKEN" \
  http://127.0.0.1:12600/diagnostics/gossip | jq .stats
```

If discovery works but `send_dm` results don't return, look at
`/diagnostics/dm` on the *sender* side and the receiver's `dm.trace`
INFO log lines (search by `digest_marker` from the orchestrator output to
correlate sender ↔ receiver).
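
The correlation step can be scripted once both logs are collected. The sketch below filters each side for `dm.trace` lines that contain the marker; it relies only on substring matching because the exact log line format is not specified here, and `correlate_dm_trace` is a hypothetical helper name.

```python
def correlate_dm_trace(sender_lines, receiver_lines, marker: str) -> dict:
    """Collect dm.trace lines mentioning `marker` from sender and receiver logs.

    An empty "receiver" list alongside a non-empty "sender" list suggests the
    message left the sender but was never traced on the receiving side.
    """

    def matching(lines):
        return [
            line.rstrip("\n")
            for line in lines
            if "dm.trace" in line and marker in line
        ]

    return {"sender": matching(sender_lines), "receiver": matching(receiver_lines)}
```

The inputs can be any iterables of strings, for example open log files or the split output of `journalctl`.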

---

## CI Integration

`.github/workflows/`:

- **ci.yml** — fmt, clippy, nextest, doc (symlinks `ant-quic` and
  `saorsa-gossip` from `.deps/`)
- **security.yml** — `cargo audit`
- **release.yml** — multi-platform builds (7 targets), macOS code
  signing, ML-DSA-65 manifest signing, `crates.io` publish
- **build.yml** — PR validation
- **sign-skill.yml** — GPG-signs `SKILL.md`

The XCUITest target imports cleanly on Linux runners (`XCUITEST_SKIP=1`)
and only actually executes on macOS.

---

## Contributing

To add new tests:

1. Pick the right surface — REST/CLI parity goes in
   `tests/api_coverage.rs` or `tests/parity_cli.rs`; GUI in
   `tests/e2e_gui_chrome.mjs`; Dioxus in
   `../communitas/communitas-dioxus/tests/e2e/`; Apple in
   `CommunitasGoldenPathsUITests.swift`; cross-region matrix in
   `tests/e2e_vps_mesh.py` (preferred) or `tests/e2e_vps.sh` (legacy).
2. Update the corresponding row in [`docs/parity-matrix.md`](docs/parity-matrix.md)
   from 🟡 / ❌ to ✅ once the test is green.
3. Wire the test into `e2e_proof_runner.sh` if it should be part of the
   release proof.
4. Document expected behaviour in the test header.
5. Run locally before pushing — every CI green light corresponds to a
   `proofs/<timestamp>/` artefact bundle.

Mesh-harness specific:

6. New protocol commands go through the three-place edit in §7b
   ("Extending the protocol"). Keep result envelopes small.
7. Bumping the runner script means re-running `tests/e2e_deploy.sh`
   (the deploy step pushes both the daemon binary *and* the runner).

---

## Support

- GitHub: https://github.com/saorsa-labs/x0x
- Email: david@saorsalabs.com
- Parity matrix: [`docs/parity-matrix.md`](docs/parity-matrix.md)
- Architecture: [`CLAUDE.md`](CLAUDE.md)