x0x 0.19.46

Agent-to-agent gossip network for AI systems — no winners, no losers, just cooperation
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
# x0x exec — remote command execution over the mesh (Tier 1)

**Status:** Tier-1 implementation in progress — protocol, ACL, daemon routing, REST/CLI, diagnostics, and JSONL audit are wired behind an opt-in ACL.
**Filed:** 2026-05-01
**Owner:** dev team (assigned by lead)
**Trigger:** Replace SSH-per-call in `e2e_vps.sh` with a daemon-native execution path. Generalise the existing `tests/runners/x0x_test_runner.py` control-plane primitive into a first-class daemon feature.
**Companion doc:** [`x0x-terminal.md`](x0x-terminal.md) (Tier 2, deferred)

---

## 1. Goal

Allow a local x0x agent to run a strictly-allowlisted, non-interactive command on a remote x0x daemon, with stdout/stderr/exit-code streamed back over the existing direct-message channel.

**In scope (this doc):** non-interactive command execution. No PTY, no shell interpolation, no signal forwarding from the client beyond cancel.

**Out of scope:** interactive terminal, full shell, file editing, port forwarding. See `x0x-terminal.md` for the Tier 2 design that may or may not follow.

**Why now:** the existing `e2e_vps.sh` is dominated by SSH RTT to Singapore/Sydney (~4 s per call). The mesh-driven `e2e_vps_mesh.py` solved this for DM testing by running through x0x's own pubsub. `x0x exec` is the same trick generalised — drop the SSH dependency from the diagnostic and management paths too.

## 2. Constraints (locked by lead, 2026-05-01)

These are not open for negotiation in v1:

1. **ACL keys on (AgentId, MachineId) pairs.** A stolen agent key on a different machine is rejected at the trust layer before reaching the ACL check.
2. **Strict argv allowlist.** Each ACL entry specifies the exact argv vector the agent may execute. Limited templating (see §6.2). No regex, no shell globs.
3. **No shell interpolation, ever.** The remote daemon calls `tokio::process::Command::new(argv[0]).args(&argv[1..])`. Never `/bin/sh -c`. Never `bash -c`.
4. **Hard caps on output, duration, concurrency.** Cap breaches emit a `Warning` frame to the requester *and* are logged to `/diagnostics/exec` so an operator interrogating the remote machine sees them.
5. **Audit trail in a CRDT TaskList** (in addition to a local log file). Append-only, signed, replicated to the requesting agent.
6. **Client disconnect → SIGTERM** the remote process within 5 s, then SIGKILL.
7. **ACL lives at `/etc/x0x/exec-acl.toml`** and changes require a full daemon restart. No hot-reload.

## 3. Architecture

```text
local x0x CLI
  ↓ HTTP (POST /exec/run)
local x0xd
  ↓ signed/encrypted gossip-DM frames (ML-KEM + ChaCha20-Poly1305, ACK'd)
remote x0xd
  ↓ trust check: verified sender + TrustDecision::Accept
  ↓ ACL check: AgentId + MachineId + argv match
  ↓ tokio::process::Command::new(argv[0]).args(&argv[1..])
remote child process
```

No new transport. No new crypto. Reuses the existing direct-message envelope, which already gives us:

- ML-DSA-65 signature verification of the requester's `AgentId`.
- Trust-evaluated delivery (`TrustEvaluator`).
- ACK'd, in-order delivery on a per-peer channel.
- `/diagnostics/dm` visibility for the underlying transport.

The exec service is a new direct-message *kind* that the existing DM dispatcher routes to a new handler.

### 3.1 Why direct DMs and not a new QUIC stream

For Tier 1, the volume is small (≤16 MB stdout per session, ≤32 concurrent sessions) and latency is dominated by command execution time, not framing overhead. The existing DM path works. If profiling later shows the ACK-per-frame overhead is the bottleneck for streaming stdout, we can split: control frames over DMs, bulk stdout over a side QUIC unidirectional stream (mirroring how file transfer works). Don't pre-optimise.

The Tier 2 interactive path *will* need bidirectional QUIC streams for backpressure correctness — that's the whole point of separating the two tiers.

## 4. Wire protocol

Exec uses a stable payload prefix (`x0x-exec-v1\\0`) carried inside the existing encrypted DM plaintext. The prefix lets `dm_inbox` route exec frames before generic `/direct/events` fan-out without a backwards-incompatible envelope change. The bytes after the prefix are a bincode-encoded `ExecFrame`:

```rust
#[derive(Serialize, Deserialize)]
pub enum ExecFrame {
    /// Client → server: kick off a session.
    Request {
        request_id: Uuid,           // client-allocated, unique per local agent
        argv: Vec<String>,          // tokens, no shell metacharacters
        stdin: Option<Vec<u8>>,     // ≤ max_stdin_bytes (server cap)
        timeout_ms: u32,            // clamped to max_duration_secs
        cwd: Option<String>,        // v1 rejects requester-controlled cwd
    },

    /// Server → client: process spawned, here is its OS pid.
    Started { request_id: Uuid, pid: u32 },

    /// Server → client: stdout/stderr chunk. seq is monotonic per stream.
    Stdout { request_id: Uuid, seq: u32, data: Vec<u8> },
    Stderr { request_id: Uuid, seq: u32, data: Vec<u8> },

    /// Server → client: a soft event the operator should see (cap warning,
    /// truncation, etc). Does not terminate the session.
    Warning { request_id: Uuid, kind: WarningKind, message: String },

    /// Server → client: terminal frame. Always the last frame for a session.
    Exit {
        request_id: Uuid,
        code: Option<i32>,          // None if killed by signal
        signal: Option<i32>,        // Unix signal number, or None
        duration_ms: u64,
        stdout_bytes_total: u64,    // including any truncated bytes
        stderr_bytes_total: u64,
        truncated: bool,            // true if any cap was hit
        denial_reason: Option<DenialReason>,  // Some(_) → request never ran
    },

    /// Client → server: renew the short session lease.
    LeaseRenew { request_id: Uuid },

    /// Client → server: cancel an in-flight session.
    Cancel { request_id: Uuid },
}

#[derive(Serialize, Deserialize)]
pub enum WarningKind {
    StdoutCapHit,                   // bytes-per-stream limit reached
    StderrCapHit,
    DurationApproachingCap,         // emitted at warn_duration_secs
    StdoutApproachingCap,           // emitted at warn_stdout_bytes
}

#[derive(Serialize, Deserialize)]
pub enum DenialReason {
    ExecDisabled,                   // [exec].enabled = false
    AgentMachineNotInAcl,           // (agent_id, machine_id) pair has no entry
    ArgvNotAllowed,                 // argv didn't match any allowlist entry
    StdinTooLarge,
    TimeoutTooLarge,
    CwdNotAllowed,
    ConcurrencyLimitReached,
    ShellMetacharInArgv,            // an argv token contained a forbidden char
}
```

Frame ordering on a single direct channel is preserved by the existing DM path (per-peer mpsc), so `seq` exists only for client-side correlation across a possible reconnect (future-proofing — not needed for v1 but free to include).

## 5. Authorization flow

Server-side, for every received `ExecFrame::Request`:

1. **MachineId check.** The DM was delivered over a QUIC connection; `network.rs` already binds the connection to a verified peer `MachineId`. If the connection's MachineId mismatches the agent's announced MachineId, reject at the DM layer (existing `TrustEvaluator::RejectMachineMismatch` path) — never reaches exec.
2. **AgentId check.** The DM envelope is ML-DSA-65-signed by the sender's agent key; this is verified in the existing DM path. If signature fails → existing path rejects. The exec handler trusts that the AgentId on a delivered DM is authentic.
3. **ACL lookup.** Find the first `[[exec.allow]]` entry matching `(agent_id, machine_id)`. If none → respond with `Exit { denial_reason: Some(AgentMachineNotInAcl), ... }`. Tier 1 returns structured denial reasons on-wire for operator/testability; every denial is also recorded in the remote JSONL audit log and `/diagnostics/exec`.
4. **argv match.** Walk the matched entry's commands, accept the first match (§6.2). On no match → `ArgvNotAllowed`.
5. **Metachar reject.** Even though we don't shell out, every argv token is checked for forbidden characters: `;`, `|`, `&`, `>`, `<`, `` ` ``, `$`, newline, null byte. Catches operator confusion and gives one extra layer if the allowlist is mis-authored. On hit → `ShellMetacharInArgv`.
6. **Caps.** stdin size, timeout, and v1 cwd rejection. On breach → corresponding `DenialReason`.
7. **Concurrency check.** If active sessions for this AgentId ≥ `max_concurrent_per_agent`, or total ≥ `max_concurrent_total`, deny.
8. **Spawn.** `tokio::process::Command` with `kill_on_drop(true)`, `stdin/stdout/stderr` piped, `current_dir(default_cwd)` when configured, environment scrubbed to a minimal allowlist (see §6.3).
9. **Stream.** Read stdout/stderr concurrently, send `Stdout`/`Stderr` frames as data arrives. Watch caps. Emit `Warning` at the warn thresholds.
10. **Terminate.** On exit (or cancel, or duration cap), send `Exit` with full stats. Audit-log the close.

The on-wire denial response is **always** the same shape (an `Exit` frame with `denial_reason: Some(_)`); only the local audit log distinguishes the cases.

## 6. ACL file format

`/etc/x0x/exec-acl.toml` (Linux) / `/usr/local/etc/x0x/exec-acl.toml` (macOS). Override path with `x0xd --exec-acl <path>` (intended for tests).

If file is missing → exec is disabled (`enabled = false` is the implicit default).
If file is present but malformed → daemon **refuses to start**. Fail-closed.

### 6.1 Schema

```toml
[exec]
enabled = false                    # default; must be explicit-true to enable

# Hard caps. Server-side enforced. Requests exceeding any cap are denied
# (for stdin/timeout) or truncated (for stdout/stderr).
max_stdout_bytes        = 16_777_216     # 16 MB
max_stderr_bytes        = 16_777_216     # 16 MB
max_stdin_bytes         = 1_048_576      # 1 MB
max_duration_secs       = 300            # 5 min
max_concurrent_per_agent = 4
max_concurrent_total     = 32

# Warning thresholds. Emit a Warning frame and log to /diagnostics/exec
# when crossed. Process keeps running.
warn_stdout_bytes       = 8_388_608      # 8 MB (50% of cap)
warn_duration_secs      = 60             # 20% of default cap

# Default working directory if a request omits cwd. If unset and request
# omits cwd, the process inherits the daemon's cwd.
default_cwd             = "/var/lib/x0x"

# Audit log path. Always written. Required.
audit_log_path          = "/var/log/x0x/exec.log"

# Optional: CRDT TaskList ID to mirror audit entries to. If set, every
# exec event (Request, Started, Exit, Denial) appends a TaskItem to this
# list, replicated to the requesting agent automatically.
audit_tasklist_id       = "01HZX..."     # ULID of an existing TaskList

# --- ACL entries below ---

[[exec.allow]]
description  = "ops-laptop@admin (David)"   # human label, appears in audit
agent_id     = "abc123...64hex"
machine_id   = "def456...64hex"
# Optional: per-entry overrides for caps. None → use [exec] caps.
# max_duration_secs = 30

# Strict argv allowlist for THIS (agent, machine) pair.
# Each [[exec.allow.commands]] entry is one allowed call shape.
[[exec.allow.commands]]
argv = ["systemctl", "status", "x0xd"]      # exact match only

[[exec.allow.commands]]
argv = ["journalctl", "-u", "x0xd", "-n", "<INT>"]
# <INT> is a hardcoded template token: matches a positive integer
# (regex equivalent: ^[1-9][0-9]{0,5}$, max 999_999). No user-supplied
# regexes. Other tokens: <URL_PATH> = ^/[A-Za-z0-9/_.-]{0,256}$ (no "..").

[[exec.allow.commands]]
argv = ["curl", "-s", "http://127.0.0.1:12600<URL_PATH>"]

[[exec.allow.commands]]
argv = ["cat", "/etc/x0x/config.toml"]

# Multiple entries per agent are fine; first match wins.
[[exec.allow]]
description = "ci-runner@github (read-only)"
agent_id    = "ghi789..."
machine_id  = "jkl012..."

[[exec.allow.commands]]
argv = ["x0x", "diagnostics", "dm"]

[[exec.allow.commands]]
argv = ["x0x", "diagnostics", "gossip"]
```

### 6.2 argv matching algorithm

1. Lengths must match. `argv = ["a", "b"]` does not match a 3-token request.
2. Walk left-to-right. For each `(allow_token, request_token)`:
   - If `allow_token` is a literal string → must equal `request_token` byte-for-byte.
   - If `allow_token` is `<INT>``request_token` must match `^[1-9][0-9]{0,5}$` (1–999_999).
   - If `allow_token` is `<URL_PATH>``request_token` must match the URL_PATH regex above. If `<URL_PATH>` appears as a *suffix* of a literal token (e.g. `"http://127.0.0.1:12600<URL_PATH>"`), the request token must start with the literal prefix and the suffix must match the URL_PATH regex.
3. Only `<INT>` and `<URL_PATH>` exist as templates in v1. Any other `<...>` token in the ACL is a parse error (refuses daemon startup).
4. After argv match, every request token is **independently** checked against the metachar blacklist (§5.5). Defence in depth: even if `<URL_PATH>` regex has a bug, metachars are rejected separately.

This is deliberately restrictive. Operators wanting parameterised behaviour beyond `<INT>` / `<URL_PATH>` should write a wrapper script on the remote and allowlist *that* script, not extend the templating.

### 6.3 Environment scrubbing

The child process inherits a fixed minimal environment:

```text
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOME=<daemon's home>
LANG=C.UTF-8
LC_ALL=C.UTF-8
```

No other env vars from the daemon's environment leak through. The request cannot supply env vars in v1.

## 7. Hard cap behaviour

| Cap | Behaviour |
|---|---|
| `max_stdout_bytes` | Stop forwarding once total reaches cap. Send `Warning { kind: StdoutCapHit }`. Continue counting `stdout_bytes_total` for the final `Exit` frame, but discard data. Process keeps running. |
| `max_stderr_bytes` | Symmetric to stdout. |
| `max_stdin_bytes` | Request denied with `StdinTooLarge` before spawn. |
| `max_duration_secs` | At T=cap-5s, send SIGTERM. At T=cap, send SIGKILL. Send `Exit` with `signal: Some(SIGKILL)`. |
| `max_concurrent_per_agent` | Request denied with `ConcurrencyLimitReached`. |
| `max_concurrent_total` | Same. |
| `warn_stdout_bytes` | At first crossing, emit `Warning { kind: StdoutApproachingCap }`. Increment a counter visible in `/diagnostics/exec`. |
| `warn_duration_secs` | Same shape. |

Cap counters and last-N warning events are exposed at `GET /diagnostics/exec` (§9) so an operator interrogating a remote node sees a steady stream of these even if the requester ignores `Warning` frames.

## 8. Cancellation

Three triggers, one path:

1. **Client sends `ExecFrame::Cancel`** — the local CLI emits this on Ctrl-C, on `x0x exec --cancel <id>`, or on local timeout.
2. **Local x0xd loses its API client** — the HTTP/SSE/WebSocket connection from the CLI to local x0xd drops. Local x0xd auto-emits `Cancel` for every in-flight request from that client.
3. **Lease expiry** — the local daemon sends `LeaseRenew` while the API caller is connected. If renewals stop for the lease window, the remote session is cancelled. This covers gossip-DM delivery paths where no stable direct QUIC lifecycle event is visible.

On cancel, server: SIGTERM the child, wait 5 s, SIGKILL if still alive, send `Exit` (which the requester may not see if the channel is dead; that's fine — the request_id is locally garbage-collected after a 60 s grace period).

`tokio::process::Child::kill_on_drop(true)` is set as a final backstop in case the session task panics.

## 9. Diagnostics

`GET /diagnostics/exec` (mirrors `/diagnostics/dm`, `/diagnostics/gossip`). Bearer-token-protected. Returns:

```json
{
  "enabled": true,
  "active_sessions": 2,
  "active_per_agent": { "abc123...": 2 },
  "totals": {
    "requests_received": 18421,
    "requests_allowed": 18402,
    "requests_denied": 19,
    "denial_breakdown": {
      "agent_machine_not_in_acl": 4,
      "argv_not_allowed": 11,
      "shell_metachar_in_argv": 1,
      "concurrency_limit_reached": 3
    },
    "cap_breaches": {
      "stdout": 2,
      "stderr": 0,
      "duration": 1
    },
    "cap_warnings": {
      "stdout_approaching": 7,
      "duration_approaching": 14
    }
  },
  "recent_warnings": [
    {
      "ts": "2026-05-01T14:22:11Z",
      "kind": "StdoutApproachingCap",
      "agent_id": "abc123...",
      "request_id": "...",
      "argv_summary": "journalctl -u x0xd -n 999999",
      "bytes_at_warn": 8_400_000
    }
  ],
  "acl_summary": {
    "loaded_from": "/etc/x0x/exec-acl.toml",
    "loaded_at": "2026-05-01T09:00:00Z",
    "allow_entry_count": 4,
    "command_entry_count": 17
  }
}
```

The full ACL contents — agent IDs, machine IDs, full argvs — are deliberately *not* in the diagnostics output. They live in the file the operator owns.

CLI: `x0x diagnostics exec`.

## 10. Audit trail

Two layers, both required:

### 10.1 Local file audit

Append-only JSONL at `audit_log_path`. One line per event. Events:

```json
{"ts":"...","event":"request","request_id":"...","agent_id":"...","machine_id":"...","argv":["..."],"matched_acl":"ops-laptop@admin (David)","stdin_bytes":0,"timeout_ms":30000}
{"ts":"...","event":"started","request_id":"...","pid":12345}
{"ts":"...","event":"warning","request_id":"...","kind":"StdoutApproachingCap","bytes":8400000}
{"ts":"...","event":"exit","request_id":"...","code":0,"signal":null,"duration_ms":423,"stdout_bytes":1024,"stderr_bytes":0,"truncated":false}
{"ts":"...","event":"denial","request_id":"...","agent_id":"...","machine_id":"...","argv":["..."],"reason":"ArgvNotAllowed"}
```

File is opened with `O_APPEND`, fsynced on each entry. Operator-rotatable via standard `logrotate`. The daemon does not rotate.

### 10.2 CRDT TaskList audit (v1.1 waiver)

Tier 1 makes the local JSONL file the authoritative audit. The ACL field `audit_tasklist_id` is parsed and retained so deployed configs will not need another schema change, but the CRDT TaskList mirror is explicitly waived from v1 acceptance and deferred to v1.1.

Planned `TaskItem` schema for the v1.1 mirror:

```text
title: "exec [allowed|denied] <argv_summary> ← <agent_short>@<machine_short>"
state: Done (denials never enter Empty/Claimed; they are born Done)
metadata (LWW-Register):
  request_id: ULID
  ts_iso: "..."
  argv: ["..."]
  exit_code: i32 | null
  duration_ms: u64
  stdout_bytes: u64
  stderr_bytes: u64
  truncated: bool
  denial_reason: string | null
  warnings: ["StdoutApproachingCap", ...]
```

When implemented, the TaskList will provide a queryable, replicated, signed audit timeline. If the configured TaskList does not exist or is unreachable, exec will keep working — the local file remains authoritative and the CRDT mirror is best-effort. A future `audit_tasklist_unreachable` counter should show up in `/diagnostics/exec`.

For v1, operators must use `audit_log_path` and `/diagnostics/exec`; `audit_tasklist_id` has no runtime effect beyond being exposed through config parsing.

## 11. CLI

```bash
# Synchronous one-shot. Stdout/stderr stream to local terminal.
# Exit code = remote exit code (or 255 on transport / denial / cap kill).
x0x exec <agent> -- <argv...>

# With timeout (clamped to remote max_duration_secs).
x0x exec <agent> --timeout 30 -- journalctl -u x0xd -n 100

# With stdin from a file.
x0x exec <agent> --stdin-file payload.bin -- some-tool --consume-stdin

# Cancel an in-flight request.
x0x exec <agent> --cancel <request_id>

# List local in-flight requests (this client only).
x0x exec sessions

# Server-side diagnostics on a remote machine.
x0x diagnostics exec                 # local daemon's exec stats
x0x exec <agent> -- x0x diagnostics exec   # remote daemon's exec stats
                                          # (requires that argv to be allowlisted)
```

Agent identifier accepts: full hex `agent_id`, contact short-name (existing contact-store lookup), or VPS host short-name (e.g. `saorsa-7` → resolved via existing contact-name mapping).

## 12. REST API

| Method | Path | CLI | Notes |
|---|---|---|---|
| POST | `/exec/run` | `x0x exec` | Body: `{ agent_id, argv, stdin_b64?, timeout_ms?, cwd? }`. Current implementation blocks and returns the aggregated result `{ code, signal, stdout_b64, stderr_b64, duration_ms, denial_reason, ... }`; SSE streaming can be added later without changing the wire frames. |
| POST | `/exec/cancel` | `x0x exec --cancel` | Body: `{ request_id }`. |
| GET | `/exec/sessions` | `x0x exec sessions` | List sessions originated by this local daemon. |
| GET | `/diagnostics/exec` | `x0x diagnostics exec` | Server-side diagnostics. |

All four are added to `src/api/mod.rs` endpoint registry. `x0x routes` reflects them automatically. API manifest version bumps; total endpoint count rises from 124 to 128.

## 13. Daemon flags & config

```text
x0xd --exec-acl <PATH>     Override default /etc/x0x/exec-acl.toml.
                           Intended for tests; production should use the default.
```

No new config knobs — exec is gated entirely by the ACL file's presence and `[exec].enabled`.

`x0xd --check` is extended to validate the ACL file (parse, schema, agent/machine ID hex format) without starting the daemon.

`x0xd --doctor` reports exec status: enabled/disabled, ACL path, allow-entry count.

## 14. Files to create

```text
src/exec/mod.rs                    # public API: run_remote_exec(...) on Agent
src/exec/protocol.rs               # ExecFrame enum, bincode coding
src/exec/acl.rs                    # TOML schema, parser, argv matcher, fail-closed loader
src/exec/service.rs                # server: dispatcher, spawn, stream, cap enforcement
src/exec/client.rs                 # client: send Request, collect frames, surface to caller
src/exec/audit.rs                  # local JSONL log + optional CRDT TaskList bridge
src/exec/diagnostics.rs            # ExecDiagnostics counter struct + recent_warnings ring buffer
src/api/exec_handlers.rs           # POST /exec/run (sync + SSE), /exec/cancel, /exec/sessions, /diagnostics/exec
src/cli/commands/exec.rs           # x0x exec, x0x exec --cancel, x0x exec sessions
                                   # x0x diagnostics exec
tests/exec_acl_unit.rs             # ACL parsing, argv matching (positive + negative)
tests/exec_caps_unit.rs            # cap-enforcement state machine
tests/exec_integration.rs          # local 2-daemon end-to-end, denial paths, cancel
tests/e2e_exec.sh                  # local 3-daemon: alice→bob allowed, charlie→bob denied
docs/exec.md                       # operator guide: deploying ACL, examples, troubleshooting
```

## 15. Files to modify

```text
src/dm.rs                          # add DmKind::Exec variant; route to exec::service
src/api/mod.rs                     # register 4 new endpoints in EndpointRegistry
src/cli/mod.rs                     # register exec subcommand
src/lib.rs                         # re-export Agent::run_remote_exec
src/bin/x0xd.rs                    # parse --exec-acl flag; load ACL on startup;
                                   # fail-closed if malformed; wire ExecService
docs/design/api-manifest.json      # add 4 endpoints; bump endpoint count
CLAUDE.md                          # mention /etc/x0x/exec-acl.toml + 4 endpoints
TEST_SUITE_GUIDE.md                # document e2e_exec.sh
```

Optional follow-up (separate PR after exec lands):

```text
tests/e2e_vps.sh                   # convert SSH-based curl probes to x0x exec
                                   # to demonstrate the testing-ergonomics win
```

## 16. Test plan

Every PR landing this feature must include all of:

- **Unit, ACL** (`exec_acl_unit.rs`):
  - well-formed file → loads
  - missing required field → daemon refuses to start
  - invalid hex in agent_id/machine_id → refuses to start
  - unknown `<TEMPLATE>` token → refuses to start
  - exact-match argv accepts only exact match
  - `<INT>` accepts `1`, `12345`, rejects `0`, `-1`, `1e5`, `1.0`, ``5`a``
  - `<URL_PATH>` accepts `/health`, `/foo/bar`, rejects `/..`, `/foo bar`, `/foo;ls`
  - shell metachar in any token rejects regardless of allowlist
  - length mismatch rejects

- **Unit, caps** (`exec_caps_unit.rs`):
  - stdout cap → truncates, emits `Warning`, sets `truncated: true`, process unkilled
  - duration cap → SIGTERM sent at T-5s, SIGKILL at T, `Exit { signal: Some(SIGKILL) }`
  - concurrent cap → 5th request from same agent denied with `ConcurrencyLimitReached`
  - warn threshold crossed once → counter increments by 1, even if multiple Stdout chunks straddle it

- **Integration** (`exec_integration.rs`):
  - alice → bob allowed argv: stdout/stderr captured, exit_code matches, audit log line written
  - alice → bob denied argv: `Exit { denial_reason: Some(ArgvNotAllowed) }`, no process spawned
  - alice → charlie denied (no ACL entry): `AgentMachineNotInAcl`
  - bob client disconnect mid-exec: bob's child receives SIGTERM within 5s
  - alice cancel: child SIGTERMed, `Exit` frame received with `signal: Some(SIGTERM)`
  - audit TaskList: configured ID receives a TaskItem on every event; alice (the requester) sees it via existing CRDT sync

- **E2E** (`tests/e2e_exec.sh`, local, no VPS):
  - 3-daemon (alice, bob, charlie). Bob has ACL allowing alice for `["echo", "<INT>"]` and `["printenv", "PATH"]`.
  - `x0x exec bob -- echo 42` → stdout `42`, exit 0
  - `x0x exec bob -- echo hello` → denied (template mismatch)
  - `x0x exec bob -- echo 99 ; ls` → denied (metachar in token, even though we don't shell)
  - `x0x exec bob -- rm -rf /` → denied (not in allowlist)
  - `x0x exec bob -- printenv PATH` → returns `/usr/local/sbin:...` (env scrub verified)
  - charlie tries same allowed argv → denied (`AgentMachineNotInAcl`)
  - `x0x diagnostics exec` on bob shows the right counter increments

- **Build-validator**: 0 warnings, fmt clean, clippy `-D warnings` clean.

- **Test-runner**: full nextest suite remains 100 % passing; new tests add ≥ 30 assertions.

## 17. Acceptance criteria

Tier 1 ships when **all** of these hold:

1. `cargo nextest run --all-features --workspace` green, with the new tests counted.
2. `tests/e2e_exec.sh` green on a local mesh, two consecutive runs.
3. `x0xd --check --exec-acl /tmp/bad.toml` exits non-zero with a specific parse error for each of: missing required field, invalid hex, unknown template token.
4. `x0xd` with no `/etc/x0x/exec-acl.toml` starts cleanly with `enabled=false` reflected in `/diagnostics/exec`.
5. A request that breaches `max_stdout_bytes` is reflected in `/diagnostics/exec` `cap_breaches.stdout` counter on the *remote* node.
6. Killing the local CLI mid-exec causes the remote child process to receive SIGTERM within 5 s (verified by inspecting `ps` on the remote).
7. CRDT TaskList audit mirroring is either implemented or explicitly waived to v1.1. For v1 this document records the waiver and JSONL remains authoritative.
8. `docs/exec.md` exists with a working "deploy this ACL on saorsa-7, run this command from your laptop" example using real but redacted IDs.

## 18. Non-goals (explicit)

- **No PTY, no interactivity.** Use Tier 2 if it ever ships.
- **No environment variables in requests.** Add later if a concrete need appears.
- **No file upload.** Use existing `/files/send`.
- **No port forwarding.** Out of scope; possibly never in scope.
- **No PAM / OS password auth.** The trust boundary is the x0x identity + machine pin, not OS credentials.
- **No hot-reload of the ACL.** Restart the daemon. This is intentional — hot-reload of a security policy is a footgun.
- **No "any" wildcard in argv.** If an operator wants flexible commands, they wrap them in a script and allowlist that.
- **No regex in the ACL.** `<INT>` and `<URL_PATH>` are the only templates. New templates require a code change and a test.
- **No SSH replacement for end users.** This feature is for the testing fleet and the operator. End-user shell access is a separate product question.

## 19. Open questions (defer until after first PR lands)

- Is local-file JSONL audit + optional CRDT TaskList the right shape, or should the CRDT be mandatory? Land file-only first, see how operators use it.
- Should `Exit` frames carry the *first 1 KiB of stdout* even when the channel was healthy, so the audit log has self-contained context without a second roundtrip? Possibly yes. Needs a `first_stdout_preview` field; mark for v1.1.
- Does the CRDT audit TaskList need a separate MLS group per (requester ↔ remote) pair, or does the existing daemon-wide group suffice? Existing group is fine for v1; revisit if multiple operators share a remote.

## 20. Cross-references

- Existing precedent: `tests/runners/x0x_test_runner.py` is a primitive form of this. The systemd service file `tests/runners/x0x-test-runner.service` already lives on every VPS.
- Trust model: `docs/primers/trust.md`, `src/trust.rs::TrustEvaluator`.
- DM pipeline this rides on: `docs/design/dm-over-gossip.md`, `src/dm.rs`, `src/direct.rs`.
- Diagnostics pattern: `docs/diagnostics.md`, `src/api/handlers.rs::diagnostics_*`.
- Tier 2 deferred design: [`x0x-terminal.md`]x0x-terminal.md.