room-sandbox 1.5.0

Dockerized multi-agent sandbox for room
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
# room-sandbox — Design Document

## Overview

`room-sandbox` is a Rust CLI tool (`cargo install room-sandbox`) that sets up isolated, Dockerized multi-agent development environments using `room` and `room-ralph`. Each agent gets its own clone of a target repo, runs in a shared container, and communicates via a local room daemon.

It replaces the ad-hoc shell scripts in `/Volumes/Knox/tools/sandbox/` with a structured, team-friendly tool.

---

## Architecture

### Directory Layout (after `room-sandbox init`)

```
my-sandbox/                    # user creates this, cd's into it (or runs init inside an existing repo)
  sandbox.toml                 # source of truth — committed, shareable with team
  .room-sandbox/               # all generated artifacts — gitignored
    .sandbox-state.json        # last-applied config hashes
    .env                       # secrets (API keys, tokens)
    Dockerfile
    docker-compose.yml
    entrypoint.sh
    claude-wrapper.sh
    workspaces/
      r2d2/                    # independent git clone of target repo
      bumblebee/
      saphire/
```

`init` adds `.room-sandbox/` to `.gitignore` automatically. Only `sandbox.toml` is meant to be committed.

### Running inside an existing repo

If `room-sandbox init` is run inside a git repository, the wizard detects this and:

1. Infers `project.repo` from the existing remote (`git remote get-url origin`).
2. Auto-detects languages from marker files already present (`Cargo.toml`, `package.json`, etc.).
3. Tells the user: "Detected git repo `org/project` — using as sandbox target."
4. Still asks for confirmation / lets user override.
5. Writes `sandbox.toml` at the repo root, adds `.room-sandbox/` to `.gitignore`.

This way a team can check `sandbox.toml` into their project repo and anyone can `room-sandbox apply` to get a working sandbox.

### Config File — `sandbox.toml`

```toml
[project]
repo = "git@github.com:org/project.git"
container_name = "sandbox-my-sandbox"   # derived from directory name by default

[agents]
names = ["r2d2", "bumblebee", "saphire"]

[room]
default = "dev"

[auth]
method = "gh-cli"       # "gh-cli" | "pat" | "ssh"
mount_ssh = true        # whether to mount ~/.ssh read-only into the container

[environment]
languages = ["rust", "node"]        # "rust" | "node" | "python"
tools = ["claude", "gemini"]        # "claude" | "codex" | "gemini" | "glow"
```

---

## State Drift Detection — `check_state()`

A shared function used by all commands (except `init` and `clean`) to compare `sandbox.toml` against `.room-sandbox/.sandbox-state.json` and detect unapplied changes.

Returns which sections have drifted and what specifically changed. Commands use this contextually:

| Command type | Behavior on drift |
|---|---|
| **Read-only** (`agent list`, `logs`) | Warn: "sandbox.toml has unapplied changes — run `room-sandbox apply`" but proceed |
| **Runtime** (`shell`, `agent start`, `tui`, `up`) | Error if drift affects the requested operation (e.g. `shell r2d2` but r2d2 was just added and never cloned). Warn and proceed if drift is unrelated (e.g. environment change doesn't block shell access) |
| **Mutating** (`agent add`, `agent remove`) | Show the full picture: pending drift + the new mutation combined into one apply plan. Confirm before applying everything. User never gets a partial apply |

**Example — mutating command with existing drift:**

```
room-sandbox agent add wall-e

Unapplied changes detected:

  [environment]
  + python

Adding wall-e will also apply pending changes:

  Actions:
    1. Clone workspace for wall-e
    2. Write project-scoped CLAUDE.md for wall-e
    3. Rebuild container image (environment changed)
    4. Recreate container

Apply all changes? [y/N]
```

---

## Commands

### `room-sandbox init`

Interactive wizard that scaffolds a new sandbox environment.

**Repo URL normalization:**

The `--repo` flag (or wizard prompt) accepts three formats:

| Input | Resolved URL |
|---|---|
| `org/repo` (short form) | `git@github.com:org/repo.git` if auth is `ssh`, `https://github.com/org/repo.git` if auth is `gh-cli` or `pat` |
| `git@github.com:org/repo.git` | used as-is |
| `https://github.com/org/repo.git` | used as-is |

Short form assumes GitHub. Other hosts are not supported yet but the design leaves room for a future `[git] host` config field.

The resolved URL is what gets stored in `sandbox.toml`.

**Flow:**

1. **Detect context:**
   - If inside a git repo: infer `project.repo` from remote, auto-detect languages, inform the user.
   - If directory is empty: proceed with manual setup.
   - If directory is non-empty and not a git repo: refuse (ambiguous state).
   - If `sandbox.toml` already exists: refuse (use `apply` instead).
2. **Repo** — if not auto-detected, prompt for git URL (accepts short form `org/repo`, SSH, or HTTPS). Do a shallow clone (`--depth 1`) to a temp dir to auto-detect the stack.
3. **Environment — languages** — auto-detect from repo markers (`Cargo.toml` → rust, `package.json` → node, `requirements.txt`/`pyproject.toml` → python). Present detected + available, let user confirm/modify.
4. **Environment — tools** — multi-select from: claude, codex, gemini, glow. No auto-detection.
5. **Agents** — prompt for agent names. Default: `["r2d2", "bumblebee", "saphire"]`.
6. **Auth** — run `gh auth status` to show detected account. Options:
   - Use detected gh CLI token (show which account)
   - Provide a PAT manually
   - SSH only (no token)
   - If user picks gh-cli but has multiple accounts, warn and let them confirm or switch to PAT.
7. **SSH** — ask whether to mount `~/.ssh` into the container (default: yes if auth method is ssh, no if pat).
8. **Room** — prompt for default room name (default: `"dev"`).
9. Delete the temp shallow clone (if one was made).
10. Write `sandbox.toml` at the current directory root.
11. Create `.room-sandbox/` and add it to `.gitignore`.
12. Write `.room-sandbox/.env` (with secrets).
13. Generate Docker assets (`Dockerfile`, `docker-compose.yml`, `entrypoint.sh`, `claude-wrapper.sh`) into `.room-sandbox/` from templates + config.
14. Clone agent workspaces into `.room-sandbox/workspaces/<name>/`.
15. Write project-scoped CLAUDE.md for each agent (room usage instructions).
16. `docker compose build` + `docker compose up -d` (using `.room-sandbox/docker-compose.yml`).
17. Write `.room-sandbox/.sandbox-state.json` with applied config hashes.

**Non-interactive mode:**

```
room-sandbox init \
  --repo git@github.com:org/project.git \
  --agents r2d2,bumblebee \
  --languages rust,node \
  --tools claude \
  --auth gh-cli \
  --room dev
```

Skips the wizard, uses provided values + sensible defaults for anything omitted.

---

### `room-sandbox apply`

Reads `sandbox.toml`, diffs against `.room-sandbox/.sandbox-state.json`, shows a plan, and applies on confirmation.

**Diff logic per section:**

| Config change | Actions |
|---|---|
| `agents.names` added | Clone repo into `.room-sandbox/workspaces/<name>/`, write project-scoped CLAUDE.md |
| `agents.names` removed | Delete `.room-sandbox/workspaces/<name>/`, remove project-scoped CLAUDE.md |
| `environment` changed | Regenerate Dockerfile, rebuild Docker image, recreate container |
| `auth.method` changed | Update `.env` with new token, restart container |
| `auth.mount_ssh` changed | Regenerate `docker-compose.yml`, recreate container |
| `project.repo` changed | **Destructive** — wipe all workspaces, re-clone from new repo |
| `project.container_name` changed | Stop old container, start with new name |
| `room.default` changed | Update project-scoped CLAUDE.md templates only |

**Output format (terraform-style):**

```
room-sandbox apply

Changes detected:

  [environment]
  + python
  - rust

  [agents]
  + wall-e (clone git@github.com:org/project.git)

  Actions:
    1. Clone workspace for wall-e
    2. Write project-scoped CLAUDE.md for wall-e
    3. Rebuild container image (environment changed)
    4. Recreate container

Apply these changes? [y/N]
```

**Destructive changes get extra emphasis:**

```
  [project]
  ~ repo: git@github.com:org/old.git → git@github.com:org/new.git

  ⚠ This will DELETE all existing workspaces and re-clone.

  Actions:
    1. Remove workspaces: r2d2, bumblebee, saphire
    2. Clone 3 workspaces from new repo
    ...
```

Always requires explicit `y` confirmation. No `--yes` / `--force` flag.

---

### `room-sandbox agent add <name>`

Add an agent to the sandbox.

1. Validate agent name doesn't already exist in `sandbox.toml`.
2. Run `check_state()` — if there's existing drift, include it in the plan.
3. Update `sandbox.toml` — append name to `agents.names`.
4. Show combined plan (pending drift + new agent clone). Confirm with user.
5. Apply: clone repo into `.room-sandbox/workspaces/<name>/`, write project-scoped CLAUDE.md, plus any other pending changes.
6. Update `.sandbox-state.json`.

### `room-sandbox agent remove <name>`

Remove an agent from the sandbox.

1. Validate agent name exists in `sandbox.toml`.
2. Run `check_state()` — if there's existing drift, include it in the plan.
3. Update `sandbox.toml` — remove name from `agents.names`.
4. Show combined plan (pending drift + workspace deletion). Confirm with user.
5. Apply: delete `.room-sandbox/workspaces/<name>/`, remove project-scoped CLAUDE.md, plus any other pending changes.
6. Update `.sandbox-state.json`.

### `room-sandbox agent list`

List all agents defined in `sandbox.toml` with status:

```
NAME        STATUS
r2d2        running   (pid 1234)
bumblebee   ready
saphire     stopped
wall-e      missing   (run apply)
```

Statuses:
- **running** — ralph process active in container
- **ready** — workspace exists, not running
- **stopped** — workspace exists, was previously running
- **missing** — in toml but workspace not cloned (needs `apply`)

### `room-sandbox agent start <name> [names...] [-- extra-ralph-args...]`

Start ralph agent loops. Room is always read from `sandbox.toml` `room.default`.

1. Run `check_state()` — if drift exists, show plan, confirm, apply first.
2. Validate each agent exists in toml + workspace exists on disk.
3. Ensure container is running — auto-`up` if not.
4. For each agent, check if already running — error individually ("agent 'r2d2' is already running — use `agent restart r2d2`").
5. Start `room daemon` (idempotent).
6. Join agent, create room, subscribe (all idempotent).
7. Exec ralph loop: `room-ralph <room> <name> --allow-all [extra-args]`.

Example:
```
room-sandbox agent start r2d2
room-sandbox agent start r2d2 bumblebee saphire
room-sandbox agent start r2d2 -- --model sonnet --issue 42
```

Running detection: `docker exec <container> pgrep -f "room-ralph.*<name>"`.

### `room-sandbox agent stop <name> [names...]`

Kill the ralph process for the named agent(s) inside the container. No confirmation.

```
room-sandbox agent stop r2d2
room-sandbox agent stop r2d2 bumblebee
```

### `room-sandbox agent restart <name> [names...]`

Stop + start. Accepts the same `[-- extra-ralph-args...]` passthrough.

```
room-sandbox agent restart r2d2
room-sandbox agent restart r2d2 -- --model sonnet
```

---

### `room-sandbox tui [--as <user>]`

Open the room TUI inside the container. Room setup (daemon, room creation) is handled by `init`/`apply`, not here.

1. Run `check_state()` — warn on drift, but don't block.
2. Ensure container is running — auto-`up` if not.
3. `docker exec -it -u agent <container> room tui <room> --as <user>`.

Defaults: room from `sandbox.toml` `room.default`, user from `$(whoami)`.

```
room-sandbox tui              # join as $(whoami)
room-sandbox tui --as helix   # override username
```

---

### `room-sandbox shell [name] [--root]`

Open a bash shell inside the container.

```
room-sandbox shell              # shell as agent user, cwd /workspaces
room-sandbox shell r2d2         # shell as agent user, cwd /workspaces/r2d2
room-sandbox shell --root       # shell as root, cwd /workspaces
room-sandbox shell r2d2 --root  # shell as root, cwd /workspaces/r2d2
```

1. Ensure the container is running.
2. Run `check_state()` — warn on unrelated drift, error if the named agent was added but never cloned.
3. If `name` is provided, validate it exists in `sandbox.toml` AND workspace exists on disk. Error with available agents if not found.
4. `docker exec -it -u {root|agent} -w /workspaces/{name|.} <container> bash`

---

### `room-sandbox up`

Start the sandbox container (if not already running).

```
docker compose -f .room-sandbox/docker-compose.yml up -d
```

Fails if `.room-sandbox/` doesn't exist (not initialized).

### `room-sandbox down`

Stop the sandbox container without destroying anything.

```
docker compose -f .room-sandbox/docker-compose.yml down
```

Workspaces, config, and volumes are preserved. `up` resumes where you left off.

### `room-sandbox logs`

Tail the sandbox container logs.

```
docker compose -f .room-sandbox/docker-compose.yml logs -f
```

### `room-sandbox clean`

Remove all generated artifacts, keeping only `sandbox.toml`.

1. Show what will be deleted: `.room-sandbox/` (workspaces, Docker assets, state, env).
2. Stop and remove the container if running.
3. Confirm with user.
4. Delete `.room-sandbox/`.
5. `sandbox.toml` remains — user can `room-sandbox apply` to rebuild from scratch.

---

## Agent Instruction Injection

Each agent gets room usage instructions via Claude Code's project-scoped config at:
```
~/.claude/projects/<workspace-path-hash>/CLAUDE.md
```

This is written by the entrypoint (on container start) and by `agent add`. Content is templated from `sandbox.toml`:

```markdown
# Room Agent Instructions

You are agent `{agent_name}` in room `{room_name}`.

## Communication
- `room send <message>` — send a message to the room
- `room poll` — check for new messages
- ...
```

This approach has zero impact on the target repo — no files added, no git diff noise.

---

## Docker Assets

All Docker assets live inside `.room-sandbox/` and are generated from `sandbox.toml`.

### Dockerfile

Uses build args for feature flags. No commenting in/out — conditional `RUN` blocks:

```dockerfile
ARG INSTALL_RUST=false
ARG INSTALL_NODE=false
ARG INSTALL_PYTHON=false
ARG INSTALL_GLOW=false
ARG INSTALL_CLAUDE=false
ARG INSTALL_CODEX=false
ARG INSTALL_GEMINI=false

FROM rust:bookworm

# Base dependencies (always installed)
RUN apt-get update && apt-get install -y \
    git openssh-client curl jq tmux pkg-config libssl-dev gosu sqlite3 \
    && rm -rf /var/lib/apt/lists/*

# Conditional: Node.js
RUN if [ "$INSTALL_NODE" = "true" ]; then \
      curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
      && apt-get install -y nodejs \
      && corepack enable && corepack prepare yarn@1.22.22 --activate \
      && npm install -g turbo; \
    fi

# Conditional: Python
RUN if [ "$INSTALL_PYTHON" = "true" ]; then \
      apt-get update && apt-get install -y python3 python3-pip python3-venv \
      && rm -rf /var/lib/apt/lists/*; \
    fi

# Conditional: glow
RUN if [ "$INSTALL_GLOW" = "true" ]; then \
      curl -fsSL https://github.com/charmbracelet/glow/releases/download/v2.1.0/glow_2.1.0_linux_amd64.tar.gz \
      | tar xz -C /usr/local/bin glow; \
    fi

# Conditional: Claude Code
RUN if [ "$INSTALL_CLAUDE" = "true" ]; then \
      npm install -g @anthropic-ai/claude-code; \
    fi

# ... etc for codex, gemini

# room + room-ralph (always installed)
RUN cargo install room-cli room-ralph
```

The CLI generates `docker-compose.yml` with the correct `build.args` based on `sandbox.toml`.

### docker-compose.yml

Generated from config. Auth/SSH mounts are conditional:

```yaml
services:
  sandbox:
    build:
      context: .
      args:
        INSTALL_RUST: "true"
        INSTALL_NODE: "true"
        # ... from sandbox.toml [environment]
    container_name: sandbox-my-sandbox
    volumes:
      - ./workspaces:/mnt/sandbox-root
      - claude-data:/home/agent/.claude
      - cargo-cache:/usr/local/cargo/registry
      # conditional:
      - ~/.ssh:/home/agent/.ssh-host:ro    # only if mount_ssh = true
    env_file: .env
    # ...
```

---

## State Tracking — `.room-sandbox/.sandbox-state.json`

Written after every successful apply. Used to diff against current `sandbox.toml`.

```json
{
  "applied_at": "2026-03-19T18:50:00Z",
  "config_hashes": {
    "project": "a1b2c3",
    "agents": "d4e5f6",
    "room": "g7h8i9",
    "auth": "j0k1l2",
    "environment": "m3n4o5"
  }
}
```

Each hash is computed from the serialized section content. `apply` compares current hashes to stored hashes to determine what changed.