flocks 0.0.1-pre.1

# Flocks — Implementation Plan

## Current State (2026-03-14)

Working end-to-end: dispatches `claude -p` tasks to kap devcontainers via
`kap::container::exec`, with load balancing across containers, worktree isolation,
queue draining, and state persistence. 25 unit tests. Tested against nitrocop's
kap container.

### Architecture

```
flocks.toml → config.rs (parse) → dispatch.rs (orchestrate) → kap::container::exec
                                                                ↓
                                        worktrees on host (.worktrees/) ←→ /workspace/.worktrees/ in container
                                                                ↓
                                        claude -p --dangerously-skip-permissions
                                                                ↓
                                        .flocks/state.json (persist)
```

### Files

```
src/main.rs      — CLI (clap): up, run, status, stop, land, down
src/config.rs    — flocks.toml parsing, defaults, find_config walk-up
src/task.rs      — JSONL loading, script-based discovery
src/dispatch.rs  — dispatch loop, container load balancer, worktree lifecycle,
                   kap exec, prompt resolution, state persistence
```
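
The `find_config` walk-up named above can be sketched as follows. This is a minimal illustration, not the code in `src/config.rs`: the signature and the choice to stop at the first match are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` up through its ancestors and return the first
/// `flocks.toml` that exists, or `None` once the filesystem root is passed.
/// (Hypothetical signature; the real `find_config` may differ.)
pub fn find_config(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start`, then its parent, and so on up to the root
        .map(|dir| dir.join("flocks.toml"))
        .find(|candidate| candidate.is_file())
}
```

Calling it with the current directory (for example `find_config(&std::env::current_dir()?)`) lets `flocks` run from any subdirectory of a project.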

### Commands

```bash
cargo check                    # fast compile check
cargo test                     # 25 unit tests
cargo run -- --help            # CLI help
cargo run -- run --help        # run subcommand help
cargo run --release -- run --tasks tasks.jsonl   # dispatch (release mode for speed)
cargo run -- status            # show agent status from .flocks/state.json
```

### Dependencies

- kap is a path dependency at `../kap` — both repos must be siblings
- kap was modified to add `src/lib.rs` (commit 7ca80c8 on kap's main)
- kap's `container::exec` no longer calls `process::exit()` (returns Result instead)

### Example flocks.toml

```toml
max_agents = 5
stagger_delay_secs = 2

[[credentials]]
name = "sub-1"
env = "CLAUDE_TOKEN_1"

[[containers]]
name = "local"
max_worktrees = 3
kap_project = "/Users/peter/oss/nitrocop"

[source]
type = "jsonl"
path = "tasks.jsonl"

[workspace]
branch_prefix = "flocks/"
land_target = "main"
```

### End-to-end testing

Requires a running kap container. Test with nitrocop's:
```bash
# 1. Ensure container is running
cd ~/oss/nitrocop && kap list   # should show nitrocop_devcontainer

# 2. Create test tasks
cat > /tmp/test-tasks.jsonl <<'EOF'
{"id": "test-1", "branch": "flocks/test-1", "prompt": "Create a file /tmp/proof.txt containing 'hello'. Do nothing else."}
EOF

# 3. Create flocks.toml (in flocks dir or target project dir)
# See example above

# 4. Run
cd ~/oss/flocks
cargo run --release -- run --tasks /tmp/test-tasks.jsonl

# 5. Check
cargo run -- status
cd ~/oss/nitrocop && devcontainer exec --workspace-folder . bash -lc "cat /tmp/proof.txt"

# 6. Clean up
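# Remove the worktree directory first: prune only drops worktrees whose directory
# is gone, and git refuses to delete a branch that is still checked out in one.
cd ~/oss/nitrocop && rm -rf .worktrees && git worktree prune && git branch -D flocks/test-1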
rm -rf ~/oss/flocks/.flocks
```

### Gotchas discovered during development

1. **Login shell required**: `bash -c` starts a non-login shell that skips the
   login profile files (`~/.profile`, `~/.bash_profile`), so PATH misses
   `~/.local/bin/claude` and mise shims. Must use `bash -lc`. Fixed in dispatch.rs.

2. **Host vs container paths**: Worktrees are created on the host filesystem by
   host git. The container sees them via Docker volume mount, but at a different
   path. Host: `<project>/.worktrees/<name>`. Container: `/workspace/.worktrees/<name>`.
   The `cd` in the claude command must use the container path. The `/workspace`
   prefix was originally hardcoded; it is now the configurable `workspace_mount`
   (TODO #1).

3. **Worktree cleanup**: If a previous run was interrupted, stale worktrees and
   branches may exist. `create_worktree()` proactively cleans these up (prune +
   branch -D) before creating new ones.

4. **`claude -p` without `--max-turns`**: By default claude loops with tools until
   done. A "simple" prompt in a nitrocop worktree caused claude to load CLAUDE.md,
   explore the codebase, and work for 10+ minutes. This is correct behavior for
   real tasks but confusing during testing. For test runs, cap the agent with
   `--max-turns 1` (exposed as `max_turns`, see TODO #8).

5. **kap::container::exec depends on cwd**: The function expects `cwd` to be the
   project directory (it calls `workspace_folder()` which checks for
   `.devcontainer/devcontainer.json`). `kap_exec_sync()` saves/restores cwd around
   the call. The cwd is process-global state, so this is only safe while calls do
   not overlap. TODO #6 removed this workaround from the agent execution step.
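
Gotchas 1 and 2 combine into one command line: a login shell that first changes into the container-side worktree path. The sketch below builds the argument list for `devcontainer exec`. The helper names `shell_quote` and `agent_argv` are hypothetical, and the exact flag order used in dispatch.rs is an assumption.

```rust
/// Quote `s` for a POSIX shell: wrap it in single quotes and escape each
/// embedded single quote as '\''.
pub fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Build the arguments for `devcontainer exec` (hypothetical helper).
/// `project_dir` is the host path of the kap project; `container_worktree`
/// is the path as seen inside the container.
pub fn agent_argv(project_dir: &str, container_worktree: &str, prompt: &str) -> Vec<String> {
    // `bash -lc` runs a login shell, so profile files set up PATH before
    // `claude` is looked up.
    let script = format!(
        "cd {} && claude -p --dangerously-skip-permissions {}",
        shell_quote(container_worktree),
        shell_quote(prompt),
    );
    vec![
        "exec".to_string(),
        "--workspace-folder".to_string(),
        project_dir.to_string(),
        "bash".to_string(),
        "-lc".to_string(),
        script,
    ]
}
```

Quoting the prompt matters because task prompts routinely contain apostrophes and other shell metacharacters.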

### Key decisions already made

- Worktrees go in `<kap_project>/.worktrees/` (host side) so Docker mount makes them
  visible at `/workspace/.worktrees/` inside the container
- `bash -lc` (login shell) for kap exec so PATH includes mise shims and claude
- `--dangerously-skip-permissions` on claude -p since agents are autonomous
- kap is a Rust library dependency (path = "../kap"), not a shell-out
- Stale worktrees/branches are cleaned up before each agent starts

---

## TODO Items (in priority order)

### 1. ~~Configurable workspace mount path~~ ✅ DONE

Added `workspace_mount` field to `[[containers]]` (default: `/workspace`).
Threaded through `ContainerSlots` into `run_agent()` to replace hardcoded path.
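
A minimal sketch of the host-to-container path translation that `workspace_mount` enables. The function name and signature are illustrative assumptions, not the code in dispatch.rs.

```rust
use std::path::{Path, PathBuf};

/// Translate a host path under `host_project` into the path the container
/// sees under `workspace_mount`. Returns `None` when `host_path` is not
/// inside the project. (Hypothetical helper.)
pub fn container_path(
    host_project: &Path,
    workspace_mount: &Path,
    host_path: &Path,
) -> Option<PathBuf> {
    let relative = host_path.strip_prefix(host_project).ok()?;
    Some(workspace_mount.join(relative))
}
```

With the example config, `/Users/peter/oss/nitrocop/.worktrees/test-1` on the host maps to `/workspace/.worktrees/test-1` in the container.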

---

### 2. ~~Dry-run mode~~ ✅ DONE

`flocks run --dry-run` shows task→container assignments, branches, worktree
paths, and per-container slot allocation without executing anything.
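
The assignment a dry run prints can be computed without touching any container. The sketch below assumes a "most free slots first" policy with ties going to the earliest container; the actual balancing policy in dispatch.rs is not documented here and may differ. The `Container` struct mirrors only the `name` and `max_worktrees` fields from `[[containers]]`.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone)]
pub struct Container {
    pub name: String,
    pub max_worktrees: usize,
}

/// Assign each task to the container with the most free slots (assumed
/// policy). Tasks that find no free slot map to `None`, i.e. stay queued.
pub fn plan_assignments(
    tasks: &[&str],
    containers: &[Container],
) -> Vec<(String, Option<String>)> {
    let mut used = vec![0usize; containers.len()];
    let mut plan = Vec::with_capacity(tasks.len());
    for task in tasks {
        let pick = (0..containers.len())
            .filter(|&i| used[i] < containers[i].max_worktrees)
            // Most free slots wins; `Reverse(i)` breaks ties toward lower index.
            .max_by_key(|&i| (containers[i].max_worktrees - used[i], Reverse(i)));
        if let Some(i) = pick {
            used[i] += 1;
        }
        plan.push((task.to_string(), pick.map(|i| containers[i].name.clone())));
    }
    plan
}
```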

---

### 3. ~~Credential injection~~ ✅ DONE

`credential` field on `[[containers]]` is now used. `"default"` = no injection
(use container's existing token). Any other value = env var name, read at dispatch
time and injected as `CLAUDE_CODE_OAUTH_TOKEN=<token>` prefix on the claude command.
No global env mutation — token is per-command inline.
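
A sketch of the per-command injection described above. The environment lookup is passed in as a closure so the logic can be exercised without mutating the process environment. `credential_prefix` and `shell_quote` are hypothetical names, and treating an unset or empty variable as an error is an assumption.

```rust
/// Quote `s` for a POSIX shell (single quotes, with ' escaped as '\'').
fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Return the prefix to put in front of the claude command.
/// `"default"` means no injection; any other value names an env var whose
/// value becomes `CLAUDE_CODE_OAUTH_TOKEN` for that one command.
pub fn credential_prefix<F>(credential: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    if credential == "default" {
        return Ok(String::new());
    }
    match lookup(credential) {
        Some(token) if !token.is_empty() => {
            Ok(format!("CLAUDE_CODE_OAUTH_TOKEN={} ", shell_quote(&token)))
        }
        // Assumed behavior: fail loudly instead of running without a token.
        _ => Err(format!("credential env var `{}` is unset or empty", credential)),
    }
}
```

At dispatch time the lookup would be `|name| std::env::var(name).ok()`.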

---

### 4. ~~.worktrees/ in .gitignore~~ ✅ DONE

`create_worktree()` now best-effort appends `.worktrees/` to the project's
`.gitignore` if not already present.
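
The append is idempotent and ignores failures. A minimal sketch, with hypothetical function names, that keeps the text logic separate from the file I/O:

```rust
use std::fs;
use std::path::Path;

/// Return the updated `.gitignore` contents, or `None` if `entry` is
/// already listed on a line of its own.
pub fn with_gitignore_entry(contents: &str, entry: &str) -> Option<String> {
    if contents.lines().any(|line| line.trim() == entry) {
        return None;
    }
    let mut updated = contents.to_string();
    // Make sure the new entry starts on its own line.
    if !updated.is_empty() && !updated.ends_with('\n') {
        updated.push('\n');
    }
    updated.push_str(entry);
    updated.push('\n');
    Some(updated)
}

/// Best effort: a missing file is treated as empty, write errors are ignored.
pub fn ensure_gitignored(project: &Path, entry: &str) {
    let path = project.join(".gitignore");
    let contents = fs::read_to_string(&path).unwrap_or_default();
    if let Some(updated) = with_gitignore_entry(&contents, entry) {
        let _ = fs::write(&path, updated);
    }
}
```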

---

### 5. ~~CLAUDE.md for flocks repo~~ ✅ DONE

Added in commit fe38377.

---

### 6. ~~Agent output capture / streaming~~ ✅ DONE

Bypassed `kap::container::exec` for the agent execution step — now runs
`devcontainer exec` directly via `std::process::Command` with stdout+stderr
redirected to `.flocks/logs/<task-id>.log`. Also eliminates the process-global
`set_current_dir` hack (passes `--workspace-folder` directly).

- `flocks logs <task-id>` shows agent output (`--tail N`, `--follow`)
- `flocks status` shows last 5 lines of log for failed agents
- `AgentState` has `log_path` field (backward-compatible with `serde(default)`)
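
The redirect can be done with `std::process::Command` alone by handing the child two handles to the same log file. The sketch below is illustrative: `run_logged` and `tail_lines` are hypothetical names, and the real implementation may open the log differently.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Run `program` with stdout and stderr both redirected to `log_path`.
pub fn run_logged(program: &str, args: &[&str], log_path: &Path) -> io::Result<ExitStatus> {
    if let Some(dir) = log_path.parent() {
        fs::create_dir_all(dir)?;
    }
    let log = File::create(log_path)?;
    // A second handle to the same file, so both streams land in one log.
    let log_err = log.try_clone()?;
    Command::new(program)
        .args(args)
        .stdout(Stdio::from(log))
        .stderr(Stdio::from(log_err))
        .status()
}

/// Last `n` lines of `text`, as `--tail N` would show them.
pub fn tail_lines(text: &str, n: usize) -> Vec<&str> {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].to_vec()
}
```

Because the child receives the file handles directly, the log fills in while the agent is still running, which is what a `--follow` reader needs.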

---

### 7. TUI dashboard (ratatui)

**Problem:** `flocks status` prints a static snapshot. During a run, you want a
live-updating view.

**Fix:**
- Add `ratatui` and `crossterm` dependencies
- `flocks status` with no active run: static output (current behavior)
- `flocks status` during a run: live TUI showing:
  - Per-container utilization bars
  - Per-agent status (task id, container, elapsed time, status)
  - Queue depth
  - Overall progress (done/running/failed/queued)
- Poll `.flocks/state.json` every 1-2 seconds for updates
- Quit with `q` or Ctrl-C

**Files:** New `src/tui.rs`, `Cargo.toml` (add ratatui, crossterm)
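
Independent of the widget library, each utilization bar reduces to a used/max ratio scaled to a width. A plain-text sketch of that calculation follows; it does not use the ratatui API, and the bar format is an assumption.

```rust
/// Render a fixed-width text bar such as `[####..] 2/3`.
/// `used` is clamped to `max`; a zero `max` renders an empty bar.
pub fn utilization_bar(used: usize, max: usize, width: usize) -> String {
    let filled = if max == 0 {
        0
    } else {
        (used.min(max) * width) / max
    };
    format!(
        "[{}{}] {}/{}",
        "#".repeat(filled),
        ".".repeat(width - filled),
        used,
        max
    )
}
```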

---

### 8. ~~Agent timeout / stall detection + max_turns~~ ✅ DONE

Two complementary controls to prevent runaway agents:

- `agent_timeout_secs = 1800` (default 30 min, 0 = disabled) — wraps `run_agent`
  in `tokio::time::timeout`. On timeout, the future is dropped and the child
  process is killed. Agents get `AgentStatus::Timeout` in state.
- `max_turns = N` (optional) — maps to `claude -p --max-turns N`. Caps logical
  work without a hard wall-clock cutoff.

`flocks status` shows `TIMEOUT` status with log tails.
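
The wall-clock cutoff can be illustrated without tokio. The sketch below is a std-only analogue that polls the child and kills it at the deadline; it is not the `tokio::time::timeout` implementation described above. `Outcome`, `run_with_timeout`, and the 50 ms poll interval are assumptions.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Exited(ExitStatus),
    Timeout,
}

/// Spawn `cmd` and wait for it, killing the child once `timeout` elapses.
/// A zero `timeout` disables the cutoff, mirroring `agent_timeout_secs = 0`.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Outcome> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Outcome::Exited(status));
        }
        if !timeout.is_zero() && Instant::now() >= deadline {
            // The child may exit between the check and the kill; ignore that error.
            let _ = child.kill();
            child.wait()?; // reap the process so it does not linger as a zombie
            return Ok(Outcome::Timeout);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```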

---

### 9. ~~Prompt templates~~ ✅ DONE

`[prompt] template = "flocks-prompt.md"` now works. Template file is loaded once,
then `{{prompt}}`, `{{task.id}}`, `{{task.branch}}`, `{{task.title}}`,
`{{task.description}}` are substituted per task. Falls back to raw prompt if no
template configured.
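
A minimal sketch of the substitution step. It scans the template once, so placeholder-like text inside a task's own prompt is not expanded a second time. Leaving unknown placeholders untouched is an assumption, as are the `Task` struct shape and the function names.

```rust
pub struct Task<'a> {
    pub id: &'a str,
    pub branch: &'a str,
    pub title: &'a str,
    pub description: &'a str,
    pub prompt: &'a str,
}

impl<'a> Task<'a> {
    /// Value for a placeholder key, or `None` if the key is unknown.
    fn field(&self, key: &str) -> Option<&'a str> {
        match key {
            "prompt" => Some(self.prompt),
            "task.id" => Some(self.id),
            "task.branch" => Some(self.branch),
            "task.title" => Some(self.title),
            "task.description" => Some(self.description),
            _ => None,
        }
    }
}

/// Render `template` for `task`; with no template, fall back to the raw prompt.
pub fn render_prompt(template: Option<&str>, task: &Task<'_>) -> String {
    let mut rest = match template {
        Some(t) => t,
        None => return task.prompt.to_string(),
    };
    let mut out = String::with_capacity(rest.len());
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = &after[..end];
                match task.field(key) {
                    Some(value) => out.push_str(value),
                    // Unknown placeholder: keep it verbatim (assumed behavior).
                    None => {
                        out.push_str("{{");
                        out.push_str(key);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated "{{": copy the remainder as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```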

---

### 10. ~~`flocks land` improvements~~ ✅ DONE

Rewrote `cmd_land`:
- Runs git in the kap_project directory (was running in cwd — bug)
- Detects empty branches (no commits ahead of target) and skips them
- Shows confirmation with commit counts before landing (`--yes` / `-y` to skip)
- Continues on cherry-pick conflicts (abort + skip, not break)
- Runs `[validate] commands` after landing
- Summary shows landed count and lists conflicts with retry commands
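
One way to detect an empty branch is to count the commits reachable from the branch but not from the target. The sketch below uses `git rev-list --count <target>..<branch>` and `git -C` to run in the project directory. Whether `cmd_land` uses this exact git invocation is an assumption, and the function names are hypothetical.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Parse the single number printed by `git rev-list --count`.
pub fn parse_commit_count(stdout: &str) -> Option<u64> {
    stdout.trim().parse().ok()
}

/// Number of commits on `branch` that are not on `target`, with git run
/// inside `project` (the kap_project directory), not the current directory.
pub fn commits_ahead(project: &Path, target: &str, branch: &str) -> io::Result<u64> {
    let output = Command::new("git")
        .arg("-C")
        .arg(project)
        .args(&["rev-list", "--count"])
        .arg(format!("{}..{}", target, branch))
        .output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    parse_commit_count(&String::from_utf8_lossy(&output.stdout)).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "unexpected git rev-list output")
    })
}
```

A branch with a count of 0 has nothing to land and can be skipped before the confirmation prompt.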

---

### 11. ~~Structured output + status dashboard + retry~~ ✅ DONE

Three related improvements:
- **Structured output**: `claude -p --output-format json` captures `num_turns`,
  `duration_ms`, `total_cost_usd`, and agent's final response. Parsed and stored
  in `AgentState` for status display and retry context.
- **Status dashboard**: `flocks status` now shows grouped output (running sorted
  by elapsed desc for stall detection, done with avg time, failed/timed-out with
  result summaries), container utilization, aggregate usage stats, and stall warnings.
- **Retry with learning**: `flocks retry --all-failed` re-dispatches failed tasks
  with prior attempt context injected into the prompt (Ralph pattern). Agents learn
  from prior failures.
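
The grouping logic behind the dashboard is small. The sketch below orders running agents by elapsed time, flags the ones past a threshold, and averages the durations of finished agents. The `Running` struct, the function names, and the idea of a fixed stall threshold are assumptions; `AgentState` in the real code carries more fields.

```rust
use std::time::Duration;

pub struct Running {
    pub task_id: String,
    pub elapsed: Duration,
}

/// Longest-running agents first, so likely stalls appear at the top.
pub fn sort_running_by_elapsed(running: &mut Vec<Running>) {
    running.sort_by(|a, b| b.elapsed.cmp(&a.elapsed));
}

/// Task ids whose elapsed time exceeds `threshold` (assumed stall rule).
pub fn stalled(running: &[Running], threshold: Duration) -> Vec<&str> {
    running
        .iter()
        .filter(|agent| agent.elapsed > threshold)
        .map(|agent| agent.task_id.as_str())
        .collect()
}

/// Mean duration of finished agents, or `None` when there are none.
pub fn average_duration(durations: &[Duration]) -> Option<Duration> {
    if durations.is_empty() {
        return None;
    }
    let total: Duration = durations.iter().sum();
    Some(total / durations.len() as u32)
}
```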

---

### 12. ~~Supervise mode~~ ✅ DONE

`flocks run --supervise` runs a mechanical retry loop: dispatch all tasks →
wait → retry failures with prior attempt context → repeat until all pass or
`max_retries` exhausted (default 3, configurable in flocks.toml or `--max-retries N`).
Extracted `build_retry_tasks()` shared between `cmd_retry` and the supervise loop.
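
The control flow of the retry loop can be sketched independently of dispatch. Here `run` stands in for "dispatch one task and report success", and attempt 0 is the initial run. The function name and signature are hypothetical, and the injection of prior attempt context is omitted.

```rust
/// Run every task, then re-run only the failures until all pass or
/// `max_retries` retries have been used. Returns the tasks still failing.
pub fn supervise<F>(tasks: &[&str], max_retries: u32, mut run: F) -> Vec<String>
where
    F: FnMut(&str, u32) -> bool,
{
    let mut pending: Vec<String> = tasks.iter().map(|t| t.to_string()).collect();
    let mut attempt = 0;
    loop {
        // Keep only the tasks that failed on this attempt.
        pending.retain(|task| !run(task.as_str(), attempt));
        if pending.is_empty() || attempt >= max_retries {
            return pending;
        }
        attempt += 1;
    }
}
```

With `max_retries = 3`, a task that never passes is run four times in total: once initially and three times as a retry.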

---

## Future / Nice-to-have

- **Linear/GitHub/JIRA task sources** — implement the `type = "linear"` etc. source
  types that query external APIs for tasks
- **`flocks init`** — scaffold a flocks.toml for a project
- **Multi-machine** — dispatch to containers on remote machines (SSH + kap)
- **Webhook notifications** — Slack/email when a run completes
- **TUI dashboard** — ratatui live-updating view during runs