# Flocks — Implementation Plan
## Current State (2026-03-14)
Working end-to-end: dispatches `claude -p` tasks to kap devcontainers via
`kap::container::exec`, with load balancing across containers, worktree isolation,
queue draining, and state persistence. 25 unit tests. Tested against nitrocop's
kap container.
### Architecture
```
flocks.toml → config.rs (parse) → dispatch.rs (orchestrate) → kap::container::exec
↓
worktrees on host (.worktrees/) ←→ /workspace/.worktrees/ in container
↓
claude -p --dangerously-skip-permissions
↓
.flocks/state.json (persist)
```
### Files
```
src/main.rs — CLI (clap): up, run, status, stop, land, down
src/config.rs — flocks.toml parsing, defaults, find_config walk-up
src/task.rs — JSONL loading, script-based discovery
src/dispatch.rs — dispatch loop, container load balancer, worktree lifecycle,
kap exec, prompt resolution, state persistence
```
### Commands
```bash
cargo check # fast compile check
cargo test # 25 unit tests
cargo run -- --help # CLI help
cargo run -- run --help # run subcommand help
cargo run --release -- run --tasks tasks.jsonl # dispatch (release mode for speed)
cargo run -- status # show agent status from .flocks/state.json
```
### Dependencies
- kap is a path dependency at `../kap` — both repos must be siblings
- kap was modified to add `src/lib.rs` (commit 7ca80c8 on kap's main)
- kap's `container::exec` no longer calls `process::exit()` (returns Result instead)
### Example flocks.toml
```toml
max_agents = 5
stagger_delay_secs = 2
[[credentials]]
name = "sub-1"
env = "CLAUDE_TOKEN_1"
[[containers]]
name = "local"
max_worktrees = 3
kap_project = "/Users/peter/oss/nitrocop"
[source]
type = "jsonl"
path = "tasks.jsonl"
[workspace]
branch_prefix = "flocks/"
land_target = "main"
```
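How `max_agents` and per-container `max_worktrees` might interact during dispatch can be sketched as below. This is a minimal illustration, not the code in dispatch.rs: the `ContainerSlot` type, the `pick_container` name, and the "most free slots wins" policy are all assumptions.

```rust
/// Hypothetical view of one `[[containers]]` entry plus its live usage.
#[derive(Debug, Clone, PartialEq)]
struct ContainerSlot {
    name: String,
    max_worktrees: usize,
    in_use: usize,
}

/// Return the index of the container with the most free slots, or `None`
/// when every container is full or the global `max_agents` cap is reached.
fn pick_container(slots: &[ContainerSlot], max_agents: usize) -> Option<usize> {
    let running: usize = slots.iter().map(|s| s.in_use).sum();
    if running >= max_agents {
        return None;
    }
    slots
        .iter()
        .enumerate()
        .filter(|(_, s)| s.in_use < s.max_worktrees)
        .max_by_key(|(_, s)| s.max_worktrees - s.in_use)
        .map(|(i, _)| i)
}
```

A task that gets `None` would stay in the queue until a running agent finishes and frees a slot.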
### End-to-end testing
Requires a running kap container. Test with nitrocop's:
```bash
# 1. Ensure container is running
cd ~/oss/nitrocop && kap list # should show nitrocop_devcontainer
# 2. Create test tasks
cat > /tmp/test-tasks.jsonl <<'EOF'
{"id": "test-1", "branch": "flocks/test-1", "prompt": "Create a file /tmp/proof.txt containing 'hello'. Do nothing else."}
EOF
# 3. Create flocks.toml (in flocks dir or target project dir)
# See example above
# 4. Run
cd ~/oss/flocks
cargo run --release -- run --tasks /tmp/test-tasks.jsonl
# 5. Check
cargo run -- status
cd ~/oss/nitrocop && devcontainer exec --workspace-folder . bash -lc "cat /tmp/proof.txt"
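# 6. Clean up (remove worktree dirs first: `prune` only drops entries whose
#    directory is gone, and `branch -D` refuses a branch that is still checked out)
rm -rf ~/oss/nitrocop/.worktrees
cd ~/oss/nitrocop && git worktree prune && git branch -D flocks/test-1
rm -rf ~/oss/flocks/.flocks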
```
### Gotchas discovered during development
1. **Login shell required**: `bash -c` starts a non-login, non-interactive shell,
which reads neither `.bashrc` nor the login profile files, so PATH misses
`~/.local/bin/claude` and mise shims. Must use `bash -lc`. Fixed in dispatch.rs.
2. **Host vs container paths**: Worktrees are created on the host filesystem by
host git. The container sees them via Docker volume mount, but at a different
path. Host: `<project>/.worktrees/<name>`. Container: `/workspace/.worktrees/<name>`.
The `cd` in the claude command must use the container path. The mount point was
originally hardcoded to `/workspace`; it is now configurable per container via
`workspace_mount` (TODO #1).
3. **Worktree cleanup**: If a previous run was interrupted, stale worktrees and
branches may exist. `create_worktree()` proactively cleans these up (prune +
branch -D) before creating new ones.
4. **`claude -p` without `--max-turns`**: By default claude loops with tools until
done. A "simple" prompt in a nitrocop worktree caused claude to load CLAUDE.md,
explore the codebase, and work for 10+ minutes. This is correct behavior for
real tasks but confusing during testing. Pass `--max-turns 1` for test runs
(exposed as `max_turns`, see TODO #8).
5. **kap::container::exec depends on cwd**: The function expects `cwd` to be the
project directory (it calls `workspace_folder()` which checks for
`.devcontainer/devcontainer.json`). `kap_exec_sync()` saves/restores cwd around
the call. cwd is process-global state: this is fine for a single call inside
`spawn_blocking`, but concurrent calls would race on it. The agent execution
step no longer uses this path (TODO #6).
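Gotchas 1 and 2 combine into one command line: a login shell that first changes into the container-side worktree path and then runs claude. The following is a rough sketch of building that argv. The `shell_quote` and `agent_argv` helpers are hypothetical names, and the real command in dispatch.rs may carry additional flags.

```rust
/// Quote a string for a POSIX shell: wrap it in single quotes and rewrite
/// each embedded single quote as '\''.
fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Build the argv that runs `claude -p` inside a worktree.
/// `container_worktree` must be the container-side path, not the host path.
fn agent_argv(container_worktree: &str, prompt: &str) -> Vec<String> {
    let script = format!(
        "cd {} && claude -p --dangerously-skip-permissions {}",
        shell_quote(container_worktree),
        shell_quote(prompt)
    );
    // `-l` makes bash a login shell so the profile files can extend PATH.
    vec!["bash".to_string(), "-lc".to_string(), script]
}
```

Quoting the prompt matters because task prompts routinely contain quotes and shell metacharacters.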
### Key decisions already made
- Worktrees go in `<kap_project>/.worktrees/` (host side) so Docker mount makes them
visible at `/workspace/.worktrees/` inside the container
- `bash -lc` (login shell) for kap exec so PATH includes mise shims and claude
- `--dangerously-skip-permissions` on `claude -p` since agents are autonomous
- kap is a Rust library dependency (path = "../kap"), not a shell-out
- Stale worktrees/branches are cleaned up before each agent starts
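The stale-worktree cleanup described above (prune, then `branch -D`, then create) could be expressed as a short sequence of git invocations. This sketch only builds the argument lists. The `worktree_commands` name and the `base` parameter are assumptions, and a caller would be expected to ignore a failing `branch -D` when the branch does not exist.

```rust
/// Convert a slice of string literals and borrowed strings into an owned argv.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Git commands to run in the host project directory, in order:
/// drop stale worktree entries, delete a leftover branch (best effort),
/// then create a fresh worktree on a new branch starting at `base`.
fn worktree_commands(branch: &str, host_worktree: &str, base: &str) -> Vec<Vec<String>> {
    vec![
        argv(&["git", "worktree", "prune"]),
        argv(&["git", "branch", "-D", branch]),
        argv(&["git", "worktree", "add", "-b", branch, host_worktree, base]),
    ]
}
```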
---
## TODO Items (in priority order)
### 1. ~~Configurable workspace mount path~~ ✅ DONE
Added `workspace_mount` field to `[[containers]]` (default: `/workspace`).
Threaded through `ContainerSlots` into `run_agent()` to replace hardcoded path.
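The host-to-container path translation this field enables can be sketched with `Path::strip_prefix`. The function name and the `Option` return type are illustrative assumptions; only the `workspace_mount` field and its `/workspace` default come from the notes above.

```rust
use std::path::{Path, PathBuf};

/// Default documented for `workspace_mount`.
const DEFAULT_WORKSPACE_MOUNT: &str = "/workspace";

/// Map a host path under `host_project` to the path the container sees.
/// Returns `None` when `host_path` is not inside the project directory.
fn to_container_path(
    host_project: &Path,
    workspace_mount: Option<&str>,
    host_path: &Path,
) -> Option<PathBuf> {
    let relative = host_path.strip_prefix(host_project).ok()?;
    let mount = workspace_mount.unwrap_or(DEFAULT_WORKSPACE_MOUNT);
    Some(Path::new(mount).join(relative))
}
```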
---
### 2. ~~Dry-run mode~~ ✅ DONE
`flocks run --dry-run` shows task→container assignments, branches, worktree
paths, and per-container slot allocation without executing anything.
---
### 3. ~~Credential injection~~ ✅ DONE
`credential` field on `[[containers]]` is now used. `"default"` = no injection
(use container's existing token). Any other value = env var name, read at dispatch
time and injected as `CLAUDE_CODE_OAUTH_TOKEN=<token>` prefix on the claude command.
No global env mutation — token is per-command inline.
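As an illustration of the per-command injection, the prefix could be computed roughly as follows. The `credential_prefix` name is hypothetical, and the environment lookup is passed in as a closure purely so the sketch is testable; real code would likely pass `|k| std::env::var(k).ok()`.

```rust
/// Quote a string for a POSIX shell (single quotes, with ' rewritten as '\'').
fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Compute the inline env prefix for the claude command.
/// `"default"` means no injection; any other value names an env var to read.
fn credential_prefix<F>(credential: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    if credential == "default" {
        return Ok(String::new());
    }
    let token = lookup(credential)
        .ok_or_else(|| format!("env var {} is not set", credential))?;
    // Trailing space so the prefix can be prepended directly to the command.
    Ok(format!("CLAUDE_CODE_OAUTH_TOKEN={} ", shell_quote(&token)))
}
```

Because the token only appears in the command string, two agents in the same process can use different credentials without touching the process environment.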
---
### 4. ~~.worktrees/ in .gitignore~~ ✅ DONE
`create_worktree()` now best-effort appends `.worktrees/` to the project's
`.gitignore` if not already present.
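The "append if not already present" check is easy to get subtly wrong when the file lacks a trailing newline. A minimal sketch of the string-level logic, with `ensure_ignored` as a hypothetical helper name:

```rust
/// Return the updated `.gitignore` contents, or `None` if `entry` is already
/// listed on its own line and nothing needs to be written.
fn ensure_ignored(contents: &str, entry: &str) -> Option<String> {
    if contents.lines().any(|line| line.trim() == entry) {
        return None;
    }
    let mut updated = contents.to_string();
    // Avoid gluing the new entry onto an unterminated last line.
    if !updated.is_empty() && !updated.ends_with('\n') {
        updated.push('\n');
    }
    updated.push_str(entry);
    updated.push('\n');
    Some(updated)
}
```

A best-effort caller would read the file with `std::fs::read_to_string(path).unwrap_or_default()` and ignore any write error.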
---
### 5. ~~CLAUDE.md for flocks repo~~ ✅ DONE
Added in commit fe38377.
---
### 6. ~~Agent output capture / streaming~~ ✅ DONE
Bypassed `kap::container::exec` for the agent execution step — now runs
`devcontainer exec` directly via `std::process::Command` with stdout+stderr
redirected to `.flocks/logs/<task-id>.log`. Also eliminates the process-global
`set_current_dir` hack (passes `--workspace-folder` directly).
- `flocks logs <task-id>` shows agent output (`--tail N`, `--follow`)
- `flocks status` shows last 5 lines of log for failed agents
- `AgentState` has `log_path` field (backward-compatible with `serde(default)`)
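A sketch of the two pieces involved: sending a child's stdout and stderr to one log file, and taking the last N lines for `--tail` or the failed-agent summary. Function names are illustrative, and `--follow` is not covered here.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Run `program args...` with stdout and stderr both written to `log_path`.
/// The file is created (or truncated) before the child starts.
fn run_logged(program: &str, args: &[&str], log_path: &Path) -> io::Result<ExitStatus> {
    let log = File::create(log_path)?;
    // Both handles share one file description, so output stays interleaved.
    let log_err = log.try_clone()?;
    Command::new(program)
        .args(args)
        .stdout(Stdio::from(log))
        .stderr(Stdio::from(log_err))
        .status()
}

/// Return the last `n` lines of `text` (fewer if the text is shorter).
fn tail_lines(text: &str, n: usize) -> Vec<&str> {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].to_vec()
}
```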
---
### 7. TUI dashboard (ratatui)
**Problem:** `flocks status` prints a static snapshot. During a run, you want a
live-updating view.
**Fix:**
- Add `ratatui` and `crossterm` dependencies
- `flocks status` with no active run: static output (current behavior)
- `flocks status` during a run: live TUI showing:
- Per-container utilization bars
- Per-agent status (task id, container, elapsed time, status)
- Queue depth
- Overall progress (done/running/failed/queued)
- Poll `.flocks/state.json` every 1-2 seconds for updates
- Quit with `q` or Ctrl-C
**Files:** New `src/tui.rs`, `Cargo.toml` (add ratatui, crossterm)
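Independent of the rendering library, the dashboard needs two small computations: an overall progress tally and a per-container utilization figure. The sketch below uses plain strings rather than ratatui widgets, and every name in it (`Status`, `Progress`, `utilization_bar`) is a placeholder, not the planned `src/tui.rs` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Queued,
    Running,
    Done,
    Failed,
}

#[derive(Debug, Default, PartialEq)]
struct Progress {
    queued: usize,
    running: usize,
    done: usize,
    failed: usize,
}

/// Tally agent statuses for the "overall progress" line.
fn summarize(statuses: &[Status]) -> Progress {
    let mut p = Progress::default();
    for s in statuses {
        match s {
            Status::Queued => p.queued += 1,
            Status::Running => p.running += 1,
            Status::Done => p.done += 1,
            Status::Failed => p.failed += 1,
        }
    }
    p
}

/// Render a fixed-width text bar such as `[######...] 2/3`.
fn utilization_bar(used: usize, max: usize, width: usize) -> String {
    let filled = if max == 0 { 0 } else { used.min(max) * width / max };
    format!("[{}{}] {}/{}", "#".repeat(filled), ".".repeat(width - filled), used, max)
}
```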
---
### 8. ~~Agent timeout / stall detection + max_turns~~ ✅ DONE
Two complementary controls to prevent runaway agents:
- `agent_timeout_secs = 1800` (default 30 min, 0 = disabled) — wraps `run_agent`
in `tokio::time::timeout`. On timeout, the future is dropped and the child
process is killed. Agents get `AgentStatus::Timeout` in state.
- `max_turns = N` (optional) — maps to `claude -p --max-turns N`. Caps logical
work without a hard wall-clock cutoff.
`flocks status` shows `TIMEOUT` status with log tails.
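The two controls can be illustrated with a std-only sketch. Note the swap: the implementation described above uses `tokio::time::timeout`, whereas this example polls `Child::try_wait` against a deadline to show the same kill-on-timeout behavior without an async runtime. All function names are hypothetical.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// `agent_timeout_secs = 0` disables the timeout.
fn timeout_from_secs(agent_timeout_secs: u64) -> Option<Duration> {
    if agent_timeout_secs == 0 {
        None
    } else {
        Some(Duration::from_secs(agent_timeout_secs))
    }
}

/// Base claude arguments, with `--max-turns N` appended when configured.
fn claude_args(max_turns: Option<u32>) -> Vec<String> {
    let mut args = vec!["-p".to_string(), "--dangerously-skip-permissions".to_string()];
    if let Some(n) = max_turns {
        args.push("--max-turns".to_string());
        args.push(n.to_string());
    }
    args
}

/// Wait for `child`; kill it and return `Ok(None)` if the deadline passes.
fn wait_with_timeout(child: &mut Child, timeout: Option<Duration>) -> io::Result<Option<ExitStatus>> {
    let deadline = timeout.map(|t| Instant::now() + t);
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if let Some(d) = deadline {
            if Instant::now() >= d {
                child.kill()?;
                child.wait()?; // reap the killed process
                return Ok(None);
            }
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```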
---
### 9. ~~Prompt templates~~ ✅ DONE
`[prompt] template = "flocks-prompt.md"` now works. Template file is loaded once,
then `{{prompt}}`, `{{task.id}}`, `{{task.branch}}`, `{{task.title}}`,
`{{task.description}}` are substituted per task. Falls back to raw prompt if no
template configured.
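The substitution step can be sketched with chained `str::replace` calls. The `Task` struct here is a stand-in with only the fields the placeholders need; the real task type in task.rs may differ.

```rust
/// Stand-in for a loaded task; field names mirror the placeholders.
struct Task<'a> {
    id: &'a str,
    branch: &'a str,
    title: &'a str,
    description: &'a str,
    prompt: &'a str,
}

/// Fill a template for one task, or fall back to the raw prompt when no
/// template is configured.
fn render_prompt(template: Option<&str>, task: &Task) -> String {
    match template {
        None => task.prompt.to_string(),
        Some(t) => t
            .replace("{{task.id}}", task.id)
            .replace("{{task.branch}}", task.branch)
            .replace("{{task.title}}", task.title)
            .replace("{{task.description}}", task.description)
            // Substituted last so placeholder-like text inside the prompt
            // itself is not expanded again.
            .replace("{{prompt}}", task.prompt),
    }
}
```

One caveat of sequential replacement: a title or description that itself contains `{{prompt}}` would still be expanded by the later step.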
---
### 10. ~~`flocks land` improvements~~ ✅ DONE
Rewrote `cmd_land`:
- Runs git in the kap_project directory (was running in cwd — bug)
- Detects empty branches (no commits ahead of target) and skips them
- Shows confirmation with commit counts before landing (`--yes` / `-y` to skip)
- Continues on cherry-pick conflicts (abort + skip, not break)
- Runs `[validate] commands` after landing
- Summary shows landed count and lists conflicts with retry commands
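The skip-empty and continue-on-conflict behavior can be sketched as a small loop. This assumes the commit count comes from `git rev-list --count <target>..<branch>`, which is one common way to get it and not necessarily what `cmd_land` does. The cherry-pick itself is abstracted as a closure that returns `false` on conflict, after aborting.

```rust
/// Arguments for `git` that count commits on `branch` not yet on `target`.
fn commits_ahead_args(target: &str, branch: &str) -> Vec<String> {
    vec![
        "rev-list".to_string(),
        "--count".to_string(),
        format!("{}..{}", target, branch),
    ]
}

/// Parse the single integer that `git rev-list --count` prints.
fn parse_commit_count(stdout: &str) -> Option<u32> {
    stdout.trim().parse().ok()
}

#[derive(Debug, Default, PartialEq)]
struct LandSummary {
    landed: Vec<String>,
    skipped_empty: Vec<String>,
    conflicts: Vec<String>,
}

/// Land each `(branch, commits_ahead)` pair, skipping empty branches and
/// continuing past conflicts instead of stopping at the first one.
fn land_branches<F>(branches: &[(&str, u32)], mut cherry_pick: F) -> LandSummary
where
    F: FnMut(&str) -> bool,
{
    let mut summary = LandSummary::default();
    for &(branch, ahead) in branches {
        if ahead == 0 {
            summary.skipped_empty.push(branch.to_string());
        } else if cherry_pick(branch) {
            summary.landed.push(branch.to_string());
        } else {
            summary.conflicts.push(branch.to_string());
        }
    }
    summary
}
```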
---
### 11. ~~Structured output + status dashboard + retry~~ ✅ DONE
Three related improvements:
- **Structured output**: `claude -p --output-format json` captures `num_turns`,
`duration_ms`, `total_cost_usd`, and agent's final response. Parsed and stored
in `AgentState` for status display and retry context.
- **Status dashboard**: `flocks status` now shows grouped output (running sorted
by elapsed desc for stall detection, done with avg time, failed/timed-out with
result summaries), container utilization, aggregate usage stats, and stall warnings.
- **Retry with learning**: `flocks retry --all-failed` re-dispatches failed tasks
with prior attempt context injected into the prompt (Ralph pattern). Agents learn
from prior failures.
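Injecting prior-attempt context amounts to prepending a summary of earlier results to the original prompt. The wording and the `PriorAttempt` fields below are invented for illustration; the actual retry prompt format is not specified in these notes.

```rust
/// Hypothetical record of one earlier attempt, as stored in agent state.
struct PriorAttempt<'a> {
    attempt: u32,
    status: &'a str,
    result: &'a str,
}

/// Prefix the original prompt with what earlier attempts reported.
/// With no prior attempts, the prompt is returned unchanged.
fn build_retry_prompt(original: &str, prior: &[PriorAttempt]) -> String {
    if prior.is_empty() {
        return original.to_string();
    }
    let mut out = String::from("Previous attempts at this task failed:\n");
    for p in prior {
        out.push_str(&format!("- attempt {} ({}): {}\n", p.attempt, p.status, p.result));
    }
    out.push_str("\nTask:\n");
    out.push_str(original);
    out
}
```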
---
### 12. ~~Supervise mode~~ ✅ DONE
`flocks run --supervise` runs a mechanical retry loop: dispatch all tasks →
wait → retry failures with prior attempt context → repeat until all pass or
`max_retries` exhausted (default 3, configurable in flocks.toml or `--max-retries N`).
Extracted `build_retry_tasks()` shared between `cmd_retry` and the supervise loop.
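The control flow of the retry loop can be sketched with the dispatcher abstracted as a closure that takes a batch and returns the tasks that failed. Task ids as plain `String`s and the `supervise` signature are simplifications; attempt 0 is the initial run, so at most `1 + max_retries` rounds are dispatched.

```rust
/// Dispatch `tasks`, then re-dispatch only the failures until none remain
/// or `max_retries` retry rounds have been used. Returns the tasks that
/// are still failing at the end.
fn supervise<F>(tasks: Vec<String>, max_retries: u32, mut dispatch: F) -> Vec<String>
where
    F: FnMut(&[String], u32) -> Vec<String>,
{
    let mut pending = tasks;
    let mut attempt = 0;
    loop {
        let failed = dispatch(&pending, attempt);
        if failed.is_empty() || attempt >= max_retries {
            return failed;
        }
        attempt += 1;
        pending = failed;
    }
}
```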
---
## Future / Nice-to-have
- **Linear/GitHub/JIRA task sources** — implement the `type = "linear"` etc. source
types that query external APIs for tasks
- **`flocks init`** — scaffold a flocks.toml for a project
- **Multi-machine** — dispatch to containers on remote machines (SSH + kap)
- **Webhook notifications** — Slack/email when a run completes
- **TUI dashboard** — ratatui live-updating view during runs