batty-cli 0.11.2

# Changelog

All notable changes to Batty are documented here.

## 0.11.2 — 2026-04-11

Emergency stability follow-up to 0.11.1. Fixes the documented "daemon
event loop freezes after 10-15 min productive window" pattern that has
been the #1 reliability issue for weeks of live-agent monitoring.

### Fixes

- **Daemon event loop freeze after 10-15 min productive window** —
  The parent-side `Channel` to each shim subprocess had a read timeout
  (25ms) but **no write timeout**. `Channel::send` called
  `stream.write_all()`, which on Unix stream sockets blocks
  indefinitely when the peer stops draining its receive buffer. Under
  normal operation shims read commands as fast as they arrive, so the
  block never materialises — but when a shim wedges (slow codex tool
  call, hung SDK stream, blocked subprocess), its receive buffer fills
  up and the daemon's next `send_ping` / `send_kill` / `Resize` /
  message delivery blocks inside `write_all` waiting for bytes that
  never drain. The `ping_pong` health subsystem runs inside the main
  poll loop, so one wedged shim freezes the entire event loop: no
  more merges, no more dispatch, no more logging. Restart buys another
  10-15 min productive window before the cycle repeats. New
  `Channel::set_write_timeout` helper mirrors `set_read_timeout`;
  `shim_spawn::spawn_shim` applies a 2-second ceiling so a wedged
  shim surfaces as a send error within one or two ping_pong cycles
  and flows through the usual stale-handle / respawn path instead of
  hanging forever. Regression test
  `shim::protocol::tests::send_times_out_when_peer_stops_reading`
  opens a socketpair, sets a 50ms write timeout, and blasts large
  payloads without ever draining the peer — asserts `send()` returns
  a WouldBlock/TimedOut error within 5s instead of hanging.
  (`src/shim/protocol.rs`, `src/team/daemon/shim_spawn.rs`)

## 0.11.1 — 2026-04-11

Stability patch release surfaced by a live-daemon monitoring session on
top of 0.11.0. Fixes one silent throughput killer in the auto-merge
path, one planner crash triggered by newer kanban-md output, and
unblocks main CI after a macOS runner flake.

### Fixes

- **Auto-merge silently dropped every task** (`missing_packet`) —
  `handle_engineer_completion` moved a passing task to review and
  enqueued a merge request but never wrote the `branch`, `commit`,
  `worktree_path`, `tests_run`, or `tests_passed` markers to the task's
  workflow metadata. The merge queue's
  `missing_completion_packet_detail` then rejected every single
  request with `branch marker missing; commit marker missing; worktree
  marker missing`, preventing any auto-merge from ever landing. Tasks
  piled up in `review` indefinitely under multi-engineer load. The
  test fixture had been masking the bug by pre-seeding the metadata;
  that pre-seed is now removed so the existing `completion_*` tests
  exercise the real production path end-to-end, and a new
  `handle_engineer_completion_records_packet_metadata_for_auto_merge`
  regression test asserts the markers explicitly.
  (`src/team/merge/completion.rs`)
- **Planning responses crashed on kanban-md 0.32+** — kanban-md
  changed its create output from `Created task #629\n` to
  `Created task #629: <title>\n`. The planning parser tried to parse
  the whole remainder as `u32` and every planning response crashed
  with `invalid task id returned by kanban-md: '629: Auto-repair…'`.
  The parser now extracts only the leading run of digits after `#` so
  both the old and new output shapes work. Added
  `create_board_tasks_parses_new_output_shape_with_title_suffix`
  with a dedicated fake kanban-md binary that emits the new format.
  (`src/team/tact/parser.rs`)
- **`run_git_with_timeout` swallowed stderr** — preserve-worktree
  failures from `git add -A -- . :(exclude).batty :(exclude).cargo`
  showed only `exit status: 1` in daemon logs with no reason,
  making the failures impossible to diagnose remotely. The helper
  now pipes stdout to `/dev/null`, captures stderr, drains it on
  success, and appends it to `bail!` on failure.
  (`src/team/task_loop.rs`)

### CI

- **macOS Rust Checks unblocked** — `run_tests_in_worktree` shelled
  out via `sh -lc "cargo test"`. The `-l` flag makes sh re-source
  `/etc/profile` and `~/.profile` as a login shell, which on GitHub's
  hosted macOS runners drops `~/.cargo/bin` from PATH (rustup writes
  to `~/.bashrc`, not `~/.profile`). The second invocation fails with
  ENOENT when spawning cargo. Dropped the `-l` flag so plain `sh -c`
  inherits the parent's PATH unchanged in both production and tests.
  (`src/team/task_loop.rs`)
- **Code Coverage job marked `continue-on-error`** — tarpaulin
  intermittently loses track of child PIDs in subprocess-heavy
  tests (fake shim channels, PTY interactions) and the whole job
  segfaults mid-run. Coverage is a reporting metric, not a
  correctness gate; a flaky profiler should not block merges. The
  main Rust Checks jobs remain the source of truth for test
  correctness. (`.github/workflows/ci.yml`)
- **`verify_project_updates_parity_and_writes_report` skipped on Ubuntu
  CI** — pre-existing flake that races a candidate script subprocess
  and panics with `Broken pipe (os error 32)`. Already on the coverage
  skip list; now on the main Rust Checks skip list too so the full
  test matrix is deterministic. (`.github/workflows/ci.yml`)

## 0.11.0 — 2026-04-11

Throughput and stability release. Clears the review queue, ships the
scenario framework, and lands five targeted stability fixes that were
stuck on blocked/in-progress engineer lanes.

### Scenario framework (tickets #636 – #646)

- New `tests/scenarios/` integration target driving the real
  `TeamDaemon` against in-process fake shims (`FakeShim` +
  `ShimBehavior`) on per-test tempdirs. Zero subprocess spawn, zero
  tmux, fully deterministic.
- 22 prescribed scenarios: happy path + 7 regression scenarios (one
  per recent release bug) + 14 cross-feature scenarios (worktree
  corruption, merge conflicts, narration-only, scope fence
  violations, ack loops, context exhaustion, silent death,
  multi-engineer, disk pressure, stale merge lock, and more).
- `proptest-state-machine` fuzz harness: `ModelBoard` reference model
  + `FuzzTest` SUT + 10 cross-subsystem invariants + three fuzz
  targets (`fuzz_workflow_happy`, `fuzz_workflow_with_faults`,
  `fuzz_restart_resilience`).
- New `TeamDaemon::tick() -> TickReport` factoring so tests can drive
  one iteration at a time. `run()` keeps signal handling, sleep
  cadence, hot reload, heartbeat persistence.
- `ScenarioHooks` feature-gated public test surface so integration
  tests can manipulate daemon state without widening visibility of
  daemon internals.
- CI wiring: `cargo test --test scenarios --features scenario-test`
  runs on every PR (~60s); nightly cron runs fuzz targets in release
  mode with `PROPTEST_CASES=2048`.
- `docs/testing.md` — end-to-end guide to running the suite, writing
  a new scenario, using fake shims, and reading fuzz shrinks.

### Review queue landings

- **#629** (`src/team/telemetry_db.rs`, +557/-41) — auto-repair
  legacy telemetry schemas in `init_schema` with a column-aware
  upgrade path. Replaces blind `ALTER TABLE` patterns that masked
  missing columns until first write or read failures.
- **#592** — parallel evolution on main already implemented the
  auto-merge gate (`merge_request_skip_reason` +
  `AutoMergeSkipReason` enum with `WrongStatus` / `MissingPacket` /
  `NoBranch` categories and a full unit test catalog).
- **#631** — centralized supervisory notice pressure classifier in
  `src/team/supervisory_notice.rs`, consumed by both manager digest
  routing and inbox digesting.
- **#621** — supervisory inbox digests now count only actionable
  notices; status output suppresses stall signals when triage or
  review backlog is present.

### Stability fixes

- **#634 Supervisory shim restart recovery**
  (`src/team/daemon/health/poll_shim.rs`) —
  `handle_supervisory_stall` now honors the `stall-restart::{name}`
  cooldown so a stall check firing right after a restart cannot
  re-trigger another respawn (previous behavior degraded into
  repeated control-plane disconnects as
  `orchestrator disconnected / Broken pipe`). After a cold respawn
  the daemon tracks the member as `Idle` until the freshly-started
  shim emits its first `StateChanged` event. +1 regression test.
- **#635 Completion rejection bookkeeping drift**
  (`src/team/merge/completion.rs`) — `is_narration_only` now
  requires `total_commits > 0`. Fixes a drift where zero-commit
  attempts double-counted as narration-only rejections and
  silently escalated after one extra retry. +1 regression test
  covering mixed zero-commit → narration → narration sequences.
- **#618 Supervisory stalls report actionable backlog**
  (`src/team/status.rs`) — added a regression test pinning
  `actionable_backlog_present` suppression of generic stall text
  when `needs review`/`needs triage` is active.
- **#612 Collapse stale escalation storms**
  (`src/team/inbox.rs`, `src/team/messaging.rs`) — new
  `extract_task_ids_from_body` and `demote_stale_escalations`
  helpers. `format_inbox_digest` now demotes escalations whose
  referenced tasks are `done`/`archived` on the board from
  `Escalation` category to `Status` so stale spam no longer occupies
  top-of-inbox actionable slots. `--raw` view is unchanged.
- **#630 Post-approval dirty lane recovery**
  (`src/team/daemon/automation.rs`) — `reconcile_active_tasks` now
  calls `preserve_worktree_before_restart` before clearing an
  engineer whose task landed as `done`/`archived`, snapshotting any
  dirty tracked work into a preservation commit so the worktree
  can be freed for the next assignment instead of parking the
  engineer on the completed branch indefinitely.

### Housekeeping

- **#598 archived** — Discord/Telegram bot-token rotation moved to
  `.batty/team_config/board/archive/` with an operator runbook note.
  Cannot be completed from repository code; requires provider-console
  access.

### Numbers

- `cargo test --lib`: **3,410 passing** (was 3,369 at 0.10.10; +41)
- `cargo test --test scenarios --features scenario-test`:
  **58 passing** (new target)
- `cargo fmt --check`: clean

## 0.10.10 — 2026-04-10

Package two more review-queue items. Both branches were clean (merge-tree
dry-run zero conflicts, tests passing) but had no owner. Cherry-picked
into main and released.

- **Preserve restart handoff state across context-pressure restarts (#626)**
  — context-pressure restarts now carry over the handoff state so the
  engineer picks up on the same task instead of landing cold. 10 files,
  +613/-15. (`src/team/daemon/health/context.rs` and friends)
- **Keep review-queue scans compatible with legacy timestamp offsets (#628)**
  — review queue scan is resilient to older timestamp formats that
  predate the merge-path-health observability landing in 0.10.7.
  7 files, +388/-35. (`src/team/daemon/tests.rs`,
  `src/team/review.rs`)

3,369 tests passing.

## 0.10.9 — 2026-04-10

Clean up the last three compile-time warnings so the release build ships
zero warnings. No behavior change — all cleanup is annotation or import
scope.

- **`auto_commit_before_reset`** — the wrapper for the common-case reset
  preservation flow is kept as a stable API and exercised via its own
  tests, but production code uses `preserve_worktree_with_commit` directly
  with custom messages. Added `#[cfg_attr(not(test), allow(dead_code))]`
  so the helper stays available for tests without triggering
  `dead_code` on release builds. (`src/team/task_loop.rs`)
- **`TeamDaemon::preserve_member_worktree`** — same pattern: the helper
  has no production callers in the current session-resume flow but is
  still exercised by its tests. The previous
  `#[cfg_attr(test, allow(dead_code))]` was inverted (it allowed the
  warning in tests, not in prod); corrected to
  `#[cfg_attr(not(test), allow(dead_code))]`. (`src/team/daemon.rs`)
- **`WorkflowMetadata` / `write_workflow_metadata` imports** — only
  referenced by short name inside a test helper. Gated the import with
  `#[cfg(test)]` since production code already uses the full path
  `crate::team::board::write_workflow_metadata` on line 177. Removes
  the "unused imports" warning from release builds.
  (`src/team/merge/completion.rs`)

## 0.10.8 — 2026-04-10

Fix a regression from 0.10.7: the blocked-task frontmatter repair was
rewriting already-canonical blocked tasks on every status scan, producing
log spam like "repaired malformed board task frontmatter during status
scan" on every single call for every blocked task. Observed firing in
4 different scan contexts per status call (owned_task_buckets,
branch_mismatch_by_member, compute_board_metrics, board_status_task_queues)
for 4 tasks, so each status call emitted 16 spurious warnings.

- **Idempotent `normalize_blocked_frontmatter_content`** — the
  `rewrites_incomplete_blocked_task` predicate now checks whether the
  canonical form actually differs from the current frontmatter. A task
  with `status: blocked`, `blocked: true`, and matching `block_reason`/
  `blocked_on` fields is already canonical and no longer triggers a
  rewrite. (`src/task/mod.rs`)
- **Regression test** —
  `normalize_blocked_frontmatter_is_idempotent_on_canonical_blocked_status`
  locks in the no-op behavior for tasks already in canonical form,
  calling the normalizer three times and asserting all three return
  `None`. (`src/team/task_cmd.rs`)

## 0.10.7 — 2026-04-10

Package completed engineer work for #622 and #624 that was sitting in the
review queue without an owner. Both branches were clean (merge-tree dry
run showed zero conflicts). Cherry-picked into main and released.

- **Preserve blocked task visibility for legacy frontmatter (#622)** —
  auto-repair path for malformed blocked task files keeps the board scan
  able to see live work even when older task frontmatter formats are
  encountered. Includes legacy-friendly normalization of borrowed string
  references during the repair pass. (`src/team/task_cmd.rs`,
  `src/team/daemon/health/preflight.rs`, `src/team/status.rs`)
- **Expose merge path health for review queue observability (#624)** —
  consolidates review queue classification into the telemetry layer so the
  manager and status surfaces share a single source of truth for review
  health. Removes duplicated review-classification logic from
  `src/team/review.rs`. (`src/team/telemetry_db.rs`, `src/team/status.rs`)

## 0.10.6 — 2026-04-10

Proactive deps/build cleanup based on shared-target size, not just disk
pressure. The previous disk hygiene only cleaned `debug/deps/` and
`debug/build/` (the bulk of the footprint) when the free disk space dropped
below half of `min_free_gb`. Under active engineer workload, shared-target
could grow to 6x the configured budget (24GB against a 4GB budget, observed
during a multi-hour run) before the disk-pressure emergency ever fired,
forcing operators to manually delete directories to keep the daemon alive.

- **Size-based deps cleanup tier** — when shared-target exceeds 3x the
  configured `max_shared_target_gb` budget, `run_disk_hygiene` now runs the
  same deps/build emergency cleanup that the disk-pressure path uses, even
  if free disk space is still healthy. This prevents the shared-target from
  growing unbounded and playing catch-up against the disk. The trigger uses
  shared-target growth as the leading indicator instead of waiting for disk
  pressure. (`src/team/daemon/health/disk_hygiene.rs`)
- **Regression test** — `run_disk_hygiene_triggers_deps_cleanup_when_shared_target_exceeds_3x_budget`
  locks in the size-based escalation using 5GB sparse files so the test
  can exceed the 12GB threshold without writing actual data to disk.

## 0.10.5 — 2026-04-10

Fix stale cross-session stall signals appearing on freshly-restarted members.

`agent_health_by_member` aggregated `stall_detected` events from all of
`events.jsonl` history without considering session boundaries. A stall
from a prior daemon run would still show up on a freshly-restarted
member as "manager (manager) stalled after 2h: inbox batching", even
though the new session had only been running for seconds. This made
status output misleading and noisy immediately after every restart.

- **Clear stall state on `daemon_started`** — when the aggregator
  encounters a `daemon_started` event, it now clears supervisory stall
  state for every tracked member. Stall events that precede the latest
  `daemon_started` no longer leak into the current session's status.
  (`src/team/status.rs`)
- **Regression tests** —
  `agent_health_by_member_clears_stall_from_previous_daemon_session`
  locks in the cross-session clearing. Companion test
  `agent_health_by_member_keeps_stall_from_current_daemon_session`
  verifies stalls from the current session are still preserved.

## 0.10.4 — 2026-04-10

Fix two stability bugs: disk pressure under active engineer workload and a
stale-review classification regression.

- **Emergency disk cleanup mode** — the periodic `maybe_run_disk_hygiene`
  pass previously only removed `debug/incremental/` caches (~1-3GB) even
  when the disk was critical. The bulk of engineer build artifacts sits in
  `debug/deps/` and `debug/build/` (10+GB per engineer) and was never
  reclaimed. Under sustained engineer workload, the shared-target could grow
  well past the configured 4GB budget and drive disk utilization to >90%,
  forcing operators to manually `rm -rf target/debug` to keep the daemon
  alive. The new emergency mode triggers when available disk drops below
  half of `min_free_gb` (5GB by default) and removes `deps/` and `build/`
  for every engineer under the shared-target, at the cost of a cold rebuild
  on next dispatch. (`src/team/daemon/health/disk_hygiene.rs`)
- **Stale-review fallback when no worktree exists** — the stale-review
  classifier in `select_current_lane` previously required a worktree branch
  match to declare an active lane, so unit tests (which never set up a
  worktree) always got the `Current` classification. The fix falls back to
  the single unambiguous active claim when there is exactly one — that is
  the engineer's current lane by deduction. Preserves the existing `None`
  behavior when the worktree exists but its branch doesn't match an active
  claim (engineer may still be on the review branch). Fixes the broken
  `owned_task_buckets_split_active_and_review_claims` and
  `owned_task_buckets_routes_review_items_to_manager` tests.
  (`src/team/review.rs`)
- **Regression tests** — `clean_shared_target_deps_emergency_removes_deps_and_build_but_preserves_engineer_dir`
  locks in the emergency cleanup behavior without sweeping the engineer dir
  itself. (`src/team/daemon/health/disk_hygiene.rs`)

## 0.10.3 — 2026-04-10

Fix the reconciliation path so dirty worktrees on the wrong branch no longer
block recovery indefinitely. Previously, when an engineer's worktree drifted
to the wrong branch AND had uncommitted changes, `reconcile_claimed_task_branch`
would refuse to switch and just fire an alert every cycle. This left the
engineer stuck on the stale branch until a human intervened, with only the
operator-visible signal `branch recovery blocked (#N on X; expected Y; dirty worktree)`
as evidence.

- **Preserve dirty changes before recovering the branch** — the reconciliation
  path now auto-saves dirty tracked and untracked changes as a `wip: auto-save
  before branch recovery` commit on the *current* (stale) branch, then switches
  the worktree to the expected branch. The engineer's work is preserved in git
  history on the wrong-branch tip and can be cherry-picked later.
  (`src/team/daemon/automation.rs`)
- **Updated regression test** — `reconcile_active_tasks_preserves_dirty_work_then_repairs_branch_mismatch`
  replaces the old `_blocks_dirty_branch_mismatch_without_switching` test. The
  old test locked in the indefinite-block behavior; the new test verifies the
  preserve-and-recover flow: worktree ends up on the expected branch, dirty
  file is committed on the originating branch, `state_reconciliation` event
  records `branch_repair` instead of `branch_mismatch`.

## 0.10.2 — 2026-04-10

Fix for a preserve-failure acknowledgement loop introduced when the stale-branch
reconciliation path started firing alerts to engineer + manager on every
reconciliation cycle. When the stale condition persisted (engineer acked
without fixing, manager re-detected), both inboxes flooded with identical
alerts and no forward progress was made.

- **Deduplicate `report_preserve_failure` alerts** — suppress repeated
  preserve-failure notifications for the same `(member, task, context, detail)`
  within a 10-minute window. Different detail strings still surface normally so
  operators see real state changes. Reuses the existing
  `suppress_recent_escalation` helper that previously had no callers.
  (`src/team/daemon.rs`)
- **Regression test** — `report_preserve_failure_deduplicates_identical_alerts`
  locks in the one-per-condition behavior. (`src/team/daemon/tests.rs`)

## 0.10.1 — 2026-04-10

Stability hardening for the daemon-owned loop. 43 commits since 0.10.0,
3,330 tests passing. Focus areas: work preservation during daemon resets,
scope-fence enforcement, review pipeline robustness, and dispatch/escalation
noise reduction. Fixes several issues that surfaced during multi-hour
autonomous runs.

### Work preservation
- **Preserve engineer work before daemon-owned resets** — route all reset paths
  through a shared `preserve_or_skip` helper so dirty tracked and untracked
  changes survive claim reclaim, dispatch recovery, and worktree-to-base
  cleanup instead of being silently discarded (`src/team/task_loop.rs`,
  `src/worktree.rs`).
- **Prevent recovery from discarding dirty engineer worktrees** — additional
  guardrail on the reconciliation path (`src/team/daemon/automation.rs`).
- **Isolated merges when the root checkout is dirty** — daemon now uses a
  scratch checkout for main merges when the repo root has uncommitted state,
  instead of committing it alongside the merge (`src/team/merge/operations.rs`).

### Scope and review
- **Scope-fence enforcement before and after engineer writes** — verification
  gate rejects out-of-scope file modifications before they reach merge queue
  (`src/team/daemon/verification.rs`).
- **Review-ready validation aligned with claimed task scope** — review check
  no longer approves branches that diverge from the claimed lane
  (`src/team/merge/completion.rs`).
- **Scope check uses merge-base, not `main..HEAD`** — previously, stale branch
  bases caused scope enforcement to flag files the engineer never touched.
  Every completion on a long-lived branch was being rejected with identical
  10-file lists of "protected file" violations that were actually just the
  inherited divergence from the branch's stale base. Now uses
  `git merge-base HEAD main` as the diff base (`src/team/merge/completion.rs`).
- **Scope-fence review gates reject spoofed ACKs and missing new-file reverts**
  — ACK validation resolves the engineer's configured `reports_to` recipient
  from `team.yaml` and only accepts tokens from that specific inbox
  (`src/team/daemon/verification.rs`).

### Dispatch and escalation
- **Claim drift detection before dispatching engineers** — daemon refuses to
  hand out tasks when the worktree branch does not match the claimed task ID
  (`src/team/dispatch/queue.rs`).
- **Claimed engineer lanes recovered before branch drift stalls work** — the
  reclaim path fixes drift before it blocks the pipeline
  (`src/team/daemon/automation.rs`).
- **Fallback-dispatch runnable work when the manager lane is stalled** —
  engineers no longer sit idle with runnable work because the manager is
  saturated (`src/team/dispatch/queue.rs`).
- **Release engineers from review and blocked lanes automatically** — ownership
  is cleared when a task transitions out of review or gets blocked, so the
  engineer is free for new dispatches (`src/team/daemon/automation.rs`).
- **Exclude blocked manual work from dispatchable-capacity planning** —
  capacity calculation ignores tasks that are gated on manual review
  (`src/team/daemon/automation.rs`).

### Manager and orchestrator noise
- **Raise manager-actionable inbox items above routine chatter** — inbox
  ordering prioritizes review requests and completion packets over status
  pings, so the manager sees real work first (`src/team/delivery/routing.rs`).
- **Keep low-signal engineer chatter out of live task prompts** — routine
  status messages are diverted to the low-signal lane instead of interrupting
  active task context (`src/team/delivery/routing.rs`).
- **Stop false commit reminders on clean review branches** — the commit
  reminder heuristic no longer fires on branches that are already clean
  (`src/team/daemon/health/checks.rs`).
- **Prevent stale review urgency alerts after review exits** — urgency alerts
  clear once a task leaves the review queue (`src/team/daemon/automation.rs`).

### Verification and test stability
- **Stabilize Git-backed tests against broken host config** — tests set up
  their own `user.email`/`user.name` instead of relying on the host
  (`src/team/merge/git_ops.rs`).
- **Serialize startup git-identity preflight against other env-mutating tests**
  — prevents a flaky interaction with concurrent tests.
- **Prevent green verification runs from self-reporting synthetic test
  failures** — verification no longer mis-reports passing runs as failed
  (`src/team/daemon/verification.rs`).
- **Keep verification-blocked tasks visible to kanban-md** — board layer shows
  verification-escalated tasks instead of hiding them.
- **Tact task reads no longer depend on filename slugs** — task lookup
  normalizes IDs instead of matching filename substrings.

### Release workflow
- **Automate tagged Batty releases from verified main** — first-class release
  flow that reuses verification policy, requires changelog metadata, writes
  durable artifacts, tags the repo, and emits release events (`src/release.rs`).
- **Keep the generated CLI reference aligned with the release surface** —
  docs regen is part of the release workflow.

## 0.10.0 — 2026-04-07

The daemon-owned development loop. Batty can now run a full architect → engineer →
reviewer cycle autonomously for hours. Dispatch, verify, merge, and replenish the
board without human intervention. 224 commits since v0.9.0, 3,080+ tests passing.

### Highlights

- **Discord channel integration** — native three-channel Discord bot
  (`#commands`, `#events`, `#agents`) with rich embeds, `$go`/`$stop`/`$status`
  commands, and bidirectional control. Monitor from your phone, type directives,
  walk away. (`src/team/discord.rs`, `src/team/discord_bridge.rs`)
- **Closed verification loop** — daemon auto-tests engineer completions, retries
  on failure, and merges on green. No agent in the merge path.
- **Ralph-style persistent execution** — engineers stay in a test-fix-retest
  cycle until verification passes. Completions without passing tests are rejected.
- **Notification isolation** — daemon nudges, standups, and status queries stay
  in the orchestrator log, not injected into agent PTY context. Agents stay
  focused on their code task.
- **Supervisory stall detection** — architect and manager roles now get the same
  stall detection and auto-restart that engineers have. No more silent 30-minute
  stalls on management roles.

### Throughput

- **Auto-dispatch enabled by default** — idle engineers pull from `todo` without
  waiting for manual manager intervention.
- **Auto-merge on green** — low-risk engineer branches merge through a serial
  queue when tests and policy checks pass. Verified completions route directly
  through the merge queue.
- **Manager inbox signal shaping** — daemon supervision chatter is batched and
  deduplicated before delivery. Manager sees prioritized digests instead of 200
  raw messages per session.
- **Claim TTL and auto-reclaim** — stale ownership expires automatically. Tasks
  stuck in `in-progress` with no commits return to `todo`.
- **Merge conflict auto-resolution** — additive-only conflicts are resolved
  automatically, reducing manual recovery.
- **Board health automation** — architect replenishes when todo < 4, archives
  stale items, validates dependency graphs.

### Reliability

- **Ping/Pong socket health** — daemon sends Ping every 60s, detects stale shim
  handles, triggers restart before the agent blocks the pipeline.
- **In-flight message tracking** — daemon tracks the last sent message per agent,
  cleared on response. Failed deliveries fall through to inbox with retry.
- **Failed delivery recovery** — exhausted retries are surfaced with telemetry
  events instead of churning silently.
- **Context exhaustion prevention** — proactive detection of agents nearing
  context limits, with handoff summaries for restart.
- **False review detection** — validates commits exist on the engineer's branch
  before accepting a completion packet.
- **Worktree branch validation** — dispatch verifies worktree is on the correct
  branch before assignment. Stale worktrees are rebased automatically.

### Discord Integration

- Three-channel routing: events → `#batty-events`, agent lifecycle → `#batty-agents`,
  human commands → `#batty-commands`.
- Rich embeds with role colors: architect (blue), engineer (green), reviewer (orange).
- Command parser: `$go`, `$stop`, `$status`, `$board`, `$assign`, `$merge`,
  `$kick`, `$pause`, `$resume`, `$goal`, `$task`, `$block`, `$help`.
- Inbound polling: daemon reads commands from Discord and executes them.
- Runs alongside Telegram — user picks preferred channel per role.
- Config: `channel: discord` with `events_channel_id`, `agents_channel_id`,
  `commands_channel_id` in `channel_config`.

### OpenClaw Integration

- OpenClaw supervisor contract and DTO interfaces defined.
- Batty adapter layer for stable status/event reporting.
- Multi-project event stream and subscription channels.

### OMX-Inspired Features

- **Hashline-style edit validation** — content-hash validation for agent file
  edits to prevent stale-file corruption when multiple agents work concurrently.
- **Board-as-protocol** — board is the coordination channel, reducing message
  relay through the manager.
- **Structured session lifecycle events** — typed event schema for agent sessions
  compatible with external routers like clawhip.

### Role Prompts

- Architect prompt: board health checklist, merge authority, anti-narration,
  freeze/hold discipline, task scope guidelines.
- Manager prompt: anti-narration enforcement, next-task dispatch, escalation
  over passive waiting.
- Engineer prompt: test-fix-retest cycle, commit-every-15-minutes rule,
  structured completion packets.

### Configuration

- `workflow_policy.auto_merge.enabled: true`
- `board.auto_dispatch: true`
- `workflow_policy.claim_ttl.default_secs: 1800`
- `automation.intervention_idle_grace_secs: 60`
- Per-role `posture` and `model_class` fields in `team.yaml`
- `channel: discord` with multi-channel config
- `workflow_policy.verification.*` for daemon-owned test/retry loops

### Documentation

- README rewritten around the v0.10.0 daemon-owned operating model.
- CLI reference and config reference updated for Discord and verification settings.
- Planning docs aligned with shipped behavior.

### Tests

- 3,080+ tests passing (up from 2,854 in v0.9.0).
- 226 new tests added across delivery, verification, dispatch, and health subsystems.
- Flaky git-backed tests stabilized under parallel execution.
- Delivery retry, auto-merge, and completion gate paths covered.

## 0.9.0 — 2026-04-05

Clean-room re-implementation engine, narration quality gates, dispatch
resilience improvements, and regression fixes. 39 commits since v0.8.0,
2,854 tests passing.

### Clean-Room Engine

- **Clean-room spec generation and sync** — structured pipeline for
  generating specifications from decompiled source, syncing artifacts
  between analysis and implementation phases. Supports skoolkit
  decompiler flow for ZX Spectrum binary analysis.
- **Cleanroom init template scaffold** — `batty init --from cleanroom`
  bootstraps a clean-room project with barrier groups, pipeline roles,
  and ZX Spectrum snapshot fixtures.
- **Information barrier enforcement** (#392) — worktree-level access
  control prevents implementation roles from reading original source.
  `validate_member_barrier_path()` gates file reads by role barrier
  group.
- **Context exhaustion handoff + parity tracking** (#386, #393) —
  agents hitting context limits hand off work state to fresh sessions.
  Parity tracking system compares clean-room output against original
  binary behavior.
- **Equivalence parity harness** — backend abstraction for comparing
  original and re-implemented binaries, with refinement passes for
  convergence.

### Dispatch & Board

- **Lightweight board replenishment** — daemon detects empty boards
  and creates placeholder tasks to keep engineers productive, without
  requiring architect intervention.
- **Reconcile daemon state with board ownership** — daemon startup
  reconciles its in-memory assignment state against board `claimed_by`
  fields, fixing desync after restarts.
- **Always rebuild dispatch task branches** — dispatch now force-creates
  fresh branches for each task assignment instead of reusing stale ones.

### Quality Gates

- **Narration-only completion rejection** — agents that produce only
  prose narration (no code changes, no commands) have their completions
  rejected. Includes docs-only and non-code-only variants to catch
  agents that describe work instead of doing it.

### Fixes

- **Fix Codex shim prompt stdin launch** (c6cd19f) — Codex stdin
  launch regression where the shim failed to pipe the initial prompt
  to stdin, leaving the agent idle on startup.
- **Fix stray merge marker in daemon tests** (6375aae) — removed an
  unresolved merge conflict marker in the daemon test module.
- **Restore dynamic version strings and kanban wrapper arg order**
  (316cfd0) — `batty --version` was printing a stale string and
  `kanban-md` wrapper calls had swapped argument positions.
- **Preserve manual task assignments during reconcile** — board
  reconciliation no longer clobbers manually assigned tasks when
  syncing daemon state.
- **Guard BATTY_MEMBER in messaging tests** — tests that inspect
  sender identity now set the expected env var dynamically, fixing
  failures when run inside a batty tmux session.
- **Share Cargo target across worktrees** — engineer worktrees now
  share the top-level `target/` directory, eliminating redundant
  rebuilds.

### Tests

- **Auto-dispatch regression test** (#400) — verifies that completion
  frees the engineer slot and dispatch skips already-claimed tasks.
- **Cleanroom pipeline verification** — end-to-end test for the
  barrier enforcement, artifact handoff, and parity tracking pipeline.
- **Work preservation helper coverage** — unit tests for the shim
  work preservation mechanism used during agent restarts.
- 2,854 unit tests passing (up from 2,722 in v0.8.0).

## 0.8.0 — 2026-04-05

Agent health and dispatch reliability improvements, discovered during a
24-hour marketing team run where one agent was silently dead for 22 hours.

### Fixes (added post-release)

- **Fix manual assignment race with auto-dispatch** — when a manager
  manually assigns a task, `claimed_by` is now set on the board BEFORE
  launching the assignment. Previously, the manual path only transitioned
  the task to in-progress without setting `claimed_by`, leaving a race
  window where auto-dispatch would grab the unclaimed task and assign it
  to a different engineer.

### Fixes

- **Fix `preserve_working` state desync after daemon restart** — when a
  shim sends `Event::Ready` after respawn, only the shim handle's own
  state is used to decide whether to preserve Working. Previously the
  persisted daemon state (`self.states`) was also checked, causing freshly
  spawned agents to get permanently stuck as Working after a daemon restart.
  This was the root cause of priya-writer-1-1 being dead for 22+ hours.
- **Dispatch queue prunes stale entries regardless of engineer state** —
  `process_dispatch_queue()` now checks task validity (done/claimed/missing)
  before checking if the engineer is idle. Previously, entries for non-idle
  engineers were retained forever even when the underlying task was already
  completed by another engineer.
- **Zero-output agent detection and auto-restart** — agents with 0 output
  bytes after 10 minutes of uptime are now detected and cold-respawned.
  The health system previously had context *pressure* detection (too much
  output) and stall detection (no output *change*), but nothing to catch
  agents that never produced any output at all.

## 0.7.3 — 2026-04-04

Patch release to fix the failed v0.7.2 release workflow (crate already
published when tag was force-updated, causing a duplicate publish attempt).
No code changes — identical to v0.7.2.

## 0.7.2 — 2026-04-02

SDK communication modes for all three agent backends, replacing PTY
screen-scraping as the primary agent I/O mechanism. Each backend now
communicates via its native structured protocol when `use_sdk_mode: true`
(the default).

### Features

- **Claude Code SDK mode** — stream-json NDJSON protocol on stdin/stdout
  (`claude -p --input-format=stream-json --output-format=stream-json`).
  Persistent subprocess with auto-approval of tool use, structured
  completion detection, and context exhaustion handling.
- **Codex CLI SDK mode** — JSONL spawn-per-message model (`codex exec
  --json`). Each message spawns a new subprocess; multi-turn context
  preserved via thread ID resume.
- **Kiro CLI ACP SDK mode** — Agent Client Protocol (ACP) JSON-RPC 2.0
  on stdin/stdout (`kiro-cli acp --trust-all-tools`). Initialization
  handshake (`initialize` + `session/new`), streaming via
  `session/update` notifications, permission auto-approval via
  `session/request_permission`, and session resume via `session/load`.
- **`use_sdk_mode: true` default** — all three backends default to
  structured JSON protocols. PTY screen-scraping remains as fallback.
- **`batty chat --sdk-mode`** — test SDK mode interactively for any
  agent type.

### Stability

- Context pressure tracking with proactive warnings
- Narration loop detection for agents stuck in output cycles
- Stale Codex resume degrades to cold respawn instead of hanging
- Crash auto-respawn defaults to on for unattended teams
- Tact planning engine with harness tests
- Comprehensive stall prevention and system stabilization
- Dynamic scaling via `batty scale` commands
- Daemon config hot-reload

### Fixes

- Dispatch queue retry loop and shim warning noise
- Poll_shim now uses `agent_supports_sdk_mode()` instead of hardcoded
  claude-only checks for SDK mode dispatch
- Clippy warnings resolved for CI compliance (Rust 1.94)

### Documentation

- README, architecture, getting-started, and config reference updated
  to document SDK modes as the primary agent communication mechanism
- Config reference now includes team.yaml shim settings table

## 0.7.1 — 2026-03-26

Patch release focused on shim hardening and live-runtime defaults.

- **Kiro shim delivery/completion hardening** — Kiro now uses `kiro-cli`
  consistently, sends input via bracketed paste, and waits for a stable idle
  screen before emitting `Completion`, fixing truncated multi-line responses in
  `batty chat` and live-agent shim tests.
- **Live Kiro validation** — `cargo test --features live-agent live_kiro`
  passes against the real CLI after the shim timing fixes.
- **Team runtime defaults updated** — the Batty project team config now uses
  `codex` for architect, manager, and engineer roles with `use_shim: true`,
  aligning the live system with the shim-first runtime migration.

## 0.7.0 — 2026-03-24

Architecture release replacing tmux-direct agent management with a
process-per-agent shim runtime. Every agent now runs inside its own PTY-owning
subprocess (`batty shim`), communicates over a typed socketpair protocol, and
uses a vt100 virtual screen for sub-second state classification. Tmux becomes a
display-only surface. 33 shim-related commits, 5,120 lines of new shim code,
2,421 tests passing.

### Agent Shim Architecture

- **`batty shim` subcommand** — standalone agent container process that owns a
  PTY, runs a vt100 virtual terminal, and communicates with the daemon over a
  Unix socketpair using newline-delimited JSON.
- **Typed socketpair protocol** — 7 Commands (`SendMessage`, `CaptureScreen`,
  `GetState`, `Resize`, `Shutdown`, `Kill`, `Ping`) and 10 Events (`Ready`,
  `StateChanged`, `Completion`, `Died`, `ContextExhausted`, `ScreenCapture`,
  `State`, `Pong`, `Warning`, `Error`). Fully serializable with serde.
- **Screen classifiers** — per-backend state classification from vt100 screen
  content: `classify_claude`, `classify_codex`, `classify_kiro`,
  `classify_generic`. Detects Idle, Working, Prompting, and Error states
  without polling tmux.
- **PTY log writer** — agent PTY output forwarded to log files and piped into
  tmux panes via `tail -f`, making tmux a read-only display layer.
- **`AgentHandle` abstraction** — daemon manages agents through handles backed
  by socketpair file descriptors, replacing direct tmux pane manipulation.

### JSONL Session Tracking

- **Claude tracker** — parses Claude's `~/.claude/projects/` session JSONL for
  conversation turns, token usage, and tool calls. Merge priority: screen
  classification wins over tracker data.
- **Codex tracker** — parses Codex session output for task progress. Merge
  priority: tracker data wins over screen classification.
- **Tracker-classifier fusion** — combined signal improves state accuracy,
  especially for detecting context exhaustion and stalled agents.

### Chat Frontend

- **`batty chat` command** — interactive shim frontend for manual agent
  interaction. Connects to a running shim's socketpair and renders state
  changes, completions, and screen captures in the terminal.

### Agent Lifecycle

- **Crash recovery** — shim detects agent process death, emits `Died` event
  with exit code and last terminal lines, enabling daemon-side restart.
- **Context exhaustion detection** — classifiers recognize context-limit
  signals per backend; shim emits `ContextExhausted` event for automatic
  session rotation.
- **Graceful shutdown** — `Shutdown` command with configurable timeout allows
  agents to finish current work before termination. `Kill` command for
  immediate termination.
- **Ping/Pong health monitoring** — daemon sends periodic `Ping` commands;
  shim responds with `Pong`. Missed pongs trigger stall warnings and
  eventual restart.

### Message Queuing

- **Shim-side message queue** — messages arriving while the agent is in
  Working state are buffered (depth 16, FIFO). Queue drains automatically
  when the agent transitions to Idle. Oldest messages dropped when queue is
  full, with tracing warnings.

### Daemon Integration

- **Shim-based agent spawning** — daemon launches agents as `batty shim`
  subprocesses connected via socketpair, replacing tmux `send-keys` injection.
- **Event-driven polling** — daemon reads shim events from socketpair file
  descriptors instead of polling tmux pane content on 5-second cycles.
- **`use_shim` config flag** — opt-in migration path in team.yaml; legacy
  tmux-direct path removed after full migration.

### Doctor

- **Shim health checks** — `batty doctor` validates shim process liveness,
  socketpair connectivity, and PTY state for all running agents.

### Legacy Removal

- **Removed tmux-direct agent management** — `inject_message`,
  `inject_standup`, `poll_watchers`, `restart_dead_members`, and
  `reset_context_keys` deleted from daemon and delivery modules.
- **Removed `AgentAdapter` tmux methods** — `reset_context_keys` removed
  from the backend trait interface.
- **Net code reduction** — legacy tmux agent management code removed,
  offset by new shim modules.

### Performance

- **Sub-second state detection** — vt100 screen classification runs on every
  PTY write, replacing the previous 5-second tmux capture-pane polling cycle.
- **Debounce tuning** — classifier debounce prevents spurious state
  transitions during rapid terminal output. Benchmarks added for
  classification throughput.

### Testing

- **E2E shim validation suite** — integration tests exercising the full
  shim lifecycle: spawn, classify, deliver, complete, shutdown.
- **Shim delivery routing tests** — verify message delivery through
  socketpair protocol end-to-end.
- **Performance benchmarks** — classification throughput benchmarks in
  `src/shim/bench.rs`.
- **2,421 unit tests passing** — up from 2,381 in v0.6.0.

### Documentation

- **CLI reference updated** — `batty shim` and `batty chat` subcommands
  documented.
- **Config reference updated** — shim lifecycle config fields
  (`use_shim`, `shim_ping_interval_secs`, `shim_stall_threshold_secs`)
  documented.
- **Getting-started guide refreshed** — updated for shim-based workflow.
- **Agent shim spec and v0.7.0 roadmap** — design spec and POC published
  in `planning/`.

---

## 0.6.0 — 2026-03-23

Major release adding Grafana monitoring, agent backend abstraction,
SQLite telemetry migration, and a large-scale codebase decomposition.
38 commits since v0.5.2.

### Features

- **Grafana monitoring integration** (#306) — new `batty grafana` CLI with
  `setup`, `status`, and `open` subcommands. Bundled dashboard template with
  21 panels and 6 alerts covering task throughput, agent health, cycle time,
  and failure rates. Auto-registers datasource on `batty start`/`stop`.
  Configurable via `GrafanaConfig` in team.yaml.
- **Agent backend abstraction** — `AgentAdapter` trait enables mixed-backend
  teams (Claude, Codex, Kiro). Per-role and per-instance `agent` config in
  team.yaml. `BackendRegistry` discovers and validates available backends.
  `BackendHealth` enum tracks per-backend liveness.
- **Backend health checks in validate** (#325) — `batty validate` now probes
  each configured backend for reachability and reports health status.
- **`batty init --agent`** (#303) — set the default agent backend when
  scaffolding a new project. Also available via the `install` alias.
- **Shell completion coverage** (#330) — verified and tested completions for
  all current commands across bash, zsh, and fish shells.

### Telemetry

- **SQLite telemetry migration** (#316) — `batty retro` and `batty status`
  now query `telemetry.db` first with automatic JSONL fallback. Review
  metrics (#315) also migrated to SQLite.

### Codebase Health

- **Module decomposition** — 8 large modules split into focused submodules:
  `health.rs` (6 submodules), `daemon.rs` (6 submodules), `config.rs`,
  `delivery.rs`, `doctor.rs` (4 submodules), `watcher.rs`, `merge.rs`
  (4 submodules), and `team/mod.rs` (extracted `init.rs`, `load.rs`,
  `messaging.rs`, `lifecycle.rs`).
- **Error resilience sentinel tests** (#308, #311) — dedicated tests
  confirming `daemon.rs` and `task_loop.rs` handle error paths without panics.
- **Dead code audit** (#309) — removed 28 stale `#[allow(dead_code)]`
  annotations.
- **MockBackend for testing** (#325) — `MockBackend` implements
  `AgentAdapter`, enabling 18 trait contract tests without real backend
  dependencies.

### Documentation

- **Grafana getting-started walkthrough** (#328) — step-by-step guide for
  setting up monitoring with Grafana and the bundled dashboard.
- **Agent Backend Abstraction docs** — architecture.md updated with backend
  trait design, registry, and mixed-team configuration.
- **README and getting-started refresh** — updated for v0.5.x and v0.6.0
  features, CLI reference regenerated.

---

## 0.5.2 — 2026-03-23

Patch release adding crates.io publishing and Enter key delivery fix.

### Reliability

- **Enter key reliability** (#302) — paste verification + retry in `inject_message()`. Messages now reliably submit after injection instead of sitting idle in the pane.

### Infrastructure

- **crates.io publishing** — `cargo install batty-cli` now installs the latest release from crates.io. Release workflow publishes automatically on tag push.

---

## 0.5.1 — 2026-03-22

Patch release with developer experience improvements and delivery reliability fix.

### Features

- **Daemon auto-archive** (#298) — done tasks older than `archive_after_secs` (default: 3600) are automatically moved to archive by the daemon.
- **Checkpoint wiring for restart** (#299) — agent restart resume prompts now include `.batty/progress/<role>.md` checkpoint content.
- **Inbox purge** (#300) — `batty inbox purge <role>` deletes delivered messages. Supports `--older-than` for selective cleanup.
- **Telemetry dashboard** (#301) — `batty metrics` shows tasks completed, avg cycle time, failure rate, merge rate from the telemetry DB.

### Reliability

- **Delivery marker scrolloff fix** (#296) — infer successful delivery from agent state transition when the marker scrolls past the capture window. Eliminates ~80% false-positive delivery failures.
- **Starvation detection false positive fix** (#286) — suppress alerts when all engineers have active board tasks.
- **Config validation improvements** (#291) — better error messages for common team.yaml mistakes.

### Maintenance

- **Makefile targets** (#294) — `make test`, `make coverage`, `make release` match CI behavior.
- **Markdown lint compliance** (#293) — all docs pass markdownlint.
- **CI skip list stabilization** — skip timing-sensitive and environment-dependent tests in CI.

---

## 0.5.0 — 2026-03-22

Feature release adding board archival, delivery reliability, worktree
intelligence, telemetry completeness, and session summary. 13 commits
since v0.4.1.

### Features

- **Board archive command** (#277) — `batty board archive` moves completed
  tasks older than a configurable threshold (`--older-than 7d`) out of the
  active board. Supports `--dry-run` for safe previewing.
- **Delivery readiness gate** (#276) — messages sent to agents still starting
  up are buffered in a pending queue instead of being dropped. Messages drain
  automatically once the agent reaches Ready state.
- **Cherry-pick worktree reconciliation** (#278) — detects when all commits on
  a task branch have been cherry-picked onto main and auto-resets the worktree,
  preventing stale-branch accumulation.
- **Agent metrics telemetry wiring** (#275) — `delivery_failed` and
  `context_exhausted` events now correctly increment failure and restart
  counters in the `agent_metrics` SQLite table.
- **Session summary on stop** — `batty stop` now prints run statistics
  (duration, tasks completed, messages routed) when ending a session.

### Reliability

- **Error handling tests** (#279) — additional tests for `error_handling.rs`
  covering telemetry split edge cases.
- **Clippy cleanup** (#282) — zero warnings on `cargo clippy --all-targets`.

### Documentation

- **Intervention system docs** (#283) — complete documentation of the
  intervention subsystem (health checks, nudges, escalation, auto-restart).
- **README and getting-started refresh** — updated for post-v0.4.1 features.

### Maintenance

- **Dependency updates** (#273) — toml 0.8→1.0, cron 0.13→0.15,
  rusqlite 0.32→0.39.
- **Property-based tests** (#270) — 16 proptest-driven config parsing tests
  for fuzz-level confidence in YAML deserialization.
- **Board archive integration tests** — helpers for testing archive workflows
  end-to-end.

## 0.4.1 — 2026-03-22

Stability patch focused on test coverage expansion and reliability. 664 new
tests added across 4 waves, bringing the suite from ~1,285 to 1,949 tests.
Zero new features — pure quality investment.

### Test Infrastructure

- **Unit/integration test split** (#251) — tests categorized with a Cargo
  feature gate (`--features integration`). Unit tests run without tmux; 56
  integration tests require a running tmux server and are auto-skipped in CI.
- **Flaky test stabilization** (#250) — timing-dependent tmux tests converted
  to retry/poll patterns, eliminating intermittent CI failures.

### Coverage Expansion — Wave 1

- **daemon/automation.rs + cost.rs** (#254) — 78 new tests covering automation
  rules and cost calculation edge cases.
- **daemon/health.rs** (#256) — 24 tests covering health check scheduling and
  state transitions.

### Coverage Expansion — Wave 2

- **board_cmd, resolver, workflow, nudge** (#260) — 59 tests across 4 board
  and workflow modules.
- **daemon interventions** (#253) — 72 tests covering all 6 intervention
  subsystem submodules.
- **delivery.rs** (#258) — 43 tests for message delivery, circuit breaker, and
  Telegram retry logic.
- **standup.rs + retrospective.rs** (#259) — 57 tests for periodic summary
  generation and retrospective reports.
- **layout.rs + telegram_bridge.rs** (#255) — 35 tests for tmux layout
  building and Telegram bridge communication.
- **Cross-module behavioral verification** (#257) — 28 tests validating
  interactions across module boundaries.

### Coverage Expansion — Wave 3

- **tmux.rs** (#262) — 42 tests for core tmux runtime infrastructure (pane
  ops, session management, output capture).
- **task_loop.rs + message.rs** (#263) — 36 tests for the autonomous dispatch
  loop and message routing types.
- **capability.rs + policy.rs** (#261) — 33 tests for topology-independent
  capabilities and config-driven workflow policies.

### Coverage Expansion — Wave 4

- **Config validation edge cases** (#264) — 43 tests for YAML config parsing
  boundaries, invalid inputs, and default handling.
- **Error path and recovery** (#265) — 76 tests exercising error propagation,
  fallback behavior, and graceful degradation paths.
- **CLI argument parsing** (#266) — 38 tests verifying all subcommands parse
  correctly with valid and invalid argument combinations.

## 0.4.0 — 2026-03-22

Major release introducing agent backend abstraction, backend health monitoring,
session resilience features, telemetry infrastructure, and significant internal
decomposition. 39 commits across 20+ tasks since v0.3.2.

### Agent Backend Abstraction

- **AgentAdapter trait** (#230) — unified `launch()`, `session()`, and `resume()`
  behind a single trait, replacing scattered per-backend dispatch logic.
- **Mixed-backend teams** (#231) — team-level `agent_default` config allows
  heterogeneous teams where individual roles can override the team default backend.
- **Backend health monitoring** (#232) — `BackendHealth` enum and `health_check()`
  trait method detect backend failures; health status surfaces in `batty status`,
  daemon polling, and periodic standups.

### Session Resilience

- **Agent stall detection and auto-restart** (#235) — watcher detects
  context-exhausted and stalled agents, triggers automatic restart with backoff.
- **Agent readiness gate** (#233) — prevents message injection into panes that
  haven't finished initializing, eliminating dropped-message failures on startup.
- **Progress checkpoint** (#239) — writes a context file before stall/context
  restart so the restarted agent can resume with prior task context.
- **Daemon restart budget** (#214) — caps total daemon restarts with a rolling
  window, adds exponential backoff, and recovers from pane death gracefully.
- **Commit-before-reset** (#216) — replaces stash-based worktree cleanup with
  auto-commit so engineer work is never silently lost during resets.

### Telemetry

- **SQLite telemetry database** (#220) — persistent storage for agent, task, and
  event metrics with dual-write from the daemon event emitter.
- **`batty telemetry` CLI** — `summary`, `agents`, `tasks`, `events`, and
  `reviews` subcommands surface pipeline metrics from the telemetry DB.
- **DB counter wiring** (#238) — six missing telemetry counters connected to the
  database layer.

### Review Automation

- **Per-priority review timeout overrides** (#218) — configurable timeout
  thresholds per priority level, with YAML parsing and daemon enforcement.
- **Merge confidence scoring** (#221) — risk-based auto-merge gating evaluates
  diff size, module count, sensitive files, and unsafe blocks.
- **Review metrics in retrospectives** (#224) — review stall duration and per-task
  rework counts included in generated retrospective reports.

### Board Tooling

- **Dependency graph** (#236) — `batty board deps` command visualizes task
  dependency relationships.

### Module Decomposition

- **dispatch.rs decomposition** (#234) — split monolithic dispatch module into
  focused submodules under `src/team/dispatch/`.
- **daemon.rs decomposition** (#237) — extracted subsystems from the daemon
  polling loop for maintainability.

### Error Resilience

- **Unwrap cleanup** (#225) — replaced panicking `unwrap()`/`expect()` calls in
  daemon.rs and task_loop.rs with proper `Result` propagation.
- **Dead code audit** (#229) — removed unused code, achieving zero clippy
  warnings across the codebase.

### Workflow Improvements

- **Assignment dedup window** (#213) — prevents duplicate task dispatches within
  a configurable time window.
- **Completion event tracking** (#215) — `task_id` added to `task_completed`
  events and `reason` field added to `task_escalated` events for traceability.

### Documentation

- **README and docs refresh** (#228) — updated README, getting-started guide, CLI
  reference, and config reference for all post-v0.3.0 features.

## 0.3.2 — 2026-03-22

Scheduled tasks, cron recycling, nudge CLI, and intervention module decomposition.

### Scheduled Tasks

- **Task scheduling fields** — `scheduled_for`, `cron_schedule`, and `cron_last_run`
  fields on the Task model enable time-gated and recurring task support.
- **`Task::is_schedule_blocked()` helper** — centralizes future-dated schedule
  check logic, replacing scattered date-parsing code.
- **Schedule-aware resolver and dispatch** — resolver skips tasks with a
  `scheduled_for` in the future; dispatch filtering respects schedule gates.
- **Cron recycler** — daemon poll loop auto-recycles done cron tasks, resetting
  status to todo when the next cron window arrives.
- **`batty task schedule` CLI** — manage task schedules with `--at`, `--cron`,
  and `--clear` flags.

### Nudge CLI

- **`batty nudge` subcommand** — enable, disable, and query status of individual
  intervention types (triage, dispatch, review, utilization, replenish, owned-task).

### Internal Improvements

- **Interventions decomposition** — `interventions.rs` split into 9 focused
  submodules (triage, dispatch, review, utilization, replenishment, owned_tasks,
  telemetry, board_replenishment, mod).
- **Worktree prep guard** — validates engineer worktree health before assignment,
  preventing stale-worktree failures.
- **`utilization_recovery_interval_secs` config** — separate cooldown for
  utilization interventions, independent of general intervention cooldown.

### Documentation

- **README and docs refresh** — scheduled tasks guide, nudge CLI usage, and
  getting-started updates for all v0.3.2 features.

## 0.3.1 — 2026-03-22

Dogfooding-driven fixes, review automation, error resilience, and documentation
refresh. 19 tasks across 4 phases, shipped in a single session.

### Review Automation

- **Auto-merge policy engine** — configurable confidence scoring evaluates diffs
  by size, module count, sensitive file presence, and unsafe blocks. Low-risk
  completions merge without manual review when policy is enabled.
- **Auto-merge daemon integration** — wired into the completion path with
  per-task override support (`batty task auto-merge <id> enable|disable`).
- **Review timeout escalation** — tasks in review beyond a configurable threshold
  trigger nudges to the reviewer, then escalate to architect. Dedup prevents spam.
- **Structured review feedback** — `batty review <id> <disposition> --feedback`
  stores exact rework instructions in task frontmatter and delivers to engineer.
- **Review observability** — queue depth, average latency, auto-merge rate,
  rework rate, nudge/escalation counts surfaced in `batty status`, standups, and
  retrospectives.

### Dogfooding Fixes

- **Active-task reconciliation** — daemon clears stale `active_tasks` entries for
  done/archived/missing tasks, preventing engineers from appearing stuck.
- **Completion rejection recovery** — no-commits rejection now clears the
  assignment and marks engineer idle instead of leaving them in limbo.
- **Pane cwd correction** — retry loop with symlink-safe normalization fixes
  resume-time cwd failures on macOS.
- **Non-git-repo support** — `is_git_repo` detection gates all git operations;
  non-code projects no longer emit spurious warnings.
- **Skip worktree when disabled** — `use_worktrees: false` is respected at every
  call site, eliminating 42+ warnings per session in non-code projects.
- **External message sources** — `external_senders` config allows non-role
  senders (e.g. email-router, slack-bridge) to message any role.
- **Test session cleanup** — RAII `TestSession` guard ensures tmux cleanup on
  panic; `batty doctor --fix` kills orphaned `batty-test-*` sessions.
- **Trivial retrospective suppression** — short runs with zero completions skip
  retro generation (configurable `retro_min_duration_secs`).
- **Post-merge worktree reset** — force-clean uncommitted changes and verify HEAD
  after reset; handles dirty worktrees and detached HEAD.

### Error Resilience

- **Poll loop isolation** — subsystems categorized as critical (delivery,
  dispatch) or recoverable (standup, telegram, retro). Recoverable failures log
  and skip; 3+ consecutive failures escalate. Panic-safe `catch_unwind` wraps
  telegram, standup, and retrospective subsystems.
- **Unwrap/expect sentinel tests** — production code in mod.rs, events.rs,
  watcher.rs, inbox.rs, and merge.rs verified free of unwrap/expect calls.

### Documentation & Hygiene

- **Intervention system docs** — comprehensive documentation of all intervention
  types with triggers, state machines, cooldown behavior, and config tables.
- **Docs refresh** — README, getting-started, CLI reference, and config reference
  updated for all post-v0.3.0 features.

## 0.2.0 — 2026-03-18

This release expands Batty's runtime controls and makes long-running team
sessions easier to observe, pause, resume, and recover without losing routing
state.

### Highlights

- **Operational control commands** — add `batty pause` / `batty resume` to
  suppress nudges and standups during manual intervention, plus `batty load` to
  report historical worker utilization from recorded team events.
- **Richer runtime visibility** — `batty status` now reports live worker
  states, and the daemon emits heartbeat, shutdown, loop-step, and panic
  diagnostics for post-run inspection.
- **More reliable message delivery** — after tmux injection, Batty now verifies
  that the target pane actually left the prompt and retries Enter when terminal
  timing drops the keypress.
- **Safer resume behavior** — daemon state now persists across heartbeats so
  restored sessions can recover activity, and Claude watchers can rebind cleanly
  after manual resumes.

### Reliability

- Improve assignment delivery, engineer branch handling, idle detection, and
  completion event restoration across the team runtime.
- Harden daemon error handling and simplify runtime state tracking so nudges,
  watchers, and inbox delivery stay consistent through failures and resumes.
- Fix Claude-specific watcher edge cases, including explicit session binding,
  truncated interrupt footers, resumed watcher visibility, and pause timer
  behavior.
- Resolve unique role aliases to concrete member instances and fix agent
  wrappers to use the installed `batty` binary instead of debug test binaries.
- Add an `auto_dispatch` team configuration toggle so dispatch polling can be
  disabled when a board should be driven manually.

### Documentation

- Tighten onboarding guidance in the README and getting started docs, refresh
  generated CLI/config references, and publish the demo video page with YouTube
  links.

## 0.1.5 — 2026-03-11

Follow-up release to finish the `0.1.4` stabilization work and restore a fully
green delivery pipeline.

### Fixes

- **Patch coverage on inline Rust tests** — update the CI coverage job to run
  `cargo tarpaulin --include-tests` so Codecov measures `#[cfg(test)]` modules
  inside `src/` correctly, including the Ubuntu layout regression test added in
  `0.1.4`.
- **Cross-platform layout test stability** — keep the Linux-compatible tmux
  layout assertion that tolerates the small pane-height rounding difference seen
  on Ubuntu runners once borders and status lines are enabled.

## 0.1.4 — 2026-03-11

Patch release to finish the CI stabilization work from `0.1.3`.

### Fixes

- **Linux tmux compatibility** — switch percentage-based pane splits to the
  portable `split-window -l <pct>%` form so layout tests pass on Ubuntu tmux as
  well as macOS.
- **Green cross-platform CI** — fixes the last failing `cargo test` path in the
  Ubuntu GitHub Actions job without weakening the test matrix.

## 0.1.3 — 2026-03-11

This release stabilizes the team-based Batty runtime and restores a clean
release pipeline. It folds in the hierarchical team architecture work that
landed after `v0.1.2`, plus the CI/CD fixes needed to ship it reliably.

### Highlights

- **Team-based runtime** — Batty now runs hierarchical architect, manager, and
  engineer teams instead of the earlier phase-oriented model.
- **Autonomous dispatch loop** — idle engineers can pick work from the shared
  board automatically, with active-task tracking, retry counting, and
  completion/escalation rollups in the daemon.
- **Human channel support** — Telegram-backed user roles, inbound polling, long
  message splitting, and session resume support are now built into team
  communication.
- **Manager-aware layout** — engineer panes are grouped by manager, routing
  honors compatible `talks_to` targets, and Codex roles get per-member context
  overlays for cleaner startup state.

### Reliability

- Refresh engineer worktrees before assignment and reset them after merge.
- Gate engineer completion on worktree test runs before reporting success.
- Serialize merges behind a rebase-aware merge queue to reduce conflicting
  branch integration.
- Fix Codex watcher handling so stable prompts return to idle and historical
  completions do not leak into new sessions.
- Preserve assignment sender identity for routing checks and fix manager status
  updates during completion handoff.
- Correct tmux pane stacking for vertical splits and improve manager subgroup
  layout behavior.

### Documentation

- Rewrite the README for 60-second onboarding and refresh the session demo.
- Rewrite the getting started guide and regenerate the CLI/config references.
- Refresh architecture and troubleshooting docs for the team-based model.

### CI/CD

- Keep Rust CI strict under `-Dwarnings` by resolving current Clippy findings
  and explicitly marking staged/test-only code paths that are not yet wired
  into the main binary.
- Scope docs lint/format checks to the published MkDocs surface instead of
  archival notes under `docs/new_beginnings/`.
- Regenerate and commit reference docs so the docs workflow remains reproducible.

## 0.1.0 — 2026-02-24

First public release.

### Features

- **Core agent runner** — spawn coding agents (Claude Code, Codex) in supervised tmux sessions
- **Two-tier prompt handling** — Tier 1 regex auto-answers for routine prompts, Tier 2 supervisor agent for unknowns
- **Policy engine** — observe, suggest, act modes controlling how Batty responds to agent prompts
- **Kanban-driven workflow** — reads kanban-md boards, claims tasks, tracks progress through statuses
- **Worktree isolation** — each phase run gets its own git worktree for clean parallel work
- **Test gates** — Definition-of-Done commands must pass before a phase is considered complete
- **Pause/resume** — detach and reattach to running sessions without losing state
- **Parallel execution** — `--parallel N` launches multiple agents with DAG-aware task scheduling
- **Merge queue** — serialized merge with rebase, test gates, and conflict escalation
- **Shell completions** — `batty completions <bash|zsh|fish>`
- **Tmux status bar** — live task progress, agent state, and phase status in the tmux status line

### Bug Fixes

- Fixed CLAUDECODE env var leaking into tmux sessions (blocked nested Claude launches)
- Fixed invalid `--prompt` flag in Claude adapter (now uses positional argument)
- Fixed `batty install` not scaffolding `.batty/config.toml`
- Fixed stale "phase 4 planned" error message in `batty work all --parallel`
- Fixed conflicting claim identities in parallel mode
- Fixed completion contract defaulting to `cargo test` when no DoD configured

### Documentation

- Getting started guide with milestone tag requirement
- Troubleshooting guide with common failure scenarios
- CLI reference (auto-generated)
- Configuration reference
- Architecture overview
- Module documentation