# Parallel Sub-Agents Implementation Plan

## Goals

1. Allow the LLM to spawn sub-agents using registered agents where `mode = subagent`.
2. Maintain and render state for all running/completed sub-agents (tree model, sidebar UI).
3. Run sub-agents in parallel in the background.
4. Reuse main agent runtime code for sub-agents (including skill loading).

## Scope and Non-Goals

### In scope

- Runtime orchestration for parallel sub-agents.
- `task` tool for spawning/resuming sub-agents (polling is out of scope for v1).
- Persistent sub-agent lifecycle events.
- TUI state and navigation for parent/sub-agent threads.
- Refactor to share runtime builder between main and sub-agents.

### Out of scope (first release)

- Multi-workspace distributed scheduling.
- Full cancellation/resume of in-flight subprocess handles across process restarts.
- Conflict-free concurrent write merging.

## Current Repository Baseline

- Agent modes already exist: `Primary` and `Subagent` in `src/agent/config.rs`.
- Registry filtering for primary agents exists in `src/agent/registry.rs`.
- Session events in `src/session/types.rs` already include:
  - `SubAgentStart`
  - `SubAgentProgress`
  - `SubAgentResult`
- Core domain types already include `SubAgentCall` and `SubAgentResult` in `src/core/types.rs`.
- `sub_agent_max_depth` already exists in `src/config/settings.rs`.
- Missing pieces:
  - No `task` tool implementation in `src/tool/`.
  - No runtime manager/state store for live sub-agents.
  - No sidebar tree rendering/navigation for sub-agent sessions.
  - No parent/sub-agent orchestration glue in runtime loop.

## Design Principles

- Keep canonical LLM semantics in core domain types.
- Keep provider details out of core state and events.
- Make orchestration additive and reversible.
- Prefer explicit state transitions over implicit behavior.
- Keep sub-agent outputs summarized in parent thread to reduce context pollution.

## V1 Normative Decisions (Lock Before Coding)

To avoid ambiguity during implementation, treat the following as the first-release contract:

1. `task` tool supports **start/resume only** in v1.
   - Start: no `task_id` provided, new child thread is created.
   - Resume: `task_id` provided, existing child thread is continued.
   - Poll/status calls are not part of v1 tool input surface; status is obtained from emitted events and manager/TUI state.
2. Task IDs are generated as UUIDv7 strings and are globally unique.
3. Lifecycle ordering is strict and deterministic:
   - exactly one `SubAgentStart`
   - zero or more `SubAgentProgress`
   - exactly one terminal `SubAgentResult`
   - v1 queue semantics: `SubAgentStart` is emitted when a task is accepted into manager state (queued or running), not only when execution begins.
4. Restart reconciliation is deterministic:
   - any persisted non-terminal child (`Pending`/`Running`) without a terminal event is reconstructed as `Failed` with reason `interrupted_by_restart`.
5. Parent context receives child summary only (bounded), while full child transcript/output remains in child session.
6. All new persisted fields are additive with serde defaults for backward compatibility.
7. Child sessions are hidden from the top-level resume-session list.
   - Only parent/root sessions appear in global resume discovery.
   - Child sessions are discoverable only through their parent sub-agent tree/navigation.
8. Resume safety scope:
   - resume by `task_id` is only valid within the current parent session context.
   - implementation resolves child by `(parent_session_id, task_id)` to prevent cross-session collisions.
9. `Cancelled` status is reserved in v1 for internal/system transitions only; user-facing cancellation workflow is out of scope.

## High-Level Architecture

### 1) Runtime orchestration service

Add a `SubagentManager` that owns in-memory execution state and concurrency control.

Responsibilities:

- Track sub-agent metadata by `task_id`.
- Track parent/child relationships and expose a tree view.
- Start sub-agent background tasks.
- Serve internal status/result lookups for replay and TUI state.
- Emit lifecycle events to session store and TUI.
- Reconcile interrupted in-flight tasks on startup/replay.

Suggested shape:

- `SubagentManager` handle shared via `Arc`
- `SubagentState` map keyed by `task_id`
- parent->children adjacency index for fast tree rendering
- optional join-handle map for live tasks
- bounded work queue + semaphore for parallelism caps
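
The suggested shape could be sketched roughly as below. This is a minimal illustration using only the standard library, not the intended implementation: `ManagerInner`, `insert`, `tree_rows`, and `walk` are hypothetical names, the per-task record is reduced to an agent name, and join handles, the work queue, and the semaphore are left out.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct ManagerInner {
    /// Agent name keyed by `task_id` (stand-in for the full state record).
    agents: HashMap<String, String>,
    /// parent -> children adjacency index; the `None` key holds root-level tasks.
    children: HashMap<Option<String>, Vec<String>>,
}

/// Cheap-to-clone handle; all clones share the same state through the `Arc`.
#[derive(Clone, Default)]
struct SubagentManager {
    inner: Arc<Mutex<ManagerInner>>,
}

impl SubagentManager {
    /// Registers a task. Returns `false` if the `task_id` is already known.
    fn insert(&self, task_id: &str, agent_name: &str, parent_task_id: Option<&str>) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if inner.agents.contains_key(task_id) {
            return false;
        }
        inner.agents.insert(task_id.to_string(), agent_name.to_string());
        inner
            .children
            .entry(parent_task_id.map(str::to_string))
            .or_default()
            .push(task_id.to_string());
        true
    }

    /// Depth-first flattening into `(depth, task_id)` rows for tree rendering.
    fn tree_rows(&self) -> Vec<(usize, String)> {
        let inner = self.inner.lock().unwrap();
        let mut rows = Vec::new();
        walk(&inner, None, 0, &mut rows);
        rows
    }
}

/// Visits the children of `parent` in insertion order, recursing into each.
fn walk(inner: &ManagerInner, parent: Option<String>, depth: usize, rows: &mut Vec<(usize, String)>) {
    if let Some(kids) = inner.children.get(&parent) {
        for id in kids {
            rows.push((depth, id.clone()));
            walk(inner, Some(id.clone()), depth + 1, rows);
        }
    }
}
```

Keeping the adjacency index next to the state map means the sidebar can be rebuilt without scanning every node for its parent.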

### 2) Task tool contract

Add `task` tool in `src/tool/` and register in `src/tool/registry.rs`.

Initial API:

- Inputs:
  - `description: string`
  - `prompt: string`
  - `subagent_type: string`
  - `task_id?: string` (resume existing child thread)
- Behavior:
  - Validate `subagent_type` exists and `mode == subagent`.
  - Enforce depth and concurrency limits.
  - Spawn asynchronously and return `task_id` immediately.
  - If `task_id` provided, continue existing child session.
- Output:
  - structured JSON: `{ task_id, status, message }`

V1 semantics:

- This API is start/resume only.
- `status` in the output indicates acceptance of request (`queued` or `running`), not final child completion.
- Final child completion is delivered through session events (`SubAgentResult`) and reconstructed manager state.
- `queued` means accepted into manager queue but not yet executing; `running` means worker execution has started.
- Resume lookup must be scoped to current parent session.
- Parent-ingest contract for child outcomes is bounded and structured: `{ task_id, status, summary, error? }`.
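
As a rough illustration of the start/resume split and the mode check, the sketch below classifies a request before anything is spawned. `TaskRequest`, `TaskError`, `classify`, and the `lookup_mode` closure (standing in for the agent registry) are hypothetical names; depth, concurrency, and parent-session scoping are not covered here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentMode {
    Primary,
    Subagent,
}

/// Mirrors the tool inputs listed above.
struct TaskInput {
    description: String,
    prompt: String,
    subagent_type: String,
    task_id: Option<String>,
}

#[derive(Debug, PartialEq)]
enum TaskRequest {
    /// No `task_id` given: create a new child thread.
    Start,
    /// `task_id` given: continue the existing child thread.
    Resume { task_id: String },
}

#[derive(Debug, PartialEq)]
enum TaskError {
    UnknownAgent(String),
    NotSubagent(String),
}

/// Validates `subagent_type`, then decides between start and resume.
fn classify(
    input: &TaskInput,
    lookup_mode: impl Fn(&str) -> Option<AgentMode>,
) -> Result<TaskRequest, TaskError> {
    match lookup_mode(&input.subagent_type) {
        None => return Err(TaskError::UnknownAgent(input.subagent_type.clone())),
        Some(AgentMode::Primary) => {
            return Err(TaskError::NotSubagent(input.subagent_type.clone()))
        }
        Some(AgentMode::Subagent) => {}
    }
    Ok(match &input.task_id {
        Some(id) => TaskRequest::Resume { task_id: id.clone() },
        None => TaskRequest::Start,
    })
}
```

Rejecting unknown or primary agents before any state is created keeps a failed validation from leaving a half-registered node behind.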

Future extension (optional): `wait` flag and bounded polling timeout.

### 3) Shared agent execution path

Refactor `src/cli/chat.rs` to extract reusable runtime creation/execution helpers.

Target outcome:

- Main agent and sub-agents use same execution builder:
  - provider
  - tool registry
  - approval matcher
  - session store
  - system prompt
- Sub-agents can load skills because they reuse existing `ToolRegistry` (`skill` tool already registered).

### 4) Persistence model

Use existing session event types and extend payloads if needed.

Required persisted fields:

- `task_id`
- `parent_task_id` (optional for nested sub-agent tasks)
- `parent_session_id` (required for root-session scoping and discovery filtering)
- `depth`
- `agent_name`
- child `session_id`
- result status/output

Recommended additional persisted metadata:

- `created_at` / `updated_at`
- `failure_reason` (structured, stable enum + freeform detail)
- optional bounded `result_summary`

This must allow complete reconstruction of tree/state when resuming sessions.

Session discovery rule:

- Mark child sessions with parent linkage metadata (for example `parent_session_id` or equivalent) so listing APIs can exclude them from top-level resume lists.
- Resume list shows only root sessions; child sessions are loaded lazily when opening a root session and reconstructing its sub-agent tree.
- Generic session-list APIs should default to root sessions only; child inclusion requires an explicit opt-in flag (for example `include_children=true`).
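
The discovery rule amounts to a filter on parent linkage. A minimal sketch, assuming a hypothetical `SessionMeta` record and `list_session_ids` helper (the real listing API may look different):

```rust
struct SessionMeta {
    session_id: String,
    /// `None` for root sessions; `Some(parent)` for sub-agent child sessions.
    parent_session_id: Option<String>,
}

/// Returns root sessions only unless the caller explicitly opts in to children.
fn list_session_ids(all: &[SessionMeta], include_children: bool) -> Vec<&str> {
    all.iter()
        .filter(|s| include_children || s.parent_session_id.is_none())
        .map(|s| s.session_id.as_str())
        .collect()
}
```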

Event invariants (must hold in persistence and replay):

- A `task_id` has exactly one start event and exactly one terminal result event.
- Progress events are monotonic by per-task sequence number and only valid between start and terminal.
- Replay is idempotent; duplicate events do not create duplicate nodes or transition terminal nodes.
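
One way to enforce these invariants during replay is to have the apply step reject anything that would violate them. The sketch below is illustrative only: `Event`, `ReplayNode`, and `ReplayState` are simplified stand-ins for the real session event and manager types, and the `seq` field is the per-task sequence number proposed in this plan.

```rust
use std::collections::HashMap;

enum Event {
    Start { task_id: String },
    Progress { task_id: String, seq: u64 },
    Result { task_id: String, success: bool },
}

#[derive(Debug, Default, PartialEq)]
struct ReplayNode {
    /// Highest progress sequence number applied so far.
    last_seq: Option<u64>,
    /// `Some(success)` once the terminal result has been applied.
    terminal: Option<bool>,
}

#[derive(Default)]
struct ReplayState {
    nodes: HashMap<String, ReplayNode>,
}

impl ReplayState {
    /// Applies one event. Returns `false` when the event is ignored.
    fn apply(&mut self, event: &Event) -> bool {
        match event {
            Event::Start { task_id } => {
                if self.nodes.contains_key(task_id) {
                    return false; // duplicate start: keep the existing node
                }
                self.nodes.insert(task_id.clone(), ReplayNode::default());
                true
            }
            // Progress is valid only between start and terminal, with increasing seq.
            Event::Progress { task_id, seq } => match self.nodes.get_mut(task_id) {
                Some(node)
                    if node.terminal.is_none()
                        && node.last_seq.map_or(true, |last| *seq > last) =>
                {
                    node.last_seq = Some(*seq);
                    true
                }
                _ => false,
            },
            // Only the first terminal result is accepted.
            Event::Result { task_id, success } => match self.nodes.get_mut(task_id) {
                Some(node) if node.terminal.is_none() => {
                    node.terminal = Some(*success);
                    true
                }
                _ => false,
            },
        }
    }
}
```

Because every rejected event leaves the state untouched, replaying the same log twice yields the same tree.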

### 5) TUI model

Sidebar should include a `Subagents` section rendered as a tree.

Per-node display:

- agent display name
- status (`running`, `done`, `error`)

- short prompt/result summary

Interaction:

- Click sub-agent node => main panel shows that sub-agent thread.
- While viewing sub-agent thread, input box is disabled and clearly labeled read-only.
- Parent thread remains selectable.

## Detailed Phases

### Phase 0: Prep and seams

1. Introduce orchestration module location:
   - `src/core/agent/subagent_manager.rs` (or `src/core/subagent/`)
2. Define internal state structs and status enum.
3. Add minimal interfaces for:
   - spawn
   - query
   - list tree
4. Wire dependency injection path from CLI runtime setup.

Deliverable: compile-time scaffolding with no behavior change.

### Phase 1: Task tool and spawn path

1. Implement `TaskTool` in `src/tool/task.rs`.
2. Register schema in tool registry.
3. Add capability mapping and permissions:
   - new capability `task`
   - settings field and matcher handling
4. On execute:
   - validate agent mode
   - validate that the new child's depth <= `sub_agent_max_depth`
   - spawn background run via `tokio::spawn`
   - emit `SubAgentStart`
   - return `task_id`
5. Add API contract tests for `task` JSON schema and backward compatibility snapshots.

Deliverable: parent can request sub-agent spawn through tool call.

### Phase 2: Parallel execution and lifecycle

1. Implement worker execution function:
   - accepts parent context and child config
   - creates child session store
   - runs shared agent loop
2. Emit lifecycle events:
   - `SubAgentProgress` for meaningful milestones
   - `SubAgentResult` at completion
3. Ensure parent loop is non-blocking while children run.
4. Enforce optional max parallel children.
5. Define and enforce output bounds:
   - max persisted bytes per progress/result payload
   - truncation marker for oversized payloads
   - child-to-parent summary size cap
6. Define progress event rate limiting/coalescing to avoid persistence/TUI spam.

Deliverable: multiple children can run concurrently and report status.
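
The output bounds in step 5 could be applied with a helper along these lines. This is a sketch under assumptions: the marker text `[truncated]` and the name `bound_payload` are placeholders, and the actual byte limits are left to configuration. The cut is moved back to a UTF-8 character boundary so the result stays valid text.

```rust
/// Hypothetical marker appended to payloads that were cut.
const TRUNCATION_MARKER: &str = "[truncated]";

/// Returns `text` unchanged if it fits in `max_bytes`; otherwise cuts it so
/// that the result (including the marker, when the marker fits) stays within
/// `max_bytes`.
fn bound_payload(text: &str, max_bytes: usize) -> String {
    if text.len() <= max_bytes {
        return text.to_string();
    }
    // Reserve room for the marker when it fits; otherwise hard-cut without it.
    let (budget, marker) = if max_bytes >= TRUNCATION_MARKER.len() {
        (max_bytes - TRUNCATION_MARKER.len(), TRUNCATION_MARKER)
    } else {
        (max_bytes, "")
    };
    // Step back until the cut lands on a character boundary (index 0 always is one).
    let mut cut = budget;
    while !text.is_char_boundary(cut) {
        cut -= 1;
    }
    format!("{}{}", &text[..cut], marker)
}
```

The same helper can serve progress payloads, result payloads, and the child-to-parent summary, each with its own limit.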

### Phase 3: Session replay and state reconstruction

1. Extend replay logic in `handle_session_selection` (`src/cli/chat.rs`) to ingest sub-agent events.
2. Rebuild tree from persisted events on resume.
3. Preserve compatibility with old sessions lacking sub-agent events.
4. Reconcile interrupted tasks as deterministic failures (`interrupted_by_restart`).
5. Update resume-session query/list logic to return root sessions only (exclude child sessions).

Deliverable: restart/resume restores sub-agent topology and statuses.
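
Step 4 could be a single pass over the reconstructed nodes once replay has finished. A minimal sketch, assuming a simplified `ReplayedNode` type and a hypothetical `reconcile_after_restart` helper; the returned count could feed the "replay reconciliation count" metric mentioned under rollout.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubagentStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

#[derive(Debug)]
struct ReplayedNode {
    task_id: String,
    status: SubagentStatus,
    failure_reason: Option<&'static str>,
}

/// Marks every non-terminal node as failed with `interrupted_by_restart`
/// and returns how many nodes were reconciled.
fn reconcile_after_restart(nodes: &mut [ReplayedNode]) -> usize {
    let mut reconciled = 0;
    for node in nodes.iter_mut() {
        if matches!(node.status, SubagentStatus::Pending | SubagentStatus::Running) {
            node.status = SubagentStatus::Failed;
            node.failure_reason = Some("interrupted_by_restart");
            reconciled += 1;
        }
    }
    reconciled
}
```

Running the pass a second time changes nothing, which keeps repeated resumes deterministic.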

### Phase 4: TUI rendering and navigation

1. Extend `ChatApp` state (`src/cli/tui/app.rs`) with:
   - sub-agent tree items
   - selected thread id
   - view mode (parent vs selected child)
2. Render sidebar section in `src/cli/tui/ui.rs`.
3. Add click hit-testing for sub-agent rows in `src/cli/chat.rs` mouse handlers.
4. Display selected thread messages in main panel.
5. Disable input UI and submission while in child view.

Deliverable: UX requirement achieved (click child -> read-only thread view).

### Phase 5: Hardening and guardrails

1. Prevent recursive runaway:
   - depth check
   - max parallel check
2. Ensure deterministic failure behavior:
   - child failure does not crash parent
   - structured error result surfaced
3. Respect permission policy for `task` and child tools.
4. Sanitize child outputs before storing/rendering.
5. Add contention/race protections:
   - never hold manager write locks while awaiting I/O
   - idempotent terminal transition helper
   - safe handling for duplicate completion signals

Deliverable: safe behavior under load and failure conditions.

## Data Model Proposal

### Runtime state

```rust
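/// Lifecycle status of a sub-agent task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubagentStatus {
    Pending,   // accepted into manager state, possibly still queued
    Running,   // worker execution has started
    Completed, // terminal
    Failed,    // terminal
    Cancelled, // terminal; reserved for internal/system transitions in v1
}

/// One node in the sub-agent tree.
#[derive(Debug, Clone)]
struct SubagentNode {
    task_id: String,
    parent_task_id: Option<String>, // set only for nested sub-agent tasks
    agent_name: String,
    prompt: String,
    depth: usize,
    session_id: String, // child session holding the full transcript
    status: SubagentStatus,
    started_at: u64,
    updated_at: u64,
    summary: Option<String>, // bounded summary passed to the parent
    error: Option<String>,
}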
```

### State Transition Rules

Allowed transitions:

- `Pending -> Running | Failed | Cancelled`
- `Running -> Completed | Failed | Cancelled`
- terminal states are immutable (`Completed`, `Failed`, `Cancelled`)

Operational rules:

- Start event writes `Pending`; `Pending` may persist while queued, and transitions to `Running` when worker execution actually begins.
- Exactly one terminal transition is accepted; subsequent terminal updates are ignored and logged.
- Terminal transition always updates `updated_at` and emits exactly one `SubAgentResult`.
- Progress events carry per-task sequence number (`seq`) assigned by manager at emit time.
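
The allowed transitions can be encoded directly so that illegal or duplicate updates are rejected in one place. A minimal sketch: `is_terminal`, `can_transition_to`, and `try_transition` are hypothetical helper names, and logging of ignored updates is left to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubagentStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl SubagentStatus {
    fn is_terminal(self) -> bool {
        matches!(self, Self::Completed | Self::Failed | Self::Cancelled)
    }

    /// Encodes the transition table; terminal states allow nothing.
    fn can_transition_to(self, next: Self) -> bool {
        match (self, next) {
            (Self::Pending, Self::Running | Self::Failed | Self::Cancelled) => true,
            (Self::Running, Self::Completed | Self::Failed | Self::Cancelled) => true,
            _ => false,
        }
    }
}

/// Applies `next` if the transition is allowed and returns `true`;
/// otherwise leaves `current` untouched and returns `false`.
fn try_transition(current: &mut SubagentStatus, next: SubagentStatus) -> bool {
    if current.can_transition_to(next) {
        *current = next;
        true
    } else {
        false
    }
}
```

If `SubAgentResult` is emitted only when `try_transition` returns `true` for a terminal state, the "exactly one terminal event" rule follows from the table itself.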

### Event payloads

If needed, enrich existing `SessionEvent` sub-agent variants with:

- `agent_name`
- `session_id`
- `status`
- `seq` (for progress ordering)
- optional `summary`

Failure reason enum (v1 recommended baseline):

- `tool_error`
- `approval_denied`
- `runtime_error`
- `interrupted_by_restart`
- `unknown`

Do this additively with serde defaults to keep backward compatibility.
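
For illustration, the failure reason baseline could map to stable strings as below. This sketch uses only the standard library; in the real code the mapping would more likely come from serde attributes. Falling back to `Unknown` for unrecognized strings is an assumption here, chosen so that sessions written by a newer build still load.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureReason {
    ToolError,
    ApprovalDenied,
    RuntimeError,
    InterruptedByRestart,
    Unknown,
}

impl FailureReason {
    /// Stable string written to persisted events.
    fn as_str(self) -> &'static str {
        match self {
            Self::ToolError => "tool_error",
            Self::ApprovalDenied => "approval_denied",
            Self::RuntimeError => "runtime_error",
            Self::InterruptedByRestart => "interrupted_by_restart",
            Self::Unknown => "unknown",
        }
    }

    /// Parses a persisted string; anything unrecognized becomes `Unknown`.
    fn parse(s: &str) -> Self {
        match s {
            "tool_error" => Self::ToolError,
            "approval_denied" => Self::ApprovalDenied,
            "runtime_error" => Self::RuntimeError,
            "interrupted_by_restart" => Self::InterruptedByRestart,
            _ => Self::Unknown,
        }
    }
}
```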

## Permissions and Policy

Add a first-class `task` permission capability.

Recommended defaults:

- Main build agent: `task = allow`.
- Main plan agent: `task = allow` (read-focused subagents still useful).
- Subagents: tool permissions inherited from selected subagent config.

Optional advanced policy (later): per-agent task allowlist with glob matching.

Inheritance and nesting rules:

- Parent must be allowed to call `task`.
- Spawned child uses its own configured capabilities (not implicit full inheritance from parent).
- Nested spawn requires both:
  - child agent capability allows `task`
  - depth check passes (the spawning agent's `depth < sub_agent_max_depth`)
- Child execution uses the active approval policy snapshot resolved at child start (explicit, reproducible behavior).
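
The two spawn conditions can be checked together before a task is accepted. A sketch under assumptions: `Spawner`, `SpawnDecision`, and `check_spawn` are hypothetical names, `depth` is read as the depth of the agent issuing the `task` call, and the root agent is assumed to sit at depth 0.

```rust
#[derive(Debug, PartialEq)]
enum SpawnDecision {
    Allowed,
    TaskNotPermitted,
    DepthExceeded,
}

/// The agent issuing the `task` call (root agent or a child).
struct Spawner {
    /// Whether this agent's own capabilities allow `task`.
    allows_task: bool,
    /// Depth of this agent in the tree (assumed 0 for the root agent).
    depth: usize,
}

/// Checks the capability first, then the depth limit.
fn check_spawn(spawner: &Spawner, sub_agent_max_depth: usize) -> SpawnDecision {
    if !spawner.allows_task {
        return SpawnDecision::TaskNotPermitted;
    }
    if spawner.depth >= sub_agent_max_depth {
        return SpawnDecision::DepthExceeded;
    }
    SpawnDecision::Allowed
}
```

With `sub_agent_max_depth = 1`, the root agent may spawn children, but those children may not spawn further.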

## UI Behavior Specification

Sidebar `Subagents` section:

- Hierarchical indentation for depth.
- Status marker:
  - `...` running
  - `ok` completed
  - `err` failed
- Highlight currently selected node.

Main panel behavior:

- Parent selected: normal chat behavior.
- Child selected:
  - render child session transcript
  - disable input editing/submission
  - status line indicates read-only child thread

Tree ordering behavior:

- Siblings are sorted by `started_at`, then `task_id` for deterministic replay/UI stability.
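
The ordering rule is a two-key comparison. A minimal sketch with a hypothetical `Row` type holding just the two sort keys:

```rust
struct Row {
    task_id: String,
    started_at: u64,
}

/// Sorts siblings by `started_at`, breaking ties by `task_id`.
fn sort_siblings(rows: &mut [Row]) {
    rows.sort_by(|a, b| {
        a.started_at
            .cmp(&b.started_at)
            .then_with(|| a.task_id.cmp(&b.task_id))
    });
}
```

The `task_id` tie-break matters when several children are accepted within the same timestamp tick, which is likely for a parallel spawn burst.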

Keyboard extension (optional in first pass):

- cycle parent/child threads via keybinds.

## Testing Plan

### Unit tests

- `task` tool validates unknown/non-subagent mode.
- depth and max-parallel checks.
- manager tree insert/update semantics.
- permission decision for `task` capability.
- state transition idempotency and duplicate terminal events.
- restart reconciliation (`Pending`/`Running` -> `Failed(interrupted_by_restart)`).
- resume lookup is scoped to `(parent_session_id, task_id)`.
- progress sequence ordering is monotonic even under concurrent emits.

### Integration tests

- parent spawns N children in parallel and receives all results.
- one child fails, others succeed, parent remains healthy.
- resumed session reconstructs tree/status correctly.
- parent exits/restarts with in-flight children, replay marks interrupted tasks deterministically.
- high-concurrency completion burst does not deadlock or corrupt tree.
- queued-to-running transition emits expected status semantics.
- progress rate limiting/coalescing prevents unbounded event growth.

### TUI tests

- sidebar renders subagent section and statuses.
- click mapping selects expected child thread.
- input disabled in child view and enabled in parent view.
- resume-session list excludes child sessions while parent thread still reconstructs and displays them.

## Rollout Strategy

1. Ship behind config flag (recommended):
   - `agent.parallel_subagents = false` default initially.
   - `agent.max_parallel_subagents = <small default>` (global process-wide cap)
   - `agent.max_parallel_subagents_per_parent = <small default>` (fairness cap per parent session)
2. Enable in dev builds and gather feedback.
3. Promote to default after stability and UX validation.
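
The two caps interact: a task runs immediately only if both the process-wide and the per-parent limit have room, and otherwise stays queued. The sketch below shows that admission check with plain counters. `Limits`, `Admission`, `try_admit`, and `release` are hypothetical names, and a real implementation would more likely build on an async semaphore.

```rust
use std::collections::HashMap;

/// Values would come from the `agent.*` settings above.
struct Limits {
    max_parallel_subagents: usize,
    max_parallel_subagents_per_parent: usize,
}

#[derive(Default)]
struct Admission {
    running_total: usize,
    running_by_parent: HashMap<String, usize>,
}

impl Admission {
    /// Returns `true` if the task may start now; `false` means it stays queued.
    fn try_admit(&mut self, parent_session_id: &str, limits: &Limits) -> bool {
        let for_parent = self
            .running_by_parent
            .get(parent_session_id)
            .copied()
            .unwrap_or(0);
        if self.running_total >= limits.max_parallel_subagents
            || for_parent >= limits.max_parallel_subagents_per_parent
        {
            return false;
        }
        self.running_total += 1;
        *self
            .running_by_parent
            .entry(parent_session_id.to_string())
            .or_insert(0) += 1;
        true
    }

    /// Frees one slot for the given parent when a child reaches a terminal state.
    fn release(&mut self, parent_session_id: &str) {
        if let Some(count) = self.running_by_parent.get_mut(parent_session_id) {
            if *count > 0 {
                *count -= 1;
                self.running_total -= 1;
            }
        }
    }
}
```

The per-parent cap keeps one busy session from consuming every global slot.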

Suggested observability gates before default-on:

- spawn success rate
- child failure rate by reason
- queue depth and wait time percentiles
- replay reconciliation count

## Risks and Mitigations

- Context pollution in parent thread:
  - Mitigation: store full child output in child session; pass compact summaries to parent.
- Parallel write conflicts:
  - Mitigation: recommend read-focused subagents first; keep write-heavy child agents opt-in.
- Resource spikes with many children:
  - Mitigation: max parallel cap and bounded queue.
- Inconsistent replay state:
  - Mitigation: event-driven reconstruction only from persisted lifecycle events.

## Milestone Checklist

- [ ] Runtime manager module and interfaces.
- [ ] `task` tool implemented and registered.
- [ ] Shared runtime builder used by main + subagents.
- [ ] Background parallel execution path working.
- [ ] Session events persisted for full lifecycle.
- [ ] Resume reconstructs tree/state.
- [ ] Sidebar tree rendering + click navigation.
- [ ] Child thread read-only view with disabled input.
- [ ] Unit + integration + TUI tests passing.
- [ ] Documentation updates for config and usage.
- [ ] Event ordering and transition invariants documented and enforced.
- [ ] Restart reconciliation behavior implemented and tested.
- [ ] Payload bounding/truncation policy implemented and tested.

## Notes from External Product Patterns

The plan aligns with current Codex/OpenCode multi-agent patterns:

- role-based sub-agent spawning
- parent orchestration with parallel children
- depth/thread guardrails
- thread/session navigation for child workflows
- summary-first parent context strategy

This gives a practical path to feature parity while preserving this repository's existing architecture boundaries.