holon 0.14.1 - Docs.rs

---
title: RFC: ExecCommand Batch
date: 2026-04-26
status: draft
---

# RFC: ExecCommand Batch

## Summary

This RFC defines `ExecCommandBatch`, a model-visible command-family tool for
running multiple `ExecCommand`-like startup requests in one tool call.

V1 is deliberately narrow:

- batch only
- sequential execution
- `ExecCommand` startup semantics only
- no parallelism
- no arbitrary nested tool calls
- no new read-only promise beyond the existing execution boundary

The goal is to reduce model rounds and prompt churn during clustered command
work without changing Holon's command execution safety model.

## Problem

Benchmark runs show that Holon often performs many small shell inspections one
tool call at a time:

- search one symbol
- read one file slice
- inspect status
- search another symbol
- read another slice

This has three costs:

- extra model rounds
- repeated context re-injection
- higher latency from serial model/tool/model loops

The obvious workaround is to concatenate commands through shell separators, but
that weakens structure:

- item boundaries become ad hoc text
- per-command status is harder to inspect
- metrics cannot distinguish model-visible tool calls from actual shell
  commands
- failure handling becomes a shell-script concern instead of a runtime concern

Holon should provide a structured command-family batching primitive.

## Goals

- reduce model-visible tool calls for clustered command work
- preserve per-command boundaries in canonical and model-visible output
- reuse the existing `ExecCommand` execution boundary
- avoid adding a second command safety model
- keep benchmark metrics explicit about model calls versus shell command items

## Non-goals

- do not add parallel execution in V1
- do not implement a general `multi_tool_use` equivalent
- do not allow batching `ApplyPatch`, `Sleep`, `Enqueue`, `SpawnAgent`,
  `Task*`, `UpdateWorkItem`, or any other non-command tool
- do not introduce a new read-only command guarantee
- do not support interactive command continuation inside a batch
- do not start or manage long-running background command tasks from a batch

## Tool Name

The V1 tool name should make the boundary explicit.

Preferred name:

- `ExecCommandBatch`

Acceptable alternatives:

- `BatchExecCommand`
- `ExecBatch`

Names to avoid:

- `BatchInspect`, because V1 does not provide a read-only inspection guarantee
- `RunReadOnlyBatch`, for the same reason
- `MultiToolUse`, because V1 is not a general nested-tool batch surface

## Input Contract

`ExecCommandBatch` accepts a bounded list of command items.

Each item is a restricted `ExecCommand` startup request:

```json
{
  "items": [
    {
      "cmd": "rg -n \"ToolExecutionRecord\" src",
      "max_output_tokens": 1200
    },
    {
      "cmd": "sed -n '1,180p' src/tool/tools/exec_command.rs",
      "workdir": "src",
      "yield_time_ms": 10000,
      "max_output_tokens": 1600
    }
  ],
  "stop_on_error": false
}
```

Allowed item fields:

- `cmd`
- `workdir`
- `shell`
- `login`
- `yield_time_ms`
- `max_output_tokens`

Disallowed item fields:

- `tty`
- `accepts_input`

The disallowed fields are interactive command-task fields. They are appropriate
for `ExecCommand`, but not for a batch whose V1 contract is "run these bounded
commands and return one grouped receipt."

V1 should also enforce a small maximum item count. The exact value is an
implementation choice, but it should be low enough to keep one batch receipt
readable and bounded.

## Execution Semantics

V1 executes items sequentially in item order.

For each item, the runtime should reuse the same effective execution boundary
that `ExecCommand` would use for a non-interactive command:

- current agent trust
- current workspace / execution root
- workdir validation
- shell selection
- output truncation
- timeout behavior
- structured command result envelope

`ExecCommandBatch` should not introduce a new command allowlist or denylist as
its primary safety boundary.

If an item fails validation, fails to spawn, exits non-zero, or times out, the
batch records that item as failed and continues by default.

`stop_on_error = true` changes only control flow:

- run items in order
- stop after the first rejected or failed item
- mark later items as skipped

## Command Task Boundary

`ExecCommand` may promote long-running commands into `command_task`.

`ExecCommandBatch` V1 should not do that.

Batch items are intended for bounded command execution with a grouped receipt.
If a command needs:

- interactive input
- tty behavior
- background execution
- blocking continuation on later task result

then the model should call `ExecCommand` directly and use the command-family
task tools (`TaskStatus`, `TaskOutput`, `TaskInput`, `TaskStop`) as needed.

## Output Contract

The canonical result should preserve per-item metadata.

Suggested shape:

```json
{
  "item_count": 3,
  "completed_count": 2,
  "failed_count": 1,
  "skipped_count": 0,
  "stop_on_error": false,
  "items": [
    {
      "index": 1,
      "cmd": "rg -n \"ToolExecutionRecord\" src",
      "status": "completed",
      "exit_status": 0,
      "stdout_preview": "...",
      "stderr_preview": null,
      "truncated": false,
      "duration_ms": 42
    }
  ],
  "summary_text": "ExecCommandBatch completed 2/3 items"
}
```

The model-visible receipt should be compact text, not a raw JSON dump:

```text
ExecCommandBatch completed 2/3 items

[1] rg -n "ToolExecutionRecord" src
exit=0
stdout:
...

[2] sed -n '1,180p' src/tool/tools/exec_command.rs
exit=0
stdout:
...

[3] cargo test slow_suite
exit=124
stderr:
...
```

The receipt should keep item boundaries visible even when output is truncated.

## Metrics

`ExecCommandBatch` changes tool metrics because one model-visible tool call may
represent many shell command executions.

Metrics should distinguish:

- model-visible tool calls
- direct `ExecCommand` calls
- batched command items
- total command executions

Suggested metric names:

- `tool_calls`
- `shell_commands`
- `exec_command_items`
- `batched_exec_command_items`

Benchmark reporting should not count one `ExecCommandBatch` call as one shell
inspection. It should count the batch as one model-visible tool call and count
each item as one command execution.

## Prompt Guidance

Tool guidance should teach the model to use `ExecCommandBatch` for clustered
bounded commands whose outputs are useful before the next reasoning step.

Guidance should not encourage shell command concatenation as the primary
solution.

Suggested wording:

```text
Use ExecCommandBatch when several bounded shell commands should run before the
next decision and do not require interactive input, background task management,
or command-task continuation. Each item uses restricted ExecCommand startup
fields and returns an itemized receipt. Do not use ExecCommandBatch for edits,
interactive commands, long-running commands, or arbitrary nested tools.
```

`ExecCommand` guidance should remain primary for:

- one-off commands
- verification commands where a single result drives the next step
- interactive or tty commands
- long-running commands that may need task supervision

## Future Work

Future versions may add separate capabilities, but they should not be smuggled
into V1:

- parallel command execution
- read-only sandbox-backed inspection batches
- configured parallel-safe tool metadata
- arbitrary tool batching
- richer per-item approval or policy hooks

Those require their own design because they change safety, ordering, and
lifecycle semantics.