holon 0.14.1

A headless, event-driven runtime for long-lived agents
Documentation
# Holon Coding Roadmap

This document defines the roadmap for turning `Holon` from a general long-lived
runtime into a coding-capable agent runtime.

The design reference is primarily the Claude Code runtime shape:

- query loop
- tool use / tool result
- task and subagent orchestration
- context management and compaction
- policy and sandbox boundaries
- explicit user-facing output

The goal is not to clone Claude Code feature-for-feature. The goal is to reach
the point where `Holon` can reliably complete coding-oriented tasks.

## Status

The first full pass of `R1` through `R8` is now implemented in the repository.

Current implementation highlights:

- Claude-style `tool_use` / `tool_result` runtime loop
- shell-first repo inspection plus local file mutation tools
- host-owned workspace registry plus explicit agent workspace attachment
- explicit `workspace_anchor`, `active_workspace_entry`, `execution_root`, and
  `cwd` binding
- explicit workspace projection and single-writer occupancy semantics
- runtime-derived closure outcome and waiting-reason surfaces
- typed continuation resolution with blocking `TaskResult` rejoin
- explicit objective state with `current_objective`, `last_delta`, and
  `acceptance_boundary`
- coding-oriented runtime prompt and explicit `Sleep` handling
- background task and bounded delegated child execution
- recent message / brief / tool-result context plus deterministic compaction
- explicit origin/trust/delivery-surface/admission-context marking
- trust-aware tool exposure with final restriction policy still deferred
- explicit `host_local` execution-policy snapshots and honest non-sandbox
  capability reporting
- explicit agent-level model override with effective-model inspection
- unit, integration, live-provider, and fixture-based regression tests
- a thin local TUI for driving and observing the long-lived runtime while
  dogfooding coding flows
- a chat-first TUI interaction model with direct composer input and overlay
  views for agents, transcript, and tasks
- an explicit `holon daemon` lifecycle surface for starting, stopping, and
  inspecting the local long-lived runtime
- concise daemon activity reporting that distinguishes healthy-idle,
  healthy-waiting, and healthy-processing runtime states
- daemon-status visibility into the most recent runtime failure summary
- one documented local troubleshooting path that tells operators when to use
  `run`, `daemon status`, `daemon logs`, `status` / `transcript`, `tui`, and
  foreground `serve`
- provider attempt timelines that preserve retry, fail-fast, and fallback
  behavior on transcript and audit surfaces
- structured token usage on `run` / `status` plus per-turn token metadata on
  transcript and audit surfaces

This does not mean the coding runtime is “finished”. It means the first
end-to-end version of the roadmap now exists in code and has automated
verification.

## Coding Goal

`Holon` should eventually be able to:

- accept a coding task
- inspect a workspace
- edit files
- run commands
- observe failures
- inspect daemon lifecycle failures through a local-first CLI surface
- retry or refine changes
- report progress and final results clearly
- split work into subtasks when needed

## R1: Tool Runtime

### Goal

Implement a Claude Code-style tool runtime based on `tool_use` and
`tool_result`.

### Includes

- define a `Tool` trait
- add a tool registry and schema surface
- teach the provider/runtime loop to handle:
  - model response
  - tool invocation
  - tool execution
  - tool result re-entry
- preserve tool executions in the audit log and active context

### Definition Of Done

- the runtime can complete a loop involving at least one real tool call
- tool results re-enter the same session loop
- `brief` output still stays separate from internal tool traces

### Implemented Notes

- `AnthropicProvider` now supports native tool schemas and `tool_use` blocks
- `RuntimeHandle` runs an iterative tool loop with `tool_result` re-entry
- tool executions are persisted to `tools.jsonl` and included in active context

### Initial Tool Set

- `Sleep`
- `GetSessionState`
- `Enqueue`
- managed task inspection/control only; no public `CreateTask`

## R2: Workspace Base Tools

### Goal

Add the minimum file and workspace tools needed for coding tasks.

### Includes

- `ListFiles`
- `SearchText`
- `ReadFile`
- `ApplyPatch`

### Definition Of Done

- the agent can inspect a codebase
- the agent can make targeted edits to files
- workspace changes are visible in subsequent turns

### Notes

This stage should avoid shell access. The goal is to first prove the coding loop
through file tools alone.

### Implemented Notes

- `ListFiles`, `SearchText`, `ReadFile`, and `ApplyPatch` are live
- file tools are scoped to the configured workspace root
- workspace mutations are visible to later turns and regression tests
- the normal model-facing inspection surface now retires repo-native
  `Read` / `Glob` / `Grep` in favor of shell-first `exec_command`

## R3: Command Execution And Safety

### Goal

Enable real engineering tasks that require commands and validation.

### Includes

- `ExecCommand`
- `KillCommand`
- stdout/stderr capture
- command result persistence and context injection
- policy gating for shell access

### Definition Of Done

- the agent can run tests or build commands
- command output becomes available to later reasoning
- command execution is policy-controlled and auditable

### Notes

Policy should ship before or alongside shell access. Shell should not arrive as
an unrestricted convenience feature.

### Implemented Notes

- `ExecCommand` and `KillCommand` are live
- command output is persisted through tool execution records and fed back into
  later turns
- shell tools are only exposed to trusted operator/system inputs

## R4: Coding Task Loop

### Goal

Make the runtime behave like a coding agent rather than a generic event-driven
assistant.

### Includes

- coding-focused runtime prompt
- plan / act / verify loop shape
- explicit progress and final-result `brief` behavior
- working-set awareness for files and recent tool results

### Definition Of Done

- the agent can complete a small bugfix task end-to-end
- it can inspect code, edit it, run validation, and summarize the result

### Implemented Notes

- the runtime prompt is explicitly coding-oriented
- `Sleep` is treated as a first-class runtime termination signal
- integration tests now cover file-edit and shell-validation loops

## R5: Task And Subagent Runtime

### Goal

Upgrade `Task` into a real orchestration primitive for coding work.

### Includes

- task kinds beyond the current background placeholder jobs
- `ChildAgentTask`
- context handoff into subagent work
- result and status aggregation back to the parent session

### Definition Of Done

- the runtime can delegate bounded coding subtasks
- the parent session can continue while background subtasks run
- task results rejoin the main queue without losing provenance

### Notes

This should follow the Claude Code pattern where subagents reuse the same core
runtime rather than becoming a separate architecture.

### Implemented Notes

- short session-local waiting now belongs to `Sleep(duration_ms)`
- managed command execution is created publicly through `exec_command`
- bounded child-agent creation now belongs to `SpawnAgent`
- parent-supervised child execution still rejoins through task/result surfaces
- parent supervision now uses a structured child observability snapshot across
  `TaskStatus` and `AgentGet.active_children` instead of relying on raw output
  polling for in-flight delegated progress

## R6: Coding Context Management

### Goal

Make long-running coding sessions hold onto the right information.

### Includes

- recent message window
- recent tool result window
- working-set file references
- deterministic summary / compaction
- explicit separation between durable history and model-visible context

### Definition Of Done

- the agent does not immediately forget prior edits or command results
- important file changes and decisions survive longer coding sessions
- context growth remains bounded

### Implemented Notes

- model-visible context now includes recent messages, briefs, and tool
  executions
- structured working memory is now derived from durable runtime
  evidence and rendered ahead of volatile turn context
- durable episode memory now accumulates terminal-turn evidence into one active
  builder and freezes immutable archived work chunks on semantic boundaries
- prompt-visible working-memory deltas are emitted on the next turn after a
  durable revision change
- deterministic compaction still keeps a bounded legacy summary of older
  messages as a fallback path
- the latest completed result is surfaced explicitly in context for follow-up
  questions

## R7: Coding Trust Model

### Goal

Define a trust and permission model suitable for coding agents.

### Includes

- separate permission boundaries for:
  - file tools
  - shell tools
  - task/subagent tools
- operator versus external ingress distinctions
- stable provenance and admission marks on coding turns
- policy hooks for sensitive actions
- optional workspace/path scoping

### Definition Of Done

- external triggers do not silently gain coding authority
- file and shell actions are inspectable and policy-gated
- the runtime can explain both how a trigger was admitted and why an action was
  or was not allowed

### Implemented Notes

- tool exposure is trust-aware
- policy gates remain in place for task creation, timers, control actions, and
  origin/kind compatibility
- file path resolution rejects writes outside the configured workspace root

## R8: Test And Verification Matrix

### Goal

Make coding behavior durable by building a full verification layer.

### Includes

- unit tests for:
  - tool schema and dispatch
  - context building
  - policy decisions
  - task lifecycle
  - compaction behavior
- integration tests for:
  - file edit loops
  - shell execution loops
  - multi-session isolation
  - remote ingress
  - task/subagent rejoin
- live provider tests using real provider configuration
- fixture-based regression cases for longer coding tasks

### Definition Of Done

- each major coding subsystem has direct unit coverage
- each major runtime behavior has integration coverage
- live provider tests continue to pass when run explicitly
- regressions in context, tool use, or policy are detectable before release

### Implemented Notes

- unit tests cover queue, config, policy, storage, context compaction, and
  runtime basics
- integration tests cover tool loops, file edits, shell execution, task rejoin,
  remote ingress, and multi-session isolation
- live provider tests exist for Anthropic, OpenAI, and OpenAI Codex and run
  against local real configuration when invoked explicitly
- fixture-based regression coverage now exists under `tests/fixtures/`

## Recommended Order

The recommended order remains:

1. `R1`
2. `R2`
3. `R3`
4. `R4`
5. `R5`
6. `R6`
7. `R7`
8. `R8`

That order keeps the project honest:

- first build the execution model
- then add coding primitives
- then add delegation and memory
- finally harden with policy and tests