zeph-orchestration

DAG-based task orchestration with failure propagation, LLM planning, and SQLite persistence for Zeph.

Overview

This crate implements the multi-agent task orchestration pipeline extracted from zeph-core. It decomposes high-level goals into directed acyclic graphs (DAGs) of sub-tasks, executes them with a tick-based scheduler, routes each task to a sub-agent, aggregates the results through a final LLM synthesis call, and persists graph state in SQLite so that graphs can be resumed and retried. It also caches plan templates for repeated goals.

Key modules

Module	Description
`graph`	`TaskGraph`, `TaskNode`, `TaskId`, `GraphId` typed identifiers; `TaskStatus`, `GraphStatus`, `FailureStrategy` (abort/retry/skip/ask)
`dag`	DAG validation (cycle detection via topological sort), `ready_tasks`, `propagate_failure`, `reset_for_retry`
`scheduler`	`DagScheduler` tick-based execution engine; `SchedulerAction` command pattern; `TaskEvent`, `TaskOutcome`
`planner`	`Planner` trait + `LlmPlanner` — goal decomposition via `chat_typed` structured output; maps string task IDs to `TaskId`
`aggregator`	`Aggregator` trait + `LlmAggregator` — synthesizes completed task outputs; per-task character budget; content-sanitized before injection
`router`	`AgentRouter` trait + `RuleBasedRouter` — 3-step fallback task-to-agent routing
`plan_cache`	`PlanCache` — caches plan templates by normalized goal hash; `PlanTemplate` captures task structure from a `TaskGraph` for reuse; `normalize_goal` + `goal_hash` for deterministic cache keys
`command`	`PlanCommand` parser for `/plan` CLI slash commands
`error`	`OrchestrationError` unified error type
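
As an illustration of what the `dag` module is described as doing (cycle detection via topological sort, plus selection of ready tasks), here is a minimal sketch. The `Status` and `Node` types, the index-based dependencies, and both function signatures are assumptions made for this example; the crate's actual `TaskNode`, `TaskStatus`, and `ready_tasks` may look different.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified status; the crate's `TaskStatus` has more states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pending,
    Completed,
}

/// Hypothetical node: `deps` holds the indices of tasks that must finish first.
pub struct Node {
    pub deps: Vec<usize>,
    pub status: Status,
}

/// Kahn's algorithm: returns a topological order, or `None` if the graph
/// contains a cycle or refers to a dependency index that does not exist.
pub fn topo_order(nodes: &[Node]) -> Option<Vec<usize>> {
    let n = nodes.len();
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (i, node) in nodes.iter().enumerate() {
        for &d in &node.deps {
            if d >= n {
                return None;
            }
            indegree[i] += 1;
            dependents[d].push(i);
        }
    }
    // Start from tasks with no unmet dependencies.
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &j in &dependents[i] {
            indegree[j] -= 1;
            if indegree[j] == 0 {
                queue.push_back(j);
            }
        }
    }
    // Any task left unvisited sits on a cycle.
    if order.len() == n { Some(order) } else { None }
}

/// Pending tasks whose dependencies have all completed.
pub fn ready_tasks(nodes: &[Node]) -> Vec<usize> {
    nodes
        .iter()
        .enumerate()
        .filter(|(_, node)| {
            node.status == Status::Pending
                && node.deps.iter().all(|&d| {
                    nodes.get(d).map_or(false, |dep| dep.status == Status::Completed)
                })
        })
        .map(|(i, _)| i)
        .collect()
}
```

In a tick-based scheduler, a function like `ready_tasks` would typically be called once per tick to decide which tasks to dispatch next.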

Usage

Orchestration is triggered via /plan commands in the agent chat:

/plan analyze the codebase and write a test report
/plan confirm           # confirm and start execution
/plan status            # show DAG progress
/plan list              # list recent graphs
/plan cancel            # cancel active graph
/plan retry             # re-queue failed tasks
/plan resume            # resume a paused graph (Ask failure strategy)

Note: When confirm_before_execute = true (the default), /plan <goal> creates the graph and pauses for confirmation. Run /plan confirm to start execution or /plan cancel to discard the graph.
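
To make the command grammar above concrete, the following is a rough sketch of how such input could be parsed. The `PlanCmd` enum and `parse_plan` function are hypothetical names for this example and are not the crate's `PlanCommand` API; the real parser may accept extra arguments or report errors differently.

```rust
/// Hypothetical stand-in for the crate's `PlanCommand`.
#[derive(Debug, PartialEq, Eq)]
pub enum PlanCmd {
    /// Free-form goal text to decompose into a task graph.
    Goal(String),
    Confirm,
    Status,
    List,
    Cancel,
    Retry,
    Resume,
}

/// Parses a chat line such as "/plan confirm" or "/plan <goal>".
/// Returns `None` when the line is not a `/plan` command or has no argument.
pub fn parse_plan(input: &str) -> Option<PlanCmd> {
    let rest = input.trim().strip_prefix("/plan")?;
    // Require whitespace (or end of input) after the prefix so that
    // unrelated words like "/planet" are rejected.
    if !rest.is_empty() && !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let cmd = match rest.trim() {
        "" => return None,
        "confirm" => PlanCmd::Confirm,
        "status" => PlanCmd::Status,
        "list" => PlanCmd::List,
        "cancel" => PlanCmd::Cancel,
        "retry" => PlanCmd::Retry,
        "resume" => PlanCmd::Resume,
        // Anything else is treated as a goal (an assumption of this sketch).
        goal => PlanCmd::Goal(goal.to_string()),
    };
    Some(cmd)
}
```

One consequence of this design is that a goal consisting of a single reserved word, such as "status", is read as the subcommand rather than as a goal.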

Configuration

[orchestration]
# planner_provider = "quality"     # provider name from [[llm.providers]] for planning; empty = primary provider
planner_max_tokens = 4096          # LLM token budget for goal decomposition
dependency_context_budget = 16384  # chars of cross-task context injected per task
confirm_before_execute = true      # require /plan confirm before starting
aggregator_max_tokens = 4096       # token budget for LlmAggregator synthesis call

Failure strategies

Strategy	Behavior when a task fails
`Abort`	Cancel all remaining tasks and mark the graph failed
`Retry`	Re-queue the failed task up to `max_retries` times
`Skip`	Mark the task skipped and continue with dependents
`Ask`	Pause the graph and wait for `/plan resume` from the user

Plan template caching

When a goal is decomposed into a task graph, the resulting structure is cached as a PlanTemplate keyed by a hash of the normalized goal. Later requests whose goals normalize to the same key reuse the cached template instead of invoking the LLM planner, which reduces latency and token cost for repeated orchestration patterns.
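
A minimal sketch of the caching idea follows. The function names `normalize_goal` and `goal_hash` come from the module list, but their signatures, the normalization rules (lowercasing and whitespace collapsing), and the choice of FNV-1a as the hash are all assumptions made for this example. `TemplateCache` is a hypothetical stand-in for `PlanCache`, and a template is reduced to a list of task titles.

```rust
use std::collections::HashMap;

/// Assumed normalization: lowercase each word and collapse whitespace.
pub fn normalize_goal(goal: &str) -> String {
    goal.split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// 64-bit FNV-1a over the normalized goal. A fixed algorithm keeps the key
/// stable across runs; the crate may use a different hash.
pub fn goal_hash(goal: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in normalize_goal(goal).bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical cache; a template is simplified to a list of task titles.
pub struct TemplateCache {
    templates: HashMap<u64, Vec<String>>,
}

impl TemplateCache {
    pub fn new() -> Self {
        Self { templates: HashMap::new() }
    }

    /// Returns the cached template for `goal`, calling `plan` only on a miss.
    pub fn get_or_plan<F>(&mut self, goal: &str, plan: F) -> &Vec<String>
    where
        F: FnOnce() -> Vec<String>,
    {
        self.templates.entry(goal_hash(goal)).or_insert_with(plan)
    }
}
```

With this normalization, "Write a  test report" and "write a test report" share one cache entry, so the planner closure runs only for the first of the two.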

Integration points

zeph-core integrates DagScheduler and LlmPlanner into the agent loop via the orchestration module
zeph-memory::RawGraphStore / TaskGraphStore persists graph state
zeph-sanitizer::ContentSanitizer wraps cross-task context before injection
zeph-subagent::SubAgentManager::spawn_for_task() spawns sub-agents per task

Installation

cargo add zeph-orchestration

In the root zeph crate, this crate is enabled via the orchestration feature flag.

Documentation

Full documentation: https://bug-ops.github.io/zeph/

License

MIT

zeph-orchestration 0.21.4