zeph-orchestration
DAG-based task orchestration with failure propagation, LLM planning, and SQLite persistence for Zeph.
Overview
Implements the multi-agent task orchestration pipeline extracted from zeph-core. Decomposes high-level goals into directed acyclic graphs of sub-tasks, executes them via a tick-based scheduler, routes tasks to sub-agents, aggregates results through a final LLM synthesis call, and persists graph state in SQLite for resume and retry. Includes plan template caching for repeated goals.
Key modules
| Module | Description |
|---|---|
| graph | TaskGraph, TaskNode, TaskId, GraphId typed identifiers; TaskStatus, GraphStatus, FailureStrategy (abort/retry/skip/ask) |
| dag | DAG validation (cycle detection via topological sort), ready_tasks, propagate_failure, reset_for_retry |
| scheduler | DagScheduler tick-based execution engine; SchedulerAction command pattern; TaskEvent, TaskOutcome |
| planner | Planner trait + LlmPlanner: goal decomposition via chat_typed structured output; maps string task IDs to TaskId |
| aggregator | Aggregator trait + LlmAggregator: synthesizes completed task outputs with a per-task character budget; content is sanitized before injection |
| router | AgentRouter trait + RuleBasedRouter: 3-step fallback task-to-agent routing |
| plan_cache | PlanCache: caches plan templates by normalized goal hash; PlanTemplate captures task structure from a TaskGraph for reuse; normalize_goal + goal_hash produce deterministic cache keys |
| command | PlanCommand parser for /plan CLI slash commands |
| error | OrchestrationError unified error type |
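
The dag module's cycle detection and ready_tasks can be illustrated with a small standalone sketch. It uses Kahn's algorithm as one common way to do a topological sort; the crate's actual algorithm is not stated here. `Node`, `topo_order`, and the `done` slice are hypothetical simplifications, not the crate's TaskNode or its real signatures.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified task node: `deps` holds the indices of the
/// tasks that must finish before this one can run.
pub struct Node {
    pub deps: Vec<usize>,
}

/// Returns a topological order of the nodes, or `None` if the graph
/// contains a cycle or references a node index that does not exist.
pub fn topo_order(nodes: &[Node]) -> Option<Vec<usize>> {
    let n = nodes.len();
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (id, node) in nodes.iter().enumerate() {
        for &dep in &node.deps {
            if dep >= n {
                return None; // dangling dependency
            }
            indegree[id] += 1;
            dependents[dep].push(id);
        }
    }

    // Start from every node that has no unmet dependencies.
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(id) = queue.pop_front() {
        order.push(id);
        for &next in &dependents[id] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                queue.push_back(next);
            }
        }
    }

    // Nodes on a cycle never reach indegree 0, so they are missing from `order`.
    (order.len() == n).then_some(order)
}

/// Indices of tasks that are not done yet and whose dependencies are all done.
pub fn ready_tasks(nodes: &[Node], done: &[bool]) -> Vec<usize> {
    let is_done = |i: usize| done.get(i).copied().unwrap_or(false);
    (0..nodes.len())
        .filter(|&i| !is_done(i) && nodes[i].deps.iter().all(|&d| is_done(d)))
        .collect()
}
```

A tick-based scheduler could call something like `ready_tasks` on every tick and dispatch whatever it returns, which is why validation has to reject cycles up front: a task on a cycle would never become ready.
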
Usage
Orchestration is triggered via /plan commands in the agent chat:
/plan analyze the codebase and write a test report
/plan confirm # confirm and start execution
/plan status # show DAG progress
/plan list # list recent graphs
/plan cancel # cancel active graph
/plan retry # re-queue failed tasks
/plan resume # resume a paused graph (Ask failure strategy)
[!NOTE] When confirm_before_execute = true (default), /plan <goal> creates the graph and pauses for confirmation. Run /plan confirm to start execution or /plan cancel to discard.
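
To make the command grammar above concrete, here is a minimal parsing sketch. `PlanCmd` and `parse_plan` are hypothetical names for illustration; the crate's PlanCommand type may have different variants, accept arguments such as graph IDs, and report errors rather than returning `None`.

```rust
/// Hypothetical command enum mirroring the subcommands listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum PlanCmd {
    Goal(String),
    Confirm,
    Status,
    List,
    Cancel,
    Retry,
    Resume,
}

/// Parses a chat line into a plan command.
/// Returns `None` when the line is not a /plan command or has no goal.
pub fn parse_plan(input: &str) -> Option<PlanCmd> {
    let rest = input.trim().strip_prefix("/plan")?;
    // Reject inputs such as "/planet", where "/plan" is only a word prefix.
    if !rest.is_empty() && !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let cmd = match rest.trim() {
        "" => return None,
        "confirm" => PlanCmd::Confirm,
        "status" => PlanCmd::Status,
        "list" => PlanCmd::List,
        "cancel" => PlanCmd::Cancel,
        "retry" => PlanCmd::Retry,
        "resume" => PlanCmd::Resume,
        // Assumption: anything else is treated as a free-form goal.
        goal => PlanCmd::Goal(goal.to_string()),
    };
    Some(cmd)
}
```

One consequence of this design is that a goal consisting of exactly one reserved word (for example "status") cannot be expressed; a real parser would need a rule for that case.
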
Configuration
[]
# planner_provider = "quality" # provider name from [[llm.providers]] for planning; empty = primary provider
= 4096 # LLM token budget for goal decomposition
= 16384 # chars of cross-task context injected per task
confirm_before_execute = true # require /plan confirm before starting
= 4096 # token budget for LlmAggregator synthesis call
Failure strategies
| Strategy | Behavior when a task fails |
|---|---|
| Abort | Cancel all remaining tasks and mark the graph failed |
| Retry | Re-queue the failed task up to max_retries times |
| Skip | Mark the task skipped and continue with dependents |
| Ask | Pause the graph and wait for /plan resume from the user |
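
The table can be read as a mapping from a strategy to the next scheduler step. The sketch below shows that mapping under stated assumptions: `FailureAction` and `on_task_failure` are hypothetical names, and the fallback to aborting once retries are exhausted is an assumption, since the behavior after max_retries is not described here.

```rust
/// The four strategies from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureStrategy {
    Abort,
    Retry,
    Skip,
    Ask,
}

/// Hypothetical description of what the scheduler does next.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    AbortGraph,
    Requeue { attempt: u32 },
    SkipTask,
    PauseForUser,
}

/// Decides the next step after a task failure.
/// `retries_used` is how many times this task has already been retried.
pub fn on_task_failure(
    strategy: FailureStrategy,
    retries_used: u32,
    max_retries: u32,
) -> FailureAction {
    match strategy {
        FailureStrategy::Abort => FailureAction::AbortGraph,
        FailureStrategy::Retry if retries_used < max_retries => FailureAction::Requeue {
            attempt: retries_used + 1,
        },
        // Assumption: an exhausted retry budget is treated like Abort.
        FailureStrategy::Retry => FailureAction::AbortGraph,
        FailureStrategy::Skip => FailureAction::SkipTask,
        FailureStrategy::Ask => FailureAction::PauseForUser,
    }
}
```
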
Plan template caching
When a goal is decomposed into a task graph, the resulting structure is cached as a PlanTemplate keyed by a normalized goal hash. Subsequent requests with semantically equivalent goals reuse the cached template instead of invoking the LLM planner, reducing latency and token costs for repeated orchestration patterns.
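
A deterministic cache key of this kind can be sketched as follows. The normalization rules (lowercasing and collapsing whitespace), the FNV-1a hash, and the `TemplateCache` type are all assumptions chosen for illustration; the crate's normalize_goal, goal_hash, and PlanCache may work differently. Note that this sketch only matches goals that are identical after normalization, which is narrower than semantic equivalence.

```rust
use std::collections::HashMap;

/// Lowercases the goal and collapses runs of whitespace to single spaces.
pub fn normalize_goal(goal: &str) -> String {
    goal.split_whitespace()
        .map(str::to_lowercase)
        .collect::<Vec<_>>()
        .join(" ")
}

/// 64-bit FNV-1a over the normalized goal. A fixed algorithm keeps keys
/// stable across runs and compiler versions, which matters if keys are stored.
pub fn goal_hash(goal: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    normalize_goal(goal)
        .bytes()
        .fold(OFFSET_BASIS, |hash, byte| (hash ^ u64::from(byte)).wrapping_mul(PRIME))
}

/// Hypothetical in-memory cache: a template is reduced to a list of step titles.
#[derive(Default)]
pub struct TemplateCache {
    entries: HashMap<u64, Vec<String>>,
}

impl TemplateCache {
    /// Looks up a cached template for any goal that normalizes to the same key.
    pub fn get(&self, goal: &str) -> Option<&Vec<String>> {
        self.entries.get(&goal_hash(goal))
    }

    /// Stores the template under the goal's normalized hash.
    pub fn insert(&mut self, goal: &str, steps: Vec<String>) {
        self.entries.insert(goal_hash(goal), steps);
    }
}
```

On a cache hit the planner call can be skipped entirely; on a miss the caller would run the planner and insert the resulting structure.
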
Integration points
- zeph-core integrates DagScheduler and LlmPlanner into the agent loop via the orchestration module
- zeph-memory::RawGraphStore / SqliteGraphStore persists graph state
- zeph-sanitizer::ContentSanitizer wraps cross-task context before injection
- zeph-subagent::SubAgentManager::spawn_for_task() spawns sub-agents per task
Installation
Enabled via the orchestration feature flag on the root zeph crate.
Documentation
Full documentation: https://bug-ops.github.io/zeph/
License
MIT