zeph-orchestration 0.18.1

Task orchestration: DAG execution, failure propagation, and persistence for Zeph
Documentation

zeph-orchestration

Crates.io docs.rs License: MIT MSRV

DAG-based task orchestration with failure propagation, LLM planning, and SQLite persistence for Zeph.

Overview

Implements the multi-agent task orchestration pipeline extracted from zeph-core. Decomposes high-level goals into directed acyclic graphs of sub-tasks, executes them via a tick-based scheduler, routes tasks to sub-agents, aggregates results through a final LLM synthesis call, and persists graph state in SQLite for resume and retry. Includes plan template caching for repeated goals.

Key modules

Module Description
graph TaskGraph, TaskNode, TaskId, GraphId typed identifiers; TaskStatus, GraphStatus, FailureStrategy (abort/retry/skip/ask)
dag DAG validation (cycle detection via topological sort), ready_tasks, propagate_failure, reset_for_retry
scheduler DagScheduler tick-based execution engine; SchedulerAction command pattern; TaskEvent, TaskOutcome
planner Planner trait + LlmPlanner — goal decomposition via chat_typed structured output; maps string task IDs to TaskId
aggregator Aggregator trait + LlmAggregator — synthesizes completed task outputs; per-task character budget; content-sanitized before injection
router AgentRouter trait + RuleBasedRouter — 3-step fallback task-to-agent routing
plan_cache PlanCache — caches plan templates by normalized goal hash; PlanTemplate captures task structure from a TaskGraph for reuse; normalize_goal + goal_hash for deterministic cache keys
command PlanCommand parser for /plan CLI slash commands
error OrchestrationError unified error type

Usage

Orchestration is triggered via /plan commands in the agent chat:

/plan analyze the codebase and write a test report
/plan confirm           # confirm and start execution
/plan status            # show DAG progress
/plan list              # list recent graphs
/plan cancel            # cancel active graph
/plan retry             # re-queue failed tasks
/plan resume            # resume a paused graph (Ask failure strategy)

[!NOTE] When confirm_before_execute = true (default), /plan <goal> creates the graph and pauses for confirmation. Run /plan confirm to start execution or /plan cancel to discard.

Configuration

[orchestration]
# planner_provider = "quality"     # provider name from [[llm.providers]] for planning; empty = primary provider
planner_max_tokens = 4096          # LLM token budget for goal decomposition
dependency_context_budget = 16384  # chars of cross-task context injected per task
confirm_before_execute = true      # require /plan confirm before starting
aggregator_max_tokens = 4096       # token budget for LlmAggregator synthesis call

Failure strategies

Strategy Behavior when a task fails
Abort Cancel all remaining tasks and mark the graph failed
Retry Re-queue the failed task up to max_retries times
Skip Mark the task skipped and continue with dependents
Ask Pause the graph and wait for /plan resume from the user

Plan template caching

When a goal is decomposed into a task graph, the resulting structure is cached as a PlanTemplate keyed by a normalized goal hash. Subsequent requests with semantically equivalent goals reuse the cached template instead of invoking the LLM planner, reducing latency and token costs for repeated orchestration patterns.

Integration points

  • zeph-core integrates DagScheduler and LlmPlanner into the agent loop via the orchestration module
  • zeph-memory::RawGraphStore / SqliteGraphStore persists graph state
  • zeph-sanitizer::ContentSanitizer wraps cross-task context before injection
  • zeph-subagent::SubAgentManager::spawn_for_task() spawns sub-agents per task

Installation

cargo add zeph-orchestration

Enabled via the orchestration feature flag on the root zeph crate.

Documentation

Full documentation: https://bug-ops.github.io/zeph/

License

MIT