# durable-rust
Lightweight durable job execution engine backed by Postgres.
No external services, no replay log, no orchestrator. Just checkpointed steps in plain Rust.
## Quick example
A minimal sharded workflow, reconstructed as a sketch (exact signatures are in doc/api.md; `process_shard` stands in for your own code):

```rust
use durable::{init, Ctx};

async fn run(database_url: &str) -> Result<(), durable::DurableError> {
    // Connects and runs migrations automatically.
    let db = init(database_url).await?;

    // Create the workflow, or resume it if a previous run crashed.
    let ctx = Ctx::start(&db, "process-shards", serde_json::json!({})).await?;

    for shard in 0..3 {
        // Run-once step: a completed shard's saved result is returned
        // and the closure is never re-executed.
        ctx.step(&format!("shard-{shard}"), move || process_shard(shard))
            .await?;
    }

    ctx.complete(serde_json::json!("done")).await?;
    Ok(())
}
```
If the process crashes at shard 2, on restart it skips shards 0-1 (results saved) and resumes from 2.
## Schema
Two tables in a `durable` schema:

- `task` -- unified row for workflows, steps, and child workflows. Self-referential via `parent_id`; idempotent creation via `UNIQUE(parent_id, name)`.
- `task_queue` -- optional concurrency and rate-limit controls.
See doc/schema.md for the full DDL, indexes, and status lifecycle.
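A sketch of the shape (column names illustrative; the authoritative DDL, indexes, and status values are in doc/schema.md):

```sql
CREATE SCHEMA IF NOT EXISTS durable;

CREATE TABLE durable.task (
    id          bigserial PRIMARY KEY,
    parent_id   bigint REFERENCES durable.task (id),  -- self-referential: steps/children point at their parent
    name        text NOT NULL,
    status      text NOT NULL DEFAULT 'PENDING',
    input       jsonb,
    output      jsonb,
    executor_id text,
    UNIQUE (parent_id, name)                          -- makes creation idempotent
);

CREATE TABLE durable.task_queue (
    name            text PRIMARY KEY,
    max_concurrency integer
);
```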
## Getting started
1. Start Postgres.

2. Add the dependency:

   ```toml
   [dependencies]
   durable = { package = "durable-rust", version = "0.1" }
   ```

3. Initialize in your app -- connects and runs migrations automatically:

   ```rust
   let db = init(database_url).await?;
   ```

4. Write workflows using `Ctx`:
   - `Ctx::start(&db, name, input)` -- create or resume a root workflow
   - `ctx.step(name, closure)` -- run-once step with saved output
   - `ctx.child(name, input)` -- spawn a nested child workflow
   - `ctx.complete(output)` -- mark workflow done
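The run-once guarantee behind `ctx.step` can be made concrete with a self-contained toy: an in-memory map stands in for the `durable.task` table (this is not the real API, just the semantics):

```rust
use std::collections::HashMap;

/// Toy checkpoint store: step name -> saved output.
/// In durable-rust the equivalent state lives in durable.task rows.
struct ToyCtx {
    saved: HashMap<String, String>,
    executions: u32, // counts closures actually run (not replays)
}

impl ToyCtx {
    fn step(&mut self, name: &str, f: impl FnOnce() -> String) -> String {
        if let Some(out) = self.saved.get(name) {
            return out.clone(); // completed step: return saved output, never re-run
        }
        self.executions += 1;
        let out = f();
        self.saved.insert(name.to_string(), out.clone());
        out
    }
}

/// Run three steps, then "resume" over the same saved state.
fn run_twice() -> u32 {
    let mut ctx = ToyCtx { saved: HashMap::new(), executions: 0 };
    for _pass in 0..2 {
        for shard in 0..3 {
            ctx.step(&format!("shard-{shard}"), move || format!("processed {shard}"));
        }
    }
    ctx.executions // 3: the second pass replays everything from saved output
}

fn main() {
    println!("closures executed: {}", run_twice());
}
```

The second pass executes no closures at all, which is exactly what makes resuming after a crash safe.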
## Crate structure
| Crate | Path | Purpose |
|---|---|---|
| `durable-rust` | `crates/durable` | SDK -- `Ctx`, `DurableError`, `Executor`, proc-macro re-exports. The only crate users add. |
| `durable-db` | `crates/durable-db` | SeaORM migrations for the `durable` schema (pulled in transitively) |
| `durable-macros` | `crates/durable-macros` | `#[durable::workflow]` and `#[durable::step]` proc macros (pulled in transitively) |
## Run the example
The `nested-etl` example runs a parent ETL workflow that spawns child workflows per data source, then demonstrates crash recovery by resuming mid-run.
## Run tests
## Multi-node setup
Multiple workers can share one Postgres database. Each worker sets a unique `executor_id`:
```text
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Worker A   │   │   Worker B   │   │   Worker C   │
│  executor_id │   │  executor_id │   │  executor_id │
│   = "a1b2"   │   │   = "c3d4"   │   │   = "e5f6"   │
└──────┬───────┘   └──────┬───────┘   └──────┬───────┘
       │                  │                  │
       └────────┬────────┴────────┬────────┘
                │                 │
                ▼                 ▼
      ┌─────────────────────────────────┐
      │            Postgres             │
      │   durable.task  (all state)     │
      │   durable.task_queue (limits)   │
      └─────────────────────────────────┘
```
How it works:

- Workers dequeue tasks with `FOR UPDATE SKIP LOCKED` -- no double-processing
- Each task's `executor_id` column tracks which worker owns it
- On startup, a worker resets its own stale `RUNNING` tasks to `PENDING` (single-worker recovery)
- For cross-worker recovery (worker A detects that worker B crashed), enable the executor heartbeat (see #7)
Queue concurrency control:

```sql
-- task_queue limits how many tasks run concurrently per queue
INSERT INTO durable.task_queue (name, max_concurrency) VALUES ('ingest', 4);
-- Workers respect the limit across all nodes:
-- if 4 ingest tasks are RUNNING, the fifth task waits
```
## Design docs

- doc/api.md -- API design with proc macros (`#[durable::workflow]`, `#[durable::step]`)
- doc/dataflow.md -- dataflow diagrams for direct, queued, scheduled, and nested execution
- doc/schema.md -- full schema DDL and status lifecycle
## Design principles
- Postgres is the source of truth -- no WAL, no event log, no separate replay mechanism
- Steps are idempotent by design -- if a step completed, its saved result is returned; the closure is never re-executed
- No orchestrator -- the job runner is just your application code calling `ctx.step()`
- No serialization framework -- uses `serde_json` for input/output
- Crash safe -- incomplete steps are detected on resume; completed steps replay from saved output
- Observable -- query the `durable.task` table directly for monitoring and debugging