dag_exec
Sync DAG executor for CPU-heavy pipelines: bounded parallelism + partial evaluation (std-only).
Why
When you have a dependency graph of expensive work (hashing, decoding, proof steps, indexing stages), you often want:
- compute only requested outputs (prune everything else)
- run in parallel without bringing an async runtime
- control backpressure (cap in-flight work)
dag_exec is a small, std-only crate focused on that niche.
Features
- Partial evaluation: computes only requested outputs (prunes unused subgraph)
- Sequential executor: Kahn-style topo scheduling + cycle detection
- Parallel executor (std threads):
- worker pool with RAII shutdown
- bounded dispatch via
max_in_flight+ per-worker bounded queues (worker_queue_cap)
- Build-time validation: missing deps, duplicate keys, empty graph
Status
Published as v0.1.0. API may change while pre-1.0.
Minimal example
use ;
use Arc;
let mut b = new;
b.add_source?;
b.add_source?;
b.add_task?;
let dag = b.build?;
let exec = new;
let out = exec.run_sequential?;
assert_eq!;
# Ok::
Examples
DAG_EXEC_MAX_WORKERS=4 DAG_EXEC_HASH_ITERS=200000
Benchmarks
Notes:
chain_*has no inherent parallelism (measures overhead).fanout_*_heavydemonstrates parallel speedups for CPU-heavy nodes.
Design notes
max_in_flightbounds queued + running work in the parallel scheduler.- Physical maximum is
n_workers * (worker_queue_cap + 1); effective cap is the minimum of both.
ExecutorConfig knobs
max_workers: worker threads in the parallel executor (default: available CPU threads).max_in_flight: global bound on queued + running tasks (backpressure).worker_queue_cap: per-worker queue capacity (boundedsync_channel); physical max in-flight ismax_workers * (worker_queue_cap + 1)and the effective cap ismin(max_in_flight, physical_max).
Tip: for benchmarks/demos, set DAG_EXEC_MAX_WORKERS=4 to avoid oversubscribing small DAGs.
Notes on performance
- For tiny per-node work, the parallel executor can be slower (thread + channel overhead dominates).
- Use
DAG_EXEC_HASH_ITERSinrollup/pipelineexamples to simulate CPU-heavy nodes and see speedups. - The pruning win (partial evaluation) is deterministic: fewer requested outputs => fewer executed nodes.
Next
- Error hardening + invariant tightening (see issue #6)
- More docs/diagrams and additional DAG examples (as needed)