# dag_exec

Sync DAG executor for CPU-heavy pipelines: **bounded parallelism** + **partial evaluation** (std-only).

## Why

When you have a dependency graph of expensive work (hashing, decoding, proof steps, indexing stages),
you often want:

- **compute only requested outputs** (prune everything else)
- **run in parallel** without bringing an async runtime
- **control backpressure** (cap in-flight work)

`dag_exec` is a small, std-only crate focused on that niche.

## Features

- **Partial evaluation**: computes only requested outputs (prunes unused subgraph)
- **Sequential executor**: Kahn-style topo scheduling + cycle detection
- **Parallel executor (std threads)**:
  - worker pool with RAII shutdown
  - bounded dispatch via `max_in_flight` + per-worker bounded queues (`worker_queue_cap`)
- Build-time validation: missing deps, duplicate keys, empty graph
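The sequential executor's Kahn-style scheduling and cycle detection can be illustrated with a small std-only sketch (this shows the approach, not the crate's internal code):

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm: returns a topological order, or `None` if the graph has a cycle.
/// `deps` maps each node to the nodes it depends on.
fn topo_order(deps: &HashMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // In-degree = number of unfinished dependencies per node.
    let mut indeg: HashMap<&str, usize> =
        deps.iter().map(|(&k, v)| (k, v.len())).collect();
    // Reverse edges: dependency -> dependents.
    let mut rev: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&node, ds) in deps {
        for &d in ds {
            rev.entry(d).or_default().push(node);
        }
    }
    // Nodes with no dependencies are immediately ready.
    let mut ready: VecDeque<&str> = indeg
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&k, _)| k)
        .collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n.to_string());
        for &dependent in rev.get(n).into_iter().flatten() {
            let e = indeg.get_mut(dependent).unwrap();
            *e -= 1;
            if *e == 0 {
                ready.push_back(dependent);
            }
        }
    }
    // Emitting fewer nodes than exist means a cycle blocked the rest.
    if order.len() == indeg.len() { Some(order) } else { None }
}
```

A node is scheduled only once all of its dependencies have finished, which is also what makes cycle detection free: a cycle never drains to in-degree zero.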

## Status

Published as v0.1.1. The API may change while pre-1.0.

## Minimal example

```rust
use dag_exec::{DagBuilder, Executor, ExecutorConfig};
use std::sync::Arc;

let mut b = DagBuilder::<String, i32, ()>::new();
b.add_source("a".into(), 1)?;
b.add_source("b".into(), 2)?;
b.add_task("c".into(), vec!["a".into(), "b".into()], |xs: &[Arc<i32>]| Ok(*xs[0] + *xs[1]))?;
let dag = b.build()?;

let exec = Executor::new(ExecutorConfig::default());
let out = exec.run_sequential(&dag, ["c".to_string()])?;
assert_eq!(*out["c"], 3);
# Ok::<(), dag_exec::BuildError<String>>(())
```

## Examples

```bash
cargo run --example pipeline
cargo run --example merkle
cargo run --release --example rollup
DAG_EXEC_MAX_WORKERS=4 DAG_EXEC_HASH_ITERS=200000 cargo run --release --example rollup
```

## Benchmarks

```bash
cargo bench
```

Notes:

- `chain_*` has no inherent parallelism (measures overhead).
- `fanout_*_heavy` demonstrates parallel speedups for CPU-heavy nodes.

## Design notes

- `max_in_flight` bounds **queued + running** work in the parallel scheduler.
- The physical maximum is `n_workers * (worker_queue_cap + 1)`; the effective cap is `min(max_in_flight, physical_max)`.
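The bounded-dispatch idea can be sketched with std threads and `sync_channel` (a simplified illustration, not the crate's implementation): each worker owns a bounded queue, so dispatching to a full queue blocks, which is the backpressure mechanism.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Round-robin dispatch over a fixed pool of workers with bounded queues.
/// The "work" here is a stand-in (doubling each job); results are summed.
fn run_bounded(jobs: Vec<u64>, n_workers: usize, queue_cap: usize) -> u64 {
    let (res_tx, res_rx) = sync_channel::<u64>(jobs.len());
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..n_workers {
        // Bounded per-worker queue: `send` blocks once `queue_cap` jobs are waiting.
        let (tx, rx) = sync_channel::<u64>(queue_cap);
        let res_tx = res_tx.clone();
        handles.push(thread::spawn(move || {
            for job in rx {
                res_tx.send(job * 2).unwrap(); // stand-in for real work
            }
        }));
        senders.push(tx);
    }
    drop(res_tx);
    // A full worker queue blocks the dispatcher here (backpressure).
    for (i, job) in jobs.into_iter().enumerate() {
        senders[i % n_workers].send(job).unwrap();
    }
    // Dropping the senders closes the queues, ending each worker's loop
    // (the RAII-style shutdown mentioned above).
    drop(senders);
    for h in handles {
        h.join().unwrap();
    }
    res_rx.into_iter().sum()
}
```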

### ExecutorConfig knobs

- `max_workers`: worker threads in the parallel executor (default: available CPU threads).
- `max_in_flight`: global bound on queued + running tasks (backpressure).
- `worker_queue_cap`: per-worker queue capacity (bounded `sync_channel`); physical max in-flight is
  `max_workers * (worker_queue_cap + 1)` and the effective cap is `min(max_in_flight, physical_max)`.
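Worked example (hypothetical values, and `effective_cap` is an illustrative helper, not crate API): with `max_workers = 4` and `worker_queue_cap = 2`, the physical max is `4 * (2 + 1) = 12`, so `max_in_flight = 16` is effectively clamped to 12.

```rust
/// Effective in-flight cap, per the formula above.
fn effective_cap(max_workers: usize, worker_queue_cap: usize, max_in_flight: usize) -> usize {
    let physical_max = max_workers * (worker_queue_cap + 1);
    max_in_flight.min(physical_max)
}
```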

Tip: for benchmarks/demos, set `DAG_EXEC_MAX_WORKERS=4` to avoid oversubscribing small DAGs.

### Notes on performance

- For tiny per-node work, the parallel executor can be slower (thread + channel overhead dominates).
- Use `DAG_EXEC_HASH_ITERS` in `rollup`/`pipeline` examples to simulate CPU-heavy nodes and see speedups.
- The pruning win (partial evaluation) is deterministic: fewer requested outputs => fewer executed nodes.
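The pruning step boils down to reachability: walk dependency edges backwards from the requested outputs and execute only the nodes you reach. A std-only sketch of the idea (not the crate's code):

```rust
use std::collections::{HashMap, HashSet};

/// Nodes that must execute to produce `requested` outputs:
/// everything reachable by following dependency edges backwards.
fn required_nodes(deps: &HashMap<&str, Vec<&str>>, requested: &[&str]) -> HashSet<String> {
    let mut needed = HashSet::new();
    let mut stack: Vec<&str> = requested.to_vec();
    while let Some(n) = stack.pop() {
        // `insert` returns false if already visited, so each node is expanded once.
        if needed.insert(n.to_string()) {
            for &d in deps.get(n).into_iter().flatten() {
                stack.push(d);
            }
        }
    }
    needed
}
```

Requesting fewer outputs shrinks the reachable set, hence the deterministic pruning win.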

## Next

- Error hardening + invariant tightening (see issue #6)
- More docs/diagrams and additional DAG examples (as needed)