processkit
Async child-process management for Rust with a kernel-backed no-orphan guarantee: every process you start — and everything it spawns — lives in a kill-on-drop container, so no descendant outlives your program.

Why processkit?
std::process and tokio::process reach (at most) the direct child. The
processes it spawned — a build tool's compiler children, the real payload
behind a wrapper (cmd /c …, sh -c …), a test's helper servers — survive a
timeout, a panic, or a dropped future, and keep running as orphans.
processkit spawns every child into the operating system's own containment
primitive — a Job Object on Windows, a cgroup v2 on Linux (with a
process-group fallback), a POSIX process group on macOS/BSD — so teardown
is a kernel operation over the whole tree, not a best-effort signal to one pid:
- Nothing escapes silently. Dropping the handle or group reaps every descendant, grandchildren included. Where a mechanism has a genuine weakness (a setsid child escapes a POSIX process group), the active Mechanism is reported instead of pretending: never a silent downgrade.
- Async-first. Run-and-capture, line streaming, interactive stdin, readiness probes, shell-free pipelines, and supervision are all tokio futures.
- Honest results. A non-zero exit is data (ProcessResult) until you ask for success; a timeout is captured in the result; a cancellation is always an error; every platform divergence is typed or documented.
- Testable. One trait seam (ProcessRunner) swaps the real spawner for scripted doubles or record/replay cassettes, so no subprocess runs in your tests.
Status: feature-complete — every capability below ships today; pre-1.0, so the API can still move between minor versions. See
CHANGELOG.md.
Install
This crate requires a tokio runtime.
Picking a verb
Every run starts with the same builder; the verb you finish with decides what you get back:
| You want | Call | You get |
|---|---|---|
| stdout, success required | .run() | trimmed String; non-zero exit / timeout / kill → typed Error |
| the full outcome, exit code as data | .output_string() / .output_bytes() | ProcessResult (code, stdout, stderr, timed_out); never errors on non-zero |
| just the exit code | .exit_code() | i32 (a timed-out / killed run errors instead of inventing -1) |
| a yes/no answer | .probe() | bool: exit 0 → true, 1 → false, anything else errors |
| the first matching output line | .first_line(\|l\| …) | Option<String>; None when stdout closes without a match |
| a live handle (streaming, stdin, probes) | .start() | RunningProcess |
The same vocabulary repeats on every layer (ProcessRunner, CliClient), and
processkit::run("git", ["status"]) / processkit::output(…) skip the builder
for one-liners.
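As a mental model of the table, the std-only sketch below restates how each verb could map one finished run onto its return type. Outcome, VerbError, and the three functions are hypothetical stand-ins, not processkit's ProcessResult, Error, or builder methods (which are async). Only the rules themselves come from the table: trimmed stdout from run, the exit code as data, probe's 0/1 mapping, and no invented -1.

```rust
/// Hypothetical stand-in for a finished run (not processkit's ProcessResult).
#[derive(Debug, Clone)]
pub struct Outcome {
    /// None when the run was killed or timed out, so there is no real exit code.
    pub code: Option<i32>,
    pub stdout: String,
    pub timed_out: bool,
}

#[derive(Debug, PartialEq)]
pub enum VerbError {
    NonZero(i32),
    TimedOut,
    Killed,
    UnexpectedProbeCode(i32),
}

/// `.run()`-style: success required, stdout returned trimmed.
pub fn run(o: &Outcome) -> Result<String, VerbError> {
    match (o.timed_out, o.code) {
        (true, _) => Err(VerbError::TimedOut),
        (false, None) => Err(VerbError::Killed),
        (false, Some(0)) => Ok(o.stdout.trim().to_string()),
        (false, Some(c)) => Err(VerbError::NonZero(c)),
    }
}

/// `.exit_code()`-style: the code is data, but a run without one is an
/// error instead of a made-up -1.
pub fn exit_code(o: &Outcome) -> Result<i32, VerbError> {
    if o.timed_out {
        return Err(VerbError::TimedOut);
    }
    o.code.ok_or(VerbError::Killed)
}

/// `.probe()`-style: 0 means true, 1 means false, anything else is an error.
pub fn probe(o: &Outcome) -> Result<bool, VerbError> {
    match exit_code(o)? {
        0 => Ok(true),
        1 => Ok(false),
        c => Err(VerbError::UnexpectedProbeCode(c)),
    }
}
```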
Quick start
Documentation
This README is the quick tour. The docs/ guide set
goes deeper on every capability, with more examples and the platform fine
print collected in one place. New here? Skim the Cookbook
first — it maps "I want to …" tasks to working snippets — then read
Running commands end to end:
| Guide | Covers |
|---|---|
| Cookbook | Task → snippet recipes for everything below; the fastest way in |
| Running commands | The full Command builder and every consuming verb, with error semantics |
| Process groups | Containment, teardown, signals, suspend/resume, members, limits, stats |
| Streaming & interactive I/O | Line streaming, conversational stdin, readiness probes, wait_any, profiling |
| Pipelines | Shell-free a \| b \| c, pipefail attribution, chain timeouts |
| Timeouts, retries & cancellation | Captured vs raised deadlines, retry classifiers, CancellationToken |
| Supervision | Restart policies, backoff & jitter, stop conditions, outcomes |
| Testing your code | The ProcessRunner seam, scripted/recording/mock doubles, cassettes, CliClient |
| Platform support | Mechanisms, all capability matrices, every caveat |
API reference: docs.rs/processkit.
Feature flags
Each flag is additive and only gates visibility — the kill-on-drop tree guarantee is unconditional in every configuration.
| Feature | Default | Adds |
|---|---|---|
| stats | ✅ | group/per-run resource measurement, sample_stats, profile |
| process-control | ✅ | Signal, ProcessGroup::{signal, suspend, resume, members, adopt} |
| limits | — | whole-tree resource caps (implies stats) |
| cancellation | — | CancellationToken integration (pulls tokio-util) |
| record | — | record/replay cassettes (pulls serde) |
| mock | — | mockall-generated MockRunner |
| tracing | — | lifecycle events: spawn/exit, timeout/cancel, teardown, retries, storms (never argv/env) |
Capping a group's resources
Requires the limits feature (off by default) — add it to the dependency:
processkit = { version = "…", features = ["limits"] }
ProcessGroupOptions can then bound the whole tree's memory, process count, and CPU
at creation, so a runaway or untrusted child tree can't exhaust the host:
cpu_quota is a fraction of a single core (0.5 = half a core, 2.0 = two
cores); on Windows it is converted against the host's CPU count and is approximate.
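The std-only sketch below shows how a fractional cpu_quota could be expressed in each kernel's own unit. The cgroup v2 cpu.max format ("<quota_us> <period_us>") and the Job Object CpuRate unit (cycles per 10,000, that is, hundredths of a percent of the whole machine) are the kernels' documented interfaces. The 100 ms period, the rounding, the clamping, and the function names are assumptions, not processkit's code. The integer rounding in the Windows branch is one reason that conversion is approximate.

```rust
/// Default cgroup v2 CPU period in microseconds (assumed for this sketch).
const CGROUP_PERIOD_US: u64 = 100_000;

/// Linux cgroup v2: `cpu.max` holds "<quota_us> <period_us>", so 0.5 of a
/// core is half of every period and 2.0 cores is two periods' worth.
pub fn cgroup_cpu_max(cpu_quota: f64) -> String {
    let quota_us = (cpu_quota * CGROUP_PERIOD_US as f64).round() as u64;
    format!("{} {}", quota_us, CGROUP_PERIOD_US)
}

/// Windows Job Object: `CpuRate` is a share of the *whole machine* in
/// cycles per 10,000, so a per-core fraction is divided by the core count.
/// Integer units make the result approximate; valid values are 1..=10_000.
pub fn job_cpu_rate(cpu_quota: f64, host_cpus: u32) -> u32 {
    let share_of_machine = cpu_quota / host_cpus as f64;
    let rate = (share_of_machine * 10_000.0).round() as u32;
    rate.clamp(1, 10_000)
}
```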
Limits need a real container — a Windows Job Object or a Linux cgroup v2.
There is no whole-tree limit on macOS/the BSDs, the Linux process-group fallback, or
the no-containment target, and a Linux cgroup must permit controller delegation (run
as root, in a container, or under a systemd unit with Delegate=yes). When a
requested limit can't be enforced, with_options returns Error::ResourceLimit
instead of a silently-unbounded group — an unapplied cap is no protection.
Deeper: Process groups → resource limits.
Signalling and pausing the whole tree
Beyond the kill/shutdown teardown verbs, a group can broadcast a signal to every member or freeze and thaw the whole tree:
Signals are POSIX-only: on Windows just Signal::Kill is deliverable (it maps to
the Job Object terminate) and anything else returns Error::Unsupported.
Signal::Kill always takes the same whole-tree hard-kill path as
terminate_all(). Suspend/resume work everywhere a container exists — one
cgroup.freeze write covering the subtree on Linux, SIGSTOP/SIGCONT on
macOS/BSD and the Linux process-group fallback (both idempotent), and
per-thread suspension on Windows (best-effort, and the only backend where nested
suspends stack and need matching resumes).
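The difference between idempotent and stacking suspends is easiest to see as a counter. This toy model uses hypothetical types, not processkit's. With an idempotent backend (cgroup freeze, SIGSTOP/SIGCONT), one resume thaws the tree however many suspends came before it. With per-thread suspension on Windows, every suspend increments a count that a matching resume must undo.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SuspendModel {
    /// cgroup.freeze or SIGSTOP/SIGCONT: suspend and resume just set a state.
    Idempotent,
    /// Windows per-thread suspension: every suspend bumps a counter.
    Stacking,
}

#[derive(Debug)]
pub struct Tree {
    model: SuspendModel,
    suspend_count: u32,
}

impl Tree {
    pub fn new(model: SuspendModel) -> Self {
        Tree { model, suspend_count: 0 }
    }

    pub fn suspend(&mut self) {
        match self.model {
            SuspendModel::Idempotent => self.suspend_count = 1,
            SuspendModel::Stacking => self.suspend_count += 1,
        }
    }

    pub fn resume(&mut self) {
        match self.model {
            SuspendModel::Idempotent => self.suspend_count = 0,
            SuspendModel::Stacking => {
                self.suspend_count = self.suspend_count.saturating_sub(1)
            }
        }
    }

    /// The tree runs only when no suspend is outstanding.
    pub fn is_running(&self) -> bool {
        self.suspend_count == 0
    }
}
```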
Deeper: Process groups → signals, suspend/resume.
Inspecting the tree
members() snapshots the live member pids, and wait_any races several running
processes, reporting whichever exits first — the natural primitive for
supervising a few long-lived children:
members() lists the whole tree on Windows (Job Object) and Linux (cgroup); the
POSIX process-group backends list the tracked group leaders only. (members
is part of the default-on process-control feature; wait_any is always
available.) wait_any applies no per-process timeout (bound the race with
tokio::time::timeout) and does no output pumping — drain chatty children
first.
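wait_any's "first one wins" shape, together with an outer bound, can be sketched with std threads and a channel. This is only an analogue. The closures stand in for running processes, recv_timeout plays the role that tokio::time::timeout plays around the real async race, and the function name is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs every task on its own thread and returns (index, value) of the first
/// one to finish, or None if none finishes within `bound`. The losers are
/// not cancelled here; they keep running on their threads.
pub fn first_to_finish<T: Send + 'static>(
    tasks: Vec<Box<dyn FnOnce() -> T + Send>>,
    bound: Duration,
) -> Option<(usize, T)> {
    let (tx, rx) = mpsc::channel();
    for (i, task) in tasks.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means the race is decided and the receiver is gone.
            let _ = tx.send((i, task()));
        });
    }
    drop(tx); // if every task panics, recv_timeout returns early instead of waiting
    rx.recv_timeout(bound).ok()
}
```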
Deeper: Process groups → members · Streaming → racing children.
Sampling stats over time
A point-in-time stats() becomes a series with sample_stats, and a single run
can be profiled end-to-end (requires the default-on stats feature):
The series inherits stats()'s platform matrix (full CPU/memory on Windows and
Linux cgroup; counts only on the POSIX process-group backends); profile
samples the started child process itself and applies the run's normal
timeout/output handling.
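Where the platform reports CPU time (Windows and Linux cgroup, per the matrix above), a sampled series is most useful as rates. This sketch is illustrative only: Sample and its fields are hypothetical, not processkit's stats types. It converts cumulative CPU-time readings into the average number of busy cores between consecutive samples and picks out the peak memory reading.

```rust
use std::time::Duration;

/// Hypothetical snapshot: wall time since the first sample, cumulative CPU
/// time of the whole tree, and current memory use in bytes.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub at: Duration,
    pub cpu_time: Duration,
    pub memory_bytes: u64,
}

/// Average cores in use between consecutive samples, computed as
/// (delta CPU time) / (delta wall time). 1.0 means one core fully busy.
pub fn cpu_utilization(series: &[Sample]) -> Vec<f64> {
    series
        .windows(2)
        .map(|w| {
            let wall = w[1].at.saturating_sub(w[0].at).as_secs_f64();
            let cpu = w[1].cpu_time.saturating_sub(w[0].cpu_time).as_secs_f64();
            if wall > 0.0 { cpu / wall } else { 0.0 }
        })
        .collect()
}

/// Highest memory reading in the series, if there is one.
pub fn peak_memory(series: &[Sample]) -> Option<u64> {
    series.iter().map(|s| s.memory_bytes).max()
}
```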
Deeper: Process groups → stats · Streaming → profiling a run.
Supervising a long-lived child
Where Command::retry replays one run until it succeeds, a Supervisor keeps a
child alive: it restarts the command per policy whenever it exits, with
bounded restarts and exponential backoff (jittered by default so a restarted
fleet doesn't stampede):
run() reports a SupervisionOutcome — the final run's result, the restart
count, and why supervision stopped. The opt-in failure-storm guard
distinguishes "fails rarely" from "crash-looping": each failure feeds a score
that halves every failure_decay; past failure_threshold the supervisor
takes one collective storm_pause instead of hammering restarts at backoff
speed. Supervision is platform-agnostic and runs through the ProcessRunner
seam: pass .with_runner(&group) to keep every incarnation in one shared
kill-on-drop group, or a ScriptedRunner to test supervision logic
hermetically.
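The backoff and storm-guard arithmetic fits in a few lines. This is a model, not processkit's Supervisor. The doubling, the cap, the "full jitter" strategy (a uniform pick between zero and the computed delay), and resetting the score after a storm pause are all assumptions. Only the score that halves every failure_decay and the pause past failure_threshold come from the description above. The caller supplies the random number and the clock so the math stays deterministic.

```rust
use std::time::Duration;

/// Delay before restart number `attempt` (0-based): base * 2^attempt, capped,
/// then scaled by `unit_random` in [0, 1] ("full jitter") so restarted
/// instances spread out instead of retrying in lockstep.
pub fn backoff_delay(base: Duration, cap: Duration, attempt: u32, unit_random: f64) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    exp.mul_f64(unit_random.clamp(0.0, 1.0))
}

#[derive(Debug, PartialEq)]
pub enum Next {
    RestartAfterBackoff,
    StormPause,
}

/// Failure score that halves every `decay`. Crossing `threshold` asks for
/// one collective pause instead of another fast restart.
pub struct StormGuard {
    score: f64,
    last: Duration, // time of the previous failure, on a caller-supplied clock
    decay: Duration,
    threshold: f64,
}

impl StormGuard {
    pub fn new(decay: Duration, threshold: f64) -> Self {
        StormGuard { score: 0.0, last: Duration::ZERO, decay, threshold }
    }

    /// Records a failure observed at `now` and decides what happens next.
    pub fn on_failure(&mut self, now: Duration) -> Next {
        let halvings = if self.decay.is_zero() {
            f64::INFINITY // no memory at all: the old score vanishes
        } else {
            now.saturating_sub(self.last).as_secs_f64() / self.decay.as_secs_f64()
        };
        self.score = self.score * 0.5f64.powf(halvings) + 1.0;
        self.last = now;
        if self.score > self.threshold {
            self.score = 0.0; // assumption: the pause clears the slate
            Next::StormPause
        } else {
            Next::RestartAfterBackoff
        }
    }
}
```

With a 10 s decay and a threshold of 2.5, three failures one second apart cross the threshold, while two failures 100 s apart barely register.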
Deeper: Supervision.
Waiting for a child to be ready
"Start a server, then use it" needs the server to be ready, not merely started. Three probes replace the arbitrary sleep:
A probe that doesn't pass within its deadline — or that can no longer pass
(the child exits; for wait_for_line, its stdout closes) — fails with
Error::NotReady (distinct from Error::Timeout, which is the run's own
Command::timeout) and does not kill the child: the caller decides what
happens next. wait_for_line consumes stdout up to the match
(continue with finish_streamed); wait_for_port / wait_for don't touch
the pipes at all.
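A port-readiness probe is essentially a connect-until-deadline loop. This blocking std sketch is not processkit's wait_for_port: the function name, the 50 ms poll interval, and the `still_running` callback are assumptions for illustration. It shows the two ways such a probe can fail without killing anything. Either the deadline passes, or the child exits and the port can never open.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum NotReady {
    DeadlinePassed,
    ChildExited,
}

/// Polls `addr` until a TCP connect succeeds, the deadline passes, or
/// `still_running` reports that the child is gone. Never touches the child.
pub fn wait_for_port(
    addr: SocketAddr,
    deadline: Duration,
    mut still_running: impl FnMut() -> bool,
) -> Result<(), NotReady> {
    let start = Instant::now();
    loop {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(100)).is_ok() {
            return Ok(()); // the probe connection is dropped immediately
        }
        if !still_running() {
            return Err(NotReady::ChildExited); // can no longer pass
        }
        if start.elapsed() >= deadline {
            return Err(NotReady::DeadlinePassed);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```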
Deeper: Streaming → readiness probes.
Pipelines without a shell
a | b | c without a shell string — native pipes, so no quoting or injection
surface, and every stage lives in one shared kill-on-drop group:
The | operator is equivalent sugar: (a | b | c).run().
The outcome is pipefail: stdout is the last stage's output, while the
exit code, stderr, and reported program come from the first stage that didn't
exit cleanly (or the last stage when all succeed). For a consumer that
legitimately stops reading early — the producer | head -1 shape, where the
producer's SIGPIPE death is expected — mark that stage
.unchecked() and pipefail skips it (a checked failure still always wins).
The first stage's stdin source is honored; inner stages read from the pipe.
.timeout(d) bounds the whole chain (killing every stage at the deadline),
and run() requires every stage to succeed, returning the trimmed final
stdout.
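The pipefail attribution rule is small enough to state as code. Stage and reported_stage are hypothetical stand-ins, not processkit's types. stdout always comes from the last stage. The reported stage is the first checked stage that exited non-zero, and otherwise the last stage. What happens when only an unchecked last stage fails is not specified above, so the fallback here is an assumption.

```rust
/// Hypothetical per-stage result.
#[derive(Debug, Clone)]
pub struct Stage {
    pub program: String,
    pub code: i32,
    pub unchecked: bool,
}

/// Index of the stage whose program, exit code, and stderr a pipefail run
/// reports: the first *checked* stage that exited non-zero, else the last.
pub fn reported_stage(stages: &[Stage]) -> Option<usize> {
    if stages.is_empty() {
        return None;
    }
    let first_checked_failure = stages.iter().position(|s| !s.unchecked && s.code != 0);
    Some(first_checked_failure.unwrap_or(stages.len() - 1))
}
```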
Deeper: Pipelines.
Environment and privileges
Spawn-time controls for sandboxing and service launch:
inherit_env clears the environment and copies only the named parent vars
(explicit env/env_remove still apply on top); it works everywhere. uid /
gid and setsid are POSIX-only. The group id is set before the user id, because
once the process has dropped to an unprivileged uid it can no longer change its
gid. On other targets the run fails with Error::Unsupported rather than silently
skipping a privilege drop.

One Linux caveat: under the cgroup mechanism the child joins its cgroup after the
uid has already dropped, and the auto-created cgroup isn't writable by the target
user, so the spawn fails with a permission error (never an uncontained child).
Privilege drop currently composes cleanly with the process-group mechanism.
setsid keeps containment: the group tracks the new session's process group.

create_no_window is a harmless no-op outside Windows and, unlike the raw
ProcessGroup::spawn escape hatch, survives the group's CREATE_SUSPENDED
containment flag (the two are OR'd together). kill_on_parent_death hardens the
one case Drop can't cover, the parent dying abruptly (SIGKILL). Windows already
guarantees it (the job handle closes with the process), Linux arms PDEATHSIG on
the direct child, and macOS/BSD have no equivalent (a documented no-op).
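std's own Command can express the inherit_env idea, which helps when reasoning about what the child will see. This is a std analogue, not processkit's implementation, and the helper names are hypothetical. It clears the environment, copies only the named parent variables that are actually set, and leaves room for explicit overrides afterwards. Command::get_envs reports the resulting explicit environment without spawning anything.

```rust
use std::env;
use std::ffi::OsString;
use std::process::Command;

/// Clears the child's environment, then copies only the named parent
/// variables that are actually set. Later `.env(...)` calls still apply.
pub fn inherit_only<'a>(cmd: &'a mut Command, names: &[&str]) -> &'a mut Command {
    cmd.env_clear();
    for name in names {
        if let Some(value) = env::var_os(*name) {
            cmd.env(*name, value);
        }
    }
    cmd
}

/// The explicitly set environment as (name, value) pairs. Removals, which
/// `get_envs` reports with a `None` value, are skipped.
pub fn explicit_env(cmd: &Command) -> Vec<(OsString, OsString)> {
    cmd.get_envs()
        .filter_map(|(k, v)| v.map(|v| (k.to_os_string(), v.to_os_string())))
        .collect()
}
```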
Deeper: Running commands → privileges and spawn flags.
Cancelling a run
Requires the cancellation feature (off by default). Hand a command a
CancellationToken (re-exported from tokio-util); cancelling the token
kills the process tree, and every consuming path reports Error::Cancelled:
Unlike a timeout — whose expiry is captured in the result as timed_out —
cancellation is always an error: the run was abandoned, so there is no
result to inspect. When a cancel and a timeout land together, cancellation
wins. A token cancelled before the run starts short-circuits without
spawning anything. On a shared ProcessGroup handle, cancelling kills the
child itself but leaves the group's siblings alone (same scope as a timeout),
and a supervised command that gets cancelled stops its Supervisor for good —
restarting into a still-cancelled token would loop futilely.
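The precedence rules above read naturally as a pair of small functions. RunResult, RunError, and both functions are hypothetical, not processkit's ProcessResult or Error. The sketch encodes only the documented ordering: a pre-cancelled token never spawns, cancellation beats a simultaneous timeout, and a timeout on its own stays data.

```rust
/// Hypothetical inspectable result of a run that was not cancelled.
#[derive(Debug, PartialEq)]
pub struct RunResult {
    pub code: Option<i32>,
    pub timed_out: bool,
}

#[derive(Debug, PartialEq)]
pub enum RunError {
    Cancelled,
}

/// Checked before spawning: a token that is already cancelled short-circuits.
pub fn should_spawn(token_cancelled: bool) -> Result<(), RunError> {
    if token_cancelled { Err(RunError::Cancelled) } else { Ok(()) }
}

/// Settles a run that did spawn. Cancellation is always an error and wins
/// over a timeout that fired at the same time; a timeout alone is data.
pub fn settle(cancelled: bool, timed_out: bool, code: Option<i32>) -> Result<RunResult, RunError> {
    if cancelled {
        return Err(RunError::Cancelled);
    }
    Ok(RunResult { code, timed_out })
}
```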
For a typed wrapper whose commands never cross your code, set the token once
on the client: CliClient::new("gh").default_cancel_on(token.child_token())
— cancelling it kills every in-flight command of that client.
Deeper: Timeouts, retries & cancellation.
Async streaming and interactive I/O
The one-shot helpers above buffer the whole output. For long-running or
conversational children, start() returns a live RunningProcess you can
drive asynchronously.
Stream stdout line by line
Process each line as it arrives — no waiting for the child to exit, no buffering
the full output. StreamExt (re-exported from tokio-stream) provides .next():
The command's timeout bounds the stream: at the deadline the tree is killed, the pipes close, and the stream ends (on a handle that owns its group, which is the start() path). A cancel_on token (with the cancellation feature) ends the stream the same way, and the following finish_streamed reports Error::Cancelled. For an ad-hoc bound, wrapping the loop in tokio::time::timeout and dropping the handle (which kills the tree) still works.
Interactive stdin — write requests, read responses
Keep stdin open with keep_stdin_open(), take the writer with
standard_input(), then interleave async writes and reads:
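The write-a-request, read-a-reply rhythm works the same way with blocking std I/O, and that makes one pitfall easy to see. In this sketch the function name is hypothetical, and any Write/BufRead pair stands in for the child's stdin and stdout. It is not processkit's standard_input() API.

```rust
use std::io::{self, BufRead, Write};

/// Sends each request as one line and reads exactly one reply line after it.
/// Flushing after every write matters: a request stuck in a buffer never
/// reaches the child, and both sides end up waiting on each other.
pub fn converse<W: Write, R: BufRead>(
    mut to_child: W,
    mut from_child: R,
    requests: &[&str],
) -> io::Result<Vec<String>> {
    let mut replies = Vec::new();
    for req in requests {
        writeln!(to_child, "{req}")?;
        to_child.flush()?;
        let mut reply = String::new();
        if from_child.read_line(&mut reply)? == 0 {
            break; // the child closed its stdout
        }
        replies.push(reply.trim_end().to_string());
    }
    Ok(replies)
}
```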
Feed stdin from an async stream, react to stdout as it's read
Stdin::from_lines writes each item of any Stream<Item = String> as a line —
back it with a channel, a file tail, or a network source. Pair it with
on_stdout_line / on_stderr_line to handle output inline (the handler runs on
the read pump, in addition to capture):
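The "handler runs on the read pump, in addition to capture" arrangement can be pictured with a blocking std reader. In this sketch the function name is hypothetical and a BufRead stands in for the child's stdout pipe. Each line goes to the callback as soon as it is read and is also kept, so the caller still gets the full captured output at the end.

```rust
use std::io::{self, BufRead};

/// Reads `reader` line by line, calls `on_line` for each line as it arrives,
/// and also returns every line so nothing is lost for the final result.
pub fn pump_lines<R: BufRead>(reader: R, mut on_line: impl FnMut(&str)) -> io::Result<Vec<String>> {
    let mut captured = Vec::new();
    for line in reader.lines() {
        let line = line?; // `lines()` already strips a trailing "\n" or "\r\n"
        on_line(&line);
        captured.push(line);
    }
    Ok(captured)
}
```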
Deeper: Streaming & interactive I/O.
Wrapping a CLI tool
CliClient + the cli_client! macro turn a typed wrapper around an external
tool (git, jj, gh, …) into just its parsers — the runner is injectable, so
the wrapper is hermetically testable with a ScriptedRunner (no subprocess).
The seam covers streaming too: a scripted start() feeds canned lines
through the same pump machinery, so stdout_lines/wait_for_line-based
orchestration tests hermetically as well:
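The injectable runner is, at heart, a plain trait seam. Here is a std-only sketch of the pattern. The Runner trait, CannedRunner, and GitClient are hypothetical illustrations, not processkit's ProcessRunner, ScriptedRunner, or cli_client! output. The wrapper owns only the parsing, and a test swaps in canned output so no subprocess ever runs.

```rust
use std::collections::HashMap;

/// The seam: anything that can turn (program, args) into stdout or an error.
pub trait Runner {
    fn run(&self, program: &str, args: &[&str]) -> Result<String, String>;
}

/// Test double: answers from a script keyed by the full command line.
#[derive(Default)]
pub struct CannedRunner {
    script: HashMap<String, Result<String, String>>,
}

impl CannedRunner {
    pub fn on(mut self, cmdline: &str, reply: Result<&str, &str>) -> Self {
        let reply = reply.map(str::to_string).map_err(str::to_string);
        self.script.insert(cmdline.to_string(), reply);
        self
    }
}

impl Runner for CannedRunner {
    fn run(&self, program: &str, args: &[&str]) -> Result<String, String> {
        let key = std::iter::once(program)
            .chain(args.iter().copied())
            .collect::<Vec<_>>()
            .join(" ");
        // An unscripted call is an error, never a real spawn.
        self.script.get(&key).cloned().unwrap_or_else(|| Err(format!("unscripted: {key}")))
    }
}

/// The typed wrapper is then just parsers over the runner's output.
pub struct GitClient<R: Runner> {
    runner: R,
}

impl<R: Runner> GitClient<R> {
    pub fn new(runner: R) -> Self {
        GitClient { runner }
    }

    pub fn current_branch(&self) -> Result<String, String> {
        self.runner
            .run("git", &["rev-parse", "--abbrev-ref", "HEAD"])
            .map(|out| out.trim().to_string())
    }
}
```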
Deeper: Testing your code → CliClient.
Recording and replaying runs
Requires the record feature (off by default). RecordReplayRunner turns
real runs into a JSON cassette once, then replays them deterministically —
fast, hermetic, no subprocess in CI:
Entries are matched by program + args + cwd + has-stdin. Environment override
values never reach the file — only the sorted variable names, so a
committed fixture can't leak secrets (and env differences can't cause spurious
misses). When one invocation was recorded several times, replay serves the
entries in capture order and then repeats the last one — a recorded sequence
of changing outputs replays faithfully, while retry/probe loops keep getting a
stable final answer. An invocation absent from the cassette is a strict error
(replay never spawns a surprise subprocess), and the file carries a format
version so future readers fail loudly instead of misreading old fixtures.
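The replay rules above fit in a small in-memory model: match on program, args, cwd, and has-stdin; store only the sorted env names; serve repeated recordings in order and then keep repeating the last; fail on a miss. Key, Entry, and Cassette are a hypothetical sketch, not processkit's RecordReplayRunner or its JSON format, and serialization is skipped entirely.

```rust
use std::collections::HashMap;

/// What identifies an invocation. Env values never enter the key, so env
/// differences cannot cause a miss.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Key {
    pub program: String,
    pub args: Vec<String>,
    pub cwd: String,
    pub has_stdin: bool,
}

#[derive(Debug, Clone)]
pub struct Entry {
    pub env_names: Vec<String>, // names only, sorted; never values
    pub stdout: String,
}

#[derive(Default)]
pub struct Cassette {
    entries: HashMap<Key, Vec<Entry>>,
    served: HashMap<Key, usize>,
}

impl Cassette {
    /// Records one run, keeping only the sorted names of env overrides.
    pub fn record(&mut self, key: Key, env: &[(&str, &str)], stdout: &str) {
        let mut env_names: Vec<String> = env.iter().map(|(k, _)| k.to_string()).collect();
        env_names.sort();
        self.entries.entry(key).or_default().push(Entry { env_names, stdout: stdout.to_string() });
    }

    /// Serves entries in capture order, then repeats the last one.
    /// An unrecorded invocation is an error, never a real spawn.
    pub fn replay(&mut self, key: &Key) -> Result<Entry, String> {
        let list = self.entries.get(key).ok_or_else(|| format!("no recording for {key:?}"))?;
        let next = self.served.entry(key.clone()).or_insert(0);
        let entry = list[(*next).min(list.len() - 1)].clone();
        *next += 1;
        Ok(entry)
    }
}
```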
Deeper: Testing your code → record/replay.
Contributing
Running the tests and the (maintainer-only) release process are documented in CONTRIBUTING.md.
License
Licensed under the MIT License.