procpilot 0.1.0

Production-grade subprocess runner with typed errors, retry, and timeout
Documentation

procpilot

Production-grade subprocess runner for Rust — typed errors, retry with backoff, timeout with pipe-draining, binary-safe output.

Built for CLI tools that need to spawn external processes and handle failure modes precisely. Not intended for shell scripting (see xshell for that).

Why not std::process::Command?

Command::output() returns an Output with a status field the caller must remember to check. Command::spawn() gives you a Child but no help with timeout, retry, or deadlock-safe pipe draining. Building a production CLI tool on Command alone means writing the same wrapping layer every time.

procpilot is that layer:

  • Typed errors — [RunError] distinguishes Spawn (couldn't start — binary missing, fork failed), NonZeroExit (command ran and reported failure with captured stdout/stderr), and Timeout (killed after exceeding budget). Callers can match to handle each appropriately.
  • Retry with exponential backoff — [run_with_retry] wraps any subprocess call with a user-supplied "is this error transient?" predicate.
  • Timeout with pipe-draining — background threads drain stdout/stderr while waiting, so chatty processes don't block on buffer overflow and fail to respond to the kill signal.
  • Binary-safe output — stdout is Vec<u8> (faithful for image/binary content); stdout_lossy() gives a zero-copy Cow<str> for text.
  • Environment variables — [run_cmd_in_with_env] handles the GIT_INDEX_FILE / SSH_AUTH_SOCK / etc. cases without dropping back to Command.

Usage

[dependencies]
procpilot = "0.1"

Running commands

use procpilot::{run_cmd, run_cmd_in, RunError};

// Basic: run a command, get captured output
let output = run_cmd("echo", &["hello"])?;
assert_eq!(output.stdout_lossy().trim(), "hello");

// In a specific directory
let output = run_cmd_in(&repo_path, "git", &["log", "--oneline", "-5"])?;

// Binary-safe output for image/binary content
let output = run_cmd_in(&repo_path, "git", &["show", "HEAD:logo.png"])?;
let image_bytes: Vec<u8> = output.stdout;

Handling "command ran and said no"

procpilot returns Result<RunOutput, RunError>. The three-variant enum lets callers distinguish infrastructure failure from command-reported failure from timeouts:

use procpilot::{run_cmd, RunError};

match run_cmd("git", &["show", "maybe-missing-ref"]) {
    Ok(output) => Some(output.stdout),
    Err(RunError::NonZeroExit { .. }) => None,   // ref doesn't exist — legitimate answer
    Err(e) => return Err(e.into()),              // real infrastructure failure
}
# ; Ok::<(), anyhow::Error>(())

RunError implements std::error::Error, so ? into anyhow::Result works when you don't care about the distinction.

Inspection methods on RunError:

  • err.is_non_zero_exit() / err.is_spawn_failure() / err.is_timeout() — check the variant
  • err.stderr() — captured stderr on NonZeroExit/Timeout, None on Spawn
  • err.exit_status() — exit status on NonZeroExit, None on others
  • err.program() — the program name that failed

RunError is marked #[non_exhaustive], so future variants won't break your match arms — include a wildcard fallback.

Timeouts

For commands that might hang (network operations, unreachable remotes, user-supplied queries), use the timeout variant:

use std::time::Duration;
use procpilot::{run_cmd_in_with_timeout, RunError};

match run_cmd_in_with_timeout(&repo, "git", &["fetch"], Duration::from_secs(30)) {
    Ok(_) => println!("fetched"),
    Err(RunError::Timeout { elapsed, stderr, .. }) => {
        eprintln!("fetch hung after {elapsed:?}; last stderr: {stderr}");
    }
    Err(e) => return Err(e.into()),
}
# ; Ok::<(), anyhow::Error>(())

Output collected before the kill is returned in the Timeout error variant.

Caveat on grandchildren: the kill signal reaches only the direct child. A shell wrapper like sh -c "slow-cmd" forks the target as a grandchild that survives the shell's kill. Use exec in the shell (sh -c "exec slow-cmd") or invoke the target directly.

Retry on transient errors

use procpilot::{run_with_retry, RunError};

// Retry when stderr looks like a lock-contention error
let output = run_with_retry(&repo, "git", &["pull"], |err| match err {
    RunError::NonZeroExit { stderr, .. } => stderr.contains(".lock"),
    _ => false,
})?;
# ; Ok::<(), RunError>(())

Uses exponential backoff (100ms, 200ms, 400ms) with up to 3 retries. The predicate is impl Fn(&RunError) -> bool so it can capture state.

Environment variables

use procpilot::run_cmd_in_with_env;

// Run with a custom GIT_INDEX_FILE (detects unstaged renames)
let output = run_cmd_in_with_env(
    &repo, "git", &["add", "-N", "--", "file.rs"],
    &[("GIT_INDEX_FILE", "/tmp/index.tmp")],
)?;
# ; Ok::<(), procpilot::RunError>(())

Inherited I/O

When the user should see output directly (e.g., running cargo test and letting test output stream):

use procpilot::run_cmd_inherited;

run_cmd_inherited("cargo", &["test"])?;
# ; Ok::<(), procpilot::RunError>(())

Binary availability

use procpilot::{binary_available, binary_version};

if binary_available("docker") {
    println!("docker version: {}", binary_version("docker").unwrap_or_default());
}

Design decisions

  • Vec<u8> stdout, String stderr. Stdout can be binary; stderr is conventionally text. Asymmetric by design.
  • NonZeroExit is an error, not a normal return. Forces the caller to opt into handling command-reported failures via match, rather than accidentally ignoring a missed status check.
  • Timeouts drain pipes in threads. A simple wait_timeout without draining will hang forever if the child fills the pipe buffer. This is a subtle production-grade correctness concern that a scripting library can skip.
  • No trait abstraction. Vcs-style traits belong in consumer code where the specific needs are known. procpilot provides primitives.
  • #[non_exhaustive] on RunError. New failure modes can be added without breaking callers.

License

Licensed under either Apache-2.0 or MIT at your option.