Skip to main content

Supervisor

Struct Supervisor 

Source
pub struct Supervisor<R: ProcessRunner = JobRunner> { /* private fields */ }
Expand description

Keeps a Command alive: runs it, classifies every exit against the RestartPolicy and the stop_when predicate, and restarts it after an exponential-backoff delay until supervision ends.

use processkit::{Command, RestartPolicy, Supervisor};
use std::time::Duration;

let outcome = Supervisor::new(Command::new("my-server").args(["--port", "8080"]))
    .restart(RestartPolicy::OnCrash)
    .max_restarts(5)
    .backoff(Duration::from_millis(200), 2.0)
    .stop_when(|res| res.code() == Some(0))
    .run()
    .await?;
println!("ended after {} restarts: {:?}", outcome.restarts, outcome.stopped);

Defaults: OnCrash, unlimited restarts, backoff 200ms × 2.0 capped at 30 s, jitter on, failure-storm guard off (enable with storm_pause; once enabled, failure-score half-life 30 s and threshold 5.0).

Runs go through a ProcessRunnerJobRunner by default (each incarnation in its own private kill-on-drop group). Inject another with with_runner: a &ProcessGroup supervises every incarnation inside one shared group, and a ScriptedRunner makes supervision logic fully hermetic in tests.

Implementations§

Source§

impl Supervisor<JobRunner>

Source

pub fn new(command: Command) -> Self

Supervise command with the default JobRunner (a fresh private kill-on-drop group per incarnation).

Source§

impl<R: ProcessRunner> Supervisor<R>

Source

pub fn with_runner<R2: ProcessRunner>(self, runner: R2) -> Supervisor<R2>

Run every incarnation through runner instead of the default JobRunner — e.g. a &ProcessGroup for one shared kill-on-drop group, or a test double for hermetic supervision tests.

With a shared group, the group’s state applies to every incarnation: notably, restarting into a suspended group on the Linux cgroup mechanism spawns the new child frozen (see the ProcessGroup::suspend docs, process-control feature) — resume the group before supervising into it.

Source

pub fn restart(self, policy: RestartPolicy) -> Self

When to restart (default: OnCrash).

Source

pub fn max_restarts(self, n: u32) -> Self

Restart at most n times — n + 1 total runs (default: unlimited).

Source

pub fn backoff(self, base: Duration, factor: f64) -> Self

Exponential backoff before each restart: the n-th restart (0-based) waits base × factor^n, capped by max_backoff. A factor below 1.0 (or non-finite) is treated as 1.0. Default: 200ms × 2.0.

Source

pub fn max_backoff(self, cap: Duration) -> Self

Cap any single backoff delay (default: 30 s).

Source

pub fn jitter(self, enabled: bool) -> Self

Multiply each backoff delay by a uniform factor in [0.5, 1.5) (default: on), so a fleet of supervised workers restarted by the same incident doesn’t stampede back in lockstep. Disable for deterministic delays.

Source

pub fn storm_pause(self, pause: Duration) -> Self

Enable the failure-storm guard: when crash-restarts cluster faster than the failure score can decay (see failure_decay / failure_threshold), pause restarts once for pause — jittered into [0.5, 1.5) of the nominal value per jitter — then reset the score and resume. Off by default; this is the master switch, the other two knobs only tune it.

Each failed run adds 1 to a score that halves every failure_decay: score = score × 0.5^(Δt / failure_decay) + 1. A service that fails rarely never accumulates past the threshold; a storm trips it and gets one collective pause instead of hammering restarts at backoff speed. (Design borrowed from Go’s suture supervisor — the idea, not the code.)

Only failures feed the score: crashes and spawn errors. A clean exit restarted under Always is not a failure. The storm pause stacks with (runs before) the per-restart backoff, and max_restarts is checked first — a storm pause never resurrects an exhausted budget. Pauses taken are reported in SupervisionOutcome::storm_pauses.

Source

pub fn failure_decay(self, decay: Duration) -> Self

Half-life of the failure score used by the storm guard (default: 30 s): every decay seconds without a failure, the accumulated score halves. A zero half-life keeps no history — every failure scores exactly 1, so the guard trips only with a threshold below 1.0. No effect unless storm_pause is set.

Source

pub fn failure_threshold(self, threshold: f64) -> Self

Failure score above which the storm guard trips (default: 5.0 — roughly “more than five failures inside one half-life”). A non-finite threshold never trips. No effect unless storm_pause is set.

Source

pub fn stop_when( self, predicate: impl Fn(&ProcessResult<String>) -> bool + Send + Sync + 'static, ) -> Self

End supervision when predicate matches a completed run — checked before the RestartPolicy on every exit, clean or not. (It never sees a run that failed to start; spawn errors are classified by the policy alone.)

Source

pub async fn run(self) -> Result<SupervisionOutcome>

Supervise until the policy, the predicate, or the restart budget ends it, and report the SupervisionOutcome.

§Errors

Returns Err only when the terminating attempt failed to produce a result at all (a spawn/IO failure when no further restart is allowed) — there is no final ProcessResult to report in that case. A spawn failure with restarts remaining counts as a crash and is retried.

§Cancellation

Dropping this future mid-run abandons the in-flight incarnation. With the default JobRunner it is killed on drop (the incarnation owns a private group); with a shared-group runner (with_runner(&group)) the incarnation stays alive in the caller’s group until the group tears it down.

An incarnation cancelled via its token (Command::cancel_on, with the cancellation feature) is terminal: supervision returns that Error::Cancelled immediately, regardless of policy or budget — the token stays cancelled, so a restart would only be cancelled again.

Trait Implementations§

Source§

impl<R: ProcessRunner> Debug for Supervisor<R>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<R = JobRunner> !RefUnwindSafe for Supervisor<R>

§

impl<R = JobRunner> !UnwindSafe for Supervisor<R>

§

impl<R> Freeze for Supervisor<R>
where R: Freeze,

§

impl<R> Send for Supervisor<R>

§

impl<R> Sync for Supervisor<R>

§

impl<R> Unpin for Supervisor<R>
where R: Unpin,

§

impl<R> UnsafeUnpin for Supervisor<R>
where R: UnsafeUnpin,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Any for T
where T: Any,

Source§

fn into_any(self: Box<T>) -> Box<dyn Any>

Source§

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Source§

fn type_name(&self) -> &'static str

Source§

impl<T> AnySync for T
where T: Any + Send + Sync,

Source§

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more