ractor_supervisor

§ractor-supervisor

An OTP-style supervisor for the ractor framework—helping you build supervision trees in a straightforward, Rust-centric way.

Inspired by the Elixir/Erlang supervision concept, ractor-supervisor provides a robust mechanism for overseeing one or more child actors and automatically restarting them under configurable policies. If too many restarts happen in a brief time window—a “meltdown”—the supervisor itself shuts down abnormally, preventing errant restart loops.

Goal: Make it easier to define, configure, and maintain supervision trees in your ractor-based applications. With multiple restart policies, flexible supervision strategies, custom backoff support, and meltdown counters, ractor-supervisor helps you keep your actor systems both resilient and performant.

§Overview

§Supervision Strategies

  • OneForOne: Only the failing child is restarted.
  • OneForAll: If any child fails, all children are stopped and restarted.
  • RestForOne: The failing child and every child declared after it (in child_specs order) are stopped and restarted (see the sketch after this list).
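
For instance, with three hypothetical children "a", "b", and "c" declared in that order in child_specs, a failure of "b" plays out as follows (a minimal sketch; the ids are illustrative only):

// Hypothetical children "a", "b", "c", declared in that order in `child_specs`.
// If "b" fails:
//   Strategy::OneForOne  -> only "b" is restarted
//   Strategy::OneForAll  -> "a", "b", and "c" are all stopped and restarted
//   Strategy::RestForOne -> "b" and "c" are restarted; "a" keeps running
let strategy = Strategy::RestForOne; // set on SupervisorOptions (see below)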

§Restart Policies

  • Permanent: Always restart, no matter how the child exited.
  • Transient: Restart only if the child exited abnormally (panic or error).
  • Temporary: Never restart, regardless of exit reason.
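
The policy is set per child on its ChildSpec. A minimal sketch, reusing the spawn_my_worker helper and the ChildSpec fields shown in the full example below:

let spec = ChildSpec {
    id: "myworker".into(),
    // Permanent: always bring this child back, even after a clean exit.
    // Transient would restart it only after a panic/error; Temporary never would.
    restart: Restart::Permanent,
    spawn_fn: Box::new(|cell, id| spawn_my_worker(cell, id).boxed()),
    backoff_fn: None,
    restart_counter_reset_after: None,
};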

§Meltdown Logic

  • max_restarts and max_seconds: Together these define the “time window” for meltdown counting. If more than max_restarts restarts occur within max_seconds seconds, the supervisor shuts down abnormally (meltdown).
  • restart_counter_reset_after: If the supervisor sees no failures for this many seconds, it clears its meltdown log and effectively “resets” the meltdown counters.
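
To make the arithmetic concrete, here is a sketch mirroring the options used in the example below: with max_restarts = 5 and max_seconds = 10, a sixth restart inside any rolling 10-second window triggers a meltdown, while 30 failure-free seconds clear the meltdown log.

// Sketch: failures at t = 1s, 3s, 4s, 6s, 8s, 9s are six restarts inside one
// 10-second window, exceeding max_restarts = 5, so the supervisor melts down.
let options = SupervisorOptions {
    strategy: Strategy::OneForOne,
    max_restarts: 5,
    max_seconds: 10,
    restart_counter_reset_after: Some(30), // 30 quiet seconds clear the meltdown log
};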

§Child-Level Resets & Backoff

  • restart_counter_reset_after (per child): If a specific child remains up for that many seconds, its own failure count is reset to zero on the next failure.
  • backoff_fn: An optional function to delay a child’s restart. For instance, you might implement exponential backoff to prevent immediate thrashing restarts.
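
The backoff callback receives the child's id, its failure count so far, the time of its last failure, and its per-child reset window, and returns an optional delay. Below is a minimal sketch of a capped linear backoff; the signature and imports (Arc, Duration, tokio::time::Instant) match the ChildBackoffFn used in the full example below.

let capped_backoff: ChildBackoffFn = Arc::new(
    |_child_id: &str, restart_count: usize, _last_fail: Instant, _reset_after: Option<u64>| {
        if restart_count <= 1 {
            None // first failure: restart immediately
        } else {
            // later failures: 2s per accumulated failure, capped at 30s
            Some(Duration::from_secs((2 * restart_count as u64).min(30)))
        }
    }
);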

§Usage

  1. Define one or more child actors by implementing Actor.
  2. For each child, create a ChildSpec with:
    • A Restart policy,
    • A spawn_fn that links the child to its supervisor,
    • Optional backoff_fn / meltdown resets.
  3. Configure SupervisorOptions, specifying meltdown thresholds (max_restarts, max_seconds) and a supervision Strategy.
  4. Pass those into SupervisorArguments and spawn your Supervisor via Actor::spawn(...).

You can also nest supervisors to build multi-level supervision trees—simply treat a supervisor as a “child” of another supervisor by specifying its own ChildSpec. This structure allows you to partition failure domains and maintain more complex actor systems in a structured, fault-tolerant manner.
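
A rough sketch of that pattern, assuming the same types as in the example below; the spawn_sub_supervisor helper and the ids are illustrative, the nested tree's own child specs are elided, and .boxed() comes from futures_util::FutureExt:

// Sketch: a ChildSpec whose spawn_fn starts another Supervisor, linked to the parent.
async fn spawn_sub_supervisor(
    parent: ractor::ActorCell,
    child_id: String,
) -> Result<ractor::ActorCell, ractor::SpawnErr> {
    let sub_args = SupervisorArguments {
        child_specs: vec![/* ChildSpecs for this subtree */],
        options: SupervisorOptions {
            strategy: Strategy::OneForOne,
            max_restarts: 3,
            max_seconds: 5,
            restart_counter_reset_after: None,
        },
    };
    let (sub_ref, _join) = Supervisor::spawn_linked(
        Some(child_id),   // name the nested supervisor after its ChildSpec id
        Supervisor,       // the Supervisor actor from this crate
        sub_args,
        parent,           // link it to the parent supervisor
    ).await?;
    Ok(sub_ref.get_cell())
}

let sub_supervisor_spec = ChildSpec {
    id: "sub_supervisor".into(),
    restart: Restart::Permanent,
    spawn_fn: Box::new(|cell, id| spawn_sub_supervisor(cell, id).boxed()),
    backoff_fn: None,
    restart_counter_reset_after: None,
};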

If meltdown conditions are reached, the supervisor stops itself abnormally to prevent runaway restart loops.

§Example

use ractor::Actor;
use ractor_supervisor::*; // assuming your crate is named ractor_supervisor
use std::{time::Duration, sync::Arc};
use tokio::time::Instant;
use futures_util::FutureExt;

// A minimal child actor that simply does some work in `handle`.
struct MyWorker;

#[ractor::async_trait]
impl Actor for MyWorker {
    type Msg = ();
    type State = ();
    type Arguments = ();

    // Called before the actor fully starts. We can set up the actor’s internal state here.
    async fn pre_start(
        &self,
        _myself: ractor::ActorRef<Self::Msg>,
        _args: Self::Arguments,
    ) -> Result<Self::State, ractor::ActorProcessingErr> {
        Ok(())
    }

    // The main message handler. This is where you implement your actor’s behavior.
    async fn handle(
        &self,
        _myself: ractor::ActorRef<Self::Msg>,
        _msg: Self::Msg,
        _state: &mut Self::State
    ) -> Result<(), ractor::ActorProcessingErr> {
        // do some work...
        Ok(())
    }
}

// A function to spawn the child actor. This will be used in ChildSpec::spawn_fn.
async fn spawn_my_worker(
    supervisor_cell: ractor::ActorCell,
    child_id: String
) -> Result<ractor::ActorCell, ractor::SpawnErr> {
    // We name the child actor using `child_spec.id` (though naming is optional).
    let (child_ref, _join) = MyWorker::spawn_linked(
        Some(child_id),     // actor name
        MyWorker,           // actor instance
        (),                 // arguments
        supervisor_cell,    // link to the supervisor
    ).await?;
    Ok(child_ref.get_cell())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A child-level backoff function that applies exponential backoff starting with the second failure.
    // Return Some(delay) to make the supervisor wait before restarting this child.
    let my_backoff: ChildBackoffFn = Arc::new(
        |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<u64>| {
            // On the first failure, restart immediately (None).
            // From the second failure onward, the delay doubles each time (exponential).
            if restart_count <= 1 {
                None
            } else {
                Some(Duration::from_secs(1 << restart_count))
            }
        }
    );

    // This specification describes exactly how to manage our single child actor.
    let child_spec = ChildSpec {
        id: "myworker".into(),  // Unique identifier for meltdown logs and debugging.
        restart: Restart::Transient, // Only restart if the child fails abnormally.
        spawn_fn: Box::new(|cell, id| spawn_my_worker(cell, id).boxed()),
        backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
        // If the child remains up for 60s, its individual failure counter resets to 0 next time it fails.
        restart_counter_reset_after: Some(60),
    };

    // Supervisor-level meltdown configuration. If more than 5 restarts occur within 10s, meltdown is triggered.
    // Also, if we stay quiet for 30s (no restarts), the meltdown log resets.
    let options = SupervisorOptions {
        strategy: Strategy::OneForOne,  // If one child fails, only that child is restarted.
        max_restarts: 5,               // Permit up to 5 restarts in the meltdown window.
        max_seconds: 10,               // The meltdown window (in seconds).
        restart_counter_reset_after: Some(30), // If no failures for 30s, meltdown log is cleared.
    };

    // Group all child specs and meltdown options together:
    let args = SupervisorArguments {
        child_specs: vec![child_spec], // We only have one child in this example
        options,
    };

    // Spawn the supervisor with our arguments.
    let (sup_ref, sup_handle) = Actor::spawn(
        None,        // no name for the supervisor
        Supervisor,  // the Supervisor actor
        args
    ).await?;

    let _ = sup_ref.kill();
    let _ = sup_handle.await;

    Ok(())
}

§Structs

ChildFailureState
Internal tracking of a child’s failure count and the last time it failed.
ChildSpec
Defines how to spawn and manage a single child actor.
InspectableState
A snapshot of the supervisor’s state, used mainly for testing or debugging.
Contains copies of the supervisor’s “running” children, child failure counters, and meltdown log.
RestartLog
Each time we restart a child, we store a record for meltdown counting: (child_id, when).
Supervisor
The supervisor actor itself.
Spawns its children in post_start, listens for child failures, and restarts them if needed.
If meltdown occurs, it returns an error to end abnormally (thus skipping post_stop).
SupervisorArguments
The arguments needed to spawn the supervisor.
SupervisorOptions
Supervisor-level meltdown policy.
SupervisorState
Holds the supervisor’s live state: which children are running, how many times each child has failed, etc.

§Enums

Restart
Defines how a child actor is restarted after it exits.
Strategy
The supervision strategy for this supervisor’s children.
SupervisorError
Possible errors from the supervisor’s logic.
SupervisorMsg
Internal messages that instruct the supervisor to spawn a child, triggered by its meltdown logic.

§Type Aliases

ChildBackoffFn
A function pointer for computing child-level backoff delays before re-spawning a child.
SpawnFn
User-provided closure to spawn a child. You typically call Actor::spawn_linked here.
SpawnFuture
The future returned by a SpawnFn.