//! # ractor-supervisor
//!
//! An **OTP-style supervisor** for the [`ractor`](https://docs.rs/ractor) framework—helping you build **supervision trees** in a straightforward, Rust-centric way.
//!
//! Inspired by the Elixir/Erlang supervision concept, `ractor-supervisor` provides a robust mechanism for overseeing **one or more child actors** and automatically restarting them under configurable policies. If too many restarts happen in a brief time window—a "meltdown"—the supervisor itself shuts down abnormally, preventing errant restart loops.
//!
//! ## Supervisor Types
//!
//! This crate provides three types of supervisors, each designed for specific use cases:
//!
//! ### 1. Static Supervisor (`Supervisor`)
//! - Manages a fixed set of children defined at startup
//! - Supports all supervision strategies (OneForOne, OneForAll, RestForOne)
//! - Best for static actor hierarchies where child actors are known at startup
//! - Example: A web server with predefined worker pools, cache managers, and connection handlers
//!
//! ### 2. Dynamic Supervisor (`DynamicSupervisor`)
//! - Allows adding/removing children at runtime
//! - Uses OneForOne strategy only (each child managed independently)
//! - Optional `max_children` limit
//! - Best for dynamic workloads where children are spawned/terminated on demand
//! - Example: A job queue processor that spawns worker actors based on load
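//!
//! As a rough model of the optional `max_children` limit, the sketch below keeps child ids in a set and refuses new children once the cap is reached. `ChildRegistry`, `AddChildError`, `try_add`, and `remove` are hypothetical names used only for illustration. This is not the crate's `DynamicSupervisor` API, and it spawns no actors.
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! // Hypothetical reasons a dynamic child can be rejected (illustration only).
//! #[derive(Debug, PartialEq)]
//! enum AddChildError {
//!     // `max_children` is set and already reached.
//!     LimitReached,
//!     // A child with this id is already registered.
//!     DuplicateId,
//! }
//!
//! // Hypothetical registry modelling an optional `max_children` cap.
//! struct ChildRegistry {
//!     max_children: Option<usize>,
//!     ids: HashSet<String>,
//! }
//!
//! impl ChildRegistry {
//!     fn new(max_children: Option<usize>) -> Self {
//!         Self { max_children, ids: HashSet::new() }
//!     }
//!
//!     // Admit a child if the cap (when set) has room and the id is unused.
//!     fn try_add(&mut self, id: &str) -> Result<(), AddChildError> {
//!         if let Some(max) = self.max_children {
//!             if self.ids.len() >= max {
//!                 return Err(AddChildError::LimitReached);
//!             }
//!         }
//!         if !self.ids.insert(id.to_string()) {
//!             return Err(AddChildError::DuplicateId);
//!         }
//!         Ok(())
//!     }
//!
//!     // Remove a child, freeing a slot; returns whether it was registered.
//!     fn remove(&mut self, id: &str) -> bool {
//!         self.ids.remove(id)
//!     }
//!
//!     // Number of children currently registered.
//!     fn len(&self) -> usize {
//!         self.ids.len()
//!     }
//! }
//! ```
//!
//! With `max_children = Some(2)`, a third child is rejected until one of the first two is removed. With `None`, there is no cap. Rejecting duplicate ids is an assumption that follows from names serving as unique identifiers.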
//!
//! ### 3. Task Supervisor (`TaskSupervisor`)
//! - Specialized version of DynamicSupervisor for managing async tasks
//! - Wraps futures in actor tasks that can be supervised
//! - Simpler API focused on task execution rather than actor management
//! - Best for background jobs, periodic tasks, or any async work needing supervision
//! - Example: Scheduled jobs, background data processing, or cleanup tasks
//!
//! ## Supervision Strategies
//!
//! The strategy defines what happens when a child fails:
//!
//! - **OneForOne**: Only the failing child is restarted.
//! - **OneForAll**: If any child fails, all children are stopped and restarted.
//! - **RestForOne**: The failing child and all subsequent children (in definition order) are stopped and restarted.
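//!
//! The three strategies differ only in *which* children get restarted. The sketch below models that choice as a pure function over the children's definition order. `Strategy` and `children_to_restart` are hypothetical stand-ins, not the crate's `SupervisorStrategy` type, and the sketch does not model the order in which children are stopped and started again.
//!
//! ```rust
//! // Hypothetical mirror of the three strategy names (not the crate's enum).
//! #[derive(Clone, Copy, Debug)]
//! enum Strategy {
//!     OneForOne,
//!     OneForAll,
//!     RestForOne,
//! }
//!
//! // Ids of the children to stop and restart when `children[failed]` fails.
//! // `children` is listed in definition (start) order.
//! fn children_to_restart<'a>(strategy: Strategy, children: &[&'a str], failed: usize) -> Vec<&'a str> {
//!     match strategy {
//!         // Only the failing child.
//!         Strategy::OneForOne => vec![children[failed]],
//!         // Every child, regardless of which one failed.
//!         Strategy::OneForAll => children.to_vec(),
//!         // The failing child and every child defined after it.
//!         Strategy::RestForOne => children[failed..].to_vec(),
//!     }
//! }
//! ```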
//!
//! Strategies apply to **all failure scenarios**, including:
//! - Spawn errors (failures in `pre_start`/`post_start`)
//! - Runtime panics
//! - Normal and abnormal exits
//!
//! For example, if a child fails in `pre_start` while it is being spawned, that failure counts as a restart and triggers the strategy logic.
//!
//! ## Common Features
//!
//! All supervisor types share these core features:
//!
//! ### Restart Policies
//! - **Permanent**: Always restart, no matter how the child exited.
//! - **Transient**: Restart only if the child exited abnormally (panic or error).
//! - **Temporary**: Never restart, regardless of exit reason.
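//!
//! The restart decision is a lookup on the policy and on how the child exited. `RestartPolicy`, `ExitKind`, and `should_restart` are hypothetical names that mirror the bullets above. They are not the crate's `Restart` enum. Treating a failed spawn as an abnormal exit is an assumption.
//!
//! ```rust
//! // Hypothetical mirror of the three restart policies.
//! #[derive(Clone, Copy, Debug)]
//! enum RestartPolicy {
//!     Permanent,
//!     Transient,
//!     Temporary,
//! }
//!
//! // How a child stopped, reduced to what the policies care about.
//! #[derive(Clone, Copy, Debug)]
//! enum ExitKind {
//!     Normal,
//!     // Panic, returned error, or (assumed here) a failed spawn.
//!     Abnormal,
//! }
//!
//! fn should_restart(policy: RestartPolicy, exit: ExitKind) -> bool {
//!     match (policy, exit) {
//!         // Permanent: always restart.
//!         (RestartPolicy::Permanent, _) => true,
//!         // Transient: restart only after an abnormal exit.
//!         (RestartPolicy::Transient, ExitKind::Abnormal) => true,
//!         (RestartPolicy::Transient, ExitKind::Normal) => false,
//!         // Temporary: never restart.
//!         (RestartPolicy::Temporary, _) => false,
//!     }
//! }
//! ```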
//!
//! ### Meltdown Logic
//! - **`max_restarts`** and **`max_seconds`**: The meltdown threshold and its time window. If more than `max_restarts` restarts occur within `max_seconds` seconds, the supervisor shuts down abnormally (meltdown).
//! - **`restart_counter_reset_after`**: If the supervisor sees no failures for this many seconds, it clears its meltdown log and effectively "resets" the meltdown counters.
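//!
//! One plausible reading of these rules is a sliding window of restart timestamps. The sketch below uses whole-second `u64` times so its behavior is deterministic. `MeltdownTracker` and `record_restart` are hypothetical. Two details are assumptions rather than the crate's documented behavior: a restart exactly `max_seconds` old still counts, and the quiet period is measured from the last recorded restart.
//!
//! ```rust
//! use std::collections::VecDeque;
//!
//! // Hypothetical supervisor-level meltdown tracker; times are whole seconds.
//! struct MeltdownTracker {
//!     max_restarts: usize,
//!     max_seconds: u64,
//!     reset_after: Option<u64>,
//!     // Timestamps of recent restarts, oldest first.
//!     log: VecDeque<u64>,
//! }
//!
//! impl MeltdownTracker {
//!     fn new(max_restarts: usize, max_seconds: u64, reset_after: Option<u64>) -> Self {
//!         Self { max_restarts, max_seconds, reset_after, log: VecDeque::new() }
//!     }
//!
//!     // Record a restart at `now`; returns true if the supervisor should melt down.
//!     fn record_restart(&mut self, now: u64) -> bool {
//!         // A quiet period of at least `reset_after` seconds clears the log.
//!         let last = self.log.back().copied();
//!         if let (Some(reset), Some(last)) = (self.reset_after, last) {
//!             if now.saturating_sub(last) >= reset {
//!                 self.log.clear();
//!             }
//!         }
//!         self.log.push_back(now);
//!         // Drop restarts that have slid out of the `max_seconds` window.
//!         while let Some(oldest) = self.log.front().copied() {
//!             if now.saturating_sub(oldest) > self.max_seconds {
//!                 self.log.pop_front();
//!             } else {
//!                 break;
//!             }
//!         }
//!         // Meltdown when *more than* `max_restarts` remain in the window.
//!         self.log.len() > self.max_restarts
//!     }
//! }
//! ```
//!
//! With `max_restarts = 5` and `max_seconds = 10`, restarts at t = 0 through 5 s trip the meltdown on the sixth restart. Restarts every 3 s never leave more than four entries in the window.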
//!
//! ### Child-Level Features
//! - **`restart_counter_reset_after`** (per child): If a specific child remains up for that many seconds, its own failure count is reset to zero on the next failure.
//! - **`backoff_fn`**: An optional function to delay a child's restart. For instance, you might implement exponential backoff to prevent immediate thrashing restarts.
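//!
//! The per-child counter reset can be sketched in the same style. `ChildFailures` and `record_failure` are hypothetical names. The sketch approximates "time the child stayed up" as the time since its previous failure and ignores any backoff delay; that approximation is an assumption. The returned count is the kind of value a `backoff_fn` would presumably receive as its restart count.
//!
//! ```rust
//! // Hypothetical per-child failure bookkeeping; times are whole seconds.
//! struct ChildFailures {
//!     restart_count: usize,
//!     last_fail: Option<u64>,
//!     reset_after: Option<u64>,
//! }
//!
//! impl ChildFailures {
//!     fn new(reset_after: Option<u64>) -> Self {
//!         Self { restart_count: 0, last_fail: None, reset_after }
//!     }
//!
//!     // Record a failure at `now` and return the child's updated restart count.
//!     fn record_failure(&mut self, now: u64) -> usize {
//!         // If the child stayed up for at least `reset_after` seconds since its
//!         // previous failure, start counting from zero again.
//!         if let (Some(reset), Some(last)) = (self.reset_after, self.last_fail) {
//!             if now.saturating_sub(last) >= reset {
//!                 self.restart_count = 0;
//!             }
//!         }
//!         self.restart_count += 1;
//!         self.last_fail = Some(now);
//!         self.restart_count
//!     }
//! }
//! ```
//!
//! With a reset period of 60 s, failures at 0 s, 10 s, and 20 s yield counts 1, 2, and 3. A failure at 100 s, which comes 80 s after the previous one, yields 1 again.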
//!
//! ## Choosing the Right Supervisor
//!
//! 1. Use `Supervisor` when:
//!    - Your actor hierarchy is known at startup
//!    - You need OneForAll or RestForOne strategies
//!    - Children are long-lived and relatively static
//!
//! 2. Use `DynamicSupervisor` when:
//!    - Children need to be added/removed at runtime
//!    - Each child is independent (OneForOne is sufficient)
//!    - You need to limit the total number of children
//!
//! 3. Use `TaskSupervisor` when:
//!    - You're working with futures/async tasks rather than full actors
//!    - Tasks are short-lived or periodic
//!    - You want a simpler API focused on task execution
//!
//! ## Important Requirements
//!
//! 1. **Actor Names**: Both supervisors and their child actors **must** have names set. These names are used for:
//!    - Unique identification in the supervision tree
//!    - Meltdown tracking and logging
//!    - Global actor registry
//!
//! 2. **Proper Spawning**: When spawning supervisors or child actors, always use:
//!    - [`Supervisor::spawn_linked`] or [`Supervisor::spawn`] for static supervisors
//!    - [`DynamicSupervisor::spawn_linked`] or [`DynamicSupervisor::spawn`] for dynamic supervisors
//!    - Do NOT use the generic [`Actor::spawn_linked`] directly
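//!
//! Because names double as identifiers, it can help to check child ids before building a supervisor. The helper below, `validate_child_ids`, is a hypothetical pre-flight check and is not part of this crate. It does not imply that the supervisor performs the same check itself.
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! // Hypothetical check: every child id is non-blank and unique.
//! // Returns the first offending id on failure.
//! fn validate_child_ids<'a>(ids: &[&'a str]) -> Result<(), &'a str> {
//!     let mut seen = HashSet::new();
//!     for &id in ids {
//!         // `insert` returns false if the id was already present.
//!         if id.trim().is_empty() || !seen.insert(id) {
//!             return Err(id);
//!         }
//!     }
//!     Ok(())
//! }
//! ```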
//!
//! ## Multi-Level Supervision Trees
//!
//! Supervisors can manage other **supervisors** as children, forming a **hierarchical** or **tree** structure. This way, different subsystems can each have their own meltdown thresholds or strategies. A meltdown in one subtree doesn't necessarily mean the entire application must go down, unless the top-level supervisor is triggered.
//!
//! For example:
//! ```text
//! Root Supervisor (Static, OneForOne)
//! ├── API Supervisor (Static, OneForAll)
//! │   ├── HTTP Server
//! │   └── WebSocket Server
//! ├── Worker Supervisor (Dynamic)
//! │   └── [Dynamic Worker Pool]
//! └── Task Supervisor
//!     └── [Background Jobs]
//! ```
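//!
//! The point that a meltdown stays inside its subtree unless it reaches the top can be sketched by walking a failure up a chain of supervisors. `SupNode` and `propagate_failure` are hypothetical. Each node's counter stands in for its meltdown window, and window timing is omitted. The sketch also assumes that a supervisor which absorbs a child supervisor's meltdown restarts that subtree with fresh counters.
//!
//! ```rust
//! // Hypothetical supervisor in a chain from the root down to a failing worker.
//! struct SupNode {
//!     name: &'static str,
//!     max_restarts: usize,
//!     // Restarts seen in the current window (window timing omitted).
//!     restarts: usize,
//! }
//!
//! // `chain` is ordered root-first; its last element directly supervises the
//! // failing worker. Returns the supervisors that melted down, innermost first.
//! fn propagate_failure(chain: &mut [SupNode]) -> Vec<&'static str> {
//!     let mut melted = Vec::new();
//!     for i in (0..chain.len()).rev() {
//!         chain[i].restarts += 1;
//!         if chain[i].restarts > chain[i].max_restarts {
//!             // Meltdown: this supervisor exits abnormally, which its parent
//!             // (the next iteration) treats as one more child failure.
//!             melted.push(chain[i].name);
//!         } else {
//!             // Absorbed here: the subtree below `i` restarts with fresh counters.
//!             for node in &mut chain[i + 1..] {
//!                 node.restarts = 0;
//!             }
//!             break;
//!         }
//!     }
//!     melted
//! }
//! ```
//!
//! If a worker supervisor with a budget of one restart fails twice, it melts down and the root absorbs that as a single restart. Only when every level exceeds its budget does the failure reach the root.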
//!
//! ## Example Usage
//!
//! Here's a complete example using a static supervisor:
//!
//! ```rust
//! use ractor::Actor;
//! use ractor_supervisor::*;
//! use std::{time::Duration, sync::Arc};
//! use tokio::time::Instant;
//! use futures_util::FutureExt;
//!
//! // A minimal child actor that simply does some work in `handle`.
//! struct MyWorker;
//!
//! #[ractor::async_trait]
//! impl Actor for MyWorker {
//!     type Msg = ();
//!     type State = ();
//!     type Arguments = ();
//!
//!     // Called before the actor fully starts. We can set up the actor's internal state here.
//!     async fn pre_start(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _args: Self::Arguments,
//!     ) -> Result<Self::State, ractor::ActorProcessingErr> {
//!         Ok(())
//!     }
//!
//!     // The main message handler. This is where you implement your actor's behavior.
//!     async fn handle(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _msg: Self::Msg,
//!         _state: &mut Self::State,
//!     ) -> Result<(), ractor::ActorProcessingErr> {
//!         // do some work...
//!         Ok(())
//!     }
//! }
//!
//! // A function to spawn the child actor. This will be used in `ChildSpec::spawn_fn`.
//! async fn spawn_my_worker(
//!     supervisor_cell: ractor::ActorCell,
//!     child_id: String,
//! ) -> Result<ractor::ActorCell, ractor::SpawnErr> {
//!     // Name the child actor with the `child_id` the supervisor passes in
//!     // (the child spec's `id`); child actors must be named.
//!     let (child_ref, _join) = Supervisor::spawn_linked(
//!         child_id,        // actor name
//!         MyWorker,        // actor instance
//!         (),              // arguments
//!         supervisor_cell, // link to the supervisor
//!     ).await?;
//!     Ok(child_ref.get_cell())
//! }
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // A child-level backoff function: restart immediately after the first failure,
//!     // then back off exponentially starting with the second failure.
//!     // Return Some(delay) to make the supervisor wait before restarting this child.
//!     let my_backoff: ChildBackoffFn = Arc::new(
//!         |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<u64>| {
//!             if restart_count <= 1 {
//!                 // First failure: restart immediately.
//!                 None
//!             } else {
//!                 // 2^restart_count seconds (4s, 8s, 16s, ...); the exponent is capped
//!                 // at 6 (64s) so the shift cannot overflow.
//!                 Some(Duration::from_secs(1u64 << restart_count.min(6)))
//!             }
//!         }
//!     );
//!
//!     // This specification describes exactly how to manage our single child actor.
//!     let child_spec = ChildSpec {
//!         id: "myworker".into(),        // Unique identifier for meltdown logs and debugging.
//!         restart: Restart::Transient,  // Only restart if the child fails abnormally.
//!         spawn_fn: Arc::new(|cell, id| spawn_my_worker(cell, id).boxed()),
//!         backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
//!         // If the child stays up for 60s, its failure counter is reset to 0 on its next failure.
//!         restart_counter_reset_after: Some(60),
//!     };
//!
//!     // Supervisor-level meltdown configuration: more than 5 restarts within 10s triggers a meltdown.
//!     // If there are no restarts for 30s, the meltdown log is cleared.
//!     let options = SupervisorOptions {
//!         strategy: SupervisorStrategy::OneForOne, // If one child fails, only that child is restarted.
//!         max_restarts: 5,                         // Permit up to 5 restarts in the meltdown window.
//!         max_seconds: 10,                         // The meltdown window (in seconds).
//!         restart_counter_reset_after: Some(30),   // If no failures for 30s, the meltdown log is cleared.
//!     };
//!
//!     // Group all child specs and meltdown options together.
//!     let args = SupervisorArguments {
//!         child_specs: vec![child_spec], // We only have one child in this example.
//!         options,
//!     };
//!
//!     // Spawn the supervisor with our arguments.
//!     let (sup_ref, sup_handle) = Supervisor::spawn(
//!         "root".into(), // name for the supervisor
//!         args,
//!     ).await?;
//!
//!     // Shut the supervisor down and wait for it to finish.
//!     let _ = sup_ref.kill();
//!     let _ = sup_handle.await;
//!
//!     Ok(())
//! }
//! ```
//!
//! For more examples, see:
//! - [`Supervisor`] for static supervision
//! - [`DynamicSupervisor`] for dynamic child management
//! - [`TaskSupervisor`] for supervised async tasks
//!
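// Module layout. Everything is re-exported at the crate root so that
// `use ractor_supervisor::*;` brings all supervisor types into scope.
pub mod core;
pub mod dynamic;
pub mod supervisor;
pub mod task;

// `self::` makes it explicit that this is the local `core` module, not the built-in `core` crate.
pub use self::core::*;
pub use dynamic::*;
pub use supervisor::*;
pub use task::*;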