ractor_supervisor/lib.rs
//! # ractor-supervisor
//!
//! An **OTP-style supervisor** for the [`ractor`](https://docs.rs/ractor) framework, helping you build **supervision trees** in a straightforward, Rust-centric way.
//!
//! Inspired by the Elixir/Erlang supervision concept, `ractor-supervisor` provides a robust mechanism for overseeing **one or more child actors** and automatically restarting them under configurable policies. If too many restarts happen in a brief time window (a "meltdown"), the supervisor itself shuts down abnormally, preventing runaway restart loops.
//!
//! ## Supervisor Types
//!
//! This crate provides three types of supervisors, each designed for specific use cases:
//!
//! ### 1. Static Supervisor (`Supervisor`)
//! - Manages a fixed set of children defined at startup
//! - Supports all supervision strategies (OneForOne, OneForAll, RestForOne)
//! - Best for static actor hierarchies where child actors are known at startup
//! - Example: A web server with predefined worker pools, cache managers, and connection handlers
//!
//! ### 2. Dynamic Supervisor (`DynamicSupervisor`)
//! - Allows adding/removing children at runtime
//! - Uses OneForOne strategy only (each child managed independently)
//! - Optional `max_children` limit
//! - Best for dynamic workloads where children are spawned/terminated on demand
//! - Example: A job queue processor that spawns worker actors based on load
//!
//! ### 3. Task Supervisor (`TaskSupervisor`)
//! - Specialized version of `DynamicSupervisor` for managing async tasks
//! - Wraps futures in actor tasks that can be supervised
//! - Simpler API focused on task execution rather than actor management
//! - Best for background jobs, periodic tasks, or any async work needing supervision
//! - Example: Scheduled jobs, background data processing, or cleanup tasks
//!
//! ## Supervision Strategies
//!
//! The strategy defines what happens when a child fails:
//!
//! - **OneForOne**: Only the failing child is restarted.
//! - **OneForAll**: If any child fails, all children are stopped and restarted.
//! - **RestForOne**: The failing child and all subsequently started children (in definition order) are stopped and restarted.
//!
//! Strategies apply to **all failure scenarios**, including:
//! - Spawn errors (failures in `pre_start`/`post_start`)
//! - Runtime panics
//! - Normal and abnormal exits
//!
//! Example: If spawning a child fails during `pre_start`, it will count as a restart and trigger strategy logic.
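//!
//! As a rough illustration of the three strategies, the sketch below computes which children would be restarted when one of them fails. It is a standalone model and not this crate's implementation: `Strategy` and `children_to_restart` are hypothetical names, and `children` is assumed to be listed in definition order.
//!
//! ```rust
//! /// Hypothetical stand-in for the crate's strategy enum (illustration only).
//! #[derive(Clone, Copy)]
//! enum Strategy {
//!     OneForOne,
//!     OneForAll,
//!     RestForOne,
//! }
//!
//! /// Returns the ids of the children that would be restarted when the child
//! /// at index `failed` fails. Panics if `failed` is out of range.
//! fn children_to_restart<'a>(
//!     strategy: Strategy,
//!     children: &[&'a str],
//!     failed: usize,
//! ) -> Vec<&'a str> {
//!     match strategy {
//!         // Only the failing child.
//!         Strategy::OneForOne => vec![children[failed]],
//!         // Every child, regardless of which one failed.
//!         Strategy::OneForAll => children.to_vec(),
//!         // The failing child plus everything defined after it.
//!         Strategy::RestForOne => children[failed..].to_vec(),
//!     }
//! }
//! ```
//!
//! With children `["a", "b", "c"]` and a failure in `"b"`, this model restarts only `"b"` under OneForOne, all three under OneForAll, and `"b"` and `"c"` under RestForOne.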
//!
//! ## Common Features
//!
//! All supervisor types share these core features:
//!
//! ### Restart Policies
//! - **Permanent**: Always restart, no matter how the child exited.
//! - **Transient**: Restart only if the child exited abnormally (panic or error).
//! - **Temporary**: Never restart, regardless of exit reason.
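//!
//! The three policies reduce to a small decision table. The following is a minimal sketch of that table; `RestartPolicy` and `should_restart` are hypothetical names used for illustration and are not part of this crate's API.
//!
//! ```rust
//! /// Hypothetical mirror of the three restart policies (illustration only).
//! #[derive(Clone, Copy)]
//! enum RestartPolicy {
//!     Permanent,
//!     Transient,
//!     Temporary,
//! }
//!
//! /// Decides whether a child should be restarted, given how it exited.
//! fn should_restart(policy: RestartPolicy, exited_abnormally: bool) -> bool {
//!     match policy {
//!         // Always restart, even after a normal exit.
//!         RestartPolicy::Permanent => true,
//!         // Restart only after a panic or error.
//!         RestartPolicy::Transient => exited_abnormally,
//!         // Never restart.
//!         RestartPolicy::Temporary => false,
//!     }
//! }
//! ```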
//!
//! ### Meltdown Logic
//! - **`max_restarts`** and **`max_window`**: `max_restarts` is the number of restarts tolerated within `max_window`, the time window for meltdown counting, expressed as a [`Duration`]. If more than `max_restarts` restarts occur within `max_window`, the supervisor shuts down abnormally (meltdown).
//! - **`reset_after`**: If the supervisor sees no failures for the specified duration, it clears its meltdown log and effectively "resets" the meltdown counters.
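//!
//! One way to picture the meltdown rule is as a sliding window over restart timestamps. The sketch below models it under stated assumptions: `MeltdownTracker` and `record_restart` are hypothetical names, timestamps are assumed to arrive in increasing order, and the crate's actual bookkeeping may differ in detail.
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! /// Hypothetical model of supervisor-level meltdown tracking (illustration only).
//! struct MeltdownTracker {
//!     max_restarts: usize,
//!     max_window: Duration,
//!     reset_after: Option<Duration>,
//!     restart_log: Vec<Instant>,
//! }
//!
//! impl MeltdownTracker {
//!     fn new(max_restarts: usize, max_window: Duration, reset_after: Option<Duration>) -> Self {
//!         Self { max_restarts, max_window, reset_after, restart_log: Vec::new() }
//!     }
//!
//!     /// Records a restart at `now`; returns `true` if it triggers a meltdown.
//!     fn record_restart(&mut self, now: Instant) -> bool {
//!         // Quiet period: clear the log if the last restart is old enough.
//!         if let (Some(quiet), Some(&last)) = (self.reset_after, self.restart_log.last()) {
//!             if now.duration_since(last) >= quiet {
//!                 self.restart_log.clear();
//!             }
//!         }
//!         // Drop restarts that have fallen out of the meltdown window.
//!         let window = self.max_window;
//!         self.restart_log.retain(|&t| now.duration_since(t) <= window);
//!         self.restart_log.push(now);
//!         // Meltdown only when the count exceeds `max_restarts`.
//!         self.restart_log.len() > self.max_restarts
//!     }
//! }
//! ```
//!
//! In this model, `max_restarts: 2` tolerates two restarts inside the window, and the third one within the same window reports a meltdown.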
//!
//! ### Child-Level Features
//! - **`reset_after`** (per child): If a specific child remains up for the given duration, its own failure count is reset to zero on the next failure.
//! - **`backoff_fn`**: An optional function to delay a child's restart. For instance, you might implement exponential backoff to prevent immediate thrashing restarts.
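//!
//! The two child-level features can be sketched together. This is a standalone model, not the crate's code: `ChildFailures`, `on_failure`, and `capped_backoff` are hypothetical names, uptime is assumed to be measured from the previous failure, and the 64-second cap is an arbitrary choice for the example.
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! /// Hypothetical per-child failure bookkeeping (illustration only).
//! struct ChildFailures {
//!     restart_count: usize,
//!     last_fail: Option<Instant>,
//! }
//!
//! impl ChildFailures {
//!     /// Registers a failure at `now` and returns the updated failure count.
//!     fn on_failure(&mut self, now: Instant, reset_after: Option<Duration>) -> usize {
//!         // If the child stayed up long enough since its last failure,
//!         // start counting from zero again.
//!         if let (Some(limit), Some(last)) = (reset_after, self.last_fail) {
//!             if now.duration_since(last) >= limit {
//!                 self.restart_count = 0;
//!             }
//!         }
//!         self.restart_count += 1;
//!         self.last_fail = Some(now);
//!         self.restart_count
//!     }
//! }
//!
//! /// Capped exponential backoff: no delay while `restart_count <= 1`,
//! /// then 2s, 4s, 8s, ... up to a maximum of 64s.
//! fn capped_backoff(restart_count: usize) -> Option<Duration> {
//!     if restart_count <= 1 {
//!         None
//!     } else {
//!         // Clamp the exponent so the shift can never overflow.
//!         let exponent = (restart_count - 1).min(6) as u32;
//!         Some(Duration::from_secs(1u64 << exponent))
//!     }
//! }
//! ```
//!
//! Clamping the exponent keeps the delay bounded and avoids a shift overflow if a child fails many times without its counter being reset.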
//!
//! ## Choosing the Right Supervisor
//!
//! 1. Use `Supervisor` when:
//!    - Your actor hierarchy is known at startup
//!    - You need OneForAll or RestForOne strategies
//!    - Children are long-lived and relatively static
//!
//! 2. Use `DynamicSupervisor` when:
//!    - Children need to be added/removed at runtime
//!    - Each child is independent (OneForOne is sufficient)
//!    - You need to limit the total number of children
//!
//! 3. Use `TaskSupervisor` when:
//!    - You're working with futures/async tasks rather than full actors
//!    - Tasks are short-lived or periodic
//!    - You want a simpler API focused on task execution
//!
//! ## Important Requirements
//!
//! 1. **Actor Names**: Both supervisors and their child actors **must** have names set. These names are used for:
//!    - Unique identification in the supervision tree
//!    - Meltdown tracking and logging
//!    - Global actor registry
//!
//! 2. **Proper Spawning**: When spawning supervisors or child actors, always use:
//!    - [`Supervisor::spawn_linked`] or [`Supervisor::spawn`] for static supervisors
//!    - [`DynamicSupervisor::spawn_linked`] or [`DynamicSupervisor::spawn`] for dynamic supervisors
//!    - Do NOT use the generic [`Actor::spawn_linked`] directly
//!
//! ## Multi-Level Supervision Trees
//!
//! Supervisors can manage other **supervisors** as children, forming a **hierarchical** or **tree** structure. This way, different subsystems can each have their own meltdown thresholds or strategies. A meltdown in one subtree doesn't necessarily mean the entire application must go down, unless the top-level supervisor's own meltdown threshold is reached.
//!
//! For example:
//! ```text
//! Root Supervisor (Static, OneForOne)
//! ├── API Supervisor (Static, OneForAll)
//! │   ├── HTTP Server
//! │   └── WebSocket Server
//! ├── Worker Supervisor (Dynamic)
//! │   └── [Dynamic Worker Pool]
//! └── Task Supervisor
//!     └── [Background Jobs]
//! ```
//!
//! ## Example Usage
//!
//! Here's a complete example using a static supervisor:
//!
//! ```rust
//! use ractor::Actor;
//! use ractor_supervisor::*;
//! use ractor::concurrency::Duration;
//! use tokio::time::Instant;
//! use futures_util::FutureExt;
//!
//! // A minimal child actor that simply does some work in `handle`.
//! struct MyWorker;
//!
//! #[cfg_attr(feature = "async-trait", ractor::async_trait)]
//! impl Actor for MyWorker {
//!     type Msg = ();
//!     type State = ();
//!     type Arguments = ();
//!
//!     // Called before the actor fully starts. We can set up the actor's internal state here.
//!     async fn pre_start(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _args: Self::Arguments,
//!     ) -> Result<Self::State, ractor::ActorProcessingErr> {
//!         Ok(())
//!     }
//!
//!     // The main message handler. This is where you implement your actor's behavior.
//!     async fn handle(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _msg: Self::Msg,
//!         _state: &mut Self::State,
//!     ) -> Result<(), ractor::ActorProcessingErr> {
//!         // do some work...
//!         Ok(())
//!     }
//! }
//!
//! // A function to spawn the child actor. This will be used in ChildSpec::spawn_fn.
//! async fn spawn_my_worker(
//!     supervisor_cell: ractor::ActorCell,
//!     child_id: String,
//! ) -> Result<ractor::ActorCell, ractor::SpawnErr> {
//!     // Name the child actor after its `ChildSpec::id`, passed in here as `child_id`.
//!     // Child actors must be named (see "Important Requirements").
//!     let (child_ref, _join) = Supervisor::spawn_linked(
//!         child_id,        // actor name
//!         MyWorker,        // actor instance
//!         (),              // arguments
//!         supervisor_cell, // link to the supervisor
//!     ).await?;
//!     Ok(child_ref.get_cell())
//! }
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // A child-level backoff function that applies exponential backoff once `restart_count` exceeds 1.
//!     // Return Some(delay) to make the supervisor wait before restarting this child.
//!     let my_backoff: ChildBackoffFn = ChildBackoffFn::new(
//!         |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<Duration>| {
//!             // While `restart_count` is at most 1, restart immediately (None).
//!             // Beyond that, double the delay with each further restart (exponential).
//!             if restart_count <= 1 {
//!                 None
//!             } else {
//!                 Some(Duration::from_secs(1 << restart_count))
//!             }
//!         }
//!     );
//!
//!     // This specification describes exactly how to manage our single child actor.
//!     let child_spec = ChildSpec {
//!         id: "myworker".into(),        // Unique identifier for meltdown logs and debugging.
//!         restart: Restart::Transient,  // Only restart if the child fails abnormally.
//!         spawn_fn: SpawnFn::new(|cell, id| spawn_my_worker(cell, id)),
//!         backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
//!         // If the child remains up for 60s, its individual failure counter resets to 0 next time it fails.
//!         reset_after: Some(Duration::from_secs(60)),
//!     };
//!
//!     // Supervisor-level meltdown configuration. If more than 5 restarts occur within a 10s window, meltdown is triggered.
//!     // Also, if we stay quiet for 30s (no restarts), the meltdown log resets.
//!     let options = SupervisorOptions {
//!         strategy: SupervisorStrategy::OneForOne,    // If one child fails, only that child is restarted.
//!         max_restarts: 5,                            // Permit up to 5 restarts in the meltdown window.
//!         max_window: Duration::from_secs(10),        // The meltdown window.
//!         reset_after: Some(Duration::from_secs(30)), // If no failures for 30s, meltdown log is cleared.
//!     };
//!
//!     // Group all child specs and meltdown options together:
//!     let args = SupervisorArguments {
//!         child_specs: vec![child_spec], // We only have one child in this example
//!         options,
//!     };
//!
//!     // Spawn the supervisor with our arguments.
//!     let (sup_ref, sup_handle) = Supervisor::spawn(
//!         "root".into(), // name for the supervisor
//!         args,
//!     ).await?;
//!
//!     // Shut the supervisor down and wait for it to exit.
//!     let _ = sup_ref.kill();
//!     let _ = sup_handle.await;
//!
//!     Ok(())
//! }
//! ```
//!
//! For more examples, see:
//! - [`Supervisor`] for static supervision
//! - [`DynamicSupervisor`] for dynamic child management
//! - [`TaskSupervisor`] for supervised async tasks
//!
pub mod core;
pub mod dynamic;
pub mod supervisor;
pub mod task;

pub use core::*;
pub use dynamic::*;
pub use supervisor::*;
pub use task::*;