ractor_supervisor/
lib.rs

//! # ractor-supervisor
//!
//! An **OTP-style supervisor** for the [`ractor`](https://docs.rs/ractor) framework, helping you build **supervision trees** in a straightforward, Rust-centric way.
//!
//! Inspired by the Elixir/Erlang supervision concept, `ractor-supervisor` provides a robust mechanism for overseeing **one or more child actors** and automatically restarting them under configurable policies. If too many restarts happen in a brief time window (a "meltdown"), the supervisor itself shuts down abnormally, preventing errant restart loops.
//!
//! ## Supervisor Types
//!
//! This crate provides three types of supervisors, each designed for specific use cases:
//!
//! ### 1. Static Supervisor (`Supervisor`)
//! - Manages a fixed set of children defined at startup
//! - Supports all supervision strategies (OneForOne, OneForAll, RestForOne)
//! - Best for static actor hierarchies where child actors are known at startup
//! - Example: A web server with predefined worker pools, cache managers, and connection handlers
//!
//! ### 2. Dynamic Supervisor (`DynamicSupervisor`)
//! - Allows adding/removing children at runtime
//! - Uses OneForOne strategy only (each child managed independently)
//! - Optional `max_children` limit
//! - Best for dynamic workloads where children are spawned/terminated on demand
//! - Example: A job queue processor that spawns worker actors based on load
//!
//! ### 3. Task Supervisor (`TaskSupervisor`)
//! - Specialized version of DynamicSupervisor for managing async tasks
//! - Wraps futures in actor tasks that can be supervised
//! - Simpler API focused on task execution rather than actor management
//! - Best for background jobs, periodic tasks, or any async work needing supervision
//! - Example: Scheduled jobs, background data processing, or cleanup tasks
//!
//! ## Supervision Strategies
//!
//! The strategy defines what happens when a child fails:
//!
//! - **OneForOne**: Only the failing child is restarted.
//! - **OneForAll**: If any child fails, all children are stopped and restarted.
//! - **RestForOne**: The failing child and all subsequently started children (in definition order) are stopped and restarted.
//!
//! Strategies apply to **every scenario in which a child goes down**, subject to the child's restart policy, including:
//! - Spawn errors (failures in `pre_start`/`post_start`)
//! - Runtime panics
//! - Normal and abnormal exits
//!
//! For example, if spawning a child fails during `pre_start`, the failure counts as a restart and triggers the strategy logic.
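//!
//! As a rough illustration of how the three strategies differ, the sketch below picks the children to restart from a list kept in definition order. It is a standalone model, not this crate's implementation: `Strategy` and `children_to_restart` are hypothetical names defined only for this example.
//!
//! ```rust
//! /// Local stand-in for a strategy enum (hypothetical, not the crate's type).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Strategy {
//!     OneForOne,
//!     OneForAll,
//!     RestForOne,
//! }
//!
//! /// Returns the ids of the children to stop and restart, in definition order,
//! /// when the child at index `failed` goes down.
//! fn children_to_restart<'a>(
//!     strategy: Strategy,
//!     children: &[&'a str],
//!     failed: usize,
//! ) -> Vec<&'a str> {
//!     // An out-of-range index selects nothing.
//!     if failed >= children.len() {
//!         return Vec::new();
//!     }
//!     match strategy {
//!         // Only the failing child.
//!         Strategy::OneForOne => vec![children[failed]],
//!         // Every child, regardless of which one failed.
//!         Strategy::OneForAll => children.to_vec(),
//!         // The failing child and everything defined after it.
//!         Strategy::RestForOne => children[failed..].to_vec(),
//!     }
//! }
//! ```
//!
//! With children `["db", "cache", "http"]` and a failure of `"cache"` (index 1), this selects only `"cache"` for OneForOne, all three children for OneForAll, and `"cache"` plus `"http"` for RestForOne.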
//!
//! ## Common Features
//!
//! All supervisor types share these core features:
//!
//! ### Restart Policies
//! - **Permanent**: Always restart, no matter how the child exited.
//! - **Transient**: Restart only if the child exited abnormally (panic or error).
//! - **Temporary**: Never restart, regardless of exit reason.
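//!
//! The three policies boil down to a small decision table. The sketch below models it with local, hypothetical types (`RestartPolicy`, `Exit`, `should_restart`); it only mirrors the rules listed above and is not the crate's own code.
//!
//! ```rust
//! /// Local stand-in for a restart policy (hypothetical, not the crate's type).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum RestartPolicy {
//!     Permanent,
//!     Transient,
//!     Temporary,
//! }
//!
//! /// How the child went down.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Exit {
//!     Normal,
//!     Abnormal,
//! }
//!
//! /// Decides whether a child with the given policy is restarted after `exit`.
//! fn should_restart(policy: RestartPolicy, exit: Exit) -> bool {
//!     match policy {
//!         // Always restart, no matter how the child exited.
//!         RestartPolicy::Permanent => true,
//!         // Restart only after a panic or error.
//!         RestartPolicy::Transient => exit == Exit::Abnormal,
//!         // Never restart.
//!         RestartPolicy::Temporary => false,
//!     }
//! }
//! ```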
//!
//! ### Meltdown Logic
//! - **`max_restarts`** and **`max_window`**: `max_restarts` is the number of restarts tolerated within `max_window`, the time window for meltdown counting, expressed as a [`Duration`]. If more than `max_restarts` restarts occur within `max_window`, the supervisor shuts down abnormally (meltdown).
//! - **`reset_after`**: If the supervisor sees no failures for the specified duration, it clears its meltdown log, resetting the meltdown counters.
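//!
//! To make the counting concrete, here is a minimal sliding-window model of the rules above. `MeltdownTracker` and `record_restart` are hypothetical names, time is passed in as a `Duration` since an arbitrary start so the logic stays deterministic, and the exact boundary handling (`<=` and `>=`) is an assumption rather than a statement about the crate's internals.
//!
//! ```rust
//! use std::time::Duration;
//!
//! /// Hypothetical model of a supervisor's meltdown log.
//! struct MeltdownTracker {
//!     max_restarts: usize,
//!     max_window: Duration,
//!     reset_after: Option<Duration>,
//!     /// Restart times, measured from an arbitrary start point.
//!     log: Vec<Duration>,
//! }
//!
//! impl MeltdownTracker {
//!     fn new(max_restarts: usize, max_window: Duration, reset_after: Option<Duration>) -> Self {
//!         Self { max_restarts, max_window, reset_after, log: Vec::new() }
//!     }
//!
//!     /// Records a restart at time `now` and returns `true` on meltdown.
//!     fn record_restart(&mut self, now: Duration) -> bool {
//!         // Quiet period: if the last restart is at least `reset_after` old, clear the log.
//!         let last = self.log.last().copied();
//!         if let (Some(quiet), Some(last)) = (self.reset_after, last) {
//!             if now.saturating_sub(last) >= quiet {
//!                 self.log.clear();
//!             }
//!         }
//!         // Forget restarts that have fallen out of the window.
//!         let window = self.max_window;
//!         self.log.retain(|&t| now.saturating_sub(t) <= window);
//!         self.log.push(now);
//!         // Meltdown only when MORE than `max_restarts` remain in the window.
//!         self.log.len() > self.max_restarts
//!     }
//! }
//! ```
//!
//! With `max_restarts = 2` and a 10s window, restarts at 0s, 1s, and 2s trigger a meltdown on the third call, while restarts at 0s, 8s, and 20s do not, because the first two have left the window by the time the third arrives.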
//!
//! ### Child-Level Features
//! - **`reset_after`** (per child): If a specific child remains up for the given duration, its own failure count is reset to zero on the next failure.
//! - **`backoff_fn`**: An optional function to delay a child's restart. For instance, exponential backoff prevents a failing child from being restarted in a tight loop.
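//!
//! The two child-level features can be sketched as plain functions. Both `capped_backoff` and `next_failure_count` are hypothetical helpers written for this illustration. The 64s cap is an arbitrary choice, and counting the current failure as `1` after a reset is an assumption about how "reset to zero on the next failure" plays out, not a description of the crate's bookkeeping.
//!
//! ```rust
//! use std::time::Duration;
//!
//! /// Exponential backoff: no delay up to the first restart, then 2^n seconds,
//! /// capped at 64s so the shift can never overflow.
//! fn capped_backoff(restart_count: usize) -> Option<Duration> {
//!     if restart_count <= 1 {
//!         return None;
//!     }
//!     let exponent = restart_count.min(6) as u32;
//!     Some(Duration::from_secs(1u64 << exponent))
//! }
//!
//! /// Failure count to use for the failure that just happened.
//! /// If the child stayed up for at least `reset_after`, counting starts over.
//! fn next_failure_count(
//!     previous_count: usize,
//!     uptime: Duration,
//!     reset_after: Option<Duration>,
//! ) -> usize {
//!     match reset_after {
//!         Some(threshold) if uptime >= threshold => 1,
//!         _ => previous_count + 1,
//!     }
//! }
//! ```
//!
//! Feeding the result of `next_failure_count` into `capped_backoff` means a child that has been stable for long enough restarts immediately again, while a child that keeps crashing waits progressively longer.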
//!
//! ## Choosing the Right Supervisor
//!
//! 1. Use `Supervisor` when:
//!    - Your actor hierarchy is known at startup
//!    - You need OneForAll or RestForOne strategies
//!    - Children are long-lived and relatively static
//!
//! 2. Use `DynamicSupervisor` when:
//!    - Children need to be added/removed at runtime
//!    - Each child is independent (OneForOne is sufficient)
//!    - You need to limit the total number of children
//!
//! 3. Use `TaskSupervisor` when:
//!    - You're working with futures/async tasks rather than full actors
//!    - Tasks are short-lived or periodic
//!    - You want a simpler API focused on task execution
//!
//! ## Important Requirements
//!
//! 1. **Actor Names**: Both supervisors and their child actors **must** have names set. These names are used for:
//!    - Unique identification in the supervision tree
//!    - Meltdown tracking and logging
//!    - Global actor registry
//!
//! 2. **Proper Spawning**: When spawning supervisors or child actors, always use:
//!    - [`Supervisor::spawn_linked`] or [`Supervisor::spawn`] for static supervisors
//!    - [`DynamicSupervisor::spawn_linked`] or [`DynamicSupervisor::spawn`] for dynamic supervisors
//!    - Do NOT use the generic [`Actor::spawn_linked`] directly
//!
//! ## Multi-Level Supervision Trees
//!
//! Supervisors can manage other **supervisors** as children, forming a **hierarchical** or **tree** structure. This way, different subsystems can each have their own meltdown thresholds or strategies. A meltdown in one subtree does not necessarily bring down the entire application: the application goes down only if the failure propagates up to the top-level supervisor and that supervisor melts down as well.
//!
//! For example:
//! ```text
//! Root Supervisor (Static, OneForOne)
//! ├── API Supervisor (Static, OneForAll)
//! │   ├── HTTP Server
//! │   └── WebSocket Server
//! ├── Worker Supervisor (Dynamic)
//! │   └── [Dynamic Worker Pool]
//! └── Task Supervisor
//!     └── [Background Jobs]
//! ```
//!
//! ## Example Usage
//!
//! Here's a complete example using a static supervisor:
//!
//! ```rust
//! use ractor::Actor;
//! use ractor_supervisor::*;
//! use ractor::concurrency::Duration;
//! use tokio::time::Instant;
//! use futures_util::FutureExt;
//!
//! // A minimal child actor that simply does some work in `handle`.
//! struct MyWorker;
//!
//! #[cfg_attr(feature = "async-trait", ractor::async_trait)]
//! impl Actor for MyWorker {
//!     type Msg = ();
//!     type State = ();
//!     type Arguments = ();
//!
//!     // Called before the actor fully starts. We can set up the actor's internal state here.
//!     async fn pre_start(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _args: Self::Arguments,
//!     ) -> Result<Self::State, ractor::ActorProcessingErr> {
//!         Ok(())
//!     }
//!
//!     // The main message handler. This is where you implement your actor's behavior.
//!     async fn handle(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _msg: Self::Msg,
//!         _state: &mut Self::State
//!     ) -> Result<(), ractor::ActorProcessingErr> {
//!         // do some work...
//!         Ok(())
//!     }
//! }
//!
//! // A function to spawn the child actor. This will be used in ChildSpec::spawn_fn.
//! async fn spawn_my_worker(
//!     supervisor_cell: ractor::ActorCell,
//!     child_id: String
//! ) -> Result<ractor::ActorCell, ractor::SpawnErr> {
//!     // Name the child actor after its `ChildSpec::id`, passed in as `child_id`.
//!     // Supervised actors must be named (see "Important Requirements").
//!     let (child_ref, _join) = Supervisor::spawn_linked(
//!         child_id,                    // actor name
//!         MyWorker,                    // actor instance
//!         (),                          // arguments
//!         supervisor_cell              // link to the supervisor
//!     ).await?;
//!     Ok(child_ref.get_cell())
//! }
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // A child-level backoff function that implements exponential backoff from the second restart on.
//!     // Return Some(delay) to make the supervisor wait before restarting this child.
//!     // The last two parameters are unused here, hence the leading underscores.
//!     let my_backoff: ChildBackoffFn = ChildBackoffFn::new(
//!         |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<Duration>| {
//!             // While `restart_count` is 0 or 1, restart immediately (None).
//!             // After that, wait 2^restart_count seconds (4s, 8s, 16s, ...).
//!             if restart_count <= 1 {
//!                 None
//!             } else {
//!                 Some(Duration::from_secs(1 << restart_count))
//!             }
//!         }
//!     );
//!
//!     // This specification describes exactly how to manage our single child actor.
//!     let child_spec = ChildSpec {
//!         id: "myworker".into(),  // Unique identifier for meltdown logs and debugging.
//!         restart: Restart::Transient, // Only restart if the child fails abnormally.
//!         spawn_fn: SpawnFn::new(|cell, id| spawn_my_worker(cell, id)),
//!         backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
//!         // If the child remains up for 60s, its individual failure counter resets to 0 next time it fails.
//!         reset_after: Some(Duration::from_secs(60)),
//!     };
//!
//!     // Supervisor-level meltdown configuration. If more than 5 restarts occur within a 10s window, meltdown is triggered.
//!     // Also, if we stay quiet for 30s (no restarts), the meltdown log resets.
//!     let options = SupervisorOptions {
//!         strategy: SupervisorStrategy::OneForOne,  // If one child fails, only that child is restarted.
//!         max_restarts: 5,               // Permit up to 5 restarts in the meltdown window.
//!         max_window: Duration::from_secs(10),  // The meltdown window.
//!         reset_after: Some(Duration::from_secs(30)), // If no failures for 30s, meltdown log is cleared.
//!     };
//!
//!     // Group all child specs and meltdown options together:
//!     let args = SupervisorArguments {
//!         child_specs: vec![child_spec], // We only have one child in this example
//!         options,
//!     };
//!
//!     // Spawn the supervisor with our arguments.
//!     let (sup_ref, sup_handle) = Supervisor::spawn(
//!         "root".into(), // name for the supervisor
//!         args
//!     ).await?;
//!
//!     // Kill the supervisor and wait for its task to finish so the example exits cleanly.
//!     let _ = sup_ref.kill();
//!     let _ = sup_handle.await;
//!
//!     Ok(())
//! }
//! ```
//!
//! For more examples, see:
//! - [`Supervisor`] for static supervision
//! - [`DynamicSupervisor`] for dynamic child management
//! - [`TaskSupervisor`] for supervised async tasks
//!
pub mod core;
pub mod dynamic;
pub mod supervisor;
pub mod task;

pub use core::*;
pub use dynamic::*;
pub use supervisor::*;
pub use task::*;