//! # ractor-supervisor
//!
//! An **OTP-style supervisor** for the [`ractor`](https://docs.rs/ractor) framework—helping you build **supervision trees** in a straightforward, Rust-centric way.
//!
//! Inspired by the Elixir/Erlang supervision concept, `ractor-supervisor` provides a robust mechanism for overseeing **one or more child actors** and automatically restarting them under configurable policies. If too many restarts happen in a brief time window—a "meltdown"—the supervisor itself shuts down abnormally, preventing errant restart loops.
//!
//! ## Supervisor Types
//!
//! This crate provides three types of supervisors, each designed for specific use cases:
//!
//! ### 1. Static Supervisor (`Supervisor`)
//! - Manages a fixed set of children defined at startup
//! - Supports all supervision strategies (OneForOne, OneForAll, RestForOne)
//! - Best for static actor hierarchies where child actors are known at startup
//! - Example: A web server with predefined worker pools, cache managers, and connection handlers
//!
//! ### 2. Dynamic Supervisor (`DynamicSupervisor`)
//! - Allows adding/removing children at runtime
//! - Uses OneForOne strategy only (each child managed independently)
//! - Optional `max_children` limit
//! - Best for dynamic workloads where children are spawned/terminated on demand
//! - Example: A job queue processor that spawns worker actors based on load
//!
//! ### 3. Task Supervisor (`TaskSupervisor`)
//! - Specialized version of DynamicSupervisor for managing async tasks
//! - Wraps futures in actor tasks that can be supervised
//! - Simpler API focused on task execution rather than actor management
//! - Best for background jobs, periodic tasks, or any async work needing supervision
//! - Example: Scheduled jobs, background data processing, or cleanup tasks
//!
//! ## Supervision Strategies
//!
//! The strategy defines what happens when a child fails:
//!
//! - **OneForOne**: Only the failing child is restarted.
//! - **OneForAll**: If any child fails, all children are stopped and restarted.
//! - **RestForOne**: The failing child and all subsequent children (in definition order) are stopped and restarted.
//!
//! Strategies apply to **all failure scenarios**, including:
//! - Spawn errors (failures in `pre_start`/`post_start`)
//! - Runtime panics
//! - Normal and abnormal exits
//!
//! Example: If spawning a child fails during `pre_start`, the failure counts as a restart and triggers the strategy logic.
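//!
//! The three strategies can be pictured as a choice of which children to restart. The sketch below is
//! illustrative only: `Strategy` and `children_to_restart` are hypothetical names, not this crate's API,
//! and children are modeled as indices in definition order.
//!
//! ```rust
//! // Hypothetical stand-in for the crate's strategy enum (not its real definition).
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! enum Strategy {
//!     OneForOne,
//!     OneForAll,
//!     RestForOne,
//! }
//!
//! /// Returns the indices (in definition order) of the children to restart
//! /// when the child at index `failed` fails, out of `child_count` children.
//! fn children_to_restart(strategy: Strategy, failed: usize, child_count: usize) -> Vec<usize> {
//!     match strategy {
//!         // Only the failing child.
//!         Strategy::OneForOne => vec![failed],
//!         // Every child, regardless of which one failed.
//!         Strategy::OneForAll => (0..child_count).collect(),
//!         // The failing child and everything defined after it.
//!         Strategy::RestForOne => (failed..child_count).collect(),
//!     }
//! }
//!
//! fn main() {
//!     // Three children defined in order 0, 1, 2; child 1 fails.
//!     assert_eq!(children_to_restart(Strategy::OneForOne, 1, 3), vec![1]);
//!     assert_eq!(children_to_restart(Strategy::OneForAll, 1, 3), vec![0, 1, 2]);
//!     assert_eq!(children_to_restart(Strategy::RestForOne, 1, 3), vec![1, 2]);
//! }
//! ```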
//!
//! ## Common Features
//!
//! All supervisor types share these core features:
//!
//! ### Restart Policies
//! - **Permanent**: Always restart, no matter how the child exited.
//! - **Transient**: Restart only if the child exited abnormally (panic or error).
//! - **Temporary**: Never restart, regardless of exit reason.
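//!
//! As a rough illustration, the three policies reduce to a small decision table. `RestartPolicy`, `Exit`,
//! and `should_restart` below are hypothetical names chosen for this sketch; the crate's own types may
//! differ.
//!
//! ```rust
//! // Hypothetical stand-ins for the restart policy and the way a child exited.
//! #[derive(Clone, Copy)]
//! enum RestartPolicy {
//!     Permanent,
//!     Transient,
//!     Temporary,
//! }
//!
//! #[derive(Clone, Copy)]
//! enum Exit {
//!     Normal,
//!     Abnormal, // panic or error
//! }
//!
//! /// Decides whether a child should be restarted after it exits.
//! fn should_restart(policy: RestartPolicy, exit: Exit) -> bool {
//!     match (policy, exit) {
//!         (RestartPolicy::Permanent, _) => true,
//!         (RestartPolicy::Transient, Exit::Abnormal) => true,
//!         (RestartPolicy::Transient, Exit::Normal) => false,
//!         (RestartPolicy::Temporary, _) => false,
//!     }
//! }
//!
//! fn main() {
//!     assert!(should_restart(RestartPolicy::Permanent, Exit::Normal));
//!     assert!(should_restart(RestartPolicy::Transient, Exit::Abnormal));
//!     assert!(!should_restart(RestartPolicy::Transient, Exit::Normal));
//!     assert!(!should_restart(RestartPolicy::Temporary, Exit::Abnormal));
//! }
//! ```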
//!
//! ### Meltdown Logic
//! - **`max_restarts`** and **`max_seconds`**: Together these define the meltdown window. If more than `max_restarts` restarts occur within `max_seconds` seconds, the supervisor shuts down abnormally (a meltdown).
//! - **`restart_counter_reset_after`**: If the supervisor sees no failures for this many seconds, it clears its meltdown log and effectively "resets" the meltdown counters.
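//!
//! One way to picture the meltdown rules is a sliding window over a log of restart times. The sketch
//! below is an assumption about how such bookkeeping could work, not the crate's implementation:
//! `MeltdownTracker` and `record_restart` are hypothetical, times are plain seconds to keep the example
//! deterministic, and the exact boundary handling (`<` versus `<=`) is a guess.
//!
//! ```rust
//! /// Hypothetical meltdown tracker; all times are seconds since some fixed start.
//! struct MeltdownTracker {
//!     max_restarts: usize,
//!     max_seconds: u64,
//!     reset_after: Option<u64>,
//!     restart_log: Vec<u64>,
//! }
//!
//! impl MeltdownTracker {
//!     fn new(max_restarts: usize, max_seconds: u64, reset_after: Option<u64>) -> Self {
//!         Self { max_restarts, max_seconds, reset_after, restart_log: Vec::new() }
//!     }
//!
//!     /// Records a restart at time `now` and returns `true` if this is a meltdown.
//!     fn record_restart(&mut self, now: u64) -> bool {
//!         // Quiet period: if the last restart is old enough, clear the whole log.
//!         let last = self.restart_log.last().copied();
//!         if let (Some(reset), Some(last)) = (self.reset_after, last) {
//!             if now.saturating_sub(last) >= reset {
//!                 self.restart_log.clear();
//!             }
//!         }
//!         // Keep only restarts that still fall inside the window.
//!         let window = self.max_seconds;
//!         self.restart_log.retain(|&t| now.saturating_sub(t) < window);
//!         self.restart_log.push(now);
//!         // Meltdown when MORE than `max_restarts` restarts are in the window.
//!         self.restart_log.len() > self.max_restarts
//!     }
//! }
//!
//! fn main() {
//!     // At most 2 restarts per 10 seconds: the third restart in the window melts down.
//!     let mut tracker = MeltdownTracker::new(2, 10, None);
//!     assert!(!tracker.record_restart(0));
//!     assert!(!tracker.record_restart(1));
//!     assert!(tracker.record_restart(2));
//!
//!     // With a 3-second quiet period, the restart at t=5 starts from a clean log.
//!     let mut tracker = MeltdownTracker::new(2, 10, Some(3));
//!     assert!(!tracker.record_restart(0));
//!     assert!(!tracker.record_restart(1));
//!     assert!(!tracker.record_restart(5));
//! }
//! ```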
//!
//! ### Child-Level Features
//! - **`restart_counter_reset_after`** (per child): If a specific child remains up for that many seconds, its own failure count is reset to zero on the next failure.
//! - **`backoff_fn`**: An optional function to delay a child's restart. For instance, you might implement exponential backoff to prevent immediate thrashing restarts.
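//!
//! The per-child counter and the backoff delay interact: the delay usually depends on how many times the
//! child has failed since its counter was last reset. The sketch below is a simplified model under
//! stated assumptions. `ChildFailures`, `record_failure`, and `backoff` are hypothetical names, "time
//! up" is approximated as seconds since the previous failure, and the 64-second cap is an arbitrary
//! choice for the example.
//!
//! ```rust
//! use std::time::Duration;
//!
//! /// Hypothetical per-child failure bookkeeping; times are plain seconds.
//! struct ChildFailures {
//!     restart_count: usize,
//!     last_fail: Option<u64>,
//!     reset_after: Option<u64>,
//! }
//!
//! impl ChildFailures {
//!     /// Registers a failure at time `now` and returns the updated failure count.
//!     fn record_failure(&mut self, now: u64) -> usize {
//!         // If the child stayed healthy long enough, start counting from zero again.
//!         if let (Some(reset), Some(last)) = (self.reset_after, self.last_fail) {
//!             if now.saturating_sub(last) >= reset {
//!                 self.restart_count = 0;
//!             }
//!         }
//!         self.restart_count += 1;
//!         self.last_fail = Some(now);
//!         self.restart_count
//!     }
//! }
//!
//! /// Exponential backoff: no delay on the first failure, then 2^count seconds, capped at 64s.
//! fn backoff(restart_count: usize) -> Option<Duration> {
//!     if restart_count <= 1 {
//!         None
//!     } else {
//!         let exponent = restart_count.min(6) as u32;
//!         Some(Duration::from_secs(1u64 << exponent))
//!     }
//! }
//!
//! fn main() {
//!     let mut child = ChildFailures { restart_count: 0, last_fail: None, reset_after: Some(60) };
//!
//!     // Two failures in quick succession: the second one is delayed.
//!     assert_eq!(backoff(child.record_failure(0)), None);
//!     assert_eq!(backoff(child.record_failure(5)), Some(Duration::from_secs(4)));
//!
//!     // After 60+ quiet seconds the counter resets, so the restart is immediate again.
//!     assert_eq!(backoff(child.record_failure(100)), None);
//! }
//! ```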
//!
//! ## Choosing the Right Supervisor
//!
//! 1. Use `Supervisor` when:
//!    - Your actor hierarchy is known at startup
//!    - You need OneForAll or RestForOne strategies
//!    - Children are long-lived and relatively static
//!
//! 2. Use `DynamicSupervisor` when:
//!    - Children need to be added/removed at runtime
//!    - Each child is independent (OneForOne is sufficient)
//!    - You need to limit the total number of children
//!
//! 3. Use `TaskSupervisor` when:
//!    - You're working with futures/async tasks rather than full actors
//!    - Tasks are short-lived or periodic
//!    - You want a simpler API focused on task execution
//!
//! ## Important Requirements
//!
//! 1. **Actor Names**: Both supervisors and their child actors **must** have names set. These names are used for:
//!    - Unique identification in the supervision tree
//!    - Meltdown tracking and logging
//!    - Global actor registry
//!
//! 2. **Proper Spawning**: When spawning supervisors or child actors, always use:
//!    - [`Supervisor::spawn_linked`] or [`Supervisor::spawn`] for static supervisors
//!    - [`DynamicSupervisor::spawn_linked`] or [`DynamicSupervisor::spawn`] for dynamic supervisors
//!    - Do NOT use the generic [`Actor::spawn_linked`] directly
//!
//! ## Multi-Level Supervision Trees
//!
//! Supervisors can manage other **supervisors** as children, forming a **hierarchical** or **tree** structure. This way, different subsystems can each have their own meltdown thresholds or strategies. A meltdown in one subtree doesn't necessarily mean the entire application must go down, unless the top-level supervisor is triggered.
//!
//! For example:
//! ```text
//! Root Supervisor (Static, OneForOne)
//! ├── API Supervisor (Static, OneForAll)
//! │   ├── HTTP Server
//! │   └── WebSocket Server
//! ├── Worker Supervisor (Dynamic)
//! │   └── [Dynamic Worker Pool]
//! └── Task Supervisor
//!     └── [Background Jobs]
//! ```
//!
//! ## Example Usage
//!
//! Here's a complete example using a static supervisor:
//!
//! ```rust
//! use ractor::Actor;
//! use ractor_supervisor::*;
//! use std::{time::Duration, sync::Arc};
//! use tokio::time::Instant;
//! use futures_util::FutureExt;
//!
//! // A minimal child actor that simply does some work in `handle`.
//! struct MyWorker;
//!
//! #[ractor::async_trait]
//! impl Actor for MyWorker {
//!     type Msg = ();
//!     type State = ();
//!     type Arguments = ();
//!
//!     // Called before the actor fully starts. We can set up the actor's internal state here.
//!     async fn pre_start(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _args: Self::Arguments,
//!     ) -> Result<Self::State, ractor::ActorProcessingErr> {
//!         Ok(())
//!     }
//!
//!     // The main message handler. This is where you implement your actor's behavior.
//!     async fn handle(
//!         &self,
//!         _myself: ractor::ActorRef<Self::Msg>,
//!         _msg: Self::Msg,
//!         _state: &mut Self::State
//!     ) -> Result<(), ractor::ActorProcessingErr> {
//!         // do some work...
//!         Ok(())
//!     }
//! }
//!
//! // A function to spawn the child actor. This will be used in ChildSpec::spawn_fn.
//! async fn spawn_my_worker(
//!     supervisor_cell: ractor::ActorCell,
//!     child_id: String
//! ) -> Result<ractor::ActorCell, ractor::SpawnErr> {
//!     // Name the child actor after `child_id` (the `ChildSpec::id`); supervised actors must be named.
//!     let (child_ref, _join) = Supervisor::spawn_linked(
//!         child_id,                    // actor name
//!         MyWorker,                    // actor instance
//!         (),                          // arguments
//!         supervisor_cell             // link to the supervisor
//!     ).await?;
//!     Ok(child_ref.get_cell())
//! }
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // A child-level backoff function: no delay on the first failure, then an exponentially growing delay.
//!     // Return Some(delay) to make the supervisor wait before restarting this child.
//!     let my_backoff: ChildBackoffFn = Arc::new(
//!         |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<u64>| {
//!             // On the first failure, restart immediately (None).
//!             // From the second failure on, wait 2^restart_count seconds (4s, 8s, 16s, ...).
//!             if restart_count <= 1 {
//!                 None
//!             } else {
//!                 Some(Duration::from_secs(1 << restart_count))
//!             }
//!         }
//!     );
//!
//!     // This specification describes exactly how to manage our single child actor.
//!     let child_spec = ChildSpec {
//!         id: "myworker".into(),  // Unique identifier for meltdown logs and debugging.
//!         restart: Restart::Transient, // Only restart if the child fails abnormally.
//!         spawn_fn: Arc::new(|cell, id| spawn_my_worker(cell, id).boxed()),
//!         backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
//!         // If the child remains up for 60s, its individual failure counter resets to 0 next time it fails.
//!         restart_counter_reset_after: Some(60),
//!     };
//!
//!     // Supervisor-level meltdown configuration. If more than 5 restarts occur within 10s, meltdown is triggered.
//!     // Also, if we stay quiet for 30s (no restarts), the meltdown log resets.
//!     let options = SupervisorOptions {
//!         strategy: SupervisorStrategy::OneForOne,  // If one child fails, only that child is restarted.
//!         max_restarts: 5,               // Permit up to 5 restarts in the meltdown window.
//!         max_seconds: 10,               // The meltdown window (in seconds).
//!         restart_counter_reset_after: Some(30), // If no failures for 30s, meltdown log is cleared.
//!     };
//!
//!     // Group all child specs and meltdown options together:
//!     let args = SupervisorArguments {
//!         child_specs: vec![child_spec], // We only have one child in this example
//!         options,
//!     };
//!
//!     // Spawn the supervisor with our arguments.
//!     let (sup_ref, sup_handle) = Supervisor::spawn(
//!         "root".into(), // name for the supervisor
//!         args
//!     ).await?;
//!
//!     // For this example, stop the supervisor right away and wait for it to finish.
//!     let _ = sup_ref.kill();
//!     let _ = sup_handle.await;
//!
//!     Ok(())
//! }
//! ```
//!
//! For more examples, see:
//! - [`Supervisor`] for static supervision
//! - [`DynamicSupervisor`] for dynamic child management
//! - [`TaskSupervisor`] for supervised async tasks
//!
pub mod core;
pub mod dynamic;
pub mod supervisor;
pub mod task;

pub use core::*;
pub use dynamic::*;
pub use supervisor::*;
pub use task::*;