1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
//! Resilient, self-healing async process hierarchies in the style of Erlang/OTP.
//!
//! In complex concurrent systems it is important to ensure that different processes are
//! well-synchronized. In particular, processes need to start in proper order and, in the event of
//! an error, be properly torn down and cleaned up. This problem becomes especially difficult when
//! building *robust* concurrent systems where partial failures should be handled gracefully, often
//! via some form of automated recovery of the failed components.
//!
//! Often these requirements are ignored or met via a collection of *ad hoc* mechanisms. If you've
//! ever found yourself spawning many asynchronous tasks and then wondering what to do with all the
//! `JoinHandle`s, Spry offers a very good default answer.
//!
//! **Spry** is a library supporting a set of common conventions for orderly startup, shutdown, and
//! restarting of a *hierarchy* of concurrent [Child] processes, or [System]s. Its design is
//! inspired by Erlang's OTP [supervisor](https://www.erlang.org/doc/man/supervisor.html) behavior,
//! adapted for Rust.
//!
//! Spry is also meant to be easy to adopt being unopinionated about the style of process it
//! manages. Processes instead opt-in to Spry's conventions as they become useful.
//!
//! # Hello Spry
//!
//! As a simple example, we can define a top-level system with two managed child processes.
//!
//! ```rust
//! use std::time::Duration;
//! use tokio::sync::mpsc;
//!
//! use spry::prelude::*;
//!
//! // Child processes are defined by "startup functions", asynchronous functions that eventually
//! // spawn the main child process body. They may also optionally return a value.
//! // When a child's startup function returns, it's assumed that the child is in a *stable* state.
//! async fn receiver() -> MainLoop<mpsc::UnboundedSender<String>> {
//! // While creating a channel is synchronous, we could also perform asynchronous setup work.
//! let (tx, mut rx) = mpsc::unbounded_channel();
//!
//! // After startup, we return a join handle pointing at the managed process and, optionally,
//! // any other data that should be available after this process is ready. In this case, we
//! // return an mpsc sender used to communicate with this process.
//! MainLoop::new_returning(tx, tokio::spawn(async move {
//! while let Some(msg) = rx.recv().await {
//! println!("Received: {}", msg);
//! }
//! }))
//! }
//!
//! // Startup functions may also be defined by implementing the `Child` trait. This can be useful
//! // when a child process is parameterized.
//! struct Sender(mpsc::UnboundedSender<String>);
//!
//! impl<'a> Child<'a, ()> for Sender {
//! // Startup functions receive a `Context` object that provides access to different control
//! // services offered by Spry. In particular here we'll check to see whether this process has
//! // been asked to "settle", i.e. gracefully terminate.
//! //
//! // Processes that are not responsive to being asked to settle will eventually be forcefully
//! // aborted. In this case, the process will simply fail to return from the next `.await` point.
//! async fn start(self, cx: Context) -> MainLoop<()> {
//! let msgs = ["Hello", "from", "Spry"];
//! MainLoop::new(tokio::spawn(async move {
//! // We'll see below that we spawn this child as a "permanent" child. This implies that we
//! // expect it to run forever, thus the `.cycle()`.
//! for msg in msgs.iter().cycle() {
//! // These checks are optional, but they give you more control over process lifecycles
//! if cx.is_settled() { break; }
//!
//! // if the process panics on this unwrap, Spry will attempt to restart it
//! self.0.send(msg.to_string()).unwrap();
//!
//! tokio::time::sleep(Duration::from_millis(100)).await;
//! }
//! }))
//! }
//! }
//!
//! // Finally, we define a "system", built from these two child processes. Systems are also defined
//! // by their start functions, though they take a different form. Due to present type system
//! // limitations this must be defined via a trait on a type you define.
//! struct HelloWorld;
//!
//! impl System<&'static str> for HelloWorld {
//! async fn start(&self, scope: &mut Scope<&'static str>) {
//! // This declares the children involved in this system and also how they connect together.
//! // By default, children are considered "permanent". This implies that if either child
//! // terminates, normally or otherwise, this system will gracefully shut down and then start
//! // each child again using this method.
//! let send_channel = scope.child("receiver", |_| receiver()).spawn().await;
//! scope.child("sender", Sender(send_channel)).spawn().await;
//!
//! // This example doesn't include any, but we could also launch sub-systems here as children.
//! // Failures and restarts in these subsystems are isolated from their parent until they reach
//! // a configurable limit after which those failures are escalated.
//! //
//! // scope.system("subsystem", Subsystem).spawn().await;
//! }
//! }
//!
//! #[tokio::main]
//! async fn main() {
//! // Spry begins by declaring a top-level system.
//! let toplevel = Toplevel::new("app", HelloWorld);
//!
//! // We can make use of the top-level shutdown token to hook Spry into external signals.
//! tokio::spawn({
//! let token = toplevel.shutdown_token().clone();
//! async move {
//! // we'll have both ctrl_c and a timer as system termination signals
//! // 600ms is about 6 messages from our sender
//! tokio::select! {
//! _ = tokio::signal::ctrl_c() => {},
//! _ = tokio::time::sleep(Duration::from_millis(600)) => {}
//! }
//!
//! // This will cause a graceful shutdown to cascade through the entire process hierarchy in
//! // reverse order of startup.
//! token.signal_shutdown()
//! }
//! });
//!
//! // When using Spry, it's a good idea to disable the default panic handler.
//! std::panic::set_hook(Box::new(|_| {}));
//!
//! // We join the top-level system which will start and operate the entire process hierarchy.
//! //
//! // This will return either when:
//! // - the toplevel has gracefully shut down from receiving a ctrl-c signal, or
//! // - sufficiently frequent failures have occurred that the system decides it is unrecoverable
//! match toplevel.start().await {
//! Ok(()) => println!("normal app termination"),
//! Err(e) => println!("abnormal app termination: {:?}", e),
//! }
//! }
//! ```
//!
//! # Process lifecycles
//!
//! The most important concept in Spry is that of a process's *lifecycle*. This is an overlay that
//! builds atop typical asynchronous spawning patterns of Rust.
//!
//! A live process may be in one of three states:
//!
//! - *Starting*, when the process is not yet stable/ready and subsequent startup work must wait.
//! This period persists while the asynchronous body of the process' startup function is being
//! run/polled. Once this returns, the process is considered *running*.
//! - *Running*, when the process is considered stable and ready to perform work. If the process
//! halts it is considered to have terminated _normally_, but it may also panic, an _abnormal_
//! termination. Finally, at any point the process may be asked to _settle_ externally. It's not
//! necessary for a process to respond, but it may opt in by observing settlement using its
//! [Context].
//! - *Settling*, when the process is pending forceful termination (i.e. being aborted). The process
//! is given a configurable amount of time to clean up and exit normally. If it persists beyond
//! this limit it will be aborted. An aborted process will never return from its next `.await`
//! point.
//!
//! Living processes may eventually terminate as described above. When this occurs, Spry will
//! observe the termination and respond in accordance with the process's configured lifecycle
//! [policy].
//!
//! - *Permanent* children are expected to run indefinitely. If they terminate for any reason this
//! will trigger a restart of their parent system, and they will themselves be restarted.
//! - *Transient* children are expected to run and then eventually terminate normally. If they panic
//! then they will trigger a restart of their parent system, and they will themselves be restarted.
//! - *Temporary* children may terminate for any reason without causing a restart. They are never
//! restarted, even if they are terminated due to the failure of a sibling process.
//!
//! # "Let it crash"
//!
//! Spry is designed to encourage a style of programming where processes are designed to simply fail
//! and be restarted when they encounter a problem. This is a common pattern in Erlang/OTP sometimes
//! referred to as "let it crash".
//!
//! This style is motivated by the observation that often unexpected and errorful states are
//! ephemeral. Instead of trying to defensively handle each unexpected result using a constellation
//! of methods, we just die, get restarted, and try again shortly. Hopefully, this time the error
//! will be avoided.
//!
//! This style can be disconcerting at first as it leans into partiality. Liberal use of
//! `.unwrap()`, `.expect()`, and even `panic!()` is encouraged. Spry will catch these and perform
//! a partial restart.
//!
//! That said, let it crash is no substitute for thoughtful defensive programming within parts of
//! your codebase that cannot tolerate failure and restart. In these places, Rust's strong error
//! handling systems are critical.
//!
//! More to the point, concurrency is hard and highly concurrent systems, while valuable, often
//! become ["complex systems"](https://how.complexsystems.fail/). Spry is not a silver bullet.
//! Supervision and restarts are good default policy for isolating partial failures. At the same
//! time, achieving robustness involves many systems both technical and social.
//!
//! # Integration with [`tracing`](https://crates.io/crates/tracing)
//!
//! To that end, Spry ships with built-in support for the
//! [`tracing`](https://crates.io/crates/tracing) crate. This makes it easier to track processes
//! within your system and understand how they interact. Indeed, this is the exclusive way that
//! partial failures within a Spry system are reported.
pub use *;