//! Resilient, self-healing async process hierarchies in the style of Erlang/OTP.
//!
//! In complex concurrent systems it is important to ensure that different processes are
//! well-synchronized. In particular, processes need to start in the proper order and, in the event
//! of an error, be properly torn down and cleaned up. This problem becomes especially difficult
//! when building *robust* concurrent systems where partial failures should be handled gracefully,
//! often via some form of automated recovery of the failed components.
//!
//! Often these requirements are ignored or met via a collection of *ad hoc* mechanisms. If you've
//! ever found yourself spawning many asynchronous tasks and then wondering what to do with all the
//! `JoinHandle`s, Spry offers a good default answer.
//!
//! **Spry** is a library supporting a set of common conventions for orderly startup, shutdown, and
//! restarting of a *hierarchy* of concurrent [Child] processes, or [System]s. Its design is
//! inspired by Erlang's OTP [supervisor](https://www.erlang.org/doc/man/supervisor.html) behavior,
//! adapted for Rust.
//!
//! Spry is also meant to be easy to adopt, being unopinionated about the style of process it
//! manages. Processes instead opt in to Spry's conventions as they become useful.
//!
//! # Hello Spry
//!
//! As a simple example, we can define a top-level system with two managed child processes.
//!
//! ```rust
//! use std::time::Duration;
//! use tokio::sync::mpsc;
//!
//! use spry::prelude::*;
//!
//! // Child processes are defined by "startup functions", asynchronous functions that eventually
//! // spawn the main child process body. They may also optionally return a value.
//! // When a child's startup function returns, it's assumed that the child is in a *stable* state.
//! async fn receiver() -> MainLoop<mpsc::UnboundedSender<String>> {
//!     // While creating a channel is synchronous, we could also perform asynchronous setup work.
//!     let (tx, mut rx) = mpsc::unbounded_channel();
//!
//!     // After startup, we return a join handle pointing at the managed process and, optionally,
//!     // any other data that should be available after this process is ready. In this case, we
//!     // return an mpsc sender used to communicate with this process.
//!     MainLoop::new_returning(tx, tokio::spawn(async move {
//!         while let Some(msg) = rx.recv().await {
//!             println!("Received: {}", msg);
//!         }
//!     }))
//! }
//!
//! // Startup functions may also be defined by implementing the `Child` trait. This can be useful
//! // when a child process is parameterized.
//! struct Sender(mpsc::UnboundedSender<String>);
//!
//! impl<'a> Child<'a, ()> for Sender {
//!     // Startup functions receive a `Context` object that provides access to different control
//!     // services offered by Spry. In particular here we'll check to see whether this process
//!     // has been asked to "settle", i.e. gracefully terminate.
//!     //
//!     // Processes that are not responsive to being asked to settle will eventually be
//!     // forcefully aborted. In this case, the process will simply fail to return from the next
//!     // `.await` point.
//!     async fn start(self, cx: Context) -> MainLoop<()> {
//!         let msgs = ["Hello", "from", "Spry"];
//!         MainLoop::new(tokio::spawn(async move {
//!             // We'll see below that we spawn this child as a "permanent" child. This implies
//!             // that we expect it to run forever, thus the `.cycle()`.
//!             for msg in msgs.iter().cycle() {
//!                 // These checks are optional, but they give you more control over process
//!                 // lifecycles.
//!                 if cx.is_settled() { break; }
//!
//!                 // If the process panics on this unwrap, Spry will attempt to restart it.
//!                 self.0.send(msg.to_string()).unwrap();
//!
//!                 tokio::time::sleep(Duration::from_millis(100)).await;
//!             }
//!         }))
//!     }
//! }
//!
//! // Finally, we define a "system", built from these two child processes. Systems are also defined
//! // by their start functions, though they take a different form. Due to present type system
//! // limitations this must be defined via a trait on a type you define.
//! struct HelloWorld;
//!
//! impl System<&'static str> for HelloWorld {
//!     async fn start(&self, scope: &mut Scope<&'static str>) {
//!         // This declares the children involved in this system and also how they connect
//!         // together. By default, children are considered "permanent". This implies that if
//!         // either child terminates, normally or otherwise, this system will gracefully shut
//!         // down and then start each child again using this method.
//!         let send_channel = scope.child("receiver", |_| receiver()).spawn().await;
//!         scope.child("sender", Sender(send_channel)).spawn().await;
//!
//!         // This example doesn't include any, but we could also launch sub-systems here as
//!         // children. Failures and restarts in these subsystems are isolated from their parent
//!         // until they reach a configurable limit after which those failures are escalated.
//!         //
//!         // scope.system("subsystem", Subsystem).spawn().await;
//!     }
//! }
//!
//! #[tokio::main]
//! async fn main() {
//!     // Spry begins by declaring a top-level system.
//!     let toplevel = Toplevel::new("app", HelloWorld);
//!
//!     // We can make use of the top-level shutdown token to hook Spry into external signals.
//!     tokio::spawn({
//!         let token = toplevel.shutdown_token().clone();
//!         async move {
//!             // We'll have both ctrl_c and a timer as system termination signals.
//!             // 600ms is about 6 messages from our sender.
//!             tokio::select! {
//!                 _ = tokio::signal::ctrl_c() => {},
//!                 _ = tokio::time::sleep(Duration::from_millis(600)) => {}
//!             }
//!
//!             // This will cause a graceful shutdown to cascade through the entire process
//!             // hierarchy in reverse order of startup.
//!             token.signal_shutdown()
//!         }
//!     });
//!
//!     // When using Spry, it's a good idea to disable the default panic handler.
//!     std::panic::set_hook(Box::new(|_| {}));
//!
//!     // We join the top-level system which will start and operate the entire process hierarchy.
//!     //
//!     // This will return either when:
//!     // - the toplevel has gracefully shut down after the shutdown token was signaled (here, by
//!     //   ctrl-c or the 600ms timer), or
//!     // - failures have occurred frequently enough that the system decides it is unrecoverable
//!     match toplevel.start().await {
//!         Ok(()) => println!("normal app termination"),
//!         Err(e) => println!("abnormal app termination: {:?}", e),
//!     }
//! }
//! ```
//!
//! # Process lifecycles
//!
//! The most important concept in Spry is that of a process's *lifecycle*. This is an overlay that
//! builds atop the typical asynchronous spawning patterns of Rust.
//!
//! A live process may be in one of three states:
//!
//! - *Starting*, when the process is not yet stable/ready and subsequent startup work must wait.
//!   This period persists while the asynchronous body of the process's startup function is being
//!   run/polled. Once this returns, the process is considered *running*.
//! - *Running*, when the process is considered stable and ready to perform work. If the process
//!   halts it is considered to have terminated _normally_, but it may also panic, an _abnormal_
//!   termination. Finally, at any point the process may be asked to _settle_ externally. It's not
//!   necessary for a process to respond, but it may opt in by observing settlement using its
//!   [Context].
//! - *Settling*, when the process is pending forceful termination (i.e. being aborted). The process
//!   is given a configurable amount of time to clean up and exit normally. If it persists beyond
//!   this limit it will be aborted. An aborted process will never return from its next `.await`
//!   point.
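//!
//! As a rough illustration, the states above can be modeled as a small state machine. This is a
//! sketch for intuition only: `Lifecycle`, `Exit`, `Event`, and `step` are hypothetical names that
//! are not part of Spry's API, and only the transitions described above are modeled.
//!
//! ```rust
//! /// Hypothetical model of the three live states plus a terminal state.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Lifecycle {
//!     Starting,
//!     Running,
//!     Settling,
//!     Terminated(Exit),
//! }
//!
//! /// How a process ended (hypothetical).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Exit {
//!     Normal,
//!     Panicked,
//!     Aborted,
//! }
//!
//! /// Things that can happen to a live process (hypothetical).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Event {
//!     StartupReturned,
//!     SettleRequested,
//!     Halted,
//!     Panicked,
//!     GraceExpired,
//! }
//!
//! /// Advances the lifecycle by one event. Events that do not apply leave the state unchanged.
//! fn step(state: Lifecycle, event: Event) -> Lifecycle {
//!     use Lifecycle::{Running, Settling, Starting, Terminated};
//!     match (state, event) {
//!         // The startup function returned, so the process is considered stable.
//!         (Starting, Event::StartupReturned) => Running,
//!         // An external request to settle starts the grace period.
//!         (Running, Event::SettleRequested) => Settling,
//!         // Halting is a normal termination; panicking is an abnormal one.
//!         (Running | Settling, Event::Halted) => Terminated(Exit::Normal),
//!         (Running | Settling, Event::Panicked) => Terminated(Exit::Panicked),
//!         // A process that outlives its grace period is aborted.
//!         (Settling, Event::GraceExpired) => Terminated(Exit::Aborted),
//!         (other, _) => other,
//!     }
//! }
//! ```
//!
//! In this model a process that ignores a settle request stays in `Settling` until the
//! `GraceExpired` event moves it to `Terminated(Exit::Aborted)`.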
//!
//! Living processes may eventually terminate as described above. When this occurs, Spry will
//! observe the termination and respond in accordance with the process's configured lifecycle
//! [policy].
//!
//! - *Permanent* children are expected to run indefinitely. If they terminate for any reason this
//!   will trigger a restart of their parent system, and they will themselves be restarted.
//! - *Transient* children are expected to run and then eventually terminate normally. If they panic
//!   then they will trigger a restart of their parent system, and they will themselves be
//!   restarted.
//! - *Temporary* children may terminate for any reason without causing a restart. They are never
//!   restarted, even if they are terminated due to the failure of a sibling process.
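//!
//! The three policies amount to a small decision table. The sketch below restates that table in
//! code. `Policy`, `Termination`, and `triggers_restart` are hypothetical names chosen for this
//! illustration, and the real types in the [policy] module may look different.
//!
//! ```rust
//! /// Hypothetical stand-in for a child's lifecycle policy.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Policy {
//!     Permanent,
//!     Transient,
//!     Temporary,
//! }
//!
//! /// How a child terminated (hypothetical).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Termination {
//!     Normal,
//!     Panic,
//! }
//!
//! /// Returns `true` when a termination should trigger a restart of the parent system.
//! fn triggers_restart(policy: Policy, how: Termination) -> bool {
//!     match (policy, how) {
//!         // Permanent children should never stop, so any exit triggers a restart.
//!         (Policy::Permanent, _) => true,
//!         // Transient children may finish normally, but a panic triggers a restart.
//!         (Policy::Transient, Termination::Panic) => true,
//!         (Policy::Transient, Termination::Normal) => false,
//!         // Temporary children never trigger a restart.
//!         (Policy::Temporary, _) => false,
//!     }
//! }
//! ```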
//!
//! # "Let it crash"
//!
//! Spry is designed to encourage a style of programming where processes are designed to simply fail
//! and be restarted when they encounter a problem. This is a common pattern in Erlang/OTP sometimes
//! referred to as "let it crash".
//!
//! This style is motivated by the observation that unexpected and erroneous states are often
//! ephemeral. Instead of trying to defensively handle each unexpected result using a constellation
//! of methods, we just die, get restarted, and try again shortly. Hopefully, this time the error
//! will be avoided.
//!
//! This style can be disconcerting at first as it leans into partiality. Liberal use of
//! `.unwrap()`, `.expect()`, and even `panic!()` is encouraged. Spry will catch these and perform
//! a partial restart.
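//!
//! To make the idea concrete without relying on Spry itself, here is a minimal, standard-library
//! sketch of "die, get restarted, and try again". The `supervise` function is a hypothetical
//! helper, not Spry's implementation: it simply reruns a synchronous task whenever it panics, up
//! to a restart limit. It assumes the default unwinding panic strategy.
//!
//! ```rust
//! use std::panic::{catch_unwind, AssertUnwindSafe};
//!
//! /// Runs `task` until it completes without panicking, giving up after `max_restarts` restarts.
//! /// Returns the task's value and the number of restarts performed, or `None` on giving up.
//! fn supervise<T>(max_restarts: usize, mut task: impl FnMut() -> T) -> Option<(T, usize)> {
//!     for restarts in 0..=max_restarts {
//!         // A panic inside `task` unwinds to here instead of taking down the caller.
//!         if let Ok(value) = catch_unwind(AssertUnwindSafe(|| task())) {
//!             return Some((value, restarts));
//!         }
//!     }
//!     // The task panicked on every attempt. A real supervisor would escalate this failure.
//!     None
//! }
//! ```
//!
//! A task that panics on its first two attempts and succeeds on the third would return after two
//! restarts, while a task that always panics exhausts the limit and yields `None`.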
//!
//! That said, "let it crash" is no substitute for thoughtful defensive programming within parts of
//! your codebase that cannot tolerate failure and restart. In these places, Rust's strong error
//! handling systems are critical.
//!
//! More to the point, concurrency is hard, and highly concurrent systems, while valuable, often
//! become ["complex systems"](https://how.complexsystems.fail/). Spry is not a silver bullet.
//! Supervision and restarts are a good default policy for isolating partial failures. At the same
//! time, achieving robustness involves many systems, both technical and social.
//!
//! # Integration with [`tracing`](https://crates.io/crates/tracing)
//!
//! To that end, Spry ships with built-in support for the
//! [`tracing`](https://crates.io/crates/tracing) crate. This makes it easier to track processes
//! within your system and understand how they interact. Indeed, this is the exclusive way that
//! partial failures within a Spry system are reported.

pub use prelude::*;

pub mod breaker;
pub mod builder;
mod core;
pub mod error;
mod internal;
mod nursery;
pub mod policy;
pub mod prelude;
pub mod signals;
mod system;