spry/
lib.rs

//! Resilient, self-healing async process hierarchies in the style of Erlang/OTP.
//!
//! In complex concurrent systems it is important to ensure that different processes are
//! well-synchronized. In particular, processes need to start in proper order and, in the event of
//! an error, be properly torn down and cleaned up. This problem becomes especially difficult when
//! building *robust* concurrent systems where partial failures should be handled gracefully, often
//! via some form of automated recovery of the failed components.
//!
//! Often these requirements are ignored or met via a collection of *ad hoc* mechanisms. If you've
//! ever found yourself spawning many asynchronous tasks and then wondering what to do with all the
//! `JoinHandle`s, Spry offers a very good default answer.
//!
//! **Spry** is a library supporting a set of common conventions for orderly startup, shutdown, and
//! restarting of a *hierarchy* of concurrent [Child] processes, or [System]s. Its design is
//! inspired by Erlang's OTP [supervisor](https://www.erlang.org/doc/man/supervisor.html) behavior,
//! adapted for Rust.
//!
//! Spry is also meant to be easy to adopt: it is unopinionated about the style of process it
//! manages. Processes instead opt in to Spry's conventions as they become useful.
//!
//! # Hello Spry
//!
//! As a simple example, we can define a top-level system with two managed child processes.
//!
//! ```rust
//! use std::time::Duration;
//! use tokio::sync::mpsc;
//!
//! use spry::prelude::*;
//!
//! // Child processes are defined by "startup functions", asynchronous functions that eventually
//! // spawn the main child process body. They may also optionally return a value.
//! // When a child's startup function returns, it's assumed that the child is in a *stable* state.
//! async fn receiver() -> MainLoop<mpsc::UnboundedSender<String>> {
//!   // While creating a channel is synchronous, we could also perform asynchronous setup work.
//!   let (tx, mut rx) = mpsc::unbounded_channel();
//!
//!   // After startup, we return a join handle pointing at the managed process and, optionally,
//!   // any other data that should be available after this process is ready. In this case, we
//!   // return an mpsc sender used to communicate with this process.
//!   MainLoop::new_returning(tx, tokio::spawn(async move {
//!     while let Some(msg) = rx.recv().await {
//!       println!("Received: {}", msg);
//!     }
//!   }))
//! }
//!
//! // Startup functions may also be defined by implementing the `Child` trait. This can be useful
//! // when a child process is parameterized.
//! struct Sender(mpsc::UnboundedSender<String>);
//!
//! impl<'a> Child<'a, ()> for Sender {
//!   // Startup functions receive a `Context` object that provides access to different control
//!   // services offered by Spry. In particular here we'll check to see whether this process has
//!   // been asked to "settle", i.e. gracefully terminate.
//!   //
//!   // Processes that are not responsive to being asked to settle will eventually be forcefully
//!   // aborted. In this case, the process will simply fail to return from the next `.await` point.
//!   async fn start(self, cx: Context) -> MainLoop<()> {
//!     let msgs = ["Hello", "from", "Spry"];
//!     MainLoop::new(tokio::spawn(async move {
//!       // We'll see below that we spawn this child as a "permanent" child. This implies that we
//!       // expect it to run forever, thus the `.cycle()`.
//!       for msg in msgs.iter().cycle() {
//!         // These checks are optional, but they give you more control over process lifecycles
//!         if cx.is_settled() { break; }
//!
//!         // if the process panics on this unwrap, Spry will attempt to restart it
//!         self.0.send(msg.to_string()).unwrap();
//!
//!         tokio::time::sleep(Duration::from_millis(100)).await;
//!       }
//!     }))
//!   }
//! }
//!
//! // Finally, we define a "system", built from these two child processes. Systems are also defined
//! // by their start functions, though they take a different form. Due to present type system
//! // limitations this must be defined via a trait on a type you define.
//! struct HelloWorld;
//!
//! impl System<&'static str> for HelloWorld {
//!   async fn start(&self, scope: &mut Scope<&'static str>) {
//!     // This declares the children involved in this system and also how they connect together.
//!     // By default, children are considered "permanent". This implies that if either child
//!     // terminates, normally or otherwise, this system will gracefully shut down and then start
//!     // each child again using this method.
//!     let send_channel = scope.child("receiver", |_| receiver()).spawn().await;
//!     scope.child("sender", Sender(send_channel)).spawn().await;
//!
//!     // This example doesn't include any, but we could also launch sub-systems here as children.
//!     // Failures and restarts in these subsystems are isolated from their parent until they reach
//!     // a configurable limit after which those failures are escalated.
//!     //
//!     //   scope.system("subsystem", Subsystem).spawn().await;
//!   }
//! }
//!
//! #[tokio::main]
//! async fn main() {
//!   // Spry begins by declaring a top-level system.
//!   let toplevel = Toplevel::new("app", HelloWorld);
//!
//!   // We can make use of the top-level shutdown token to hook Spry into external signals.
//!   tokio::spawn({
//!     let token = toplevel.shutdown_token().clone();
//!     async move {
//!       // we'll have both ctrl_c and a timer as system termination signals
//!       // 600ms is about 6 messages from our sender
//!       tokio::select! {
//!         _ = tokio::signal::ctrl_c() => {},
//!         _ = tokio::time::sleep(Duration::from_millis(600)) => {}
//!       }
//!
//!       // This will cause a graceful shutdown to cascade through the entire process hierarchy in
//!       // reverse order of startup.
//!       token.signal_shutdown()
//!     }
//!   });
//!
//!   // When using Spry, it's a good idea to disable the default panic handler.
//!   std::panic::set_hook(Box::new(|_| {}));
//!
//!   // We join the top-level system which will start and operate the entire process hierarchy.
//!   //
//!   // This will return when either:
//!   // - the top-level system has gracefully shut down after its shutdown token was signaled
//!   //   (by ctrl-c or by the timer above), or
//!   // - failures have occurred frequently enough that the system decides it is unrecoverable
//!   match toplevel.start().await {
//!     Ok(()) => println!("normal app termination"),
//!     Err(e) => println!("abnormal app termination: {:?}", e),
//!   }
//! }
//! ```
//!
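//! The comments in the example mention that repeated failures are escalated once they reach a
//! configurable limit. How Spry tracks this limit is not shown here. As a rough sketch only,
//! assuming a sliding-window rule in the spirit of OTP's restart intensity, the bookkeeping might
//! look like the following. `RestartWindow` and its methods are hypothetical names for
//! illustration, not part of Spry's API.
//!
//! ```rust
//! use std::collections::VecDeque;
//! use std::time::{Duration, Instant};
//!
//! // Hypothetical sliding-window restart limiter (an assumption, not Spry's implementation).
//! struct RestartWindow {
//!   max_restarts: usize,
//!   period: Duration,
//!   history: VecDeque<Instant>,
//! }
//!
//! impl RestartWindow {
//!   fn new(max_restarts: usize, period: Duration) -> Self {
//!     Self { max_restarts, period, history: VecDeque::new() }
//!   }
//!
//!   // Records a restart at `now`. Returns `true` if the restart is still within the limit and
//!   // `false` if the limit is exceeded, i.e. the failure should be escalated to the parent.
//!   fn record(&mut self, now: Instant) -> bool {
//!     // Forget restarts that have aged out of the window.
//!     while let Some(&oldest) = self.history.front() {
//!       if now.duration_since(oldest) > self.period {
//!         self.history.pop_front();
//!       } else {
//!         break;
//!       }
//!     }
//!     self.history.push_back(now);
//!     self.history.len() <= self.max_restarts
//!   }
//! }
//! ```
//!
//! With a limit of two restarts per ten seconds, a third restart inside the window would be
//! escalated, while a restart arriving after the window has drained would be allowed again.
//!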
//! # Process lifecycles
//!
//! The most important concept in Spry is that of a process's *lifecycle*. This is an overlay that
//! builds atop typical asynchronous spawning patterns of Rust.
//!
//! A live process may be in one of three states:
//!
//! - *Starting*, when the process is not yet stable/ready and subsequent startup work must wait.
//!   This period persists while the asynchronous body of the process's startup function is being
//!   run/polled. Once this returns, the process is considered *running*.
//! - *Running*, when the process is considered stable and ready to perform work. If the process
//!   halts it is considered to have terminated _normally_, but it may also panic, an _abnormal_
//!   termination. Finally, at any point the process may be asked to _settle_ externally. It's not
//!   necessary for a process to respond, but it may opt in by observing settlement using its
//!   [Context].
//! - *Settling*, when the process is pending forceful termination (i.e. being aborted). The process
//!   is given a configurable amount of time to clean up and exit normally. If it persists beyond
//!   this limit it will be aborted. An aborted process will never return from its next `.await`
//!   point.
//!
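//! The three states above can be read as a small state machine. The following is a minimal
//! sketch of that reading, not Spry's internal representation: `Lifecycle`, `Exit`, `Event`, and
//! `step` are hypothetical names, and events that the text does not describe (for example, a
//! settle request while still starting) are assumed to leave the state unchanged.
//!
//! ```rust
//! // Hypothetical model of the lifecycle described above.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Exit { Normal, Panicked, Aborted }
//!
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Lifecycle { Starting, Running, Settling, Terminated(Exit) }
//!
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Event { StartupReturned, SettleRequested, Halted, Panicked, GracePeriodElapsed }
//!
//! fn step(state: Lifecycle, event: Event) -> Lifecycle {
//!   use Lifecycle::*;
//!   match (state, event) {
//!     // The startup function returned: the process is now considered stable.
//!     (Starting, Event::StartupReturned) => Running,
//!     // A running process was asked to settle and is given time to clean up.
//!     (Running, Event::SettleRequested) => Settling,
//!     // Halting on its own is a normal termination, panicking an abnormal one.
//!     (Running | Settling, Event::Halted) => Terminated(Exit::Normal),
//!     (Running | Settling, Event::Panicked) => Terminated(Exit::Panicked),
//!     // The process ignored the settle request for too long and is aborted.
//!     (Settling, Event::GracePeriodElapsed) => Terminated(Exit::Aborted),
//!     // Assumption: every other combination leaves the state as it was.
//!     (other, _) => other,
//!   }
//! }
//! ```
//!
//! For instance, `step(Lifecycle::Settling, Event::GracePeriodElapsed)` yields
//! `Lifecycle::Terminated(Exit::Aborted)` in this model.
//!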
//! Living processes may eventually terminate as described above. When this occurs, Spry will
//! observe the termination and respond in accordance with the process's configured lifecycle
//! [policy].
//!
//! - *Permanent* children are expected to run indefinitely. If they terminate for any reason this
//!   will trigger a restart of their parent system, and they will themselves be restarted.
//! - *Transient* children are expected to run and then eventually terminate normally. If they panic
//!   then they will trigger a restart of their parent system, and they will themselves be restarted.
//! - *Temporary* children may terminate for any reason without causing a restart. They are never
//!   restarted, even if they are terminated due to the failure of a sibling process.
//!
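//! The three policies above amount to a small decision table. As an illustration only, with
//! `ChildPolicy`, `Termination`, and `triggers_restart` being hypothetical names rather than items
//! from the [policy] module, the rule "does this termination restart the parent system?" could be
//! written as:
//!
//! ```rust
//! // Hypothetical stand-ins for the policies and termination kinds described above.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum ChildPolicy { Permanent, Transient, Temporary }
//!
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Termination { Normal, Panic }
//!
//! // Returns `true` when the termination should trigger a restart of the parent system.
//! fn triggers_restart(policy: ChildPolicy, termination: Termination) -> bool {
//!   match (policy, termination) {
//!     // Permanent children should never stop, so any termination counts.
//!     (ChildPolicy::Permanent, _) => true,
//!     // Transient children may finish normally, but a panic counts.
//!     (ChildPolicy::Transient, Termination::Panic) => true,
//!     (ChildPolicy::Transient, Termination::Normal) => false,
//!     // Temporary children never cause a restart.
//!     (ChildPolicy::Temporary, _) => false,
//!   }
//! }
//! ```
//!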
//! # "Let it crash"
//!
//! Spry is designed to encourage a style of programming where processes are designed to simply fail
//! and be restarted when they encounter a problem. This is a common pattern in Erlang/OTP sometimes
//! referred to as "let it crash".
//!
//! This style is motivated by the observation that unexpected and erroneous states are often
//! ephemeral. Instead of trying to defensively handle each unexpected result using a constellation
//! of methods, we just die, get restarted, and try again shortly. Hopefully, this time the error
//! will be avoided.
//!
//! This style can be disconcerting at first as it leans into partiality. Liberal use of
//! `.unwrap()`, `.expect()`, and even `panic!()` is encouraged. Spry will catch these and perform
//! a partial restart.
//!
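//! To make the idea concrete outside of Spry, here is a minimal, synchronous sketch of "crash and
//! retry" using only the standard library's `std::panic::catch_unwind`. It is not how Spry
//! supervises asynchronous tasks, and `run_with_restarts` is a hypothetical helper, but it
//! shows the same shape: the work itself stays free of defensive handling, and the caller decides
//! what a crash means.
//!
//! ```rust
//! use std::panic::{self, AssertUnwindSafe};
//!
//! // Hypothetical helper: runs `attempt` until it completes without panicking, at most
//! // `max_attempts` times. Returns the result and the attempt number that succeeded, or `None`
//! // if every attempt panicked.
//! fn run_with_restarts<T>(
//!   max_attempts: u32,
//!   mut attempt: impl FnMut(u32) -> T,
//! ) -> Option<(T, u32)> {
//!   for n in 1..=max_attempts {
//!     // `catch_unwind` turns a panic into an `Err`, much as a supervisor observes a crash.
//!     match panic::catch_unwind(AssertUnwindSafe(|| attempt(n))) {
//!       Ok(value) => return Some((value, n)),
//!       Err(_) => continue, // "let it crash": discard the failed attempt and start over
//!     }
//!   }
//!   None
//! }
//! ```
//!
//! A closure that panics on its first two attempts and succeeds on the third would return its
//! value together with the attempt count `3`, provided `max_attempts` is at least three.
//!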
//! That said, "let it crash" is no substitute for thoughtful defensive programming within parts of
//! your codebase that cannot tolerate failure and restart. In these places, Rust's strong error
//! handling systems are critical.
//!
//! More to the point, concurrency is hard and highly concurrent systems, while valuable, often
//! become ["complex systems"](https://how.complexsystems.fail/). Spry is not a silver bullet.
//! Supervision and restarts are a good default policy for isolating partial failures. At the same
//! time, achieving robustness involves many systems, both technical and social.
//!
//! # Integration with [`tracing`](https://crates.io/crates/tracing)
//!
//! To that end, Spry ships with built-in support for the
//! [`tracing`](https://crates.io/crates/tracing) crate. This makes it easier to track processes
//! within your system and understand how they interact. Indeed, this is the only way that
//! partial failures within a Spry system are reported.

pub use prelude::*;

pub mod breaker;
pub mod builder;
mod core;
pub mod error;
mod internal;
mod nursery;
pub mod policy;
pub mod prelude;
pub mod signals;
mod system;