// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
//! Resilience and recovery mechanisms for fallible operations.
//!
//! # Quick Start
//!
//! Add resilience to fallible operations, such as RPC calls over the network, with just a few lines of code.
//! **Retry** handles transient failures and **Timeout** prevents operations from hanging indefinitely:
//!
//! ```rust
//! # #[cfg(all(feature = "retry", feature = "timeout"))]
//! # {
//! # use std::time::Duration;
//! # use tick::Clock;
//! use layered::{Execute, Service, Stack};
//! use seatbelt::retry::Retry;
//! use seatbelt::timeout::Timeout;
//! use seatbelt::{RecoveryInfo, ResilienceContext};
//!
//! # async fn main(clock: Clock) {
//! let context = ResilienceContext::new(&clock);
//! let service = (
//!     // Retry middleware: Automatically retries failed operations
//!     Retry::layer("retry", &context)
//!         .clone_input()
//!         .recovery_with(|output: &String, _| match output.as_str() {
//!             "temporary_error" => RecoveryInfo::retry(),
//!             "operation timed out" => RecoveryInfo::retry(),
//!             _ => RecoveryInfo::never(),
//!         }),
//!     // Timeout middleware: Cancels operations that take too long
//!     Timeout::layer("timeout", &context)
//!         .timeout_output(|_| "operation timed out".to_string())
//!         .timeout(Duration::from_secs(30)),
//!     // Your core business logic
//!     Execute::new(my_string_operation),
//! )
//!     .into_service();
//!
//! let result = service.execute("input data".to_string()).await;
//! # }
//! # async fn my_string_operation(input: String) -> String {
//! #     // Simulate processing that transforms the input string
//! #     format!("processed: {}", input)
//! # }
//! # }
//! ```
//!
//! # Why?
//!
//! Communicating over a network is inherently fraught with problems. The network can go down at any time,
//! sometimes for a millisecond or two. The endpoint you're connecting to may crash or be rebooted,
//! or network configuration may change from under you. To deliver a robust experience to users, and to
//! achieve five or more nines of availability, it is imperative to implement resilience patterns that
//! mask these transient failures.
//!
//! This crate provides production-ready resilience middleware with excellent telemetry for building
//! robust distributed systems that can automatically handle timeouts, retries, and other failure
//! scenarios.
//!
//! - **Production-ready** - Battle-tested middleware with sensible defaults and comprehensive
//! configuration options.
//! - **Excellent telemetry** - Built-in support for metrics and structured logging to monitor
//! resilience behavior in production.
//! - **Runtime agnostic** - Works seamlessly across any async runtime. Use the same resilience
//! patterns across different projects and migrate between runtimes without changes.
//!
//! # Overview
//!
//! This crate uses the [`layered`] crate for composing middleware. The middleware layers
//! can be stacked together using tuples and built into a service using the [`Stack`][layered::Stack] trait.
//!
//! Resilience middleware also requires [`Clock`][tick::Clock] from the [`tick`] crate for timing
//! operations like delays, timeouts, and backoff calculations. The clock is passed through
//! [`ResilienceContext`] when creating middleware layers.
//!
//! ## Core Types
//!
//! - [`ResilienceContext`] - Holds shared state for resilience middleware, including the clock.
//! - [`RecoveryInfo`] - Classifies errors as recoverable (transient) or non-recoverable (permanent).
//! - [`Recovery`] - A trait for types that can determine their recoverability.
//!
//! ## Built-in Middleware
//!
//! This crate provides built-in resilience middleware that you can use out of the box. See the documentation
//! for each module for details on how to use them.
//!
//! - [`timeout`] - Middleware that cancels long-running operations.
//! - [`retry`] - Middleware that automatically retries failed operations.
//! - [`hedging`] - Middleware that reduces tail latency via additional concurrent execution.
//! - [`breaker`] - Middleware that prevents cascading failures.
//! - [`fallback`] - Middleware that replaces invalid output with a user-defined alternative.
//!
//! ## Chaos Testing
//!
//! The [`chaos`] module provides middleware for deliberately injecting faults into a service
//! pipeline, enabling teams to verify that their systems handle failures gracefully.
//!
//! - [`chaos::injection`] - Middleware that replaces service output with a user-provided value
//! at a configurable probability.
//! - [`chaos::latency`] - Middleware that injects artificial delay before the inner service
//! call at a configurable probability.
//!
//! # Middleware Ordering
//!
//! The order in which resilience middleware is composed **matters**. Layers apply outer to inner
//! (the first layer in the tuple is outermost). A recommended ordering:
//!
//! ```text
//! Request → [Fallback → [Retry → [Breaker → [Timeout → Operation]]]]
//! ```
//!
//! - **Fallback** (outermost): guarantees a usable response even if every retry is exhausted.
//! - **Retry**: retries the entire inner stack; each attempt gets its own timeout.
//! - **Breaker**: short-circuits failing calls so retry can back off until the breaker resets.
//! - **Timeout** (innermost): bounds each individual attempt.
//!
//! Keep `Timeout` **inside** `Retry` so that a timed-out attempt is aborted and retried
//! correctly. If `Timeout` were outside, a single timeout would govern all attempts combined
//! and could cancel everything with no chance to recover.
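//!
//! The difference can be modeled without any middleware at all. The following is a minimal,
//! std-only sketch under stated assumptions, not this crate's implementation:
//! `SimulatedAttempt`, `timeout_inside_retry`, and `timeout_outside_retry` are hypothetical
//! names introduced for illustration, and backoff delays between attempts are ignored.
//!
//! ```rust
//! use std::time::Duration;
//!
//! // Hypothetical model of one attempt: how long it would run if never cancelled,
//! // and whether it would succeed if allowed to finish.
//! struct SimulatedAttempt {
//!     runs_for: Duration,
//!     succeeds: bool,
//! }
//!
//! // Timeout inside Retry: every attempt gets its own budget, so a hung attempt is
//! // cut off and the next one still runs. Returns the index of the winning attempt.
//! fn timeout_inside_retry(attempts: &[SimulatedAttempt], per_attempt: Duration) -> Option<usize> {
//!     attempts
//!         .iter()
//!         .position(|a| a.runs_for <= per_attempt && a.succeeds)
//! }
//!
//! // Timeout outside Retry: a single budget covers all attempts combined, so one
//! // hung attempt can consume it and cancel the whole retry loop.
//! fn timeout_outside_retry(attempts: &[SimulatedAttempt], total: Duration) -> Option<usize> {
//!     let mut elapsed = Duration::ZERO;
//!     for (index, attempt) in attempts.iter().enumerate() {
//!         elapsed += attempt.runs_for;
//!         if elapsed > total {
//!             // The outer timeout fires before this attempt can finish.
//!             return None;
//!         }
//!         if attempt.succeeds {
//!             return Some(index);
//!         }
//!     }
//!     None
//! }
//! ```
//!
//! Under this model, given a first attempt that hangs for 60 seconds, a second attempt that
//! succeeds after 1 second, and a 30-second budget, `timeout_inside_retry` recovers on the
//! second attempt, while `timeout_outside_retry` yields nothing because the budget is spent
//! before the retry can happen.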
//!
//! # Tower Compatibility
//!
//! All resilience middleware is compatible with the Tower ecosystem when the `tower-service`
//! feature is enabled. This allows you to use `tower::ServiceBuilder` to compose middleware stacks:
//!
//! ```rust
//! # use std::time::Duration;
//! # use tick::Clock;
//! use seatbelt::retry::Retry;
//! use seatbelt::timeout::Timeout;
//! use seatbelt::{RecoveryInfo, ResilienceContext};
//! use tower::ServiceBuilder;
//!
//! # async fn example(clock: Clock) {
//! let context: ResilienceContext<String, Result<String, String>> = ResilienceContext::new(&clock);
//!
//! let service = ServiceBuilder::new()
//!     .layer(
//!         Retry::layer("my_retry", &context)
//!             .clone_input()
//!             .recovery_with(|result: &Result<String, String>, _| match result {
//!                 Ok(_) => RecoveryInfo::never(),
//!                 Err(_) => RecoveryInfo::retry(),
//!             }),
//!     )
//!     .layer(
//!         Timeout::layer("my_timeout", &context)
//!             .timeout(Duration::from_secs(30))
//!             .timeout_error(|_| "operation timed out".to_string()),
//!     )
//!     .service_fn(|input: String| async move { Ok::<_, String>(format!("processed: {input}")) });
//! # }
//! ```
//!
//! # Examples
//!
//! Examples covering each middleware and common composition patterns:
//!
//! - [`timeout`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/timeout.rs): Basic timeout that cancels long-running operations.
//! - [`timeout_advanced`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/timeout_advanced.rs): Dynamic timeout duration and timeout callbacks.
//! - [`retry`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/retry.rs): Automatic retry with input cloning and recovery classification.
//! - [`retry_advanced`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/retry_advanced.rs): Custom input cloning with attempt metadata injection.
//! - [`retry_outage`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/retry_outage.rs): Input restoration from errors when cloning is not possible.
//! - [`breaker`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/breaker.rs): Circuit breaker that monitors failure rates.
//! - [`hedging`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/hedging.rs): Hedging slow requests with parallel attempts to reduce tail latency.
//! - [`fallback`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/fallback.rs): Substitutes default values for invalid outputs.
//! - [`resilience_pipeline`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/resilience_pipeline.rs): Composing retry and timeout with metrics.
//! - [`tower`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/tower.rs): Tower `ServiceBuilder` integration.
//! - [`config`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/config.rs): Loading settings from a [JSON file](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/config.json).
//! - [`chaos_injection`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/chaos_injection.rs): Fault injection with configurable probability.
//! - [`chaos_injection_advanced`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/chaos_injection_advanced.rs): Simulating an extended outage with dynamic injection rates.
//! - [`chaos_latency`](https://github.com/microsoft/oxidizer/blob/main/crates/seatbelt/examples/chaos_latency.rs): Injecting artificial delay with configurable probability.
//!
//! # Features
//!
//! This crate provides several optional features that can be enabled in your `Cargo.toml`:
//!
//! - **`timeout`** - Enables the [`timeout`] middleware for canceling long-running operations.
//! - **`retry`** - Enables the [`retry`] middleware for automatically retrying failed operations with
//! configurable backoff strategies, jitter, and recovery classification.
//! - **`hedging`** - Enables the [`hedging`] middleware for reducing tail latency via additional
//! concurrent requests with configurable delay modes.
//! - **`breaker`** - Enables the [`breaker`] middleware for preventing cascading failures.
//! - **`fallback`** - Enables the [`fallback`] middleware for replacing invalid output with a
//! user-defined alternative.
//! - **`chaos-injection`** - Enables the [`chaos::injection`] middleware for injecting faults
//! with a configurable probability.
//! - **`chaos-latency`** - Enables the [`chaos::latency`] middleware for injecting artificial
//! delay with a configurable probability.
//! - **`metrics`** - Exposes the OpenTelemetry metrics API for collecting and reporting metrics.
//! - **`logs`** - Enables structured logging for resilience middleware using the `tracing` crate.
//! - **`serde`** - Enables `serde::Serialize` and `serde::Deserialize` implementations for
//! configuration types.
//! - **`tower-service`** - Enables [`tower_service::Service`] trait implementations for all
//! resilience middleware.
pub use ;
pub use ResilienceContext;
pub
pub use Attempt;
pub
pub
pub type TelemetryString = Cow;