ninelives 0.2.0

Resilience primitives for async Rust: retry, circuit breaker, bulkhead, timeout, and composable stacks.
# Nine Lives 🐱

> Tower-native fractal supervision for async Rust — autonomous, self-healing Services via composable policy algebra.

<img alt="ninelives" src="https://github.com/user-attachments/assets/354f1818-c1c5-4e0a-ba1d-30382db5705f" />

**Resilience patterns for Rust with algebraic composition.**

[![Crates.io](https://img.shields.io/crates/v/ninelives.svg)](https://crates.io/crates/ninelives)
[![Documentation](https://docs.rs/ninelives/badge.svg)](https://docs.rs/ninelives)
[![License](https://img.shields.io/crates/l/ninelives.svg)](LICENSE)

Nine Lives provides battle-tested resilience patterns (retry, circuit breaker, bulkhead, timeout) as composable [tower](https://github.com/tower-rs/tower) layers with a unique algebraic composition system.

### Features

- 🔁 **Retry policies** with exponential/linear/constant backoff and jitter
- **Circuit breakers** with half-open state recovery
- 🚧 **Bulkheads** for concurrency limiting and resource isolation
- ⏱️ **Timeout policies** integrated with tokio
- 🧮 **Algebraic composition** via intuitive operators (`+`, `|`, `&`)
- 🏎️ **Fork-join** for concurrent racing (Happy Eyeballs pattern)
- 🔒 **Lock-free implementations** using atomics
- 🏗️ **Tower-native** - works with any tower `Service`
- 🌐 **Companion sinks** (OTLP, NATS, Kafka, Elastic, etcd, Prometheus, JSONL) via optional crates

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
ninelives = "0.2"
tower = "0.5"
tokio = { version = "1", features = ["full"] }
```

### Basic Usage

```rust
use ninelives::prelude::*;
use std::time::Duration;
use tower::{Service, ServiceBuilder, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Apply a timeout to any service
    let mut svc = ServiceBuilder::new()
        .layer(TimeoutLayer::new(Duration::from_secs(1))?)
        .service_fn(|req: &str| async move {
            Ok::<_, std::io::Error>(format!("Response: {}", req))
        });

    let response = svc.ready().await?.call("hello").await?;
    println!("{}", response);
    Ok(())
}
```

## Algebraic Composition - The Nine Lives Advantage

Compose resilience strategies using intuitive operators:

- **`Policy(A) + Policy(B)`** - Sequential composition: `A` wraps `B`
- **`Policy(A) | Policy(B)`** - Fallback: try `A`, fall back to `B` on error
- **`Policy(A) & Policy(B)`** - Fork-join: try both concurrently, return first success

**Precedence:** these are Rust's built-in operators, so Rust's own precedence applies: `+` > `&` > `|`. Parenthesize whenever `+` and `&` appear in the same expression.

### Example: Fallback Strategy

Try an aggressive timeout first, fall back to a longer timeout on failure:

```rust
use ninelives::prelude::*;
use std::time::Duration;
use tower::{ServiceBuilder, Layer};

let fast = Policy(TimeoutLayer::new(Duration::from_millis(100))?);
let slow = Policy(TimeoutLayer::new(Duration::from_secs(5))?);
let policy = fast | slow;

let svc = ServiceBuilder::new()
    .layer(policy)
    .service_fn(|req| async { Ok::<_, std::io::Error>(req) });
```
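
To make the fallback semantics concrete, here is a minimal std-only sketch that does not use the Nine Lives API. `with_fallback` is a hypothetical helper over plain synchronous closures: it calls the primary and only calls the secondary when the primary returns an error.

```rust
// Hypothetical helper, not part of ninelives: models `A | B` with plain closures.
fn with_fallback<Req, Res, E>(
    primary: impl Fn(Req) -> Result<Res, E>,
    secondary: impl Fn(Req) -> Result<Res, E>,
) -> impl Fn(Req) -> Result<Res, E>
where
    Req: Clone,
{
    move |req: Req| {
        // Try the primary first; the request is cloned so it can be replayed
        // against the secondary if the primary fails.
        primary(req.clone()).or_else(|_primary_err| secondary(req))
    }
}
```

The `Req: Clone` bound is the design point worth noticing: a fallback has to replay the request, so the request must be reusable.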

### Example: Fork-Join (Happy Eyeballs)

Race two strategies concurrently and return the first success:

```rust
use ninelives::prelude::*;
use std::time::Duration;
use tower::ServiceBuilder;

// Create two timeout policies with different durations
let ipv4 = Policy(TimeoutLayer::new(Duration::from_millis(100))?);
let ipv6 = Policy(TimeoutLayer::new(Duration::from_millis(150))?);

// Race them concurrently - first success wins
let policy = ipv4 & ipv6;

let svc = ServiceBuilder::new()
    .layer(policy)
    .service_fn(|req| async { Ok::<_, std::io::Error>(req) });
```
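
The "first success wins" rule can be sketched without any async runtime. The version below is an illustration only, using OS threads and a channel rather than the crate's tower-based implementation; `race` and `boxed` are hypothetical names. It returns the first `Ok` to arrive and yields an error only when every attempt has failed.

```rust
use std::sync::mpsc;
use std::thread;

// A boxed, sendable attempt. Hypothetical alias for this sketch only.
type Attempt<T, E> = Box<dyn FnOnce() -> Result<T, E> + Send>;

// Convenience wrapper so callers do not have to spell out the boxed type.
fn boxed<T, E>(f: impl FnOnce() -> Result<T, E> + Send + 'static) -> Attempt<T, E> {
    Box::new(f)
}

// Runs all attempts concurrently and returns the first success.
// Panics if `attempts` is empty, because there is no error to report.
fn race<T, E>(attempts: Vec<Attempt<T, E>>) -> Result<T, E>
where
    T: Send + 'static,
    E: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let total = attempts.len();
    for attempt in attempts {
        let tx = tx.clone();
        // Each attempt reports its result; a closed receiver is ignored.
        thread::spawn(move || {
            let _ = tx.send(attempt());
        });
    }
    drop(tx);

    let mut last_err = None;
    for result in rx.iter().take(total) {
        match result {
            Ok(value) => return Ok(value), // first success wins
            Err(e) => last_err = Some(e),  // remember and keep waiting
        }
    }
    Err(last_err.expect("race needs at least one attempt"))
}
```

Losing attempts are not cancelled in this sketch; they simply finish in the background and their results are discarded.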

### Example: Multi-Tier Resilience

Combine multiple strategies with automatic precedence:

```rust
use ninelives::prelude::*;
use std::time::Duration;

// Aggressive: just a fast timeout
let aggressive = Policy(TimeoutLayer::new(Duration::from_millis(50))?);

// Defensive: nested timeouts for retries
let defensive = Policy(TimeoutLayer::new(Duration::from_secs(10))?)
              + Policy(TimeoutLayer::new(Duration::from_secs(5))?);

// Try aggressive first, fall back to defensive
let policy = aggressive | defensive;
// Parsed as: Policy(Timeout50ms) | (Policy(Timeout10s) + Policy(Timeout5s))
```

### Example: Circuit Breaker with Retry

```rust
use ninelives::prelude::*;
use std::time::Duration;

// Build a retry policy with exponential backoff
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_jitter(Jitter::full())
    .build()?;

// Configure circuit breaker
let circuit_breaker = CircuitBreakerLayer::new(
    CircuitBreakerConfig::default()
        .failure_threshold(5)
        .timeout_duration(Duration::from_secs(10))
)?;

// Compose: circuit breaker wraps retry
let policy = Policy(circuit_breaker) + Policy(retry.into_layer());
```

## Telemetry Sink Ladder

- **Baby mode:** `MemorySink::with_capacity(1_000)` for local inspection.
- **Intermediate:** `NonBlockingSink(LogSink)` to keep request paths non-blocking while logging.
- **Advanced:** `NonBlockingSink(OtlpSink)` + `StreamingSink` fan-out for in-cluster consumers.
- **GOD MODE:** `StreamingSink` → NATS/Kafka/Elastic via companion crates, with Observer + Sentinel auto-tuning when drop/evict metrics spike.
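
As a rough picture of what a capacity-bounded in-memory sink involves, the sketch below keeps the newest events in a ring and counts evictions. `BoundedSink` and its fields are hypothetical and are not the crate's `MemorySink`; the eviction policy (drop the oldest) is an assumption made for illustration.

```rust
use std::collections::VecDeque;

// Hypothetical bounded sink: keeps at most `capacity` events, newest last.
struct BoundedSink<T> {
    events: VecDeque<T>,
    capacity: usize,
    evicted: u64, // how many events were pushed out or dropped
}

impl<T> BoundedSink<T> {
    fn with_capacity(capacity: usize) -> Self {
        Self { events: VecDeque::with_capacity(capacity), capacity, evicted: 0 }
    }

    fn record(&mut self, event: T) {
        if self.capacity == 0 {
            // Nothing can be stored; count the event as dropped.
            self.evicted += 1;
            return;
        }
        if self.events.len() == self.capacity {
            // Full: evict the oldest event to make room.
            self.events.pop_front();
            self.evicted += 1;
        }
        self.events.push_back(event);
    }
}
```

A counter like `evicted` is the kind of signal the higher rungs of the ladder would watch for spikes.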

See recipes in `src/cookbook.rs` and companion cookbooks:
- `ninelives-otlp/README.md`
- `ninelives-nats/README.md`
- `ninelives-kafka/README.md`
- `ninelives-elastic/README.md`
- `ninelives-etcd/README.md`
- `ninelives-prometheus/README.md`
- `ninelives-jsonl/README.md`

## Cookbook (pick your recipe)

- **Simple retry:** `retry_fast` — 3 attempts, 50ms exp backoff + jitter.
- **Latency guard:** `timeout_p95` — 300ms budget.
- **Bulkhead:** `bulkhead_isolate(max)` — protect shared deps.
- **API guardrail (intermediate):** `api_guardrail` — timeout + breaker + bulkhead.
- **Reliable read (advanced):** `reliable_read` — fast path then fallback stack.
- **Hedged read (tricky):** `hedged_read` — fork-join two differently-tuned stacks.
- **Hedge + fallback (god tier):** `hedged_then_fallback` — race two fast paths, then fall back to a sturdy stack.
- **Sensible defaults:** `sensible_defaults` — timeout + retry + bulkhead starter pack.

These recipes previously lived in `src/cookbook.rs` and have moved to the `ninelives-cookbook` crate (see its README/examples).

## Tower Integration

Nine Lives layers work seamlessly with tower's `ServiceBuilder`:

```rust
use ninelives::prelude::*;
use tower::ServiceBuilder;
use std::time::Duration;

let service = ServiceBuilder::new()
    .layer(TimeoutLayer::new(Duration::from_secs(30))?)
    .layer(CircuitBreakerLayer::new(CircuitBreakerConfig::default())?)
    .layer(BulkheadLayer::new(10)?)
    .service(my_inner_service);
```

Or use the algebraic syntax:

```rust
let policy = Policy(TimeoutLayer::new(Duration::from_secs(30))?)
           + Policy(CircuitBreakerLayer::new(CircuitBreakerConfig::default())?)
           + Policy(BulkheadLayer::new(10)?);

let service = ServiceBuilder::new()
    .layer(policy)
    .service(my_inner_service);
```

## Available Layers

### TimeoutLayer

Enforces time limits on operations:

```rust
use ninelives::prelude::*;
use std::time::Duration;

let timeout = TimeoutLayer::new(Duration::from_secs(5))?;
```
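
For readers who want to see the idea in isolation, here is a blocking, std-only analogue of a time limit. It is a sketch, not how `TimeoutLayer` works (the crate integrates with tokio); `with_timeout` and `CallError` are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CallError {
    TimedOut,   // no result arrived within the limit
    WorkerDied, // the operation panicked before sending a result
}

// Runs `op` on a worker thread and waits at most `limit` for its result.
fn with_timeout<T>(
    limit: Duration,
    op: impl FnOnce() -> T + Send + 'static,
) -> Result<T, CallError>
where
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).map_err(|e| match e {
        RecvTimeoutError::Timeout => CallError::TimedOut,
        RecvTimeoutError::Disconnected => CallError::WorkerDied,
    })
}
```

Note that the worker thread keeps running after a timeout in this sketch; only the caller stops waiting.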

### RetryLayer

Retries failed operations with configurable backoff and jitter:

```rust
use ninelives::prelude::*;
use std::time::Duration;

let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_jitter(Jitter::full())
    .build()?
    .into_layer();
```

**Backoff strategies:**
- `Backoff::constant(duration)` - Fixed delay
- `Backoff::linear(base)` - Linear increase: `base * attempt`
- `Backoff::exponential(base)` - Exponential: `base * 2^attempt`

**Jitter strategies:**
- `Jitter::none()` - No jitter
- `Jitter::full()` - Random [0, delay]
- `Jitter::equal()` - delay/2 + random [0, delay/2]
- `Jitter::decorrelated()` - AWS-style stateful jitter
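
The formulas above can be written out with `std::time::Duration` alone. This is a sketch under stated assumptions, not the crate's code: the `cap` parameter, the zero-based `attempt`, and passing the random value in as `unit` (a finite number in `[0, 1]`) are all choices made here to keep the example deterministic and overflow-safe.

```rust
use std::time::Duration;

// Fixed delay, clamped to `cap`.
fn constant(delay: Duration, _attempt: u32, cap: Duration) -> Duration {
    delay.min(cap)
}

// base * attempt, saturating at `cap` on overflow.
fn linear(base: Duration, attempt: u32, cap: Duration) -> Duration {
    base.checked_mul(attempt).map_or(cap, |d| d.min(cap))
}

// base * 2^attempt, saturating at `cap` on overflow.
fn exponential(base: Duration, attempt: u32, cap: Duration) -> Duration {
    2u32.checked_pow(attempt)
        .and_then(|factor| base.checked_mul(factor))
        .map_or(cap, |d| d.min(cap))
}

// Full jitter: a point in [0, delay], selected by `unit`.
fn full_jitter(delay: Duration, unit: f64) -> Duration {
    delay.mul_f64(unit.clamp(0.0, 1.0))
}

// Equal jitter: delay/2 plus a point in [0, delay/2].
fn equal_jitter(delay: Duration, unit: f64) -> Duration {
    let half = delay / 2;
    half + half.mul_f64(unit.clamp(0.0, 1.0))
}
```

With a 100 ms base, `exponential` gives 100 ms, 200 ms, 400 ms, 800 ms for attempts 0 through 3 under this zero-based convention.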

### CircuitBreakerLayer

Prevents cascading failures with three-state management (Closed/Open/HalfOpen):

```rust
use ninelives::prelude::*;
use std::time::Duration;

let circuit_breaker = CircuitBreakerLayer::new(
    CircuitBreakerConfig::default()
        .failure_threshold(5)        // Open after 5 failures
        .timeout_duration(Duration::from_secs(10))  // Stay open for 10s
        .half_open_max_calls(3)      // Allow 3 test calls in half-open
)?;
```
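
The three states can be modeled with a few atomics. The following is a simplified illustration, not the crate's implementation: `Breaker` is a hypothetical type, time is passed in as `now_ms` to keep it deterministic, and the half-open call limit (`half_open_max_calls`) is left out for brevity.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8, Ordering};

const CLOSED: u8 = 0;
const OPEN: u8 = 1;
const HALF_OPEN: u8 = 2;

struct Breaker {
    state: AtomicU8,
    failures: AtomicU32,
    opened_at_ms: AtomicU64,
    failure_threshold: u32,
    open_ms: u64,
}

impl Breaker {
    fn new(failure_threshold: u32, open_ms: u64) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            failures: AtomicU32::new(0),
            opened_at_ms: AtomicU64::new(0),
            failure_threshold,
            open_ms,
        }
    }

    // Returns true if a call may proceed at time `now_ms`.
    fn allow(&self, now_ms: u64) -> bool {
        match self.state.load(Ordering::Acquire) {
            CLOSED | HALF_OPEN => true,
            _ => {
                let opened = self.opened_at_ms.load(Ordering::Acquire);
                if now_ms.saturating_sub(opened) >= self.open_ms {
                    // Open period elapsed: move to HalfOpen and let a probe through.
                    let _ = self.state.compare_exchange(
                        OPEN, HALF_OPEN, Ordering::AcqRel, Ordering::Acquire);
                    true
                } else {
                    false
                }
            }
        }
    }

    // A success closes the circuit and clears the failure count.
    fn on_success(&self) {
        self.failures.store(0, Ordering::Release);
        self.state.store(CLOSED, Ordering::Release);
    }

    // A failure in HalfOpen, or reaching the threshold, opens the circuit.
    fn on_failure(&self, now_ms: u64) {
        let count = self.failures.fetch_add(1, Ordering::AcqRel) + 1;
        let state = self.state.load(Ordering::Acquire);
        if state == HALF_OPEN || count >= self.failure_threshold {
            self.opened_at_ms.store(now_ms, Ordering::Release);
            self.state.store(OPEN, Ordering::Release);
        }
    }
}
```

A production breaker would also bound the number of concurrent probes in the half-open state; this sketch lets every caller through once the state has flipped.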

### BulkheadLayer

Limits concurrent requests for resource isolation:

```rust
use ninelives::prelude::*;

let bulkhead = BulkheadLayer::new(10)?;  // Max 10 concurrent requests
```
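
Concurrency limiting of this kind is often built on an atomic counter plus a guard that releases the slot on drop. The sketch below shows that pattern with std only; `Bulkhead` and `Permit` here are hypothetical types and make no claim about how `BulkheadLayer` is implemented.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Bulkhead {
    in_flight: AtomicUsize,
    max: usize,
}

// Holding a Permit means one slot is occupied; dropping it frees the slot.
struct Permit {
    bulkhead: Arc<Bulkhead>,
}

impl Bulkhead {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), max })
    }

    // Non-blocking: returns None immediately when all slots are taken.
    fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Relaxed);
        loop {
            if current >= self.max {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                current, current + 1, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => return Some(Permit { bulkhead: Arc::clone(self) }),
                Err(actual) => current = actual, // lost a race; retry with fresh value
            }
        }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.bulkhead.in_flight.fetch_sub(1, Ordering::Release);
    }
}
```

Tying release to `Drop` means a slot is returned even when the guarded call exits early with an error.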

## Error Handling

All resilience errors are unified under `ResilienceError<E>`:

```rust
use ninelives::ResilienceError;

match service.call(request).await {
    Ok(response) => { /* success */ },
    Err(ResilienceError::Timeout { .. }) => { /* timeout */ },
    Err(ResilienceError::CircuitOpen { .. }) => { /* circuit breaker open */ },
    Err(ResilienceError::RetryExhausted { failures, .. }) => {
        // All retry attempts failed
        eprintln!("Failed after {} attempts", failures.len());
    },
    Err(ResilienceError::Bulkhead { .. }) => { /* capacity exhausted */ },
    Err(ResilienceError::Inner(e)) => { /* inner service error */ },
}
```

## Operator Precedence

When combining operators, understand the precedence rules:

```rust
// Rust's built-in precedence applies: + binds tighter than &, and & binds tighter than |
A | B + C & D   // Parsed as: A | ((B + C) & D)

// Use parentheses for explicit control
(A | B) + C     // C wraps the fallback between A and B
```
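
One way to check how a mixed expression groups is to overload the same three operators on a small type that records the resulting tree. The sketch below does that with std only; `Expr` and `leaf` are hypothetical stand-ins and are not the crate's `Policy`. Because operator overloading cannot change precedence, the grouping shown is whatever the Rust language prescribes.

```rust
use std::fmt;
use std::ops::{Add, BitAnd, BitOr};

// Hypothetical stand-in for a policy: it only records how operators grouped it.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Leaf(&'static str),
    Wrap(Box<Expr>, Box<Expr>),     // a + b
    Fallback(Box<Expr>, Box<Expr>), // a | b
    Race(Box<Expr>, Box<Expr>),     // a & b
}

fn leaf(name: &'static str) -> Expr {
    Expr::Leaf(name)
}

impl Add for Expr {
    type Output = Expr;
    fn add(self, rhs: Expr) -> Expr {
        Expr::Wrap(Box::new(self), Box::new(rhs))
    }
}

impl BitOr for Expr {
    type Output = Expr;
    fn bitor(self, rhs: Expr) -> Expr {
        Expr::Fallback(Box::new(self), Box::new(rhs))
    }
}

impl BitAnd for Expr {
    type Output = Expr;
    fn bitand(self, rhs: Expr) -> Expr {
        Expr::Race(Box::new(self), Box::new(rhs))
    }
}

// Prints the tree fully parenthesized so the grouping is visible.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Leaf(name) => write!(f, "{name}"),
            Expr::Wrap(a, b) => write!(f, "({a} + {b})"),
            Expr::Fallback(a, b) => write!(f, "({a} | {b})"),
            Expr::Race(a, b) => write!(f, "({a} & {b})"),
        }
    }
}

// Returns the grouping the compiler chose for an unparenthesized expression.
fn grouping_demo() -> String {
    (leaf("A") | leaf("B") + leaf("C") & leaf("D")).to_string()
}
```

Printing an expression this way before wiring up real layers is a cheap guard against a stack that nests differently than intended.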

**Examples:**

```rust
// Try fast, fallback to slow with retry
let policy = fast | retry + slow;
// Equivalent to: fast | (retry + slow)

// Retry wraps a fallback
let policy = retry + (fast | slow);

// Happy Eyeballs: race IPv4 and IPv6
let policy = ipv4 & ipv6;
// Both called concurrently, first success wins

// Complex composition
let policy = aggressive | defensive + (ipv4 & ipv6);
// Try aggressive, fallback to defensive wrapping parallel attempts
```

## Testability

Nine Lives is designed for testing with dependency injection:

```rust
use ninelives::prelude::*;
use std::time::Duration;

// Use InstantSleeper for tests (no actual delays)
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .backoff(Backoff::exponential(Duration::from_millis(100)))
    .with_sleeper(InstantSleeper)
    .build()?;

// TrackingSleeper records sleep durations for assertions
let tracker = TrackingSleeper::new();
let retry = RetryPolicy::builder()
    .max_attempts(3)
    .with_sleeper(tracker.clone())
    .build()?;

// ... exercise retry ...

let sleeps = tracker.get_sleeps();
assert_eq!(sleeps.len(), 2); // Slept twice before success
```
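
The injection pattern itself is small enough to show in full. Below is a synchronous, std-only sketch in which the retry loop takes its sleeper as a parameter; the `Sleeper` trait, this `TrackingSleeper`, and `retry` are hypothetical reimplementations for illustration and do not mirror the crate's actual signatures.

```rust
use std::cell::RefCell;
use std::time::Duration;

// Anything that can "wait"; tests swap in a recorder instead of real sleeping.
trait Sleeper {
    fn sleep(&self, duration: Duration);
}

// Records requested sleeps without blocking.
#[derive(Default)]
struct TrackingSleeper {
    sleeps: RefCell<Vec<Duration>>,
}

impl Sleeper for TrackingSleeper {
    fn sleep(&self, duration: Duration) {
        self.sleeps.borrow_mut().push(duration);
    }
}

// Calls `op` up to `max_attempts` times with exponential backoff between tries.
// On exhaustion, returns every error that was observed.
fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    sleeper: &impl Sleeper,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, Vec<E>> {
    let mut failures = Vec::new();
    for attempt in 0..max_attempts {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) => failures.push(e),
        }
        // Sleep only if another attempt will follow.
        if attempt + 1 < max_attempts {
            sleeper.sleep(base.saturating_mul(2u32.saturating_pow(attempt)));
        }
    }
    Err(failures)
}
```

With three attempts and a success on the third, the tracker records exactly two sleeps, which matches the assertion in the example above.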

## Roadmap (snapshot)

Nine Lives is marching toward autonomous, fractal resilience. Current focus:

- ✅ Phase 0–1: Tower-native algebra + telemetry sinks (done)
- 🚧 Phase 2: Control plane & adaptive configs (in progress)
- 🧭 Phase 3: Observer for aggregated state (planned)
- 🔮 Phase 5: Sentinel meta-policies + shadow eval (planned)

Full detail and milestones live in [ROADMAP.md](ROADMAP.md).

## Performance

Nine Lives is built for production:

- **Lock-free** circuit breaker state transitions using atomics
- **Zero-allocation** backoff/jitter calculations with overflow protection
- **Minimal overhead** - resilience layers add < 1% latency in common cases

Benchmarks coming soon.

## Comparison to Other Libraries

| Feature | Nine Lives | Resilience4j (Java) | Polly (C#) | tower |
|---------|-----------|---------------------|-----------|-------|
| Uniform Service Abstraction | ✅ | | | |
| Algebraic Composition (`+`, `\|`, `&`) | ✅ | | | |
| Fork-Join (Happy Eyeballs) | ✅ | | | |
| Tower Integration | ✅ Native | N/A | N/A | ✅ Native |
| Lock-Free Implementations | ✅ | Partial | Partial | Varies |
| Retry with Backoff/Jitter | ✅ | | | |
| Circuit Breaker | ✅ | | | |
| Bulkhead | ✅ | | | |
| Timeout | ✅ | | | |

**Nine Lives' unique advantage:** Algebraic composition with fork-join support lets you express complex resilience strategies declaratively, including concurrent racing patterns like Happy Eyeballs, without nested builders or imperative code.

## Examples

See the [`ninelives-cookbook/examples`](ninelives-cookbook/examples/) directory for runnable examples:

- `retry_only.rs` - Focused retry with backoff, jitter, and `should_retry`
- `bulkhead_concurrency.rs` - Non-blocking bulkhead behavior under contention
- `timeout_fallback.rs` - Timeout with fallback policy
- `decorrelated_jitter.rs` - AWS-style decorrelated jitter
- `algebra_composition.rs` - Algebraic composition patterns
- `telemetry_basic.rs` / `telemetry_composition.rs` - Attaching sinks and composing telemetry

Run with:

```bash
cargo run -p ninelives-cookbook --example timeout_fallback
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed under the Apache License, Version 2.0 (see License below), without any additional terms or conditions.

## License

Apache License, Version 2.0 ([LICENSE](LICENSE) or <http://www.apache.org/licenses/LICENSE-2.0>)

_© 2025 • James Ross • [📧](mailto:james@flyingrobots.dev) • [🔗 FLYING•ROBOTS](https://github.com/flyingrobots)_