reliability-toolkit
Async reliability primitives for Tokio-based Rust services: token-bucket rate limiter, 3-state circuit breaker, exponential backoff with full jitter, and a semaphore-backed bulkhead. Small, composable, no surprises.
use std::time::Duration;
use ;
let limiter = new; // 100 rps, burst 100
let breaker = builder
.failure_threshold
.cool_down
.build;
let retry: = new;
let pool = new;
let result = retry.run.await;
Why another reliability crate
Most of the ecosystem ships these primitives as separate crates with subtly incompatible APIs. This crate's design rules:
- One async runtime assumption: Tokio. Everything is async fn. No "executor-agnostic" weasel words that mean "doesn't compose with anything you'd actually use."
- No rand dep. Jitter uses an inline xorshift PRNG. That's one less surface for advisory updates.
- Cheap clones. Every primitive is Arc-shaped under the hood, so you can hand it across tasks without lifetime gymnastics.
- Compose by stacking calls, not by trait gymnastics. No middleware traits, no tower::Service opt-in. Those are great, but they push complexity into the wrong place when all you want is "retry around this expensive thing."
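As a rough illustration of the "Arc-shaped" rule above, here is a minimal sketch of a cheaply clonable handle. SharedCounter and hammer are hypothetical names invented for this example, not part of the crate, and plain OS threads stand in for Tokio tasks so the snippet needs nothing beyond std.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical handle type: cloning copies a pointer, not the state.
#[derive(Clone)]
pub struct SharedCounter {
    inner: Arc<Mutex<u64>>,
}

impl SharedCounter {
    pub fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(0)) }
    }

    /// Increment the shared value by one.
    pub fn hit(&self) {
        *self.inner.lock().unwrap() += 1;
    }

    /// Read the current shared value.
    pub fn total(&self) -> u64 {
        *self.inner.lock().unwrap()
    }
}

/// Hand a clone to each worker; every clone points at the same state.
pub fn hammer(counter: &SharedCounter, workers: usize, hits_each: usize) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let c = counter.clone();
            thread::spawn(move || {
                for _ in 0..hits_each {
                    c.hit();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

Because each clone owns its own Arc, the workers need no borrowed lifetimes, which is the property the design rule is after.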
Primitives
RateLimiter — token bucket
let limiter = new;
limiter.acquire().await; // waits until a token is free
limiter.acquire_n(5).await; // five at once
limiter.try_acquire().await; // returns bool, no waiting
- O(1) per call (Mutex over a small State)
- Burst is enforced as a hard ceiling
- Refill is computed lazily on each call — no background timer
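The lazy-refill idea can be sketched with std alone. This is not the crate's implementation: TokenBucket, try_acquire_n and wait_hint as written here are hypothetical, the caller passes the current time explicitly so the arithmetic is easy to test, and a real async limiter would sleep for the hinted duration instead of returning false.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket: no background timer, tokens are topped up on demand.
pub struct TokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `burst` calls is admitted.
    pub fn new(rate_per_sec: f64, burst: u32, now: Instant) -> Self {
        let burst = f64::from(burst);
        Self { rate_per_sec, burst, tokens: burst, last_refill: now }
    }

    /// Credit the tokens earned since the last call, capped at `burst`.
    fn refill(&mut self, now: Instant) {
        if now <= self.last_refill {
            return; // clock did not advance; nothing to credit
        }
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.burst);
        self.last_refill = now;
    }

    /// Take `n` tokens if available; O(1), never waits.
    pub fn try_acquire_n(&mut self, n: u32, now: Instant) -> bool {
        self.refill(now);
        let need = f64::from(n);
        if self.tokens >= need {
            self.tokens -= need;
            true
        } else {
            false
        }
    }

    /// How long a waiting caller would have to sleep before `n` tokens exist.
    pub fn wait_hint(&self, n: u32) -> Duration {
        let deficit = (f64::from(n) - self.tokens).max(0.0);
        if self.rate_per_sec <= 0.0 {
            return Duration::MAX;
        }
        Duration::from_secs_f64(deficit / self.rate_per_sec)
    }
}
```

Note that a request for more than `burst` tokens can never succeed in this sketch, which is one way to read "hard ceiling".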
CircuitBreaker — Closed → Open → HalfOpen
let cb = CircuitBreaker::builder()
    .failure_threshold(5) // 5 consecutive failures trips it
    .cool_down(Duration::from_secs(30)) // then stays open for 30s
    .half_open_max_calls(1) // admit one trial call before reclosing
    .build();
match cb.call.await
- Failure counting is on consecutive failures inside Closed; a single success resets it
- HalfOpen admits up to half_open_max_calls and waits for all of them to succeed before reclosing; one failure flips back to Open
- trip() and reset() are exposed for kill switches
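The transition rules listed above can be written down as a small synchronous state machine. This is a sketch under assumptions, not the crate's code: BreakerSketch, BreakerState, allow, on_success and on_failure are hypothetical names, the clock is passed in by the caller, and trip takes a timestamp here even though the crate's trip() is shown without arguments.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker core; a real one would sit behind a lock and wrap calls.
pub struct BreakerSketch {
    failure_threshold: u32,
    cool_down: Duration,
    half_open_max_calls: u32,
    state: BreakerState,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
    trials_admitted: u32,
    trials_succeeded: u32,
}

impl BreakerSketch {
    pub fn new(failure_threshold: u32, cool_down: Duration, half_open_max_calls: u32) -> Self {
        Self {
            failure_threshold,
            cool_down,
            half_open_max_calls,
            state: BreakerState::Closed,
            consecutive_failures: 0,
            opened_at: None,
            trials_admitted: 0,
            trials_succeeded: 0,
        }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }

    /// May a call proceed at `now`? Moves Open to HalfOpen once the cool-down has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == BreakerState::Open {
            let cooled = self
                .opened_at
                .map_or(true, |t| now.saturating_duration_since(t) >= self.cool_down);
            if cooled {
                self.state = BreakerState::HalfOpen;
                self.trials_admitted = 0;
                self.trials_succeeded = 0;
            }
        }
        match self.state {
            BreakerState::Closed => true,
            BreakerState::Open => false,
            BreakerState::HalfOpen => {
                if self.trials_admitted < self.half_open_max_calls {
                    self.trials_admitted += 1;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// A success in Closed clears the streak; in HalfOpen it counts toward reclosing.
    pub fn on_success(&mut self) {
        match self.state {
            BreakerState::Closed => self.consecutive_failures = 0,
            BreakerState::HalfOpen => {
                self.trials_succeeded += 1;
                if self.trials_succeeded >= self.half_open_max_calls {
                    self.reset();
                }
            }
            BreakerState::Open => {}
        }
    }

    /// A failure in Closed extends the streak; any failure in HalfOpen reopens.
    pub fn on_failure(&mut self, now: Instant) {
        match self.state {
            BreakerState::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
            BreakerState::HalfOpen => self.trip(now),
            BreakerState::Open => {}
        }
    }

    /// Kill switch: force Open starting at `now`.
    pub fn trip(&mut self, now: Instant) {
        self.state = BreakerState::Open;
        self.opened_at = Some(now);
        self.consecutive_failures = 0;
    }

    /// Kill switch: force Closed and clear counters.
    pub fn reset(&mut self) {
        self.state = BreakerState::Closed;
        self.opened_at = None;
        self.consecutive_failures = 0;
    }
}
```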
Retry — exponential backoff with full jitter
let retry: Retry =
new;
let result = retry.run.await;
- Backoff is min(max_delay, base * 2^(attempt - 1)) with full jitter applied
- retry_if is a &E -> bool predicate; absent means "retry all errors"
- The closure is FnMut() -> Future (not a single future), so the next attempt gets a fresh future
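To make the backoff formula above concrete, here is a std-only sketch of the cap computation plus full jitter driven by a xorshift generator. The names XorShift64, backoff_cap and full_jitter are hypothetical, the 13/7/17 shift triple is the common xorshift64 variant and may differ from what the crate inlines, and "full jitter" is taken in its usual sense of a uniform draw between zero and the cap.

```rust
use std::time::Duration;

/// Hypothetical inline PRNG: xorshift64 with the 13/7/17 shift triple.
pub struct XorShift64(u64);

impl XorShift64 {
    /// A zero state would stay zero forever, so it is swapped for a fixed constant.
    pub fn new(seed: u64) -> Self {
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// min(max_delay, base * 2^(attempt - 1)), with `attempt` counted from 1.
pub fn backoff_cap(base: Duration, max_delay: Duration, attempt: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow a u32.
    let exp = attempt.saturating_sub(1).min(31);
    let factor = 1u32 << exp;
    // If the multiplication overflows Duration, the cap is max_delay anyway.
    base.checked_mul(factor).map_or(max_delay, |d| d.min(max_delay))
}

/// Full jitter: a draw from [0, cap]. The slight modulo bias is ignored in this sketch.
pub fn full_jitter(cap: Duration, rng: &mut XorShift64) -> Duration {
    let cap_nanos = cap.as_nanos().min(u128::from(u64::MAX)) as u64;
    if cap_nanos == 0 {
        return Duration::ZERO;
    }
    Duration::from_nanos(rng.next_u64() % cap_nanos.saturating_add(1))
}

/// Delay to sleep before retrying after the given failed attempt.
pub fn delay_for(base: Duration, max_delay: Duration, attempt: u32, rng: &mut XorShift64) -> Duration {
    full_jitter(backoff_cap(base, max_delay, attempt), rng)
}
```

With base = 100 ms and max_delay = 5 s, the caps run 100 ms, 200 ms, 400 ms and so on until they hit 5 s, and each actual sleep is drawn somewhere below the cap so that concurrent clients do not retry in lockstep.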
Bulkhead — concurrency cap
let pool = Bulkhead::new(20);
let _permit = pool.acquire().await?;
// ... up to 20 of these can be in flight; further callers wait
- Backed by tokio::sync::Semaphore
- try_acquire() for non-blocking attempts
- close() to drain on shutdown
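The permit semantics can be sketched without Tokio. The crate is backed by tokio::sync::Semaphore; the version below swaps in an atomic counter purely for illustration, covers only the non-blocking path, and uses hypothetical names (BulkheadSketch, Permit, available). Dropping a Permit returns its slot, and close() here simply refuses new permits while existing ones finish.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

struct Inner {
    available: AtomicUsize,
    closed: AtomicBool,
}

/// Hypothetical std-only stand-in for a semaphore-backed bulkhead.
#[derive(Clone)]
pub struct BulkheadSketch {
    inner: Arc<Inner>,
}

/// RAII guard: the slot is released when the permit is dropped.
pub struct Permit {
    inner: Arc<Inner>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.inner.available.fetch_add(1, Ordering::Release);
    }
}

impl BulkheadSketch {
    pub fn new(max_concurrent: usize) -> Self {
        Self {
            inner: Arc::new(Inner {
                available: AtomicUsize::new(max_concurrent),
                closed: AtomicBool::new(false),
            }),
        }
    }

    /// Non-blocking attempt: None if the bulkhead is full or closed.
    pub fn try_acquire(&self) -> Option<Permit> {
        if self.inner.closed.load(Ordering::Acquire) {
            return None;
        }
        self.inner
            .available
            // Decrement only while at least one slot is free.
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .ok()
            .map(|_| Permit { inner: Arc::clone(&self.inner) })
    }

    /// Stop admitting new work; permits already handed out stay valid.
    pub fn close(&self) {
        self.inner.closed.store(true, Ordering::Release);
    }

    /// Number of free slots right now.
    pub fn available(&self) -> usize {
        self.inner.available.load(Ordering::Acquire)
    }
}
```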
Composition
The layering you usually want is rate-limit → bulkhead → circuit-breaker → retry, with retry outermost. Because retry wraps the other layers, every retried attempt passes back through the rate limiter, so a transient failure inside breaker.call() doesn't bypass the budget the rate limiter is enforcing. See tests/composition.rs and examples/compose.rs.
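The effect of putting retry outermost can be shown with a synchronous toy. Nothing here is the crate's API: retry_sync and run_with_budget are hypothetical helpers, a plain integer stands in for the rate limiter's token budget, and there is no backoff or breaker. The point is only that each attempt, including retries, pays for a token before touching the wrapped operation.

```rust
/// Hypothetical retry loop: calls `op` until it succeeds or attempts run out.
pub fn retry_sync<T, E>(max_attempts: u32, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}

/// Retry wraps the budget check, so every attempt spends one token.
/// Returns the number of real calls made when the operation finally succeeds.
pub fn run_with_budget(
    tokens: &mut u32,
    failures_before_success: u32,
    max_attempts: u32,
) -> Result<u32, &'static str> {
    let mut calls = 0;
    retry_sync(max_attempts, || {
        // Rate-limit layer: inside the retry loop, so retries are charged too.
        if *tokens == 0 {
            return Err("rate limited");
        }
        *tokens -= 1;
        // The wrapped operation: fails a fixed number of times, then succeeds.
        calls += 1;
        if calls <= failures_before_success {
            Err("transient failure")
        } else {
            Ok(calls)
        }
    })
}
```

Starting with five tokens and an operation that fails twice before succeeding, the call succeeds on its third attempt and leaves two tokens, so the two retries were charged against the same budget as the first attempt.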
Run the example:
Benchmarks
Single-threaded hot-bucket throughput on an M-class workstation typically runs in the tens of millions of ops/sec for try_acquire(). The numbers exist mostly to catch regressions in the token math — the goal is "negligible vs. the call you're wrapping," and it is.
Tests
CI runs the test matrix on stable, beta, and 1.85.0 (MSRV).
Related work in this ecosystem
This is part of the Platform Reliability Stack — small, focused libraries that compose into a production reliability story:
- slo-budget-tracker — Python SLO + error-budget library + Prometheus exporter.
- procurement-decision-api — drafts AI Procurement Decision Cards from vendor Suite documents.
- More at kineticgain.com.
License
MIT. See LICENSE.