tower-resilience
A comprehensive resilience and fault-tolerance toolkit for Tower services, inspired by Resilience4j.
Resilience Patterns
- Circuit Breaker - Prevents cascading failures by stopping calls to failing services
- Bulkhead - Isolates resources to prevent system-wide failures
- Time Limiter - Advanced timeout handling with cancellation support
- Retry - Intelligent retry with exponential backoff, jitter, and retry budgets
- Rate Limiter - Controls request rate with fixed or sliding window algorithms
- Cache - Response memoization to reduce load
- Fallback - Graceful degradation when services fail
- Hedge - Reduces tail latency by racing redundant requests
- Reconnect - Automatic reconnection with configurable backoff strategies
- Health Check - Proactive health monitoring with intelligent resource selection
- Executor - Delegates request processing to dedicated executors for parallelism
- Adaptive Concurrency - Dynamic concurrency limiting using AIMD or Vegas algorithms
- Coalesce - Deduplicates concurrent identical requests (singleflight pattern)
- Chaos - Inject failures and latency for testing resilience (development/testing only)
Quick Start
```toml
[dependencies]
tower-resilience = "0.1"
tower = "0.5"
```
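```rust
use tower::ServiceBuilder;
use ...::*;

let circuit_breaker = ...::builder()
    .failure_rate_threshold(...)
    .build();

let service = ServiceBuilder::new()
    .layer(circuit_breaker.for_request::<...>())
    .layer(...)
    .service(...);
```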
Note: Use `for_request::<T>()` with the request type `T` that your service handles, so the circuit breaker can plug into `ServiceBuilder`. The `layer.layer(service)` method still works when you need direct control over the service value.
Examples
Circuit Breaker
Prevent cascading failures by opening the circuit when the error rate exceeds a threshold:
```rust
use ...::CircuitBreakerLayer;
use std::time::Duration;

let layer = CircuitBreakerLayer::builder()
    .name(...)
    .failure_rate_threshold(0.5)                    // Open at 50% failure rate
    .sliding_window_size(100)                       // Track last 100 calls
    .wait_duration_in_open(Duration::from_secs(60)) // Stay open 60s
    .on_state_transition(...)
    .build();

let service = layer.layer(...);
```
Full examples: circuitbreaker.rs | circuitbreaker_fallback.rs | circuitbreaker_health_check.rs
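The following std-only sketch makes the closed/open/half-open cycle concrete with a count-based breaker. It is not the crate's implementation. `SketchBreaker` and `State` are hypothetical names, and the failure rate is assumed to be a fraction (0.5 = 50%). Time is passed in explicitly so the logic is easy to test. In the half-open state, a single trial call decides whether the circuit closes or re-opens.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical breaker states (sketch only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

/// Hypothetical count-based circuit breaker, not the crate's type.
pub struct SketchBreaker {
    state: State,
    window: VecDeque<bool>, // last `window_size` outcomes, true = failure
    window_size: usize,
    failure_rate_threshold: f64, // fraction, e.g. 0.5 for 50%
    wait_in_open: Duration,
}

impl SketchBreaker {
    pub fn new(window_size: usize, failure_rate_threshold: f64, wait_in_open: Duration) -> Self {
        Self {
            state: State::Closed,
            window: VecDeque::with_capacity(window_size),
            window_size,
            failure_rate_threshold,
            wait_in_open,
        }
    }

    /// Returns whether a call may proceed at `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open { since } => {
                if now.duration_since(since) >= self.wait_in_open {
                    // Wait elapsed: let a trial call through.
                    self.state = State::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Records the outcome of a permitted call.
    pub fn record(&mut self, failed: bool, now: Instant) {
        if self.state == State::HalfOpen {
            // The trial call decides: success closes, failure re-opens.
            self.window.clear();
            self.state = if failed { State::Open { since: now } } else { State::Closed };
            return;
        }
        if self.window.len() == self.window_size {
            self.window.pop_front();
        }
        self.window.push_back(failed);
        // Evaluate only once the window is full, so one early failure cannot trip it.
        if self.window.len() == self.window_size {
            let failures = self.window.iter().filter(|f| **f).count();
            let rate = failures as f64 / self.window_size as f64;
            if rate >= self.failure_rate_threshold {
                self.state = State::Open { since: now };
                self.window.clear();
            }
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```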
Bulkhead
Limit concurrent requests to prevent resource exhaustion:
```rust
use ...::BulkheadLayer;
use std::time::Duration;

let layer = BulkheadLayer::builder()
    .name(...)
    .max_concurrent_calls(10) // Max 10 concurrent
    .max_wait_duration(...)   // Wait up to 5s
    .on_call_permitted(...)
    .on_call_rejected(...)
    .build();

let service = layer.layer(...);
```
Full examples: bulkhead.rs | bulkhead_advanced.rs
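At its core, a bulkhead is a bounded in-flight counter. The sketch below uses a hypothetical `SketchBulkhead` type and only std. It rejects immediately when full, rather than waiting up to a `max_wait_duration`. Each slot is released through an RAII guard, so the slot is freed even if the call panics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical non-blocking bulkhead: at most `max` calls in flight.
#[derive(Clone)]
pub struct SketchBulkhead {
    in_flight: Arc<AtomicUsize>,
    max: usize,
}

/// RAII permit: dropping it frees the slot.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl SketchBulkhead {
    pub fn new(max: usize) -> Self {
        Self { in_flight: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Tries to take a slot; returns None (rejected) when the bulkhead is full.
    pub fn try_acquire(&self) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            // CAS loop so two callers cannot both take the last slot.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { in_flight: Arc::clone(&self.in_flight) }),
                Err(actual) => current = actual,
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}
```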
Time Limiter
Enforce timeouts on operations with configurable cancellation:
```rust
use ...::TimeLimiterLayer;
use std::time::Duration;

let layer = TimeLimiterLayer::builder()
    .timeout_duration(...)
    .cancel_running_future(true) // Cancel on timeout
    .on_timeout(...)
    .build();

let service = layer.layer(...);
```
Full examples: timelimiter.rs | timelimiter_example.rs
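Here is a std-only sketch of the timeout half of this pattern, built around a hypothetical `with_timeout` helper. The work runs on a helper thread, and the caller waits on a channel with `recv_timeout`. Plain threads cannot be cancelled, so the timed-out work keeps running in the background. An async time limiter with cancellation enabled would instead drop the future.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum TimeoutError {
    TimedOut,
}

/// Hypothetical helper: runs `op` on another thread and waits at most `timeout`.
/// A panicking `op` disconnects the channel and is also reported as TimedOut here.
pub fn with_timeout<T, F>(timeout: Duration, op: F) -> Result<T, TimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore the send error.
        let _ = tx.send(op());
    });
    rx.recv_timeout(timeout).map_err(|_| TimeoutError::TimedOut)
}
```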
Retry
Retry failed requests with exponential backoff and jitter:
```rust
use ...::RetryLayer;
use std::time::Duration;

let layer = RetryLayer::builder()
    .max_attempts(...)
    .exponential_backoff(...)
    .on_retry(...)
    .on_success(...)
    .build();

let service = layer.layer(...);
```
Full examples: retry.rs | retry_example.rs
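Exponential backoff is simple arithmetic. Attempt `n` waits `base * 2^n`, capped at a maximum, and the delay can optionally be scaled by a random factor ("full jitter"). The helpers below are a hypothetical std-only sketch, not the crate's API. The sleep function is injected so the loop can be tested without waiting, and retry budgets are not modeled.

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`. `attempt` starts at 0.
pub fn backoff(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Full jitter: scale the delay by a random fraction in [0, 1].
/// The caller supplies `unit_random` so the sketch stays deterministic.
pub fn with_full_jitter(delay: Duration, unit_random: f64) -> Duration {
    delay.mul_f64(unit_random.clamp(0.0, 1.0))
}

/// Calls `op` up to `max_attempts` times (assumed >= 1), sleeping between failures.
pub fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```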
Rate Limiter
Control request rate to protect downstream services:
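```rust
use ...::RateLimiterLayer;
use std::time::Duration;

let layer = RateLimiterLayer::builder()
    .limit_for_period(100)                        // 100 requests
    .refresh_period(Duration::from_secs(1))       // per second
    .timeout_duration(Duration::from_millis(500)) // Wait up to 500ms
    .on_permit_acquired(...)
    .build();

let service = layer.layer(...);
```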
Full examples: ratelimiter.rs | ratelimiter_example.rs
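This is a sketch of the fixed-window variant, using a hypothetical `FixedWindow` type that grants `limit_for_period` permits per `refresh_period`. The budget resets whenever `now` crosses into a new window. Waiting up to a timeout for a permit and the sliding-window algorithm are both left out.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter (sketch only; `refresh_period` must be non-zero).
pub struct FixedWindow {
    limit_for_period: u32,
    refresh_period: Duration,
    window_start: Instant,
    used: u32,
}

impl FixedWindow {
    pub fn new(limit_for_period: u32, refresh_period: Duration, now: Instant) -> Self {
        Self { limit_for_period, refresh_period, window_start: now, used: 0 }
    }

    /// Takes a permit if one is left in the window containing `now`.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.window_start);
        if elapsed >= self.refresh_period {
            // Jump to the current window and reset the budget.
            let periods = (elapsed.as_nanos() / self.refresh_period.as_nanos()) as u32;
            self.window_start += self.refresh_period * periods;
            self.used = 0;
        }
        if self.used < self.limit_for_period {
            self.used += 1;
            true
        } else {
            false
        }
    }
}
```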
Cache
Cache responses to reduce load on expensive operations:
```rust
use ...;
use std::time::Duration;

let layer = ...::builder()
    .max_size(...)
    .ttl(Duration::from_secs(300)) // 5 minute TTL
    .eviction_policy(...)          // LRU, LFU, or FIFO
    .key_extractor(...)
    .on_hit(...)
    .on_miss(...)
    .build();

let service = layer.layer(...);
```
Full examples: cache.rs | cache_example.rs
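This small std-only sketch shows how a TTL and an LRU eviction policy interact. `TtlLru` is a hypothetical type, not the crate's. Entries expire `ttl` after insertion, and when the cache is full, the least recently read or written entry is evicted. Eviction scans every entry, which is acceptable for a sketch but not for large caches. LFU and FIFO are not shown.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical TTL + LRU cache (sketch only).
pub struct TtlLru<K, V> {
    max_size: usize,
    ttl: Duration,
    tick: u64, // logical clock for recency
    entries: HashMap<K, (V, Instant, u64)>, // value, inserted_at, last_used
}

impl<K: Eq + Hash + Clone, V: Clone> TtlLru<K, V> {
    pub fn new(max_size: usize, ttl: Duration) -> Self {
        Self { max_size, ttl, tick: 0, entries: HashMap::new() }
    }

    /// Returns a fresh value and marks it most recently used; drops expired ones.
    pub fn get(&mut self, key: &K, now: Instant) -> Option<V> {
        let expired = match self.entries.get(key) {
            None => return None,
            Some((_, inserted, _)) => now.saturating_duration_since(*inserted) >= self.ttl,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.tick += 1;
        let tick = self.tick;
        let entry = self.entries.get_mut(key)?;
        entry.2 = tick;
        Some(entry.0.clone())
    }

    pub fn insert(&mut self, key: K, value: V, now: Instant) {
        self.tick += 1;
        if !self.entries.contains_key(&key) && self.entries.len() >= self.max_size {
            // Evict the least recently used entry (O(n) scan).
            let lru = self.entries.iter().min_by_key(|(_, e)| e.2).map(|(k, _)| k.clone());
            if let Some(lru) = lru {
                self.entries.remove(&lru);
            }
        }
        self.entries.insert(key, (value, now, self.tick));
    }
}
```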
Fallback
Provide fallback responses when the primary service fails:
```rust
use ...::FallbackLayer;

// Return a static fallback value on error
let layer = FallbackLayer::value(...);

// Or compute a fallback from the error
let layer = FallbackLayer::from_error(...);

// Or use a backup service
let layer = FallbackLayer::service(...);

let service = layer.layer(...);
```
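The three fallback strategies differ only in what happens on the error path. The std-only sketch below illustrates them with a hypothetical `Fallback` enum. Synchronous closures stand in for services.

```rust
/// Hypothetical fallback strategies (sketch only, not the crate's API).
pub enum Fallback<T, E> {
    /// Return a fixed value on error.
    Value(T),
    /// Derive a value from the error.
    FromError(Box<dyn Fn(&E) -> T>),
    /// Call a backup instead.
    Backup(Box<dyn Fn() -> Result<T, E>>),
}

impl<T: Clone, E> Fallback<T, E> {
    /// Runs `primary`; on error, degrades according to the chosen strategy.
    pub fn call(&self, primary: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        match primary() {
            Ok(v) => Ok(v),
            Err(e) => match self {
                Fallback::Value(v) => Ok(v.clone()),
                Fallback::FromError(f) => Ok(f(&e)),
                Fallback::Backup(backup) => backup(),
            },
        }
    }
}
```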
Hedge
Reduce tail latency by firing backup requests after a delay:
```rust
use ...::HedgeLayer;
use std::time::Duration;

// Fire a hedge request if the primary takes longer than 100ms
let layer = HedgeLayer::builder()
    .delay(Duration::from_millis(100))
    .max_hedged_attempts(...)
    .build();

// Or fire all requests in parallel (no delay)
let layer = HedgeLayer::builder()
    .no_delay()
    .max_hedged_attempts(...)
    .build();

let service = layer.layer(...);
```
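Here is a thread-based sketch of hedging, built around a hypothetical `hedged` helper. The first attempt starts immediately. Each further attempt starts only if nothing has replied within `delay`, and the first reply wins. Passing `Duration::ZERO` as the delay roughly approximates the no-delay mode. Losing attempts are not cancelled; their results are simply discarded.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical hedging helper: `op(attempt)` is one copy of the request.
pub fn hedged<T, F>(delay: Duration, max_hedged_attempts: usize, op: F) -> T
where
    T: Send + 'static,
    F: Fn(usize) -> T + Send + Sync + 'static,
{
    let op = Arc::new(op);
    let (tx, rx) = mpsc::channel();
    let launch = |attempt: usize| {
        let tx = tx.clone();
        let op = Arc::clone(&op);
        thread::spawn(move || {
            let _ = tx.send(op(attempt)); // late replies are dropped
        });
    };
    launch(0);
    for attempt in 1..=max_hedged_attempts {
        // Nothing back within `delay`: send another copy.
        if let Ok(v) = rx.recv_timeout(delay) {
            return v;
        }
        launch(attempt);
    }
    drop(tx); // so recv() errors instead of hanging if every attempt panics
    rx.recv().expect("every attempt failed to reply")
}
```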
Reconnect
Automatically reconnect on connection failures with configurable backoff:
```rust
use ...;
use std::time::Duration;

let layer = ...::new(...);
let service = layer.layer(...);
```
Full examples: reconnect.rs | reconnect_basic.rs | reconnect_custom_policy.rs
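This std-only sketch illustrates one way to handle reconnection, using a hypothetical `Reconnecting` type rather than the crate's API. The connection is created lazily, and connection attempts follow a backoff schedule. Any failed call discards the connection, so the next call builds a fresh one. `sleep` is injected so the schedule can be tested without waiting.

```rust
use std::time::Duration;

/// Hypothetical reconnecting client (sketch only).
pub struct Reconnecting<C, E> {
    conn: Option<C>,
    connect: Box<dyn FnMut() -> Result<C, E>>,
    delays: Vec<Duration>, // backoff schedule between connection attempts
}

impl<C, E> Reconnecting<C, E> {
    pub fn new(connect: impl FnMut() -> Result<C, E> + 'static, delays: Vec<Duration>) -> Self {
        Self { conn: None, connect: Box::new(connect), delays }
    }

    /// Ensures a connection exists, retrying per the schedule.
    fn ensure(&mut self, sleep: &mut impl FnMut(Duration)) -> Result<&mut C, E> {
        if self.conn.is_none() {
            let mut result = (self.connect)();
            for delay in &self.delays {
                if result.is_ok() {
                    break;
                }
                sleep(*delay);
                result = (self.connect)();
            }
            self.conn = Some(result?);
        }
        Ok(self.conn.as_mut().expect("connection was just established"))
    }

    /// Runs `f` on a live connection; a failure drops it so the next call reconnects.
    pub fn call<T>(
        &mut self,
        sleep: &mut impl FnMut(Duration),
        f: impl FnOnce(&mut C) -> Result<T, E>,
    ) -> Result<T, E> {
        let result = f(self.ensure(sleep)?);
        if result.is_err() {
            self.conn = None;
        }
        result
    }
}
```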
Health Check
Proactive health monitoring with intelligent resource selection:
```rust
use ...;
use std::time::Duration;

// Create a wrapper with multiple resources
let wrapper = ...::builder()
    .with_context(...)
    .with_context(...)
    .with_checker(...)
    .with_interval(...)
    .with_selection_strategy(...)
    .build();

// Start background health checking
wrapper.start().await;

// Get a healthy resource
if let Some(resource) = wrapper.get_healthy().await {
    // ...
}
```
Note: Health Check is not a Tower layer - it's a wrapper pattern for managing multiple resources with automatic failover.
Full examples: basic.rs
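The selection step can be sketched without async code. Each resource carries the status from its latest check, and `get_healthy` round-robins over the healthy resources only. `HealthSelector`, `Health`, and `run_checks` are hypothetical stand-ins. A real wrapper would run the checker periodically in the background instead of on demand.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Health {
    Healthy,
    Unhealthy,
}

/// Hypothetical health-aware selector (sketch only).
pub struct HealthSelector<R> {
    resources: Vec<(String, R, Health)>,
    next: usize, // round-robin cursor
}

impl<R> HealthSelector<R> {
    pub fn new() -> Self {
        Self { resources: Vec::new(), next: 0 }
    }

    /// Adds a resource; it counts as healthy until a check says otherwise.
    pub fn with_resource(mut self, name: &str, resource: R) -> Self {
        self.resources.push((name.to_string(), resource, Health::Healthy));
        self
    }

    /// Runs `checker` on every resource, as a background task would on each interval.
    pub fn run_checks(&mut self, mut checker: impl FnMut(&R) -> Health) {
        for (_, resource, health) in &mut self.resources {
            *health = checker(resource);
        }
    }

    /// Returns the next healthy resource in round-robin order, if any.
    pub fn get_healthy(&mut self) -> Option<&R> {
        let n = self.resources.len();
        for offset in 0..n {
            let i = (self.next + offset) % n;
            if self.resources[i].2 == Health::Healthy {
                self.next = i + 1;
                return Some(&self.resources[i].1);
            }
        }
        None
    }
}
```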
Coalesce
Deduplicate concurrent identical requests (singleflight pattern):
```rust
use ...::CoalesceLayer;
use tower::ServiceBuilder;

// Coalesce by request ID: concurrent requests for the same ID share one execution
let layer = CoalesceLayer::new(...);
let service = ServiceBuilder::new()
    .layer(layer)
    .service(...);

// Use with a cache to prevent a stampede on cache miss
let service = ServiceBuilder::new()
    .layer(...) // Check cache first
    .layer(...) // Coalesce cache misses
    .service(...);
```
Use cases:
- Cache stampede prevention: When cache expires, only one request refreshes it
- Expensive computations: Deduplicate identical report generation requests
- Rate-limited APIs: Reduce calls to external APIs by coalescing identical requests
Note: Response and error types must implement Clone to be shared with all waiters.
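The singleflight idea fits in a few dozen lines of std Rust. In this hypothetical `SingleFlight` sketch, the first caller for a key becomes the leader and runs the work. Concurrent callers with the same key block on a condition variable, then receive a clone of the leader's result; this is why the result type must be `Clone`. A panicking leader would leave waiters blocked, which a real implementation has to handle.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Condvar, Mutex};

type Slot<V> = Arc<(Mutex<Option<V>>, Condvar)>;

/// Hypothetical singleflight (sketch only, not the crate's CoalesceLayer).
pub struct SingleFlight<K, V> {
    in_flight: Mutex<HashMap<K, Slot<V>>>,
}

impl<K: Eq + Hash + Clone, V: Clone> SingleFlight<K, V> {
    pub fn new() -> Self {
        Self { in_flight: Mutex::new(HashMap::new()) }
    }

    pub fn call(&self, key: K, work: impl FnOnce() -> V) -> V {
        // Join an in-flight execution, or register as its leader.
        let (slot, leader) = {
            let mut map = self.in_flight.lock().unwrap();
            let existing = map.get(&key).cloned();
            match existing {
                Some(slot) => (slot, false),
                None => {
                    let slot: Slot<V> = Arc::new((Mutex::new(None), Condvar::new()));
                    map.insert(key.clone(), Arc::clone(&slot));
                    (slot, true)
                }
            }
        };
        let (value, ready) = &*slot;
        if leader {
            let v = work();
            *value.lock().unwrap() = Some(v.clone());
            // Unregister so later callers start a fresh execution.
            self.in_flight.lock().unwrap().remove(&key);
            ready.notify_all();
            v
        } else {
            let mut guard = value.lock().unwrap();
            while guard.is_none() {
                guard = ready.wait(guard).unwrap();
            }
            (*guard).clone().expect("set by the leader")
        }
    }
}
```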
Executor
Delegate request processing to dedicated executors for parallel execution:
```rust
use ...::ExecutorLayer;
use tower::ServiceBuilder;

// Use a dedicated runtime for CPU-heavy work
let compute_runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(...)
    .thread_name(...)
    .build()
    .unwrap();
let layer = ExecutorLayer::new(...);

// Or use the current runtime
let layer = ExecutorLayer::current();

let service = ServiceBuilder::new()
    .layer(layer)
    .service(...);
```
Use cases:
- CPU-bound processing: Parallelize CPU-intensive request handling
- Runtime isolation: Process requests on a dedicated runtime
- Thread pool delegation: Use specific thread pools for certain workloads
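The hand-off can be sketched with a channel and a fixed set of named worker threads. `Pool` is a hypothetical std-only stand-in for a dedicated runtime, not the crate's `ExecutorLayer`. Each call ships a boxed job to a worker and receives its result on a one-shot channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical dedicated worker pool (sketch only).
pub struct Pool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl Pool {
    pub fn new(worker_threads: usize, thread_name: &str) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<JoinHandle<()>> = (0..worker_threads)
            .map(|i| {
                let receiver = Arc::clone(&receiver);
                thread::Builder::new()
                    .name(format!("{}-{}", thread_name, i))
                    .spawn(move || loop {
                        // The lock guard is dropped at the end of this statement.
                        let job = receiver.lock().unwrap().recv();
                        match job {
                            Ok(job) => job(),
                            Err(_) => break, // channel closed: shut down
                        }
                    })
                    .expect("failed to spawn worker")
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Runs `f` on a worker thread and returns a receiver for its result.
    pub fn execute<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f());
        });
        self.sender.as_ref().unwrap().send(job).expect("pool is running");
        rx
    }
}

impl Drop for Pool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the channel so workers exit
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```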
Adaptive Concurrency
Dynamically adjust concurrency limits based on observed latency and error rates:
```rust
use ...;
use tower::ServiceBuilder;
use std::time::Duration;

// AIMD: classic TCP-style congestion control.
// Increases the limit on success; decreases it on failure or high latency.
let layer = ...::new(...);

// Vegas: more stable; uses RTT to estimate queue depth
let layer = ...::new(...);

let service = ServiceBuilder::new()
    .layer(layer)
    .service(...);
```
Use cases:
- Auto-tuning: No manual concurrency limit configuration needed
- Variable backends: Adapts to changing downstream capacity
- Load shedding: Automatically reduces load when backends struggle
Full examples: adaptive.rs
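The AIMD rule is small enough to show directly. After a good sample, add a constant to the limit. After an error or latency spike, multiply the limit by a ratio below 1. Then clamp the result. The hypothetical `Aimd` struct below sketches only this update rule. Its parameter names are assumptions, and Vegas is not shown.

```rust
/// Hypothetical AIMD limit state (sketch only).
pub struct Aimd {
    limit: f64,
    min_limit: f64,
    max_limit: f64,
    increase: f64,      // added after a good sample
    backoff_ratio: f64, // multiplied in after a bad sample, e.g. 0.5
}

impl Aimd {
    pub fn new(initial: f64, min_limit: f64, max_limit: f64, increase: f64, backoff_ratio: f64) -> Self {
        Self { limit: initial, min_limit, max_limit, increase, backoff_ratio }
    }

    /// Updates the limit from one sample; `overloaded` means an error or a latency spike.
    pub fn on_sample(&mut self, overloaded: bool) -> usize {
        let next = if overloaded {
            self.limit * self.backoff_ratio // multiplicative decrease
        } else {
            self.limit + self.increase // additive increase
        };
        self.limit = next.clamp(self.min_limit, self.max_limit);
        self.current()
    }

    /// Concurrency limit to enforce now (rounded down, at least 1).
    pub fn current(&self) -> usize {
        (self.limit.floor() as usize).max(1)
    }
}
```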
Chaos (Testing Only)
Inject failures and latency to test your resilience patterns:
```rust
use ...::ChaosLayer;
use std::time::Duration;

let chaos = ChaosLayer::builder()
    .name(...)
    .error_rate(0.1)   // 10% of requests fail
    .error_fn(...)
    .latency_rate(0.2) // 20% delayed
    .min_latency(...)
    .max_latency(...)
    .seed(...)         // Deterministic chaos
    .build();

let service = chaos.layer(...);
```
WARNING: Use only in development and testing environments, never in production.
Full examples: chaos.rs | chaos_example.rs
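Chaos is deterministic when every decision is driven by a seeded pseudo-random generator. This hypothetical `SketchChaos` uses a small xorshift64 generator (std only) to decide, per request, whether to fail, pass through, or delay. A delay is a random amount between the minimum and maximum latency. The same seed always produces the same sequence of decisions.

```rust
use std::time::Duration;

/// Small xorshift64 PRNG; the state must never be 0.
struct XorShift(u64);

impl XorShift {
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }
}

#[derive(Debug, PartialEq)]
pub enum Injection {
    Pass,
    Fail,
    Delay(Duration),
}

/// Hypothetical chaos injector (sketch only).
pub struct SketchChaos {
    rng: XorShift,
    error_rate: f64,
    latency_rate: f64,
    min_latency: Duration,
    max_latency: Duration,
}

impl SketchChaos {
    pub fn new(seed: u64, error_rate: f64, latency_rate: f64, min_latency: Duration, max_latency: Duration) -> Self {
        // `| 1` keeps the xorshift state non-zero.
        Self { rng: XorShift(seed | 1), error_rate, latency_rate, min_latency, max_latency }
    }

    /// Decides what to do with the next request.
    pub fn decide(&mut self) -> Injection {
        if self.rng.next_f64() < self.error_rate {
            return Injection::Fail;
        }
        if self.rng.next_f64() < self.latency_rate {
            let span = self.max_latency.saturating_sub(self.min_latency);
            return Injection::Delay(self.min_latency + span.mul_f64(self.rng.next_f64()));
        }
        Injection::Pass
    }
}
```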
Error Handling
ResilienceError<E> provides a unified error type for composed layers:
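```rust
use ...::ResilienceError;
use tower::ServiceBuilder;

type ServiceError = ResilienceError<...>;

let service = ServiceBuilder::new()
    .layer(...)
    .layer(...)
    .layer(...)
    .service(...);

// Check error types
if err.is_timeout() {
    // ...
}
if err.is_rate_limited() {
    // ...
}
```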
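A unified error type is typically an enum with one variant per resilience layer, plus a variant that wraps the application's own error. The sketch below uses a hypothetical `UnifiedError<E>` to show this shape and the `is_timeout`/`is_rate_limited` style of predicate. It is not the crate's definition, and its variants are assumptions.

```rust
use std::fmt;

/// Hypothetical unified error (sketch only).
#[derive(Debug, PartialEq)]
pub enum UnifiedError<E> {
    Timeout,
    RateLimited,
    CircuitOpen,
    BulkheadFull,
    Application(E),
}

impl<E> UnifiedError<E> {
    pub fn is_timeout(&self) -> bool {
        matches!(self, UnifiedError::Timeout)
    }

    pub fn is_rate_limited(&self) -> bool {
        matches!(self, UnifiedError::RateLimited)
    }

    /// Returns the wrapped application error, if that is what failed.
    pub fn application(&self) -> Option<&E> {
        match self {
            UnifiedError::Application(e) => Some(e),
            _ => None,
        }
    }
}

impl<E: fmt::Display> fmt::Display for UnifiedError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UnifiedError::Timeout => write!(f, "request timed out"),
            UnifiedError::RateLimited => write!(f, "rate limit exceeded"),
            UnifiedError::CircuitOpen => write!(f, "circuit breaker is open"),
            UnifiedError::BulkheadFull => write!(f, "bulkhead is full"),
            UnifiedError::Application(e) => write!(f, "application error: {}", e),
        }
    }
}

/// Lets `?` lift an application error into the unified type.
impl<E> From<E> for UnifiedError<E> {
    fn from(e: E) -> Self {
        UnifiedError::Application(e)
    }
}
```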
Pattern Composition
Stack multiple patterns for comprehensive resilience:
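```rust
use tower::ServiceBuilder;

// Client-side: timeout -> circuit breaker -> retry
let client = ServiceBuilder::new()
    .layer(...) // timeout
    .layer(...) // circuit breaker
    .layer(...) // retry
    .service(...);

// Server-side: rate limit -> bulkhead -> timeout
let server = ServiceBuilder::new()
    .layer(...) // rate limit
    .layer(...) // bulkhead
    .layer(...) // timeout
    .service(...);
```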
For comprehensive guidance on composing patterns effectively, see:
- Composition Guide - Pattern selection, recommended stacks, layer ordering, and anti-patterns
- Composition Tests - Working examples of all documented stacks that verify correct compilation
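Layer order matters because Tower's `ServiceBuilder` treats the first `.layer(...)` as the outermost wrapper: it sees the request first and the response last. The sketch below makes that order visible. It uses minimal synchronous stand-ins for Tower's `Service` and `Layer` traits rather than the real async ones. `Tag`, `Tagged`, and `Echo` are hypothetical.

```rust
/// Minimal synchronous stand-in for Tower's `Service` (sketch only).
pub trait Service {
    fn call(&mut self, req: String) -> String;
}

/// Minimal stand-in for Tower's `Layer`: wraps an inner service.
pub trait Layer<S> {
    type Service;
    fn layer(&self, inner: S) -> Self::Service;
}

/// Hypothetical layer that tags the request going in and the response coming out.
pub struct Tag(pub &'static str);

pub struct Tagged<S> {
    name: &'static str,
    inner: S,
}

impl<S> Layer<S> for Tag {
    type Service = Tagged<S>;
    fn layer(&self, inner: S) -> Tagged<S> {
        Tagged { name: self.0, inner }
    }
}

impl<S: Service> Service for Tagged<S> {
    fn call(&mut self, req: String) -> String {
        let resp = self.inner.call(format!("{}>{}", req, self.name));
        format!("{}<{}", resp, self.name)
    }
}

/// Innermost service: returns the request unchanged.
pub struct Echo;

impl Service for Echo {
    fn call(&mut self, req: String) -> String {
        req
    }
}
```

Nesting `Tag("timeout").layer(Tag("circuit_breaker").layer(Tag("retry").layer(Echo)))` mirrors the client stack built with `ServiceBuilder` in the example above. The request passes through timeout, circuit breaker, then retry, and the response unwinds in reverse order.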
Benchmarks
Happy path overhead (no failures triggered):
| Pattern | Overhead |
|---|---|
| Retry (no retries) | ~80-100 ns |
| Time Limiter | ~107 ns |
| Rate Limiter | ~124 ns |
| Bulkhead | ~162 ns |
| Cache (hit) | ~250 ns |
| Circuit Breaker (closed) | ~298 ns |
Examples
See examples/ for more.
Stress Tests
MSRV
1.64.0 (matches Tower)
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome! Please see the contributing guidelines for more information.