# tower-resilience
A comprehensive resilience and fault-tolerance toolkit for Tower services, inspired by Resilience4j.
## Resilience Patterns
- Circuit Breaker - Prevents cascading failures by stopping calls to failing services
- Bulkhead - Isolates resources to prevent system-wide failures
- Time Limiter - Advanced timeout handling with cancellation support
- Retry - Intelligent retry with exponential backoff, jitter, and retry budgets
- Rate Limiter - Controls request rate with fixed or sliding window algorithms
- Cache - Response memoization to reduce load
- Fallback - Graceful degradation when services fail
- Hedge - Reduces tail latency by racing redundant requests
- Reconnect - Automatic reconnection with configurable backoff strategies
- Health Check - Proactive health monitoring with intelligent resource selection
- Executor - Delegates request processing to dedicated executors for parallelism
- Adaptive Concurrency - Dynamic concurrency limiting using AIMD or Vegas algorithms
- Coalesce - Deduplicates concurrent identical requests (singleflight pattern)
- Chaos - Inject failures and latency for testing resilience (development/testing only)
## Quick Start

```toml
[dependencies]
tower-resilience = "0.7"
tower = "0.5"
```

```rust
use tower::ServiceBuilder;
use tower_resilience::prelude::*;

// Paths and argument values here are illustrative; see the crate docs for details.
let circuit_breaker = CircuitBreakerLayer::builder()
    .failure_rate_threshold(0.5)
    .build();

let service = ServiceBuilder::new()
    .layer(circuit_breaker)
    .layer(RetryLayer::exponential_backoff().build())
    .service(my_service);
```
## Presets: Get Started in One Line

Every pattern includes preset configurations with sensible defaults. Start immediately without tuning parameters, then customize later when you need to:

```rust
use tower_resilience_retry::RetryLayer;
use tower_resilience_circuitbreaker::CircuitBreakerLayer;
use tower_resilience_ratelimiter::RateLimiterLayer;
use tower_resilience_bulkhead::BulkheadLayer;

// Retry with exponential backoff (3 attempts, 100ms base)
let retry = RetryLayer::exponential_backoff().build();

// Circuit breaker with balanced defaults
let breaker = CircuitBreakerLayer::standard().build();

// Rate limit to 100 requests per second
let limiter = RateLimiterLayer::per_second(100).build();

// Limit to 50 concurrent requests
let bulkhead = BulkheadLayer::medium().build();
```
### Available Presets

| Pattern | Preset | Description |
|---|---|---|
| Retry | `exponential_backoff()` | 3 attempts, 100ms base - balanced default |
| Retry | `aggressive()` | 5 attempts, 50ms base - fast recovery |
| Retry | `conservative()` | 2 attempts, 500ms base - minimal overhead |
| Circuit Breaker | `standard()` | 50% threshold, 100 calls - balanced |
| Circuit Breaker | `fast_fail()` | 25% threshold, 20 calls - fail fast |
| Circuit Breaker | `tolerant()` | 75% threshold, 200 calls - high tolerance |
| Rate Limiter | `per_second(n)` | n requests per second |
| Rate Limiter | `per_minute(n)` | n requests per minute |
| Rate Limiter | `burst(rate, size)` | Sustained rate with burst capacity |
| Bulkhead | `small()` | 10 concurrent calls |
| Bulkhead | `medium()` | 50 concurrent calls |
| Bulkhead | `large()` | 200 concurrent calls |
Presets return builders, so you can customize any setting:

```rust
// Start with a preset, override what you need
let breaker = CircuitBreakerLayer::fast_fail()
    .name("checkout")                               // Add observability
    .wait_duration_in_open(Duration::from_secs(30)) // Custom recovery time
    .build();
```
## Examples
### Circuit Breaker

Prevent cascading failures by opening the circuit when the error rate exceeds a threshold:

```rust
use tower::Layer;
use tower_resilience_circuitbreaker::CircuitBreakerLayer;
use std::time::Duration;

let layer = CircuitBreakerLayer::builder()
    .name("my-service")
    .failure_rate_threshold(0.5)                    // Open at 50% failure rate
    .sliding_window_size(100)                       // Track last 100 calls
    .wait_duration_in_open(Duration::from_secs(60)) // Stay open 60s
    .on_state_transition(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: circuitbreaker.rs | circuitbreaker_fallback.rs | circuitbreaker_health_check.rs
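Conceptually, a count-based circuit breaker tracks the outcomes of the last N calls and opens when the failure rate crosses the threshold. A minimal std-only sketch (not the crate's implementation; `CircuitBreaker`, `record`, and `call_permitted` are hypothetical names):

```rust
use std::collections::VecDeque;

struct CircuitBreaker {
    window: VecDeque<bool>,      // true = success, false = failure
    window_size: usize,          // like sliding_window_size(100)
    failure_rate_threshold: f64, // like failure_rate_threshold(0.5)
    open: bool,
}

impl CircuitBreaker {
    fn new(window_size: usize, failure_rate_threshold: f64) -> Self {
        Self { window: VecDeque::new(), window_size, failure_rate_threshold, open: false }
    }

    fn record(&mut self, success: bool) {
        if self.window.len() == self.window_size {
            self.window.pop_front(); // slide the window
        }
        self.window.push_back(success);
        // Only evaluate once the window is full, as Resilience4j does.
        if self.window.len() == self.window_size {
            let failures = self.window.iter().filter(|s| !**s).count();
            self.open = failures as f64 / self.window.len() as f64 >= self.failure_rate_threshold;
        }
    }

    fn call_permitted(&self) -> bool {
        !self.open // an open circuit rejects calls without touching the backend
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(4, 0.5);
    for ok in [true, true, false, false] {
        cb.record(ok);
    }
    // 50% failures in a full window: the circuit opens and fails fast.
    assert!(!cb.call_permitted());
}
```

The real layer adds the half-open probe state and `wait_duration_in_open`; this sketch shows only the windowed open/closed decision.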
### Bulkhead

Limit concurrent requests to prevent resource exhaustion:

```rust
use tower_resilience_bulkhead::BulkheadLayer;
use std::time::Duration;

let layer = BulkheadLayer::builder()
    .name("db-pool")
    .max_concurrent_calls(10)                  // Max 10 concurrent
    .max_wait_duration(Duration::from_secs(5)) // Wait up to 5s
    .on_call_permitted(|event| println!("{event:?}"))
    .on_call_rejected(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: bulkhead.rs | bulkhead_advanced.rs
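At its core a bulkhead is a counting semaphore: admit a call only while a permit is free. A hypothetical std-only sketch of the try-acquire semantics (the real layer also supports waiting up to `max_wait_duration`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Bulkhead {
    max: usize,             // like max_concurrent_calls(10)
    in_flight: AtomicUsize, // current concurrent calls
}

impl Bulkhead {
    fn new(max: usize) -> Self {
        Self { max, in_flight: AtomicUsize::new(0) }
    }

    /// Returns true if the call was admitted; the caller must `release` afterwards.
    fn try_acquire(&self) -> bool {
        let mut cur = self.in_flight.load(Ordering::Relaxed);
        loop {
            if cur == self.max {
                return false; // at capacity: reject immediately
            }
            match self.in_flight.compare_exchange(cur, cur + 1, Ordering::Acquire, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // lost a race; retry with the fresh count
            }
        }
    }

    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::Release);
    }
}

fn main() {
    let b = Bulkhead::new(2);
    assert!(b.try_acquire());
    assert!(b.try_acquire());
    assert!(!b.try_acquire()); // third concurrent call rejected
    b.release();
    assert!(b.try_acquire()); // capacity freed
}
```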
### Time Limiter

Enforce timeouts on operations with configurable cancellation:

```rust
use tower_resilience_timelimiter::TimeLimiterLayer;
use std::time::Duration;

let layer = TimeLimiterLayer::builder()
    .timeout_duration(Duration::from_secs(2))
    .cancel_running_future(true) // Cancel on timeout
    .on_timeout(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: timelimiter.rs | timelimiter_example.rs
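The underlying idea is simple: run the work concurrently and stop waiting once the deadline passes. A std-only illustration using a channel with `recv_timeout` (`with_timeout` is a hypothetical helper, not the crate API; the real layer times out futures, not threads):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run `work` on another thread and give up waiting after `timeout`.
fn with_timeout<T: Send + 'static>(
    timeout: Duration,
    work: impl FnOnce() -> T + Send + 'static,
) -> Result<T, mpsc::RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work()); // ignore send errors if the caller gave up
    });
    rx.recv_timeout(timeout) // Err(Timeout) if work overruns the deadline
}

fn main() {
    // Fast work completes within the limit.
    assert_eq!(with_timeout(Duration::from_millis(500), || 42).unwrap(), 42);

    // Slow work trips the limit.
    let slow = with_timeout(Duration::from_millis(10), || {
        thread::sleep(Duration::from_millis(300));
        0
    });
    assert!(slow.is_err());
}
```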
### Retry

Retry failed requests with exponential backoff and jitter:

```rust
use tower_resilience_retry::RetryLayer;
use std::time::Duration;

let layer = RetryLayer::builder()
    .max_attempts(3)
    .exponential_backoff(Duration::from_millis(100))
    .on_retry(|event| println!("{event:?}"))
    .on_success(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: retry.rs | retry_example.rs
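Exponential backoff with jitter computes each delay as `base * 2^attempt`, capped, with a random component to avoid thundering herds. A hypothetical sketch of the "equal jitter" variant (`backoff_delay_ms` is an illustrative function, not the crate API):

```rust
// `jitter` is a uniform random sample in [0, 1) supplied by the caller,
// so the function itself stays deterministic and testable.
fn backoff_delay_ms(base_ms: u64, attempt: u32, cap_ms: u64, jitter: f64) -> u64 {
    // Exponential growth, capped so late attempts don't wait forever.
    let exp = base_ms.saturating_mul(1u64 << attempt.min(20)).min(cap_ms);
    // Equal jitter: half the delay is fixed, half is randomized.
    let fixed = exp / 2;
    fixed + ((exp - fixed) as f64 * jitter) as u64
}

fn main() {
    // With zero jitter, the fixed halves are 50ms, 100ms, 200ms, 400ms, ...
    assert_eq!(backoff_delay_ms(100, 0, 10_000, 0.0), 50);
    assert_eq!(backoff_delay_ms(100, 3, 10_000, 0.0), 400);
    // Attempt 10 would be 102_400ms uncapped; the cap keeps it at 10_000ms.
    assert_eq!(backoff_delay_ms(100, 10, 10_000, 0.5), 7_500);
}
```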
### Rate Limiter

Control request rate to protect downstream services:

```rust
use tower_resilience_ratelimiter::RateLimiterLayer;
use std::time::Duration;

let layer = RateLimiterLayer::builder()
    .limit_for_period(100)                        // 100 requests
    .refresh_period(Duration::from_secs(1))       // per second
    .timeout_duration(Duration::from_millis(500)) // Wait up to 500ms
    .on_permit_acquired(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: ratelimiter.rs | ratelimiter_example.rs
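The fixed-window algorithm mentioned above boils down to a counter that resets each refresh period. An illustrative std-only sketch (window indices stand in for real clock reads; `FixedWindow` is a hypothetical name):

```rust
struct FixedWindow {
    limit: u32, // like limit_for_period(100)
    used: u32,  // permits consumed in the current window
    window: u64,
}

impl FixedWindow {
    fn new(limit: u32) -> Self {
        Self { limit, used: 0, window: 0 }
    }

    /// `now_window` is the current window index,
    /// e.g. elapsed_secs / refresh_period_secs.
    fn try_acquire(&mut self, now_window: u64) -> bool {
        if now_window != self.window {
            // A new window started: reset the counter.
            self.window = now_window;
            self.used = 0;
        }
        if self.used < self.limit {
            self.used += 1;
            true
        } else {
            false // over the limit for this window
        }
    }
}

fn main() {
    let mut rl = FixedWindow::new(2);
    assert!(rl.try_acquire(0));
    assert!(rl.try_acquire(0));
    assert!(!rl.try_acquire(0)); // limit reached in this window
    assert!(rl.try_acquire(1));  // next window resets the count
}
```

A sliding window smooths the burst at window boundaries by weighting the previous window's count; the crate offers both.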
### Cache

Cache responses to reduce load on expensive operations:

```rust
use tower_resilience_cache::{CacheLayer, EvictionPolicy};
use std::time::Duration;

let layer = CacheLayer::builder()
    .max_size(1_000)
    .ttl(Duration::from_secs(300))        // 5 minute TTL
    .eviction_policy(EvictionPolicy::Lru) // LRU, LFU, or FIFO
    .key_extractor(|req: &MyRequest| req.id.clone())
    .on_hit(|event| println!("{event:?}"))
    .on_miss(|event| println!("{event:?}"))
    .build();

let service = layer.layer(my_service);
```
Full examples: cache.rs | cache_example.rs
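To make the LRU eviction policy concrete, here is a minimal std-only illustration: a map plus a recency list, evicting the least recently used key once capacity is reached (`LruCache` is a hypothetical type; the crate also offers LFU and FIFO):

```rust
use std::collections::HashMap;

struct LruCache {
    cap: usize,
    map: HashMap<String, String>,
    order: Vec<String>, // front = least recently used, back = most recent
}

impl LruCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: Vec::new() }
    }

    fn get(&mut self, k: &str) -> Option<&String> {
        if self.map.contains_key(k) {
            // Mark as most recently used.
            self.order.retain(|x| x != k);
            self.order.push(k.to_string());
        }
        self.map.get(k)
    }

    fn put(&mut self, k: &str, v: &str) {
        if !self.map.contains_key(k) && self.map.len() == self.cap {
            let lru = self.order.remove(0); // evict least recently used
            self.map.remove(&lru);
        }
        self.order.retain(|x| x != k);
        self.order.push(k.to_string());
        self.map.insert(k.to_string(), v.to_string());
    }
}

fn main() {
    let mut c = LruCache::new(2);
    c.put("a", "1");
    c.put("b", "2");
    c.get("a");      // "a" is now most recent
    c.put("c", "3"); // evicts "b", the least recently used
    assert!(c.get("b").is_none());
    assert_eq!(c.get("a").map(String::as_str), Some("1"));
}
```

A production cache would use an O(1) linked structure instead of `Vec::retain`; the point here is only the eviction order.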
### Fallback

Provide fallback responses when the primary service fails:

```rust
use tower_resilience_fallback::FallbackLayer;

// Return a static fallback value on error
let layer = FallbackLayer::value(MyResponse::default());

// Or compute fallback from the error
let layer = FallbackLayer::from_error(|err| MyResponse::from(err));

// Or use a backup service
let layer = FallbackLayer::service(backup_service);

let service = layer.layer(my_service);
```
### Hedge

Reduce tail latency by firing backup requests after a delay:

```rust
use tower_resilience_hedge::HedgeLayer;
use std::time::Duration;

// Fire a hedge request if the primary takes > 100ms
let layer = HedgeLayer::builder()
    .delay(Duration::from_millis(100))
    .max_hedged_attempts(2)
    .build();

// Or fire all requests in parallel (no delay)
let layer = HedgeLayer::builder()
    .no_delay()
    .max_hedged_attempts(3)
    .build();

let service = layer.layer(my_service);
```
Note: Hedge requires `Req: Clone` (requests are cloned for parallel execution) and `E: Clone` (for error handling). If your types don't implement `Clone`, consider wrapping them in `Arc`.
Full examples: hedge.rs
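The hedging idea itself, stripped of Tower machinery, is a race: start a backup attempt after a delay and take whichever finishes first. A conceptual std-threads sketch (`hedged` is a hypothetical helper, not the crate API):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn hedged<T: Send + 'static>(
    delay: Duration,
    primary: impl FnOnce() -> T + Send + 'static,
    backup: impl FnOnce() -> T + Send + 'static,
) -> T {
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    thread::spawn(move || {
        let _ = tx.send(primary());
    });
    thread::spawn(move || {
        thread::sleep(delay); // the hedge only matters if the primary is slow
        let _ = tx2.send(backup());
    });
    // First result wins; the loser's send fails harmlessly.
    rx.recv().expect("at least one attempt completes")
}

fn main() {
    // Primary stalls for 400ms; the hedge fires at 50ms and wins the race.
    let winner = hedged(
        Duration::from_millis(50),
        || { thread::sleep(Duration::from_millis(400)); "primary" },
        || "backup",
    );
    assert_eq!(winner, "backup");
}
```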
### Reconnect

Automatically reconnect on connection failures with configurable backoff:

```rust
use tower_resilience_reconnect::ReconnectLayer;
use std::time::Duration;

// Constructor arguments (connection factory, backoff policy) are illustrative;
// see the linked examples for the exact signature.
let layer = ReconnectLayer::new(my_connector, my_backoff_policy);

let service = layer.layer(my_service);
```
Full examples: reconnect.rs | reconnect_basic.rs | reconnect_custom_policy.rs
### Health Check

Proactive health monitoring with intelligent resource selection:

```rust
use tower_resilience_healthcheck::{HealthCheckWrapper, SelectionStrategy};
use std::time::Duration;

// Resource names and the checker signature below are illustrative.
// Create a wrapper with multiple resources
let wrapper = HealthCheckWrapper::builder()
    .with_context("primary", primary_conn)
    .with_context("replica", replica_conn)
    .with_checker(|conn| async move { conn.ping().await.is_ok() })
    .with_interval(Duration::from_secs(10))
    .with_selection_strategy(SelectionStrategy::FirstHealthy)
    .build();

// Start background health checking
wrapper.start().await;

// Get a healthy resource
if let Some(conn) = wrapper.get_healthy().await {
    // use the healthy resource
}
```
Note: Health Check is not a Tower layer - it's a wrapper pattern for managing multiple resources with automatic failover.
Full examples: healthcheck_basic.rs
### Coalesce

Deduplicate concurrent identical requests (singleflight pattern):

```rust
use tower_resilience_coalesce::CoalesceLayer;
use tower::ServiceBuilder;

// Coalesce by request ID - concurrent requests for the same ID share one execution
let layer = CoalesceLayer::new(|req: &MyRequest| req.id.clone());

let service = ServiceBuilder::new()
    .layer(layer)
    .service(my_service);

// Use with cache to prevent stampede on cache miss
let service = ServiceBuilder::new()
    .layer(cache_layer)    // Check cache first
    .layer(coalesce_layer) // Coalesce cache misses
    .service(my_service);
```
Use cases:
- Cache stampede prevention: When cache expires, only one request refreshes it
- Expensive computations: Deduplicate identical report generation requests
- Rate-limited APIs: Reduce calls to external APIs by coalescing identical requests
Note: Response and error types must implement `Clone` to be shared with all waiters.
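The singleflight mechanism can be sketched with std primitives: the first caller for a key becomes the leader and computes; concurrent callers wait on a shared slot and clone the result. A simplified illustration (not the crate's code; `Singleflight` and `run` are hypothetical names):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

struct Singleflight {
    // key -> shared slot the waiters block on
    inner: Mutex<HashMap<String, Arc<(Mutex<Option<String>>, Condvar)>>>,
}

impl Singleflight {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    fn run(&self, key: &str, f: impl FnOnce() -> String) -> String {
        let (slot, leader) = {
            let mut map = self.inner.lock().unwrap();
            match map.get(key) {
                Some(s) => (Arc::clone(s), false), // in flight: become a waiter
                None => {
                    let s = Arc::new((Mutex::new(None), Condvar::new()));
                    map.insert(key.to_string(), Arc::clone(&s));
                    (s, true) // first caller: become the leader
                }
            }
        };
        if leader {
            let value = f(); // only the leader computes
            *slot.0.lock().unwrap() = Some(value.clone());
            slot.1.notify_all(); // wake the waiters
            self.inner.lock().unwrap().remove(key);
            value
        } else {
            let mut guard = slot.0.lock().unwrap();
            while guard.is_none() {
                guard = slot.1.wait(guard).unwrap();
            }
            guard.clone().unwrap() // waiters clone the shared result
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicU32, Ordering};
    let sf = Arc::new(Singleflight::new());
    let calls = Arc::new(AtomicU32::new(0));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let (sf, calls) = (Arc::clone(&sf), Arc::clone(&calls));
        handles.push(std::thread::spawn(move || {
            sf.run("user:42", || {
                calls.fetch_add(1, Ordering::SeqCst);
                std::thread::sleep(std::time::Duration::from_millis(50));
                "profile".to_string()
            })
        }));
    }
    for h in handles {
        assert_eq!(h.join().unwrap(), "profile");
    }
    // With the leader busy for 50ms, the calls typically coalesce into one.
    let n = calls.load(Ordering::SeqCst);
    assert!(n >= 1 && n <= 4);
}
```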
### Executor

Delegate request processing to dedicated executors for parallel execution:

```rust
use tower_resilience_executor::ExecutorLayer;
use tower::ServiceBuilder;

// Use a dedicated runtime for CPU-heavy work
let compute_runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(4)
    .thread_name("compute")
    .build()
    .unwrap();

let layer = ExecutorLayer::new(compute_runtime.handle().clone());

// Or use the current runtime
let layer = ExecutorLayer::current();

let service = ServiceBuilder::new()
    .layer(layer)
    .service(my_service);
```
Use cases:
- CPU-bound processing: Parallelize CPU-intensive request handling
- Runtime isolation: Process requests on a dedicated runtime
- Thread pool delegation: Use specific thread pools for certain workloads
### Adaptive Concurrency

Dynamically adjust concurrency limits based on observed latency and error rates:

```rust
use tower_resilience_adaptive::{AimdLayer, VegasLayer};
use tower::ServiceBuilder;

// Layer names here are illustrative; see adaptive.rs for the exact API.
// AIMD: Classic TCP-style congestion control.
// Increases the limit on success, decreases on failure/high latency.
let layer = AimdLayer::new();

// Vegas: More stable; uses RTT to estimate queue depth.
let layer = VegasLayer::new();

let service = ServiceBuilder::new()
    .layer(layer)
    .service(my_service);
```
Use cases:
- Auto-tuning: No manual concurrency limit configuration needed
- Variable backends: Adapts to changing downstream capacity
- Load shedding: Automatically reduces load when backends struggle
Full examples: adaptive.rs
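The AIMD rule itself is small enough to state directly: grow the limit additively while calls succeed, shrink it multiplicatively when they fail or run slow. A hypothetical sketch of the update step (constants and the `aimd_update` name are illustrative, not the crate's tuning):

```rust
fn aimd_update(limit: f64, success: bool, min: f64, max: f64) -> f64 {
    if success {
        (limit + 1.0).min(max) // additive increase, capped
    } else {
        (limit * 0.5).max(min) // multiplicative decrease, floored
    }
}

fn main() {
    let mut limit = 10.0;
    // Five healthy responses nudge the limit up one permit at a time.
    for _ in 0..5 {
        limit = aimd_update(limit, true, 1.0, 100.0);
    }
    assert_eq!(limit, 15.0);
    // One failure halves it, backing off quickly under pressure.
    limit = aimd_update(limit, false, 1.0, 100.0);
    assert_eq!(limit, 7.5);
}
```

The asymmetry is the point: recovery is gradual, backoff is immediate, which keeps an overloaded backend from being hammered while it recovers.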
### Chaos (Testing Only)

Inject failures and latency to test your resilience patterns:

```rust
use tower_resilience_chaos::ChaosLayer;
use std::time::Duration;

// Types inferred from the closure signature - no type parameters needed!
let chaos = ChaosLayer::builder()
    .name("chaos-test")
    .error_rate(0.1)   // 10% of requests fail
    .error_fn(|| MyError::Injected)
    .latency_rate(0.2) // 20% delayed
    .min_latency(Duration::from_millis(50))
    .max_latency(Duration::from_millis(500))
    .seed(42)          // Deterministic chaos
    .build();

let service = chaos.layer(my_service);
```
WARNING: Only use in development/testing environments. Never in production.
Full examples: chaos.rs | chaos_example.rs
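What "deterministic chaos" means in practice: a seeded PRNG decides which requests fail, so a chaos run replays identically. A hypothetical std-only sketch (`Chaos` and its tiny LCG are illustrative, not the crate's RNG):

```rust
struct Chaos {
    state: u64,      // seeded PRNG state
    error_rate: f64, // like error_rate(0.1)
}

impl Chaos {
    fn new(seed: u64, error_rate: f64) -> Self {
        Self { state: seed, error_rate }
    }

    // Tiny LCG producing a value in [0, 1) - illustrative only, not a quality RNG.
    fn next_unit(&mut self) -> f64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.state >> 11) as f64 / (1u64 << 53) as f64
    }

    fn should_fail(&mut self) -> bool {
        self.next_unit() < self.error_rate
    }
}

fn main() {
    // Same seed, same sequence of injected failures: runs are reproducible.
    let mut a = Chaos::new(42, 0.1);
    let mut b = Chaos::new(42, 0.1);
    let seq_a: Vec<bool> = (0..100).map(|_| a.should_fail()).collect();
    let seq_b: Vec<bool> = (0..100).map(|_| b.should_fail()).collect();
    assert_eq!(seq_a, seq_b);
}
```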
## Error Handling

When composing multiple resilience layers, each layer has its own error type (e.g., `CircuitBreakerError`, `BulkheadError`). The `ResilienceError<E>` type unifies these into a single error type, eliminating boilerplate.
### The Problem

Without a unified error type, you'd need `From` implementations for every layer combination:

```rust
// Without ResilienceError: ~80 lines of boilerplate for 4 layers
impl From<CircuitBreakerError> for MyError { /* ... */ }
impl From<BulkheadError> for MyError { /* ... */ }
// ...and so on for every layer in the stack
```
### The Solution

Use `ResilienceError<E>` as your service error type - all layer errors convert automatically:

```rust
use tower_resilience_core::error::ResilienceError;

// Your application error
#[derive(Debug)]
struct AppError(String);

// That's it! Zero From implementations needed
type ServiceError = ResilienceError<AppError>;
```
### Pattern Matching

Handle different failure modes explicitly by matching on the `ResilienceError` variants:
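As a sketch of the matching style, here is a hypothetical enum mirroring the categories the helper methods below imply (the crate's actual variants may differ; `describe` is an illustrative function):

```rust
#[derive(Debug)]
enum ResilienceError<E> {
    Timeout,
    CircuitOpen,
    RateLimited,
    BulkheadFull,
    Application(E), // your own error, wrapped
}

// Match each failure mode to a distinct handling strategy.
fn describe<E>(err: &ResilienceError<E>) -> &'static str {
    match err {
        ResilienceError::Timeout => "deadline exceeded",
        ResilienceError::CircuitOpen => "circuit open, failing fast",
        ResilienceError::RateLimited => "throttled",
        ResilienceError::BulkheadFull => "at capacity",
        ResilienceError::Application(_) => "application error",
    }
}

fn main() {
    let err: ResilienceError<String> = ResilienceError::CircuitOpen;
    assert_eq!(describe(&err), "circuit open, failing fast");
}
```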
### Helper Methods

Quickly check error categories:

```rust
if err.is_timeout() { /* the deadline was exceeded */ }
if err.is_circuit_open() { /* fail fast, try a fallback */ }
if err.is_rate_limited() { /* back off and retry later */ }
if err.is_application() { /* inspect your own error */ }
```
### When to Use

Use `ResilienceError<E>` when:
- Building new services with multiple resilience layers
- You want zero boilerplate error handling
- Standard error categorization is sufficient
Use manual `From` implementations when:
- You need very specific error semantics
- Integrating with legacy error types
- You need specialized error logging per layer
See the `tower_resilience_core::error` module for full documentation.
## Pattern Composition

Stack multiple patterns for comprehensive resilience:

```rust
use tower::ServiceBuilder;

// Client-side: timeout -> circuit breaker -> retry
let client = ServiceBuilder::new()
    .layer(timeout_layer)
    .layer(circuit_breaker_layer)
    .layer(retry_layer)
    .service(http_client);

// Server-side: rate limit -> bulkhead -> timeout
let server = ServiceBuilder::new()
    .layer(rate_limit_layer)
    .layer(bulkhead_layer)
    .layer(timeout_layer)
    .service(handler);
```
For comprehensive guidance on composing patterns effectively, see:
- Composition Guide - Pattern selection, recommended stacks, layer ordering, and anti-patterns
- Composition Tests - Working examples of all documented stacks that verify correct compilation
## Benchmarks
Happy path overhead (no failures triggered):
| Pattern | Overhead |
|---|---|
| Retry (no retries) | ~80-100 ns |
| Time Limiter | ~107 ns |
| Rate Limiter | ~124 ns |
| Bulkhead | ~162 ns |
| Cache (hit) | ~250 ns |
| Circuit Breaker (closed) | ~298 ns |
## Examples

See the examples/ directory for more.
## Stress Tests
## MSRV
1.64.0 (matches Tower)
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contributing
Contributions are welcome! Please see the contributing guidelines for more information.