Expand description
Circuit breaker pattern for preventing cascading failures.
Implements the circuit breaker reliability pattern to protect Iron Runtime from cascading failures when upstream LLM providers become unavailable. Automatically opens circuits after threshold failures, preventing wasted requests to failing services.
§Purpose
This crate provides fault tolerance for LLM provider integration:
- Detect failing services through failure rate monitoring
- Prevent cascading failures by short-circuiting bad requests
- Auto-recovery with configurable timeout periods
- Per-service state isolation
§Circuit Breaker States
The circuit breaker follows a three-state model:
-
Closed: Normal operation, requests pass through. Failure counter increments on each failure. Transitions to Open when failures reach threshold.
-
Open: Circuit is open, requests fail fast without hitting upstream service. Prevents wasted resources on known-bad endpoints. Transitions to HalfOpen after timeout expires.
-
HalfOpen: Trial period, allows limited requests to test recovery. First success closes circuit. First failure reopens circuit.
§Key Types
CircuitBreaker- Main circuit breaker with per-service state trackingCircuitState- Circuit state enum (Closed, Open, HalfOpen)
§Public API
§Basic Usage
use iron_reliability::CircuitBreaker;
// Create breaker: 5 failures triggers open, 60s timeout
let breaker = CircuitBreaker::new(5, 60);
// Check before making request
if breaker.is_open("openai") {
// Circuit open, fail fast
return Err("Service unavailable");
}
// Make request...
match make_llm_request() {
Ok(response) => {
breaker.record_success("openai");
Ok(response)
}
Err(e) => {
breaker.record_failure("openai");
Err(e)
}
}§Integration Pattern
use iron_reliability::CircuitBreaker;
use std::sync::Arc;
struct LlmRouter {
breaker: Arc<CircuitBreaker>,
}
impl LlmRouter {
fn route_request(&self, provider: &str) -> Result<(), String> {
// Fast-fail if circuit open
if self.breaker.is_open(provider) {
return Err(format!("Circuit open for {}", provider));
}
// Attempt request
match self.call_provider(provider) {
Ok(resp) => {
self.breaker.record_success(provider);
Ok(resp)
}
Err(e) => {
self.breaker.record_failure(provider);
Err(e)
}
}
}
fn call_provider(&self, provider: &str) -> Result<(), String> {
// Implementation...
}
}§Configuration
Circuit breaker behavior is controlled by two parameters:
-
failure_threshold: Number of consecutive failures before opening circuit. Higher values tolerate transient failures. Lower values provide faster detection. Typical: 3-10 failures.
-
timeout_secs: How long circuit stays open before attempting recovery. Longer timeouts reduce load on failing services. Shorter timeouts enable faster recovery. Typical: 30-120 seconds.
§Thread Safety
CircuitBreaker is thread-safe and designed for concurrent access.
Internal state uses Arc<Mutex<>> for safe sharing across request handlers.