iron_reliability 0.3.0

Resilience patterns for distributed systems
Documentation
# iron_reliability - Specification

**Module:** iron_reliability
**Layer:** 3 (Feature)
**Status:** Active

> **Specification Philosophy:** This specification focuses on architectural-level design and well-established knowledge. It describes what the module does and why, not implementation details or algorithms. Implementation constraints are minimal to allow flexibility. For detailed requirements, see spec/-archived_detailed_spec.md.

---

## Responsibility

Circuit breaker patterns and retry logic for LLM provider reliability. Prevents cascade failures, implements automatic fallback chains, handles transient errors with exponential backoff.

---

## Scope

**In Scope:**
- Circuit breaker (open/half-open/closed states)
- Retry logic with exponential backoff
- Fallback chain management (primary → secondary → tertiary providers)
- Timeout enforcement for LLM calls
- Failure tracking and recovery

**Out of Scope:**
- Cost tracking (see iron_cost)
- Safety validation (see iron_safety)
- Observability export (see iron_telemetry)

---

## Dependencies

**Required Modules:**
- iron_types - Foundation types
- iron_telemetry - Error logging

**Required External:**
- tokio - Async runtime for timeouts

**Optional:**
- None

---

## Core Concepts

**Key Components:**
- **Circuit Breaker:** Prevents cascade failures, tracks failure rate
- **Retry Manager:** Handles transient errors with exponential backoff
- **Fallback Chain:** Routes to backup providers when primary fails
- **Timeout Enforcer:** Prevents hanging requests

---

## Integration Points

**Used by:**
- iron_runtime - Wraps all LLM calls with reliability patterns

**Uses:**
- iron_telemetry - Logs circuit breaker state changes and retry attempts

---

*For detailed circuit breaker logic, see spec/-archived_detailed_spec.md*
*For architecture, see docs/architecture/002_layer_model.md*