allframe 0.1.28

# Resilience Architecture Refactoring Plan

**Status**: 🚧 Active Development
**Priority**: P0 (Critical for Clean Architecture compliance)
**Timeline**: Long-term (Q1 2025)

---

## Executive Summary

The current resilience implementation (`#[retry]`, `#[circuit_breaker]`, `#[rate_limited]` macros) violates Clean Architecture principles by injecting infrastructure-level code directly into domain and application layer functions. This refactoring will:

- **Move resilience logic to proper architectural layers**
- **Maintain full backward compatibility**
- **Establish patterns for future infrastructure features**
- **Improve testability and maintainability**

---

## Current Architectural Violation

### Problem
```rust
// Current: Infrastructure code injected into domain layer
#[retry(max_retries = 3)]
async fn business_operation(&self) -> Result<BusinessResult, BusinessError> {
    // This function now depends on RetryExecutor, RetryConfig, etc.
    // VIOLATION: Domain layer knows about infrastructure concerns
}
```

### Root Cause
- Macros inject infrastructure types directly into function bodies
- No separation between business logic and resilience policies
- Infrastructure concerns bleed into domain layer

---

## Target Architecture

### Clean Architecture Layers

```
┌─────────────────────────────────────┐
│         Presentation Layer          │
│  (REST, GraphQL, gRPC handlers)     │
│  - HTTP status codes, serialization │
└─────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│          Application Layer          │
│  (Use Cases, Orchestration)         │
│  - Business workflows               │
│  - Transaction coordination         │
│  - Resilience orchestration         │
└─────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│         Domain Layer                │
│  (Business Logic, Entities)         │
│  - Pure business rules              │
│  - Domain models                    │
│  - Resilience contracts             │
└─────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│        Infrastructure Layer         │
│  (External Dependencies)            │
│  - Retry implementations            │
│  - Circuit breaker state            │
│  - Rate limiting storage            │
│  - External service clients         │
└─────────────────────────────────────┘
```

### Resilience Flow

```
Domain Layer (Contracts)
    ↓
Application Layer (Orchestration)
    ↓
Infrastructure Layer (Implementation)
```
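The flow above can be sketched in a single file using only the standard library. This is a simplified, synchronous stand-in rather than the planned API: `Orchestrator`, `SimpleOrchestrator`, and `charge` are hypothetical names, and the `Retry` variant omits the backoff field for brevity.

```rust
// --- Domain layer: declares the policy, knows nothing about executors ---
#[derive(Clone, Debug, PartialEq)]
pub enum ResiliencePolicy {
    None,
    Retry { max_attempts: u32 },
}

// --- Application layer: a port plus a use case that delegates to it ---
pub trait Orchestrator {
    fn run(
        &self,
        policy: &ResiliencePolicy,
        op: &mut dyn FnMut() -> Result<u32, String>,
    ) -> Result<u32, String>;
}

/// Hypothetical use case: chooses the policy, never touches retry mechanics.
pub fn charge(
    orchestrator: &dyn Orchestrator,
    op: &mut dyn FnMut() -> Result<u32, String>,
) -> Result<u32, String> {
    orchestrator.run(&ResiliencePolicy::Retry { max_attempts: 3 }, op)
}

// --- Infrastructure layer: the only place that knows how to retry ---
pub struct SimpleOrchestrator;

impl Orchestrator for SimpleOrchestrator {
    fn run(
        &self,
        policy: &ResiliencePolicy,
        op: &mut dyn FnMut() -> Result<u32, String>,
    ) -> Result<u32, String> {
        let attempts = match policy {
            ResiliencePolicy::None => 1,
            ResiliencePolicy::Retry { max_attempts } => (*max_attempts).max(1),
        };
        // Call once, then repeat only while the previous attempt failed.
        let mut last = op();
        for _ in 1..attempts {
            if last.is_ok() {
                break;
            }
            last = op();
        }
        last
    }
}
```

Swapping `SimpleOrchestrator` for another implementation requires no change to `ResiliencePolicy` or `charge`, which is the property the layering is meant to guarantee.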

---

## Implementation Phases

### Phase 1: Domain Layer Resilience Contracts ✅

**Goal**: Define resilience contracts in the domain layer without implementation details.

**Deliverables**:
- `ResilientOperation` trait for operations that need resilience
- `ResiliencePolicy` enum for declaring resilience requirements
- `ResilienceError` types for domain-level error handling

**Example**:
```rust
// Domain layer - pure contracts
#[derive(ResiliencePolicy)]
pub enum PaymentResiliencePolicy {
    ProcessPayment {
        max_retries: u32,
        timeout_seconds: u64,
    },
    CheckStatus {
        circuit_breaker: bool,
    },
}

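// `async_trait` is needed for `Arc<dyn PaymentService>` (Phase 2):
// a trait with a native `async fn` cannot be used as a trait object.
#[async_trait::async_trait]
pub trait PaymentService: Send + Sync {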
    async fn process_payment(
        &self,
        payment: Payment,
        policy: ResiliencePolicy,
    ) -> Result<PaymentResult, PaymentError>;
}
```

### Phase 2: Application Layer Orchestration 🚧

**Goal**: The application layer orchestrates resilience; the domain layer declares policies but never sees how they are implemented.

**Deliverables**:
- `ResilienceOrchestrator` trait for wiring policies to implementations
- Application-level resilience configuration
- Policy-to-implementation mapping

**Example**:
```rust
// Application layer - orchestration
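pub struct PaymentUseCase<O: ResilienceOrchestrator> {
    payment_service: Arc<dyn PaymentService>,
    // Generic rather than `dyn`: `execute_with_policy` has generic parameters,
    // so `ResilienceOrchestrator` cannot be used as a trait object.
    resilience_orchestrator: Arc<O>,
}

impl<O: ResilienceOrchestrator> PaymentUseCase<O> {
    pub async fn process_payment(&self, payment: Payment) -> Result<PaymentResult, UseCaseError> {
        let policy = ResiliencePolicy::Retry {
            max_attempts: 3,
            backoff: BackoffStrategy::default(), // assumes `BackoffStrategy: Default`
        };

        // Orchestrator handles the resilience implementation
        self.resilience_orchestrator
            .execute_with_policy(policy.clone(), || {
                // Clone per attempt so the closure can run again on retry
                // (assumes `Payment: Clone`).
                let (payment, policy) = (payment.clone(), policy.clone());
                async move { self.payment_service.process_payment(payment, policy).await }
            })
            .await
            .map_err(UseCaseError::from) // assumes `UseCaseError: From<ResilienceError>`
    }
}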
```

### Phase 3: Infrastructure Layer Implementation ✅

**Goal**: Infrastructure provides concrete resilience implementations.

**Deliverables**:
- `RetryExecutor` implementations
- `CircuitBreaker` implementations
- `RateLimiter` implementations
- `ResilienceOrchestrator` concrete implementation
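
As an illustration of the kind of state a `CircuitBreaker` implementation has to track, here is a minimal synchronous sketch built on the standard library. It is not the crate's implementation: the field names, the `CallError` type, and the `&mut self` receiver are assumptions made for the example (a shared, async breaker would need interior mutability).

```rust
use std::time::{Duration, Instant};

/// Error returned by the sketch: either the breaker rejected the call,
/// or the wrapped operation itself failed.
#[derive(Debug, PartialEq)]
pub enum CallError<E> {
    CircuitOpen,
    Inner(E),
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    recovery_timeout: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
        Self {
            failure_threshold,
            recovery_timeout,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Open means: tripped, and the recovery timeout has not elapsed yet.
    pub fn is_open(&self) -> bool {
        match self.opened_at {
            Some(since) => since.elapsed() < self.recovery_timeout,
            None => false,
        }
    }

    pub fn call<T, E>(&mut self, op: impl FnOnce() -> Result<T, E>) -> Result<T, CallError<E>> {
        if self.is_open() {
            return Err(CallError::CircuitOpen);
        }
        // Closed, or half-open (timeout elapsed): let one call through.
        match op() {
            Ok(value) => {
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(err) => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.opened_at = Some(Instant::now());
                }
                Err(CallError::Inner(err))
            }
        }
    }
}
```

The two constructor arguments mirror the `failure_threshold` and `recovery_timeout` fields of the `CircuitBreaker` policy variant, so the orchestrator could build a breaker directly from a declared policy.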

### Phase 4: Macro Refactoring (Current Target)

**Goal**: Update macros to use the new architectural pattern.

**Deliverables**:
- Backward-compatible macro API
- New architectural macro variants
- Migration guide for existing code

**Migration Path**:
```rust
// Old way (infrastructure injection - VIOLATION)
#[retry(max_retries = 3)]
async fn business_logic() -> Result<(), Error> { /* ... */ }

// New way (architecturally clean)
#[resilient(policy = "retry(max_retries = 3)")]
async fn business_logic() -> Result<(), Error> { /* ... */ }

// Or at application layer
let result = orchestrator.execute_with_policy(
    ResiliencePolicy::Retry {
        max_attempts: 3,
        backoff: BackoffStrategy::default(), // assumes `BackoffStrategy: Default`
    },
    || business_logic(),
).await;
```

---

## Detailed Implementation Plan

### 1. Domain Layer Contracts (Week 1-2)

**Create `resilience` module in domain layer:**

```rust
// crates/allframe-core/src/domain/resilience.rs

/// Resilience policies that domain can declare without knowing implementation
#[derive(Clone, Debug)]
pub enum ResiliencePolicy {
    None,
    Retry {
        max_attempts: u32,
        backoff: BackoffStrategy,
    },
    CircuitBreaker {
        failure_threshold: u32,
        recovery_timeout: Duration,
    },
    RateLimit {
        requests_per_second: u32,
        burst_capacity: u32,
    },
    Timeout {
        duration: Duration,
    },
}

/// Domain-level resilience error; infrastructure failures convert into it via `From`
#[derive(thiserror::Error, Debug)]
pub enum ResilienceDomainError {
    #[error("Operation timed out")]
    Timeout,

    #[error("Operation failed after {attempts} attempts")]
    RetryExhausted { attempts: u32 },

    #[error("Circuit breaker is open")]
    CircuitOpen,

    #[error("Rate limit exceeded")]
    RateLimited,

    #[error("Infrastructure error: {0}")]
    Infrastructure(#[from] Box<dyn std::error::Error + Send + Sync>),
}

/// Trait for operations that declare resilience requirements
pub trait ResilientOperation<T, E> {
    fn resilience_policy(&self) -> ResiliencePolicy;
    async fn execute(&self) -> Result<T, E>;
}
```
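
The `Retry` variant refers to a `BackoffStrategy` type that is not defined in this plan. One possible shape, offered only as a sketch, is an enum that maps a retry number to a delay. The variants `Fixed` and `Exponential` and the method `delay_for` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical definition: the plan names the type but does not specify it.
#[derive(Clone, Debug, PartialEq)]
pub enum BackoffStrategy {
    /// Wait the same amount before every retry.
    Fixed { delay: Duration },
    /// Wait `base * factor^(attempt - 1)`, capped at `max`.
    Exponential { base: Duration, factor: u32, max: Duration },
}

impl BackoffStrategy {
    /// Delay before retry number `attempt` (1-based: 1 is the first retry).
    pub fn delay_for(&self, attempt: u32) -> Duration {
        match self {
            BackoffStrategy::Fixed { delay } => *delay,
            BackoffStrategy::Exponential { base, factor, max } => {
                let exponent = attempt.saturating_sub(1);
                // Saturate instead of overflowing for large attempt numbers.
                let multiplier = factor.checked_pow(exponent).unwrap_or(u32::MAX);
                base.checked_mul(multiplier).unwrap_or(*max).min(*max)
            }
        }
    }
}
```

Because the type holds only plain data, it can live in the domain layer without pulling in timers or an async runtime; sleeping for the computed delay stays an infrastructure concern.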

### 2. Application Layer Orchestration (Week 3-4)

**Create resilience orchestrator:**

```rust
// crates/allframe-core/src/application/resilience.rs

/// Orchestrates resilience policies across infrastructure implementations
#[async_trait::async_trait]
pub trait ResilienceOrchestrator: Send + Sync {
    async fn execute_with_policy<T, F, Fut, E>(
        &self,
        policy: ResiliencePolicy,
        operation: F,
    ) -> Result<T, ResilienceError>
    where
        // `FnMut` rather than `FnOnce`: a retry policy must be able to call the operation again.
        F: FnMut() -> Fut + Send,
        Fut: Future<Output = Result<T, E>> + Send,
        T: Send,
        E: Into<ResilienceError> + Send;

    fn get_circuit_breaker(&self, name: &str) -> Option<&CircuitBreaker>;
    fn get_rate_limiter(&self, name: &str) -> Option<&RateLimiter>;
}

/// Default implementation using infrastructure layer
pub struct DefaultResilienceOrchestrator {
    retry_executor: Arc<RetryExecutor>,
    circuit_breakers: HashMap<String, CircuitBreaker>,
    rate_limiters: HashMap<String, RateLimiter>,
}
```
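
Because a retry policy re-invokes the operation, the closure handed to an orchestrator has to be callable more than once. The following synchronous sketch shows that requirement in isolation; `retry_sync` and `RetryExhausted` are hypothetical names, and backoff delays are left out.

```rust
/// Returned when every attempt failed; keeps the last error for diagnostics.
#[derive(Debug, PartialEq)]
pub struct RetryExhausted<E> {
    pub attempts: u32,
    pub last_error: E,
}

/// Calls `operation` until it succeeds or `max_attempts` calls have failed.
/// `FnMut` is required: an `FnOnce` closure could only be invoked once.
pub fn retry_sync<T, E, F>(max_attempts: u32, mut operation: F) -> Result<T, RetryExhausted<E>>
where
    F: FnMut() -> Result<T, E>,
{
    // Treat 0 as "try once" so the operation always runs.
    let max_attempts = max_attempts.max(1);
    let mut attempts = 0;
    loop {
        attempts += 1;
        match operation() {
            Ok(value) => return Ok(value),
            Err(err) if attempts >= max_attempts => {
                return Err(RetryExhausted { attempts, last_error: err });
            }
            Err(_) => continue,
        }
    }
}
```

The `RetryExhausted { attempts, .. }` result corresponds to the `RetryExhausted { attempts }` variant of `ResilienceDomainError` above.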

### 3. Infrastructure Layer Implementation (Week 5-6)

**Refactor existing infrastructure to implement orchestrator:**

```rust
// crates/allframe-core/src/infrastructure/resilience.rs

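// The trait is declared with `#[async_trait]`, so the impl needs the attribute too.
#[async_trait::async_trait]
impl ResilienceOrchestrator for DefaultResilienceOrchestrator {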
    async fn execute_with_policy<T, F, Fut, E>(
        &self,
        policy: ResiliencePolicy,
        operation: F,
    ) -> Result<T, ResilienceError>
    where
        // ... trait bounds
    {
        match policy {
            ResiliencePolicy::None => operation().await.map_err(Into::into),
            ResiliencePolicy::Retry { max_attempts, backoff } => {
                self.retry_executor
                    .execute_with_config(
                        RetryConfig::new(max_attempts).with_backoff(backoff),
                        operation,
                    )
                    .await
            }
            ResiliencePolicy::CircuitBreaker { failure_threshold, recovery_timeout } => {
                let cb = self.get_or_create_circuit_breaker("default", failure_threshold, recovery_timeout);
                cb.call(operation).await
            }
            // ... other policy implementations
        }
    }
}
```
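
The `RateLimit` arm is elided above. One common way to honor `requests_per_second` and `burst_capacity` is a token bucket; the sketch below is an assumption about how such an arm might be backed, not the crate's `RateLimiter`. `TokenBucket` and `try_acquire` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic in tests.

```rust
use std::time::Instant;

pub struct TokenBucket {
    requests_per_second: u32,
    burst_capacity: u32,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `burst_capacity` calls is allowed.
    pub fn new(requests_per_second: u32, burst_capacity: u32, now: Instant) -> Self {
        Self {
            requests_per_second,
            burst_capacity,
            tokens: f64::from(burst_capacity),
            last_refill: now,
        }
    }

    /// Refills according to elapsed time, then takes one token if available.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        let refilled = self.tokens + elapsed * f64::from(self.requests_per_second);
        self.tokens = refilled.min(f64::from(self.burst_capacity));
        // Never move the refill timestamp backwards.
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A rejected `try_acquire` would map to the `RateLimited` error variant defined in the domain layer.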

### 4. Macro Backward Compatibility (Week 7-8)

**Create new macros that use architectural pattern:**

```rust
// crates/allframe-macros/src/resilience.rs

/// New architectural macro - injects orchestration at application layer
#[proc_macro_attribute]
pub fn resilient(attr: TokenStream, item: TokenStream) -> TokenStream {
    // Parse policy from attribute
    // Generate code that uses ResilienceOrchestrator
    // Maintains domain layer purity
}

/// Keep old macros for backward compatibility with deprecation warnings
#[proc_macro_attribute]
pub fn retry(attr: TokenStream, item: TokenStream) -> TokenStream {
    // Add deprecation warning
    // Delegate to new resilient macro with retry policy
}
```
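
The "parse policy from attribute" step can be illustrated outside of a proc macro. The sketch below parses the string form used in the migration path, `"retry(max_retries = 3)"`, with plain string handling. `ParsedPolicy` and `parse_policy` are hypothetical names, only the retry form is recognized, and a real macro would more likely work on the token stream than on a string.

```rust
#[derive(Debug, PartialEq)]
pub enum ParsedPolicy {
    Retry { max_retries: u32 },
}

/// Parses `name(key = value)`; only `retry(max_retries = N)` is supported here.
pub fn parse_policy(input: &str) -> Result<ParsedPolicy, String> {
    let input = input.trim();
    let open = input
        .find('(')
        .ok_or_else(|| format!("expected `name(...)`, got `{}`", input))?;
    let name = input[..open].trim();
    let body = input[open + 1..]
        .strip_suffix(')')
        .ok_or_else(|| String::from("missing closing `)`"))?;
    let (key, value) = body
        .split_once('=')
        .ok_or_else(|| String::from("expected `key = value`"))?;

    match (name, key.trim()) {
        ("retry", "max_retries") => {
            let max_retries = value
                .trim()
                .parse::<u32>()
                .map_err(|e| format!("invalid max_retries: {}", e))?;
            Ok(ParsedPolicy::Retry { max_retries })
        }
        (other, unknown_key) => Err(format!(
            "unsupported policy `{}` with key `{}`",
            other, unknown_key
        )),
    }
}
```

Returning a descriptive `Err` matters here because a macro would surface it as a compile error at the attribute site.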

---

## Testing Strategy

### 1. Architectural Compliance Tests
- **Domain Layer Isolation**: Ensure domain layer has no infrastructure dependencies
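- **Dependency Direction**: Verify that compile-time dependencies point only inward (infrastructure and application code may depend on the domain, never the reverse)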
- **Mock Infrastructure**: Test domain logic with mocked infrastructure
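
A test in this spirit might replace the orchestrator with a recording test double, so that the logic under test runs without any retry or circuit-breaker machinery. The sketch is synchronous and uses hypothetical names throughout (`Orchestrator`, `RecordingOrchestrator`, `total_with_tax`); the `Retry` variant is reduced to a single field.

```rust
use std::cell::RefCell;

#[derive(Clone, Debug, PartialEq)]
pub enum ResiliencePolicy {
    Retry { max_attempts: u32 },
}

pub trait Orchestrator {
    fn execute(
        &self,
        policy: ResiliencePolicy,
        op: &mut dyn FnMut() -> Result<u64, String>,
    ) -> Result<u64, String>;
}

/// Test double: records each requested policy and runs the operation once.
#[derive(Default)]
pub struct RecordingOrchestrator {
    pub seen: RefCell<Vec<ResiliencePolicy>>,
}

impl Orchestrator for RecordingOrchestrator {
    fn execute(
        &self,
        policy: ResiliencePolicy,
        op: &mut dyn FnMut() -> Result<u64, String>,
    ) -> Result<u64, String> {
        self.seen.borrow_mut().push(policy);
        op()
    }
}

/// Code under test: adds 20% tax and asks for up to three attempts.
pub fn total_with_tax(orchestrator: &dyn Orchestrator, net_cents: u64) -> Result<u64, String> {
    orchestrator.execute(ResiliencePolicy::Retry { max_attempts: 3 }, &mut || {
        Ok::<u64, String>(net_cents + net_cents / 5)
    })
}
```

The test can then assert both the business result and the policy that was requested, without any timing or failure injection.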

### 2. Backward Compatibility Tests
- **Existing Code**: All existing `#[retry]` usage continues to work
- **Deprecation Warnings**: Old macros show warnings pointing to new patterns
- **Performance**: The new architecture stays within the overhead budget (<5%) set under Success Metrics

### 3. Integration Tests
- **End-to-End**: Full request flow with resilience policies
- **Failure Scenarios**: Circuit breaker opens, retries exhausted
- **Policy Configuration**: Runtime policy changes work correctly

---

## Migration Guide

### For Existing Users

**Immediate (No Changes Required):**
```rust
// This continues to work unchanged
#[retry(max_retries = 3)]
async fn my_function() -> Result<(), Error> { /* ... */ }
```

**Recommended (Architecturally Clean):**
```rust
// New pattern - resilience declared at domain layer
#[derive(ResiliencePolicy)]
struct MyPolicy {
    retry: RetryPolicy,
    circuit_breaker: CircuitBreakerPolicy,
}

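// `MyOutput` and `MyError` are placeholders for the operation's own types.
impl ResilientOperation<MyOutput, MyError> for MyOperation {
    fn resilience_policy(&self) -> ResiliencePolicy {
        ResiliencePolicy::Retry {
            max_attempts: 3,
            backoff: BackoffStrategy::default(), // assumes `BackoffStrategy: Default`
        }
    }

    async fn execute(&self) -> Result<MyOutput, MyError> {
        // Business logic only; no retry or circuit-breaker code here.
        todo!()
    }
}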

// Application layer orchestration
let result = orchestrator.execute_with_policy(
    operation.resilience_policy(),
    || operation.execute()
).await;
```

---

## Success Metrics

### 1. Architectural Compliance
- ✅ Domain layer has zero infrastructure dependencies
- ✅ Dependencies flow inward only
- ✅ Infrastructure can be swapped without domain changes

### 2. Performance
- ✅ No performance degradation (<5% overhead)
- ✅ Memory usage remains stable
- ✅ Compilation time impact minimal
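
The 5% budget can be checked with simple arithmetic: overhead is `(wrapped - baseline) / baseline * 100`. For example, a 100 ms baseline and a 104 ms wrapped run give 4% overhead, which is within budget. The helpers below are a hypothetical sketch of that check; a real benchmark would need warm-up runs and repeated sampling, which are not shown.

```rust
use std::time::{Duration, Instant};

/// Wall-clock time for `iterations` calls of `f`.
pub fn time_it(iterations: u32, mut f: impl FnMut()) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed()
}

/// Relative overhead of `wrapped` over `baseline`, in percent.
/// A zero baseline yields NaN or infinity, which never passes the budget check.
pub fn overhead_percent(baseline: Duration, wrapped: Duration) -> f64 {
    let base = baseline.as_secs_f64();
    (wrapped.as_secs_f64() - base) / base * 100.0
}

/// True when the measured overhead is strictly below `budget_percent`.
pub fn within_budget(baseline: Duration, wrapped: Duration, budget_percent: f64) -> bool {
    overhead_percent(baseline, wrapped) < budget_percent
}
```

In use, `time_it` would be run once on the direct call and once on the same call routed through the orchestrator, and the two durations passed to `within_budget` with a budget of `5.0`.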

### 3. Developer Experience
- ✅ Backward compatibility maintained
- ✅ Clear migration path provided
- ✅ Better error messages and debugging

### 4. Testability
- ✅ Domain logic can be tested without infrastructure
- ✅ Infrastructure can be tested in isolation
- ✅ Integration tests cover full resilience flows

---

## Risk Assessment

### High Risk
- **Breaking Changes**: Must maintain 100% backward compatibility
- **Performance Impact**: New abstraction layer could add overhead
- **Complexity**: More layers increase cognitive load

### Mitigation
- **Gradual Migration**: Old macros continue working with warnings
- **Performance Benchmarking**: Measure and optimize abstraction overhead
- **Documentation**: Comprehensive guides for new patterns

---

## Timeline

- **Week 1-2**: Domain layer contracts ✅
- **Week 3-4**: Application layer orchestration 🚧
- **Week 5-6**: Infrastructure implementation ✅
- **Week 7-8**: Macro refactoring and testing
- **Week 9-10**: Integration testing and documentation
- **Week 11-12**: Performance optimization and final validation

**Target Completion**: End of Q1 2025
**Risk Level**: Medium (architectural change with backward compatibility requirements)

---

**Status**: Phase 2 in progress - Application layer orchestration implementation.