theater 0.3.4

A WebAssembly actor system for AI agents
# Actor Runtime Refactoring: Before & After Analysis

## The Problem with the Current Implementation

### State Management Complexity
The current implementation manages state through **7+ mutable variables** with complex interactions:

```rust
let mut actor_instance: Option<Arc<RwLock<ActorInstance>>> = None;
let mut metrics: Option<Arc<RwLock<MetricsCollector>>> = None;
let mut handler_tasks: Vec<JoinHandle<()>> = Vec::new();
let mut current_operation: Option<JoinHandle<()>> = None;
let mut shutdown_requested = false;
let mut shutdown_response_tx: Option<oneshot::Sender<...>> = None;
let mut current_status = "Starting".to_string();
```

**Problems:**
- No clear indication of valid states
- Easy to end up in an inconsistent state (e.g., `actor_instance` is `Some` but `metrics` is `None`)
- `Option<T>` unwrapping everywhere creates boilerplate and panic risks
- Boolean flags (`shutdown_requested`) don't compose well

### The Giant Select Loop
The current `start()` method contains a ~400-line `tokio::select!` with 8+ branches, each of which implicitly handles several states:

```rust
Some(op) = operation_rx.recv(), if actor_instance.is_some() && current_operation.is_none() && !*paused.read().await => {
    // Can only receive operations if:
    // - Setup is complete (actor_instance exists)
    // - No operation is running
    // - Not paused
    // This logic is buried in the guard!
}
```

**Problems:**
- State transition logic is scattered across branches
- Guards on select branches hide important logic
- Hard to see which messages are valid in which states
- Testing individual states is nearly impossible

### Unclear State Transitions
Try answering these questions from the current code:
- Can I pause during startup? (No, but you have to read the control handler to know)
- What happens if shutdown is requested during an operation? (Waits for completion, but this is implicit)
- Can info requests work during startup? (Some can, some can't - depends on the request)

The answers exist in the code, but they're **not obvious**.

## The Solution: Explicit State Machine

### State Definition
```rust
enum ActorState {
    Starting {
        setup_task: JoinHandle<Result<SetupComplete, ActorError>>,
        status_rx: Receiver<String>,
        current_status: String,
        pending_shutdown: Option<oneshot::Sender<Result<(), ActorError>>>,
    },
    Idle {
        resources: ActorResources,
    },
    Processing {
        resources: ActorResources,
        current_operation: JoinHandle<Result<Vec<u8>, ActorError>>,
        operation_name: String,
        pending_shutdown: Option<oneshot::Sender<Result<(), ActorError>>>,
    },
    Paused {
        resources: ActorResources,
    },
    ShuttingDown,
}
```

### Benefits

#### 1. **Impossible States Are Unrepresentable**
You can't have `actor_instance = None` while in the `Processing` state, because `Processing` contains `resources: ActorResources` which includes the instance.

The compiler enforces this invariant, so no runtime check is needed!
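As a minimal sketch of the idea, assume a hypothetical `ActorResources` that bundles the instance and the metrics collector. The placeholder types, field names, and the `resources()` accessor below are illustrative assumptions, not taken from the Theater source:

```rust
use std::sync::{Arc, RwLock};

// Placeholder types standing in for the real runtime types (assumptions).
#[derive(Debug, Default)]
pub struct ActorInstance;

#[derive(Debug, Default)]
pub struct MetricsCollector;

/// Everything a fully initialized actor owns. Neither field is an `Option`,
/// so "instance without metrics" cannot be constructed.
#[derive(Debug, Clone)]
pub struct ActorResources {
    pub instance: Arc<RwLock<ActorInstance>>,
    pub metrics: Arc<RwLock<MetricsCollector>>,
}

impl ActorResources {
    pub fn new(instance: ActorInstance, metrics: MetricsCollector) -> Self {
        Self {
            instance: Arc::new(RwLock::new(instance)),
            metrics: Arc::new(RwLock::new(metrics)),
        }
    }
}

/// Simplified state enum: only the variants needed for the illustration.
#[derive(Debug)]
pub enum ActorState {
    Starting,
    Idle { resources: ActorResources },
    Processing { resources: ActorResources, operation_name: String },
}

impl ActorState {
    /// Resources are reachable only from states that are guaranteed to hold them.
    pub fn resources(&self) -> Option<&ActorResources> {
        match self {
            ActorState::Starting => None,
            ActorState::Idle { resources }
            | ActorState::Processing { resources, .. } => Some(resources),
        }
    }
}
```

The only remaining `Option` sits at the boundary between "still starting" and "set up"; inside `Idle` or `Processing` there is nothing left to unwrap.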

#### 2. **Clear State Transitions**
```rust
loop {
    let next_state = match &mut self.state {
        ActorState::Starting { .. } => self.handle_starting_state().await,
        ActorState::Idle { .. } => self.handle_idle_state().await,
        ActorState::Processing { .. } => self.handle_processing_state().await,
        ActorState::Paused { .. } => self.handle_paused_state().await,
        ActorState::ShuttingDown => break,
    };
    
    match next_state {
        StateTransition::Continue(new_state) => self.state = new_state,
        StateTransition::Shutdown => {
            self.transition_to_shutdown().await;
            break;
        }
        StateTransition::Error(error) => {
            self.notify_error(error).await;
            self.transition_to_shutdown().await;
            break;
        }
    }
}
```

**Every state transition is explicit and visible!**
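To make the transition rules concrete, here is a synchronous sketch that reduces the state machine to a pure function. The `Event` enum, the `shutdown_pending` flags, and the rule that an invalid event leaves the state unchanged are simplifying assumptions for illustration; the real handlers are async and carry resources and reply channels:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum State {
    Starting { shutdown_pending: bool },
    Idle,
    Processing { shutdown_pending: bool },
    Paused,
    ShuttingDown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    SetupComplete,
    OperationReceived,
    OperationFinished,
    Pause,
    Resume,
    Shutdown,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Transition {
    Continue(State),
    Shutdown,
}

/// One table answers "which events are valid in which state?".
pub fn step(state: State, event: Event) -> Transition {
    match (state, event) {
        // Startup: shutdown is deferred until setup finishes; pause is not accepted.
        (State::Starting { shutdown_pending: true }, Event::SetupComplete) => Transition::Shutdown,
        (State::Starting { .. }, Event::SetupComplete) => Transition::Continue(State::Idle),
        (State::Starting { .. }, Event::Shutdown) => {
            Transition::Continue(State::Starting { shutdown_pending: true })
        }
        // Idle: the only state that accepts new operations.
        (State::Idle, Event::OperationReceived) => {
            Transition::Continue(State::Processing { shutdown_pending: false })
        }
        (State::Idle, Event::Pause) => Transition::Continue(State::Paused),
        (State::Idle, Event::Shutdown) | (State::Paused, Event::Shutdown) => Transition::Shutdown,
        (State::Paused, Event::Resume) => Transition::Continue(State::Idle),
        // Processing: shutdown waits for the running operation to finish.
        (State::Processing { shutdown_pending: true }, Event::OperationFinished) => {
            Transition::Shutdown
        }
        (State::Processing { .. }, Event::OperationFinished) => Transition::Continue(State::Idle),
        (State::Processing { .. }, Event::Shutdown) => {
            Transition::Continue(State::Processing { shutdown_pending: true })
        }
        (State::ShuttingDown, _) => Transition::Shutdown,
        // Assumption: any other event is invalid here and leaves the state unchanged.
        (state, _) => Transition::Continue(state),
    }
}
```

Read against the questions raised earlier: pausing during startup is a no-op, and a shutdown requested mid-operation only sets `shutdown_pending`, which takes effect on `OperationFinished`.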

#### 3. **Each State Handler is Focused**
Instead of one giant select handling all states:

```rust
async fn handle_idle_state(&mut self) -> StateTransition {
    let resources = /* extract from state */;
    
    tokio::select! {
        Some(op) = self.operation_rx.recv() => {
            // Start operation
            StateTransition::Continue(ActorState::Processing { ... })
        }
        Some(control) = self.control_rx.recv() => {
            match control {
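                ActorControl::Pause { response_tx } => {
                    // Acknowledge the request before transitioning (assumes the
                    // reply type is `Result<(), ActorError>`, as for `Shutdown`).
                    let _ = response_tx.send(Ok(()));
                    StateTransition::Continue(ActorState::Paused { resources })
                }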
                // ...
            }
        }
        // ...
    }
}
```

**Much easier to understand!** Each handler only deals with messages relevant to that state.

#### 4. **Easier Testing**
You can test individual state handlers:

```rust
#[tokio::test]
async fn test_pause_during_idle() {
    let mut machine = create_test_machine(ActorState::Idle { ... });
    
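    // `Pause` carries a reply channel, so the test has to supply one.
    let (response_tx, _response_rx) = oneshot::channel();
    send_control_message(&machine, ActorControl::Pause { response_tx });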
    
    let transition = machine.handle_idle_state().await;
    
    assert!(matches!(transition, StateTransition::Continue(ActorState::Paused { .. })));
}
```

#### 5. **Better Error Handling**
Errors are handled at the state machine level:

```rust
StateTransition::Error(error) => {
    self.notify_error(error).await;
    self.transition_to_shutdown().await;
    break;
}
```

No more scattered error handling!
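One way to keep handlers free of error plumbing is to let them return a `Result` and convert it into a transition in a single place. The sketch below assumes simplified `ActorError`, `ActorState`, and `StateTransition` types plus hypothetical `finish_operation` and `apply` helpers; none of these signatures come from the Theater source:

```rust
#[derive(Debug, PartialEq)]
pub enum ActorError {
    OperationFailed(String),
}

#[derive(Debug, PartialEq)]
pub enum ActorState {
    Idle,
}

#[derive(Debug, PartialEq)]
pub enum StateTransition {
    Continue(ActorState),
    Shutdown,
    Error(ActorError),
}

// Single conversion point: any handler error becomes `StateTransition::Error`.
impl From<Result<ActorState, ActorError>> for StateTransition {
    fn from(result: Result<ActorState, ActorError>) -> Self {
        match result {
            Ok(state) => StateTransition::Continue(state),
            Err(error) => StateTransition::Error(error),
        }
    }
}

// Handler logic can use `?` freely without knowing how errors are reported.
fn finish_operation(outcome: Result<Vec<u8>, ActorError>) -> Result<ActorState, ActorError> {
    let _output = outcome?;
    Ok(ActorState::Idle)
}

pub fn handle_processing_state(outcome: Result<Vec<u8>, ActorError>) -> StateTransition {
    finish_operation(outcome).into()
}

/// The run loop reacts to errors in exactly one place.
/// Returns `true` if the loop should keep running.
pub fn apply(transition: StateTransition, notifications: &mut Vec<String>) -> bool {
    match transition {
        StateTransition::Continue(_) => true,
        StateTransition::Shutdown => false,
        StateTransition::Error(error) => {
            // Stand-in for `notify_error`: record the error, then stop.
            notifications.push(format!("{:?}", error));
            false
        }
    }
}
```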

## Side-by-Side Comparison

### Handling Shutdown During Operation

**Before (implicit):**
```rust
ActorControl::Shutdown { response_tx } => {
    if setup_task.is_some() {
        shutdown_requested = true;
        shutdown_response_tx = Some(response_tx);
    } else if current_operation.is_some() {
        shutdown_requested = true;
        shutdown_response_tx = Some(response_tx);
    } else {
        let _ = response_tx.send(Ok(()));
        break;
    }
}
```

**After (explicit):**
```rust
// In handle_processing_state()
ActorControl::Shutdown { response_tx } => {
    info!("Shutdown requested during operation - will complete after operation");
    StateTransition::Continue(ActorState::Processing {
        resources,
        current_operation,
        operation_name,
        pending_shutdown: Some(response_tx), // Clear intent!
    })
}
```

### Handling Operation Completion

**Before:**
```rust
_ = async {
    match current_operation.as_mut() {
        Some(task) => task.await,
        None => std::future::pending().await,
    }
} => {
    info!("Operation completed");
    current_operation = None;

    // Check if shutdown was requested and no more operations are running
    if shutdown_requested {
        if let Some(response_tx) = shutdown_response_tx.take() {
            let _ = response_tx.send(Ok(()));
        }
        break;
    }
}
```

**After:**
```rust
// In handle_processing_state()
result = current_operation => {
    info!("Operation '{}' completed", operation_name);
    
    // If shutdown was pending, do it now
    if let Some(response_tx) = pending_shutdown {
        let _ = response_tx.send(Ok(()));
        return StateTransition::Shutdown;
    }

    StateTransition::Continue(ActorState::Idle { resources })
}
```
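The deferred-shutdown handshake can be exercised without an async runtime. The following sketch uses `std::sync::mpsc` as a stand-in for the `oneshot` channel and a hypothetical `Processing` struct with `on_shutdown` and `on_complete` methods; the names and the `String` error type are assumptions made for illustration:

```rust
use std::sync::mpsc::Sender;

// Stand-in for `oneshot::Sender<Result<(), ActorError>>`.
type ShutdownReply = Sender<Result<(), String>>;

#[derive(Debug)]
pub enum Next {
    Idle,
    Shutdown,
}

pub struct Processing {
    pub operation_name: String,
    pub pending_shutdown: Option<ShutdownReply>,
}

impl Processing {
    /// A shutdown request during an operation is stored, not acted on.
    /// (A second request would replace the first reply channel in this sketch.)
    pub fn on_shutdown(mut self, reply: ShutdownReply) -> Self {
        self.pending_shutdown = Some(reply);
        self
    }

    /// When the operation completes, a stored request is acknowledged
    /// and the machine shuts down; otherwise it returns to idle.
    pub fn on_complete(self) -> Next {
        match self.pending_shutdown {
            Some(reply) => {
                // The requester may have gone away; ignore a failed send.
                let _ = reply.send(Ok(()));
                Next::Shutdown
            }
            None => Next::Idle,
        }
    }
}
```

Because `on_complete` consumes `self`, the pending reply channel cannot be used twice or left behind after the state is gone.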

## Metrics

### Lines of Code
- **Current `start()` method:** ~400 lines
- **Refactored:**
  - `run()` loop: ~40 lines
  - `handle_starting_state()`: ~80 lines
  - `handle_idle_state()`: ~60 lines
  - `handle_processing_state()`: ~70 lines
  - `handle_paused_state()`: ~40 lines
  - **Total:** ~290 lines (down from ~400), and **much more readable**

### Cognitive Complexity
- **Current:** High - need to track 7+ variables and their interactions
- **Refactored:** Low - each state handler is self-contained

### Testability
- **Current:** Hard - need to mock the entire runtime to test specific scenarios
- **Refactored:** Easy - can test individual state handlers in isolation

## Migration Path

You don't have to do this all at once! Here's a suggested migration:

1. **Phase 1:** Create the new state enum and `ActorResources` struct
2. **Phase 2:** Extract one state handler (e.g., `handle_idle_state()`)
3. **Phase 3:** Gradually migrate other states
4. **Phase 4:** Replace the old implementation once all states are migrated
5. **Phase 5:** Add tests for individual state handlers
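During the early phases the old variables and the new enum have to coexist. One possible bridge, sketched here under assumptions, derives an `ActorState` from the legacy `Option` variables and rejects combinations that should never occur. The `state_from_legacy` function, the placeholder types, and the reduced set of variants are hypothetical:

```rust
use std::sync::{Arc, RwLock};

// Placeholder types standing in for the real runtime types (assumptions).
#[derive(Debug, Default)]
pub struct ActorInstance;

#[derive(Debug, Default)]
pub struct MetricsCollector;

#[derive(Debug)]
pub struct ActorResources {
    pub instance: Arc<RwLock<ActorInstance>>,
    pub metrics: Arc<RwLock<MetricsCollector>>,
}

/// Reduced state enum covering only what the bridge needs.
#[derive(Debug)]
pub enum ActorState {
    Starting,
    Idle { resources: ActorResources },
    Paused { resources: ActorResources },
}

/// Map the legacy variables onto an explicit state, surfacing
/// inconsistent combinations instead of silently carrying them along.
pub fn state_from_legacy(
    actor_instance: Option<Arc<RwLock<ActorInstance>>>,
    metrics: Option<Arc<RwLock<MetricsCollector>>>,
    paused: bool,
) -> Result<ActorState, String> {
    match (actor_instance, metrics) {
        // Setup has not finished yet.
        (None, None) => Ok(ActorState::Starting),
        // Setup finished: both halves exist, so they can be bundled.
        (Some(instance), Some(metrics)) => {
            let resources = ActorResources { instance, metrics };
            Ok(if paused {
                ActorState::Paused { resources }
            } else {
                ActorState::Idle { resources }
            })
        }
        // Exactly one of the two is set: the case the enum is meant to rule out.
        _ => Err("inconsistent legacy state: instance and metrics must be set together".to_string()),
    }
}
```

A bridge like this is throwaway code: once every state has its own handler (Phase 4), the legacy variables and the conversion can be deleted together.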

## Additional Benefits

### Documentation
The state machine *is* the documentation:
```rust
enum ActorState {
    Starting { /* ... */ },  // Actor is loading
    Idle { /* ... */ },      // Waiting for work
    Processing { /* ... */ }, // Executing operation
    Paused { /* ... */ },    // Paused by user
    ShuttingDown,            // Cleaning up
}
```

Anyone can understand the actor lifecycle at a glance!

### Debugging
State transitions are logged:
```
Actor abc123 state: Starting -> Idle
Actor abc123 state: Idle -> Processing (operation: calculate)
Actor abc123 state: Processing -> Idle
Actor abc123 state: Idle -> ShuttingDown
```

Much easier to debug than tracking boolean flags!
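Log lines in that shape fall out of the enum almost for free. The sketch below assumes a simplified `ActorState` without resources and hypothetical `name` and `describe_transition` helpers; a real implementation would pass the resulting string to its logging macro:

```rust
#[derive(Debug)]
pub enum ActorState {
    Starting,
    Idle,
    Processing { operation_name: String },
    Paused,
    ShuttingDown,
}

impl ActorState {
    /// Short, stable name for logs. The match has no wildcard arm,
    /// so a new variant must be given a name before the code compiles.
    pub fn name(&self) -> &'static str {
        match self {
            ActorState::Starting => "Starting",
            ActorState::Idle => "Idle",
            ActorState::Processing { .. } => "Processing",
            ActorState::Paused => "Paused",
            ActorState::ShuttingDown => "ShuttingDown",
        }
    }
}

/// Format one transition; the operation name is appended when entering `Processing`.
pub fn describe_transition(actor_id: &str, from: &ActorState, to: &ActorState) -> String {
    let mut line = format!("Actor {} state: {} -> {}", actor_id, from.name(), to.name());
    if let ActorState::Processing { operation_name } = to {
        line.push_str(&format!(" (operation: {})", operation_name));
    }
    line
}
```

Calling `describe_transition` from the one place where `self.state` is reassigned is enough to log every transition.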

### Future Extensions
Want to add a new state like "Suspended" or "Upgrading"? Just add it to the enum:

```rust
enum ActorState {
    // ... existing states ...
    Suspended {
        resources: ActorResources,
        snapshot: Snapshot,
    },
}
```

And implement `handle_suspended_state()`. As long as the `match` expressions over `ActorState` avoid a wildcard (`_`) arm, the compiler will flag every place that needs to handle the new variant!

## Conclusion

The explicit state machine:
- ✅ Makes impossible states unrepresentable
- ✅ Makes state transitions clear and explicit
- ✅ Reduces cognitive load
- ✅ Improves testability
- ✅ Centralizes error handling
- ✅ Is self-documenting
- ✅ Is easier to extend

**This is a high-impact refactoring that will pay dividends for the lifetime of the project.**