duroxide 0.1.27

# Proposal: Orchestration Instance Caching (Decoupled Replay Lifetime)

## Problem Statement

Today, the orchestration dispatcher follows this lifecycle for every message receive:

```
1. Fetch orchestration item (instance lock + history + messages)
2. Rehydrate: Load full history from storage
3. Replay: Execute orchestration from beginning to rebuild in-memory state
4. Process: Feed new messages, execute one turn
5. Persist: Commit history delta + new work items
6. Dehydrate: Release lock, discard in-memory orchestration state
7. Repeat from step 1 for next message
```

**The Problem**: When messages trickle in for an orchestration (not fast enough to batch, not slow enough to justify full rehydration), we pay the full CPU + I/O cost of replay for every single message. For orchestrations with long histories, this becomes very expensive.

### Cost Breakdown Per Message Today

| Step | I/O | CPU | Memory |
|------|-----|-----|--------|
| Fetch item | DB read (history + queue) | Minimal | Load history |
| Replay | None | O(history_len) - re-execute all awaits | Full orchestration state |
| Process | None | O(1) - one new event | Delta events |
| Persist | DB write | Minimal | None |
| Dehydrate | None | None | Free all |

For an orchestration with 1000 events in history receiving 10 messages 100ms apart:
- **Today**: 10 × full replay = 10 × O(1000) = O(10,000) replay operations
- **With caching**: 1 × full replay + 9 × incremental = O(1000) + 9 × O(1) ≈ O(1000) replay operations

## Proposed Solutions

### Option 1: In-Memory Instance Cache with Extended Lock

**Concept**: Keep hydrated orchestration instances in memory with an extended lock, allowing subsequent messages to be processed without replay.

```
┌─────────────────────────────────────────────────────────────────┐
│                    Orchestration Worker                          │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────────────┐│
│  │              In-Memory Instance Cache                        ││
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐            ││
│  │  │ instance-A  │ │ instance-B  │ │ instance-C  │            ││
│  │  │ - history   │ │ - history   │ │ - history   │            ││
│  │  │ - state     │ │ - state     │ │ - state     │            ││
│  │  │ - lock      │ │ - lock      │ │ - lock      │            ││
│  │  │ - last_msg  │ │ - last_msg  │ │ - last_msg  │            ││
│  │  └─────────────┘ └─────────────┘ └─────────────┘            ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
```

**Lifecycle**:
```
1. Fetch message for instance
2. Check cache: instance in cache?
   ├─ YES: Use cached state, skip replay
   └─ NO: Fetch history, replay, cache instance
3. Process message with cached state
4. Persist history delta
5. Update cache state
6. Start idle timer (configurable, e.g., 5-30 seconds)
7. If new message arrives before timer: goto step 3
8. If timer expires: evict from cache, release lock
```

**Configuration Options**:
```rust
pub struct InstanceCacheOptions {
    /// Enable instance caching
    pub enabled: bool,
    
    /// Max time to keep an instance in cache after last message
    /// Default: 10 seconds
    pub idle_timeout: Duration,
    
    /// Max instances to keep in cache per worker
    /// Uses LRU eviction when exceeded
    /// Default: 100
    pub max_cached_instances: usize,
    
    /// Max time to hold an instance lock (extended from normal lock timeout)
    /// Must be >= idle_timeout
    /// Default: 60 seconds
    pub extended_lock_timeout: Duration,
    
    /// Max history size (events) before forcing eviction
    /// Prevents memory bloat from very large histories
    /// Default: 10,000 events
    pub max_history_size: usize,
}
```

**Pros**:
- Significant CPU savings for trickling messages
- Transparent to orchestration code
- Configurable tradeoffs (memory vs CPU)
- Works with existing provider interface

**Cons**:
- Increased memory usage
- Longer lock hold times reduce parallelism across workers
- Complexity in cache eviction and lock management
- Risk of stale state if provider is modified externally

**Implementation Complexity**: Medium

---

### Option 2: Message Coalescing / Delayed Dequeue

**Concept**: Instead of processing messages immediately, wait briefly for more messages to arrive and batch them together.

```
Message arrives → Start coalesce timer (e.g., 100ms)
                         │
                         ▼
            ┌────────────────────────┐
            │  Coalesce Window       │
            │  - Collect messages    │
            │  - Max wait: 100ms     │
            │  - Max batch: 50       │
            └────────────────────────┘
                         │
                         ▼
              Process entire batch
              (single replay)
```

**Provider Changes**:
```rust
pub trait Provider {
    /// Fetch orchestration item with coalescing
    /// Waits up to `coalesce_timeout` for additional messages
    async fn fetch_orchestration_item_coalesced(
        &self,
        lock_timeout: Duration,
        poll_timeout: Duration,
        coalesce_timeout: Duration,
        max_batch_size: usize,
    ) -> Result<Option<(OrchestrationItem, String, u32)>, ProviderError>;
}
```

**Configuration**:
```rust
pub struct CoalesceOptions {
    /// Enable message coalescing
    pub enabled: bool,
    
    /// How long to wait for additional messages
    /// Default: 100ms
    pub coalesce_window: Duration,
    
    /// Maximum messages to batch
    /// Default: 50
    pub max_batch_size: usize,
    
    /// Only coalesce if history size exceeds this threshold
    /// Small orchestrations don't benefit from coalescing
    /// Default: 100 events
    pub min_history_for_coalesce: usize,
}
```

**Pros**:
- Simple conceptually
- No memory overhead between batches
- Works well for bursty workloads
- Minimal changes to existing architecture

**Cons**:
- Adds latency (coalesce window) to every message
- Doesn't help if messages are spaced > coalesce_window apart
- May hurt throughput for fast message streams (waiting when not needed)
- Tuning the window is workload-dependent

**Implementation Complexity**: Low

---

### Option 3: Incremental Replay with Checkpoints

**Concept**: Periodically snapshot the orchestration state so replay can start from the checkpoint instead of from the beginning.

```
History:  [E1][E2][E3][E4][E5][CP1][E6][E7][E8][E9][E10][CP2][E11][E12]
                            │                        │
                     Checkpoint 1              Checkpoint 2
                     (at event 5)              (at event 10)

To process E13:
  - Load checkpoint CP2 (state at E10)
  - Replay only E11, E12
  - Process E13
```

**Checkpoint Data**:
```rust
pub struct OrchestrationCheckpoint {
    /// Event ID this checkpoint was taken after
    pub after_event_id: u64,
    
    /// Serialized orchestration state
    /// This requires orchestrations to be serializable
    pub state: Vec<u8>,
    
    /// Pending actions at checkpoint time
    pub pending_actions: Vec<Action>,
    
    /// Completion map state
    pub completion_map: CompletionMapSnapshot,
    
    /// Checkpoint creation timestamp
    pub created_at: DateTime<Utc>,
}
```

**Provider Changes**:
```rust
pub trait Provider {
    /// Store a checkpoint for an instance
    async fn store_checkpoint(
        &self,
        instance: &str,
        execution_id: u64,
        checkpoint: OrchestrationCheckpoint,
    ) -> Result<(), ProviderError>;
    
    /// Load the latest checkpoint for an instance
    async fn load_checkpoint(
        &self,
        instance: &str,
        execution_id: u64,
    ) -> Result<Option<OrchestrationCheckpoint>, ProviderError>;
    
    /// Fetch orchestration item with checkpoint
    async fn fetch_orchestration_item_with_checkpoint(
        &self,
        lock_timeout: Duration,
        poll_timeout: Duration,
    ) -> Result<Option<(OrchestrationItem, Option<OrchestrationCheckpoint>, String, u32)>, ProviderError>;
}
```

**Checkpoint Strategy Options**:
```rust
pub enum CheckpointStrategy {
    /// Checkpoint every N events
    EveryNEvents(usize),
    
    /// Checkpoint when history size exceeds threshold
    OnHistorySize(usize),
    
    /// Checkpoint after specific event types (e.g., ActivityCompleted)
    OnEventType(Vec<EventKind>),
    
    /// Time-based checkpointing
    EveryDuration(Duration),
    
    /// Adaptive: checkpoint when replay cost exceeds threshold
    Adaptive { max_replay_events: usize },
}
```

**Pros**:
- Bounds replay cost regardless of history length
- Works well for very long-running orchestrations
- Checkpoints can be garbage collected with history compaction
- No lock extension needed

**Cons**:
- Requires serializable orchestration state (major constraint!)
- Storage overhead for checkpoints
- Checkpoint creation adds overhead
- Complexity in checkpoint validation/versioning
- **BREAKING**: Current design relies on replay from history, not serialized state

**Implementation Complexity**: High (requires fundamental changes to execution model)

---

### Option 4: Hybrid - Instance Cache with Fallback Coalescing

**Concept**: Combine Options 1 and 2 for best of both worlds.

```
┌──────────────────────────────────────────────────────────────────┐
│                      Message Arrival                              │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────────┐
              │   Is instance in cache?       │
              └───────────────────────────────┘
                    │                │
                   YES               NO
                    │                │
                    ▼                ▼
           ┌─────────────┐   ┌─────────────────────┐
           │ Process     │   │ Start coalesce      │
           │ immediately │   │ timer (100ms)       │
           │ with cache  │   └─────────────────────┘
           └─────────────┘              │
                                        ▼
                              ┌─────────────────────┐
                              │ Fetch + Replay +    │
                              │ Process batch       │
                              │ + Cache instance    │
                              └─────────────────────┘
```

**Benefits**:
- Cached instances get immediate processing (no latency)
- Cold instances benefit from coalescing (batch replay)
- Automatic cache warming for active orchestrations
- Graceful degradation when cache is full

**Configuration**:
```rust
pub struct HybridCacheOptions {
    // Instance cache settings
    pub cache: InstanceCacheOptions,
    
    // Coalescing for cache misses
    pub coalesce_on_miss: CoalesceOptions,
    
    /// Whether to coalesce even for cached instances
    /// (wait briefly for more messages before processing)
    pub coalesce_on_hit: bool,
}
```

**Implementation Complexity**: Medium-High

---

### Option 5: Sticky Instance Routing

**Concept**: Route messages for the same instance to the same worker, maximizing cache hit rate.

```
                    ┌─────────────────────┐
                    │   Message Router    │
                    │  (consistent hash)  │
                    └─────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
   ┌───────────┐        ┌───────────┐        ┌───────────┐
   │ Worker 0  │        │ Worker 1  │        │ Worker 2  │
   │           │        │           │        │           │
   │ Cache:    │        │ Cache:    │        │ Cache:    │
   │ inst-A    │        │ inst-B    │        │ inst-C    │
   │ inst-D    │        │ inst-E    │        │ inst-F    │
   └───────────┘        └───────────┘        └───────────┘
```

**Routing Strategies**:
```rust
pub enum RoutingStrategy {
    /// Hash instance ID to worker
    ConsistentHash,
    
    /// Track which worker has instance cached, route to it
    CacheAware,
    
    /// Prefer last worker that processed instance
    Affinity { max_age: Duration },
}
```

**Pros**:
- Maximizes cache efficiency
- Works with Option 1 (Instance Cache)
- Natural load balancing with consistent hashing

**Cons**:
- Requires coordination between workers (or provider support)
- Hot instances can overload single worker
- Complexity in failover when worker dies
- May not work well with external queue systems

**Implementation Complexity**: Medium-High

---

## Recommendation

For duroxide, I recommend **Option 4 (Hybrid - Instance Cache with Fallback Coalescing)** with a phased implementation:

### Phase 1: Message Coalescing (Option 2)
- Low complexity, immediate benefit
- Add `coalesce_window` to `fetch_orchestration_item`
- Provider waits briefly for additional messages before returning
- No runtime changes needed

### Phase 2: In-Memory Instance Cache (Option 1)
- Add instance cache to orchestration dispatcher
- Extend lock while instance is cached
- LRU eviction with configurable limits
- Background lock renewal for cached instances

### Phase 3: Sticky Routing (Option 5)
- Only if multi-worker scenarios show cache thrashing
- Add cache-aware routing hints to provider

### Why Not Checkpoints (Option 3)?

Checkpointing requires serializable orchestration state, which is a fundamental change to the execution model. Rust closures and async state machines are not easily serializable. This would require:
- Custom serialization for all user orchestration state
- Versioning for checkpoint compatibility across code changes
- Careful handling of captured references and runtime state

This is a much larger undertaking and may not be feasible without significant API changes.

---

## API Sketch for Phase 1 + 2

### Runtime Options

```rust
pub struct RuntimeOptions {
    // ... existing options ...
    
    /// Instance caching configuration
    pub instance_cache: InstanceCacheConfig,
}

pub struct InstanceCacheConfig {
    /// Enable the instance cache
    /// Default: false (opt-in initially)
    pub enabled: bool,
    
    /// How long to keep instance in cache after last message
    /// Default: 10 seconds
    pub idle_timeout: Duration,
    
    /// Maximum instances to cache per worker
    /// Default: 100
    pub max_instances: usize,
    
    /// Maximum history size (events) before forcing eviction
    /// Default: 10_000
    pub max_history_size: usize,
    
    /// Coalesce window for cache misses
    /// How long to wait for additional messages when instance is not cached
    /// Default: 100ms
    pub coalesce_window: Duration,
    
    /// Maximum messages to coalesce
    /// Default: 50
    pub max_coalesce_batch: usize,
}
```

### New Provider Method (Optional Enhancement)

```rust
pub trait Provider {
    // ... existing methods ...
    
    /// Peek for additional messages for an instance (non-blocking)
    /// Used to collect more messages while coalescing
    async fn peek_orchestration_messages(
        &self,
        instance: &str,
        lock_token: &str,
        max_messages: usize,
    ) -> Result<Vec<WorkItem>, ProviderError>;
}
```

### Cache Structure

```rust
struct CachedInstance {
    /// Instance identifier
    instance: String,
    
    /// Hydrated history manager (maintains full state)
    history_manager: HistoryManager,
    
    /// Current lock token
    lock_token: String,
    
    /// When the lock expires
    lock_expires_at: Instant,
    
    /// Last activity timestamp (for idle eviction)
    last_activity: Instant,
    
    /// Execution ID for this cached state
    execution_id: u64,
}

struct InstanceCache {
    /// LRU cache of instances
    instances: LruCache<String, CachedInstance>,
    
    /// Configuration
    config: InstanceCacheConfig,
}
```

---

## Metrics to Add

```rust
// Cache metrics
orchestration_cache_hits_total
orchestration_cache_misses_total
orchestration_cache_evictions_total{reason="idle|lru|size|error"}
orchestration_cache_size_current
orchestration_replay_events_saved_total  // Events skipped due to cache

// Coalescing metrics
orchestration_coalesce_batches_total
orchestration_coalesce_messages_per_batch  // Histogram
orchestration_coalesce_wait_duration_seconds  // Histogram
```

---

## Open Questions

1. **Lock extension strategy**: Should we proactively extend locks on a timer, or extend on each message processed?

2. **Cache coherence**: What happens if an external event is sent to a cached instance from a different worker?

3. **Graceful degradation**: When cache is full, should we evict LRU or reject new cache entries?

4. **Provider changes**: Should coalescing be a provider responsibility or runtime responsibility?

5. **Memory limits**: Should we track total cache memory usage, not just instance count?

---

## Next Steps

1. Discuss and refine this proposal
2. Choose implementation phases
3. Design detailed API for selected options
4. Implement Phase 1 (Coalescing) as MVP
5. Measure impact before proceeding to Phase 2