# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.1.21] - 2026-03-06
**Release:** <https://crates.io/crates/duroxide/0.1.21>
### Fixed
- **Orphan queue message handling** — `QueueMessage` items enqueued before an orchestration
starts are now dropped (deleted) with a warning instead of being left in the queue (which
caused a busy-loop in the SQLite provider) or silently lost (in PG providers). Non-QueueMessage
work items (e.g., `CancelInstance`) that race with `StartOrchestration` are correctly kept
in the queue for retry.
### Added
- **`test_orphan_queue_messages_dropped`** — New provider validation test verifying that
QueueMessage items for non-existent instances are dropped, while QueueMessage items for
existing instances are kept and returned.
### Changed
- **`sample_config_hot_reload_persistent_events_fs`** e2e test adapted to start the
orchestration before enqueuing events (matching the corrected orphan message semantics).
### Documentation
- Updated ORCHESTRATION-GUIDE.md, external-events.md, provider-implementation-guide.md,
and provider-testing-guide.md to document pre-start event drop behavior.
## [0.1.20] - 2026-02-21
**Release:** <https://crates.io/crates/duroxide/0.1.20>
### Changed (Breaking for Provider Implementors)
- **Custom status as history events** — `set_custom_status()` and `reset_custom_status()` now emit
`CustomStatusUpdated` history events instead of writing to `ExecutionMetadata.custom_status`.
Custom status is now fully durable, replayable, and deterministic across turns.
- Removed `CustomStatusUpdate` enum from providers
- Removed `ExecutionMetadata.custom_status` field
- Provider `ack_orchestration_item()` must now scan `history_delta` for the last `CustomStatusUpdated`
event and apply it to the instances table (see provider-implementation-guide.md)
- `initial_custom_status` field added to `OrchestrationStarted` event and `ContinueAsNew` work item
for carry-forward across continue-as-new boundaries
### Added
- **`get_custom_status()` on OrchestrationContext** — Read the current custom status value,
reflecting all `set_custom_status` / `reset_custom_status` calls across turns and CAN boundaries
- **`short_poll_threshold()` on ProviderFactory** — Configurable timing for short polling
validation tests; remote-database providers can override with higher values (closes #51)
- **`test_orphan_activity_after_instance_force_deletion`** — Provider validation test verifying
graceful handling of activities orphaned by instance force-deletion (closes #37)
### Fixed
- `test_cancelling_nonexistent_activities_is_idempotent` now uses `execution_id: 1` instead of `99`
to correctly validate same-execution cancellation semantics (closes #40)
- Removed dead `ActivityContext::new` constructor (unused, runtime uses `new_with_cancellation`)
- Removed unused `clippy::clone_on_ref_ptr` suppression from observability.rs (closes #48)
## [0.1.19] - 2026-02-20
**Release:** <https://crates.io/crates/duroxide/0.1.19>
**Proposal:** [Custom Status Progress](https://github.com/microsoft/duroxide/blob/main/docs/proposals/custom-status-progress.md)
**Proposal:** [External Event Semantics](https://github.com/microsoft/duroxide/blob/main/docs/proposals/external-event-semantics.md)
**Proposal:** [Persistent Event Queuing](https://github.com/microsoft/duroxide/blob/main/docs/proposals/persistent-event-queuing.md)
### Added
- **Event Queue API** — Persistent FIFO event queues that survive `continue_as_new`
- `ctx.dequeue_event(queue)` and `ctx.dequeue_event_typed::<T>(queue)` for orchestrations
- `client.enqueue_event(instance, queue, data)` and `client.enqueue_event_typed::<T>()` for clients
- FIFO ordering with buffering — messages can arrive before orchestration subscribes
- Queue messages carry forward across continue-as-new boundaries
- **Custom Status** — Orchestration progress reporting visible to external clients
- `ctx.set_custom_status(json)` publishes structured progress from orchestrations
- `client.wait_for_status_change(instance, version, poll, timeout)` for efficient polling
- Status persists across continue-as-new boundaries
- New Provider trait methods: `set_custom_status()`, `get_custom_status()`
- Schema migration `20240108000000_add_custom_status.sql`
- **Retry on Session** — Combine retry policies with session affinity (closes #56)
- `ctx.schedule_activity_with_retry_on_session(name, input, policy, session_id)`
- `ctx.schedule_activity_with_retry_on_session_typed::<In, Out>()`
- All retry attempts pinned to the same worker session
- **Typed event helpers** — `client.raise_event_typed::<T>()` and `client.enqueue_event_typed::<T>()`
- **Provider validation** — `test_prune_bulk_includes_running_instances` catches providers that
exclude Running instances from bulk prune (closes #50)
- **Scenario test** — Copilot Chat pattern: multi-turn chat using dequeue_event + set_custom_status + CAN
### Changed
- Renamed persistent event internals: `ExternalRaisedPersistent` → `QueueMessage`, `ExternalSubscribedPersistent` → `QueueSubscribed`
### Deprecated
- `client.raise_event_persistent()` — use `client.enqueue_event()` instead
- `ctx.schedule_wait_persistent()` — use `ctx.dequeue_event()` instead
## [0.1.18] - 2026-02-16
**Release:** <https://crates.io/crates/duroxide/0.1.18>
**Proposal:** [Activity Implicit Sessions v2](https://github.com/microsoft/duroxide/blob/main/docs/proposals-impl/activity-implicit-sessions-v2.md)
### Added
- **Activity Session Affinity** — Route activities to the same worker for in-memory state reuse
- `ctx.schedule_activity_on_session(name, input, session_id)` pins activities by session ID
- `ctx.schedule_activity_on_session_typed()` for serde-based typed inputs/outputs
- `ActivityContext::session_id()` getter for process-local state lookup
- Two-timeout model: `session_lock_timeout` (heartbeat lease) + `session_idle_timeout` (inactivity expiry)
- Automatic session lifecycle: implicit creation, heartbeat renewal, idle unpin, crash recovery
- `SessionTracker` enforces `max_sessions_per_runtime` across all worker slots via RAII guards
- `worker_node_id` option for stable session identity across restarts (e.g., K8s StatefulSet pods)
- Session manager background task for lock renewal and orphan cleanup
- **Provider API changes (required)**
- `fetch_work_item()` gains `session: Option<&SessionFetchConfig>` parameter for session routing
- New `renew_session_lock()` method — batched heartbeat for owned non-idle sessions
- New `cleanup_orphaned_sessions()` method — sweep expired session rows with no pending work
- `ack_work_item()` and `renew_work_item_lock()` piggyback `last_activity_at` updates (guarded by `locked_until`)
- **New `RuntimeOptions` fields**
- `session_lock_timeout` (default 30s), `session_lock_renewal_buffer` (default 5s)
- `session_idle_timeout` (default 5min), `session_cleanup_interval` (default 5min)
- `max_sessions_per_runtime` (default 10), `worker_node_id` (default None)
- **33 provider validation tests** for session routing, locks, races, and cross-concern interactions
- **22 E2E tests** for single/multi-worker, fan-out, CAN, heterogeneous scenarios
- **Schema migration** `20240107000000_add_sessions.sql` — sessions table + worker_queue.session_id
### Changed
- `session_id: Option<String>` added to `Action::CallActivity`, `EventKind::ActivityScheduled`, and `WorkItem::ActivityExecute` (backward compatible via `serde(default)`)
- Existing `schedule_activity` calls are completely unaffected (`session_id = None`)
- Documentation updated across ORCHESTRATION-GUIDE, provider-implementation-guide, provider-testing-guide, README, and migration-guide
## [0.1.17] - 2026-02-09
**Release:** <https://crates.io/crates/duroxide/0.1.17>
**Proposal:** [Provider Capability Filtering](https://github.com/microsoft/duroxide/blob/main/docs/proposals-impl/provider-capability-filtering.md)
### Added
- **Provider Capability Filtering (Phase 1)** — Safe rolling upgrades in mixed-version clusters
- Orchestration dispatcher passes a version filter to the provider so it only returns
executions whose pinned `duroxide_version` falls within the runtime's supported range
- SQL-level filtering applied before lock acquisition and history deserialization
- NULL pinned version treated as always compatible (backward compat with pre-migration data)
- `RuntimeOptions::supported_replay_versions` for custom version range configuration
- Defense-in-depth: runtime-side compatibility check after fetch with 1-second abandon delay
- Startup log declaring supported version range; warning log on incompatible-version abandon
- **New types:** `SemverVersion`, `SemverRange`, `DispatcherCapabilityFilter`, `current_build_version()`
- **Provider API change:** `fetch_orchestration_item()` gains `filter: Option<&DispatcherCapabilityFilter>` parameter
- **History deserialization contract** — Providers must surface deserialization errors (not silently drop events)
- `history_error` field on fetched items for deserialization failures
- Transaction commits lock + attempt_count before returning errors (enables poison path)
- **ProviderFactory test helpers** — `corrupt_instance_history()` and `get_max_attempt_count()`
optional methods for provider-agnostic deserialization contract tests
- **38 new tests** — 20 provider validation + 18 e2e scenario tests covering filtering,
rolling deployment routing, metadata/migration, ContinueAsNew isolation, drain procedures,
and observability
### Changed
- **Migration:** `20240106000000_add_pinned_version.sql` adds `duroxide_version_major/minor/patch`
columns to `executions` table
- Provider validation test total: 114 tests (up from 94)
## [0.1.16] - 2026-02-02
**Release:** <https://crates.io/crates/duroxide/0.1.16>
**Proposal:** [Persist Cancellation Decisions in History](https://github.com/microsoft/duroxide/blob/main/docs/proposals-impl/history-cancellation-events.md)
### Added
- **Cancellation History Events** - Record cancellation decisions as durable history breadcrumbs
- New `ActivityCancelRequested` and `SubOrchestrationCancelRequested` event kinds
- Dropped futures (select losers, terminal cleanup) now recorded in history
- Enables observability: history answers "was this cancelled?"
- Enables replay determinism: detect when cancellation decisions differ on replay
- Idempotent: side-channel cancellations only emitted once per decision
- **Nondeterminism Tests for Completion Validation** - 12 new tests covering:
- Completion kind mismatches (timer/activity/sub-orchestration cross-checks)
- Duplicate completion detection for closed schedules
- OrchestrationChained mismatch validation
### Fixed
- **Duplicate Completion Detection** - Fixed oversight where `open_schedules.remove()` was never called after delivering completions in replay engine
- Previously, duplicate completions in history were silently accepted
- Now properly triggers nondeterminism error on duplicate completions
### Changed
- **Housekeeping** - Moved implemented/rejected proposals to `docs/proposals-impl/`:
- `metrics-facade-migration.md` (implemented)
- `replay-simplification-PROGRESS.md` (completed)
- `activity-cancellation-queue-flag.md` (rejected/superseded by lock-stealing)
---
## [0.1.15] - 2026-01-30
**Release:** <https://crates.io/crates/duroxide/0.1.15>
### Changed
- **Simplified Metrics Facade** - Internal observability uses consistent atomic counters with a cleaner facade pattern
### Added
- **Code Coverage Improvements** - Test coverage improved to 91.9% with better organization
- New provider validation tests for error handling, management interface, and observability
- Removed duplicate tests that overlapped with provider validation suite
- **Code Coverage Guide** - New `docs/code-coverage-guide.md` with llvm-cov setup instructions
- **Copilot Skill for Coverage** - AI assistant skill for code coverage workflows
### Fixed
- **README Markdown Formatting** - Fixed section heading syntax for better rendering
---
## [0.1.14] - 2026-01-24
**Release:** <https://crates.io/crates/duroxide/0.1.14>
### Fixed
- **Fire-and-Forget Orchestrations Now Record History Events** - `ctx.schedule_orchestration()` (detached/chained orchestrations) now correctly creates `OrchestrationChained` events in history
- Previously, these fire-and-forget calls were not recorded, breaking determinism detection on replay
- If an orchestration scheduled a detached orchestration followed by an activity, replay would fail with nondeterminism error
- Added proper action-to-event conversion and event matching in replay engine
### Added
- **Action-to-Event Recording Tests** - New test module `tests/replay_engine/action_to_event.rs` verifying all scheduling actions create corresponding history events
- **E2E Test for Detached + Activity Pattern** - `sample_detached_then_activity_fs` validates the fix end-to-end
---
## [0.1.13] - 2026-01-24 [YANKED]
**Release:** <https://crates.io/crates/duroxide/0.1.13>
**Proposal:** [System Calls as Real Activities](https://github.com/microsoft/duroxide/blob/main/docs/proposals-impl/system-calls-as-activities.md)
### Changed
- **System Calls Reimplemented as Regular Activities** - `ctx.new_guid()` and `ctx.utc_now()` now use normal activity infrastructure
- Simplifies replay engine by removing special-case SystemCall handling
- Fixes determinism bugs where syscalls returned fresh values on replay
- Reserved activity prefix `__duroxide_syscall:` prevents user collisions
- Builtin activities injected automatically at runtime startup
- **API Rename: `utcnow()` → `utc_now()`** - Consistent with Rust naming conventions
### Added
- **Reserved Activity Prefix Validation** - `ActivityRegistry` rejects names starting with `__duroxide_syscall:`
- **Comprehensive Syscall Tests** - Replay determinism, ordering, single-thread mode, cancellation
### Removed
- `SystemCall` variants from `Action`, `EventKind`, `CompletionResult`
- SystemCall handling from replay engine (no more re-poll loop)
- `EVENT_TYPE_SYSTEM_CALL` from sqlite provider
### Documentation
- Updated ORCHESTRATION-GUIDE and durable-futures-internals for new syscall semantics
- Reorganized proposals: moved 11 implemented proposals to `docs/proposals-impl/`
- Updated merge prompt to require squash-only merges
### Breaking Changes
- `utcnow()` renamed to `utc_now()` - update all call sites
- Histories containing `SystemCall` events will not replay (pre-1.0, acceptable)
---
## [0.1.12] - 2026-01-23
**Release:** <https://crates.io/crates/duroxide/0.1.12>
### Added
- **Unobserved Future Cancellation** - Futures that are scheduled but never awaited are now properly cancelled
- New `DurableFuture` implementation with proper drop semantics
- Cancellation events recorded in history for deterministic replay
- Comprehensive test coverage in `tests/replay_engine/unobserved_futures.rs`
- **AI Skills System** - New `docs/skills/` folder for AI coding assistant context
- Installation instructions for VS Code Copilot, Claude Code, and Cursor
- `duroxide-provider-implementation` skill for provider developers
- **Provider Validation** - New cancellation validation tests
- `test_activity_cancellation_via_lock_stealing`
- Additional lock stealing edge case tests
### Changed
- **Major Documentation Refactor**
- Rewrote `provider-implementation-guide.md` with better structure and pedagogy
- Rewrote `architecture.md` with cleaner ASCII diagrams (removed mermaid)
- Merged `replay-engine.md` into `durable-futures-internals.md`
- Added long-polling vs short-polling explanation
- Added Performance Considerations and ProviderAdmin sections
- **Simplified ActivityRegistry API** - Now takes value instead of Arc
- **Improved Dispatcher Backoff Logic** - Better stale activity handling
### Fixed
- **Polling Model Documentation** - Corrected to multi-poll (not single-poll) model
### Code Statistics (vs v0.1.11)
| Core (src/) | 15 | +2,355 | -1,999 | +356 |
| Tests (tests/) | 63 | +7,616 | -3,353 | +4,263 |
| Docs (docs/) | 43 | +17,477 | -4,004 | +13,473 |
| **Total** | 134 | +17,459 | -9,536 | **+7,923** |
---
## [0.1.11] - 2026-01-07
**Release:** <https://crates.io/crates/duroxide/0.1.11>
### Fixed
- **Issue #49: WorkItemReader version extraction during completion-only replay**
Fixed a bug where `WorkItemReader` did not extract all fields from history during
completion-only replay (when no `Start` or `ContinueAsNew` item is present in the
work item batch). This caused nondeterminism errors when:
- A versioned orchestration was started
- The runtime restarted (or lock expired) mid-execution
- Activity completion arrived without a start item
- The runtime incorrectly used the `Latest` version policy instead of the
version recorded in history
The fix extracts all tuple fields (`orchestration_name`, `input`, `version`,
`parent_instance`, `parent_id`) from `HistoryManager` during completion-only replay,
ensuring deterministic handler resolution.
### Added
- **New scenario tests** for issue #49 regression prevention:
- `e2e_replay_completion_only_must_use_version_from_history` - First execution replay
- `e2e_replay_completion_only_after_can_must_use_version_from_history` - Nth execution (after CAN) replay
- Unit tests verifying all `WorkItemReader` tuple fields are correctly extracted
## [0.1.10] - 2026-01-06
**Release:** <https://crates.io/crates/duroxide/0.1.10>
### Added
- **Rolling Deployment Support** - Exponential backoff for unregistered handlers
Unregistered orchestrations and activities now use exponential backoff instead of
immediate failure, enabling graceful rolling deployments in multi-node clusters:
- Messages abandoned with backoff (1s → 2s → 4s → ... up to 60s max)
- Bounce between nodes until one with the handler registered picks it up
- Eventually fail as `ErrorDetails::Poison` if handler never becomes available
- Configurable via `UnregisteredBackoffConfig` (defaults: 1s base, 60s max, 6 exponent cap)
- **New scenario tests** for rolling deployments
- `e2e_rolling_deployment_new_activity` - Multi-node deployment with new activity
- `e2e_rolling_deployment_version_upgrade` - Version upgrade via continue-as-new
- **Consolidated unregistered handler tests** in `tests/unregistered_backoff_tests.rs`
- `unknown_version_fails_with_poison` - Version mismatch handling
- `continue_as_new_to_missing_version_fails_with_poison` - CAN to missing version
- `delete_poisoned_orchestration` - Cleanup after poison
- Plus existing backoff behavior tests
### Changed
- **BREAKING:** `ConfigErrorKind::MissingVersion` removed - unregistered handlers now use backoff/poison path
- `config_error` metric now only tracks nondeterminism (unregistered handlers result in `poison`)
- Updated `docs/metrics-specification.md` with new error type behaviors
- Updated `docs/ORCHESTRATION-GUIDE.md` error handling section
### Removed
- `tests/unknown_activity_tests.rs` - consolidated into `unregistered_backoff_tests.rs`
- `tests/unknown_orchestration_tests.rs` - consolidated into `unregistered_backoff_tests.rs`
## [0.1.9] - 2026-01-05
**Release:** <https://crates.io/crates/duroxide/0.1.9>
### Added
- **Management API for Instance Deletion and Pruning** - Comprehensive instance lifecycle management
**Client API:**
- `delete_instance(id, force)` - Delete single instance with cascading
- `delete_instance_bulk(filter)` - Bulk delete with filters (IDs, timestamp, limit)
- `prune_executions(id, options)` - Prune old executions from long-running instances
- `prune_executions_bulk(filter, options)` - Bulk prune across multiple instances
- `get_instance_tree(id)` - Inspect instance hierarchy before deletion
**Provider API (ProviderAdmin trait):**
- `delete_instance(id, force)` - Provider-level single deletion
- `delete_instance_bulk(filter)` - Provider-level bulk deletion
- `delete_instances_atomic(ids)` - Atomic batch deletion for cascading
- `prune_executions(id, options)` - Provider-level pruning
- `prune_executions_bulk(filter, options)` - Provider-level bulk pruning
- `get_instance_tree(id)` - Provider-level tree traversal
- `list_children(id)` - List direct child sub-orchestrations
- `get_parent_id(id)` - Get parent instance ID
**Safety Guarantees:**
- Running instances protected (skip or error based on API)
- Current execution never pruned
- Sub-orchestrations cannot be deleted directly (must delete root)
- Atomic cascading deletes (all-or-nothing)
- Force delete available for stuck instances
- **102 new provider validation tests** - Deletion, bulk deletion, pruning, cascading deletes, filter combinations, safety tests
### Changed
- Provider implementation guide with deletion/pruning contracts
- Provider testing guide updates
- Continue-as-new docs with pruning section
- README instance management section
- Enhanced management-api-deletion proposal with force delete semantics
## [0.1.8] - 2026-01-02
**Release:** <https://crates.io/crates/duroxide/0.1.8>
### Added
- **Lock-stealing activity cancellation** - New mechanism for cancelling in-flight activities
- Activities are cancelled by deleting their worker queue entries ("lock stealing")
- Workers detect cancellation when lock renewal fails (entry missing)
- More efficient than polling execution state on every renewal
- Enables batch cancellation of multiple activities atomically
- **`ScheduledActivityIdentifier`** - New struct for identifying activities in worker queue
- Fields: `instance` (String), `execution_id` (u64), `activity_id` (u64)
- Used by `ack_orchestration_item` to specify activities to cancel
- Exported from `duroxide::providers`
- **Provider validation tests for lock-stealing** - 5 new tests
- `test_cancelled_activities_deleted_from_worker_queue` - Verify deletion during ack
- `test_ack_work_item_fails_when_entry_deleted` - Verify permanent error on stolen lock
- `test_renew_fails_when_entry_deleted` - Verify renewal fails on stolen lock
- `test_cancelling_nonexistent_activities_is_idempotent` - Verify no error for missing entries
- `test_batch_cancellation_deletes_multiple_activities` - Verify batch deletion
- **Worker queue activity identity columns** - Store activity identity for cancellation
- New migration: `20240104000000_add_worker_activity_identity.sql`
- SQLite provider stores `instance_id`, `execution_id`, `activity_id` on ActivityExecute items
### Changed
- **BREAKING:** `Provider::ack_orchestration_item` signature changed
- Added 7th parameter: `cancelled_activities: Vec<ScheduledActivityIdentifier>`
- Provider must delete matching worker queue entries atomically in same transaction
- **BREAKING:** `Provider::fetch_work_item` return type simplified
- Changed from `(WorkItem, String, u32, ExecutionState)` to `(WorkItem, String, u32)`
- Removed `ExecutionState` - cancellation detected via lock renewal failure instead
- **BREAKING:** `Provider::renew_work_item_lock` return type changed
- Changed from `Result<ExecutionState, ProviderError>` to `Result<(), ProviderError>`
- Failure indicates lock was stolen (activity cancelled) or expired
- **BREAKING:** `Provider::ack_work_item` must fail when entry missing
- Returns permanent error if work item entry was deleted (lock stolen)
- Signals to worker that activity was cancelled
- Provider validation test count: 80 tests (up from 75)
### Removed
- **`ExecutionState` enum removed from Provider API** - No longer needed
- Was used for state-polling cancellation approach
- Lock-stealing provides more efficient cancellation mechanism
- Provider validation tests for ExecutionState still exist (legacy support during migration)
### Migration Guide
**Provider implementers - Required changes:**
1. Update `ack_orchestration_item` signature:
```rust
async fn ack_orchestration_item(
&self,
lock_token: &str,
execution_id: u64,
history_delta: Vec<Event>,
worker_items: Vec<WorkItem>,
orchestrator_items: Vec<WorkItem>,
metadata: ExecutionMetadata,
cancelled_activities: Vec<ScheduledActivityIdentifier>, ) -> Result<(), ProviderError>;
```
2. Update `fetch_work_item` return type:
```rust
async fn fetch_work_item(...) -> Result<Option<(WorkItem, String, u32)>, ProviderError>;
```
3. Update `renew_work_item_lock` return type:
```rust
async fn renew_work_item_lock(...) -> Result<(), ProviderError>;
```
4. Update `ack_work_item` to fail on missing entry:
```rust
if rows_affected == 0 {
return Err(ProviderError::permanent("ack_work_item", "Entry not found (lock stolen)"));
}
```
5. Store activity identity on worker queue entries:
```sql
ALTER TABLE worker_queue ADD COLUMN instance_id TEXT;
ALTER TABLE worker_queue ADD COLUMN execution_id INTEGER;
ALTER TABLE worker_queue ADD COLUMN activity_id INTEGER;
CREATE INDEX idx_worker_queue_activity ON worker_queue(instance_id, execution_id, activity_id);
```
6. Implement batch deletion in `ack_orchestration_item`:
```rust
for activity in cancelled_activities {
DELETE FROM worker_queue
WHERE instance_id = activity.instance
AND execution_id = activity.execution_id
AND activity_id = activity.activity_id;
}
```
## [0.1.7] - 2025-12-28
**Release:** <https://crates.io/crates/duroxide/0.1.7>
### Added
- **Cooperative activity cancellation** - Activities can detect when their parent orchestration has been cancelled or completed
- `ActivityContext` now provides cancellation awareness via `is_cancelled()` and `cancelled()` methods
- Activities can cooperatively respond to cancellation by checking the cancellation token
- Use `tokio::select!` with `ctx.cancelled()` for responsive cancellation in async activities
- Configurable grace period before forced activity termination
- **ExecutionState enum** - Providers now report orchestration state with activity work items
- `ExecutionState::Running` - Orchestration is active, activity should proceed
- `ExecutionState::Terminal { status }` - Orchestration completed/failed/continued, activity result won't be observed
- `ExecutionState::Missing` - Orchestration instance deleted, activity should abort
- **Provider validation tests for cancellation** - 13 new tests in `provider_validation::cancellation`
- Verifies `ExecutionState` is correctly returned by `fetch_work_item` and `renew_work_item_lock`
- Tests for Running, Terminal (Completed/Failed/ContinuedAsNew), and Missing states
- Tests for state transitions during activity execution
- **Single-threaded runtime support** - Full compatibility with `tokio::runtime::Builder::new_current_thread()`
- Essential for embedding in single-threaded environments (e.g., pgrx PostgreSQL extensions)
- New scenario tests in `tests/scenarios/single_thread.rs`
- Use `RuntimeOptions { orchestration_concurrency: 1, worker_concurrency: 1, .. }` for 1x1 mode
- **Configurable wait timeout for stress tests** - `StressTestConfig::wait_timeout_secs` field
- Default: 60 seconds
- Increase for high-latency remote database providers
- Uses `#[serde(default)]` for backward compatibility with existing configs
### Changed
- **BREAKING:** `Provider::fetch_work_item` now returns 4-tuple: `(WorkItem, String, u32, ExecutionState)`
- Added `ExecutionState` as fourth element to report parent orchestration state
- Required for activity cancellation support
- **BREAKING:** `Provider::renew_work_item_lock` now returns `ExecutionState` instead of `()`
- Allows runtime to detect orchestration state changes during long-running activities
- Triggers cancellation token when orchestration becomes terminal
- Provider validation test count increased from 62 to 75
- Documentation updates:
- Added "Runtime Polling Configuration" section to provider-implementation-guide
- Default polling interval (10ms) is aggressive; configure for remote/cloud providers
- Updated provider-testing-guide with new test count and wait_timeout_secs examples
### Fixed
- **test_worker_lock_renewal_extends_timeout** - Fixed timing sensitivity (GitHub #34)
- Test now creates proper orchestration with Running status before testing renewal
- Uses 0.6x pre-renewal wait + 0.4x post-renewal wait for reliable timing
- **test_multi_threaded_lock_expiration_recovery** - Fixed race condition (GitHub #32)
- Uses `tokio::sync::Barrier` to synchronize thread start times
- Eliminates false failures from connection pool cold-start latency
### Migration Guide
**Provider implementers:**
```rust
// fetch_work_item now returns ExecutionState
async fn fetch_work_item(
&self,
lock_timeout: Duration,
poll_timeout: Duration,
) -> Result<Option<(WorkItem, String, u32, ExecutionState)>, ProviderError>;
// renew_work_item_lock now returns ExecutionState
async fn renew_work_item_lock(
&self,
token: &str,
extend_for: Duration,
) -> Result<ExecutionState, ProviderError>;
```
**Determining ExecutionState:**
```rust
// Query the execution status for the work item's instance/execution_id
let state = match (instance_exists, execution_status) {
(false, _) => ExecutionState::Missing,
(true, None) => ExecutionState::Missing,
(true, Some(status)) if status == "Running" => ExecutionState::Running,
(true, Some(status)) => ExecutionState::Terminal { status },
};
```
**Activity authors (using cancellation):**
```rust
// Check cancellation periodically
if ctx.is_cancelled() {
return Err("Cancelled".into());
}
process(item).await;
}
Ok("done".into())
});
// Or use select! for responsive cancellation
result = do_work(input) => result,
_ = ctx.cancelled() => Err("Cancelled".into()),
}
});
```
## [0.1.6] - 2025-12-21
**Release:** <https://crates.io/crates/duroxide/0.1.6>
### Added
- **Large payload stress test** - New memory-intensive stress test scenario
- Tests large event payloads (10KB, 50KB, 100KB) and longer histories (~80-100 events)
- New binary: `large-payload-stress` for running the test standalone
- Uses the same `ProviderStressFactory` trait as parallel orchestrations test
- Configurable payload sizes and activity/sub-orchestration counts
- See `docs/provider-testing-guide.md` for usage
- **Stress test monitoring** - Resource usage tracking in `run-stress-tests.sh`
- Peak RSS (Resident Set Size) measurement
- Average CPU usage tracking
- Sampling every 500ms during test execution
- New documentation: `STRESS_TEST_MONITORING.md`
- Supports `--parallel-only` and `--large-payload` flags
### Changed
- **Memory optimization** - Reduced allocations in history processing
- Added `HistoryManager::full_history_len()` - get count without allocation
- Added `HistoryManager::is_full_history_empty()` - check emptiness without allocation
- Added `HistoryManager::full_history_iter()` - iterate without allocation
- Updated runtime to use efficient methods in hot paths
- Improved child cancellation to use iterator instead of collecting full history
- **Orchestration naming** - Renamed "FanoutWorkflow" to "FanoutOrchestration" for consistency
### Fixed
- Child sub-orchestration cancellation now uses iterator-based approach for better memory efficiency
## [0.1.5] - 2025-12-18
**Release:** <https://crates.io/crates/duroxide/0.1.5>
### Added
- **Provider identity API** - Providers now expose `name()` and `version()` methods
- `Provider::name()` returns provider name (e.g., "sqlite")
- `Provider::version()` returns provider version
- Default implementations return "unknown" and "0.0.0"
- SQLite provider returns "sqlite" and the crate version
- **Runtime startup banner** - Version information logged on startup
- Logs duroxide version and provider name/version
- Example: `duroxide runtime (0.1.4) starting with provider sqlite (0.1.4)`
- **Worker queue visibility control** - Worker queue now uses `visible_at` for delayed visibility
- Added `visible_at` column to worker_queue (matches orchestrator queue pattern)
- `abandon_work_item` with delay now sets `visible_at` instead of keeping `locked_until`
- Cleaner semantics: `visible_at` controls when item becomes visible, `locked_until` only for lock expiry
- Migration file included for existing databases
- **New provider validation tests** - 2 additional queue semantics tests
- `test_worker_item_immediate_visibility` - Verify newly enqueued items are immediately visible
- `test_worker_delayed_visibility_skips_future_items` - Verify items with future visible_at are skipped
### Changed
- **Reduced default `dispatcher_long_poll_timeout`** from 5 minutes to 30 seconds
- More responsive shutdown behavior
- Better suited for typical workloads
## [0.1.3] - 2025-12-14
### Added
- **Provider validation tests** - 4 new tests for abandon and poison handling
- `test_abandon_work_item_releases_lock` - Verify abandon_work_item releases lock immediately
- `test_abandon_work_item_with_delay` - Verify abandon_work_item with delay defers refetch
- `max_attempt_count_across_message_batch` - Verify MAX attempt_count returned for batched messages
### Changed
- Provider validation test count increased from 58 to 62
### Fixed
- `abandon_work_item` with delay now correctly keeps lock_token to prevent immediate refetch
## [0.1.2] - 2025-12-14
### Added
- **Poison message handling** - Automatic detection and failure of messages that exceed `max_attempts` (default: 10)
- `RuntimeOptions::max_attempts` configuration option
- `ErrorDetails::Poison` variant with detailed context
- `PoisonMessageType` enum distinguishing orchestration vs activity poison
- Dedicated metrics: `duroxide_orchestration_poison_total`, `duroxide_activity_poison_total`
- **Lock renewal for orchestrations** - Prevents lock expiration during long orchestration turns
- `Provider::renew_orchestration_item_lock()` method
- `RuntimeOptions::orchestrator_lock_renewal_buffer` configuration (default: 2s)
- Automatic background renewal task in orchestration dispatcher
- **Work item abandon with retry** - Explicit lock release for failed activities
- `Provider::abandon_work_item()` method with optional delay
- Called automatically when `ack_work_item` fails
- **Attempt count management** - `ignore_attempt` parameter for abandon methods
- `abandon_work_item(..., ignore_attempt: bool)` - decrement count on transient failures
- `abandon_orchestration_item(..., ignore_attempt: bool)` - same for orchestrations
- Prevents false poison detection from infrastructure errors
- **Provider validation tests** - 8 new poison message tests
- `orchestration_attempt_count_starts_at_one`
- `orchestration_attempt_count_increments_on_refetch`
- `worker_attempt_count_starts_at_one`
- `worker_attempt_count_increments_on_lock_expiry`
- `attempt_count_is_per_message`
- `abandon_work_item_ignore_attempt_decrements`
- `abandon_orchestration_item_ignore_attempt_decrements`
- `ignore_attempt_never_goes_negative`
### Changed
- **BREAKING:** SQLite provider is now optional - enable with `features = ["sqlite"]`
- **BREAKING:** `Provider::fetch_work_item` now returns `(WorkItem, String, u32)` tuple (added `attempt_count`)
- **BREAKING:** `Provider::fetch_orchestration_item` now returns `(OrchestrationItem, String, u32)` tuple (added `attempt_count`)
- **BREAKING:** `Provider::abandon_work_item` now requires `ignore_attempt: bool` parameter
- **BREAKING:** `Provider::abandon_orchestration_item` now requires `ignore_attempt: bool` parameter
- `OrchestrationItem` struct no longer contains `lock_token` (moved to return tuple)
- Provider validation test count increased from 50 to 58
### Migration Guide
**Cargo.toml (if using SQLite provider):**
```toml
# Before
duroxide = "0.1.1"
# After - SQLite now requires explicit feature
duroxide = { version = "0.1.2", features = ["sqlite"] }
```
**Provider implementers:**
```rust
// fetch_work_item now returns attempt_count
async fn fetch_work_item(...) -> Result<Option<(WorkItem, String, u32)>, ProviderError>;
// fetch_orchestration_item now returns attempt_count
async fn fetch_orchestration_item(...) -> Result<Option<(OrchestrationItem, String, u32)>, ProviderError>;
// abandon methods now have ignore_attempt parameter
async fn abandon_work_item(&self, token: &str, delay: Option<Duration>, ignore_attempt: bool) -> Result<(), ProviderError>;
async fn abandon_orchestration_item(&self, token: &str, delay: Option<Duration>, ignore_attempt: bool) -> Result<(), ProviderError>;
// New method for orchestration lock renewal
async fn renew_orchestration_item_lock(&self, token: &str, extend_for: Duration) -> Result<(), ProviderError>;
```
**Runtime users:**
```rust
RuntimeOptions {
max_attempts: 10, // NEW - poison threshold
orchestrator_lock_renewal_buffer: Duration::from_secs(2), // NEW
..Default::default()
}
```
## [0.1.1] - 2025-12-10
### Added
- **Long polling support** - Providers can now block waiting for work, reducing CPU usage and latency
- `dispatcher_long_poll_timeout` configuration option (default: 5 minutes)
- `poll_timeout: Duration` parameter to `Provider::fetch_orchestration_item` and `Provider::fetch_work_item`
- Long polling validation tests in `duroxide::provider_validations::long_polling`
### Changed
- **BREAKING:** `Provider::fetch_orchestration_item` now requires `poll_timeout: Duration` parameter
- **BREAKING:** `Provider::fetch_work_item` now requires `poll_timeout: Duration` parameter
- **BREAKING:** `RuntimeOptions::dispatcher_idle_sleep` renamed to `dispatcher_min_poll_interval`
- **BREAKING:** `continue_as_new()` now returns an awaitable future (use `return ctx.continue_as_new(input).await`)
### Migration Guide
**Provider implementers:**
```rust
// Add poll_timeout parameter to both fetch methods
async fn fetch_orchestration_item(
&self,
lock_timeout: Duration,
poll_timeout: Duration, // NEW - ignore for short-polling, block for long-polling
) -> Result<Option<OrchestrationItem>, ProviderError>;
```
**Runtime users:**
```rust
// Rename dispatcher_idle_sleep to dispatcher_min_poll_interval
RuntimeOptions {
dispatcher_min_poll_interval: Duration::from_millis(100),
dispatcher_long_poll_timeout: Duration::from_secs(300), // NEW
..Default::default()
}
```
**Orchestration authors using continue_as_new:**
```rust
// Before: ctx.continue_as_new(input);
// After:
return ctx.continue_as_new(input).await;
```
## [0.1.0] - 2025-12-01
### Added
- Initial release
- Deterministic orchestration execution with replay
- Activity scheduling with automatic retries
- Timer support (create_timer)
- Sub-orchestration support
- External event handling
- Continue-as-new for long-running workflows
- SQLite provider implementation
- OpenTelemetry metrics and structured logging
- Provider validation test suite
- Comprehensive documentation