duroxide 0.1.19

Durable code execution framework for Rust
Documentation
# Session Progress: Activity Sessions Feature

**Date:** 2026-02-16  
**Branch:** (current working branch)

---

## What We're Doing

Implementing **activity session affinity** — routing activities with the same `session_id` to the same worker process. Sessions are a pure routing/affinity mechanism (analogous to network flow affinity), not a transactional or ordering construct.

**Spec:** [docs/proposals/activity-implicit-sessions-v2.md](docs/proposals/activity-implicit-sessions-v2.md)

**Key design decisions made in this session:**
- Sessions are independent of activity locks (flow table analogy: expired flow entry doesn't drop in-flight packets)
- `worker_node_id` → process-level session identity (all slots share one owner, eliminating head-of-line blocking)
- `max_sessions_per_runtime` enforced runtime-side via `SessionTracker` (not provider-side)
- Provider `fetch_work_item` takes `Option<&SessionFetchConfig>``None` means non-session items only
- `renew_session_lock` takes `&[&str]` for batched renewal
- `ack_work_item` checks `locked_until > now` (rejects expired activity locks)
- Piggyback `last_activity_at` updates guarded by `locked_until > now` (best-effort ownership check)

---

## Completed Work

### Provider Trait Changes
- [x] Added `SessionFetchConfig` struct (`owner_id`, `lock_timeout`)
- [x] Changed `fetch_work_item` signature: 5 params → 3 params with `Option<&SessionFetchConfig>`
- [x] Changed `renew_session_lock` signature: `&str``&[&str]` (batched)
- [x] Renamed `worker_id``owner_id` across session methods
- [x] Removed `max_sessions` from provider (deferred to runtime)
- [x] Updated doc contract: storage-agnostic language (no "rows", "tables", "database")
- [x] Added `cleanup_orphaned_sessions` doc: convenience hook, providers can return 0

### SQLite Provider Implementation
- [x] LEFT JOIN fetch query (replaces nested NOT IN subqueries)
- [x] Session upsert with `rows_affected() == 0` ownership check
- [x] `DELETE ... RETURNING session_id` in `ack_work_item` (one query instead of two)
- [x] `UPDATE ... RETURNING session_id` in `renew_work_item_lock` (eliminates subquery)
- [x] `ack_work_item` checks `locked_until > now` (rejects expired locks)
- [x] Piggyback `last_activity_at` guarded by `AND locked_until > ?now`
- [x] Batched `renew_session_lock` with dynamic IN clause

### Runtime (Worker Dispatcher)
- [x] Simplified worker ID derivation (single `suffix` variable, no flags)
- [x] `SessionTracker` with `HashMap<String, usize>` for distinct session counting
- [x] `SessionGuard` RAII for increment/decrement on drop
- [x] Pre-fetch capacity check (`distinct_count() >= max_sessions_per_runtime` → pass `None`)
- [x] Post-fetch TOCTOU re-check (abandon if over capacity after guard creation)
- [x] Shutdown ticker (5s) for session manager responsiveness
- [x] Session owner IDs collected during worker loop (no duplicate format construction)
- [x] `schedule_activity` and `schedule_activity_on_session` refactored to shared `schedule_activity_internal`

### Tests Added

**Provider Validation Tests (src/provider_validation/sessions.rs):** 33 tests
- Basic routing (6), lock/expiration (3), session=None/Some (2), renewal (3+), cleanup (3)
- Piggybacking with tight idle window (2), edge cases (2), process-level identity (1)
- Race conditions: concurrent claim (1), takeover (1), cleanup+recreate (1), abandon (2)
- Cross-concern locks: activity expires/session valid (1), both expire (1), session expires/activity valid (1), renewal extension (1), renew after expiry returns 0 (1), original worker reclaims (1)
- Session lock expiry: new owner gets redelivery (1), same worker reacquires (1)

**Provider Validation Tests (src/provider_validation/lock_expiration.rs):**
- Worker ack fails after lock expiry (1)
- Orchestration lock renewal after expiry (1)

**E2E Tests (tests/session_e2e_tests.rs):** 22 tests
- Basic session, context visibility, mixed, multiple sessions, typed API
- Process-level identity: completes, parallel, ephemeral, context has session_id
- Concurrency: two slots same session concurrent, ephemeral serialized
- Fan-out/fan-in: session fan-out, mixed fan-out, multiple-per-session mixed
- Continue-as-new: survives CAN, versioned upgrade
- Capacity: max_sessions enforced (parallel join), same session shares slot, blocks then unblocks
- Config: session_idle_timeout validation panic
- Multi-worker: complex orchestration (fan-out + CAN across 2 runtimes), heterogeneous config (max_sessions overflow + asymmetric lock timeouts)

**Scenario Tests (tests/scenarios/sessions.rs):** 1 test
- Durable-copilot-SDK scaled-out pattern: 2 worker runtimes, 4 concurrent conversations (2 simple + 2 with wait/timer/CAN), per-worker SessionManager isolation, session affinity verification across CAN boundaries

**Unit Tests (src/runtime/dispatchers/worker.rs):** 7 tests
- SessionTracker: empty, increment/decrement, same session = 1, different = 2, partial drop, full cleanup, mixed sequence

### Documentation Updates
- [x] Proposal spec (v2): flow analogy, updated signatures, max_sessions deferred
- [x] ORCHESTRATION-GUIDE: `schedule_activity_on_session` API reference + Session Affinity Pattern
- [x] provider-implementation-guide: session methods in trait table, schema, validation checklist
- [x] sqlite-provider-design: `session_id` column, `sessions` table, flow table design decision
- [x] architecture.md: Session Manager in component tree
- [x] execution-model.md: Session Affinity section
- [x] continue-as-new.md: sessions survive CAN note
- [x] durable-futures-internals.md: session_id flow through replay engine
- [x] provider-testing-guide: Category 16 (30 session tests)
- [x] provider-observability.md: session instrumentation table
- [x] metrics-specification.md: session metrics future note
- [x] observability-guide.md: session log event examples
- [x] migration-guide.md: backward compatibility notes
- [x] versioning-best-practices.md: rolling upgrade caveat
- [x] Old proposal (v1) marked as superseded

---

## Known Issues / Deferred

1. **`max_sessions_per_runtime` not in `SessionFetchConfig`** — capacity is enforced runtime-side via `SessionTracker`, not provider-side. Provider sees `Some` (all sessions claimable) or `None` (non-session only). Deferred in proposal.

2. **Piggyback ownership check is best-effort**`AND locked_until > ?now` on `last_activity_at` updates prevents bumping on expired-and-unclaimed sessions, but doesn't prevent bumping if another worker already reclaimed the session (their `locked_until` is in the future). Full fix would require `worker_id` in `ack_work_item`/`renew_work_item_lock` or storing `owner_id` on worker_queue rows.

3. **Multi-worker E2E tests** — Two representative multi-worker tests implemented in `session_e2e_tests.rs` (`test_multi_worker_complex_orchestration` and `test_multi_worker_heterogeneous_config`). Additional scenario coverage in `tests/scenarios/sessions.rs`.

---

## Test Counts

| Category | Count |
|----------|-------|
| Provider validation (sessions) | 33 |
| Provider validation (lock expiry) | 2 |
| E2E session tests | 22 |
| Scenario tests (sessions) | 1 |
| Unit tests (SessionTracker) | 7 |
| **Total new tests** | **65** |
| Full suite (all tests) | 800 |

---

## Files Changed (Key)

### Core
- `src/providers/mod.rs``SessionFetchConfig`, `fetch_work_item` signature, `renew_session_lock` signature, doc contract
- `src/providers/sqlite.rs` — All session implementation (LEFT JOIN, upsert, RETURNING, piggyback)
- `src/providers/instrumented.rs` — Pass-through signature updates
- `src/runtime/dispatchers/worker.rs``SessionTracker`, `SessionGuard`, capacity enforcement, session manager
- `src/runtime/mod.rs``worker_node_id` doc, `session_idle_timeout` validation
- `src/lib.rs``schedule_activity_internal` refactor, `SessionFetchConfig` re-export

### Tests
- `src/provider_validation/sessions.rs` — 33 provider validation tests
- `src/provider_validation/lock_expiration.rs` — 2 lock expiry boundary tests
- `tests/session_e2e_tests.rs` — 22 e2e tests (incl. 2 multi-worker)
- `tests/scenarios/sessions.rs` — 1 scenario test (durable-copilot-SDK scaled-out pattern)
- `tests/sqlite_provider_validations.rs` — Wiring for all new validation tests
- `tests/common/fault_injection.rs` — Updated Provider impl signatures
- `tests/cancellation_tests.rs`, `tests/long_poll_tests.rs` — Signature updates

### Docs
- `docs/proposals/activity-implicit-sessions-v2.md` — Flow analogy, signature updates, deferred items
- `docs/proposals/activity-implicit-sessions.md` — Marked superseded
- `docs/ORCHESTRATION-GUIDE.md` — Session API + pattern
- `docs/provider-implementation-guide.md` — Schema, methods, checklist
- 10+ other docs with session references added