langcontinuation 0.1.0

Continuation-passing workflow engine for durable Rust programs and AI agent systems.
# Batch Observability Design

This document records the observability decisions for `langcontinuation` batch
execution. It is a design record, not an implementation status report.

## Philosophy

Observability is part of the durable workflow contract. The primary record
should be a durable workflow event ledger, not logs, traces, or metrics.
Logs, metrics, and traces are projections of the ledger.

The workflow snapshot remains the resumable state machine. The event ledger
lives beside the workflow as append-only history keyed by workflow run id.

The batch executor owns the strict contract: event writes are committed
transactionally with workflow state changes. Live execution can reuse the same
event vocabulary, but live observation is best-effort unless a future durable
live store is added.

The event ledger answers "what happened, in what exact per-run order, and what
caused it." Existing workflow and continuation rows answer "where is it now."

## Storage Scope

The Rust event model should be runtime-generic. The SQL table in this crate
should follow the existing batch namespace and be named `batch_workflow_events`.

Events have a different lifecycle than workflow rows. Workflow, continuation,
and provider rows can become quiescent and eligible for cleanup. Event rows are
immutable facts and should be retained or archived by an independent policy.
Do not add `quiescent` to event rows.

Update `migrations/0001_batch.sql` as the canonical fresh schema. Do not add
compatibility `ALTER TABLE` statements for databases that already ran the old
schema.

## Schema

Add workflow-level event allocation and causality metadata:

```sql
root_run_id TEXT NOT NULL,
next_event_ordinal BIGINT NOT NULL DEFAULT 0 CHECK (next_event_ordinal >= 0),
causal_cursor JSONB NOT NULL
```

`root_run_id` is storage metadata. For a top-level workflow it equals `run_id`.
For branches it is copied from the parent workflow row. It should not be added
to serialized `Workflow`.

`causal_cursor` has no SQL default because it depends on the run. Insert code
must set it explicitly.

Add continuation-level causality metadata:

```sql
causal_cursor JSONB NOT NULL
```

Continuations have their own cursor because provider and external lifecycle
events happen while the workflow row is waiting.

Add the event table:

```sql
CREATE TABLE batch_workflow_events (
    event_id UUID PRIMARY KEY,
    root_run_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    parent_run_id TEXT,
    fork_name TEXT,
    event_ordinal BIGINT NOT NULL CHECK (event_ordinal >= 0),
    caused_by JSONB NOT NULL,
    event_type TEXT NOT NULL,
    event_version SMALLINT NOT NULL CHECK (event_version > 0),
    continuation_id TEXT,
    event JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (run_id, event_ordinal)
);
```

`run_id` must always be a real workflow run id. Provider batches are referenced
objects, not subjects in `batch_workflow_events`.

Keep these out of the v1 top-level event schema:

- `workflow_status`
- `provider_batch_id`
- `attempt`
- `trace_id` or external correlation id
- `fork_path`
- `quiescent`
- top-level actor fields

`continuation_id` is a top-level nullable column because it is a common
operational correlation key.

Recommended indexes:

```sql
CREATE INDEX batch_workflow_events_root_idx
    ON batch_workflow_events (root_run_id, created_at, event_id);

CREATE INDEX batch_workflow_events_type_idx
    ON batch_workflow_events (event_type, created_at);

CREATE INDEX batch_workflow_events_continuation_idx
    ON batch_workflow_events (continuation_id)
    WHERE continuation_id IS NOT NULL;
```

The primary key covers event lookup. The unique `(run_id, event_ordinal)`
constraint covers exact per-run ordering. Do not add a caused-by index in v1.

Validate event names in Rust, not with a SQL `CHECK`. Keep the SQL event type
column general.

## Identity And Ordering

Use client-generated UUIDv7 event ids with the standard `uuid` crate:

```toml
uuid = { version = "1", features = ["serde", "v7"] }
```

Store event ids as native Postgres `UUID`, not `TEXT`. Enable SQLx UUID support
as needed by the batch feature.

Use zero-based exact per-run ordinals:

```text
run-123 #0 workflow.enqueued
run-123 #1 local_call.started
```

`event_ordinal` and `next_event_ordinal` are `BIGINT` in SQL and `i64` in
SQL-facing Rust records. `event_version` is `SMALLINT` in SQL and `i16` in
SQL-facing Rust records.

Do not create a root-wide ordinal or any other root-wide logical order. Fork
branches execute independently, and a root-wide counter would add false
coordination and misleading semantics. Cross-run views can sort by
`created_at, event_id` while showing each run's exact ordinal.
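
As a rough illustration of such a cross-run view, the sketch below sorts rows by `(created_at, event_id)` and still prints each run's own ordinal. `EventRow`, `cross_run_view`, and `render` are hypothetical names for this example only; the `u128` id and integer timestamp are stand-ins for the real `UUID` and `TIMESTAMPTZ` columns.

```rust
/// Hypothetical, trimmed-down event row used only for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EventRow {
    /// Stand-in for a UUIDv7 event id.
    pub event_id: u128,
    pub run_id: String,
    /// Exact ordinal, scoped to `run_id`.
    pub event_ordinal: i64,
    /// Stand-in for `created_at`.
    pub created_at_micros: i64,
}

/// Order rows for a cross-run view by `(created_at, event_id)`.
/// No root-wide ordinal is invented; the per-run ordinal is left untouched.
pub fn cross_run_view(mut rows: Vec<EventRow>) -> Vec<EventRow> {
    rows.sort_by_key(|row| (row.created_at_micros, row.event_id));
    rows
}

/// Render one line per event, showing each run's exact ordinal.
pub fn render(rows: &[EventRow]) -> Vec<String> {
    rows.iter()
        .map(|row| format!("{} #{}", row.run_id, row.event_ordinal))
        .collect()
}
```

The view order is only a display convenience; the exact ordering guarantee remains the per-run `(run_id, event_ordinal)` pair.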

## Causality

Use a mandatory top-level JSONB causal reference:

```json
{ "type": "run_id", "run_id": "run-123" }
```

or:

```json
{ "type": "event_id", "event_id": "018f..." }
```

Name this field `caused_by` on event rows and `causal_cursor` on mutable
workflow and continuation rows. Do not call it `caused_by_event_id` because it
can point to a run anchor.

`CausalRef::EventId` should include only the event UUID. The target run id and
ordinal can be fetched from the event row.

`CausalRef::RunId` is a known workflow run anchor. In batch mode, validate it
against `batch_workflows`, with same-transaction insertion allowed. Run anchors
are allowed after the first event, but should be uncommon.

Every event must be walkable:

- automatic events normally point to the current causal cursor
- custom explicit-cause events may point to a valid run or event
- events within the same atomic batch may point to earlier pending events in
  that same batch
- event-id causes that are neither in-batch nor already durable should reject
  the commit

The storage layer provides the causal cursor when executing or resuming a
workflow. The cursor is not serialized inside `Workflow`.

Branches should be caused by the parent fork event. A branch workflow row can
therefore start with a `causal_cursor` that points to an event from another run.
This is expected. The field is a causal cursor, not "last event emitted by this
run."
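
The walkability rules above could be checked along these lines. This is a minimal in-memory sketch, not the crate's implementation: `PendingEvent`, `CauseError`, and `validate_causes` are hypothetical, event ids are `u128` stand-ins for UUIDs, and the two sets stand in for lookups against durable event rows and `batch_workflows`.

```rust
use std::collections::HashSet;

/// Simplified causal reference; the `u128` is a stand-in for an event UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CausalRef {
    RunId(String),
    EventId(u128),
}

/// Hypothetical pending event carrying only what validation needs.
pub struct PendingEvent {
    pub event_id: u128,
    pub caused_by: CausalRef,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CauseError {
    UnknownRun(String),
    UnknownEvent(u128),
}

/// Accept a batch only if every cause is a known run, an already durable
/// event, or an earlier pending event in the same batch.
pub fn validate_causes(
    batch: &[PendingEvent],
    durable_events: &HashSet<u128>,
    known_runs: &HashSet<String>,
) -> Result<(), CauseError> {
    let mut earlier_in_batch: HashSet<u128> = HashSet::new();
    for event in batch {
        match &event.caused_by {
            CausalRef::RunId(run_id) => {
                if !known_runs.contains(run_id) {
                    return Err(CauseError::UnknownRun(run_id.clone()));
                }
            }
            CausalRef::EventId(id) => {
                if !earlier_in_batch.contains(id) && !durable_events.contains(id) {
                    return Err(CauseError::UnknownEvent(*id));
                }
            }
        }
        // Only events that come earlier in the batch are valid in-batch causes.
        earlier_in_batch.insert(event.event_id);
    }
    Ok(())
}
```

Note that a forward reference to a later event in the same batch is rejected here, which matches the "earlier pending events" wording.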

## Atomic Append

Durable event writes are state-transition critical. If the batch executor cannot
append the required event rows, the corresponding workflow state mutation must
not commit.

Projection to logs, metrics, traces, or external processors happens only after
durable commit and is best-effort/replayable.

Allocate ordinals with a guarded `UPDATE ... RETURNING`, not `max() + 1`:

```sql
UPDATE batch_workflows
SET workflow = $workflow,
    status = $status,
    next_event_ordinal = next_event_ordinal + $event_count,
    causal_cursor = $new_cursor,
    updated_at = now()
WHERE run_id = $run_id
  AND causal_cursor = $expected_cursor
RETURNING next_event_ordinal - $event_count AS first_ordinal;
```

Use the same pattern for event-only commits by omitting workflow/status changes.

Inside a transaction, run the guarded workflow update first to allocate
ordinals. Insert event rows second. If insertion fails, roll the transaction
back.

Cursor mismatch during a workflow mutation is a hard error. Do not silently
rebase durable workflow mutations.

Pre-enqueue custom events are the exception: storage emits `workflow.enqueued`
first and chains automatic pending events after it because there is no durable
cursor yet.
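
To make the guarded allocation concrete, here is an in-memory model of what the `UPDATE ... RETURNING` statement does to one workflow row. It is a sketch under simplifying assumptions: `WorkflowRow`, `AppendError`, and `allocate_ordinals` are hypothetical, the cursor is a plain string instead of JSONB, and no real transaction is involved.

```rust
/// Hypothetical in-memory stand-in for the relevant `batch_workflows` columns.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkflowRow {
    pub next_event_ordinal: i64,
    /// Stand-in for the JSONB `causal_cursor`.
    pub causal_cursor: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AppendError {
    /// The row's cursor did not match the expected cursor; nothing was changed.
    CursorMismatch,
}

/// Model of the guarded update: reserve `event_count` ordinals and move the
/// cursor only when the stored cursor equals `expected_cursor`.
/// Returns the first reserved ordinal, like `RETURNING ... AS first_ordinal`.
pub fn allocate_ordinals(
    row: &mut WorkflowRow,
    expected_cursor: &str,
    new_cursor: &str,
    event_count: i64,
) -> Result<i64, AppendError> {
    if row.causal_cursor != expected_cursor {
        // Corresponds to the UPDATE matching zero rows: a hard error, no rebase.
        return Err(AppendError::CursorMismatch);
    }
    row.next_event_ordinal += event_count;
    row.causal_cursor = new_cursor.to_string();
    Ok(row.next_event_ordinal - event_count)
}
```

The caller would then insert events with ordinals `first_ordinal .. first_ordinal + event_count` in the same transaction.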

## Workflow Observability State

`Workflow` should gain private execution metadata that serialization skips:

```rust
#[serde(skip, default)]
observability: WorkflowObservabilityState
```

This state contains:

- pending in-memory events
- current in-memory causal cursor

It is not resumable workflow state. If local execution crashes before commit,
pending events and generated event ids disappear.

Expose a low-level runtime API:

```rust
pub struct ObservabilityContext {
    pub causal_cursor: CausalRef,
}

pub fn set_observability_context(&mut self, context: ObservabilityContext)
```

Normal workflow code should not call this. Custom durable runtimes need it.

## Custom Events

Application code records custom events on `Workflow`:

```rust
workflow.record_event("ticket.classified", payload)?;
workflow.record_event_caused_by("ticket.routed", payload, cause)?;
```

`record_event` should:

- accept `impl Serialize`
- own the event type string in pending state
- return the generated UUIDv7 event id
- be fallible with structured crate errors
- assign the event id immediately
- update the in-memory causal cursor immediately
- validate event name, version, and payload size

`record_event_caused_by` should preserve the explicit cause and still advance
the in-memory cursor to the new pending event.

Pending events should distinguish automatic causes from explicit causes so the
durable append path can validate or rebase correctly where allowed.

First-party event prefixes are reserved. Custom events must not use reserved
first-party prefixes. First-party events use names such as:

```text
workflow.*
local_call.*
anthropic.*
openai.*
human.*
tool_call.*
fork_join.*
continuation.*
```

Custom event redaction is the caller's responsibility. The crate should validate
names and size, not infer sensitive fields.
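
One possible shape for the reserved-prefix check is sketched below. `validate_custom_event_name` and `EventNameError` are hypothetical names, and the rule that an empty name is rejected is an assumption; only the prefix list comes from the section above.

```rust
/// First-party prefixes listed above, written with the trailing dot so that
/// a custom name like `workflows.synced` is not rejected by accident.
const RESERVED_PREFIXES: [&str; 8] = [
    "workflow.",
    "local_call.",
    "anthropic.",
    "openai.",
    "human.",
    "tool_call.",
    "fork_join.",
    "continuation.",
];

#[derive(Debug, PartialEq, Eq)]
pub enum EventNameError {
    /// Assumption: empty names are not valid event types.
    Empty,
    ReservedPrefix(&'static str),
}

/// Reject custom event names that are empty or use a first-party prefix.
pub fn validate_custom_event_name(name: &str) -> Result<(), EventNameError> {
    if name.is_empty() {
        return Err(EventNameError::Empty);
    }
    for prefix in RESERVED_PREFIXES.iter() {
        if name.starts_with(*prefix) {
            return Err(EventNameError::ReservedPrefix(*prefix));
        }
    }
    Ok(())
}
```

Keeping this check in Rust rather than in a SQL `CHECK` is consistent with the schema guidance to leave the event type column general.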

## Event Payload Limits

Add `ObservabilityConfig` on `Trampoline`:

```rust
pub struct ObservabilityConfig {
    pub max_env_changes: usize,
    pub max_event_payload_bytes: usize,
}
```

Defaults:

- `max_env_changes = 64`
- `max_event_payload_bytes = 32 * 1024`

Oversized event payloads are hard errors. Do not silently truncate payloads.
For env diffs, cap the number of changed-key entries but include complete
counts, truncation metadata, and env digests.
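
A minimal sketch of the defaults and the hard-error size check follows. The `Default` impl mirrors the values listed above; `PayloadTooLarge` and `check_payload_size` are hypothetical, and treating a payload of exactly the limit as acceptable is an assumption.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ObservabilityConfig {
    pub max_env_changes: usize,
    pub max_event_payload_bytes: usize,
}

impl Default for ObservabilityConfig {
    fn default() -> Self {
        Self {
            max_env_changes: 64,
            max_event_payload_bytes: 32 * 1024,
        }
    }
}

/// Hypothetical structured error for an oversized payload.
#[derive(Debug, PartialEq, Eq)]
pub struct PayloadTooLarge {
    pub actual_bytes: usize,
    pub limit_bytes: usize,
}

/// Reject oversized payloads outright instead of truncating them.
/// Assumption: a payload of exactly the limit is still accepted.
pub fn check_payload_size(
    config: &ObservabilityConfig,
    serialized_payload: &[u8],
) -> Result<(), PayloadTooLarge> {
    let actual_bytes = serialized_payload.len();
    if actual_bytes > config.max_event_payload_bytes {
        return Err(PayloadTooLarge {
            actual_bytes,
            limit_bytes: config.max_event_payload_bytes,
        });
    }
    Ok(())
}
```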

## Redaction And Summaries

First-party event payloads must avoid raw sensitive or large bodies. They should
not include raw env values, prompts, tool inputs, tool outputs, human context,
provider requests, provider responses, or model text.

First-party events may include:

- function, tool, provider, model, output-key names
- run, continuation, provider batch, provider message, and request ids
- counts, durations, usage, and retryability
- storage refs
- byte sizes
- value shapes
- digests
- structured error summaries and refs

Use `setsum`, not `two_five_six`, for env and value fingerprints. Format digest
strings as lowercase hex with an algorithm/version prefix:

```text
setsum:v1:<lowerhex>
```

Use setsum at two levels:

- value digest: setsum over one normalized JSON value
- env digest: setsum over length-prefixed `(key, value_digest)` elements

Normalize JSON object keys recursively before per-value digesting. Preserve
array order.

Value shape enum:

```text
missing
null
bool
number
string
array
object
```
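
The normalization rule and the shape enum can be sketched with a tiny stand-in JSON type. `Json`, `normalize`, and `shape` are hypothetical and exist only so the example is self-contained; a real implementation would presumably operate on `serde_json::Value`.

```rust
/// Minimal stand-in for a JSON value. Objects are stored as key/value pairs
/// so that key order is explicit.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Sort object keys recursively and preserve array order.
pub fn normalize(value: Json) -> Json {
    match value {
        Json::Array(items) => Json::Array(items.into_iter().map(normalize).collect()),
        Json::Object(mut entries) => {
            entries.sort_by(|a, b| a.0.cmp(&b.0));
            Json::Object(
                entries
                    .into_iter()
                    .map(|(key, inner)| (key, normalize(inner)))
                    .collect(),
            )
        }
        other => other,
    }
}

/// Map an optional value to the shape names listed above.
/// `None` models a key that is absent from the env.
pub fn shape(value: Option<&Json>) -> &'static str {
    match value {
        None => "missing",
        Some(Json::Null) => "null",
        Some(Json::Bool(_)) => "bool",
        Some(Json::Number(_)) => "number",
        Some(Json::String(_)) => "string",
        Some(Json::Array(_)) => "array",
        Some(Json::Object(_)) => "object",
    }
}
```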

Changed env key summaries should include:

- key
- change kind: `added`, `removed`, or `modified`
- before summary
- after summary

Sort env diff keys lexicographically before truncation for deterministic
payload readability and tests. Env-level digest ordering should still come from
setsum.
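
The sort-then-truncate rule for env diffs might look like the following sketch. `summarize_env_diff` and `EnvDiffSummary` are hypothetical names; the maps hold one digest string per env key (an assumed representation), and the before/after summaries for each key are omitted to keep the example short.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeKind {
    Added,
    Removed,
    Modified,
}

/// Hypothetical summary: a capped list of changes plus the complete count.
#[derive(Debug, PartialEq, Eq)]
pub struct EnvDiffSummary {
    pub changes: Vec<(String, ChangeKind)>,
    pub total_changed: usize,
    pub truncated: bool,
}

/// Compare per-key digests, sort changed keys lexicographically, then cap the
/// list at `max_env_changes` while still reporting the full count.
pub fn summarize_env_diff(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
    max_env_changes: usize,
) -> EnvDiffSummary {
    let mut changes = Vec::new();
    for (key, old_digest) in before {
        match after.get(key) {
            None => changes.push((key.clone(), ChangeKind::Removed)),
            Some(new_digest) if new_digest != old_digest => {
                changes.push((key.clone(), ChangeKind::Modified))
            }
            Some(_) => {}
        }
    }
    for key in after.keys() {
        if !before.contains_key(key) {
            changes.push((key.clone(), ChangeKind::Added));
        }
    }
    // Sort before truncating so the retained prefix is deterministic.
    changes.sort_by(|a, b| a.0.cmp(&b.0));
    let total_changed = changes.len();
    changes.truncate(max_env_changes);
    let truncated = total_changed > changes.len();
    EnvDiffSummary { changes, total_changed, truncated }
}
```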

## Trampoline API

Make the breaking change:

```rust
pub async fn run(&self, workflow: Workflow) -> Result<WorkflowOutcome, WorkflowError>
```

`WorkflowOutcome` contains:

```rust
pub struct WorkflowOutcome {
    pub result: WorkflowResult,
    pub events: Vec<PendingWorkflowEvent>,
}
```

`WorkflowError` contains the partially mutated workflow for debug extraction,
not for durable resume:

```rust
pub struct WorkflowError {
    pub workflow: Workflow,
    pub function: Option<String>,
    pub env_changes: Vec<EnvChangeSummary>,
    pub flow: Option<FlowSummary>,
    pub events: Vec<PendingWorkflowEvent>,
    pub source: handled::SError,
}
```

Implement `Display` and `std::error::Error` for custom errors. Implement
`From<WorkflowError> for handled::SError`, which gives `Into<SError>`
automatically. Conversion should minimally enrich the source error with fields
such as `run_id` and `pending_event_count`; do not serialize event payloads
into the error.

Add a stepwise API for durable executors:

```rust
pub fn next_action(&self, workflow: &Workflow) -> WorkflowNext;

pub async fn run_one_local_call(
    &self,
    workflow: Workflow,
) -> Result<WorkflowStepOutcome, WorkflowError>;
```

Keep the private `Step` enum private. Expose a sanitized `WorkflowNext` view,
for example:

```rust
pub enum WorkflowNext {
    Halt,
    LocalCall { function: String },
    Anthropic { provider: String, output_key: String },
    Human { output_key: String },
    ToolCall { tool_names: Vec<String>, output_key: String },
    OpenAI,
    ForkJoin { branch_run_id: serde_json::Map<String, serde_json::Value> },
}
```

`run_one_local_call` computes env and flow summaries because it has access to
workflow internals. On failure it returns the partial workflow and summaries.

## Batch Execution Semantics

`local_call.started` must commit before running user code. This requires two
transactions around a local call:

```text
TX1:
  append local_call.started
  advance cursor to started event

run user function

TX2:
  append pending custom events
  append local_call.completed or local_call.failed
  append boundary or terminal events when applicable
  mutate workflow row
```

A crash or hang inside user code can leave `local_call.started` without a
matching completion. That is honest and useful.

Before invoking the local function, the runtime sets the workflow's in-memory
causal cursor to the committed `local_call.started` event. Custom events
recorded inside the call then chain from that event.

Completion/failure follows the last custom event from the call:

```text
local_call.started
ticket.loaded
ticket.classification_attempted
local_call.failed
workflow.failed
```

Boundary events follow `local_call.completed`:

```text
local_call.started
ticket.classified
local_call.completed
anthropic.suspended
```

If a local call advances to another local call, do not emit a boundary event.
The next call emits its own `local_call.started`.

Completed and failed local-call events should include duration. Do not
include local-call attempt numbers in v1; event id and ordinal are sufficient.
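
The second-transaction ordering described in this section can be summarized as a small function. This is a sketch only: `CallOutcome` and `tx2_event_types` are hypothetical, and events are represented by their type names rather than full pending event records.

```rust
/// Hypothetical outcome of one local call.
pub enum CallOutcome {
    /// The call returned. `next` is an optional boundary or terminal event
    /// type such as `anthropic.suspended` or `workflow.halted`; it is `None`
    /// when the workflow simply advances to another local call.
    Completed { next: Option<&'static str> },
    /// The call returned an error, which is terminal for the workflow.
    Failed,
}

/// Event types appended in TX2, in order. `local_call.started` is absent
/// because it was already committed in TX1.
pub fn tx2_event_types(custom_events: &[&str], outcome: &CallOutcome) -> Vec<String> {
    // Custom events recorded by user code come first, in recording order.
    let mut events: Vec<String> = custom_events.iter().map(|name| name.to_string()).collect();
    match outcome {
        CallOutcome::Completed { next } => {
            events.push("local_call.completed".to_string());
            if let Some(event_type) = next {
                events.push(event_type.to_string());
            }
        }
        CallOutcome::Failed => {
            events.push("local_call.failed".to_string());
            events.push("workflow.failed".to_string());
        }
    }
    events
}
```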

## Event Vocabulary

Use past-tense fact names. Avoid generic `workflow.status_changed` and generic
`continuation.selected`.

Top-level workflow lifecycle:

- `workflow.enqueued`
- `workflow.halted`
- `workflow.failed`

Local calls:

- `local_call.started`
- `local_call.completed`
- `local_call.failed`

Anthropic:

- `anthropic.suspended`
- `anthropic.submitted`
- `anthropic.completed`
- `anthropic.resumed`
- `anthropic.failed`
- `anthropic.retried`

Human:

- `human.blocked`
- `human.resumed`

OpenAI:

- `openai.blocked`
- `openai.resumed`

Tool calls:

- `tool_call.started`
- `tool_call.completed`
- `tool_call.failed`

Fork/join:

- `fork_join.started`
- `fork_join.completed`
- `fork_join.failed`

Terminal workflow events should be explicit. Intermediate workflow states should
be represented by specific events such as `anthropic.suspended` and
`human.blocked`, not by generic workflow status events.

## Enqueue And Halt

`workflow.enqueued` is emitted by durable enqueue/storage, not by
`Workflow::new`.

Top-level enqueue emits `workflow.enqueued` first. Caller-supplied pending
custom events are chained after it unless they have explicit valid causes.

Branch workflows also get `workflow.enqueued`, caused by the parent
`fork_join.started` event.

`workflow.halted` is emitted in the same transaction that marks the workflow
row halted. It should distinguish:

- `explicit_halt`
- `stack_exhausted`

If `Continuation::halt()` discards deferred continuation steps, the event
payload should include the discarded continuation depth.

Terminal workflow events should include env digest and env key count. Failure
terminal events should include both committed and partial env digests, clearly
labeled.

## Provider And Continuation Events

Provider request and response bodies stay in existing durable rows or artifact
stores. Events carry refs, digests, sizes, selected metadata, and summaries.

Use typed storage refs in payloads:

```json
{
  "kind": "batch_continuation_request",
  "continuation_id": "..."
}
```

V1 storage ref kinds:

- `batch_continuation_request`
- `batch_continuation_response`
- `batch_continuation_error`
- `batch_workflow_error`

`anthropic.suspended` should record requested model, provider, output key,
request summary, request digest, request bytes, and storage ref. It should
include best-effort counts and sizes such as message count, tool count, max
tokens, and system prompt bytes when available.

`anthropic.submitted` should reference `continuation_id`,
`provider_batch_id`, and `provider_message_id` when available. Provider batch
id stays in payload for v1, not a top-level event column.

`anthropic.completed` should record usage and response summary, including
response model, content block count, text byte count, tool use count, and
provider ids when available. Usage belongs in events. Cost does not; cost is
derived later using pricing tables and a pricing version.

Requested model and response model are captured separately.

Provider request ids and message ids should be included when available, but are
not required.

`anthropic.completed` and `anthropic.resumed` are separate events. Completion
records provider result metadata. Resume records insertion of the response into
the workflow env and flow advancement.

Human and current low-level OpenAI resumes use one event each:

- `human.resumed`
- `openai.resumed`

These include env/flow summaries. Actor metadata belongs inside payload JSON,
not as a top-level column.

Continuation retry keeps current behavior: a failed Anthropic continuation
makes the workflow failed. `retry_continuation` revives it and emits
`anthropic.retried`. The new continuation inherits the retry event as its
cursor. Do not add a separate `workflow.retried` event in v1.

Continuation attempt numbers already exist in `batch_continuations.attempt` and
should be included in relevant continuation events.

## Provider Batches

Do not put `causal_cursor` on `batch_provider_batches` in v1.

A provider batch can group continuations from multiple roots. Giving it one
cursor would imply false single-lineage causality. Represent provider-batch
causality through per-continuation events that reference `provider_batch_id`.
Provider batch rows remain aggregate operational state.

Example:

```text
run-a #12 anthropic.suspended -> continuation c-a
run-b #7  anthropic.suspended -> continuation c-b

run-a #13 anthropic.submitted { continuation_id: c-a, provider_batch_id: pb-9 }
run-b #8  anthropic.submitted { continuation_id: c-b, provider_batch_id: pb-9 }
```

## Tool Calls

Batch tool calls remain inline and at-least-once for v1. Do not turn them into
persisted continuations yet.

Observe tool dispatch at batch level, not per individual tool invocation:

- `tool_call.started`
- `tool_call.completed`
- `tool_call.failed`

`tool_call.completed` should inspect returned `ToolResultBlock`s and count
model-visible tool errors. Include tool names and tool call ids, not tool input
or output bodies.

Do not add `tool_call.resumed` in v1. `tool_call.completed` covers dispatch and
resume, including env and flow summaries.

## Fork/Join

The event contract should already use branch-name maps, even though the current
API is binary. Populate `"lhs"` and `"rhs"` for now.

Example payload shape:

```json
{
  "branch_run_id": {
    "lhs": "run-parent:lhs",
    "rhs": "run-parent:rhs"
  },
  "terminal_event_id": {
    "lhs": "018f...",
    "rhs": "018f..."
  },
  "join_function": "merge_reports"
}
```

Use `terminal_event_id`, spelled correctly. Branch map keys are arbitrary
non-empty strings. Do not modify `ForkBranch` yet; branch names remain
storage/event metadata for v1.

Parent ledger:

- `fork_join.started`
- `fork_join.completed` or `fork_join.failed`

Child ledgers:

- `workflow.enqueued` caused by the parent fork event
- normal child execution events
- child `workflow.halted` or `workflow.failed`

`fork_join.completed` is caused by the parent's current causal cursor, usually
the parent `fork_join.started` event. Child terminal event ids belong in payload
maps because join is fan-in and `caused_by` is single-parent.

## Errors

Failure event payloads should include structured summaries and refs. Full error
text should live in the owning status row when possible:

- continuation failures reference `batch_continuations.error_sexpr`
- workflow failures reference `batch_workflows.error_sexpr`

For local-call failures with no separate owner row, event payloads can include
the needed structured summary and a ref to the workflow error row once written.
Avoid duplicating large or sensitive strings into events where a row ref
suffices.

On terminal failures, emit both operation-specific failure and
`workflow.failed`:

```text
local_call.failed
workflow.failed
```

or:

```text
anthropic.failed
workflow.failed
```

On local-call failure, commit pending custom events recorded before the error,
then `local_call.failed`, then `workflow.failed`. Preserve the partial workflow
only for event/debug extraction. Do not persist the partially mutated workflow
as the new durable resumable state.

Failure events should include partial flow summaries and local-call duration.

## Actor Metadata

Actor metadata belongs inside event payloads. Keep it flexible JSON with
conventional fields:

```json
{
  "actor": {
    "kind": "user",
    "id_hash": "setsum:v1:...",
    "metadata": {}
  }
}
```

Avoid raw emails or usernames by default. Applications may opt into more
specific identity metadata.

## Event Inspection APIs

Expose raw event inspection APIs in `batch::Executor`, matching the existing
record-inspection style:

```rust
pub struct WorkflowEventRecord {
    pub event_id: uuid::Uuid,
    pub root_run_id: String,
    pub run_id: String,
    pub parent_run_id: Option<String>,
    pub fork_name: Option<String>,
    pub event_ordinal: i64,
    pub caused_by: CausalRef,
    pub event_type: String,
    pub event_version: i16,
    pub continuation_id: Option<String>,
    pub event: serde_json::Value,
    pub created_at: ...
}
```

Initial query methods:

```rust
load_workflow_event(event_id)
load_workflow_events(run_id)
load_root_workflow_events(root_run_id)
```

Return raw records in v1. Do not require typed first-party event decoding in the
inspection API yet.

`batch::Executor::poll` should continue returning `PollSummary`, not event
payloads. Add an event count such as `events_committed`.

## Live Execution

Live execution can use the same event vocabulary. It does not provide the same
durable contract unless a future durable live store is explicitly configured.

`live::Executor::run_workflow` should switch to a richer `live::RunError`:

```rust
pub struct RunError {
    pub workflow: Option<Workflow>,
    pub events: Vec<PendingWorkflowEvent>,
    pub source: handled::SError,
}
```

Implement `Display`, `Error`, and `From<RunError> for handled::SError`.
Conversion should minimally add breadcrumbs such as `run_id` and
`pending_event_count`.

Do not add a public observer trait in the first implementation. Build the
durable batch ledger first and keep internal types shaped so an observer can be
added later.

## Implementation Checklist

- Add `uuid` and `setsum` dependencies with the required feature flags.
- Define `CausalRef`, pending event types, committed event records, first-party
  payload structs, value summaries, env summaries, flow summaries, and
  observability config.
- Add skipped observability state to `Workflow`.
- Add custom event recording APIs on `Workflow`.
- Change `Trampoline::run` to return `WorkflowOutcome` and `WorkflowError`.
- Add `next_action` and `run_one_local_call`.
- Compute env and flow summaries inside the trampoline layer.
- Update `live::Executor` for the new trampoline return types and richer
  `RunError`.
- Update `migrations/0001_batch.sql` with workflow/continuation cursors and
  `batch_workflow_events`.
- Update batch enqueue to emit `workflow.enqueued` and pending initial custom
  events transactionally.
- Update batch local-call execution to commit `local_call.started` before user
  code and completion/failure events after user code.
- Update batch suspension, resume, tool-call, fork/join, retry, halt, and
  failure paths to append first-party events transactionally.
- Add event insertion helpers using guarded `UPDATE ... RETURNING`.
- Add event cause validation, including same-batch event references.
- Add raw event inspection APIs.
- Add unit tests for event ordering, causality, pending-event skip behavior,
  failure preservation, env summaries, payload size limits, and branch event
  maps.
- Add or update integration tests for batch event persistence around enqueue,
  local calls, Anthropic suspension/resume, human/OpenAI block/resume, tool
  calls, fork/join, and retry.