langcontinuation 0.1.0

# Batch Observability Design

This document records the observability decisions for `langcontinuation` batch
execution. It is a design record, not an implementation status report.

## Philosophy

Observability is part of the durable workflow contract. The primary record
should be a durable workflow event ledger, not logs, traces, or metrics.
Logs, metrics, and traces are projections of the ledger.

The workflow snapshot remains the resumable state machine. The event ledger
lives beside the workflow as append-only history keyed by workflow run id.

The batch executor owns the strict contract: event writes are committed
transactionally with workflow state changes. Live execution can reuse the same
event vocabulary, but live observation is best-effort unless a future durable
live store is added.

The event ledger answers "what happened, in what exact per-run order, and what
caused it." Existing workflow and continuation rows answer "where is it now."

## Storage Scope

The Rust event model should be runtime-generic. The SQL table in this crate
should follow the existing batch namespace and be named `batch_workflow_events`.

Events have a different lifecycle than workflow rows. Workflow, continuation,
and provider rows can become quiescent and eligible for cleanup. Event rows are
immutable facts and should be retained or archived by an independent policy.
Do not add `quiescent` to event rows.

Update `migrations/0001_batch.sql` as the canonical fresh schema. Do not add
compatibility `ALTER TABLE` statements for databases that already ran the old
schema.

## Schema

Add workflow-level event allocation and causality metadata:

```sql
root_run_id TEXT NOT NULL,
next_event_ordinal BIGINT NOT NULL DEFAULT 0 CHECK (next_event_ordinal >= 0),
causal_cursor JSONB NOT NULL
```

`root_run_id` is storage metadata. For a top-level workflow it equals `run_id`.
For branches it is copied from the parent workflow row. It should not be added
to serialized `Workflow`.

`causal_cursor` has no SQL default because it depends on the run. Insert code
must set it explicitly.

Add continuation-level causality metadata:

```sql
causal_cursor JSONB NOT NULL
```

Continuations have their own cursor because provider and external lifecycle
events happen while the workflow row is waiting.

Add the event table:

```sql
CREATE TABLE batch_workflow_events (
    event_id UUID PRIMARY KEY,
    root_run_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    parent_run_id TEXT,
    fork_name TEXT,
    event_ordinal BIGINT NOT NULL CHECK (event_ordinal >= 0),
    caused_by JSONB NOT NULL,
    event_type TEXT NOT NULL,
    event_version SMALLINT NOT NULL CHECK (event_version > 0),
    continuation_id TEXT,
    event JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (run_id, event_ordinal)
);
```

`run_id` must always be a real workflow run id. Provider batches are referenced
objects, not subjects in `batch_workflow_events`.

Keep these out of the v1 top-level event schema:

- `workflow_status`
- `provider_batch_id`
- `attempt`
- `trace_id` or external correlation id
- `fork_path`
- `quiescent`
- top-level actor fields

`continuation_id` is a top-level nullable column because it is a common
operational correlation key.

Recommended indexes:

```sql
CREATE INDEX batch_workflow_events_root_idx
    ON batch_workflow_events (root_run_id, created_at, event_id);

CREATE INDEX batch_workflow_events_type_idx
    ON batch_workflow_events (event_type, created_at);

CREATE INDEX batch_workflow_events_continuation_idx
    ON batch_workflow_events (continuation_id)
    WHERE continuation_id IS NOT NULL;
```

The primary key covers event lookup. The unique `(run_id, event_ordinal)`
constraint covers exact per-run ordering. Do not add a caused-by index in v1.

Validate event names in Rust, not with a SQL `CHECK`. Keep the SQL event type
column general.

## Identity And Ordering

Use client-generated UUIDv7 event ids with the standard `uuid` crate:

```toml
uuid = { version = "1", features = ["serde", "v7"] }
```

Store event ids as native Postgres `UUID`, not `TEXT`. Enable SQLx UUID support
as needed by the batch feature.

Use zero-based exact per-run ordinals:

```text
run-123 #0 workflow.enqueued
run-123 #1 local_call.started
```

`event_ordinal` and `next_event_ordinal` are `BIGINT` in SQL and `i64` in
SQL-facing Rust records. `event_version` is `SMALLINT` in SQL and `i16` in
SQL-facing Rust records.

Do not create a root-wide ordinal or any other root-wide logical order. Fork
branches execute independently, and a root-wide counter would add false
coordination and misleading semantics. Cross-run views can sort by
`created_at, event_id` while showing each run's exact ordinal.
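
As a rough illustration of the cross-run view described above, the sketch below
sorts by `(created_at, event_id)` and only displays the per-run ordinal. The
`EventView` struct, its integer stand-ins for the timestamp and UUIDv7, and both
function names are assumptions made for this example, not part of the crate.

```rust
/// Hypothetical, trimmed-down view of an event row for display purposes.
#[derive(Debug, Clone, PartialEq)]
pub struct EventView {
    pub run_id: String,
    pub event_ordinal: i64,
    /// Stand-in for `created_at`: microseconds since the Unix epoch.
    pub created_at_micros: i64,
    /// Stand-in for the UUIDv7 event id, held as its 128-bit value.
    pub event_id: u128,
}

/// Orders events for a cross-run view by `(created_at, event_id)`.
/// The per-run ordinal is carried along for display but is never a sort key.
pub fn sort_for_cross_run_view(events: &mut [EventView]) {
    events.sort_by_key(|e| (e.created_at_micros, e.event_id));
}

/// Renders one label per event in the `run-id #ordinal` style used above.
pub fn render(events: &[EventView]) -> Vec<String> {
    events
        .iter()
        .map(|e| format!("{} #{}", e.run_id, e.event_ordinal))
        .collect()
}
```

Within a single run the ordinal remains the authoritative order; the timestamp
sort is only a convenience for interleaving runs.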

## Causality

Use a mandatory top-level JSONB causal reference:

```json
{ "type": "run_id", "run_id": "run-123" }
```

or:

```json
{ "type": "event_id", "event_id": "018f..." }
```

Name this field `caused_by` on event rows and `causal_cursor` on mutable
workflow and continuation rows. Do not call it `caused_by_event_id` because it
can point to a run anchor.

`CausalRef::EventId` should include only the event UUID. The target run id and
ordinal can be fetched from the event row.

`CausalRef::RunId` is a known workflow run anchor. In batch mode, validate it
against `batch_workflows`, with same-transaction insertion allowed. Run anchors
are allowed after the first event, but should be uncommon.

Every event must be walkable:

- automatic events normally point to the current causal cursor
- custom explicit-cause events may point to a valid run or event
- events within the same atomic batch may point to earlier pending events in
  that same batch
- event-id causes that are neither in-batch nor already durable should reject
  the commit
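
The walkability rules above could be checked along these lines. This is a
minimal sketch: event ids are plain strings rather than UUIDs, the durable
event and run lookups are modelled as in-memory sets, and `PendingEvent`,
`CauseError`, and `validate_causes` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub enum CausalRef {
    RunId(String),
    EventId(String),
}

/// Hypothetical pending event: only the fields needed for cause validation.
#[derive(Debug, Clone)]
pub struct PendingEvent {
    pub event_id: String,
    pub caused_by: CausalRef,
}

#[derive(Debug, PartialEq)]
pub enum CauseError {
    UnknownRun(String),
    UnknownEvent(String),
}

/// Validates one atomic batch in append order. An event-id cause must be
/// either already durable or an earlier event in the same batch; a run-id
/// cause must name a known workflow run.
pub fn validate_causes(
    batch: &[PendingEvent],
    durable_event_ids: &HashSet<String>,
    known_run_ids: &HashSet<String>,
) -> Result<(), CauseError> {
    let mut earlier_in_batch: HashSet<&str> = HashSet::new();
    for event in batch {
        match &event.caused_by {
            CausalRef::RunId(run_id) => {
                if !known_run_ids.contains(run_id) {
                    return Err(CauseError::UnknownRun(run_id.clone()));
                }
            }
            CausalRef::EventId(id) => {
                let in_batch = earlier_in_batch.contains(id.as_str());
                if !in_batch && !durable_event_ids.contains(id) {
                    return Err(CauseError::UnknownEvent(id.clone()));
                }
            }
        }
        // Only events that precede the current one count as in-batch causes.
        earlier_in_batch.insert(event.event_id.as_str());
    }
    Ok(())
}
```

A forward reference to a later event in the same batch is rejected here, which
matches the "earlier pending events" wording above.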

The storage layer provides the causal cursor when executing or resuming a
workflow. The cursor is not serialized inside `Workflow`.

Branches should be caused by the parent fork event. A branch workflow row can
therefore start with a `causal_cursor` that points to an event from another run.
This is expected. The field is a causal cursor, not "last event emitted by this
run."

## Atomic Append

Durable event writes are state-transition critical. If the batch executor cannot
append the required event rows, the corresponding workflow state mutation must
not commit.

Projection to logs, metrics, traces, or external processors happens only after
durable commit and is best-effort and replayable.

Allocate ordinals with a guarded `UPDATE ... RETURNING`, not `max() + 1`:

```sql
UPDATE batch_workflows
SET workflow = $workflow,
    status = $status,
    next_event_ordinal = next_event_ordinal + $event_count,
    causal_cursor = $new_cursor,
    updated_at = now()
WHERE run_id = $run_id
  AND causal_cursor = $expected_cursor
RETURNING next_event_ordinal - $event_count AS first_ordinal;
```

Use the same pattern for event-only commits by omitting workflow/status changes.

Inside a transaction, run the guarded workflow update first to allocate
ordinals. Insert event rows second. If insertion fails, roll the transaction
back.

Cursor mismatch during a workflow mutation is a hard error. Do not silently
rebase durable workflow mutations.
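
To make the guard semantics concrete, here is an in-memory model of the guarded
update. It is only a sketch of the compare-and-advance behaviour, not the SQLx
code path; `WorkflowRow`, `CursorMismatch`, and `allocate` are hypothetical
names, and event ids are plain strings.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum CausalRef {
    RunId(String),
    EventId(String),
}

/// Hypothetical in-memory stand-in for the relevant `batch_workflows` columns.
#[derive(Debug)]
pub struct WorkflowRow {
    pub next_event_ordinal: i64,
    pub causal_cursor: CausalRef,
}

/// Returned when the stored cursor differs from the caller's expectation,
/// which corresponds to the guarded UPDATE matching zero rows.
#[derive(Debug, PartialEq)]
pub struct CursorMismatch;

impl WorkflowRow {
    /// Reserves `event_count` ordinals and advances the cursor, but only if
    /// the stored cursor equals `expected`. Returns the first reserved ordinal.
    pub fn allocate(
        &mut self,
        expected: &CausalRef,
        new_cursor: CausalRef,
        event_count: i64,
    ) -> Result<i64, CursorMismatch> {
        if &self.causal_cursor != expected {
            return Err(CursorMismatch);
        }
        let first_ordinal = self.next_event_ordinal;
        self.next_event_ordinal += event_count;
        self.causal_cursor = new_cursor;
        Ok(first_ordinal)
    }
}
```

A stale caller gets an error and the row is left untouched, which is the
hard-error behaviour required for workflow mutations.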

Pre-enqueue custom events are the exception: storage emits `workflow.enqueued`
first and chains automatic pending events after it because there is no durable
cursor yet.

## Workflow Observability State

`Workflow` should gain private execution metadata that is skipped during
serialization:

```rust
#[serde(skip, default)]
observability: WorkflowObservabilityState
```

This state contains:

- pending in-memory events
- current in-memory causal cursor

It is not resumable workflow state. If local execution crashes before commit,
pending events and generated event ids disappear.

Expose a low-level runtime API:

```rust
pub struct ObservabilityContext {
    pub causal_cursor: CausalRef,
}

pub fn set_observability_context(&mut self, context: ObservabilityContext)
```

Normal workflow code should not call this. Custom durable runtimes need it.

## Custom Events

Application code records custom events on `Workflow`:

```rust
workflow.record_event("ticket.classified", payload)?;
workflow.record_event_caused_by("ticket.routed", payload, cause)?;
```

`record_event` should:

- accept `impl Serialize`
- own the event type string in pending state
- return the generated UUIDv7 event id
- be fallible with structured crate errors
- assign the event id immediately
- update the in-memory causal cursor immediately
- validate event name, version, and payload size

`record_event_caused_by` should preserve the explicit cause and still advance
the in-memory cursor to the new pending event.

Pending events should distinguish automatic causes from explicit causes so the
durable append path can validate explicit causes and, where allowed, rebase
automatic ones.
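
One possible shape for the pending state is sketched below. Payloads,
validation, and error handling are left out to keep the cursor behaviour
visible, and a counter stands in for UUIDv7 generation. `ObservabilityState`,
`PendingCause`, and the field names are assumptions for this example.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum CausalRef {
    RunId(String),
    /// Stand-in for a UUIDv7 event id.
    EventId(u128),
}

/// Records whether a cause came from the cursor or from the caller.
#[derive(Debug, Clone, PartialEq)]
pub enum PendingCause {
    Automatic(CausalRef),
    Explicit(CausalRef),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PendingEvent {
    pub event_id: u128,
    pub event_type: String,
    pub cause: PendingCause,
}

/// Hypothetical in-memory observability state held by a workflow.
pub struct ObservabilityState {
    pub pending: Vec<PendingEvent>,
    pub cursor: CausalRef,
    next_id: u128, // counter used instead of a real UUIDv7 generator
}

impl ObservabilityState {
    pub fn new(cursor: CausalRef) -> Self {
        ObservabilityState { pending: Vec::new(), cursor, next_id: 1 }
    }

    /// Assigns an id, queues the event, and advances the cursor to it.
    fn push(&mut self, event_type: &str, cause: PendingCause) -> u128 {
        let event_id = self.next_id;
        self.next_id += 1;
        self.pending.push(PendingEvent {
            event_id,
            event_type: event_type.to_string(),
            cause,
        });
        self.cursor = CausalRef::EventId(event_id);
        event_id
    }

    /// Automatic cause: whatever the cursor pointed at when recording.
    pub fn record_event(&mut self, event_type: &str) -> u128 {
        let cause = PendingCause::Automatic(self.cursor.clone());
        self.push(event_type, cause)
    }

    /// Explicit cause: preserved as given, but the cursor still advances.
    pub fn record_event_caused_by(&mut self, event_type: &str, cause: CausalRef) -> u128 {
        self.push(event_type, PendingCause::Explicit(cause))
    }
}
```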

First-party event prefixes are reserved. Custom events must not use reserved
first-party prefixes. First-party events use names such as:

```text
workflow.*
local_call.*
anthropic.*
openai.*
human.*
tool_call.*
fork_join.*
continuation.*
```

Custom event redaction is caller responsibility. The crate should validate
names and size, not infer sensitive fields.
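
The reserved-prefix rule could be enforced with a check like the following. It
is a sketch under the assumption that a prefix is the listed namespace plus the
trailing dot; `RESERVED_PREFIXES`, `ReservedPrefix`, and
`validate_custom_event_name` are hypothetical names, and any further name rules
(length, character set) are not shown.

```rust
/// First-party namespaces from the list above, each with its trailing dot.
pub const RESERVED_PREFIXES: [&str; 8] = [
    "workflow.",
    "local_call.",
    "anthropic.",
    "openai.",
    "human.",
    "tool_call.",
    "fork_join.",
    "continuation.",
];

/// Error carrying the reserved prefix that the custom name collided with.
#[derive(Debug, PartialEq)]
pub struct ReservedPrefix(pub &'static str);

/// Rejects custom event names that fall inside a first-party namespace.
pub fn validate_custom_event_name(name: &str) -> Result<(), ReservedPrefix> {
    for prefix in RESERVED_PREFIXES.iter() {
        if name.starts_with(*prefix) {
            return Err(ReservedPrefix(*prefix));
        }
    }
    Ok(())
}
```

Matching on the dotted prefix means a name such as `workflows.custom` is still
accepted, since it only shares leading characters with a reserved namespace.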

## Event Payload Limits

Add `ObservabilityConfig` on `Trampoline`:

```rust
pub struct ObservabilityConfig {
    pub max_env_changes: usize,
    pub max_event_payload_bytes: usize,
}
```

Defaults:

- `max_env_changes = 64`
- `max_event_payload_bytes = 32 * 1024`

Oversized event payloads are hard errors. Do not silently truncate payloads.
For env diffs, cap the number of changed-key entries but include complete
counts, truncation metadata, and env digests.
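
A minimal sketch of how these limits might be applied follows. The `Default`
values come from the list above; `PayloadTooLarge`, `CappedChanges`,
`check_payload_size`, and `cap_env_changes` are hypothetical helpers, and the
env digests that should accompany the capped list are omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ObservabilityConfig {
    pub max_env_changes: usize,
    pub max_event_payload_bytes: usize,
}

impl Default for ObservabilityConfig {
    fn default() -> Self {
        ObservabilityConfig {
            max_env_changes: 64,
            max_event_payload_bytes: 32 * 1024,
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct PayloadTooLarge {
    pub actual: usize,
    pub limit: usize,
}

/// Oversized payloads are rejected outright; nothing is truncated.
pub fn check_payload_size(
    config: &ObservabilityConfig,
    serialized: &[u8],
) -> Result<(), PayloadTooLarge> {
    if serialized.len() > config.max_event_payload_bytes {
        return Err(PayloadTooLarge {
            actual: serialized.len(),
            limit: config.max_event_payload_bytes,
        });
    }
    Ok(())
}

#[derive(Debug, PartialEq)]
pub struct CappedChanges {
    pub keys: Vec<String>,
    /// Complete count of changed keys, including those cut from `keys`.
    pub total_changed: usize,
    pub truncated: bool,
}

/// Sorts the changed keys, then keeps at most `max_env_changes` of them
/// while still reporting the full count and whether truncation happened.
pub fn cap_env_changes(config: &ObservabilityConfig, mut changed_keys: Vec<String>) -> CappedChanges {
    changed_keys.sort();
    let total_changed = changed_keys.len();
    let truncated = total_changed > config.max_env_changes;
    changed_keys.truncate(config.max_env_changes);
    CappedChanges { keys: changed_keys, total_changed, truncated }
}
```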

## Redaction And Summaries

First-party event payloads must avoid raw sensitive or large bodies. They should
not include raw env values, prompts, tool inputs, tool outputs, human context,
provider requests, provider responses, or model text.

First-party events may include:

- function, tool, provider, model, output-key names
- run, continuation, provider batch, provider message, and request ids
- counts, durations, usage, and retryability
- storage refs
- byte sizes
- value shapes
- digests
- structured error summaries and refs

Use `setsum`, not `two_five_six`, for env and value fingerprints. Format digest
strings as lowercase hex with an algorithm/version prefix:

```text
setsum:v1:<lowerhex>
```
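
The string format can be produced with a small helper such as the one below.
This sketch assumes the caller already holds the raw digest bytes; it does not
call into the `setsum` crate, and `format_digest` is a hypothetical name.

```rust
/// Formats raw digest bytes as `setsum:v1:<lowerhex>`.
pub fn format_digest(raw: &[u8]) -> String {
    let mut out = String::from("setsum:v1:");
    for byte in raw {
        // Two lowercase hex digits per byte, zero-padded.
        out.push_str(&format!("{:02x}", byte));
    }
    out
}
```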

Use setsum at two levels:

- value digest: setsum over one normalized JSON value
- env digest: setsum over length-prefixed `(key, value_digest)` elements

Normalize JSON object keys recursively before per-value digesting. Preserve
array order.

Value shape enum:

```text
missing
null
bool
number
string
array
object
```
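
The key normalization rule and the value shape enum can be sketched together.
The example uses a small local `Json` enum so it stays self-contained; the
crate itself would presumably work on `serde_json::Value`, and `normalize` and
`shape` are hypothetical names.

```rust
/// Minimal JSON value used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    /// Fields kept as a list so that key order is explicit.
    Object(Vec<(String, Json)>),
}

/// Sorts object keys recursively and leaves array order untouched.
pub fn normalize(value: &mut Json) {
    match value {
        Json::Array(items) => {
            for item in items.iter_mut() {
                normalize(item);
            }
        }
        Json::Object(fields) => {
            fields.sort_by(|a, b| a.0.cmp(&b.0));
            for field in fields.iter_mut() {
                normalize(&mut field.1);
            }
        }
        _ => {}
    }
}

/// Maps an optional value to its shape name; `None` means the key is absent.
pub fn shape(value: Option<&Json>) -> &'static str {
    match value {
        None => "missing",
        Some(Json::Null) => "null",
        Some(Json::Bool(_)) => "bool",
        Some(Json::Number(_)) => "number",
        Some(Json::String(_)) => "string",
        Some(Json::Array(_)) => "array",
        Some(Json::Object(_)) => "object",
    }
}
```

Distinguishing `missing` from `null` lets a before/after summary show that a
key was added or removed rather than set to an explicit null.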

Changed env key summaries should include:

- key
- change kind: `added`, `removed`, or `modified`
- before summary
- after summary

Sort env diff keys lexicographically before truncation so payloads are
deterministic, readable, and easy to test. The env-level digest should not
depend on this sort order; it comes from setsum, which is order-independent.
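
A changed-key summary along these lines could be computed as follows. The
sketch treats each env as a map from key to an already-computed value digest
string, so the before/after summaries are reduced to digests only;
`ChangeKind`, `EnvChange`, and `diff_env` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChangeKind {
    Added,
    Removed,
    Modified,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EnvChange {
    pub key: String,
    pub kind: ChangeKind,
    /// Value digest before the call, `None` if the key was missing.
    pub before: Option<String>,
    /// Value digest after the call, `None` if the key was removed.
    pub after: Option<String>,
}

/// Compares two key -> value-digest maps and returns the changes sorted
/// lexicographically by key. Unchanged keys are not reported.
pub fn diff_env(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<EnvChange> {
    let mut changes = Vec::new();
    for (key, old) in before {
        match after.get(key) {
            None => changes.push(EnvChange {
                key: key.clone(),
                kind: ChangeKind::Removed,
                before: Some(old.clone()),
                after: None,
            }),
            Some(new) if new != old => changes.push(EnvChange {
                key: key.clone(),
                kind: ChangeKind::Modified,
                before: Some(old.clone()),
                after: Some(new.clone()),
            }),
            Some(_) => {}
        }
    }
    for (key, new) in after {
        if !before.contains_key(key) {
            changes.push(EnvChange {
                key: key.clone(),
                kind: ChangeKind::Added,
                before: None,
                after: Some(new.clone()),
            });
        }
    }
    changes.sort_by(|a, b| a.key.cmp(&b.key));
    changes
}
```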

## Trampoline API

Make the breaking change:

```rust
pub async fn run(&self, workflow: Workflow) -> Result<WorkflowOutcome, WorkflowError>
```

`WorkflowOutcome` contains:

```rust
pub struct WorkflowOutcome {
    pub result: WorkflowResult,
    pub events: Vec<PendingWorkflowEvent>,
}
```

`WorkflowError` contains the partially mutated workflow for debug extraction,
not for durable resume:

```rust
pub struct WorkflowError {
    pub workflow: Workflow,
    pub function: Option<String>,
    pub env_changes: Vec<EnvChangeSummary>,
    pub flow: Option<FlowSummary>,
    pub events: Vec<PendingWorkflowEvent>,
    pub source: handled::SError,
}
```

Implement `Display` and `std::error::Error` for custom errors. Implement
`From<WorkflowError> for handled::SError`, which gives `Into<SError>`
automatically. Conversion should minimally enrich the source error with fields
such as `run_id` and `pending_event_count`; do not serialize event payloads
into the error.

Add a stepwise API for durable executors:

```rust
pub fn next_action(&self, workflow: &Workflow) -> WorkflowNext;

pub async fn run_one_local_call(
    &self,
    workflow: Workflow,
) -> Result<WorkflowStepOutcome, WorkflowError>;
```

Keep the private `Step` enum private. Expose a sanitized `WorkflowNext` view,
for example:

```rust
pub enum WorkflowNext {
    Halt,
    LocalCall { function: String },
    Anthropic { provider: String, output_key: String },
    Human { output_key: String },
    ToolCall { tool_names: Vec<String>, output_key: String },
    OpenAI,
    ForkJoin { branch_run_id: serde_json::Map<String, serde_json::Value> },
}
```

`run_one_local_call` computes env and flow summaries because it has access to
workflow internals. On failure it returns the partial workflow and summaries.

## Batch Execution Semantics

`local_call.started` must commit before running user code. This requires two
transactions around a local call:

```text
TX1:
  append local_call.started
  advance cursor to started event

run user function

TX2:
  append pending custom events
  append local_call.completed or local_call.failed
  append boundary or terminal events when applicable
  mutate workflow row
```

A crash or hang inside user code can leave `local_call.started` without a
matching completion. That is honest and useful.

Before invoking the local function, the runtime sets the workflow's in-memory
causal cursor to the committed `local_call.started` event. Custom events
recorded during the call therefore chain from that event.

Completion/failure follows the last custom event from the call:

```text
local_call.started
ticket.loaded
ticket.classification_attempted
local_call.failed
workflow.failed
```

Boundary events follow `local_call.completed`:

```text
local_call.started
ticket.classified
local_call.completed
anthropic.suspended
```

If a local call advances to another local call, do not emit a boundary event.
The next call emits its own `local_call.started`.

`local_call.completed` and `local_call.failed` events should include the call
duration. Do not include local-call attempt numbers in v1; event id and ordinal
are sufficient.
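
The TX2 ordering described in this section can be modelled as a simple chain
builder. This is a sketch under several assumptions: event ids are sequential
integers rather than UUIDv7 values, every custom event uses the automatic
cause, and a failed call is always terminal. `ChainedEvent`, `Outcome`, and
`chain_tx2` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ChainedEvent {
    pub event_id: u64,
    pub event_type: String,
    /// Id of the event this one is caused by.
    pub caused_by: u64,
}

pub enum Outcome {
    /// The call returned; `boundary` names a follow-up event such as
    /// `anthropic.suspended`, or is `None` when another local call is next.
    Completed { boundary: Option<String> },
    Failed,
}

/// Builds the events appended in TX2, chained from the already committed
/// `local_call.started` event.
pub fn chain_tx2(started_event_id: u64, custom: &[&str], outcome: Outcome) -> Vec<ChainedEvent> {
    let mut types: Vec<String> = custom.iter().map(|t| t.to_string()).collect();
    match outcome {
        Outcome::Completed { boundary } => {
            types.push("local_call.completed".to_string());
            if let Some(boundary_type) = boundary {
                types.push(boundary_type);
            }
        }
        Outcome::Failed => {
            types.push("local_call.failed".to_string());
            types.push("workflow.failed".to_string());
        }
    }

    let mut cursor = started_event_id;
    let mut events = Vec::new();
    for event_type in types {
        let event_id = cursor + 1; // stand-in for generating a fresh id
        events.push(ChainedEvent { event_id, event_type, caused_by: cursor });
        cursor = event_id;
    }
    events
}
```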

## Event Vocabulary

Use past-tense fact names. Avoid generic `workflow.status_changed` and generic
`continuation.selected`.

Top-level workflow lifecycle:

- `workflow.enqueued`
- `workflow.halted`
- `workflow.failed`

Local calls:

- `local_call.started`
- `local_call.completed`
- `local_call.failed`

Anthropic:

- `anthropic.suspended`
- `anthropic.submitted`
- `anthropic.completed`
- `anthropic.resumed`
- `anthropic.failed`
- `anthropic.retried`

Human:

- `human.blocked`
- `human.resumed`

OpenAI:

- `openai.blocked`
- `openai.resumed`

Tool calls:

- `tool_call.started`
- `tool_call.completed`
- `tool_call.failed`

Fork/join:

- `fork_join.started`
- `fork_join.completed`
- `fork_join.failed`

Terminal workflow events should be explicit. Intermediate workflow states should
be represented by specific events such as `anthropic.suspended` and
`human.blocked`, not by generic workflow status events.

## Enqueue And Halt

`workflow.enqueued` is emitted by durable enqueue/storage, not by
`Workflow::new`.

Top-level enqueue emits `workflow.enqueued` first. Caller-supplied pending
custom events are chained after it unless they have explicit valid causes.

Branch workflows also get `workflow.enqueued`, caused by the parent
`fork_join.started` event.

`workflow.halted` is emitted in the same transaction that marks the workflow
row halted. It should distinguish:

- `explicit_halt`
- `stack_exhausted`

If `Continuation::halt()` discards deferred continuation steps, the event
payload should include the discarded continuation depth.

Terminal workflow events should include env digest and env key count. Failure
terminal events should include both committed and partial env digests, clearly
labeled.

## Provider And Continuation Events

Provider request and response bodies stay in existing durable rows or artifact
stores. Events carry refs, digests, sizes, selected metadata, and summaries.

Use typed storage refs in payloads:

```json
{
  "kind": "batch_continuation_request",
  "continuation_id": "..."
}
```

V1 storage ref kinds:

- `batch_continuation_request`
- `batch_continuation_response`
- `batch_continuation_error`
- `batch_workflow_error`

`anthropic.suspended` should record requested model, provider, output key,
request summary, request digest, request bytes, and storage ref. It should
include best-effort counts and sizes such as message count, tool count, max
tokens, and system prompt bytes when available.

`anthropic.submitted` should reference `continuation_id`,
`provider_batch_id`, and `provider_message_id` when available. Provider batch
id stays in payload for v1, not a top-level event column.

`anthropic.completed` should record usage and response summary, including
response model, content block count, text byte count, tool use count, and
provider ids when available. Usage belongs in events. Cost does not; cost is
derived later using pricing tables and a pricing version.

Requested model and response model are captured separately.

Provider request ids and message ids should be included when available, but are
not required.

`anthropic.completed` and `anthropic.resumed` are separate events. Completion
records provider result metadata. Resume records insertion of the response into
the workflow env and flow advancement.

Human and current low-level OpenAI resumes use one event each:

- `human.resumed`
- `openai.resumed`

These include env/flow summaries. Actor metadata belongs inside payload JSON,
not as a top-level column.

Continuation retry keeps current behavior: a failed Anthropic continuation
marks the workflow as failed. `retry_continuation` revives it and emits
`anthropic.retried`. The new continuation inherits the retry event as its
cursor. Do not add a separate `workflow.retried` event in v1.

Continuation attempt numbers already exist in `batch_continuations.attempt` and
should be included in relevant continuation events.

## Provider Batches

Do not put `causal_cursor` on `batch_provider_batches` in v1.

A provider batch can group continuations from multiple roots. Giving it one
cursor would imply false single-lineage causality. Represent provider-batch
causality through per-continuation events that reference `provider_batch_id`.
Provider batch rows remain aggregate operational state.

Example:

```text
run-a #12 anthropic.suspended -> continuation c-a
run-b #7  anthropic.suspended -> continuation c-b

run-a #13 anthropic.submitted { continuation_id: c-a, provider_batch_id: pb-9 }
run-b #8  anthropic.submitted { continuation_id: c-b, provider_batch_id: pb-9 }
```
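
Because membership lives in the per-continuation events, a provider batch view
can be rebuilt by grouping `anthropic.submitted` events on `provider_batch_id`.
The sketch below assumes the payload fields have already been extracted into a
hypothetical `SubmittedEvent` struct; `continuations_by_provider_batch` is also
a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of an `anthropic.submitted` event.
#[derive(Debug, Clone, PartialEq)]
pub struct SubmittedEvent {
    pub run_id: String,
    pub event_ordinal: i64,
    pub continuation_id: String,
    pub provider_batch_id: String,
}

/// Groups continuation ids by provider batch id, preserving input order
/// within each batch.
pub fn continuations_by_provider_batch(
    events: &[SubmittedEvent],
) -> BTreeMap<String, Vec<String>> {
    let mut by_batch: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for event in events {
        by_batch
            .entry(event.provider_batch_id.clone())
            .or_insert_with(Vec::new)
            .push(event.continuation_id.clone());
    }
    by_batch
}
```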

## Tool Calls

Batch tool calls remain inline and at-least-once for v1. Do not turn them into
persisted continuations yet.

Observe tool dispatch at batch level, not per individual tool invocation:

- `tool_call.started`
- `tool_call.completed`
- `tool_call.failed`

`tool_call.completed` should inspect returned `ToolResultBlock`s and count
model-visible tool errors. Include tool names and tool call ids, not tool input
or output bodies.

Do not add `tool_call.resumed` in v1. `tool_call.completed` covers dispatch and
resume, including env and flow summaries.

## Fork/Join

The event contract should already use branch-name maps, even though the current
API is binary. Populate `"lhs"` and `"rhs"` for now.

Example payload shape:

```json
{
  "branch_run_id": {
    "lhs": "run-parent:lhs",
    "rhs": "run-parent:rhs"
  },
  "terminal_event_id": {
    "lhs": "018f...",
    "rhs": "018f..."
  },
  "join_function": "merge_reports"
}
```

Use `terminal_event_id`, spelled correctly. Branch map keys are arbitrary
non-empty strings. Do not modify `ForkBranch` yet; branch names remain
storage/event metadata for v1.
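
A payload struct built on branch-name maps might look like the sketch below.
The non-empty key rule comes from the text above; requiring both maps to carry
the same branch names is an extra assumption made for this example, as are the
names `ForkJoinCompleted`, `BranchMapError`, and `validate`.

```rust
use std::collections::BTreeMap;

/// Hypothetical payload for `fork_join.completed`, keyed by branch name.
#[derive(Debug, Clone, PartialEq)]
pub struct ForkJoinCompleted {
    pub branch_run_id: BTreeMap<String, String>,
    pub terminal_event_id: BTreeMap<String, String>,
    pub join_function: String,
}

#[derive(Debug, PartialEq)]
pub enum BranchMapError {
    EmptyBranchName,
    /// A branch has a run id but no terminal event id.
    MissingTerminalEvent(String),
    /// A terminal event id names a branch that has no run id.
    UnknownBranch(String),
}

/// Checks that branch names are non-empty and that both maps agree on
/// the set of branches (the second check is an assumption of this sketch).
pub fn validate(payload: &ForkJoinCompleted) -> Result<(), BranchMapError> {
    let all_names = payload
        .branch_run_id
        .keys()
        .chain(payload.terminal_event_id.keys());
    for name in all_names {
        if name.is_empty() {
            return Err(BranchMapError::EmptyBranchName);
        }
    }
    for name in payload.branch_run_id.keys() {
        if !payload.terminal_event_id.contains_key(name) {
            return Err(BranchMapError::MissingTerminalEvent(name.clone()));
        }
    }
    for name in payload.terminal_event_id.keys() {
        if !payload.branch_run_id.contains_key(name) {
            return Err(BranchMapError::UnknownBranch(name.clone()));
        }
    }
    Ok(())
}
```

Using maps from the start means a later move beyond binary forks would not
change the payload shape, only the number of keys.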

Parent ledger:

- `fork_join.started`
- `fork_join.completed` or `fork_join.failed`

Child ledgers:

- `workflow.enqueued` caused by the parent fork event
- normal child execution events
- child `workflow.halted` or `workflow.failed`

`fork_join.completed` is caused by the parent's current causal cursor, usually
the parent `fork_join.started` event. Child terminal event ids belong in payload
maps because join is fan-in and `caused_by` is single-parent.

## Errors

Failure event payloads should include structured summaries and refs. Full error
text should live in the owning status row when possible:

- continuation failures reference `batch_continuations.error_sexpr`
- workflow failures reference `batch_workflows.error_sexpr`

For local-call failures with no separate owner row, event payloads can include
the needed structured summary and a ref to the workflow error row once written.
Avoid duplicating large or sensitive strings into events where a row ref
suffices.

On terminal failures, emit both operation-specific failure and
`workflow.failed`:

```text
local_call.failed
workflow.failed
```

or:

```text
anthropic.failed
workflow.failed
```

On local-call failure, commit pending custom events recorded before the error,
then `local_call.failed`, then `workflow.failed`. Preserve the partial workflow
only for event/debug extraction. Do not persist the partially mutated workflow
as the new durable resumable state.

Failure events should include partial flow summaries and local-call duration.

## Actor Metadata

Actor metadata belongs inside event payloads. Keep it flexible JSON with
conventional fields:

```json
{
  "actor": {
    "kind": "user",
    "id_hash": "setsum:v1:...",
    "metadata": {}
  }
}
```

Avoid raw emails or usernames by default. Applications may opt into more
specific identity metadata.

## Event Inspection APIs

Expose raw event inspection APIs in `batch::Executor`, matching the existing
record-inspection style:

```rust
pub struct WorkflowEventRecord {
    pub event_id: uuid::Uuid,
    pub root_run_id: String,
    pub run_id: String,
    pub parent_run_id: Option<String>,
    pub fork_name: Option<String>,
    pub event_ordinal: i64,
    pub caused_by: CausalRef,
    pub event_type: String,
    pub event_version: i16,
    pub continuation_id: Option<String>,
    pub event: serde_json::Value,
    pub created_at: ...
}
```

Initial query methods:

```rust
load_workflow_event(event_id)
load_workflow_events(run_id)
load_root_workflow_events(root_run_id)
```

Return raw records in v1. Do not require typed first-party event decoding in the
inspection API yet.

`batch::Executor::poll` should continue returning `PollSummary`, not event
payloads. Add an event count such as `events_committed`.

## Live Execution

Live execution can use the same event vocabulary. It does not provide the same
durable contract unless a future durable live store is explicitly configured.

`live::Executor::run_workflow` should switch to a richer `live::RunError`:

```rust
pub struct RunError {
    pub workflow: Option<Workflow>,
    pub events: Vec<PendingWorkflowEvent>,
    pub source: handled::SError,
}
```

Implement `Display`, `Error`, and `From<RunError> for handled::SError`.
Conversion should minimally add breadcrumbs such as `run_id` and
`pending_event_count`.

Do not add a public observer trait in the first implementation. Build the
durable batch ledger first and keep internal types shaped so an observer can be
added later.

## Implementation Checklist

- Add `uuid` and `setsum` dependencies with the required feature flags.
- Define `CausalRef`, pending event types, committed event records, first-party
  payload structs, value summaries, env summaries, flow summaries, and
  observability config.
- Add skipped observability state to `Workflow`.
- Add custom event recording APIs on `Workflow`.
- Change `Trampoline::run` to return `WorkflowOutcome` and `WorkflowError`.
- Add `next_action` and `run_one_local_call`.
- Compute env and flow summaries inside the trampoline layer.
- Update `live::Executor` for the new trampoline return types and richer
  `RunError`.
- Update `migrations/0001_batch.sql` with workflow/continuation cursors and
  `batch_workflow_events`.
- Update batch enqueue to emit `workflow.enqueued` and pending initial custom
  events transactionally.
- Update batch local-call execution to commit `local_call.started` before user
  code and completion/failure events after user code.
- Update batch suspension, resume, tool-call, fork/join, retry, halt, and
  failure paths to append first-party events transactionally.
- Add event insertion helpers using guarded `UPDATE ... RETURNING`.
- Add event cause validation, including same-batch event references.
- Add raw event inspection APIs.
- Add unit tests for event ordering, causality, pending-event skip behavior,
  failure preservation, env summaries, payload size limits, and branch event
  maps.
- Add or update integration tests for batch event persistence around enqueue,
  local calls, Anthropic suspension/resume, human/OpenAI block/resume, tool
  calls, fork/join, and retry.