swink-agent 0.8.0

Core scaffolding for running LLM-powered agentic loops
# Streaming Interface

**Source files:** `src/stream.rs`, `adapters/src/proxy.rs`, `adapters/src/ollama.rs`, `adapters/src/anthropic.rs`, `adapters/src/openai.rs`, `adapters/src/convert.rs`
**Related:** [PRD §7](../../planning/PRD.md#7-streaming-interface)

The streaming interface is the single boundary between the harness and LLM providers. The harness never holds provider credentials or SDK clients. All inference flows through a `StreamFn` implementation. Nine remote implementations ship in the adapters crate, plus `LocalStreamFn` in the local-llm crate:

| Implementation | Crate | Transport | Endpoint |
|---|---|---|---|
| `ProxyStreamFn` | `swink-agent-adapters` | **SSE** (Server-Sent Events via `eventsource-stream`) | `POST /v1/stream` on a caller-managed proxy |
| `OllamaStreamFn` | `swink-agent-adapters` | **NDJSON** (newline-delimited JSON over chunked HTTP) | `POST /api/chat` on an Ollama server |
| `AnthropicStreamFn` | `swink-agent-adapters` | **SSE** (Server-Sent Events) | `POST /v1/messages` on the Anthropic Messages API |
| `OpenAiStreamFn` | `swink-agent-adapters` | **SSE** (Server-Sent Events) | `POST /v1/chat/completions` on any OpenAI-compatible API |
| `AzureStreamFn` | `swink-agent-adapters` | **SSE** | Azure OpenAI endpoint |
| `BedrockStreamFn` | `swink-agent-adapters` | **SSE** (+ AWS SigV4) | AWS Bedrock endpoint |
| `GeminiStreamFn` | `swink-agent-adapters` | **SSE** | Google Gemini API |
| `MistralStreamFn` | `swink-agent-adapters` | **SSE** | Mistral API |
| `XAiStreamFn` | `swink-agent-adapters` | **SSE** | xAI API |
| `LocalStreamFn` | `swink-agent-local-llm` | Local inference | On-device (SmolLM3-3B) |

All implementations produce the same `Stream<AssistantMessageEvent>` output; the transport differences are internal. `ProxyStreamFn` parses SSE frames with named event types, `OllamaStreamFn` splits newline-delimited JSON lines and maps Ollama's response schema into harness events, and `AnthropicStreamFn` and `OpenAiStreamFn` talk directly to the Anthropic Messages API and to any OpenAI-compatible endpoint, respectively. Callers can also supply a fully custom `StreamFn` for any other provider.

All adapters use the `tracing` crate for structured logging (`debug!`, `warn!`, `error!`), providing consistent observability across providers.
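
The shape of the `StreamFn` boundary can be sketched as below. This is a simplified, hypothetical rendering: the real trait is async and returns a `Stream`, and the real context type is the harness message history; here a synchronous `Iterator` and plain strings stand in so the call shape stays visible, and `EchoStreamFn` is an invented trivial implementation.

```rust
// Hypothetical, simplified sketch of the StreamFn boundary.
// The real trait is async and returns a Stream; an Iterator stands in here.

#[derive(Debug, Clone)]
pub struct StreamOptions {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub session_id: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AssistantMessageEvent {
    Start,
    TextDelta { content_index: usize, delta: String },
    Done { stop_reason: String },
}

pub trait StreamFn {
    fn stream(
        &self,
        model: &str,
        context: &[String], // simplified: the real type is the message history
        options: &StreamOptions,
    ) -> Box<dyn Iterator<Item = AssistantMessageEvent>>;
}

/// A trivial implementation that echoes the model name, to show the call shape.
pub struct EchoStreamFn;

impl StreamFn for EchoStreamFn {
    fn stream(
        &self,
        model: &str,
        _context: &[String],
        _options: &StreamOptions,
    ) -> Box<dyn Iterator<Item = AssistantMessageEvent>> {
        let events = vec![
            AssistantMessageEvent::Start,
            AssistantMessageEvent::TextDelta {
                content_index: 0,
                delta: format!("model={model}"),
            },
            AssistantMessageEvent::Done { stop_reason: "end_turn".into() },
        ];
        Box::new(events.into_iter())
    }
}

fn main() {
    let opts = StreamOptions { temperature: None, max_tokens: None, session_id: None };
    let events: Vec<_> = EchoStreamFn.stream("test-model", &[], &opts).collect();
    assert_eq!(events.len(), 3);
    assert!(matches!(events.last(), Some(AssistantMessageEvent::Done { .. })));
    println!("ok: {} events", events.len());
}
```

Because the harness only sees this trait, swapping providers (or injecting a fake for tests) never touches the agent loop.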

---

## L2 β€” Components

```mermaid
flowchart TB
    subgraph CallerLayer["👀 Caller"]
        CallerStreamFn["Custom StreamFn<br/>(direct provider SDK)"]
    end

    subgraph StreamLayer["📑 Streaming Interface (core)"]
        StreamFnTrait["StreamFn (trait)<br/>stream(model, context, options)<br/>→ Stream&lt;AssistantMessageEvent&gt;"]
        StreamOptions["StreamOptions<br/>temperature · max_tokens<br/>session_id · transport"]
        EventTypes["AssistantMessageEvent<br/>(start/delta/end protocol)"]
        Delta["AssistantMessageDelta<br/>TextDelta · ThinkingDelta · ToolCallDelta"]
    end

    subgraph ProxyLayer["🔀 Proxy StreamFn (adapters crate)"]
        ProxyStreamFn["ProxyStreamFn"]
        SSEParser["SSE Parser<br/>(eventsource-stream)"]
        Reconstructor["Message Reconstructor<br/>(delta β†’ partial AssistantMessage)"]
    end

    subgraph OllamaLayer["🔌 Ollama Adapter (adapters crate)"]
        OllamaStreamFn["OllamaStreamFn"]
        NDJSONParser["NDJSON Parser<br/>(newline-delimited JSON)"]
        OllamaMapper["Event Mapper<br/>(Ollama chunks β†’ AssistantMessageEvent)"]
    end

    subgraph AnthropicLayer["🔌 Anthropic Adapter (adapters crate)"]
        AnthropicStreamFn["AnthropicStreamFn"]
        AnthropicSSEParser["SSE Parser<br/>(Anthropic event types)"]
        AnthropicMapper["Event Mapper<br/>(Anthropic β†’ AssistantMessageEvent)"]
    end

    subgraph OpenAiLayer["🔌 OpenAI Adapter (adapters crate)"]
        OpenAiStreamFn["OpenAiStreamFn"]
        OpenAiSSEParser["SSE Parser<br/>(OpenAI event types)"]
        OpenAiMapper["Event Mapper<br/>(OpenAI β†’ AssistantMessageEvent)"]
    end

    subgraph SharedLayer["🔧 Shared Adapter Infrastructure"]
        MessageConverter["MessageConverter (trait)<br/>convert harness messages<br/>to provider-specific format"]
        TracingInfra["tracing crate<br/>(debug / warn / error logging)"]
    end

    subgraph ExternalLayer["🌐 External"]
        DirectProvider["LLM Provider API<br/>(direct)"]
        ProxyServer["LLM Proxy Server<br/>(HTTP/SSE)"]
        BackendProvider["LLM Provider API<br/>(via proxy)"]
        OllamaServer["Ollama Server<br/>(HTTP/NDJSON)"]
        AnthropicAPI["Anthropic Messages API<br/>(HTTP/SSE)"]
        OpenAiAPI["OpenAI-compatible API<br/>(HTTP/SSE)"]
    end

    CallerStreamFn -->|"implements"| StreamFnTrait
    ProxyStreamFn -->|"implements"| StreamFnTrait
    OllamaStreamFn -->|"implements"| StreamFnTrait
    AnthropicStreamFn -->|"implements"| StreamFnTrait
    OpenAiStreamFn -->|"implements"| StreamFnTrait
    StreamFnTrait --> StreamOptions
    StreamFnTrait --> EventTypes
    EventTypes --> Delta
    ProxyStreamFn --> SSEParser
    SSEParser --> Reconstructor
    Reconstructor --> EventTypes
    OllamaStreamFn --> NDJSONParser
    NDJSONParser --> OllamaMapper
    OllamaMapper --> EventTypes
    AnthropicStreamFn --> AnthropicSSEParser
    AnthropicSSEParser --> AnthropicMapper
    AnthropicMapper --> EventTypes
    OpenAiStreamFn --> OpenAiSSEParser
    OpenAiSSEParser --> OpenAiMapper
    OpenAiMapper --> EventTypes
    AnthropicStreamFn -->|"uses"| MessageConverter
    OpenAiStreamFn -->|"uses"| MessageConverter
    OllamaStreamFn -->|"uses"| MessageConverter
    CallerStreamFn -->|"direct calls"| DirectProvider
    ProxyStreamFn -->|"POST /v1/stream<br/>Bearer token (SSE)"| ProxyServer
    ProxyServer -->|"proxied request"| BackendProvider
    OllamaStreamFn -->|"POST /api/chat<br/>(NDJSON stream)"| OllamaServer
    AnthropicStreamFn -->|"POST /v1/messages<br/>x-api-key header (SSE)"| AnthropicAPI
    OpenAiStreamFn -->|"POST /v1/chat/completions<br/>Bearer token (SSE)"| OpenAiAPI

    classDef callerStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
    classDef streamStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    classDef proxyStyle fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    classDef adapterStyle fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef externalStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000

    classDef sharedStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    class CallerStreamFn callerStyle
    class StreamFnTrait,StreamOptions,EventTypes,Delta streamStyle
    class ProxyStreamFn,SSEParser,Reconstructor proxyStyle
    class OllamaStreamFn,NDJSONParser,OllamaMapper adapterStyle
    class AnthropicStreamFn,AnthropicSSEParser,AnthropicMapper adapterStyle
    class OpenAiStreamFn,OpenAiSSEParser,OpenAiMapper adapterStyle
    class MessageConverter,TracingInfra sharedStyle
    class DirectProvider,ProxyServer,BackendProvider,OllamaServer,AnthropicAPI,OpenAiAPI externalStyle
```

---

## L3 β€” AssistantMessageEvent Protocol

Events follow a strict start/delta/end protocol per content block. Each block has a `content_index` that identifies its position in the final message's content vec.

```mermaid
flowchart LR
    subgraph StreamEvents["AssistantMessageEvent variants"]
        Start["Start<br/>(stream open)"]

        subgraph TextBlock["Text block lifecycle"]
            TextStart["TextStart(content_index)"]
            TextDelta["TextDelta(content_index, delta: String)"]
            TextEnd["TextEnd(content_index)"]
        end

        subgraph ThinkingBlock["Thinking block lifecycle"]
            ThinkStart["ThinkingStart(content_index)"]
            ThinkDelta["ThinkingDelta(content_index, delta: String)"]
            ThinkEnd["ThinkingEnd(content_index, signature: Option&lt;String&gt;)"]
        end

        subgraph ToolBlock["Tool call block lifecycle"]
            ToolStart["ToolCallStart(content_index, id, name)"]
            ToolDelta["ToolCallDelta(content_index, json_fragment: String)"]
            ToolEnd["ToolCallEnd(content_index)"]
        end

        Done["Done(stop_reason, usage, cost)"]
        Error["Error(stop_reason, error_message,<br/>usage: Option&lt;Usage&gt;, error_kind: Option&lt;StreamErrorKind&gt;)"]
    end

    Start --> TextBlock
    Start --> ThinkingBlock
    Start --> ToolBlock
    TextBlock --> Done
    ThinkingBlock --> Done
    ToolBlock --> Done
    TextBlock --> Error
    ThinkingBlock --> Error
    ToolBlock --> Error

    classDef eventStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    classDef termStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000

    class Start,TextStart,TextDelta,TextEnd,ThinkStart,ThinkDelta,ThinkEnd,ToolStart,ToolDelta,ToolEnd eventStyle
    class Done,Error termStyle
```

### StreamErrorKind

Adapters can attach a `StreamErrorKind` to an `Error` event so the agent loop can classify errors structurally instead of relying on string matching on `error_message`.

| Variant | Meaning |
|---|---|
| `Throttled` | The provider throttled the request (HTTP 429 / rate limit). |
| `ContextWindowExceeded` | The request exceeded the model's context window. |
| `Auth` | Authentication or authorization failure (HTTP 401/403). |
| `Network` | Transient network or server error (connection drop, 5xx, etc.). |

### Error Constructor Helpers

`AssistantMessageEvent` provides five constructor helpers for adapters. All set `stop_reason: StopReason::Error` and `usage: None`.

| Constructor | `error_kind` | Use case |
|---|---|---|
| `error(message)` | `None` | Generic error; agent loop falls back to string-based classification. |
| `error_throttled(message)` | `Some(Throttled)` | Rate-limit / HTTP 429 errors. |
| `error_context_overflow(message)` | `Some(ContextWindowExceeded)` | Context window exceeded; triggers context compaction. |
| `error_auth(message)` | `Some(Auth)` | Authentication failure; non-retryable. |
| `error_network(message)` | `Some(Network)` | Transient network/server error; retryable. |
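
A minimal sketch of how these constructors might look. Only the constructor names, `StreamErrorKind`, and the `stop_reason`/`usage` invariants come from this page; `ErrorEvent` stands in for the real `AssistantMessageEvent::Error` variant, `()` stands in for the real `Usage` type, and `classify_http_status` is an invented example of how an adapter might pick a constructor.

```rust
// Hypothetical sketch of the error-event constructor helpers described above.

#[derive(Debug, Clone, PartialEq)]
pub enum StreamErrorKind { Throttled, ContextWindowExceeded, Auth, Network }

#[derive(Debug, Clone, PartialEq)]
pub enum StopReason { Error }

#[derive(Debug, Clone, PartialEq)]
pub struct ErrorEvent {
    pub stop_reason: StopReason,
    pub error_message: String,
    pub usage: Option<()>, // placeholder for the real Usage type
    pub error_kind: Option<StreamErrorKind>,
}

impl ErrorEvent {
    fn with_kind(message: impl Into<String>, kind: Option<StreamErrorKind>) -> Self {
        // Every helper sets stop_reason: Error and usage: None.
        Self {
            stop_reason: StopReason::Error,
            error_message: message.into(),
            usage: None,
            error_kind: kind,
        }
    }
    pub fn error(m: impl Into<String>) -> Self { Self::with_kind(m, None) }
    pub fn error_throttled(m: impl Into<String>) -> Self {
        Self::with_kind(m, Some(StreamErrorKind::Throttled))
    }
    pub fn error_context_overflow(m: impl Into<String>) -> Self {
        Self::with_kind(m, Some(StreamErrorKind::ContextWindowExceeded))
    }
    pub fn error_auth(m: impl Into<String>) -> Self {
        Self::with_kind(m, Some(StreamErrorKind::Auth))
    }
    pub fn error_network(m: impl Into<String>) -> Self {
        Self::with_kind(m, Some(StreamErrorKind::Network))
    }
}

/// Invented example: picking a constructor from an HTTP status code.
fn classify_http_status(status: u16, body: &str) -> ErrorEvent {
    match status {
        429 => ErrorEvent::error_throttled(body),
        401 | 403 => ErrorEvent::error_auth(body),
        500..=599 => ErrorEvent::error_network(body),
        _ => ErrorEvent::error(body),
    }
}

fn main() {
    let e = classify_http_status(429, "rate limited");
    assert_eq!(e.error_kind, Some(StreamErrorKind::Throttled));
    assert_eq!(e.stop_reason, StopReason::Error);
    assert!(e.usage.is_none());
    println!("classified: {e:?}");
}
```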

---

## L3 β€” ProxyStreamFn Architecture

The proxy strips the full partial message from delta events to reduce bandwidth. The client reconstructs it locally by accumulating deltas into a `partial: AssistantMessage`.

```mermaid
flowchart TB
    subgraph ProxyServer["🖥️ Proxy Server (external)"]
        ServerRecv["Receive POST /v1/stream"]
        ServerAuth["Verify Bearer token"]
        ServerForward["Forward to LLM Provider"]
        ServerSSE["Stream SSE response<br/>(partial field stripped)"]
    end

    subgraph ProxyClient["🔀 ProxyStreamFn (harness)"]
        HTTPPost["POST /v1/stream<br/>(model + context + options)"]
        SSERead["Read SSE stream<br/>(eventsource-stream)"]
        ParseEvent["Parse SseEventData JSON"]
        Accumulate["Accumulate into<br/>partial: AssistantMessage"]
        EmitEvent["Emit AssistantMessageEvent<br/>(with partial attached)"]
    end

    subgraph Output["📀 Output"]
        HarnessStream["Stream&lt;AssistantMessageEvent&gt;<br/>consumed by run_loop"]
    end

    HTTPPost --> ServerRecv
    ServerRecv --> ServerAuth
    ServerAuth --> ServerForward
    ServerForward --> ServerSSE
    ServerSSE --> SSERead
    SSERead --> ParseEvent
    ParseEvent --> Accumulate
    Accumulate --> EmitEvent
    EmitEvent --> HarnessStream

    classDef serverStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000
    classDef clientStyle fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    classDef outputStyle fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000

    class ServerRecv,ServerAuth,ServerForward,ServerSSE serverStyle
    class HTTPPost,SSERead,ParseEvent,Accumulate,EmitEvent clientStyle
    class HarnessStream outputStyle
```

---

## L3 β€” OllamaStreamFn Architecture

The Ollama adapter connects to Ollama's `/api/chat` endpoint, which streams newline-delimited JSON (NDJSON) rather than SSE. Each line is a self-contained JSON object with a `message` field and a `done` boolean. The adapter maintains a state machine that tracks open content blocks (thinking, text, tool calls) and emits the same `AssistantMessageEvent` protocol that `ProxyStreamFn` produces.
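
The chunk-to-line splitting can be sketched as below. This is an assumption about how the adapter's `ndjson_lines` helper behaves, not its actual implementation; the key point it illustrates is that an HTTP chunk boundary can fall mid-line, so an incomplete line must be buffered until its terminating newline arrives.

```rust
/// Hypothetical sketch of an NDJSON splitter. Bytes arrive in arbitrary HTTP
/// chunks; a complete line is yielded only once its '\n' has been seen.
pub struct NdjsonSplitter {
    buf: String, // holds any trailing incomplete line across chunks
}

impl NdjsonSplitter {
    pub fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Feed one HTTP body chunk; returns every complete line it closes.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.find('\n') {
            // Drain the line including its newline, then trim the newline off.
            let line: String = self.buf.drain(..=pos).collect();
            let line = line.trim_end().to_string();
            if !line.is_empty() {
                lines.push(line);
            }
        }
        lines
    }
}

fn main() {
    let mut s = NdjsonSplitter::new();
    // A chunk boundary falls in the middle of the first JSON object:
    assert!(s.push("{\"message\":{\"content\":\"Hi").is_empty());
    let lines = s.push("\"},\"done\":false}\n{\"done\":true}\n");
    assert_eq!(lines.len(), 2);
    println!("{lines:?}");
}
```

Each recovered line would then be deserialized into an `OllamaChatChunk` and fed to the block-tracking state machine shown below.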

```mermaid
flowchart TB
    subgraph OllamaServer["🖥️ Ollama Server"]
        ServerRecv["Receive POST /api/chat"]
        ServerInfer["Run model inference"]
        ServerNDJSON["Stream NDJSON response<br/>(one JSON object per line)"]
    end

    subgraph OllamaClient["🔌 OllamaStreamFn (adapters crate)"]
        HTTPPost["POST /api/chat<br/>(model + messages + tools)"]
        NDJSONRead["Read NDJSON stream<br/>(chunked HTTP body)"]
        ParseChunk["Parse OllamaChatChunk"]
        StateMachine["State Machine<br/>(track open blocks:<br/>thinking, text, tool calls)"]
        EmitEvent["Emit AssistantMessageEvent<br/>(start/delta/end)"]
    end

    subgraph Output["📀 Output"]
        HarnessStream["Stream&lt;AssistantMessageEvent&gt;<br/>consumed by run_loop"]
    end

    HTTPPost --> ServerRecv
    ServerRecv --> ServerInfer
    ServerInfer --> ServerNDJSON
    ServerNDJSON --> NDJSONRead
    NDJSONRead --> ParseChunk
    ParseChunk --> StateMachine
    StateMachine --> EmitEvent
    EmitEvent --> HarnessStream

    classDef serverStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000
    classDef clientStyle fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef outputStyle fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000

    class ServerRecv,ServerInfer,ServerNDJSON serverStyle
    class HTTPPost,NDJSONRead,ParseChunk,StateMachine,EmitEvent clientStyle
    class HarnessStream outputStyle
```

**Key differences from `ProxyStreamFn`:**

| Aspect | `ProxyStreamFn` (SSE) | `OllamaStreamFn` (NDJSON) |
|---|---|---|
| Transport | SSE with named event types | Newline-delimited JSON |
| Parsing library | `eventsource-stream` | Custom `ndjson_lines` splitter |
| Message reconstruction | Accumulates deltas into a `partial: AssistantMessage` | State machine tracks open blocks, emits events directly |
| Tool call delivery | Streamed as incremental JSON fragments | Delivered as complete objects in a single chunk |
| Authentication | Bearer token header | None (local server) |
| Cost tracking | Provider-dependent | Always zero (local inference) |
| Thinking support | Depends on upstream proxy | Streaming thinking blocks supported |

---

## L3 β€” AnthropicStreamFn Architecture

**Source file:** `adapters/src/anthropic.rs` (~795 lines)

The Anthropic adapter connects directly to the Anthropic Messages API at `POST /v1/messages`. It handles the full Anthropic SSE event protocol, including thinking blocks with budget management and signature extraction.

**Key features:**

- **Authentication:** Uses `x-api-key` header (not Bearer token) per Anthropic API convention.
- **Thinking blocks:** Supports extended thinking with budget management. When thinking is enabled, temperature is forced to `1` as required by the Anthropic API.
- **Signature extraction:** Extracts thinking block signatures from `ThinkingEnd` events for downstream verification.
- **Message conversion:** Uses the `MessageConverter` trait (from `adapters/src/convert.rs`) to transform harness messages into Anthropic's expected format.
- **Tracing:** Uses `tracing` crate for structured debug, warn, and error logging throughout the streaming lifecycle.
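
The temperature rule can be illustrated with a small helper. The function is hypothetical; only the constraint it encodes (thinking forces temperature to `1`, per the Anthropic API) comes from the text above.

```rust
/// Hypothetical helper: when extended thinking is enabled, the Anthropic API
/// requires temperature = 1, so the adapter overrides the caller's request.
fn effective_temperature(requested: Option<f32>, thinking_enabled: bool) -> f32 {
    if thinking_enabled {
        1.0 // forced by the Anthropic API when thinking is on
    } else {
        requested.unwrap_or(1.0)
    }
}

fn main() {
    assert_eq!(effective_temperature(Some(0.2), true), 1.0);
    assert_eq!(effective_temperature(Some(0.2), false), 0.2);
    println!("ok");
}
```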

```mermaid
flowchart TB
    subgraph AnthropicServer["🖥️ Anthropic Messages API"]
        ServerRecv["Receive POST /v1/messages"]
        ServerAuth["Verify x-api-key header"]
        ServerInfer["Run model inference"]
        ServerSSE["Stream SSE response<br/>(Anthropic event types)"]
    end

    subgraph AnthropicClient["🔌 AnthropicStreamFn (adapters crate)"]
        HTTPPost["POST /v1/messages<br/>(model + messages + tools)"]
        SSERead["Read SSE stream"]
        ParseEvent["Parse Anthropic SSE events<br/>(message_start, content_block_start,<br/>content_block_delta, message_delta)"]
        ThinkingMgmt["Thinking Budget Management<br/>(force temperature=1,<br/>extract signatures)"]
        EmitEvent["Emit AssistantMessageEvent<br/>(start/delta/end)"]
    end

    subgraph Output["📀 Output"]
        HarnessStream["Stream&lt;AssistantMessageEvent&gt;<br/>consumed by run_loop"]
    end

    HTTPPost --> ServerRecv
    ServerRecv --> ServerAuth
    ServerAuth --> ServerInfer
    ServerInfer --> ServerSSE
    ServerSSE --> SSERead
    SSERead --> ParseEvent
    ParseEvent --> ThinkingMgmt
    ThinkingMgmt --> EmitEvent
    EmitEvent --> HarnessStream

    classDef serverStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000
    classDef clientStyle fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef outputStyle fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000

    class ServerRecv,ServerAuth,ServerInfer,ServerSSE serverStyle
    class HTTPPost,SSERead,ParseEvent,ThinkingMgmt,EmitEvent clientStyle
    class HarnessStream outputStyle
```

---

## L3 β€” OpenAiStreamFn Architecture

**Source file:** `adapters/src/openai.rs` (~735 lines)

The OpenAI adapter connects to any OpenAI-compatible API at `POST /v1/chat/completions`. It supports multiple providers that implement the OpenAI chat completions protocol.

**Key features:**

- **Authentication:** Uses standard Bearer token authentication (`Authorization: Bearer <key>`).
- **Multi-provider support:** Works with any OpenAI-compatible endpoint including vLLM, LM Studio, Groq, and Together AI. The base URL is configurable.
- **Tool call streaming:** Accumulates tool call state across multiple SSE chunks. Tool calls arrive as incremental fragments (function name, argument JSON pieces) that are assembled into complete tool calls.
- **Message conversion:** Uses the `MessageConverter` trait (from `adapters/src/convert.rs`) to transform harness messages into OpenAI's chat completions format.
- **Tracing:** Uses `tracing` crate for structured debug, warn, and error logging throughout the streaming lifecycle.
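
The fragment-assembly step can be sketched as follows. The types are hypothetical and simplified (the real adapter deserializes `choices[].delta.tool_calls[]` from SSE JSON); what the sketch shows is the invariant that the `index` field identifies which in-flight tool call a fragment belongs to, and argument JSON is concatenated across chunks.

```rust
/// Hypothetical, simplified state for one in-flight tool call.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct ToolCallState {
    pub id: String,
    pub name: String,
    pub arguments: String, // accumulated argument-JSON text
}

#[derive(Default)]
pub struct ToolCallAccumulator {
    calls: Vec<ToolCallState>,
}

impl ToolCallAccumulator {
    /// Apply one OpenAI-style tool-call delta fragment. The first fragment
    /// carries id/name; later fragments carry argument-JSON pieces.
    pub fn apply(&mut self, index: usize, id: Option<&str>, name: Option<&str>, args: Option<&str>) {
        if self.calls.len() <= index {
            self.calls.resize(index + 1, ToolCallState::default());
        }
        let call = &mut self.calls[index];
        if let Some(id) = id { call.id = id.to_string(); }
        if let Some(name) = name { call.name = name.to_string(); }
        if let Some(args) = args { call.arguments.push_str(args); }
    }

    pub fn finish(self) -> Vec<ToolCallState> {
        self.calls
    }
}

fn main() {
    let mut acc = ToolCallAccumulator::default();
    acc.apply(0, Some("c1"), Some("search"), None);
    acc.apply(0, None, None, Some("{\"q\":"));
    acc.apply(0, None, None, Some("\"rust\"}"));
    let calls = acc.finish();
    assert_eq!(calls[0].name, "search");
    assert_eq!(calls[0].arguments, "{\"q\":\"rust\"}");
    println!("{calls:?}");
}
```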

```mermaid
flowchart TB
    subgraph OpenAiServer["🖥️ OpenAI-Compatible API"]
        ServerRecv["Receive POST /v1/chat/completions"]
        ServerAuth["Verify Bearer token"]
        ServerInfer["Run model inference"]
        ServerSSE["Stream SSE response<br/>(data: [JSON] lines)"]
    end

    subgraph OpenAiClient["🔌 OpenAiStreamFn (adapters crate)"]
        HTTPPost["POST /v1/chat/completions<br/>(model + messages + tools)"]
        SSERead["Read SSE stream"]
        ParseEvent["Parse OpenAI SSE chunks<br/>(choices[].delta with<br/>content, tool_calls)"]
        ToolAccum["Tool Call State Accumulation<br/>(assemble fragments into<br/>complete tool calls)"]
        EmitEvent["Emit AssistantMessageEvent<br/>(start/delta/end)"]
    end

    subgraph Output["📀 Output"]
        HarnessStream["Stream&lt;AssistantMessageEvent&gt;<br/>consumed by run_loop"]
    end

    subgraph Providers["🌐 Compatible Providers"]
        OpenAI["OpenAI"]
        vLLM["vLLM"]
        LMStudio["LM Studio"]
        Groq["Groq"]
        Together["Together AI"]
    end

    HTTPPost --> ServerRecv
    ServerRecv --> ServerAuth
    ServerAuth --> ServerInfer
    ServerInfer --> ServerSSE
    ServerSSE --> SSERead
    SSERead --> ParseEvent
    ParseEvent --> ToolAccum
    ToolAccum --> EmitEvent
    EmitEvent --> HarnessStream
    OpenAI --> ServerRecv
    vLLM --> ServerRecv
    LMStudio --> ServerRecv
    Groq --> ServerRecv
    Together --> ServerRecv

    classDef serverStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000
    classDef clientStyle fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef outputStyle fill:#f5f5f5,stroke:#616161,stroke-width:2px,color:#000
    classDef providerStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000

    class ServerRecv,ServerAuth,ServerInfer,ServerSSE serverStyle
    class HTTPPost,SSERead,ParseEvent,ToolAccum,EmitEvent clientStyle
    class HarnessStream outputStyle
    class OpenAI,vLLM,LMStudio,Groq,Together providerStyle
```

---

## L3 β€” MessageConverter Trait and Shared Infrastructure

**Source file:** `adapters/src/convert.rs`

The `MessageConverter` trait provides the shared infrastructure for converting harness messages into provider-specific formats. Each adapter (`AnthropicStreamFn`, `OpenAiStreamFn`, `OllamaStreamFn`) implements this trait to handle the differences in how each provider expects messages, tool definitions, and tool results to be structured.

This keeps provider-specific serialization logic isolated from the streaming machinery, so adding a new provider only requires implementing `MessageConverter` and the `StreamFn` trait.
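
A hedged sketch of the seam, with plain strings standing in for the real harness message and wire-payload types. Only the trait name `MessageConverter` comes from the source; `HarnessMessage`, `OpenAiStyleConverter`, and the `convert` signature are illustrative.

```rust
/// Simplified stand-in for the harness's message type.
pub struct HarnessMessage {
    pub role: String,
    pub text: String,
}

/// Hypothetical sketch of the MessageConverter seam: each adapter serializes
/// harness messages into its provider's wire format.
pub trait MessageConverter {
    fn convert(&self, messages: &[HarnessMessage]) -> String;
}

/// Illustrative converter for an OpenAI-style chat payload.
pub struct OpenAiStyleConverter;

impl MessageConverter for OpenAiStyleConverter {
    fn convert(&self, messages: &[HarnessMessage]) -> String {
        let items: Vec<String> = messages
            .iter()
            .map(|m| format!("{{\"role\":\"{}\",\"content\":\"{}\"}}", m.role, m.text))
            .collect();
        format!("[{}]", items.join(","))
    }
}

fn main() {
    let msgs = vec![HarnessMessage { role: "user".into(), text: "hi".into() }];
    let payload = OpenAiStyleConverter.convert(&msgs);
    assert_eq!(payload, r#"[{"role":"user","content":"hi"}]"#);
    println!("{payload}");
}
```

Adding a Gemini-style or Mistral-style payload would mean one more `impl MessageConverter`, with the streaming machinery untouched.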

---

## L3 β€” Adapter Comparison

| Aspect | `ProxyStreamFn` | `OllamaStreamFn` | `AnthropicStreamFn` | `OpenAiStreamFn` |
|---|---|---|---|---|
| Transport | SSE | NDJSON | SSE | SSE |
| Endpoint | `POST /v1/stream` | `POST /api/chat` | `POST /v1/messages` | `POST /v1/chat/completions` |
| Authentication | Bearer token | None (local) | `x-api-key` header | Bearer token |
| Thinking support | Depends on proxy | Streaming thinking blocks | Thinking blocks with budget mgmt, forced temp=1, signature extraction | N/A |
| Tool calls | Streamed fragments | Complete objects | Streamed fragments | Streamed fragments with state accumulation |
| Message conversion | N/A (passthrough) | `MessageConverter` | `MessageConverter` | `MessageConverter` |
| Tracing | N/A | `tracing` crate | `tracing` crate | `tracing` crate |
| Multi-provider | No (single proxy) | No (Ollama only) | No (Anthropic only) | Yes (vLLM, LM Studio, Groq, Together) |

---

## L4 β€” Delta Accumulation Sequence

This sequence shows how the harness reconstructs a complete `AssistantMessage` from individual delta events, including a text block and a tool call block arriving in the same stream.

```mermaid
sequenceDiagram
    participant Provider as LLM Provider / Proxy
    participant Stream as ProxyStreamFn / StreamFn
    participant RunLoop as run_loop

    Provider-->>Stream: Start
    Stream->>RunLoop: emit MessageStart (empty AssistantMessage)

    Provider-->>Stream: TextStart(index=0)
    Provider-->>Stream: TextDelta(index=0, "Hello")
    Provider-->>Stream: TextDelta(index=0, " world")
    Provider-->>Stream: TextEnd(index=0)
    Stream->>RunLoop: emit MessageUpdate(TextDelta×2)

    Provider-->>Stream: ToolCallStart(index=1, id="c1", name="search")
    Provider-->>Stream: ToolCallDelta(index=1, '{"q":')
    Provider-->>Stream: ToolCallDelta(index=1, '"rust"}')
    Provider-->>Stream: ToolCallEnd(index=1)
    Stream->>RunLoop: emit MessageUpdate(ToolCallDelta×2)

    Provider-->>Stream: Done(stop_reason=tool_use, usage={…})
    Stream->>RunLoop: emit MessageEnd (finalised AssistantMessage)
    Note over RunLoop: message.content = [Text("Hello world"), ToolCall("search", {q:"rust"})]
```
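
The accumulation in the sequence above can be sketched with simplified types. Only the event names and the `content_index` semantics come from the protocol; `ContentBlock` and `PartialMessage` are illustrative stand-ins for the real `AssistantMessage` types.

```rust
/// Simplified stand-ins for the harness's content-block and message types.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text(String),
    ToolCall { id: String, name: String, arguments: String },
}

pub enum Event {
    TextStart { content_index: usize },
    TextDelta { content_index: usize, delta: String },
    ToolCallStart { content_index: usize, id: String, name: String },
    ToolCallDelta { content_index: usize, json_fragment: String },
}

#[derive(Default, Debug)]
pub struct PartialMessage {
    pub content: Vec<ContentBlock>,
}

impl PartialMessage {
    /// Open a new block at `index`, growing the content vec if needed.
    fn place(&mut self, index: usize, block: ContentBlock) {
        if self.content.len() <= index {
            self.content.resize(index + 1, ContentBlock::Text(String::new()));
        }
        self.content[index] = block;
    }

    pub fn apply(&mut self, event: Event) {
        match event {
            Event::TextStart { content_index } => {
                self.place(content_index, ContentBlock::Text(String::new()));
            }
            Event::TextDelta { content_index, delta } => {
                if let Some(ContentBlock::Text(t)) = self.content.get_mut(content_index) {
                    t.push_str(&delta);
                }
            }
            Event::ToolCallStart { content_index, id, name } => {
                self.place(content_index, ContentBlock::ToolCall { id, name, arguments: String::new() });
            }
            Event::ToolCallDelta { content_index, json_fragment } => {
                if let Some(ContentBlock::ToolCall { arguments, .. }) = self.content.get_mut(content_index) {
                    arguments.push_str(&json_fragment);
                }
            }
        }
    }
}

fn main() {
    // Replay the deltas from the sequence diagram.
    let mut partial = PartialMessage::default();
    for e in [
        Event::TextStart { content_index: 0 },
        Event::TextDelta { content_index: 0, delta: "Hello".into() },
        Event::TextDelta { content_index: 0, delta: " world".into() },
        Event::ToolCallStart { content_index: 1, id: "c1".into(), name: "search".into() },
        Event::ToolCallDelta { content_index: 1, json_fragment: "{\"q\":".into() },
        Event::ToolCallDelta { content_index: 1, json_fragment: "\"rust\"}".into() },
    ] {
        partial.apply(e);
    }
    assert_eq!(partial.content[0], ContentBlock::Text("Hello world".into()));
    println!("{:?}", partial.content);
}
```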

---

## L4 β€” Proxy Error Handling

Proxy failures are classified into `AgentError` variants based on the nature of the failure. This determines whether the harness will retry the request (via `RetryStrategy`) or surface the error immediately to the caller.

| Failure mode | AgentError variant | Retryable? | Notes |
|---|---|---|---|
| **Connection failure** (proxy unreachable, DNS failure, TCP timeout) | `AgentError::NetworkError` | Yes | Retryable via `RetryStrategy`. |
| **Authentication failure** (invalid/expired bearer token, 401/403 response) | `AgentError::StreamError` | No | Not retryable β€” caller must fix credentials. |
| **SSE stream drop** (connection lost mid-stream) | `AgentError::NetworkError` | Yes | The harness does not attempt partial message recovery β€” the entire turn is retried. |
| **Proxy timeout** (proxy returns 504 or similar gateway timeout) | `AgentError::NetworkError` | Yes | Retryable via `RetryStrategy`. |
| **Malformed SSE event** (unparseable JSON in event data) | `AgentError::StreamError` | No | Not retryable β€” indicates a proxy bug. |
| **Rate limiting from proxy** (429 response from the proxy itself) | `AgentError::ModelThrottled` | Yes | Retryable via `RetryStrategy`. |
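
The classification in the table might look roughly like this. The `ProxyFailure` enum is an invented intermediate; the `AgentError` variant names and the retryability rules come from the table above.

```rust
/// AgentError variants named in the table above (simplified sketch).
#[derive(Debug, PartialEq)]
pub enum AgentError {
    NetworkError(String),
    StreamError(String),
    ModelThrottled(String),
}

/// Hypothetical enumeration of the proxy failure modes from the table.
pub enum ProxyFailure {
    ConnectionFailure(String),
    AuthFailure(u16),
    SseStreamDrop,
    ProxyTimeout,
    MalformedEvent(String),
    RateLimited,
}

impl ProxyFailure {
    pub fn into_agent_error(self) -> AgentError {
        use ProxyFailure::*;
        match self {
            // Transient transport problems: retryable NetworkError.
            ConnectionFailure(e) => AgentError::NetworkError(e),
            SseStreamDrop => AgentError::NetworkError("stream dropped".into()),
            ProxyTimeout => AgentError::NetworkError("gateway timeout".into()),
            // Credential and protocol problems: non-retryable StreamError.
            AuthFailure(code) => AgentError::StreamError(format!("auth failed: {code}")),
            MalformedEvent(e) => AgentError::StreamError(e),
            // Rate limiting gets its own variant so RetryStrategy can back off.
            RateLimited => AgentError::ModelThrottled("429 from proxy".into()),
        }
    }

    /// Whether RetryStrategy should retry this failure, per the table.
    pub fn retryable(&self) -> bool {
        !matches!(self, ProxyFailure::AuthFailure(_) | ProxyFailure::MalformedEvent(_))
    }
}

fn main() {
    assert!(!ProxyFailure::AuthFailure(401).retryable());
    assert!(ProxyFailure::SseStreamDrop.retryable());
    assert_eq!(
        ProxyFailure::RateLimited.into_agent_error(),
        AgentError::ModelThrottled("429 from proxy".into())
    );
    println!("ok");
}
```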

```mermaid
flowchart TB
    subgraph Failures["🔴 Proxy Failure Modes"]
        ConnFail["Connection Failure<br/>(unreachable, DNS, TCP timeout)"]
        AuthFail["Auth Failure<br/>(401 / 403)"]
        StreamDrop["SSE Stream Drop<br/>(mid-stream disconnect)"]
        ProxyTimeout["Proxy Timeout<br/>(504 gateway timeout)"]
        Malformed["Malformed Event<br/>(unparseable JSON)"]
        RateLimit["Rate Limited<br/>(429 from proxy)"]
    end

    subgraph ErrorTypes["⚠️ AgentError Mapping"]
        NetErr["NetworkError<br/>(retryable)"]
        StreamErr["StreamError<br/>(not retryable)"]
        Throttled["ModelThrottled<br/>(retryable)"]
    end

    ConnFail --> NetErr
    StreamDrop --> NetErr
    ProxyTimeout --> NetErr
    AuthFail --> StreamErr
    Malformed --> StreamErr
    RateLimit --> Throttled

    classDef failStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    classDef errStyle fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000

    class ConnFail,AuthFail,StreamDrop,ProxyTimeout,Malformed,RateLimit failStyle
    class NetErr,StreamErr,Throttled errStyle
```