phi-core 0.10.0

Simple, effective agent loop with tool execution and event streaming
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
<!-- Last verified: 2026-05-16 by Claude Code -->
# phi-core — System Architecture

## 1. Component Map

### Agent trait + BasicAgent (`src/agents/`)
**Responsibility:** `Agent` (trait, `agents/agent.rs`) defines the runtime interface — prompting, state access, control, and steering queues. `BasicAgent` (struct, `agents/basic_agent.rs`) is the default in-memory implementation: owns the conversation, tools, and `ModelConfig` (provider identity), and is the application-facing entry point. Construction: `BasicAgent::new(ModelConfig::anthropic(...))`. The optional `provider_override` field bypasses `ProviderRegistry` for custom or test providers. `SubAgentTool` (`agents/sub_agent.rs`) implements `AgentTool` to delegate tasks to a child `agent_loop()`.
**Public interface:**
- `prompt(text)` — Send a text prompt; returns an event stream receiver.
- `prompt_messages(messages)` — Send one or more messages as a prompt; returns an event stream receiver.
- `prompt_with_sender(text, tx)` — Send a text prompt, streaming events to a caller-provided sender.
- `continue_loop()` — Resume from existing context with `ContinuationKind::Default`; returns an event stream receiver.
- `continue_loop_with_sender(tx, kind)` — Resume with an explicit `ContinuationKind` (`Default`, `Rerun { tag }`, or `Branch { tag }`), streaming events to a caller-provided sender.
- `steer(msg)` — Queue a message that will be injected mid-run between tool executions.
- `follow_up(msg)` — Queue a message to be processed after the agent would otherwise stop.
- `abort()` — Cancel the in-progress run by signalling the cancellation token.
- `reset()` — Clear messages, queues, streaming state, and cancel token to return the agent to its initial state.
- `save_messages()` — Serialize the current conversation to a JSON string.
- `restore_messages(json)` — Replace the current conversation with messages deserialized from a JSON string.
- `with_skills(skill_set)` — Load skills and append their XML index to the system prompt per the AgentSkills standard.
- `with_mcp_server_stdio(cmd, args, env)` — Connect to an MCP server by spawning a child process and add its tools to the agent.
- `with_mcp_server_http(url)` — Connect to an MCP server via HTTP and add its tools to the agent.
- `with_openapi_file(path, config, filter)` — Load tools from an OpenAPI spec file and add them to the agent.
- `with_openapi_url(url, config, filter)` — Fetch an OpenAPI spec from a URL and add its tools to the agent.
- `with_openapi_spec(spec_str, config, filter)` — Parse an OpenAPI spec string (JSON or YAML) and add its tools to the agent.
- `new_session()` — Immediately rotate to a new `session_id`; resets loop counters and `last_loop_id`; returns the new session id.
- `check_and_rotate(threshold)` — Rotate to a new session if the agent has been idle longer than `threshold` since the last `prompt_*` call; returns `Some(new_session_id)` on rotation, `None` otherwise.

**BasicAgent state relevant to session management:**

| Field | Type | Description |
|---|---|---|
| `agent_id` | `String` | Stable identifier across all sessions for this instance |
| `session_id` | `String` | Current session identifier; updated by `new_session()` |
| `loop_counters` | `HashMap<String, usize>` | Per-config loop counter; cleared on `new_session()` |
| `last_loop_id` | `Option<String>` | Most recent loop; cleared on `new_session()` |
| `last_active_at` | `Option<DateTime<Utc>>` | Timestamp of last `prompt_*` call; used by `check_and_rotate()` |

### AgentLoop (`src/agent_loop/`)
**Responsibility:** The core execution engine. Manages the turn loop, tool dispatch, steering injection, follow-up processing, and lifecycle event emission.
**Public interface:**
- `agent_loop(prompts, context, config, tx, cancel)` — Start an agent run from new prompt messages, applying input filters, emitting lifecycle events, and returning all new messages produced.
- `agent_loop_continue(context, config, tx, cancel)` — Resume from existing context (no new prompts); used for retries after errors or mid-conversation continuation.
- `agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult` — Run N `AgentLoopConfig`s concurrently and evaluate results via `EvaluationStrategy`. When `prompts` is non-empty, each branch uses `agent_loop`; when `prompts` is empty, each branch uses `agent_loop_continue` (the user query is already the last message in `base_context`). `base_context` is cloned per branch (tools `Arc`-shared; message history deep-copied). All branches share the same `session_id`; each gets a distinct `loop_id`. `ParallelLoopOutcome.original_context_len` marks the base/branch message boundary. Emits `ParallelLoopStart`/`ParallelLoopEnd` events. `selected_context` feeds into `agent_loop_continue()` for normal session resumption.
- `derive_config_segment(config) -> String` *(pub crate)* — Derives the stable `{config_segment}` portion of a `loop_id` from `config.config_id` or provider/model/thinking fields.

### EvaluationLoop (`src/agent_loop/evaluation.rs`)
**Responsibility:** Pluggable strategy for selecting among parallel loop outcomes. Decoupled from `src/agent_loop/` to allow custom implementations without a circular dependency (trait is defined in `src/types/`; implementations live here).
**Public interface:**
- `EvaluationStrategy` *(trait, defined in `src/types/`)* — `evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)`
- `EvaluationDecision` *(enum, defined in `src/types/`)* — `Select(usize)` — 0-based index of the winning outcome.
- `ParallelLoopOutcome.original_context_len: usize` — Number of messages in the cloned context at dispatch time. Allows strategies to split "original context" from "new branch output" messages without separate bookkeeping. Identical across all outcomes (same base context); `outcomes[0]` is the idiomatic source.
- `TransparentEvaluation` — Single-branch pass-through; panics if `> 1` outcome.
- `PickFirstEvaluation` — Always selects index 0. Useful for testing.
- `TokenEfficientEvaluation` — Selects the outcome with the lowest total token usage.
- `ElaborateEvaluation` — Selects the outcome with the highest total token usage.
- `LlmJudgeEvaluation { judge_config, system_prompt }` — Runs a separate LLM call to select the best branch. Supports both `agent_loop` mode (query from `prompts`) and `agent_loop_continue` mode (query extracted from last `Message::User` in `context.messages[..original_context_len]`). Includes prior conversation context in the judge prompt. Applies **2-iteration compaction**: Iteration 1 compacts only prior context (3 tiers: tail-truncate → paragraph-summary → hard char limit), keeping outputs intact; Iteration 2 (if needed) compacts both context and outputs independently through the same tier pipeline. Budget derived from `judge_config.context_config.max_context_tokens`. Emits a `ProgressMessage` warning if comprehension criteria cannot be satisfied after iteration 2.

### ContextManager (`src/context/`)
**Responsibility:** Token estimation, tiered context compaction, and execution limit tracking.
**Public interface:**
- `estimate_tokens(text)` — Rough token count heuristic: ~4 characters per token.
- `compact_messages(messages, config)` — Reduce message list to fit token budget using a tiered strategy: truncate tool outputs → summarize old turns → drop middle messages.
- `CompactionStrategy` *(trait)* — Interface for custom compaction logic; default implementation uses the tiered cascade (legacy `compact_messages()`; modern: CompactionBlock overlays).
- `ContextTracker` — Tracks context window usage by combining provider-reported token counts with local estimates for recent messages.
- `ExecutionTracker` — Tracks turns, cumulative tokens, and elapsed time against configured limits; signals when any limit is exceeded.
- `ContextConfig` — Tuning knobs for compaction: token budget, system-prompt overhead, head/tail message preservation counts, per-tool-output line limit.
- `ExecutionLimits` — Hard caps on agent execution: max turns, max total tokens, max wall-clock duration.

### ProviderRegistry (`src/provider/registry.rs`, `src/provider/mod.rs`)
**Responsibility:** Dispatches `StreamConfig` to the correct provider implementation based on `model_config.api: ApiProtocol`. Built inline per `agent_loop()` call; zero allocation for a registry with all built-in providers pre-registered.
**Public interface:**
- `ProviderRegistry::default()` — Pre-registers all 7 built-in providers; used automatically by `agent_loop()` when `AgentLoopConfig.provider_override` is `None`.
- `ProviderRegistry::new()` — Create an empty registry for custom provider sets.
- Provider resolution: `model_config.api` selects the wire-protocol handler; `model_config` fields (`id`, `api_key`, `base_url`, `compat`, etc.) differentiate services within the same protocol.

### StreamProvider implementations (`src/provider/`)
**Responsibility:** Translate the unified `StreamConfig` into provider-specific HTTP requests and parse streaming responses back into `StreamEvent`s.
**Providers:** `AnthropicProvider`, `OpenAiCompatProvider` (15+ backends), `OpenAiResponsesProvider`, `AzureOpenAiProvider`, `GoogleProvider`, `GoogleVertexProvider`, `BedrockProvider`, `MockProvider`.
**Public interface:**
- `StreamProvider::stream(config, tx, cancel) -> Result<Message, ProviderError>` — stream a single LLM response.
- `StreamProvider::provider_id() -> &str` — stable lowercase identifier for this provider (e.g. `"anthropic"`, `"openai"`, `"google"`, `"bedrock"`). Used as the first segment of the auto-derived `config_id` in `loop_id` construction.

### ToolSystem (`src/tools/`)
**Responsibility:** Built-in tool implementations. Each implements `AgentTool`.
**Tools:** `BashTool` (shell execution), `ReadFileTool` (text + image files), `WriteFileTool` (create/overwrite), `EditFileTool` (surgical search/replace), `ListFilesTool` (directory listing), `SearchTool` (grep/ripgrep).
**Public interface:**
- `default_tools()` — Returns the standard built-in toolset: bash, read-file, write-file, edit-file, list-files, search.
- `AgentTool::name()` — Unique tool identifier used in LLM tool-use calls and event correlation.
- `AgentTool::label()` — Human-readable display name for UI.
- `AgentTool::description()` — Free-text description sent to the LLM to explain when to use the tool.
- `AgentTool::parameters_schema()` — JSON Schema object describing the tool's accepted parameters.
- `AgentTool::execute(params, ctx)` — Run the tool with resolved parameters and a context carrying the cancellation token and progress callbacks.

### SubAgentTool (`src/agents/sub_agent.rs`)
**Responsibility:** Implements `AgentTool` to delegate tasks to a child `agent_loop()` with isolated context, its own toolset, and a turn limit. The child gets its own `agent_id`, `session_id`, and `loop_id`; its `parent_loop_id` is linked back to the calling loop via `with_parent_loop_id`.
**Public interface:**
- `SubAgentTool::new(name, model_config).with_*(...)` — Construct a sub-agent tool with its own `ModelConfig` (provider identity), system prompt, toolset, and turn limit, then register it as an `AgentTool`.
- `SubAgentTool::with_provider_override(provider)` — Bypass `ProviderRegistry` dispatch; used in tests to inject `MockProvider`.
- `SubAgentTool::with_parent_loop_id(loop_id)` — Supply the parent loop's `loop_id` so the child `AgentStart` event carries `parent_loop_id`, enabling ancestry tracing across the event stream.

### SkillSystem (`src/context/skills.rs`)
**Responsibility:** Loads `SKILL.md` files from one or more directories, parses YAML frontmatter, and formats them as an XML index injected into the system prompt.
**Public interface:**
- `SkillSet::load(dirs)` — Load skills from multiple directories; later entries override earlier ones on name conflict.
- `SkillSet::load_dir(dir, source)` — Load skills from a single directory, tagging each with a source label.
- `SkillSet::merge(other)` — Merge another `SkillSet` in; the other's skills override on name conflict.
- `SkillSet::format_for_prompt()` — Render the skill list as an `<available_skills>` XML block ready for system-prompt injection.

### McpClient (`src/mcp/`)
**Responsibility:** MCP client that connects to external tool servers over stdio or HTTP. Adapts discovered tools into `AgentTool` instances.
**Public interface:**
- `McpClient::connect_stdio(cmd, args, env)` — Spawn a child process, complete the JSON-RPC initialize handshake, and return a connected client.
- `McpClient::connect_http(url)` — Connect to an HTTP-based MCP server and complete the initialize handshake.
- `McpToolAdapter::from_client(client)` — Query the server for available tools and return one `AgentTool` adapter per tool.

### OpenApiAdapter (`src/openapi/`, feature-gated)
**Responsibility:** Parses OpenAPI 3.x specs and generates one `AgentTool` per operation. Each tool makes an HTTP request to the spec's base URL.
**Public interface:**
- `from_file(path, config, filter)` — Parse an OpenAPI spec from a local file and return one tool adapter per matching operation.
- `from_url(url, config, filter)` — Fetch an OpenAPI spec over HTTP and return one tool adapter per matching operation.
- `from_str(spec, config, filter)` — Parse an OpenAPI spec from an in-memory string (auto-detects JSON vs YAML) and return one tool adapter per matching operation.
**Availability:** Only compiled when the `openapi` feature flag is enabled.

### SessionStore (`src/session/`)
**Responsibility:** Persistent session layer. Records every `AgentEvent` into a structured tree of `Session` + `LoopRecord` objects, and provides both free-function and trait-based APIs for flat JSON-file persistence.
**Public interface:**
- `SessionRecorder::new(config)` — Create a recorder; call `on_event(event)` for every event on the agent's `tx` channel.
- `SessionRecorder::flush()` — Finalize all open loops (status → `Aborted`) and move them into their sessions.
- `SessionRecorder::drain_completed()` — Consume and return all completed sessions.
- `SessionRecorder::sessions()` — Iterate all known sessions (completed + in-progress).
- `SessionRecorder::get_session(id)` — Look up a session by `session_id`.
- `SessionRecorder::current_loop(id)` — Look up an in-progress `LoopRecord` by `loop_id`.
- `save_session(session, dir)` — Write `{dir}/{session_id}.json` (creates `dir` if needed). Atomic via tmp-file + rename.
- `load_session(session_id, dir)` — Read `{dir}/{session_id}.json`.
- `list_session_ids(dir)` — List all session ids in `dir`, newest first.
- `load_sessions_for_agent(agent_id, dir)` — Load all sessions matching `agent_id`.
- `delete_session(session_id, dir)` — Remove `{dir}/{session_id}.json`.
- `SessionStore` trait — async `save` / `load` / `list_ids` / `delete` / `list_for_agent` for callers that want a pluggable store (custom backends, mocks). *(Added 0.7.0)*
- `FileSystemSessionStore::new(dir)` — In-tree async impl of `SessionStore`. Adds advisory `fs2` exclusive lock on save (returns `SessionError::Locked` if a concurrent writer holds it). *(Added 0.7.0)*
**File format:** Pretty-printed JSON. Flat directory — one file per session, no index. Writes are atomic (tmp + rename) regardless of API surface used.

### RetryEngine (`src/provider/retry.rs`)
**Responsibility:** Computes exponential-backoff delay with ±20% jitter. Classifies which errors are retryable.
**Public interface:**
- `RetryConfig` — Parameters for automatic retry: initial delay, backoff multiplier, max delay, max attempt count.
- `RetryConfig::delay_for_attempt(attempt)` — Compute the sleep duration before attempt N using exponential backoff with ±20% jitter.
- `is_retryable()` *(on `ProviderError`)* — Returns true only for `RateLimited` and `Network` variants; all other errors fail immediately.
- `retry_after()` *(on `ProviderError`)* — Extracts the server-specified retry delay from a `RateLimited { retry_after_ms: Some(...) }` error, if present.

---

## 2. Dependency Graph

```mermaid
graph TD
    App["Application Code"] --> Agent
    Agent --> AgentLoop["AgentLoop\nagent_loop/"]
    AgentLoop --> ContextManager["ContextManager\ncontext/"]
    AgentLoop --> ProviderRegistry["Provider\ntraits.rs / registry.rs"]
    AgentLoop --> ToolSystem["ToolSystem\ntools/"]
    AgentLoop --> RetryEngine["RetryEngine\nprovider/retry.rs"]
    ProviderRegistry --> Anthropic["AnthropicProvider"]
    ProviderRegistry --> OpenAI["OpenAiCompatProvider\n(15+ backends)"]
    ProviderRegistry --> OpenAIResp["OpenAiResponsesProvider"]
    ProviderRegistry --> Azure["AzureOpenAiProvider"]
    ProviderRegistry --> Google["GoogleProvider"]
    ProviderRegistry --> Vertex["GoogleVertexProvider"]
    ProviderRegistry --> Bedrock["BedrockProvider"]
    ProviderRegistry --> Mock["MockProvider\n(tests)"]
    Agent --> SkillSystem["SkillSystem\ncontext/skills.rs"]
    Agent --> McpClient["McpClient\nmcp/"]
    Agent --> OpenApiAdapter["OpenApiAdapter\nopenapi/ (feature)"]
    McpClient --> ToolSystem
    OpenApiAdapter --> ToolSystem
    SubAgent["SubAgentTool\nsub_agent.rs"] --> AgentLoop
    ToolSystem --> SubAgent
    Types["types/\n(shared types)"] --> Agent
    Types --> AgentLoop
    Types --> ToolSystem
    Types --> ProviderRegistry
    SessionStore["SessionStore\nsession/"] --> Types
    App --> SessionStore
```

---

## 3. Data Flow

### 3.1 Simple Text Prompt (no tool calls)

```mermaid
sequenceDiagram
    participant App
    participant Agent
    participant AgentLoop
    participant Provider
    participant EventCh as EventChannel

    App->>Agent: prompt("What is 2+2?")
    Agent->>AgentLoop: agent_loop(prompts, context, config, tx, cancel)
    AgentLoop->>EventCh: AgentStart
    AgentLoop->>EventCh: TurnStart
    AgentLoop->>EventCh: MessageStart (user)
    AgentLoop->>EventCh: MessageEnd (user)
    AgentLoop->>Provider: stream(StreamConfig)
    Provider-->>EventCh: StreamEvent::Start
    Provider-->>EventCh: StreamEvent::TextDelta x N
    Provider-->>EventCh: StreamEvent::Done(Message)
    AgentLoop->>EventCh: MessageStart (assistant placeholder)
    AgentLoop->>EventCh: MessageUpdate x N (deltas)
    AgentLoop->>EventCh: MessageEnd (assistant final)
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd(messages)
    App->>EventCh: receives events via rx.recv()
```

### 3.2 Tool Call Cycle

```mermaid
sequenceDiagram
    participant AgentLoop
    participant Provider
    participant BashTool
    participant EventCh as EventChannel

    AgentLoop->>Provider: stream(config with tool defs)
    Provider-->>AgentLoop: Done(Message{stop_reason: ToolUse, content: [ToolCall{...}]})
    AgentLoop->>EventCh: TurnEnd(assistant message)
    AgentLoop->>AgentLoop: extract tool_calls from assistant content
    AgentLoop->>EventCh: ToolExecutionStart(id, name, args)
    AgentLoop->>BashTool: execute(params, ToolContext)
    BashTool-->>EventCh: ProgressMessage (via on_progress callback)
    BashTool-->>AgentLoop: Ok(ToolResult)
    AgentLoop->>EventCh: ToolExecutionEnd(id, name, result, is_error=false)
    AgentLoop->>EventCh: MessageStart(ToolResult message)
    AgentLoop->>EventCh: MessageEnd(ToolResult message)
    AgentLoop->>AgentLoop: append tool results to context.messages
    AgentLoop->>Provider: stream(config, now includes tool results)
    Provider-->>AgentLoop: Done(Message{stop_reason: Stop})
    AgentLoop->>EventCh: TurnEnd
    AgentLoop->>EventCh: AgentEnd
```

### 3.3 Context Compaction Trigger

```mermaid
sequenceDiagram
    participant AgentLoop
    participant ContextManager
    participant Provider

    AgentLoop->>ContextManager: compact(messages, config)
    ContextManager->>ContextManager: total_tokens(messages) > budget?
    alt Level 1 fits
        ContextManager-->>AgentLoop: truncated tool outputs
    else Level 2 fits
        ContextManager-->>AgentLoop: old turns summarized
    else Level 3
        ContextManager-->>AgentLoop: first + recent kept, middle dropped
    end
    AgentLoop->>Provider: stream(config with compacted messages)
```

### 3.4 Sub-Agent Delegation

```mermaid
sequenceDiagram
    participant ParentLoop as Parent AgentLoop
    participant SubAgentTool
    participant ChildLoop as Child AgentLoop
    participant ChildProvider as Provider

    ParentLoop->>SubAgentTool: execute({task: "..."}, ToolContext)
    SubAgentTool->>SubAgentTool: build AgentContext with child identity<br/>(new agent_id, session_id, loop_id="{child_session}.sub.1",<br/>parent_loop_id = parent's loop_id)
    SubAgentTool->>ChildLoop: agent_loop([task_prompt], context, config, tx, cancel)
    ChildLoop->>ChildLoop: emit AgentStart{loop_id, parent_loop_id}
    ChildLoop->>ChildProvider: stream(...)
    ChildProvider-->>ChildLoop: streaming events
    ChildLoop-->>SubAgentTool: Vec<AgentMessage> (final messages)
    SubAgentTool->>SubAgentTool: extract_final_text(messages)
    SubAgentTool-->>ParentLoop: Ok(ToolResult{text, child_loop_id: Some(loop_id)})
    Note over ParentLoop: ToolExecutionEnd{child_loop_id} emitted<br/>→ parent stream records child ancestry
```

---

## 4. Data Models

### Content
```
Entity: Content (enum)
  Variant Text:
    text: String               [the text content]
  Variant Image:
    data: String               [base64-encoded binary]
    mime_type: String          [e.g. "image/png", "image/jpeg"]
  Variant Thinking:
    thinking: String           [internal reasoning text]
    signature: Option<String>  [provider-specific thinking signature, optional]
  Variant ToolCall:
    id: String                 [unique call ID, e.g. UUID]
    name: String               [tool name matching AgentTool::name()]
    arguments: JSON            [parameter values matching tool's JSON Schema]

Serialization: tagged by "type" field ("text", "image", "thinking", "toolCall")
```

### Message
```
Entity: Message (enum)
  Variant User:
    content: Vec<Content>      [usually a single Text block]
    timestamp: u64             [unix milliseconds]
  Variant Assistant:
    content: Vec<Content>      [text, thinking, tool call blocks]
    stop_reason: StopReason    [why the model stopped]
    model: String              [model ID returned by provider]
    provider: String           [provider name, e.g. "anthropic"]
    usage: Usage               [token counts for this turn]
    timestamp: u64             [unix milliseconds]
    error_message: Option<String>  [set when stop_reason == Error]
  Variant ToolResult:
    tool_call_id: String       [matches Content::ToolCall.id]
    tool_name: String          [matches Content::ToolCall.name]
    content: Vec<Content>      [tool output, usually a Text block]
    is_error: bool             [true if tool execution failed]
    timestamp: u64             [unix milliseconds]

Lifecycle: User messages are created by the caller. Assistant messages are
           created by the provider after streaming completes. ToolResult messages
           are created by the agent loop after tool execution.
```

### AgentMessage
```
Entity: AgentMessage (enum, untagged)
  Variant Llm(LlmMessage)       [sent to the LLM; user/assistant/toolResult roles; LlmMessage wraps Message + Option<TurnId>]
  Variant Extension(ExtensionMessage)  [not sent to LLM; app-only metadata]

Note: stored in Agent.messages and AgentContext.messages
      Extension messages are filtered out before LLM calls
```

### ExtensionMessage
```
Entity: ExtensionMessage
  role: String        [always "extension"]
  kind: String        [app-defined event type, e.g. "ui_update"]
  data: JSON          [arbitrary app-defined payload]
```

### StopReason
```
Entity: StopReason (enum)
  Stop      -> model completed naturally
  Length    -> max_tokens limit hit
  ToolUse   -> model returned tool calls (loop must continue)
  Error     -> provider or streaming error occurred
  Aborted   -> cancellation token was triggered

Serialization: camelCase ("stop", "length", "toolUse", "error", "aborted")
```

### Usage
```
Entity: Usage
  input: u64          [prompt tokens processed]
  output: u64         [completion tokens generated]
  cache_read: u64     [tokens served from prompt cache]
  cache_write: u64    [tokens written to prompt cache]
  total_tokens: u64   [sum, may be 0 if not reported]

Derived: cache_hit_rate() = cache_read / (input + cache_read + cache_write)
```

### AgentEvent
```
Entity: AgentEvent (enum, #[serde(tag = "type")])

Every variant except AgentStart, ParallelLoopStart, and ParallelLoopEnd now carries
loop_id: String so that events from concurrent parallel branches can be reliably
attributed to the correct LoopRecord even when they are interleaved on one tx channel.

  AgentStart {
    agent_id:          String                    [stable agent instance identifier]
    session_id:        String                    [groups all loops in one session]
    loop_id:           String                    ["{session_id}.{config_id}.{N}" — unique per call]
    parent_loop_id:    Option<String>            [None for origin calls; Some for continuations/sub-agents]
    continuation_kind: Option<ContinuationKind>  [None=origin; Some(Default/Rerun/Branch)=continuation]
    timestamp:         DateTime<Utc>
    metadata:          Option<JSON>
  }
  AgentEnd {
    loop_id:  String                         [← identifies the loop]
    messages: Vec<AgentMessage>              [all new messages produced by this loop]
    usage:    Usage
    timestamp: DateTime<Utc>
    rejection: Option<String>               [Some if input filter blocked the run]
  }
  TurnStart {
    loop_id:      String
    turn_index:   u32
    timestamp:    DateTime<Utc>
    triggered_by: TurnTrigger               [what caused this turn to begin]
  }
  TurnEnd {
    loop_id:      String
    message:      AgentMessage
    usage:        Usage
    timestamp:    DateTime<Utc>
    tool_results: Vec<Message>
  }
  MessageStart  { loop_id: String, message }            [message streaming began]
  MessageUpdate { loop_id: String, message, delta }     [content delta arrived]
  MessageEnd    { loop_id: String, message }            [message complete]
  ToolExecutionStart  { loop_id: String, tool_call_id, tool_name, args }
  ToolExecutionUpdate { loop_id: String, tool_call_id, tool_name, partial_result }
  ToolExecutionEnd {
    loop_id:       String
    tool_call_id:  String
    tool_name:     String
    result:        ToolResult
    is_error:      bool
    child_loop_id: Option<String>           [Some only when tool spawned a sub-agent loop]
  }
  ProgressMessage { loop_id: String, tool_call_id, tool_name, text }
  InputRejected   { loop_id: String, reason }           [input filter blocked the prompt]
  ParallelLoopStart {                                   [loop_id NOT on this variant]
    session_id: String
    loop_ids:   Vec<String>                 [one loop_id per branch, in config order]
    timestamp:  DateTime<Utc>
  }
  ParallelLoopEnd {                                     [loop_id NOT on this variant]
    session_id:             String
    selected_loop_id:       String
    selected_config_index:  usize
    evaluation_usage:       Usage
    timestamp:              DateTime<Utc>
  }
```

### StreamDelta
```
Entity: StreamDelta (enum)
  Text { delta: String }              [text content chunk]
  Thinking { delta: String }          [thinking content chunk]
  ToolCallDelta { delta: String }     [tool call argument chunk]
```

### ToolContext
```
Entity: ToolContext
  tool_call_id: String               [for correlation with AgentEvent]
  tool_name: String                  [for correlation with AgentEvent]
  cancel: CancellationToken          [check is_cancelled() in long-running tools]
  on_update: Option<ToolUpdateFn>    [callback for streaming partial ToolResults]
  on_progress: Option<ProgressFn>    [callback for user-facing status text]
```

### ContinuationKind
```
Entity: ContinuationKind (enum)
  Default                 [unspecified continuation — preserves legacy semantics]
  Rerun { tag: String }   [retry from an equivalent context; tag is RFC 3339 UTC timestamp]
  Branch { tag: String }  [explore a different path from a branching point; tag is RFC 3339 UTC timestamp]

Set on AgentContext.continuation_kind before calling agent_loop_continue().
Surfaced in AgentStart.continuation_kind (None = origin call).
TurnTrigger semantics:
  Default / Rerun → first turn uses TurnTrigger::Continuation
  Branch          → first turn uses TurnTrigger::Branch
```

### TurnTrigger
```
Entity: TurnTrigger (enum)
  User      [first turn of an agent_loop() origin call with new user prompts]
  SubAgent  [first turn when running as a sub-agent via SubAgentTool]
  Continuation  [subsequent turns; tool round-trip, steering, or Default/Rerun continuation]
  Branch    [first turn of an agent_loop_continue(Branch) call]

Emitted in TurnStart.triggered_by.
Priority on first turn (run_loop):
  1. Branch continuation     → TurnTrigger::Branch
  2. Any other continuation  → TurnTrigger::Continuation
  3. Origin call             → config.first_turn_trigger (User or SubAgent)
Subsequent turns always use TurnTrigger::Continuation.
```

### ToolResult / ToolError
```
Entity: ToolResult
  content:       Vec<Content>    [tool output content blocks]
  details:       JSON            [structured metadata, not sent to LLM, e.g. exit_code]
  child_loop_id: Option<String>  [set by sub-agent tools; None for all other tools]

Entity: ToolError (enum)
  Failed(String)          [general execution failure]
  NotFound(String)        [tool name not in registry]
  InvalidArgs(String)     [parameter validation failed]
  Cancelled               [CancellationToken was triggered]
```

### ContextConfig
```
Entity: ContextConfig
  max_context_tokens: usize     [default: 100,000; total budget including system prompt]
  system_prompt_tokens: usize   [default: 4,000; reserved for system prompt]
  keep_recent: usize            [default: 10; messages always kept in full at tail]
  keep_first: usize             [default: 2; messages always kept at head]
  tool_output_max_lines: usize  [default: 50; L1 compaction per-tool-output limit]

Effective budget = max_context_tokens - system_prompt_tokens
```

### ExecutionLimits / ExecutionTracker
```
Entity: ExecutionLimits
  max_turns: usize              [default: 50; LLM calls before forced stop]
  max_total_tokens: usize       [default: 1,000,000; cumulative token budget]
  max_duration: Duration        [default: 600s; wall-clock time limit]

Entity: ExecutionTracker (runtime state)
  limits: ExecutionLimits       [immutable config]
  turns: usize                  [incremented after each LLM call]
  tokens_used: usize            [cumulative; updated from provider Usage]
  started_at: Instant           [set on construction]
```

### RetryConfig
```
Entity: RetryConfig
  max_retries: usize            [default: 3; 0 = no retries]
  initial_delay_ms: u64         [default: 1,000ms]
  backoff_multiplier: f64       [default: 2.0; exponential growth factor]
  max_delay_ms: u64             [default: 30,000ms; ceiling before jitter]
```

### CacheConfig / CacheStrategy
```
Entity: CacheConfig
  enabled: bool                 [master switch; default: true]
  strategy: CacheStrategy

Entity: CacheStrategy (enum)
  Auto                          [provider places breakpoints automatically]
  Disabled                      [no caching hints sent]
  Manual {
    cache_system: bool          [cache system prompt]
    cache_tools: bool           [cache tool definitions]
    cache_messages: bool        [cache second-to-last message]
  }
```

### StreamConfig (sent to provider)
```
Entity: StreamConfig
  model_config: ModelConfig     [REQUIRED — full provider identity: id, api_key, base_url, compat, cost]
  system_prompt: String
  messages: Vec<Message>        [LLM-only messages, Extension filtered out]
  tools: Vec<ToolDefinition>    [schema-only; no execute functions]
  thinking_level: ThinkingLevel
  max_tokens: Option<u32>       [overrides model_config.max_tokens when Some]
  temperature: Option<f32>
  cache_config: CacheConfig

Note: model identity (id, api_key, base_url, headers, compat) is accessed via
      model_config.id, model_config.api_key, etc. No top-level model or api_key fields.
```

### ToolDefinition (sent to LLM)
```
Entity: ToolDefinition
  name: String              [matches AgentTool::name()]
  description: String       [matches AgentTool::description()]
  parameters: JSON          [JSON Schema object matching AgentTool::parameters_schema()]
```

### Skill / SkillSet
```
Entity: Skill
  name: String              [from YAML frontmatter; skill identifier]
  description: String       [from YAML frontmatter; one-line capability summary]
  file_path: PathBuf        [absolute path to the SKILL.md file]
  base_dir: PathBuf         [absolute path to the skill's directory]
  source: String            [origin label: "dir:0", "dir:1", etc.]

Entity: SkillSet
  skills: Vec<Skill>

Lifecycle: Loaded from disk at startup via SkillSet::load(dirs).
           Formatted as XML via format_for_prompt() and appended to system prompt.
           Agent reads full SKILL.md on-demand when activating a skill via read_file tool.
```

### QueueMode
```
Entity: QueueMode (enum) — controls steering/follow-up queue delivery

  OneAtATime   pop and return exactly one message per call (default)
  All          drain and return all queued messages at once

Used in: Agent.steering_mode, Agent.follow_up_mode
```

### McpToolInfo / McpContent
```
Entity: McpToolInfo — tool metadata returned by MCP server
  name: String                  [tool identifier used in tools/call]
  description: Option<String>   [human-readable description; default empty string]
  inputSchema: JSON             [JSON Schema for the tool's parameters]

Entity: McpContent (enum) — content item in a tool call result
  Variant Text:
    type: "text"
    text: String
  Variant Image:
    type: "image"
    data: String    [base64-encoded]
    mimeType: String

Entity: McpToolCallResult
  content: Vec<McpContent>  [output from the tool]
  isError: bool             [true if the tool reported an error]
```

### OpenApiConfig / OpenApiAuth / OperationFilter
```
Entity: OpenApiConfig — configuration for OpenAPI tool generation
  base_url: Option<String>          [overrides spec servers[0].url; trailing slash stripped]
  auth: OpenApiAuth                 [authentication method]
  custom_headers: Map<String,String> [extra headers added to every request]
  max_response_bytes: usize         [default: 65536 (64KB); response body truncation limit]
  timeout_secs: u64                 [default: 30; per-request timeout]
  name_prefix: Option<String>       [if set, tool names formatted as "{prefix}__{operationId}"]

Entity: OpenApiAuth (enum)
  None                              [no authentication]
  Bearer(token: String)             [Authorization: Bearer {token}]
  ApiKey { header: String, value: String }  [custom header: {header}: {value}]

Note: Bearer token and ApiKey value are redacted as "****" in debug output.

Entity: OperationFilter (enum) — controls which API operations become tools
  All                               [include all operations that have an operationId]
  ByOperationId(Vec<String>)        [include only operations whose id is in the list]
  ByTag(Vec<String>)                [include operations tagged with any listed tag]
  ByPathPrefix(String)              [include operations whose path starts with the prefix]
```

### Session / LoopRecord / SessionRecorder
```
Entity: Session
  session_id:      String
  agent_id:        String
  created_at:      DateTime<Utc>
  last_active_at:  DateTime<Utc>
  formation:       SessionFormation  [Explicit | FirstLoop | InactivityTimeout{..}]
  parent_spawn_ref: Option<SpawnRef> [set when this session was a sub-agent spawn]
  loops:           Vec<LoopRecord>   [ordered by started_at]

Methods: root_loops(), children_of(loop_id), parallel_siblings(loop_id),
         get_loop(loop_id), total_usage()

Entity: LoopRecord
  loop_id:             String
  session_id:          String
  agent_id:            String
  parent_loop_id:      Option<String>
  continuation_kind:   Option<ContinuationKind>
  started_at:          DateTime<Utc>
  ended_at:            Option<DateTime<Utc>>
  status:              LoopStatus          [Pending | Running | Completed | Rejected | Aborted]
  rejection:           Option<String>
  config:              Option<LoopConfigSnapshot>  [model, provider, config_id + name, api, base_url, reasoning, context_window, max_tokens, thinking_level, temperature]
  messages:            Vec<AgentMessage>   [from AgentEnd.messages — authoritative]
  usage:               Usage
  metadata:            Option<JSON>
  events:              Vec<LoopEvent>      [full event stream; MessageUpdate opt-in]
  children_loop_ids:   Vec<String>         [same-session direct children]
  child_loop_refs:     Vec<ChildLoopRef>   [cross-session sub-agent spawn links]
  parallel_group:      Option<ParallelGroupRecord>

Entity: ChildLoopRef — outbound cross-session link on the parent LoopRecord
  tool_call_id:    String
  tool_name:       String
  child_loop_id:   String
  child_session_id: String

Entity: SpawnRef — inbound cross-session link on the child Session
  parent_session_id: String
  parent_loop_id:    String
  tool_call_id:      String
  tool_name:         String

Entity: ParallelGroupRecord
  all_loop_ids:         Vec<String>   [all branch loop_ids in config order]
  selected_loop_id:     String
  selected_config_index: usize
  evaluation_usage:     Usage
  is_selected:          bool          [true only on the winner's LoopRecord]

Entity: SessionRecorderConfig
  formation_policy:         SessionFormationPolicy  [PerSessionId | InactivityTimeout{secs}]
  include_streaming_events: bool                    [default: false — excludes MessageUpdate]
```

---

## 5. Integration Contracts

### Anthropic Messages API
- **Endpoint:** `https://api.anthropic.com/v1/messages`
- **Auth (standard):** `x-api-key: {ANTHROPIC_API_KEY}` + `anthropic-version: 2023-06-01`
- **Auth (OAuth):** `authorization: Bearer {TOKEN}` + beta headers `claude-code-20250219,oauth-2025-04-20,fine-grained-tool-streaming-2025-05-14`; `x-app: cli`; `anthropic-dangerous-direct-browser-access: true`; `user-agent: claude-cli/2.1.2`
- **Request:** POST JSON with `model`, `system` (array of text blocks), `messages`, `tools`, `max_tokens` (default 8192), `stream: true`
- **Response:** Server-Sent Events stream; events: `message_start`, `content_block_start`, `content_block_delta`, `message_delta`, `message_stop`
- **Tool args:** Streamed as `InputJsonDelta` text fragments; buffered in `arguments["__partial_json"]`; parsed as complete JSON on `content_block_stop`
- **Thinking:** `ThinkingLevel` mapped to `{type:"enabled", budget_tokens: N}` — Minimal→128, Low→512, Medium→2048, High→8192
- **Prompt caching:** `cache_control: {type: "ephemeral"}` placed at system/last-tool-def/second-to-last-message per `CacheStrategy`
- **Content format:** `{type: "text"|"image"|"thinking"|"tool_use"|"tool_result", ...}`
- **Tool results:** Role "user", type "tool_result", fields: `tool_use_id`, `content`, `is_error`

### OpenAI-Compatible APIs (Chat Completions)
- **Endpoints:** `https://api.openai.com/v1/chat/completions` and 14+ compatible bases (xAI/Grok, Groq, Cerebras, Mistral, DeepSeek, etc.)
- **Auth:** `Authorization: Bearer {API_KEY}`
- **Request:** POST JSON with `model`, `messages`, `tools`, `stream: true`, `stream_options: {include_usage: true}`
- **max_tokens field name:** `"max_tokens"` (most) or `"max_completion_tokens"` (OpenAI) — controlled by `OpenAiCompat.max_tokens_field`
- **System prompt:** First message with role `"system"` or `"developer"` (OpenAI) — controlled by `supports_developer_role`
- **Thinking:** `reasoning_effort: "low"|"medium"|"high"` if `supports_reasoning_effort`; response in `delta.reasoning_content` (OpenAI) or `delta.reasoning` (xAI)
- **Response:** SSE stream; each chunk has `choices[0].delta`; tool args in `delta.tool_calls[].function.arguments` (incremental JSON string)

### OpenAI Responses API
- **Endpoint:** `{base_url}/responses`
- **Auth:** `Authorization: Bearer {OPENAI_API_KEY}`
- **System prompt:** `"instructions"` field (not `"messages"`)
- **Message format:** Different from Chat Completions — see Bedrock/Responses comparison below
- **Thinking:** `"reasoning": {effort: "low"|"medium"|"high"}` field
- **SSE events:** `response.output_text.delta`, `response.reasoning.delta`, `response.function_call_arguments.start/delta/done`, `response.completed`

### Azure OpenAI
- **Endpoint:** `{base_url}/responses?api-version=2025-01-01-preview` (base_url pattern: `https://{resource}.openai.azure.com/openai/deployments/{deployment}`)
- **Auth:** `api-key: {AZURE_OPENAI_API_KEY}` header (**not** `Authorization: Bearer`)
- **Request/Response:** Same format as OpenAI Responses API

### Google Generative AI (Gemini)
- **Endpoint:** `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}`
- **Auth:** API key as URL query parameter `?key=`; no Authorization header
- **System prompt:** `"systemInstruction": {parts: [{text: "..."}]}`
- **Tools:** Single object `{functionDeclarations: [...]}` wrapping all tool definitions
- **Contents:** Role "user" or "model"; ToolResults sent as `{role:"user", parts:[{functionResponse:{name, response:{result: text}}}]}`
- **Tool args:** Delivered **complete** in one event (no streaming deltas); tool IDs auto-generated as `"google-fc-{index}"`
- **Response parsing:** Custom SSE parser (not standard library); splits on `\n\n`, extracts `data: ` line

### Google Vertex AI
- **Endpoint:** `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent?alt=sse`
- **Auth:** `Authorization: Bearer {OAUTH_TOKEN}` (OAuth2, not API key in URL)
- **Request/Response:** Identical to Google Generative AI; tool IDs generated as `"vertex-fc-{index}"`

### Amazon Bedrock (ConverseStream)
- **Endpoint:** `{base_url}/model/{model}/converse-stream` (base_url: `https://bedrock-runtime.{region}.amazonaws.com`)
- **Auth:** `Authorization: Bearer {token}` or custom headers from `model_config.headers`; minimal SigV4 support
- **System prompt:** `"system"` array: `[{text: "..."}]`
- **Tools:** `toolConfig.tools`: `[{toolSpec: {name, description, inputSchema: {json: schema}}}]`
- **Tool results:** `{toolResult: {toolUseId, content: [...], status: "success"|"error"}}`
- **Streaming format:** Newline-delimited JSON (**not** standard SSE); events: `contentBlockDelta`, `contentBlockStart`, `contentBlockStop`, `messageStop`, `metadata`

### Model Context Protocol (MCP)
- **Protocol:** JSON-RPC 2.0
- **Message types:**
  - Request: `{jsonrpc:"2.0", id:u64, method:String, params:Option<Value>}`
  - Response: `{jsonrpc:"2.0", id:Option<u64>, result:Option<Value>, error:Option<{code:i64,message:String,data?}>}`
  - Request IDs: auto-incremented `AtomicU64` starting at 1

- **Initialization handshake (3 steps):**
  1. Client sends `initialize` with `{protocolVersion:"2024-11-05", capabilities:{}, clientInfo:{name:"phi-core",version:"<pkg>"}}`
  2. Server responds with `{protocolVersion, capabilities:{tools?,resources?,prompts?}, serverInfo:{name,version}}`
  3. Client sends `notifications/initialized` notification (no params; server may ignore id)

- **Tool discovery:** Client sends `tools/list` → server returns `{tools: [{name, description?, inputSchema}]}`
- **Tool execution:** Client sends `tools/call {name, arguments}` → server returns `{content:[{type:"text",text}|{type:"image",data,mimeType}], isError:bool}`

- **Stdio transport:** Spawns child process; newline-delimited JSON over stdin/stdout; `tokio::sync::Mutex` for concurrent access; shutdown: EOF on stdin then kill child
- **HTTP transport:** POST JSON-RPC body to configured URL; stateless (no persistent connection)

- **Tool adapter:** `McpToolAdapter` wraps `McpToolInfo` + `Arc<Mutex<McpClient>>`; optional `prefix` for namespace disambiguation (`{prefix}__{name}`)
- **Error enum:** `Transport(String)`, `Protocol(String)`, `JsonRpc{code,message}`, `Serialization`, `Io`, `ConnectionClosed`

### OpenAPI
- **Spec formats:** OpenAPI 3.x; auto-detected: first non-whitespace char `{` or `[` → JSON, else YAML
- **Sources:** `from_file(path)` (async read), `from_url(url)` (HTTP GET via reqwest), `from_str(text)` (in-memory)
- **Base URL resolution:** `config.base_url` → `spec.servers[0].url` → error if neither set; trailing slashes stripped

- **Parameter classification:**
  - `Path` parameters → URL `{param}` substitution (RFC 3986 percent-encoding); required
  - `Query` parameters → `.query()` chains; optional
  - `Header` parameters → `.header()` chains; optional
  - `Cookie` parameters → skipped (unsupported)
  - `RequestBody` (application/json only) → keyed as `"body"` (or `"_request_body"` on collision); required if `requestBody.required`

- **HTTP execution pipeline (per tool call):**
  1. Validate params is object (or null treated as `{}`)
  2. Substitute path params with percent-encoded values; error if any missing
  3. Build URL: `{base_url}{path}`
  4. Chain `.query()` for query params present in input
  5. Chain `.header()` for header params present in input
  6. Apply auth: `Bearer` → `.bearer_auth()`, `ApiKey` → `.header(header, value)`, `None` → nothing
  7. Apply `custom_headers`
  8. If `has_body`: `.json(params["body"])`
  9. Send request; read full body text; truncate to `max_response_bytes` at UTF-8 boundary
  10. Return: `"{METHOD} {URL} → {STATUS_CODE}\n\n{BODY}"`

- **Operation filter:** `OperationFilter::All|ByOperationId|ByTag|ByPathPrefix`; operations without `operationId` always skipped with warning
- **Tool naming:** Default = `operationId`; with prefix = `{prefix}__{operationId}`

### File System
- **Read:** `tokio::fs::read_to_string` for text (max 1MB), `tokio::fs::read` for images (max 20MB)
- **Write:** `tokio::fs::write` with automatic parent dir creation
- **Edit:** Read → string replace (exact match, once) → write
- **List:** Spawns `find` command via BashTool
- **Search:** Spawns `grep` or `rg` command via BashTool

### Shell
- **Execution:** `tokio::process::Command::new("bash").arg("-c").arg(command)`
- **Timeout:** `tokio::time::sleep` with default 120s, configurable
- **Output capture:** `stdout` + `stderr` piped, truncated at 256KB each
- **Safety:** Deny patterns checked before execution (substring match)
- **Exit code:** Returned in `ToolResult.details.exit_code`; tool always returns Ok (non-zero is not a ToolError)

---

## 6. State Management

### Agent-Level State (in `Agent` struct)

All fields on `Agent`:

| Field | Type | Notes |
|---|---|---|
| `system_prompt` | `String` | Immutable once set; injected into every LLM call |
| `model` | `String` | Model identifier |
| `api_key` | `String` | API authentication key |
| `thinking_level` | `ThinkingLevel` | Off/Minimal/Low/Medium/High |
| `max_tokens` | `Option<u32>` | Max completion tokens |
| `temperature` | `Option<f32>` | Sampling temperature |
| `model_config` | `Option<ModelConfig>` | Provider-specific extras (base_url, headers, compat flags) |
| `messages` | `Vec<AgentMessage>` | Grows on each `prompt()` call; reset by `reset()`; replaced by `restore_messages()` |
| `tools` | `Vec<Box<dyn AgentTool>>` | Tool instances (heap-allocated trait objects) |
| `provider` | `Box<dyn StreamProvider>` | **Boxed**, not Arc; owned exclusively by Agent |
| `steering_queue` | `Arc<Mutex<Vec<AgentMessage>>>` | Written by `steer()`, drained by agent loop before each tool execution check |
| `follow_up_queue` | `Arc<Mutex<Vec<AgentMessage>>>` | Written by `follow_up()`, drained when agent loop would stop |
| `steering_mode` | `QueueMode` | Default: `OneAtATime` |
| `follow_up_mode` | `QueueMode` | Default: `OneAtATime` |
| `context_config` | `Option<ContextConfig>` | If None, context compaction is disabled |
| `execution_limits` | `Option<ExecutionLimits>` | If None, no hard limits enforced |
| `cache_config` | `CacheConfig` | Prompt caching hints (Anthropic) |
| `tool_execution` | `ToolExecutionStrategy` | Parallel (default), Sequential, or Batched |
| `retry_config` | `RetryConfig` | Backoff for RateLimited/Network errors |
| `before_turn` | `Option<BeforeTurnFn>` | Signature: `fn(&[AgentMessage], turn_number: usize) -> bool`; return false to abort |
| `after_turn` | `Option<AfterTurnFn>` | Signature: `fn(&[AgentMessage], &Usage)` |
| `on_error` | `Option<OnErrorFn>` | Signature: `fn(&str)` |
| `input_filters` | `Vec<Arc<dyn InputFilter>>` | Applied in order before LLM call |
| *(compaction strategies)* | *(moved to `ContextConfig.compaction`)* | `in_memory_strategy` and `block_strategy` fields on `CompactionConfig` (G5) |
| `cancel` | `Option<CancellationToken>` | Created when `prompt()` starts, consumed by `abort()` |
| `is_streaming` | `bool` | Set true on `prompt()` entry, false on exit |
| `agent_id` | `String` | UUID v4 generated once at `Agent::new()`; stable for the Agent's lifetime. Injected into every `AgentContext` built by this agent. |
| `session_id` | `String` | UUID v4 generated once at `Agent::new()`; groups all loops under one session. Stable for the Agent's lifetime. |
| `loop_counters` | `HashMap<String, usize>` | Per-`"{session_id}.{config_id}"` monotonic counter; incremented by `next_loop_id()` to produce the `N` component of `loop_id`. |
| `last_loop_id` | `Option<String>` | `loop_id` of the most recently started loop; set after each `prompt_*` or `continue_loop_*` call. Becomes `parent_loop_id` on the next continuation. |
| `before_loop` | `Option<BeforeLoopFn>` | Hook called once before `AgentStart`. Signature: `fn(&[AgentMessage], loop_index: usize) -> bool`; return `false` to abort before `AgentStart`. |
| `after_loop` | `Option<AfterLoopFn>` | Hook called once after `AgentEnd`. Signature: `fn(&[AgentMessage], &Usage)`. |
| `before_tool_execution` | `Option<BeforeToolExecutionFn>` | Hook called before each `ToolExecutionStart`. Signature: `fn(&str, &str, &JSON) -> bool` (tool_name, call_id, args); return `false` to skip. |
| `after_tool_execution` | `Option<AfterToolExecutionFn>` | Hook called after each `ToolExecutionEnd`. Signature: `fn(&str, &str, bool)` (tool_name, call_id, is_error). |
| `before_tool_execution_update` | `Option<BeforeToolExecutionUpdateFn>` | Hook called before each `ToolExecutionUpdate`. Signature: `fn(&str, &str, &str) -> bool` (tool_name, call_id, text); return `false` to suppress the event. |
| `after_tool_execution_update` | `Option<AfterToolExecutionUpdateFn>` | Hook called after each `ToolExecutionUpdate` (only when not suppressed). Signature: `fn(&str, &str, &str)`. |

**Invariants:**
- `assert!(!self.is_streaming)` fires if `prompt()` is called while already running — callers must use `steer()` or `follow_up()` during active runs
- `cancel` is always `Some` while `is_streaming` is true
- `messages` must not end in an `Assistant` message before `agent_loop_continue()` is called
- `agent_id` and `session_id` are always `Some` in any `AgentContext` built by `Agent`; direct callers of `agent_loop_continue` must also set them

### AgentContext (per-run, passed into agent loop)
| State Element | Type | Description |
|---|---|---|
| `system_prompt` | `String` | Immutable for the duration of the run |
| `messages` | `Vec<AgentMessage>` | Mutated in-place: prompts appended, assistant messages appended, tool results appended; may be replaced by compaction |
| `tools` | `&[Box<dyn AgentTool>]` | Immutable for the duration of the run |
| `agent_id` | `Option<String>` | Stable agent instance ID. Set by `Agent::prompt_*`; also written back by `agent_loop` when `None`. Required (non-`None`) for `agent_loop_continue`. |
| `session_id` | `Option<String>` | Stable session ID. Same lifecycle as `agent_id`. |
| `loop_id` | `Option<String>` | Per-call identifier of the form `"{session_id}.{config_id}.{N}"`. Set by `Agent` before calling `agent_loop`/`agent_loop_continue`; falls back to UUID if `None` at loop entry. |
| `parent_loop_id` | `Option<String>` | `loop_id` of the loop this call continues from. `None` for origin calls. Set by `Agent::continue_loop_with_sender` to `Agent.last_loop_id`. |
| `continuation_kind` | `Option<ContinuationKind>` | How this call relates to prior loops. `None` for origin; `Some(Default\|Rerun\|Branch)` for continuations. |

### ExecutionTracker (per-run)
| State | Initial | Transitions |
|---|---|---|
| `turns` | 0 | Incremented after each LLM call |
| `tokens_used` | 0 | Incremented by token count of each LLM response |
| `started_at` | Instant::now() | Immutable; compared against `max_duration` on each check |

### Steering/Follow-up Queue Modes
- **`QueueMode::OneAtATime`** (default for both queues): on each read, lock mutex, pop the *first* message only, return as `Vec` of 1
- **`QueueMode::All`**: on each read, lock mutex, drain *all* queued messages, return the full vec

Both queues are passed to `AgentLoopConfig` as closures (`get_steering_messages`, `get_follow_up_messages`) that capture the `Arc<Mutex<>>` pointer, enabling external callers to enqueue messages while the agent loop is running on another task.

### Event Hook Ordering

All hooks fire in a **guaranteed strict order** relative to their paired events. This ordering is enforced at runtime and is an invariant of the system:

```
before_loop → AgentStart
  before_turn → TurnStart
    [MessageStart/End for initial prompts — first turn of agent_loop() only]
    [MessageStart/End for injected steering messages]
    [LLM: MessageStart → MessageUpdate* → MessageEnd]
    [per tool call:]
      before_tool_execution → ToolExecutionStart
        (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)*
      ToolExecutionEnd → after_tool_execution
  TurnEnd → after_turn
  (repeat inner block for each follow-up / steering-triggered turn)
AgentEnd → after_loop
```

**Short-circuit rules — hook returns `false`:**

| Hook | When `false` is returned | Behaviour |
|---|---|---|
| `before_loop` | Before `AgentStart` | Loop is aborted; `AgentEnd { messages: [] }` is emitted; function returns immediately |
| `before_turn` | Before `TurnStart` | Turn is skipped; `TurnStart`/`TurnEnd` are not emitted; `AgentEnd` is **not** guaranteed |
| `before_tool_execution` | Before `ToolExecutionStart` | Tool call is skipped; `ToolExecutionStart`/`End` are not emitted; a skipped error `ToolResult` is returned to the LLM |
| `before_tool_execution_update` | Before `ToolExecutionUpdate` | Event is suppressed; `after_tool_execution_update` is not called; tool keeps running and final `ToolResult` is unaffected |

---

## 7. Error Handling Strategy

### Provider Errors (`ProviderError`)
| Error | Retryable | Handling |
|---|---|---|
| `RateLimited { retry_after_ms }` | Yes | Exponential backoff; respects `Retry-After` header if present |
| `Network(msg)` | Yes | Exponential backoff |
| `Auth(msg)` | No | Propagated immediately as `StopReason::Error` message |
| `Api(msg)` | No | Propagated as `StopReason::Error` message |
| `ContextOverflow { msg }` | No | Detected on HTTP 400/413; triggers compaction on next turn (see below) |
| `Cancelled` | No | Loop exits cleanly, `AgentEnd` emitted |
| `Other(msg)` | No | Propagated as `StopReason::Error` message |

### Context Overflow Recovery
1. Provider returns HTTP 400/413 matching any of 15+ known overflow phrases.
2. `ProviderError::classify()` returns `ContextOverflow`.
3. The overflow may arrive as an HTTP error (caught in retry loop) or as a streaming error event (`StreamEvent::Error` with matching message), caught by `Message::is_context_overflow()`.
4. On the next turn, if `context_config` is set, `compact_messages()` is called before the LLM call.
5. If no `context_config` is set, the error message is included in conversation history and the loop continues — the LLM may self-recover or the next turn will also fail.

### Tool Errors (`ToolError`)
- **`Cancelled`:** Tool execution skipped; `ToolResult` content = "Skipped due to queued user message." with `is_error: true`
- **`Failed(msg)`:** Converted to `ToolResult` with error text; `is_error: true`; **always returned to LLM** so it can self-correct
- **`InvalidArgs(msg)`:** Same as Failed; LLM can retry with corrected parameters
- **`NotFound(msg)`:** Produced when tool name in `ToolCall` has no matching `AgentTool`; same handling as Failed

### Input Filter Errors
- **`Reject(reason)`:** Emits `AgentEvent::InputRejected`, immediately emits `AgentEvent::AgentEnd { messages: [] }`, returns empty message list
- **`Warn(msg)`:** Warning text appended to last user message content; loop continues

### Execution Limit Exhaustion
- When any limit is exceeded, a synthetic user message `[Agent stopped: {reason}]` is appended to context and emitted as events.
- Loop returns immediately after appending the message.
- No error is thrown; `AgentEnd` is emitted normally.

### Before-Turn Abort
- If `before_turn` callback returns `false`, the loop returns immediately with no `AgentEnd` emitted.
- This is the only path where `AgentEnd` is not guaranteed.

### Error Propagation Across Components
```
Provider → ProviderError → stream_assistant_response() → Message{stop_reason: Error}
                                                        ↓
                                            on_error callback invoked
                                                        ↓
                                        AgentEvent::TurnEnd emitted
                                                        ↓
                                           agent loop returns
```