phi-core 0.9.0

Simple, effective agent loop with tool execution and event streaming
<!-- Last verified: 2026-04-05 by Claude Code -->
# Implementation Roadmap

> Generated from: `../reference/glossary.md`, `../specs/architecture.md`, `../architecture/algorithms.md`
> Last updated: 2026-03-17
> Paradigm: Language-agnostic / Implementation-independent

This roadmap defines six progressive stages of implementation derived from the
reverse-engineered specification. Each level is a complete, testable stage.
Complete and stabilize each level fully before advancing to the next.

***

## Level 1 — Survive
> **Goal:** The system can start, load configuration, initialize its core
> structures, and confirm it is alive. Nothing works end-to-end yet,
> but nothing crashes either.

**Completion Criteria:** A smoke test confirms the Agent can be constructed
with a MockProvider, configured via builder methods, and all core data entities
can be instantiated without error. No LLM call is required to pass Level 1.

---

### Milestone 1.1 — Core Type System

- [x] **REQ-001:** Define the `Content` enum with four variants: `Text { text }`, `Image { data: base64, mime_type }`, `Thinking { thinking, signature }`, and `ToolCall { id, name, arguments }`. Serialized with a `"type"` discriminant field. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: All four variants instantiate; round-trip JSON serialization produces the correct tagged shape.

- [x] **REQ-002:** Define the `Message` enum with three variants: `User { content, timestamp }`, `Assistant { content, stop_reason, model, provider, usage, timestamp, error_message }`, and `ToolResult { tool_call_id, tool_name, content, is_error, timestamp }`. *(Source: [AR])*
  - Depends on: REQ-001, REQ-005, REQ-006
  - Definition of Done: All three variants instantiate; serialization preserves the `role` field with values `"user"`, `"assistant"`, `"toolResult"`.

- [x] **REQ-003:** Define `AgentMessage` as an untagged enum wrapping `Llm(LlmMessage)` and `Extension(ExtensionMessage)`. *(Source: [AR])*
  - Depends on: REQ-002, REQ-004
  - Definition of Done: Both variants serialize/deserialize correctly; an `Extension` variant round-trips without loss.

- [x] **REQ-004:** Define `ExtensionMessage` with fields `role: String` (always `"extension"`), `kind: String`, and `data: JSON`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Instantiates and serializes to `{role:"extension", kind:"...", data:{...}}`.

- [x] **REQ-005:** Define `StopReason` enum with variants `Stop`, `Length`, `ToolUse`, `Error`, `Aborted`. Serialized in camelCase. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: All variants serialize to their documented camelCase strings.

- [x] **REQ-006:** Define `Usage` struct with fields `input`, `output`, `cache_read`, `cache_write`, `total_tokens` (all `u64`). Include a `cache_hit_rate()` derived method. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: `cache_hit_rate()` returns `cache_read / (input + cache_read + cache_write)`.

- [x] **REQ-007:** Define `AgentEvent` enum with all variants: `AgentStart`, `AgentEnd { messages }`, `TurnStart`, `TurnEnd { message, tool_results }`, `MessageStart { message }`, `MessageUpdate { message, delta }`, `MessageEnd { message }`, `ToolExecutionStart { tool_call_id, tool_name, args }`, `ToolExecutionUpdate { tool_call_id, tool_name, partial_result }`, `ToolExecutionEnd { tool_call_id, tool_name, result, is_error }`, `ProgressMessage { tool_call_id, tool_name, text }`, `InputRejected { reason }`. *(Source: [AR])*
  - Depends on: REQ-002, REQ-008
  - Definition of Done: All variants instantiate.

- [x] **REQ-008:** Define `StreamDelta` enum with variants `Text { delta }`, `Thinking { delta }`, `ToolCallDelta { delta }`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: All variants instantiate and carry their string payload.

- [x] **REQ-009:** Define `ToolContext` struct with fields `tool_call_id`, `tool_name`, `cancel: CancellationToken`, `on_update: Option<ToolUpdateFn>`, `on_progress: Option<ProgressFn>`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Struct instantiates; callback fields accept closures/function pointers.

- [x] **REQ-010:** Define `ToolResult { content: Vec<Content>, details: JSON }` and `ToolError` enum with variants `Failed(String)`, `NotFound(String)`, `InvalidArgs(String)`, `Cancelled`. *(Source: [AR])*
  - Depends on: REQ-001
  - Definition of Done: All variants instantiate; `ToolError` converts to a display string.

- [x] **REQ-011:** Define `ContextConfig` struct with fields and defaults: `max_context_tokens` (100,000), `system_prompt_tokens` (4,000), `keep_recent` (10), `keep_first` (2), `tool_output_max_lines` (50). *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Default construction produces the documented default values.

- [x] **REQ-012:** Define `ExecutionLimits` struct with defaults `max_turns` (50), `max_total_tokens` (1,000,000), `max_duration` (600s); and `ExecutionTracker` runtime state with fields `limits`, `turns`, `tokens_used`, `started_at`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: `ExecutionTracker::new(limits)` initializes `turns=0`, `tokens_used=0`, `started_at=now`.

- [x] **REQ-013:** Define `RetryConfig` with defaults: `max_retries` (3), `initial_delay_ms` (1,000), `backoff_multiplier` (2.0), `max_delay_ms` (30,000). *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Default construction produces documented defaults.

- [x] **REQ-014:** Define `CacheConfig { enabled: bool, strategy: CacheStrategy }` and `CacheStrategy` enum with variants `Auto`, `Disabled`, `Manual { cache_system, cache_tools, cache_messages }`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: All variants instantiate; default `CacheConfig` has `enabled: true`, `strategy: Auto`.

- [x] **REQ-015:** Define `StreamConfig` struct with fields `model`, `system_prompt`, `messages: Vec<Message>`, `tools: Vec<ToolDefinition>`, `thinking_level`, `api_key`, `max_tokens`, `temperature`, `model_config`, `cache_config`. *(Source: [AR])*
  - Depends on: REQ-014, REQ-016
  - Definition of Done: Struct instantiates with all optional fields as `None`.

- [x] **REQ-016:** Define `ToolDefinition` struct with fields `name`, `description`, `parameters: JSON`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Struct instantiates and serializes to the expected JSON shape.

- [x] **REQ-017:** Define `QueueMode` enum with variants `OneAtATime` and `All`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Both variants exist; default is `OneAtATime`.

- [x] **REQ-018:** All types in the `AgentMessage` tree derive `Serialize` and `Deserialize`. *(Source: [OV])*
  - Depends on: REQ-001 through REQ-017
  - Definition of Done: Full round-trip JSON serialization of a `Vec<AgentMessage>` containing all message types is lossless.

- [x] **REQ-019:** Define `ThinkingLevel` enum with variants `Off`, `Minimal`, `Low`, `Medium`, `High`. *(Source: [OV])*
  - Depends on: —
  - Definition of Done: All variants exist.
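
As an illustration of REQ-006, the derived `cache_hit_rate()` method could look like the minimal sketch below. The `f64` return type and the `0.0` result for an all-zero `Usage` are assumptions; the roadmap only specifies the formula.

```rust
/// Token accounting for one LLM call (fields per REQ-006).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}

impl Usage {
    /// Computes cache_read / (input + cache_read + cache_write).
    /// Assumption: an empty denominator yields 0.0 instead of NaN.
    pub fn cache_hit_rate(&self) -> f64 {
        let denominator = self.input + self.cache_read + self.cache_write;
        if denominator == 0 {
            0.0
        } else {
            self.cache_read as f64 / denominator as f64
        }
    }
}
```

With `input = 100` and `cache_read = 300`, the rate is `300 / 400 = 0.75`; `output` and `total_tokens` do not enter the formula.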

---

### Milestone 1.2 — Core Traits

- [x] **REQ-020:** Define `StreamProvider` trait with a single method `stream(config: StreamConfig, tx: EventSender, cancel: CancellationToken) -> Result<Message, ProviderError>`. Define `ProviderError` enum with variants `Api(String)`, `Network(String)`, `Auth(String)`, `RateLimited { retry_after_ms: Option<u64> }`, `ContextOverflow { message: String }`, `Cancelled`, `Other(String)`. *(Source: [AR])*
  - Depends on: REQ-002, REQ-015
  - Definition of Done: Trait compiles; `ProviderError` variants all instantiate.

- [x] **REQ-021:** Define `AgentTool` trait with methods `name() -> &str`, `label() -> &str`, `description() -> &str`, `parameters_schema() -> JSON`, `execute(params: JSON, ctx: ToolContext) -> Result<ToolResult, ToolError>`. *(Source: [AR])*
  - Depends on: REQ-009, REQ-010
  - Definition of Done: Trait compiles; a minimal struct can implement it.

- [x] **REQ-022:** Define `InputFilter` trait with method `filter(text: &str) -> FilterResult` where `FilterResult` is `Pass`, `Warn(String)`, or `Reject(String)`. *(Source: [OV])*
  - Depends on: —
  - Definition of Done: Trait compiles; all three result variants exist.

- [x] **REQ-023:** Define `CompactionStrategy` trait with method `compact(messages: Vec<AgentMessage>, config: ContextConfig) -> Vec<AgentMessage>`. *(Source: [AR])*
  - Depends on: REQ-003, REQ-011
  - Definition of Done: Trait compiles; a struct can implement it.
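
To make REQ-022 concrete, here is a hedged sketch of the `InputFilter` trait with one implementation. The `&self` receiver, the `LengthFilter` type, and the `run_filters` helper (including its "first rejection wins" policy) are hypothetical and not taken from the specification.

```rust
/// Outcome of running one filter over user input (REQ-022).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

/// Assumption: the method takes `&self` so filters can be stored as trait objects.
pub trait InputFilter {
    fn filter(&self, text: &str) -> FilterResult;
}

/// Hypothetical filter: rejects blank input and warns on long input.
pub struct LengthFilter {
    pub warn_above: usize,
}

impl InputFilter for LengthFilter {
    fn filter(&self, text: &str) -> FilterResult {
        if text.trim().is_empty() {
            FilterResult::Reject("empty input".to_string())
        } else if text.len() > self.warn_above {
            FilterResult::Warn(format!("input is {} bytes", text.len()))
        } else {
            FilterResult::Pass
        }
    }
}

/// Hypothetical helper: runs filters in order, returns the first rejection,
/// otherwise the last warning seen, otherwise `Pass`.
pub fn run_filters(filters: &[Box<dyn InputFilter>], text: &str) -> FilterResult {
    let mut last_warning = None;
    for f in filters {
        match f.filter(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => last_warning = Some(w),
            FilterResult::Reject(r) => return FilterResult::Reject(r),
        }
    }
    match last_warning {
        Some(w) => FilterResult::Warn(w),
        None => FilterResult::Pass,
    }
}
```

A rejection from such a chain is presumably what the loop would surface through the `InputRejected { reason }` event from REQ-007.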

---

### Milestone 1.3 — Agent Struct Construction

- [x] **REQ-024:** Implement `BasicAgent::new(model_config: ModelConfig) -> BasicAgent`. Initialize all fields to documented defaults: `messages = []`, `tools = []`, `thinking_level = Off`, `tool_execution = Parallel`, `steering_mode = OneAtATime`, `follow_up_mode = OneAtATime`, `context_config = Some(default)`, `execution_limits = Some(default)`, `retry_config = default`, `is_streaming = false`, `cancel = None`. *(Source: [PS])*
  - Depends on: REQ-011 through REQ-017, REQ-019, REQ-020
  - Definition of Done: `BasicAgent::new(ModelConfig::anthropic("m", "m", "k"))` compiles and all fields have their documented defaults.

- [x] **REQ-025:** Implement builder methods: `with_system_prompt(text)`, `with_model_config(cfg)`, `with_provider_override(provider)`, `with_max_tokens(n)`, `with_thinking(level)`. *(Source: [PS])*
  - Depends on: REQ-024
  - Definition of Done: Method chain `BasicAgent::new(ModelConfig::anthropic("m", "m", "k")).with_system_prompt("x")` compiles and all fields are set correctly.

- [x] **REQ-026:** Implement `with_tools(vec)`, `with_context_config(cfg)`, `with_execution_limits(limits)`, `with_retry_config(cfg)`, `with_cache_config(cfg)`, `with_tool_execution(strategy)`, `with_steering_mode(mode)`, `with_follow_up_mode(mode)`. *(Source: [PS])*
  - Depends on: REQ-024
  - Definition of Done: All builders set their respective fields; `with_tools` replaces (or extends) the tools list.

- [x] **REQ-027:** Initialize `steering_queue` and `follow_up_queue` as `Arc<Mutex<Vec<AgentMessage>>>` in `BasicAgent::new`. *(Source: [AR])*
  - Depends on: REQ-003, REQ-024
  - Definition of Done: Both queues are non-null, independently lockable, and start empty.
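
The construction and builder requirements above (REQ-024 to REQ-027) follow the consuming-builder pattern. The sketch below shows that pattern on a reduced agent with only a handful of fields. `ModelConfig` here is a placeholder with a hypothetical `model_id` field, `String` stands in for `AgentMessage`, and the empty default system prompt and `None` default for `max_tokens` are assumptions.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThinkingLevel { Off, Minimal, Low, Medium, High }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueueMode { OneAtATime, All }

/// Placeholder for the real model configuration (field is hypothetical).
#[derive(Debug, Clone)]
pub struct ModelConfig { pub model_id: String }

/// Reduced agent: only a few of the fields listed in REQ-024.
pub struct BasicAgent {
    pub model_config: ModelConfig,
    pub system_prompt: String,
    pub max_tokens: Option<u32>,
    pub thinking_level: ThinkingLevel,
    pub steering_mode: QueueMode,
    pub is_streaming: bool,
    // `String` stands in for `AgentMessage` in this sketch.
    pub steering_queue: Arc<Mutex<Vec<String>>>,
    pub follow_up_queue: Arc<Mutex<Vec<String>>>,
}

impl BasicAgent {
    pub fn new(model_config: ModelConfig) -> Self {
        Self {
            model_config,
            system_prompt: String::new(),
            max_tokens: None,
            thinking_level: ThinkingLevel::Off,
            steering_mode: QueueMode::OneAtATime,
            is_streaming: false,
            // Each queue gets its own allocation so they lock independently.
            steering_queue: Arc::new(Mutex::new(Vec::new())),
            follow_up_queue: Arc::new(Mutex::new(Vec::new())),
        }
    }

    // Each builder takes `self` by value and returns it, which enables chaining.
    pub fn with_system_prompt(mut self, text: impl Into<String>) -> Self {
        self.system_prompt = text.into();
        self
    }

    pub fn with_max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }

    pub fn with_thinking(mut self, level: ThinkingLevel) -> Self {
        self.thinking_level = level;
        self
    }

    pub fn with_steering_mode(mut self, mode: QueueMode) -> Self {
        self.steering_mode = mode;
        self
    }
}
```

A chain such as `BasicAgent::new(cfg).with_system_prompt("x").with_max_tokens(1024)` then reads back the configured values directly from the fields, which is what the Level 1 smoke test checks.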

---

### Milestone 1.4 — AgentContext and AgentLoopConfig

- [x] **REQ-028:** Define `AgentContext` struct with fields `system_prompt: String`, `messages: Vec<AgentMessage>`, `tools: &[Box<dyn AgentTool>]`. *(Source: [AR])*
  - Depends on: REQ-003, REQ-021
  - Definition of Done: Struct compiles; `messages` is mutable in-place during the loop.

- [x] **REQ-029:** Define `AgentLoopConfig` struct bundling all behavioral settings: `provider`, `model`, `api_key`, `thinking_level`, `max_tokens`, `temperature`, `model_config`, `get_steering_messages: Option<Fn()>`, `get_follow_up_messages: Option<Fn()>`, `context_config`, `compaction_strategy`, `execution_limits`, `cache_config`, `tool_execution`, `retry_config`, `before_turn`, `after_turn`, `on_error`, `input_filters`, `transform_context`, `convert_to_llm`. *(Source: [OV])*
  - Depends on: REQ-011 through REQ-017, REQ-023
  - Definition of Done: Struct compiles with all optional fields as `None`.

---

### Milestone 1.5 — MockProvider and Smoke Test

- [x] **REQ-030:** Implement `MockProvider` that implements `StreamProvider`. Accepts a list of pre-configured responses to return in sequence. Returns a `Message::Assistant` with `stop_reason: Stop` and configurable text content. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: `MockProvider::new(vec![response1, response2])` returns each response in order when `stream()` is called; after exhausting the list, returns a default stop response.

- [x] **REQ-031:** Smoke test: construct `Agent::new(MockProvider::new([]))`, configure with builder methods, verify all fields are set correctly, and confirm no panic occurs. *(Source: [OV])*
  - Depends on: REQ-024 through REQ-030
  - Definition of Done: Test passes with zero panics; all configured fields read back correctly.
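
The scripted-response behaviour of REQ-030 can be sketched synchronously as follows. This is a simplification under stated assumptions: the real `StreamProvider::stream` takes a `StreamConfig`, an event sender, and a cancellation token and returns a `Result`, whereas the reduced trait here takes `&mut self` and no arguments. `AssistantMessage` and `MockProvider::text` are hypothetical stand-ins.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason { Stop, Length, ToolUse, Error, Aborted }

/// Reduced stand-in for `Message::Assistant`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AssistantMessage {
    pub text: String,
    pub stop_reason: StopReason,
}

/// Simplified provider trait (no config, channel, or cancellation).
pub trait StreamProvider {
    fn stream(&mut self) -> AssistantMessage;
}

/// Returns pre-configured responses in order, then a default stop response.
pub struct MockProvider {
    responses: VecDeque<AssistantMessage>,
}

impl MockProvider {
    pub fn new(responses: Vec<AssistantMessage>) -> Self {
        Self { responses: responses.into() }
    }

    /// Hypothetical convenience constructor for a plain text response.
    pub fn text(text: &str) -> AssistantMessage {
        AssistantMessage { text: text.to_string(), stop_reason: StopReason::Stop }
    }
}

impl StreamProvider for MockProvider {
    fn stream(&mut self) -> AssistantMessage {
        // Once the scripted list is exhausted, fall back to an empty `Stop`.
        self.responses.pop_front().unwrap_or_else(|| AssistantMessage {
            text: String::new(),
            stop_reason: StopReason::Stop,
        })
    }
}
```

A `VecDeque` keeps `pop_front` cheap, so long scripted conversations do not shift the remaining responses on every call.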

***

## Level 2 — Useful
> **Goal:** The primary use cases from the spec work end-to-end on valid,
> well-formed inputs. An agent can accept a prompt, call an LLM, execute
> tool calls, and return a final response.

**Completion Criteria:** Every primary use case from `../reference/glossary.md` executes
successfully with valid inputs and a real (or mock) provider: single-turn text
response, multi-turn tool call cycle, message persistence round-trip, and agent
reset. The built-in coding tools all execute on valid inputs.

---

### Milestone 2.1 — Event Channel Infrastructure

- [x] **REQ-032:** Implement an unbounded async event channel. The `agent_loop` holds the sender (`tx`); callers receive from the receiver (`rx`). The channel never blocks the sender. *(Source: [AR])*
  - Depends on: REQ-007
  - Definition of Done: Sender can emit 1,000 events without blocking; receiver drains them all in order.

- [x] **REQ-033:** Implement `CancellationToken` with methods `new()`, `cancel()`, `is_cancelled() -> bool`, `child_token() -> CancellationToken`. Cancelling a parent automatically cancels all children. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Cancelling a root token causes `is_cancelled()` to return `true` on both the root and any child tokens created from it.
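
The parent-to-child propagation in REQ-033 can be sketched with the standard library alone. This polling-only version is an assumption for illustration: each token owns a flag and keeps a handle to its parent, so a child reports cancelled if it or any ancestor was cancelled. It offers no way to await cancellation, which an async implementation would likely need.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Cloning a token shares its flag, so clones observe the same state.
#[derive(Clone, Debug)]
pub struct CancellationToken {
    flag: Arc<AtomicBool>,
    parent: Option<Box<CancellationToken>>,
}

impl CancellationToken {
    pub fn new() -> Self {
        Self { flag: Arc::new(AtomicBool::new(false)), parent: None }
    }

    /// Marks this token (and therefore all of its descendants) as cancelled.
    pub fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    pub fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst)
            || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }

    /// Creates a token that follows this one but can also be cancelled alone.
    pub fn child_token(&self) -> Self {
        Self {
            flag: Arc::new(AtomicBool::new(false)),
            parent: Some(Box::new(self.clone())),
        }
    }
}

impl Default for CancellationToken {
    fn default() -> Self {
        Self::new()
    }
}
```

Cancellation flows downward only: cancelling a child leaves its parent untouched, which is what lets a single tool be stopped without aborting the whole run.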

---

### Milestone 2.2 — Agent Prompt Entry Point

- [x] **REQ-034:** Implement `Agent::prompt(text: String) -> EventReceiver` as a thin wrapper that constructs a `User` message and delegates to `prompt_messages`. *(Source: [PS])*
  - Depends on: REQ-002, REQ-035
  - Definition of Done: `agent.prompt("hello")` returns a receiver immediately (non-blocking).

- [x] **REQ-035:** Implement `Agent::prompt_messages_with_sender(messages, tx)`: set `is_streaming = true`, create `CancellationToken`, build `AgentContext` snapshot, build `AgentLoopConfig` (wiring queue closures), spawn `agent_loop`, merge returned messages into `Agent.messages` on completion, set `is_streaming = false`. *(Source: [PS])*
  - Depends on: REQ-027, REQ-028, REQ-029, REQ-033, REQ-036
  - Definition of Done: After the spawned task completes, `agent.messages` contains the new messages and `is_streaming` is `false`.

---

### Milestone 2.3 — Agent Loop Core

- [x] **REQ-036:** Implement `agent_loop`: emit `AgentStart`, append prompts to `context.messages`, emit `TurnStart`/`MessageStart`/`MessageEnd` for each prompt, call `run_loop`, emit `AgentEnd`, return new messages. *(Source: [PS])*
  - Depends on: REQ-032, REQ-038
  - Definition of Done: With `MockProvider`, a single call emits `AgentStart`, at least one `TurnStart`/`TurnEnd` pair, and `AgentEnd`; returned messages include the input prompt and the assistant response.

- [x] **REQ-037:** Implement `agent_loop_continue`: emit `AgentStart`/`TurnStart`, call `run_loop`, emit `AgentEnd`. *(Source: [PS])*
  - Depends on: REQ-036
  - Definition of Done: Resumes from existing context without re-appending prompts.

- [x] **REQ-038:** Implement `run_loop` inner loop (happy path only: no steering, no follow-ups, no limits): call `stream_assistant_response`, append assistant message, extract tool calls, call `execute_tool_calls`, append tool results, loop until no more tool calls, then break. *(Source: [PS])*
  - Depends on: REQ-039, REQ-045
  - Definition of Done: With a MockProvider that returns one tool call then one `Stop`, `run_loop` executes the tool and calls the LLM a second time before stopping.
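
The happy-path control flow of `run_loop` (REQ-038) is small enough to sketch synchronously. Everything here is reduced for illustration: `Msg` and `ToolCall` are simplified stand-ins for the real message types, the two closures replace `stream_assistant_response` and `execute_tool_calls`, and the returned call count is a hypothetical addition that makes the behaviour easy to check.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// Reduced message type; the real `Message` enum carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    User(String),
    Assistant { text: String, tool_calls: Vec<ToolCall> },
    ToolResult { tool_call_id: String, text: String },
}

/// Happy path only: no steering, follow-ups, limits, or compaction.
/// Returns the number of LLM calls made.
pub fn run_loop(
    messages: &mut Vec<Msg>,
    mut stream_assistant_response: impl FnMut(&[Msg]) -> Msg,
    mut execute_tool: impl FnMut(&ToolCall) -> String,
) -> usize {
    let mut llm_calls = 0;
    loop {
        // 1. Ask the model for the next assistant message.
        let reply = stream_assistant_response(messages.as_slice());
        llm_calls += 1;

        // 2. Extract tool calls before moving the reply into history.
        let calls = match &reply {
            Msg::Assistant { tool_calls, .. } => tool_calls.clone(),
            _ => Vec::new(),
        };
        messages.push(reply);

        // 3. No tool calls means the model is finished.
        if calls.is_empty() {
            break;
        }

        // 4. Execute each call and append its result for the next turn.
        for call in &calls {
            let text = execute_tool(call);
            messages.push(Msg::ToolResult { tool_call_id: call.id.clone(), text });
        }
    }
    llm_calls
}
```

Driving this with a scripted model that answers once with a tool call and once with plain text yields two LLM calls and a history of user, assistant, tool result, assistant, matching the Definition of Done for REQ-038.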

---

### Milestone 2.4 — LLM Streaming (Happy Path)

- [x] **REQ-039:** Implement `stream_assistant_response` (no retry): build `StreamConfig` from context and config, call `provider.stream()`, process stream events (`Start` → emit `MessageStart`; `TextDelta`/`ThinkingDelta`/`ToolCallDelta` → emit `MessageUpdate`; `Done` → emit `MessageEnd`; `Error` → emit `MessageStart`+`MessageEnd`), return final `Message`. *(Source: [PS])*
  - Depends on: REQ-007, REQ-008, REQ-015, REQ-020, REQ-032
  - Definition of Done: With MockProvider, caller receives `MessageStart`, one or more `MessageUpdate` with text deltas, and `MessageEnd` containing the complete assembled message.

- [x] **REQ-040:** Implement `AnthropicProvider::stream`: POST to `https://api.anthropic.com/v1/messages` with `x-api-key` + `anthropic-version: 2023-06-01` headers, `stream: true` body; parse SSE events (`message_start`, `content_block_start`, `content_block_delta`, `message_delta`, `message_stop`); buffer `InputJsonDelta` tool-argument fragments; parse complete JSON on `content_block_stop`; emit `StreamEvent`s. *(Source: [AR])*
  - Depends on: REQ-020, REQ-039
  - Definition of Done: Integration test with a real or stubbed Anthropic endpoint produces a correctly parsed `Message::Assistant` with usage stats.

- [x] **REQ-041:** Implement `OpenAiCompatProvider::stream`: POST to configured base URL + `/chat/completions` with `Authorization: Bearer` header, `stream: true`, `stream_options: {include_usage: true}`; parse SSE chunks `choices[0].delta`; accumulate tool-call argument strings; emit `StreamEvent`s. *(Source: [AR])*
  - Depends on: REQ-020, REQ-039
  - Definition of Done: Correctly parses a streamed chat-completion response from any OpenAI-compatible endpoint.

- [x] **REQ-042:** Implement `ProviderRegistry` with `new()` (empty) and `default()` (pre-registers `AnthropicProvider` and `OpenAiCompatProvider`). `ProviderRegistry` itself implements `StreamProvider`, dispatching based on `ApiProtocol` or model prefix. *(Source: [AR])*
  - Depends on: REQ-040, REQ-041
  - Definition of Done: `ProviderRegistry::default()` can route a config to `AnthropicProvider` or `OpenAiCompatProvider` without manual dispatch.

- [x] **REQ-043:** Implement `StopReason` determination in each provider: map provider-specific stop signals to the unified `StopReason` enum (`"end_turn"`/`"stop"` → `Stop`; `"max_tokens"`/`"length"` → `Length`; `"tool_use"`/`"tool_calls"` → `ToolUse`; cancellation → `Aborted`; errors → `Error`). *(Source: [PS])*
  - Depends on: REQ-005, REQ-040, REQ-041
  - Definition of Done: Each stop signal string maps to exactly one `StopReason` variant.

- [x] **REQ-044:** Filter `Extension` messages out of `AgentMessage` history before building `StreamConfig.messages`. Only `Llm(LlmMessage)` variants are sent to the LLM (note: `LlmMessage` wraps `Message` + `Option<TurnId>`). *(Source: [AR])*
  - Depends on: REQ-003, REQ-015
  - Definition of Done: An `AgentMessage::Extension` present in `context.messages` does not appear in the `StreamConfig` sent to the provider.
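
The stop-signal table in REQ-043 translates directly into a `match`. In this sketch the function names are hypothetical, and two policies are assumptions rather than specified behaviour: cancellation takes precedence over a failure, and an unrecognised signal is reported as `Error`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason { Stop, Length, ToolUse, Error, Aborted }

/// Maps a provider-specific stop string to the unified enum.
/// Returns `None` for strings the roadmap does not list.
pub fn map_stop_signal(signal: &str) -> Option<StopReason> {
    match signal {
        "end_turn" | "stop" => Some(StopReason::Stop),
        "max_tokens" | "length" => Some(StopReason::Length),
        "tool_use" | "tool_calls" => Some(StopReason::ToolUse),
        _ => None,
    }
}

/// Combines the stream outcome with the provider's stop string.
/// Assumptions: cancellation wins over failure; unknown signals become `Error`.
pub fn resolve_stop_reason(signal: &str, cancelled: bool, failed: bool) -> StopReason {
    if cancelled {
        StopReason::Aborted
    } else if failed {
        StopReason::Error
    } else {
        map_stop_signal(signal).unwrap_or(StopReason::Error)
    }
}
```

Keeping the string table in one function means both providers can share it, since the Anthropic-style and OpenAI-style spellings never collide.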

---

### Milestone 2.5 — Tool Execution (Happy Path)

- [x] **REQ-045:** Implement `execute_tool_calls` dispatching to the configured `ToolExecutionStrategy`. For `Parallel` (default), use `execute_batch`. *(Source: [PS])*
  - Depends on: REQ-046
  - Definition of Done: Multiple tool calls from one LLM response are dispatched concurrently; results arrive in original call order.

- [x] **REQ-046:** Implement `execute_single_tool`: find tool by name, emit `ToolExecutionStart`, build `ToolContext` with child cancel token and callbacks, call `tool.execute(args, ctx)`, emit `ToolExecutionEnd`, construct `Message::ToolResult`, emit `MessageStart`/`MessageEnd`, return `(ToolResult, is_error)`. *(Source: [PS])*
  - Depends on: REQ-007, REQ-009, REQ-010, REQ-021, REQ-033
  - Definition of Done: A registered tool is called; its result is wrapped in a `ToolResult` message; `ToolExecutionStart` and `ToolExecutionEnd` events are emitted.

- [x] **REQ-047:** Implement `BashTool::execute` (basic): extract `command` param, run `bash -c {command}`, capture stdout+stderr, construct text output (`"Exit code: N\n{stdout}"` or `"Exit code: N\nSTDOUT:\n{stdout}\nSTDERR:\n{stderr}"`), return `Ok(ToolResult)`. *(Source: [PS])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: `echo "hello"` returns `Ok(ToolResult)` with text containing `"Exit code: 0"` and `"hello"`.

- [x] **REQ-048:** Implement `ReadFileTool::execute` (basic text path): extract `path` param, read file to string, split into lines, apply optional `offset`/`limit`, produce line-numbered output with header, return `Ok(ToolResult)`. *(Source: [PS])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Reading a known text file returns numbered lines; partial reads with `offset`/`limit` return the correct slice with a range header.

- [x] **REQ-049:** Implement `WriteFileTool::execute`: extract `path` and `content` params, create parent directories as needed, write file, return `Ok(ToolResult)`. *(Source: [AR])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Writing to a path with non-existent parent directories succeeds; file is created on disk with correct content.

- [x] **REQ-050:** Implement `EditFileTool::execute` (basic): extract `path`, `old_text`, `new_text`; read file; replace the first occurrence of `old_text` with `new_text`; write back; return confirmation text. *(Source: [PS])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: A known substitution in an existing file is applied correctly; confirmation message reports old/new line counts.

- [x] **REQ-051:** Implement `ListFilesTool::execute` (basic): extract `path`, `pattern`, `max_depth`; build and run `find` command with exclusions for `target/`, `.git/`, `node_modules/`; return file paths as text. *(Source: [PS])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Listing a known directory returns its files; excluded directories do not appear in results.

- [x] **REQ-052:** Implement `SearchTool::execute` (basic): extract `pattern`, `path`, `include`, `case_sensitive`; prefer `rg`, fall back to `grep`; return matching lines. *(Source: [PS])*
  - Depends on: REQ-010, REQ-021
  - Definition of Done: Searching for a known string in a known directory returns matching file paths and line content.

- [x] **REQ-053:** Implement `default_tools()` returning a `Vec<Box<dyn AgentTool>>` containing all six built-in tools: Bash, ReadFile, WriteFile, EditFile, ListFiles, Search. *(Source: [AR])*
  - Depends on: REQ-047 through REQ-052
  - Definition of Done: `default_tools()` returns exactly 6 tools with distinct names.
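
REQ-045 requires that tool calls run concurrently while results come back in the original call order. One way to get both properties, sketched here with scoped OS threads as a stand-in for whatever async mechanism the crate uses, is to spawn in call order and join the handles in the same order. `ToolCall` is reduced, and the `run` closure is a hypothetical replacement for `execute_single_tool`.

```rust
use std::thread;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
}

/// Runs every call on its own scoped thread and returns
/// `(tool_call_id, output)` pairs in the original call order.
pub fn execute_batch<F>(calls: &[ToolCall], run: F) -> Vec<(String, String)>
where
    F: Fn(&ToolCall) -> String + Sync,
{
    thread::scope(|scope| {
        let mut handles = Vec::with_capacity(calls.len());
        for call in calls {
            // Share the closure by reference; `F: Sync` makes `&F` sendable.
            let run = &run;
            handles.push(scope.spawn(move || (call.id.clone(), run(call))));
        }
        // Joining in spawn order fixes the result order regardless of
        // which thread finishes first.
        handles
            .into_iter()
            .map(|h| h.join().expect("tool thread panicked"))
            .collect::<Vec<_>>()
    })
}
```

Even if the first tool is the slowest, its result still lands at index 0, so tool results can be appended to the history in the order the model issued the calls.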

---

### Milestone 2.6 — Context Compaction (Happy Path)

- [x] **REQ-054:** Implement `estimate_tokens(text) -> usize` using the heuristic `ceil(byte_length / 4)`. *(Source: [PS])*
  - Depends on: —
  - Definition of Done: `estimate_tokens("hello")` returns 2 (5 bytes / 4, rounded up).

- [x] **REQ-055:** Implement `content_tokens(content: Vec<Content>) -> usize` and `message_tokens(msg: AgentMessage) -> usize` per the specified formulas (image tokens: `clamp(raw_bytes/750, 85, 16000)`; per-message overhead: +4 for user/assistant, +8 for tool result). *(Source: [PS])*
  - Depends on: REQ-001, REQ-003, REQ-054
  - Definition of Done: Token counts match the specified formulas for each content type.

- [x] **REQ-056:** Implement `compact_messages(messages, config) -> Vec<AgentMessage>`: if under budget, return unchanged; else cascade through Level 1 → Level 2 → Level 3 until budget is satisfied. *(Source: [PS])*
  - Depends on: REQ-055, REQ-057, REQ-058, REQ-059
  - Definition of Done: `compact_messages` called on a history exceeding budget returns a smaller history with `total_tokens <= budget`.

- [x] **REQ-057:** Implement `level1_truncate_tool_outputs`: for each `ToolResult` message, truncate each `Text` content block to at most `max_lines` using head+tail preservation with an omission marker. *(Source: [PS])*
  - Depends on: REQ-003, REQ-054
  - Definition of Done: A 200-line tool output truncated to `max_lines=50` produces a 50-line result with `"[... N lines truncated ...]"` marker.

- [x] **REQ-058:** Implement `level2_summarize_old_turns`: keep the last `keep_recent` messages in full; replace older assistant+tool-result groups with a single one-line summary user message. *(Source: [PS])*
  - Depends on: REQ-003, REQ-054
  - Definition of Done: Old assistant messages and their tool results are replaced by `"[Summary] ..."` user messages; recent messages are untouched.

- [x] **REQ-059:** Implement `level3_drop_middle`: keep `keep_first` head messages and `keep_recent` tail messages; replace the dropped middle with a marker message. Implement `keep_within_budget` fallback that greedily keeps the most-recent messages fitting the budget. *(Source: [PS])*
  - Depends on: REQ-003, REQ-054
  - Definition of Done: Result contains the first N and last M messages with a marker; total tokens fits the budget.

- [x] **REQ-060:** Integrate `compact_messages` call in `run_loop` before each LLM call when `context_config` is `Some`. *(Source: [PS])*
  - Depends on: REQ-038, REQ-056
  - Definition of Done: When configured, each LLM call is preceded by a compaction pass; when `context_config` is `None`, no compaction occurs.
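
The token heuristics (REQ-054, REQ-055) and the head+tail truncation (REQ-057) can be sketched as plain functions. Several details are assumptions: `image_tokens` uses integer division on an already-decoded byte count, the omission marker counts as one of the `max_lines` output lines, and the head receives the extra line when the remainder is odd.

```rust
/// ceil(byte_length / 4), per REQ-054.
pub fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// clamp(raw_bytes / 750, 85, 16000), per REQ-055 (integer division assumed).
pub fn image_tokens(raw_bytes: usize) -> usize {
    (raw_bytes / 750).clamp(85, 16_000)
}

/// Keeps the first and last lines of `text` and replaces the middle with a
/// marker so that the result has at most `max_lines` lines (for
/// `max_lines >= 1`). Text within the limit is returned unchanged.
pub fn truncate_head_tail(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    // Reserve one output line for the marker (assumption).
    let keep = max_lines.saturating_sub(1);
    let head = (keep + 1) / 2;
    let tail = keep - head;
    let marker = format!("[... {} lines truncated ...]", lines.len() - keep);

    let mut out: Vec<&str> = Vec::with_capacity(keep + 1);
    out.extend_from_slice(&lines[..head]);
    out.push(marker.as_str());
    out.extend_from_slice(&lines[lines.len() - tail..]);
    out.join("\n")
}
```

For a 200-line input and `max_lines = 50`, this keeps 25 head lines and 24 tail lines around a marker reporting 151 truncated lines, for 50 lines in total.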

---

### Milestone 2.7 — Execution Limits

- [x] **REQ-061:** Implement `ExecutionTracker::record_turn(tokens: usize)` (increments `turns` and adds to `tokens_used`) and `check_limits() -> Option<String>` (returns a reason string if any limit is exceeded: turns, total tokens, or wall-clock duration). *(Source: [AR])*
  - Depends on: REQ-012
  - Definition of Done: `check_limits()` returns `None` when under all limits and `Some("max turns exceeded")` when over.

- [x] **REQ-062:** Integrate execution limit checking in `run_loop`: call `tracker.check_limits()` at the start of each inner loop iteration; if exceeded, append a synthetic `User` message `"[Agent stopped: {reason}]"`, emit `MessageStart`/`MessageEnd`, and return. *(Source: [PS])*
  - Depends on: REQ-038, REQ-061
  - Definition of Done: An agent with `max_turns=2` stops after exactly 2 LLM calls; the last message contains the stop reason.
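
A minimal sketch of the tracker from REQ-061 follows. The field types, the `>=` comparisons, and the reason strings other than `"max turns exceeded"` are assumptions. Using `>=` is what makes an agent with `max_turns = 2` stop after exactly two LLM calls when the check runs at the start of each iteration, as REQ-062 describes.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub struct ExecutionLimits {
    pub max_turns: usize,
    pub max_total_tokens: usize,
    pub max_duration: Duration,
}

impl Default for ExecutionLimits {
    /// Defaults from REQ-012: 50 turns, 1,000,000 tokens, 600 seconds.
    fn default() -> Self {
        Self {
            max_turns: 50,
            max_total_tokens: 1_000_000,
            max_duration: Duration::from_secs(600),
        }
    }
}

#[derive(Debug)]
pub struct ExecutionTracker {
    pub limits: ExecutionLimits,
    pub turns: usize,
    pub tokens_used: usize,
    pub started_at: Instant,
}

impl ExecutionTracker {
    pub fn new(limits: ExecutionLimits) -> Self {
        Self { limits, turns: 0, tokens_used: 0, started_at: Instant::now() }
    }

    /// Records one completed LLM turn and the tokens it consumed.
    pub fn record_turn(&mut self, tokens: usize) {
        self.turns += 1;
        self.tokens_used += tokens;
    }

    /// Returns a reason string once any limit has been reached.
    pub fn check_limits(&self) -> Option<String> {
        if self.turns >= self.limits.max_turns {
            return Some("max turns exceeded".to_string());
        }
        if self.tokens_used >= self.limits.max_total_tokens {
            return Some("max total tokens exceeded".to_string());
        }
        if self.started_at.elapsed() >= self.limits.max_duration {
            return Some("max duration exceeded".to_string());
        }
        None
    }
}
```

The returned reason is what the loop would interpolate into the synthetic `"[Agent stopped: {reason}]"` message.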

---

### Milestone 2.8 — Message Persistence and Agent Control

- [x] **REQ-063:** Implement `Agent::save_messages() -> String`: serialize `agent.messages` to a JSON string. *(Source: [OV])*
  - Depends on: REQ-018
  - Definition of Done: `save_messages()` returns a valid JSON array; the string can be parsed back without error.

- [x] **REQ-064:** Implement `Agent::restore_messages(json: &str)`: deserialize the JSON string into `Vec<AgentMessage>` and replace `agent.messages`. *(Source: [OV])*
  - Depends on: REQ-018, REQ-063
  - Definition of Done: After `save_messages()` → `restore_messages()`, the agent's message history is identical to the original.

- [x] **REQ-065:** Implement `Agent::reset()`: clear `messages`, drain both queues, cancel any active run, reset `is_streaming` to `false`, drop the cancel token. *(Source: [AR])*
  - Depends on: REQ-033
  - Definition of Done: After `reset()`, `messages` is empty, both queues are empty, and `is_streaming` is false.

- [x] **REQ-066:** Implement `Agent::steer(msg: AgentMessage)` (push to `steering_queue`) and `Agent::follow_up(msg: AgentMessage)` (push to `follow_up_queue`). *(Source: [AR])*
  - Depends on: REQ-027
  - Definition of Done: After `steer(msg)`, the steering queue contains exactly that message and is safe to read from another thread.

- [x] **REQ-067:** Implement `Agent::abort()`: if a cancel token exists, call `cancel()` on it. *(Source: [AR])*
  - Depends on: REQ-033, REQ-035
  - Definition of Done: Calling `abort()` during an active run causes `cancel.is_cancelled()` to return `true` inside the running agent loop.

***

## Level 3 — Smart
> **Goal:** The system handles reality. Invalid inputs, missing data,
> external failures, and edge cases are all handled gracefully.
> Every `[invariant]` and `ERROR` branch from the pseudocode is implemented.

**Completion Criteria:** No unhandled exception can be triggered by a known
class of bad input. All error paths from `../architecture/algorithms.md` are covered:
provider failures, tool errors, context overflow, execution limits,
filter rejections, and cancellation.

---

### Milestone 3.1 — Input Filter Chain

- [x] **REQ-068:** Implement the input filter chain at the start of `agent_loop`: join all `Text` content from `User` messages in prompts, run each registered `InputFilter` in order. *(Source: [PS])*
  - Depends on: REQ-022, REQ-036
  - Definition of Done: A filter registered via `with_input_filter` is called with the user's text before any LLM call.

- [x] **REQ-069:** On first `Reject` result, emit `InputRejected { reason }` then `AgentEnd { messages: [] }` and return an empty message list immediately. *(Source: [PS])*
  - Depends on: REQ-068
  - Definition of Done: A rejecting filter stops the run before the first LLM call; the caller's event stream contains `InputRejected` followed by `AgentEnd`.

- [x] **REQ-070:** Accumulate `Warn` results; after all filters pass, append all warning text as `Content::Text` to the last `User` message before it is appended to context. *(Source: [PS])*
  - Depends on: REQ-068
  - Definition of Done: A warning filter adds `"[Warning: ...]"` text to the user message; the run continues normally.
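
The filter chain in REQ-068 through REQ-070 can be sketched roughly as below. The `FilterResult` enum, the `check` method name, and the two example filters are hypothetical stand-ins. Only `InputFilter`, `Reject`, `Warn`, and the `"[Warning: ...]"` shape come from the requirements above.

```rust
/// Assumed result type for a single filter.
enum FilterResult {
    Pass,
    Warn(String),
    Reject(String),
}

trait InputFilter {
    fn check(&self, text: &str) -> FilterResult;
}

/// Run filters in order. The first `Reject` short-circuits with its reason;
/// otherwise all warnings are collected for appending to the user message.
fn run_filters(filters: &[Box<dyn InputFilter>], text: &str) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for filter in filters {
        match filter.check(text) {
            FilterResult::Pass => {}
            FilterResult::Warn(w) => warnings.push(format!("[Warning: {w}]")),
            FilterResult::Reject(reason) => return Err(reason),
        }
    }
    Ok(warnings)
}

/// Example filter (hypothetical): reject inputs longer than a byte limit.
struct MaxLength(usize);
impl InputFilter for MaxLength {
    fn check(&self, text: &str) -> FilterResult {
        if text.len() > self.0 {
            FilterResult::Reject("input too long".to_string())
        } else {
            FilterResult::Pass
        }
    }
}

/// Example filter (hypothetical): warn when a given word appears.
struct FlagWord(&'static str);
impl InputFilter for FlagWord {
    fn check(&self, text: &str) -> FilterResult {
        if text.contains(self.0) {
            FilterResult::Warn(format!("mentions {}", self.0))
        } else {
            FilterResult::Pass
        }
    }
}
```

An `Err` here corresponds to emitting `InputRejected { reason }` followed by `AgentEnd { messages: [] }`.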

---

### Milestone 3.2 — Retry Engine

- [x] **REQ-071:** Implement `delay_for_attempt(config, attempt) -> Duration`: exponential backoff formula `initial_delay_ms * (multiplier ^ (attempt - 1))`, capped at `max_delay_ms`, multiplied by a uniform random jitter in `[0.8, 1.2]`. *(Source: [PS])*
  - Depends on: REQ-013
  - Definition of Done: With defaults, attempt 1 produces a duration in `[800ms, 1200ms]`; attempt 3 produces a duration in `[3200ms, 4800ms]`.

- [x] **REQ-072:** Implement `is_retryable()` on `ProviderError`: returns `true` only for `RateLimited` and `Network` variants. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: `Auth`, `Api`, `ContextOverflow`, `Cancelled`, `Other` all return `false`; `RateLimited` and `Network` return `true`.

- [x] **REQ-073:** Implement `retry_after()` on `ProviderError`: extracts `retry_after_ms` from `RateLimited { retry_after_ms: Some(n) }` if present; returns `None` otherwise. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: `ProviderError::RateLimited { retry_after_ms: Some(5000) }.retry_after()` returns `Some(Duration::from_millis(5000))`.

- [x] **REQ-074:** Integrate retry loop into `stream_assistant_response`: on a retryable error, sleep for `retry_after() OR delay_for_attempt(attempt)` and retry up to `max_retries` times; stop retrying if `cancel.is_cancelled()`. *(Source: [PS])*
  - Depends on: REQ-039, REQ-071, REQ-072, REQ-073
  - Definition of Done: A `RateLimited` error causes the loop to wait and retry; after exhausting retries, the error is propagated as an `Error` stop reason.
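
The backoff formula in REQ-071 can be checked with a small sketch. To keep the example free of a random-number crate, the jitter factor is passed in as a parameter; a real caller would draw it uniformly from `[0.8, 1.2]`. The `RetryConfig` field types are assumptions. The values 1000 ms and 2.0 used in the usage note are inferred from the ranges in the Definition of Done, and the 30 s cap is purely illustrative.

```rust
use std::time::Duration;

/// Assumed shape of the retry configuration.
struct RetryConfig {
    initial_delay_ms: u64,
    multiplier: f64,
    max_delay_ms: u64,
}

/// initial_delay_ms * multiplier^(attempt - 1), capped, then scaled by jitter.
fn delay_for_attempt(config: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    // Attempts are 1-based; attempt 1 uses the initial delay unchanged.
    let exponent = attempt.saturating_sub(1) as i32;
    let base = config.initial_delay_ms as f64 * config.multiplier.powi(exponent);
    let capped = base.min(config.max_delay_ms as f64);
    // Clamp defensively so an out-of-range factor cannot escape [0.8, 1.2].
    let scaled = capped * jitter.clamp(0.8, 1.2);
    Duration::from_millis(scaled.round() as u64)
}
```

With `initial_delay_ms: 1000` and `multiplier: 2.0`, attempt 3 has a base delay of 4000 ms, so jitter factors of 0.8 and 1.2 give the 3200 ms and 4800 ms bounds quoted above.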

---

### Milestone 3.3 — Provider Error Classification

- [x] **REQ-075:** Implement `ProviderError::classify(status: u16, message: String) -> ProviderError`: route to `ContextOverflow` first (empty body with status 400/413, or a matching overflow phrase; see REQ-076), then `RateLimited` (429), then `Auth` (401/403), then `Api`. *(Source: [PS])*
  - Depends on: REQ-020
  - Definition of Done: HTTP 429 maps to `RateLimited`; HTTP 401 maps to `Auth`; "prompt is too long" in the body maps to `ContextOverflow`.

- [x] **REQ-076:** Implement `is_context_overflow(status, message) -> bool`: check for empty body with status 400/413 (Cerebras/Mistral pattern); check for any of 15+ documented overflow phrases (case-insensitive substring match). *(Source: [PS])*
  - Depends on: —
  - Definition of Done: All documented overflow phrases are recognized; unrelated 400 errors with non-empty body are not misclassified.

- [x] **REQ-077:** Implement context overflow recovery: when the streaming error event contains a message matching overflow detection (`Message::is_context_overflow()`), treat it as an overflow on the next turn by triggering `compact_messages` (if `context_config` is set). *(Source: [AR])*
  - Depends on: REQ-056, REQ-075, REQ-076
  - Definition of Done: A mock that returns an overflow error on turn 1 causes compaction before turn 2.
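
The routing order in REQ-075 and REQ-076 can be sketched as follows. The enum is a simplified stand-in with assumed payloads. The phrase list holds only `"prompt is too long"` from the requirement plus one hypothetical placeholder, not the full documented set.

```rust
/// Simplified stand-in for the provider error type (payloads are assumptions).
#[derive(Debug, PartialEq)]
enum ProviderError {
    ContextOverflow,
    RateLimited { retry_after_ms: Option<u64> },
    Auth(String),
    Api(String),
}

/// Illustrative subset; the second entry is a hypothetical placeholder.
const OVERFLOW_PHRASES: &[&str] = &["prompt is too long", "context length exceeded"];

fn is_context_overflow(status: u16, message: &str) -> bool {
    // Some providers signal overflow with an empty body and status 400/413.
    if message.trim().is_empty() && matches!(status, 400 | 413) {
        return true;
    }
    // Otherwise fall back to a case-insensitive substring match.
    let lower = message.to_lowercase();
    OVERFLOW_PHRASES.iter().any(|phrase| lower.contains(*phrase))
}

fn classify(status: u16, message: String) -> ProviderError {
    // Overflow is checked first so a 400 carrying an overflow phrase is not
    // reported as a generic API error.
    if is_context_overflow(status, &message) {
        return ProviderError::ContextOverflow;
    }
    match status {
        429 => ProviderError::RateLimited { retry_after_ms: None },
        401 | 403 => ProviderError::Auth(message),
        _ => ProviderError::Api(message),
    }
}
```

Checking overflow before the status code is what keeps an ordinary 400 with a non-empty, unrelated body in the `Api` bucket.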

---

### Milestone 3.4 — Tool Error Handling

- [x] **REQ-078:** On `ToolError::Failed(msg)` or `ToolError::InvalidArgs(msg)`: convert to a `ToolResult` with `content: [Text(msg)]` and `is_error: true`; always return this to the LLM so it can self-correct. *(Source: [AR])*
  - Depends on: REQ-010, REQ-046
  - Definition of Done: A tool that returns `Err(Failed("oops"))` produces a `ToolResult` message with `is_error: true` and the text `"oops"`.

- [x] **REQ-079:** On `ToolError::NotFound(name)`: produce `ToolResult { content: [Text("Tool {name} not found")], is_error: true }`. *(Source: [PS])*
  - Depends on: REQ-046
  - Definition of Done: Requesting a non-existent tool name in a tool call produces a `NotFound` error result.

- [x] **REQ-080:** On `ToolError::Cancelled`: produce `ToolResult { content: [Text("Skipped due to queued user message.")], is_error: true }`. *(Source: [AR])*
  - Depends on: REQ-010, REQ-046
  - Definition of Done: A tool skipped due to steering produces the documented skipped message.
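
The three conversions in this milestone can be summarised in one function. This is a sketch under simplifying assumptions: `ToolResult` is reduced to a single `text` field instead of a `content` list, and `error_to_result` is a hypothetical helper name.

```rust
#[derive(Debug)]
enum ToolError {
    Failed(String),
    InvalidArgs(String),
    NotFound(String),
    Cancelled,
}

/// Simplified result: one text field instead of a list of content blocks.
#[derive(Debug, PartialEq)]
struct ToolResult {
    text: String,
    is_error: bool,
}

/// Every tool error becomes an error result that is sent back to the LLM.
fn error_to_result(err: ToolError) -> ToolResult {
    let text = match err {
        // The tool's own message is passed through unchanged.
        ToolError::Failed(msg) | ToolError::InvalidArgs(msg) => msg,
        ToolError::NotFound(name) => format!("Tool {name} not found"),
        ToolError::Cancelled => "Skipped due to queued user message.".to_string(),
    };
    ToolResult { text, is_error: true }
}
```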

---

### Milestone 3.5 — Error and Abort Stop Reason Handling

- [x] **REQ-081:** In `run_loop`, when the assistant message has `stop_reason == Error`: call `on_error(error_message)` if defined, call `after_turn` if defined, emit `TurnEnd`, return immediately. *(Source: [PS])*
  - Depends on: REQ-038, REQ-082
  - Definition of Done: A mock provider that returns an error stop reason causes the loop to exit; `on_error` is called with the message text.

- [x] **REQ-082:** In `run_loop`, when `stop_reason == Aborted`: call `after_turn` if defined, emit `TurnEnd`, return immediately. *(Source: [PS])*
  - Depends on: REQ-038
  - Definition of Done: Calling `agent.abort()` mid-run causes the loop to exit cleanly; `TurnEnd` is emitted.

- [x] **REQ-083:** Construct a synthetic error `Message::Assistant` on irrecoverable provider failure (after retry exhaustion): empty content, `stop_reason: Error`, `error_message: Some(e.to_string())`. *(Source: [PS])*
  - Depends on: REQ-002, REQ-039
  - Definition of Done: A provider that always fails produces an `Assistant` message with `stop_reason: Error` containing the provider's error text.

---

### Milestone 3.6 — Sequential and Batched Tool Execution

- [x] **REQ-084:** Implement `execute_sequential`: execute tool calls one at a time; after each, check the steering queue; on non-empty steering, skip remaining tools with `ToolError::Cancelled` results and return steering messages. *(Source: [PS])*
  - Depends on: REQ-046, REQ-080
  - Definition of Done: With steering arriving after tool 1 of 3, tools 2 and 3 receive skipped error results; the steering message is returned for injection.

- [x] **REQ-085:** Implement `execute_batch` (Parallel): launch all tools concurrently via `join_all`; after all complete, check steering once; return steering if present. *(Source: [PS])*
  - Depends on: REQ-046
  - Definition of Done: Three parallel tools all complete; steering arriving before their completion is returned after all finish.

- [x] **REQ-086:** Implement `Batched { size }` dispatch: split tool calls into groups of `size`; run each group via `execute_batch`; check steering between groups; on steering, skip remaining groups with cancelled results. *(Source: [PS])*
  - Depends on: REQ-085
  - Definition of Done: With 5 tool calls, `Batched { size: 2 }` executes groups [1,2], [3,4], [5]; steering after group 1 skips groups 2 and 3.
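
The grouping and skip behaviour of `Batched { size }` in REQ-086 can be illustrated with a synchronous sketch. It deliberately leaves out the concurrency (`join_all`) inside each group and models only the control flow. `Outcome`, `run_batched`, and the `steering_pending` closure are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Ran(u32),
    Skipped(u32),
}

/// Split `calls` into groups of `size`, check steering once after each group,
/// and mark every call in the remaining groups as skipped once steering arrives.
fn run_batched(
    calls: &[u32],
    size: usize,
    mut steering_pending: impl FnMut() -> bool,
) -> Vec<Outcome> {
    let mut outcomes = Vec::with_capacity(calls.len());
    let mut steered = false;
    // `chunks` panics on a zero size, so treat 0 as 1.
    for group in calls.chunks(size.max(1)) {
        for &id in group {
            outcomes.push(if steered { Outcome::Skipped(id) } else { Outcome::Ran(id) });
        }
        // Steering is only consulted between groups, never inside one.
        if !steered && steering_pending() {
            steered = true;
        }
    }
    outcomes
}
```

With five calls and `size = 2`, steering that arrives after the first group leaves calls 3, 4, and 5 skipped, matching the Definition of Done for REQ-086.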

---

### Milestone 3.7 — Steering and Follow-up Queue Integration

- [x] **REQ-087:** In `run_loop`, drain the steering queue at the start of the outer loop before the first inner-loop iteration. *(Source: [PS])*
  - Depends on: REQ-038
  - Definition of Done: Messages enqueued via `steer()` before `prompt()` is called are injected as the first pending messages.

- [x] **REQ-088:** After tool execution, if steering messages were captured, set them as `pending` and continue the inner loop (injecting them before the next LLM call). *(Source: [PS])*
  - Depends on: REQ-038, REQ-084, REQ-085
  - Definition of Done: A steering message injected during tool execution appears in context before the subsequent LLM call.

- [x] **REQ-089:** After the inner loop exits (no tool calls, no pending steering), check the follow-up queue; if non-empty, add follow-up messages to `pending` and continue the outer loop. *(Source: [PS])*
  - Depends on: REQ-038
  - Definition of Done: A follow-up message enqueued via `follow_up()` causes the agent to re-enter the loop rather than stopping.

- [x] **REQ-090:** Implement `QueueMode::OneAtATime` (pop exactly one message per read) and `QueueMode::All` (drain the entire queue per read). Both modes are thread-safe (mutex-protected). *(Source: [AR])*
  - Depends on: REQ-017, REQ-027
  - Definition of Done: `OneAtATime` leaves remaining messages in the queue; `All` empties it; both are safe to call from the agent loop while another thread pushes.
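
A mutex-protected queue with the two read modes from REQ-090 could be sketched as below. `MessageQueue` and its methods are hypothetical names, and messages are plain strings rather than `AgentMessage` values.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy)]
enum QueueMode {
    OneAtATime,
    All,
}

/// Cloning the queue shares the same underlying storage across threads.
#[derive(Clone)]
struct MessageQueue {
    inner: Arc<Mutex<VecDeque<String>>>,
    mode: QueueMode,
}

impl MessageQueue {
    fn new(mode: QueueMode) -> Self {
        Self { inner: Arc::new(Mutex::new(VecDeque::new())), mode }
    }

    fn push(&self, msg: String) {
        self.inner.lock().unwrap().push_back(msg);
    }

    /// Pop a single message or drain everything, depending on the mode.
    fn read(&self) -> Vec<String> {
        let mut queue = self.inner.lock().unwrap();
        match self.mode {
            QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
            QueueMode::All => queue.drain(..).collect(),
        }
    }
}
```

Because `read` holds the lock for the whole pop or drain, a concurrent `push` from another thread lands either entirely before or entirely after the read.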

---

### Milestone 3.8 — Lifecycle Callbacks

- [x] **REQ-091:** Call `before_turn(messages, turn_number) -> bool` at the start of each turn (before the LLM call). If it returns `false`, return from `run_loop` immediately without emitting `AgentEnd`. *(Source: [PS])*
  - Depends on: REQ-038
  - Definition of Done: A `before_turn` that returns `false` on turn 2 stops the loop after turn 1; `AgentEnd` is not emitted.

- [x] **REQ-092:** Call `after_turn(messages, usage)` after each LLM call and its tool executions, including on error/abort paths. *(Source: [PS])*
  - Depends on: REQ-038
  - Definition of Done: `after_turn` is called exactly once per turn, including when the turn ends in an error.

- [x] **REQ-093:** Call `on_error(message: &str)` when `stop_reason == Error`. *(Source: [PS])*
  - Depends on: REQ-081
  - Definition of Done: An error-returning provider invokes the `on_error` callback with the error message string.

---

### Milestone 3.9 — Tool Safety and Edge Cases

- [x] **REQ-094:** `BashTool`: check each `deny_pattern` against the command (substring match) before execution; return `Err(Failed("Command blocked..."))` on match. *(Source: [PS])*
  - Depends on: REQ-047
  - Definition of Done: A command containing a deny pattern is rejected before any subprocess is spawned.

- [x] **REQ-095:** `BashTool`: race subprocess completion against a configurable timeout and the cancellation token; on timeout return `Err(Failed("Command timed out after Ns"))`; on cancellation return `Err(Cancelled)`. *(Source: [PS])*
  - Depends on: REQ-047
  - Definition of Done: `sleep 300` with a 2s timeout produces a timeout error; cancellation produces `Cancelled`.

- [x] **REQ-096:** `BashTool`: truncate `stdout` and `stderr` independently at `max_output_bytes` (default 256KB) and append `"\n... (output truncated)"`. *(Source: [PS])*
  - Depends on: REQ-047
  - Definition of Done: Output exceeding 256KB is truncated with the documented suffix.

- [x] **REQ-097:** `BashTool`: optional `confirm_fn` callback; if defined and returns `false`, return `Err(Failed("Command was not confirmed by the user."))`. *(Source: [PS])*
  - Depends on: REQ-047
  - Definition of Done: A rejecting `confirm_fn` prevents subprocess execution.

- [x] **REQ-098:** `ReadFileTool`: check file size before reading. Text files exceeding `max_bytes` (1MB): return `Err(Failed("File too large. Use offset/limit..."))`. Image files exceeding 20MB: return `Err(Failed("Image too large"))`. *(Source: [PS])*
  - Depends on: REQ-048
  - Definition of Done: Reading a file above the size limit returns the documented error without reading the file contents.

- [x] **REQ-099:** `ReadFileTool`: for image extensions, read file as bytes, base64-encode, detect MIME type from extension, return `Content::Image`. *(Source: [PS])*
  - Depends on: REQ-001, REQ-048
  - Definition of Done: Reading a `.png` file returns a `ToolResult` with `Content::Image { data: base64, mime_type: "image/png" }`.

- [x] **REQ-100:** `ReadFileTool`: check `ctx.cancel.is_cancelled()` before each I/O operation; return `Err(Cancelled)` if set. *(Source: [PS])*
  - Depends on: REQ-048
  - Definition of Done: Cancelling before a read returns `Cancelled` without touching the file.

- [x] **REQ-101:** `EditFileTool`: if `old_text` matches zero occurrences, attempt `find_similar_text` for a fuzzy hint; return `Err(Failed("old_text not found... Did you mean: ..."))`. *(Source: [PS])*
  - Depends on: REQ-050
  - Definition of Done: An edit with wrong `old_text` returns a `Failed` error; if a similar line exists, the hint is included.

- [x] **REQ-102:** `EditFileTool`: if `old_text` matches more than one occurrence, return `Err(Failed("old_text matches N locations. Include more context..."))`. *(Source: [PS])*
  - Depends on: REQ-050
  - Definition of Done: Attempting to replace ambiguous text returns a descriptive error with the match count.

- [x] **REQ-103:** `EditFileTool`: check `ctx.cancel.is_cancelled()` before each I/O operation. *(Source: [PS])*
  - Depends on: REQ-050
  - Definition of Done: Cancellation before read or write returns `Err(Cancelled)`.

- [x] **REQ-104:** `WriteFileTool`: check `ctx.cancel.is_cancelled()` before writing. *(Source: [AR])*
  - Depends on: REQ-049
  - Definition of Done: Cancellation prevents the write from occurring.

- [x] **REQ-105:** `ListFilesTool`: race `find` execution against a timeout (default 10s) and the cancellation token; truncate results at `max_results` (default 200) with a truncation suffix. *(Source: [PS])*
  - Depends on: REQ-051
  - Definition of Done: Listing a directory with 500 files returns 200 with the truncation message.

- [x] **REQ-106:** `SearchTool`: fall back from `rg` to `grep` if ripgrep is not available on the system. Check `ctx.cancel.is_cancelled()` before execution. *(Source: [PS])*
  - Depends on: REQ-052
  - Definition of Done: Search succeeds on a system without `rg` installed; cancellation is respected.
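
Two of the `BashTool` checks above, the deny-pattern match (REQ-094) and output truncation (REQ-096), are simple enough to sketch with the standard library. The helper names are hypothetical and the patterns in the usage note are examples only, not the crate's default list. Backing up to a UTF-8 character boundary is an added safeguard, not something the requirements specify.

```rust
/// Return the first deny pattern contained in `command`, if any.
fn first_denied<'a>(command: &str, deny_patterns: &[&'a str]) -> Option<&'a str> {
    deny_patterns.iter().copied().find(|pattern| command.contains(*pattern))
}

/// Truncate `output` to at most `max_output_bytes` and append the suffix.
fn truncate_output(output: &str, max_output_bytes: usize) -> String {
    if output.len() <= max_output_bytes {
        return output.to_string();
    }
    // Back up to a char boundary so a multi-byte UTF-8 sequence is never split.
    let mut end = max_output_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n... (output truncated)", &output[..end])
}
```

For example, `first_denied("sudo rm -rf /", &["rm -rf /", "mkfs"])` returns the matching pattern, which the tool would turn into an `Err(Failed(...))` before spawning any subprocess. `truncate_output` would be applied to `stdout` and `stderr` separately.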

---

### Milestone 3.10 — Agent Invariants

- [x] **REQ-107:** In `prompt_messages_with_sender`, assert `!self.is_streaming` with a clear panic message before proceeding. *(Source: [PS])*
  - Depends on: REQ-035
  - Definition of Done: Calling `prompt()` while a run is active panics with a message directing the caller to use `steer()` or `follow_up()`.

- [x] **REQ-108:** In `agent_loop_continue`, validate preconditions: `context.messages` is non-empty and the last message is not an `Assistant` variant. *(Source: [PS])*
  - Depends on: REQ-037
  - Definition of Done: Calling `agent_loop_continue` with an empty context or with a trailing assistant message returns an error or panics with a clear message.

---

### Milestone 3.11 — Skill System

- [x] **REQ-109:** Implement `SkillSet::load(dirs: Vec<Path>)`: iterate directories, skip missing ones silently, scan each for subdirectories containing `SKILL.md`, parse frontmatter, build a name-keyed map (later dirs override earlier on collision), return sorted `SkillSet`. *(Source: [PS])*
  - Depends on: REQ-110
  - Definition of Done: Loading two dirs where both contain a skill named `"foo"` results in the second dir's version being used.

- [x] **REQ-110:** Implement `parse_frontmatter(content) -> (name, description)`: require content to begin with `---`, extract YAML block up to next `\n---`, parse `name:` and `description:` lines, strip surrounding quotes, return `Err(InvalidFrontmatter)` or `Err(MissingField)` on failure. *(Source: [PS])*
  - Depends on: —
  - Definition of Done: Valid frontmatter parses correctly; missing `name` field returns a `MissingField` error; missing delimiters return `InvalidFrontmatter`.

- [x] **REQ-111:** Implement `SkillSet::format_for_prompt()`: emit `<available_skills>` XML block with one `<skill>` element per skill (sorted by name ascending), XML-escaping all string values; return empty string if no skills loaded. *(Source: [PS])*
  - Depends on: REQ-109
  - Definition of Done: Output is well-formed XML; special characters in skill names/descriptions are correctly escaped.

- [x] **REQ-112:** Implement `SkillSet::load_dir(dir, source)` and `SkillSet::merge(other)`. *(Source: [AR])*
  - Depends on: REQ-109
  - Definition of Done: `merge` causes the other's skills to override on name conflict.

- [x] **REQ-113:** Implement `Agent::with_skills(skill_set)`: call `format_for_prompt()` and append the XML block to `self.system_prompt`. *(Source: [PS])*
  - Depends on: REQ-111
  - Definition of Done: After `with_skills(set)`, the agent's system prompt contains the `<available_skills>` XML block.
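
The frontmatter parsing in REQ-110 and the escaping needed by REQ-111 can be sketched with the standard library alone. The `SkillError` enum, the `&'static str` payload on `MissingField`, and the helper names `unquote` and `xml_escape` are assumptions made for this example. The line-based parsing follows the requirement's description and is not a YAML parser.

```rust
#[derive(Debug, PartialEq)]
enum SkillError {
    InvalidFrontmatter,
    MissingField(&'static str),
}

/// Extract `name` and `description` from a leading `---` ... `---` block.
fn parse_frontmatter(content: &str) -> Result<(String, String), SkillError> {
    let rest = content.strip_prefix("---").ok_or(SkillError::InvalidFrontmatter)?;
    let end = rest.find("\n---").ok_or(SkillError::InvalidFrontmatter)?;
    let mut name = None;
    let mut description = None;
    for line in rest[..end].lines() {
        if let Some(value) = line.strip_prefix("name:") {
            name = Some(unquote(value));
        } else if let Some(value) = line.strip_prefix("description:") {
            description = Some(unquote(value));
        }
    }
    Ok((
        name.ok_or(SkillError::MissingField("name"))?,
        description.ok_or(SkillError::MissingField("description"))?,
    ))
}

/// Trim whitespace and one pair of matching surrounding quotes.
fn unquote(value: &str) -> String {
    let v = value.trim();
    let v = v
        .strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .or_else(|| v.strip_prefix('\'').and_then(|s| s.strip_suffix('\'')))
        .unwrap_or(v);
    v.to_string()
}

/// Escape the five XML special characters; `&` must be replaced first.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&apos;")
}
```

`format_for_prompt()` would pass each name and description through something like `xml_escape` before writing the `<skill>` elements.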

---

### Milestone 3.12 — MCP Client

- [x] **REQ-114:** Implement `McpClient::connect_stdio(cmd, args, env)`: spawn subprocess with piped stdin/stdout; complete the 3-step initialize handshake; return `Ok(McpClient)`. *(Source: [PS])*
  - Depends on: REQ-115, REQ-116
  - Definition of Done: Spawning a compliant MCP server subprocess results in a connected client; `server_info` is populated from the handshake.

- [x] **REQ-115:** Implement `McpClient::send_request(method, params)`: construct a JSON-RPC 2.0 request with auto-incremented atomic ID, send over transport, receive response, return `Err(JsonRpc{...})` on error field or `Err(Protocol("Empty result"))` on missing result. *(Source: [PS])*
  - Depends on: —
  - Definition of Done: A JSON-RPC response with an error field maps to `McpError::JsonRpc`; a valid result field is returned as `Ok(value)`.

- [x] **REQ-116:** Implement `McpClient::list_tools()` and `McpClient::call_tool(name, args)`. *(Source: [PS])*
  - Depends on: REQ-115
  - Definition of Done: `list_tools()` returns a parsed `Vec<McpToolInfo>`; `call_tool()` returns a parsed `McpToolCallResult`.

- [x] **REQ-117:** Implement `McpToolAdapter` implementing `AgentTool`: wraps `McpToolInfo` metadata and an `Arc<Mutex<McpClient>>`; `execute()` calls `client.call_tool()` and converts `McpContent` to `Content` variants. *(Source: [AR])*
  - Depends on: REQ-001, REQ-021, REQ-116
  - Definition of Done: An `McpToolAdapter` can be registered on an agent and called successfully in a tool-use turn.

- [x] **REQ-118:** Handle all `McpError` variants gracefully: `Transport`, `Protocol`, `JsonRpc`, `Serialization`, `Io`, `ConnectionClosed` all surface as `ToolError::Failed` with descriptive messages. *(Source: [AR])*
  - Depends on: REQ-117
  - Definition of Done: Each `McpError` variant produces a non-panicking `ToolError::Failed` with a message identifying the error type and context.

- [x] **REQ-119:** Implement `Agent::with_mcp_server_stdio(cmd, args, env)`: call `McpClient::connect_stdio`, then `McpToolAdapter::from_client`, append resulting tool adapters to `self.tools`. *(Source: [AR])*
  - Depends on: REQ-114, REQ-117
  - Definition of Done: After `with_mcp_server_stdio`, the agent's tool list includes all tools reported by the MCP server.

***

## Level 4 — Professional
> **Goal:** The system is safe, observable, and maintainable.
> It can be operated with multiple provider backends, supports prompt caching
> and extended thinking, exposes useful observability hooks, and shuts down
> gracefully.

**Completion Criteria:** All 7 provider protocols are implemented. Prompt
caching, thinking levels, structured logging, and security-sensitive fields
are all handled. The cancellation tree propagates correctly to all I/O
boundaries. The system is configurable for production use.

---

### Milestone 4.1 — Full Provider Suite

- [x] **REQ-120:** Implement `GoogleProvider::stream` (Gemini API): POST to `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={API_KEY}`; use custom SSE parser (split on `\n\n`, extract `data:` line); send tool definitions as `functionDeclarations` and map tool calls from `functionCall` parts; auto-generate tool IDs as `"google-fc-{index}"`; tool results as `functionResponse` parts. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: A Gemini streaming response is parsed into the correct `StreamEvent`s; tool IDs are auto-generated in the documented format.

- [x] **REQ-121:** Implement `GoogleVertexProvider::stream` (Vertex AI): identical wire format to Gemini; endpoint pattern `https://{region}-aiplatform.googleapis.com/...`; auth via `Authorization: Bearer {OAUTH_TOKEN}`; tool IDs as `"vertex-fc-{index}"`. *(Source: [AR])*
  - Depends on: REQ-120
  - Definition of Done: Vertex request differs from Gemini only in endpoint and auth header.

- [x] **REQ-122:** Implement `BedrockProvider::stream` (ConverseStream API): endpoint `{base_url}/model/{model}/converse-stream`; newline-delimited JSON (not standard SSE); parse events `contentBlockDelta`, `contentBlockStart`, `contentBlockStop`, `messageStop`, `metadata`; tool spec format: `toolSpec { inputSchema: { json: schema } }`; tool result format: `{ toolResult: { toolUseId, content, status } }`. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: A Bedrock ndjson streaming response is correctly parsed; tool definitions and results are in the Bedrock-specific format.

- [x] **REQ-123:** Implement `OpenAiResponsesProvider::stream` (OpenAI Responses API): endpoint `{base_url}/responses`; system prompt in `"instructions"` field; SSE events `response.output_text.delta`, `response.reasoning.delta`, `response.function_call_arguments.*`, `response.completed`. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: The Responses API wire format differs correctly from Chat Completions in system prompt field and event names.

- [x] **REQ-124:** Implement `AzureOpenAiProvider::stream`: endpoint `{base_url}/responses?api-version=2025-01-01-preview`; auth via `api-key: {AZURE_OPENAI_API_KEY}` header (not `Authorization: Bearer`); same request/response format as OpenAI Responses API. *(Source: [AR])*
  - Depends on: REQ-123
  - Definition of Done: Azure auth uses `api-key` header; base URL pattern `https://{resource}.openai.azure.com/openai/deployments/{deployment}` is supported.

- [x] **REQ-125:** Register all 7 providers (Anthropic, OpenAiCompat, OpenAiResponses, Azure, Google, Vertex, Bedrock) in `ProviderRegistry::default()`. *(Source: [AR])*
  - Depends on: REQ-042, REQ-120 through REQ-124
  - Definition of Done: `ProviderRegistry::default()` can dispatch to any of the 7 implementations based on protocol selection.

---

### Milestone 4.2 — Prompt Caching

- [x] **REQ-126:** Implement `CacheStrategy::Auto`: provider automatically places `cache_control: { type: "ephemeral" }` breakpoints at the system prompt, the last tool definition, and the second-to-last message. *(Source: [AR])*
  - Depends on: REQ-014, REQ-040
  - Definition of Done: In Anthropic requests, the three cache breakpoints appear in the correct positions when `strategy: Auto`.

- [x] **REQ-127:** Implement `CacheStrategy::Manual { cache_system, cache_tools, cache_messages }`: conditionally apply breakpoints per flag. Implement `CacheStrategy::Disabled`: no breakpoints emitted. *(Source: [AR])*
  - Depends on: REQ-126
  - Definition of Done: Each flag independently controls placement of its respective cache breakpoint.

- [x] **REQ-128:** Propagate `Usage.cache_read` and `Usage.cache_write` from Anthropic response metadata into `Message::Assistant.usage`. *(Source: [AR])*
  - Depends on: REQ-006, REQ-040
  - Definition of Done: Cache token counts from Anthropic are populated in the usage struct after a cached-hit response.

---

### Milestone 4.3 — Extended Thinking

- [x] **REQ-129:** Map `ThinkingLevel` to Anthropic `thinking` parameter: `Off` → omit; `Minimal` → `budget_tokens: 128`; `Low` → 512; `Medium` → 2048; `High` → 8192. *(Source: [AR])*
  - Depends on: REQ-019, REQ-040
  - Definition of Done: Setting `ThinkingLevel::Medium` causes `{type:"enabled", budget_tokens:2048}` to appear in the Anthropic request.

- [x] **REQ-130:** Map `ThinkingLevel` to OpenAI-compat `reasoning_effort` parameter when `supports_reasoning_effort` flag is set: `Minimal`/`Low` → `"low"`; `Medium` → `"medium"`; `High` → `"high"`. *(Source: [AR])*
  - Depends on: REQ-019, REQ-041
  - Definition of Done: `ThinkingLevel::High` with a reasoning-capable provider produces `reasoning_effort: "high"` in the request body.

- [x] **REQ-131:** Parse `Thinking` content blocks from streaming responses (Anthropic `thinking` type blocks; OpenAI `delta.reasoning_content` / xAI `delta.reasoning`); emit as `StreamDelta::Thinking` and store as `Content::Thinking` in the final message. *(Source: [AR])*
  - Depends on: REQ-001, REQ-008, REQ-040
  - Definition of Done: A streaming response containing thinking/reasoning content produces `MessageUpdate` events with `StreamDelta::Thinking` and the final `Content::Thinking` block in the assembled message.
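
The two `ThinkingLevel` mappings in REQ-129 and REQ-130 amount to small lookup tables, sketched below. The function names are hypothetical, and treating `Off` as "omit `reasoning_effort`" on the OpenAI-compatible side is an assumption, since the requirement lists no value for it. The JSON fragment is built by hand here only to avoid a serialization dependency.

```rust
#[derive(Clone, Copy)]
enum ThinkingLevel {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

/// Anthropic `budget_tokens` per level; `None` means the parameter is omitted.
fn anthropic_budget_tokens(level: ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2048),
        ThinkingLevel::High => Some(8192),
    }
}

/// JSON fragment for the Anthropic `thinking` parameter.
fn anthropic_thinking_json(level: ThinkingLevel) -> Option<String> {
    anthropic_budget_tokens(level)
        .map(|budget| format!(r#"{{"type":"enabled","budget_tokens":{budget}}}"#))
}

/// OpenAI-compatible `reasoning_effort`; `Off` is assumed to omit the field.
fn openai_reasoning_effort(level: ThinkingLevel) -> Option<&'static str> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal | ThinkingLevel::Low => Some("low"),
        ThinkingLevel::Medium => Some("medium"),
        ThinkingLevel::High => Some("high"),
    }
}
```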

---

### Milestone 4.4 — MCP HTTP Transport

- [x] **REQ-132:** Implement `McpClient::connect_http(url)`: POST JSON-RPC bodies to the configured URL (stateless, no persistent connection); complete the initialize handshake. *(Source: [AR])*
  - Depends on: REQ-115
  - Definition of Done: An HTTP-based MCP server can be connected to and queried for tools.

- [x] **REQ-133:** Implement `Agent::with_mcp_server_http(url)` builder. Support optional tool name prefix (`{prefix}__{name}`) for namespace disambiguation. *(Source: [AR])*
  - Depends on: REQ-117, REQ-132
  - Definition of Done: HTTP MCP tools appear in the agent's tool list; with a prefix configured, tool names are formatted as `"{prefix}__{name}"`.

- [x] **REQ-134:** On MCP stdio transport shutdown, send EOF on stdin then kill the child process. *(Source: [AR])*
  - Depends on: REQ-114
  - Definition of Done: Dropping or closing the stdio MCP client terminates the child process cleanly.

---

### Milestone 4.5 — Observability and Logging

- [x] **REQ-135:** Implement structured retry logging: when a retry occurs, log attempt number, max retries, delay, and the triggering error at an appropriate log level. *(Source: [PS])*
  - Depends on: REQ-074
  - Definition of Done: A retried request produces a structured log entry containing all four fields.

- [x] **REQ-136:** Implement `ContextTracker`: combine provider-reported token counts (from `Usage`) with local `estimate_tokens` for messages appended since the last provider report. Expose `current_tokens() -> usize`. *(Source: [AR])*
  - Depends on: REQ-054, REQ-055
  - Definition of Done: After a turn with known provider-reported usage, `current_tokens()` reflects the reported value; after additional messages are appended, it adds heuristic estimates.

- [x] **REQ-137:** Populate `ToolResult.details` with structured metadata per tool: `BashTool` → `{ exit_code, success }`; `ReadFileTool` → `{ path }`; `WriteFileTool` → `{ path }`; `EditFileTool` → `{ path, old_lines, new_lines }`; `ListFilesTool` → `{ total, truncated }`; `SubAgentTool` → `{ sub_agent, turns }`. *(Source: [AR])*
  - Depends on: REQ-047 through REQ-052
  - Definition of Done: `ToolResult.details` for a bash execution contains `exit_code` and `success` keys.

---

### Milestone 4.6 — Security

- [x] **REQ-138:** Redact sensitive `OpenApiAuth` credentials in debug output: `Bearer(token)` displays as `Bearer("****")`; `ApiKey { value }` displays as `ApiKey { header: "...", value: "****" }`. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: Printing/logging an `OpenApiAuth::Bearer("secret")` value produces `"****"` instead of the actual token.

- [x] **REQ-139:** Implement the complete `BashTool` deny-pattern list (configurable; default list to be specified at implementation time based on the safety policy described in the spec). *(Source: [PS])*
  - Depends on: REQ-094
  - Definition of Done: A configurable list of deny patterns is applied; at least the patterns documented in the spec are included in the default list.
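
One way to satisfy the redaction requirement in REQ-138 is a hand-written `Debug` implementation, sketched below. The enum shape (a `header` and `value` on `ApiKey`) is inferred from the display format quoted in the requirement. Any other variants the real type may have are omitted.

```rust
use std::fmt;

enum OpenApiAuth {
    Bearer(String),
    ApiKey { header: String, value: String },
}

/// Manual `Debug` so secrets never reach logs; only the header name is shown.
impl fmt::Debug for OpenApiAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OpenApiAuth::Bearer(_) => f.debug_tuple("Bearer").field(&"****").finish(),
            OpenApiAuth::ApiKey { header, .. } => f
                .debug_struct("ApiKey")
                .field("header", header)
                .field("value", &"****")
                .finish(),
        }
    }
}
```

Deriving `Debug` would print the token verbatim, which is why the implementation is written out by hand.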

---

### Milestone 4.7 — Graceful Cancellation

- [x] **REQ-140:** Implement `CancellationToken::child_token()`: creates a new token that is cancelled when the parent is cancelled. Each `ToolContext` receives a child token. *(Source: [PS])*
  - Depends on: REQ-033, REQ-046
  - Definition of Done: Calling `agent.abort()` (which cancels the root token) causes all active tool contexts' `cancel.is_cancelled()` to return `true` simultaneously.

- [x] **REQ-141:** `SubAgentTool` forwards the parent's cancel token to the child `agent_loop()`, so `agent.abort()` terminates sub-agents as well. *(Source: [PS])*
  - Depends on: REQ-033, REQ-140
  - Definition of Done: Aborting the parent agent cancels the sub-agent's run.
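
The parent-to-child propagation in REQ-140 and REQ-141 can be illustrated with a small polling-only token built on the standard library. This is a stand-in for explanation, not the crate's `CancellationToken`: it has no async waiting, and `is_cancelled` walks the parent chain on every call.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct Inner {
    cancelled: AtomicBool,
    parent: Option<Arc<Inner>>,
}

/// Polling-only cancellation token; clones share the same flag.
#[derive(Clone)]
struct CancellationToken(Arc<Inner>);

impl CancellationToken {
    fn new() -> Self {
        Self(Arc::new(Inner { cancelled: AtomicBool::new(false), parent: None }))
    }

    /// A child has its own flag but also observes every ancestor's flag.
    fn child_token(&self) -> Self {
        Self(Arc::new(Inner {
            cancelled: AtomicBool::new(false),
            parent: Some(Arc::clone(&self.0)),
        }))
    }

    fn cancel(&self) {
        self.0.cancelled.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    fn is_cancelled(&self) -> bool {
        let mut node = Some(&self.0);
        while let Some(inner) = node {
            if inner.cancelled.load(Ordering::SeqCst) {
                return true;
            }
            node = inner.parent.as_ref();
        }
        false
    }
}
```

Cancelling a child leaves the root and its siblings untouched, while cancelling the root is visible to every descendant. That one-way propagation is what lets `agent.abort()` reach tool contexts and sub-agents.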

---

### Milestone 4.8 — Callbacks and Advanced Configuration

- [x] **REQ-142:** Implement `on_update` callback in `ToolContext`: when called, emits `AgentEvent::ToolExecutionUpdate { tool_call_id, tool_name, partial_result }` to the event channel. *(Source: [AR])*
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls `ctx.on_update(partial)` causes `ToolExecutionUpdate` events to appear in the stream before `ToolExecutionEnd`.

- [x] **REQ-143:** Implement `on_progress` callback in `ToolContext`: when called, emits `AgentEvent::ProgressMessage { tool_call_id, tool_name, text }`. *(Source: [AR])*
  - Depends on: REQ-007, REQ-046
  - Definition of Done: A tool that calls `ctx.on_progress("working...")` causes a `ProgressMessage` event in the stream.

- [x] **REQ-144:** Implement `Agent::prompt_with_sender(text, tx)`: like `prompt`, but streams events to a caller-provided sender rather than creating a new channel. *(Source: [AR])*
  - Depends on: REQ-034
  - Definition of Done: Events are sent to the provided `tx`; the caller can multiplex one sender across multiple prompts.

- [x] **REQ-145:** Implement `transform_context` and `convert_to_llm` optional hooks on `AgentLoopConfig`. When set, `stream_assistant_response` calls them to preprocess messages before building `StreamConfig`. *(Source: [PS])*
  - Depends on: REQ-039
  - Definition of Done: A `transform_context` hook that adds a prefix message causes that message to appear in every LLM call.

- [x] **REQ-146:** Implement `Agent::with_compaction_strategy(strategy)` builder; when set, use the custom `CompactionStrategy` instead of the default tiered cascade. *(Source: [AR])*
  - Depends on: REQ-023, REQ-060
  - Definition of Done: A custom strategy that always returns an empty list causes the LLM to be called with no history.

- [x] **REQ-147:** Define `ModelConfig` struct with fields: `base_url: Option<String>`, `headers: Map<String,String>`, `max_tokens_field: String` (default `"max_tokens"`), `supports_developer_role: bool`, `supports_reasoning_effort: bool`. Apply in `OpenAiCompatProvider`. *(Source: [AR])*
  - Depends on: REQ-041
  - Definition of Done: Setting `max_tokens_field: "max_completion_tokens"` causes the OpenAI provider to use that key in the request body.
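
As an illustration of REQ-147, the sketch below shows how a configurable `max_tokens_field` could select the request-body key. The field list comes from the requirement; `BTreeMap` as the map type, the `false` defaults for the two booleans, and the `token_limit_entry` helper are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Sketch of `ModelConfig`. `BTreeMap` stands in for the unspecified `Map` type.
#[derive(Debug, Clone)]
pub struct ModelConfig {
    pub base_url: Option<String>,
    pub headers: BTreeMap<String, String>,
    pub max_tokens_field: String,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
}

impl Default for ModelConfig {
    fn default() -> Self {
        ModelConfig {
            base_url: None,
            headers: BTreeMap::new(),
            // Default key named in REQ-147.
            max_tokens_field: "max_tokens".to_string(),
            // Assumed defaults; the requirement does not state them.
            supports_developer_role: false,
            supports_reasoning_effort: false,
        }
    }
}

/// Hypothetical helper: the key/value pair a provider would write into
/// the request body for the token limit.
pub fn token_limit_entry(config: &ModelConfig, limit: u32) -> (String, u32) {
    (config.max_tokens_field.clone(), limit)
}
```

A provider that builds its JSON body from such pairs would emit `max_completion_tokens` instead of `max_tokens` simply by changing the config, without a provider-specific code path.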

---

### Milestone 4.9 — Agent Identity and Event Hook Observability

- [x] **REQ-180:** Define `ContinuationKind` enum in `types.rs` with three variants: `Default` (unspecified continuation), `Rerun { tag: String }` (retry from equivalent context), `Branch { tag: String }` (different execution path). Tags are RFC 3339 UTC timestamps auto-generated at call time by the caller. *(Source: [AR])*
  - Depends on: —
  - Definition of Done: All three variants instantiate; `Rerun { tag }` and `Branch { tag }` round-trip through JSON serialization preserving the tag string.

- [x] **REQ-181:** Define `TurnTrigger` enum in `types.rs` with four variants: `User` (first turn of origin call), `SubAgent` (sub-agent invocation), `Continuation` (subsequent turns, tool round-trips, steering, Default/Rerun continuations), `Branch` (first turn of a Branch continuation). Add `triggered_by: TurnTrigger` field to `AgentEvent::TurnStart`. *(Source: [AR])*
  - Depends on: REQ-007
  - Definition of Done: `TurnStart` events carry the correct `triggered_by` value: origin calls emit `User` on turn 0; sub-agent invocations emit `SubAgent` on turn 0; Branch continuations emit `Branch` on turn 0; all other first turns and all subsequent turns emit `Continuation`.

- [x] **REQ-182:** Add `before_loop: Option<BeforeLoopFn>` and `after_loop: Option<AfterLoopFn>` to `AgentLoopConfig`. `BeforeLoopFn` fires before `AgentStart` — return `false` to abort the loop (emit `AgentEnd { messages: [] }` instead). `AfterLoopFn` fires after `AgentEnd` with the new messages and accumulated usage. Both are wired in `agent_loop` and `agent_loop_continue`. *(Source: [AR])*
  - Depends on: REQ-036, REQ-037
  - Definition of Done: A `before_loop` returning `false` stops the run before `AgentStart`; `after_loop` is called exactly once per loop call, after `AgentEnd`, with correct message and usage values.

- [x] **REQ-183:** Add `before_tool_execution: Option<BeforeToolExecutionFn>` and `after_tool_execution: Option<AfterToolExecutionFn>` to `AgentLoopConfig`. `BeforeToolExecutionFn` fires before `ToolExecutionStart` — return `false` to skip the tool (emit skipped error result). `AfterToolExecutionFn` fires after `ToolExecutionEnd`. *(Source: [AR])*
  - Depends on: REQ-046
  - Definition of Done: A `before_tool_execution` returning `false` for one tool causes that tool to be skipped with an error result; other tools in the same batch are unaffected. `after_tool_execution` is called exactly once per tool call.

- [x] **REQ-184:** Add `before_tool_execution_update: Option<BeforeToolExecutionUpdateFn>` and `after_tool_execution_update: Option<AfterToolExecutionUpdateFn>` to `AgentLoopConfig`. `BeforeToolExecutionUpdateFn` fires before each `ToolExecutionUpdate` — return `false` to suppress the event (tool keeps running, final `ToolResult` unaffected). `AfterToolExecutionUpdateFn` fires after the event when not suppressed. *(Source: [AR])*
  - Depends on: REQ-142
  - Definition of Done: Suppressing an update via `before_tool_execution_update` causes no `ToolExecutionUpdate` event to be emitted; `after_tool_execution_update` is not called for suppressed updates.

- [x] **REQ-185:** Enforce and document the event hook ordering invariant: `before_loop → AgentStart … before_turn → TurnStart … before_tool_execution → ToolExecutionStart … (before_tool_execution_update → ToolExecutionUpdate → after_tool_execution_update)* … ToolExecutionEnd → after_tool_execution … TurnEnd → after_turn … AgentEnd → after_loop`. No hook may fire out of this sequence. *(Source: [AR])*
  - Depends on: REQ-182, REQ-183, REQ-184
  - Definition of Done: An integration test with all hooks registered verifies they fire in the documented order for a multi-turn, multi-tool run.

- [x] **REQ-186:** Add `fn provider_id(&self) -> &str` as a required method on the `StreamProvider` trait (`src/provider/traits.rs`). Implement in all 7 providers: `"anthropic"`, `"openai"`, `"openai_responses"`, `"azure_openai"`, `"google"`, `"google_vertex"`, `"bedrock"`. The `MockProvider` returns `"mock"`. *(Source: [AR])*
  - Depends on: REQ-020
  - Definition of Done: All 8 `StreamProvider` implementations compile with `provider_id()` returning the documented string; existing tests pass unchanged.

- [x] **REQ-187:** Add `config_id: Option<String>` field to `AgentLoopConfig`. When `None`, `Agent::next_loop_id()` auto-derives the effective config ID as `"{provider_id}.{model_slug}[.thinking]"`. When `Some`, the supplied value is used verbatim. Used as the middle segment of `loop_id`: `"{session_id}.{config_id}.{N}"`. *(Source: [AR])*
  - Depends on: REQ-029, REQ-186
  - Definition of Done: Setting `config_id: Some("my-config")` causes `loop_id` to include `"my-config"` as its middle segment; leaving `None` produces an auto-derived segment from provider + model.

- [x] **REQ-188:** Add `agent_id: String` and `session_id: String` fields to `Agent` struct, both initialized to UUID v4 in `Agent::new()`. These are stable for the lifetime of the `Agent` instance and injected into every `AgentContext` built by `Agent::prompt_*` and `continue_loop_*`. *(Source: [AR])*
  - Depends on: REQ-024
  - Definition of Done: All `AgentStart` events emitted by a single `Agent` instance share the same `agent_id` and `session_id` values across multiple `prompt()` calls.

- [x] **REQ-189:** Add `loop_counters: HashMap<String, usize>` and `last_loop_id: Option<String>` to `Agent`. Implement `Agent::next_loop_id(config) -> String`: compute `effective_config_id` from `config.config_id` or auto-derivation; increment the per-`"{session_id}.{effective_config_id}"` counter; return `"{session_id}.{effective_config_id}.{N}"`. Set `last_loop_id` after each `prompt_*` / `continue_loop_*` call. *(Source: [AR])*
  - Depends on: REQ-187, REQ-188
  - Definition of Done: Two `agent_loop` calls on the same agent with the same provider/model produce `loop_id` values ending in `.1` and `.2` respectively; different configs produce independent counters (both `.1`).

- [x] **REQ-190:** Add `agent_id`, `session_id`, `loop_id`, `parent_loop_id`, and `continuation_kind` fields to `AgentContext`. In `agent_loop`, generate and write back `agent_id`/`session_id`/`loop_id` if `None` at entry. `parent_loop_id` and `continuation_kind` remain whatever the caller set. *(Source: [AR])*
  - Depends on: REQ-028, REQ-180, REQ-189
  - Definition of Done: After `agent_loop` returns, `context.agent_id`, `context.session_id`, and `context.loop_id` are all `Some`; a subsequent `agent_loop_continue` on the same context can read them without regenerating.

- [x] **REQ-191:** In `agent_loop_continue`, assert `context.agent_id.is_some()` and `context.session_id.is_some()` with descriptive panic messages. Do not silently generate new UUIDs. *(Source: [AR])*
  - Depends on: REQ-037, REQ-190
  - Definition of Done: Calling `agent_loop_continue` with `agent_id: None` panics with a message referencing "agent_loop_continue requires context.agent_id to be set"; with both fields `Some`, the assertion passes.

- [x] **REQ-192:** Add `agent_id: String`, `session_id: String`, `loop_id: String`, `parent_loop_id: Option<String>`, and `continuation_kind: Option<ContinuationKind>` to `AgentEvent::AgentStart`. Emit these fields from both `agent_loop` and `agent_loop_continue`. `parent_loop_id` is `None` for origin calls; `continuation_kind` is `None` for origin calls and `Some(...)` for continuations. *(Source: [AR])*
  - Depends on: REQ-007, REQ-180, REQ-190, REQ-191
  - Definition of Done: `AgentStart` events from `agent_loop` have `parent_loop_id: None` and `continuation_kind: None`; events from `agent_loop_continue` carry the values set on `AgentContext`.

- [x] **REQ-193:** In `run_loop`, determine `TurnTrigger` for the first turn based on `context.continuation_kind`: `Some(Branch { .. })` → `TurnTrigger::Branch`; any other `Some(..)` → `TurnTrigger::Continuation`; `None` → `config.first_turn_trigger` (default `User`; `SubAgent` for sub-agent callers). All subsequent turns use `TurnTrigger::Continuation`. Emit `triggered_by` in `AgentEvent::TurnStart`. *(Source: [AR])*
  - Depends on: REQ-038, REQ-181
  - Definition of Done: A `Branch` continuation emits `TurnTrigger::Branch` on turn 0 and `TurnTrigger::Continuation` on all subsequent turns; a `Default` continuation emits `TurnTrigger::Continuation` on all turns.

- [x] **REQ-194:** Add `child_loop_id: Option<String>` to both `ToolResult` and `AgentEvent::ToolExecutionEnd`. Sub-agent tools set `ToolResult.child_loop_id` to the child loop's `loop_id` after `agent_loop` completes. `execute_single_tool` propagates `result.child_loop_id` into `ToolExecutionEnd`. Non-sub-agent tools leave both fields `None`. *(Source: [AR])*
  - Depends on: REQ-010, REQ-046, REQ-148, REQ-190
  - Definition of Done: A `ToolExecutionEnd` event from a `SubAgentTool` call carries a non-`None` `child_loop_id`; the same `loop_id` appears in the child's `AgentStart` event.

- [x] **REQ-195:** Add `SubAgentTool::with_parent_loop_id(loop_id: String)` builder method. When set, the child `AgentContext` built inside `execute()` has `parent_loop_id: Some(loop_id)`. The child's `AgentStart` event thus carries `parent_loop_id`, enabling ancestry tracing from child back to parent. *(Source: [AR])*
  - Depends on: REQ-148, REQ-190
  - Definition of Done: A sub-agent tool configured with `with_parent_loop_id("parent.loop.1")` emits a child `AgentStart` event with `parent_loop_id: Some("parent.loop.1")`.
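
The `loop_id` scheme from REQ-187 and REQ-189 can be sketched as follows. `LoopIds` is a hypothetical holder for the relevant `Agent` fields, and `next_loop_id` takes the provider, model slug, and thinking flag as plain arguments rather than a full `AgentLoopConfig`; both simplifications are assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the `Agent` fields involved in loop-id generation.
pub struct LoopIds {
    session_id: String,
    loop_counters: HashMap<String, usize>,
    last_loop_id: Option<String>,
}

impl LoopIds {
    pub fn new(session_id: &str) -> Self {
        LoopIds {
            session_id: session_id.to_string(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Returns "{session_id}.{effective_config_id}.{N}", where N counts
    /// calls per (session, effective config) pair, starting at 1.
    pub fn next_loop_id(
        &mut self,
        config_id: Option<&str>,
        provider_id: &str,
        model_slug: &str,
        thinking: bool,
    ) -> String {
        // An explicit config_id is used verbatim; otherwise derive
        // "{provider_id}.{model_slug}[.thinking]".
        let effective = match config_id {
            Some(id) => id.to_string(),
            None => {
                let mut derived = format!("{}.{}", provider_id, model_slug);
                if thinking {
                    derived.push_str(".thinking");
                }
                derived
            }
        };
        let key = format!("{}.{}", self.session_id, effective);
        let count = {
            let n = self.loop_counters.entry(key.clone()).or_insert(0);
            *n += 1;
            *n
        };
        let loop_id = format!("{}.{}", key, count);
        self.last_loop_id = Some(loop_id.clone());
        loop_id
    }

    /// The id returned by the most recent `next_loop_id` call, if any.
    pub fn last_loop_id(&self) -> Option<&str> {
        self.last_loop_id.as_deref()
    }
}
```

Keying the counter on the full `"{session_id}.{effective_config_id}"` prefix is what gives different configs independent counters, as the Definition of Done for REQ-189 requires.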

---

### Milestone 4.10 — Evaluational Parallelism

- [x] **REQ-196:** Migrate `AgentContext.tools` from `Vec<Box<dyn AgentTool>>` to `Vec<Arc<dyn AgentTool>>`. Add `#[derive(Clone)]` to `AgentContext`. Update `Agent::set_tools`, `BasicAgent::with_tools`, `default_tools()` return type, and all push sites in `BasicAgent` (sub-agent, openapi, mcp). Remove `ArcToolWrapper` from `sub_agent.rs`. *(Implemented)*
  - Depends on: REQ-028, REQ-046
  - Definition of Done: `AgentContext: Clone`; all existing tests pass; `ArcToolWrapper` deleted.

- [x] **REQ-197:** Add `Usage::combine(&self, other: &Usage) -> Usage` method for summing usage across branches. *(Implemented)*
  - Depends on: —
  - Definition of Done: `usage_a.combine(&usage_b)` returns a `Usage` with all fields summed.

- [x] **REQ-198:** Add `ParallelLoopOutcome` and `ParallelLoopResult` structs to `types.rs`. Add `AgentEvent::ParallelLoopStart { session_id, loop_ids, timestamp }` and `AgentEvent::ParallelLoopEnd { session_id, selected_loop_id, selected_config_index, evaluation_usage, timestamp }` variants to `AgentEvent`. *(Implemented)*
  - Depends on: REQ-190, REQ-197
  - Definition of Done: Both structs construct and the enum variants match correctly.

- [x] **REQ-199:** Define `EvaluationDecision` enum and `EvaluationStrategy` trait in `types.rs`. Trait method: `evaluate(prompts, outcomes, tx, cancel) -> (EvaluationDecision, Usage)`. Placed in `types.rs` (not `evaluation.rs`) to avoid a circular dependency with `agent_loop.rs`. *(Implemented)*
  - Depends on: REQ-198
  - Definition of Done: Custom implementations compile by importing from `crate::types` or `crate::evaluation`.

- [x] **REQ-200:** Create `src/agent_loop/evaluation.rs` with five built-in `EvaluationStrategy` implementations: `TransparentEvaluation` (single-branch pass-through), `PickFirstEvaluation` (always index 0), `TokenEfficientEvaluation` (lowest `total_tokens`), `ElaborateEvaluation` (highest `total_tokens`), `LlmJudgeEvaluation { judge_config, system_prompt }`. *(Implemented)*
  - Depends on: REQ-199
  - Definition of Done: All five strategies implement `EvaluationStrategy`; unit tests pass for each.

- [x] **REQ-201:** `LlmJudgeEvaluation` — judge prompt construction: extract original query text from user messages in `prompts` only; extract final assistant text from each branch's `new_messages` (strip tool calls, tool results, intermediate turns). Build numbered judge prompt; run `agent_loop` with `judge_config`; parse first integer from reply; inherit `session_id` from branches for traceability. *(Implemented)*
  - Depends on: REQ-200
  - Definition of Done: Judge receives clean final responses, not raw tool traces; judge `AgentStart` has same `session_id` as branches.

- [x] **REQ-202:** `LlmJudgeEvaluation` — judge's comprehension criteria: all N branch final responses must fit in the judge model's context budget simultaneously. Apply iterative multi-tier compaction: tier 1 (last 80 lines), tier 2 (first+last paragraph), tier 3 (hard char limit derived from budget / N). Budget derives from `judge_config.context_config.max_context_tokens` (if set). Emit `AgentEvent::ProgressMessage` warning if criteria cannot be satisfied after tier 3. Selected winner always returns the original uncompacted messages. *(Implemented)*
  - Depends on: REQ-201
  - Definition of Done: With a tight `context_config.max_context_tokens`, compaction fires and a warning is emitted; selected output is the original branch content.

- [x] **REQ-203:** Add `derive_config_segment(config: &AgentLoopConfig) -> String` helper (pub crate) and `run_parallel_branches(...)` internal async function to `agent_loop.rs`. Add `agent_loop_parallel(prompts, base_context, configs, strategy, tx, cancel) -> ParallelLoopResult` public async function. Uses `futures::future::join_all` for branch concurrency (avoids `'static` bound on `AgentLoopConfig` hooks). Per-branch forwarder task (`tokio::spawn`) captures usage from `AgentEnd`. *(Implemented)*
  - Depends on: REQ-196, REQ-199
  - Definition of Done: `agent_loop_parallel` with 2 configs runs both branches, emits `ParallelLoopStart`/`ParallelLoopEnd`, and returns correct `selected_index`.

- [x] **REQ-204:** Export `evaluation` module from `lib.rs`; re-export `agent_loop_parallel` and all five evaluation strategies at crate root. *(Implemented)*
  - Depends on: REQ-200, REQ-203
  - Definition of Done: `use phi_core::{agent_loop_parallel, PickFirstEvaluation, LlmJudgeEvaluation}` compiles.

- [x] **REQ-205:** `agent_loop_parallel` routes to `agent_loop_continue` when `prompts` is empty. *(Implemented)*
  - Depends on: REQ-203
  - Definition of Done: Calling `agent_loop_parallel(vec![], ctx_with_user_msg, ...)` dispatches each branch via `agent_loop_continue` and returns a valid `ParallelLoopResult`.

- [x] **REQ-206:** Add `original_context_len: usize` to `ParallelLoopOutcome`. *(Implemented)*
  - Depends on: REQ-198, REQ-205
  - Definition of Done: `outcome.context.messages[..outcome.original_context_len]` is the shared base context; `[original_context_len..]` are branch-produced messages.

- [x] **REQ-207:** `LlmJudgeEvaluation` extracts prior conversation context and query from `context.messages[..original_context_len]` in `agent_loop_continue` mode; includes formatted prior-context transcript in judge prompt. *(Implemented)*
  - Depends on: REQ-201, REQ-206
  - Definition of Done: When `prompts` is empty, the judge prompt contains `"Prior conversation context:"` and `"Original query:"` sections derived from the original context.

- [x] **REQ-208:** Replace single-pass output compaction with 2-iteration `compact_for_judge`: Iteration 1 compacts prior context only (outputs intact); Iteration 2 compacts both independently. *(Implemented)*
  - Depends on: REQ-202, REQ-207
  - Definition of Done: Under a tight token budget, outputs remain uncompacted as long as prior-context compaction alone can satisfy the criteria.

- [x] **REQ-209:** Update `build_judge_user_message` to include an optional prior-context section before the query. *(Implemented)*
  - Depends on: REQ-207
  - Definition of Done: Judge prompt includes `"Prior conversation context:\n<transcript>"` when prior context is non-empty; omitted when empty (fresh-session case).
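
The selection logic behind several requirements in this milestone is small enough to sketch synchronously. The example below assumes a three-field `Usage` (only `total_tokens` is named in the requirements) and uses hypothetical free functions in place of the `EvaluationStrategy` trait, whose real signature is async and also takes `prompts`, `tx`, and `cancel`.

```rust
/// Assumed shape of `Usage`; only `total_tokens` is named in the requirements.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: u64,
}

impl Usage {
    /// REQ-197: field-wise sum of two usage records.
    pub fn combine(&self, other: &Usage) -> Usage {
        Usage {
            input_tokens: self.input_tokens + other.input_tokens,
            output_tokens: self.output_tokens + other.output_tokens,
            total_tokens: self.total_tokens + other.total_tokens,
        }
    }
}

/// Index of the branch with the lowest `total_tokens`
/// (on a tie, `min_by_key` keeps the first such branch).
pub fn pick_token_efficient(branches: &[Usage]) -> Option<usize> {
    branches
        .iter()
        .enumerate()
        .min_by_key(|(_, u)| u.total_tokens)
        .map(|(i, _)| i)
}

/// Index of the branch with the highest `total_tokens`
/// (on a tie, `max_by_key` keeps the last such branch).
pub fn pick_elaborate(branches: &[Usage]) -> Option<usize> {
    branches
        .iter()
        .enumerate()
        .max_by_key(|(_, u)| u.total_tokens)
        .map(|(i, _)| i)
}

/// REQ-201: parse the first run of ASCII digits in the judge's reply.
pub fn first_integer(reply: &str) -> Option<usize> {
    let digits: String = reply
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}
```

How the parsed integer maps onto a branch index (for example, whether the judge prompt numbers responses from 1) is not stated in the requirements, so the sketch returns the raw value and leaves that mapping to the caller.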

***

## Level 5 — Creative
> **Goal:** The system surpasses the original. Sub-agent delegation,
> OpenAPI tool generation, advanced Anthropic protocol features, and all
> documented ambiguities are resolved with principled design decisions.

**Completion Criteria:** `SubAgentTool` works end-to-end; the OpenAPI adapter
generates callable tools from a spec file; all `[AMBIGUOUS]` items have a
documented resolution; performance benchmarks for parallel tool execution
meet or exceed documented expectations.

---

### Milestone 4.11 — Persistent Session Layer

- [x] **REQ-210:** Add `loop_id: String` to all `AgentEvent` variants that lacked it (`AgentEnd`, `TurnStart`, `TurnEnd`, `MessageStart`, `MessageUpdate`, `MessageEnd`, `ToolExecutionStart`, `ToolExecutionUpdate`, `ToolExecutionEnd`, `ProgressMessage`, `InputRejected`). Add `Serialize, Deserialize` to `AgentEvent`, `ContinuationKind`, `TurnTrigger`, `StreamDelta`. Thread `loop_id` through all emission sites in `agent_loop.rs` and `evaluation.rs`. *(Source: [AR])*
  - Depends on: REQ-007, REQ-114
  - Definition of Done: All `AgentEvent` variants carry `loop_id`; events from interleaved parallel branches can be unambiguously attributed to the correct `LoopRecord`.

- [x] **REQ-211:** Define `Session`, `LoopRecord`, `LoopEvent`, and `LoopConfigSnapshot` types in `src/session/`. `Session` contains an ordered `Vec<LoopRecord>`; `LoopRecord` holds identity fields (`loop_id`, `session_id`, `agent_id`), timing, status, messages (from `AgentEnd.messages`), usage, events, and tree links (`children_loop_ids`, `parent_loop_id`). `LoopConfigSnapshot` stores `model`, `provider`, `config_id`. *(Source: [AR])*
  - Depends on: REQ-210
  - Definition of Done: All types serialize/deserialize (JSON round-trip lossless); `Session.total_usage()` sums `LoopRecord.usage` across all loops.

- [x] **REQ-212:** Define `ChildLoopRef` and `SpawnRef` for bidirectional cross-session sub-agent tracking. `ChildLoopRef` is stored in `LoopRecord.child_loop_refs` (parent → child); `SpawnRef` is stored in `Session.parent_spawn_ref` (child → parent). Both carry `tool_call_id`, `tool_name`, and cross-session ids. *(Source: [AR])*
  - Depends on: REQ-211
  - Definition of Done: A parent session's `LoopRecord.child_loop_refs` can be used to load and link the child session.

- [x] **REQ-213:** Define `ParallelGroupRecord` and implement `LoopStatus::Pending` pre-registration in `SessionRecorder`. When `ParallelLoopStart` arrives, pre-create `LoopRecord { status: Pending }` for each branch loop_id so the group is registered before `AgentStart` fires for each branch. `ParallelLoopEnd` retroactively sets `ParallelGroupRecord` on all branch records. *(Source: [AR])*
  - Depends on: REQ-211
  - Definition of Done: After a parallel loop completes, all branch `LoopRecord`s have `parallel_group` set; exactly one has `is_selected = true`.

- [x] **REQ-214:** Implement `SessionRecorder` with `PerSessionId` formation policy. `on_event(event)` routes events by `loop_id`: creates `Session` on first-seen `session_id` from `AgentStart`; closes `LoopRecord` on `AgentEnd`; appends bidirectional tree links; handles sub-agent `SpawnRef` enrichment from `ToolExecutionEnd.child_loop_id`. *(Source: [AR])*
  - Depends on: REQ-211, REQ-212, REQ-213
  - Definition of Done: `test_session_recorder_single_loop`, `test_session_recorder_continuation`, `test_session_recorder_bidirectional_tree`, `test_session_recorder_continuation_kind` all pass.

- [x] **REQ-215:** Add `BasicAgent::new_session()` and `check_and_rotate(threshold)` to `BasicAgent`. Add `last_active_at: Option<DateTime<Utc>>` field; update `prompt_messages_with_sender` to record it. `new_session()` rotates `session_id`, clears `loop_counters` and `last_loop_id`. *(Source: [AR])*
  - Depends on: REQ-214
  - Definition of Done: `test_basic_agent_new_session` and `test_basic_agent_check_and_rotate` pass.

- [x] **REQ-216:** Implement `save_session`, `load_session`, `list_session_ids` persistence API. File layout: `{dir}/{session_id}.json` (pretty-printed JSON, flat directory). `list_session_ids` returns ids sorted by modification time (newest first). *(Source: [AR])*
  - Depends on: REQ-211
  - Definition of Done: `test_session_save_load_roundtrip` and `test_session_list_ids` pass; saved files are valid, human-readable JSON.

- [x] **REQ-217:** Implement `load_sessions_for_agent` and `delete_session`. `load_sessions_for_agent` loads all sessions in `dir` and filters by `agent_id`. `delete_session` removes the file; returns `SessionError::NotFound` if absent. *(Source: [AR])*
  - Depends on: REQ-216
  - Definition of Done: `test_session_delete` passes; `load_sessions_for_agent` returns only sessions with the matching `agent_id`.

- [x] **REQ-218:** Implement `Session` tree navigation methods: `root_loops()`, `children_of(loop_id)`, `parallel_siblings(loop_id)`, `get_loop(loop_id)`. Export all public session types from `src/lib.rs`. *(Source: [AR])*
  - Depends on: REQ-211
  - Definition of Done: `test_session_recorder_parallel_group` and `test_session_recorder_bidirectional_tree` exercise all navigation methods; all assertions pass.

- [x] **REQ-219:** Write `docs/concepts/sessions.md` documenting: Overview, Session Formation (three modes), LoopRecord Anatomy (field table, `LoopStatus` lifecycle, `continuation_kind` classification, `LoopConfigSnapshot` rationale), Loop Tree Navigation, Cross-Session Sub-Agent Tracking, Parallel Evaluation Groups, `SessionRecorder` usage with code example, Persistence API, and 9 Design Decisions (each with decision / why / rejected alternative). *(Source: [AR])*
  - Depends on: REQ-211 – REQ-218
  - Definition of Done: `docs/concepts/sessions.md` exists; covers all listed sections; code examples are syntactically valid Rust.

- [x] **REQ-220:** Update `docs/specs/architecture.md`: add `SessionStore` component section, add `SessionStore` to dependency graph, update `AgentEvent` variant table to document `loop_id: String` on all applicable variants, add `Session`/`LoopRecord`/`SessionRecorder` data model entries, add `new_session()` / `check_and_rotate()` / `last_active_at` to BasicAgent interface table. Update `docs/specs/roadmap.md` with this milestone. *(Source: [AR])*
  - Depends on: REQ-219
  - Definition of Done: Both spec files updated; all new types and methods are documented.

- [x] **REQ-221:** Fix `SessionRecorder` `SpawnRef` enrichment to handle the case where the child session has already been moved to `completed` before the parent's `ToolExecutionEnd` fires. Currently, `ToolExecutionEnd` only searches `open_sessions` for the child session to enrich `parent_spawn_ref.tool_call_id` / `tool_name`; if `flush()` was called between `child AgentEnd` and the parent's `ToolExecutionEnd` (e.g. periodic batch checkpointing in production), the child session is in `completed` and the enrichment is silently skipped — leaving `tool_call_id: ""` and `tool_name: ""` on the `SpawnRef` permanently. Fix by also searching `completed` sessions in the enrichment step, or by deferring child-session promotion to `completed` until the parent loop also closes. *(Source: post-sprint review)*
  - Depends on: REQ-214
  - Definition of Done: A test demonstrates that calling `flush()` between `child AgentEnd` and `parent ToolExecutionEnd` still produces a fully-enriched `SpawnRef` on the child session.
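
The tree navigation methods in REQ-218 can be sketched over a reduced `LoopRecord` that keeps only the link fields from REQ-211. The return types (`Vec<&LoopRecord>`, `Option<&LoopRecord>`) and the `LoopRecord::new` constructor are assumptions for illustration; the real records also carry timing, status, messages, usage, and events.

```rust
/// Reduced `LoopRecord`: identity and tree links only.
#[derive(Debug, Clone)]
pub struct LoopRecord {
    pub loop_id: String,
    pub parent_loop_id: Option<String>,
    pub children_loop_ids: Vec<String>,
}

impl LoopRecord {
    /// Hypothetical convenience constructor for the example.
    pub fn new(loop_id: &str, parent: Option<&str>, children: &[&str]) -> Self {
        LoopRecord {
            loop_id: loop_id.to_string(),
            parent_loop_id: parent.map(|p| p.to_string()),
            children_loop_ids: children.iter().map(|c| c.to_string()).collect(),
        }
    }
}

/// Reduced `Session`: an ordered list of loop records.
#[derive(Debug, Default)]
pub struct Session {
    pub loops: Vec<LoopRecord>,
}

impl Session {
    /// Looks up a record by its loop id.
    pub fn get_loop(&self, loop_id: &str) -> Option<&LoopRecord> {
        self.loops.iter().find(|r| r.loop_id == loop_id)
    }

    /// Records with no parent, in recording order.
    pub fn root_loops(&self) -> Vec<&LoopRecord> {
        self.loops
            .iter()
            .filter(|r| r.parent_loop_id.is_none())
            .collect()
    }

    /// Resolves a record's `children_loop_ids` to records in this session.
    /// Ids that are not present in the session are skipped.
    pub fn children_of(&self, loop_id: &str) -> Vec<&LoopRecord> {
        match self.get_loop(loop_id) {
            Some(parent) => parent
                .children_loop_ids
                .iter()
                .filter_map(|id| self.get_loop(id))
                .collect(),
            None => Vec::new(),
        }
    }
}
```

Skipping unresolved child ids is a design choice of the sketch: children that live in another session are tracked through `ChildLoopRef` (REQ-212) rather than through `children_loop_ids`.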

---

### Milestone 5.1 — Sub-Agent Delegation

- [x] **REQ-148:** Implement `SubAgentTool::execute`: validate `params["task"]` is non-empty; build a fresh `AgentContext` (empty messages, own toolset); build `AgentLoopConfig` with `max_turns` guard (default 10), no steering/follow-ups, no input filters; spawn child `agent_loop`; await result; call `extract_final_text`. *(Source: [PS])*
  - Depends on: REQ-036, REQ-157
  - Definition of Done: A sub-agent tool registered on a parent agent completes a delegated task and returns the child agent's final text as a `ToolResult`.

- [x] **REQ-149:** Implement `extract_final_text(messages) -> String`: scan messages in reverse for the last `Assistant` message with `Text` content blocks; join and return them; fall back to `"(sub-agent produced no text output)"`. *(Source: [PS])*
  - Depends on: REQ-002
  - Definition of Done: `extract_final_text` returns the text of the last assistant message; an all-tool-call assistant message returns the fallback string.

- [x] **REQ-150:** Sub-agent event forwarding: spawn a task to consume child `AgentEvent`s and forward them to parent channel as `ToolExecutionUpdate` (for `MessageUpdate::Text`) and `ProgressMessage` (for child `ProgressMessage`) events. *(Source: [PS])*
  - Depends on: REQ-007, REQ-148
  - Definition of Done: Parent event stream includes `ToolExecutionUpdate` events showing the sub-agent's text generation in real time.

- [x] **REQ-151:** Implement `SubAgentTool` builder: `SubAgentTool::new(name, model_config).with_system_prompt(...).with_tools(...).with_max_turns(...).with_thinking(...)`. *(Source: [AR])*
  - Depends on: REQ-021, REQ-148
  - Definition of Done: A fully configured `SubAgentTool` can be added to a parent agent's tool list via `with_tools`.
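
The reverse scan described in REQ-149 might look like the sketch below. The `Message` and `Content` enums are simplified stand-ins for the crate's types, and joining text blocks with a newline is an assumption, since the requirement does not name a separator.

```rust
/// Simplified stand-ins for the crate's content and message types.
pub enum Content {
    Text(String),
    ToolCall { name: String },
}

pub enum Message {
    User(Vec<Content>),
    Assistant(Vec<Content>),
    ToolResult(String),
}

/// Fallback string quoted in REQ-149.
pub const NO_TEXT_FALLBACK: &str = "(sub-agent produced no text output)";

/// Returns the joined text blocks of the last assistant message that
/// contains any text, or the fallback when no such message exists.
pub fn extract_final_text(messages: &[Message]) -> String {
    for message in messages.iter().rev() {
        if let Message::Assistant(blocks) = message {
            let texts: Vec<&str> = blocks
                .iter()
                .filter_map(|block| match block {
                    Content::Text(text) => Some(text.as_str()),
                    _ => None,
                })
                .collect();
            if !texts.is_empty() {
                // Separator is an assumption; the requirement only says "join".
                return texts.join("\n");
            }
        }
    }
    NO_TEXT_FALLBACK.to_string()
}
```

Note that this reading skips an all-tool-call assistant message and keeps scanning backwards; if the intended policy is to inspect only the very last assistant message, the loop would return the fallback at that point instead.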

---

### Milestone 5.2 — OpenAPI Adapter (Feature-Gated)

- [x] **REQ-152:** Implement `OpenApiAdapter::from_str(spec, config, filter)`: auto-detect JSON vs YAML (first non-whitespace char `{` or `[` → JSON, else YAML); parse OpenAPI 3.x spec; resolve base URL; generate one `OpenApiToolAdapter` per matching operation. *(Source: [AR])*
  - Depends on: REQ-153, REQ-154, REQ-155, REQ-156
  - Definition of Done: A valid OpenAPI 3.x spec string (JSON and YAML both) produces one tool adapter per operation with an `operationId`.

- [x] **REQ-153:** Classify parameters: `path` → URL substitution with RFC 3986 percent-encoding; `query` → query string; `header` → request headers; `cookie` → skip with no error; `requestBody` (application/json only) → keyed as `"body"` (or `"_request_body"` on name collision). *(Source: [AR])*
  - Depends on: REQ-021
  - Definition of Done: Path parameters appear in the URL; query parameters appear in the query string; cookie parameters are silently ignored.

- [x] **REQ-154:** Implement the HTTP execution pipeline per tool call: validate params, substitute path params, build URL, chain query/header params, apply `OpenApiAuth`, apply `custom_headers`, optionally attach JSON body, send request, read body, truncate at `max_response_bytes` on a UTF-8 boundary, return `"{METHOD} {URL} → {STATUS}\n\n{BODY}"`. *(Source: [AR])*
  - Depends on: REQ-021
  - Definition of Done: A POST to a test endpoint with path, query, and body params produces the documented return format.

- [x] **REQ-155:** Implement `OperationFilter`: `All` (include everything with an `operationId`); `ByOperationId(ids)` (include only listed IDs); `ByTag(tags)` (include operations tagged with any listed tag); `ByPathPrefix(prefix)` (include operations whose path starts with prefix). Operations without `operationId` always emit a warning and are skipped. *(Source: [AR])*
  - Depends on: REQ-152
  - Definition of Done: Each filter variant correctly includes/excludes operations; an operation without `operationId` logs a warning and is excluded regardless of filter.

- [x] **REQ-156:** Apply optional `name_prefix` from `OpenApiConfig`: tool name becomes `"{prefix}__{operationId}"` when set. *(Source: [AR])*
  - Depends on: REQ-152
  - Definition of Done: With `name_prefix: Some("myapi")`, the tool for `operationId: "getUser"` is named `"myapi__getUser"`.

- [x] **REQ-157:** Implement `from_file(path, config, filter)` (async file read) and `from_url(url, config, filter)` (HTTP GET via HTTP client). *(Source: [AR])*
  - Depends on: REQ-152
  - Definition of Done: Both sources produce the same tool list as `from_str` does for the same spec content.

- [x] **REQ-158:** Implement the `with_openapi_file`, `with_openapi_url`, and `with_openapi_spec` builders on `Agent`. Gate the entire `openapi` module behind an `openapi` feature flag. *(Source: [AR])*
  - Depends on: REQ-026, REQ-157
  - Definition of Done: Without the `openapi` feature, the code compiles successfully without the adapter; with it, all three builders are available.
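
Several of the adapter rules above are pure string transformations and can be sketched without an HTTP client. All function names below are hypothetical; the behavior follows REQ-152 (format detection), REQ-153 (path substitution with percent-encoding), REQ-154 (truncation on a UTF-8 boundary), and REQ-156 (name prefix). The encoder keeps only the RFC 3986 unreserved characters, which is one conservative reading of "RFC 3986 percent-encoding".

```rust
#[derive(Debug, PartialEq)]
pub enum SpecFormat {
    Json,
    Yaml,
}

/// First non-whitespace char `{` or `[` means JSON, anything else YAML.
pub fn detect_format(spec: &str) -> SpecFormat {
    match spec.trim_start().chars().next() {
        Some('{') | Some('[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

/// Percent-encodes every byte outside the RFC 3986 unreserved set.
pub fn percent_encode_segment(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Replaces `{name}` in a path template with the encoded value.
pub fn substitute_path(template: &str, name: &str, value: &str) -> String {
    template.replace(&format!("{{{}}}", name), &percent_encode_segment(value))
}

/// Truncates to at most `max_bytes`, backing up to a char boundary.
pub fn truncate_utf8(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}

/// Applies the optional prefix as "{prefix}__{operationId}".
pub fn tool_name(prefix: Option<&str>, operation_id: &str) -> String {
    match prefix {
        Some(p) => format!("{}__{}", p, operation_id),
        None => operation_id.to_string(),
    }
}
```

Backing up to a character boundary means the truncated body can be shorter than `max_response_bytes` by up to three bytes, but it is always valid UTF-8.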

---

### Milestone 5.3 — Advanced Anthropic Protocol

- [x] **REQ-159:** Implement Anthropic OAuth auth path: when `model_config` indicates OAuth, use `Authorization: Bearer {TOKEN}` header plus beta headers `claude-code-20250219,oauth-2025-04-20,fine-grained-tool-streaming-2025-05-14`, `x-app: cli`, `anthropic-dangerous-direct-browser-access: true`, `user-agent: claude-cli/2.1.2`. *(Source: [AR])*
  - Depends on: REQ-040
  - Definition of Done: An OAuth-configured provider sends all documented headers; standard API key auth sends the standard `x-api-key` header.

- [x] **REQ-160:** Implement Anthropic `InputJsonDelta` tool-argument streaming: buffer incremental `InputJsonDelta` text fragments in `arguments["__partial_json"]`; parse the complete accumulated string as JSON on `content_block_stop`. *(Source: [AR])*
  - Depends on: REQ-040
  - Definition of Done: A tool call streamed in 5 `InputJsonDelta` fragments produces a single, complete, parseable JSON `arguments` object.
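
The buffering step in REQ-160 can be sketched as an accumulator that defers parsing until the block closes. `PartialArgs` is a hypothetical type: the requirement stores the buffer under `arguments["__partial_json"]`, whereas this sketch keeps it in a plain `String` and takes the JSON parser as a closure so the example stays free of external crates.

```rust
/// Hypothetical accumulator for streamed tool-argument fragments.
#[derive(Default)]
pub struct PartialArgs {
    buf: String,
}

impl PartialArgs {
    /// Appends one `InputJsonDelta` fragment. Fragments are not valid
    /// JSON on their own, so nothing is parsed here.
    pub fn push_delta(&mut self, fragment: &str) {
        self.buf.push_str(fragment);
    }

    /// The text accumulated so far.
    pub fn buffered(&self) -> &str {
        &self.buf
    }

    /// Called on `content_block_stop`: hands the complete string to the
    /// supplied parser exactly once.
    pub fn finish<T, E>(self, parse: impl FnOnce(&str) -> Result<T, E>) -> Result<T, E> {
        parse(&self.buf)
    }
}
```

In a real provider the closure would typically be a JSON deserializer; parsing once at the end avoids repeated failed parses on incomplete input.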

---

### Milestone 5.4 — Ambiguity Resolutions

- [x] **REQ-161:** [AMBIGUOUS] Standardize `AgentEnd` emission on abort: define and document whether `AgentEnd` is emitted when cancellation is detected at various checkpoints (start of loop, mid-stream, mid-tool). Implement a consistent policy. *(Source: [PS])*
  - Depends on: REQ-067, REQ-082
  - Definition of Done: The chosen policy is documented; behavior is consistent regardless of where in the loop cancellation is detected.

- [x] **REQ-162:** Define a `TokenCounter` trait in `context/token.rs` with `HeuristicTokenCounter` (chars/4) as the default implementation, pluggable via `ContextConfig.token_counter` and threaded through all hot-path call sites. *(Source: [OV])*
  - Depends on: REQ-054
  - Definition of Done: A `TokenCounter` trait or injection point exists; the default implementation uses the 4-char heuristic; a precise implementation can be substituted via configuration.

- [x] **REQ-163:** [AMBIGUOUS] Define sub-agent error propagation: document what `execute()` returns when the child `agent_loop` produces only error/empty messages. Implement the `extract_final_text` fallback consistently. *(Source: [PS])*
  - Depends on: REQ-149
  - Definition of Done: The policy is documented; child agent error messages are reflected in the fallback text or surfaced as `ToolError::Failed`.
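
The injection point in REQ-162 could be shaped like the sketch below. The method name `count`, the `Arc<dyn TokenCounter>` field type, rounding up in the heuristic, and the `WordTokenCounter` substitute are all assumptions for illustration; the requirement only fixes the trait name, the chars/4 heuristic, and the `ContextConfig.token_counter` field.

```rust
use std::sync::Arc;

/// Pluggable token estimator. The method name is an assumption.
pub trait TokenCounter: Send + Sync {
    fn count(&self, text: &str) -> usize;
}

/// Default heuristic: roughly one token per four characters.
pub struct HeuristicTokenCounter;

impl TokenCounter for HeuristicTokenCounter {
    fn count(&self, text: &str) -> usize {
        // Rounds up so short non-empty text never counts as zero
        // (the rounding direction is an assumption).
        (text.chars().count() + 3) / 4
    }
}

/// Hypothetical substitute: one token per whitespace-separated word.
pub struct WordTokenCounter;

impl TokenCounter for WordTokenCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Reduced `ContextConfig` carrying only the counter.
pub struct ContextConfig {
    pub token_counter: Arc<dyn TokenCounter>,
}

impl Default for ContextConfig {
    fn default() -> Self {
        ContextConfig {
            token_counter: Arc::new(HeuristicTokenCounter),
        }
    }
}
```

Holding the counter behind a shared pointer lets every hot-path call site use the same instance, so swapping in a precise tokenizer is a one-line configuration change.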

***

## Level 6 — Boss
> **Goal:** The system is exceptional. It is fully tested, scalable,
> developer-friendly, and operates as a platform with a clear public
> API contract and operational runbooks.

**Completion Criteria:** The system passes load tests at 10x expected
tool concurrency. Full test coverage includes unit, integration, property-based,
and end-to-end tests. Public API documentation is complete. Operational
runbooks cover all known failure modes.

---

### Milestone 6.1 — Full Test Suite

- [ ] **REQ-164:** Unit tests for all three compaction levels (`level1`, `level2`, `level3`) including: no-op when under budget; exact budget boundary; message count edge cases (fewer messages than `keep_recent`/`keep_first`); correct ordering of head+marker+tail in level 3. *(Source: [AR])*
  - Depends on: REQ-056 through REQ-059
  - Definition of Done: All edge cases identified above have dedicated test cases that pass.

- [ ] **REQ-165:** Property-based tests for `compact_messages`: for any valid `(messages, config)` input, `total_tokens(compact_messages(messages, config)) <= budget`. *(Source: [AR])*
  - Depends on: REQ-056
  - Definition of Done: 10,000 random test cases all satisfy the budget invariant without panic.

- [ ] **REQ-166:** Unit tests for `delay_for_attempt`: verify exponential growth; verify jitter stays in `[0.8, 1.2]` range over 10,000 samples; verify `max_delay_ms` cap is respected. *(Source: [AR])*
  - Depends on: REQ-071
  - Definition of Done: All three assertions pass across the full retry range.

- [ ] **REQ-167:** Integration tests for each of the 7 provider protocols using a mock HTTP server: correct request format, correct response parsing, correct `StopReason` mapping, correct tool-call extraction. *(Source: [AR])*
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has at least one happy-path integration test and one error-path test using a local mock server.

- [ ] **REQ-168:** Integration test for MCP stdio transport: spawn a minimal mock MCP server subprocess; verify initialize handshake, tool listing, and tool execution. *(Source: [AR])*
  - Depends on: REQ-114 through REQ-119
  - Definition of Done: The mock MCP server can be connected to, queried, and called; all three phases produce correct results.

- [ ] **REQ-169:** End-to-end agent loop tests using `MockProvider`: test single-turn text response; multi-turn tool call cycle; steering injection mid-run; follow-up queue; execution limit enforcement; context compaction trigger; input filter rejection. *(Source: [AR])*
  - Depends on: REQ-036 through REQ-090
  - Definition of Done: All seven scenarios have a passing automated test.
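
The budget invariant in REQ-165 can be exercised with a harness along these lines. This is a sketch under stated assumptions: `compact_messages` here is a stand-in that keeps the newest messages that fit (not the 3-tier cascade), messages are plain strings, and the tiny xorshift generator exists only to avoid external crates. A real suite would more likely use a property-testing crate such as `proptest`.

```rust
/// 4-chars-per-token heuristic, rounded up (rounding is an assumption).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn total_tokens(messages: &[String]) -> usize {
    messages.iter().map(|m| estimate_tokens(m)).sum()
}

/// Stand-in compaction: walk from newest to oldest, keep what fits in the budget.
fn compact_messages(messages: &[String], budget: usize) -> Vec<String> {
    let mut kept = Vec::new();
    let mut used = 0;
    for message in messages.iter().rev() {
        let tokens = estimate_tokens(message);
        if used + tokens > budget {
            break;
        }
        used += tokens;
        kept.push(message.clone());
    }
    kept.reverse(); // restore chronological order
    kept
}

/// Minimal xorshift64 pseudo-random generator (the state must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Runs `cases` random inputs and reports the first one that breaks the invariant.
fn check_budget_invariant(cases: usize, seed: u64) -> Result<(), String> {
    let mut rng = XorShift64(seed.max(1));
    for case in 0..cases {
        let count = rng.below(20) as usize;
        let messages: Vec<String> = (0..count)
            .map(|_| "x".repeat(rng.below(200) as usize))
            .collect();
        let budget = rng.below(500) as usize;
        let total = total_tokens(&compact_messages(&messages, budget));
        if total > budget {
            return Err(format!("case {case}: {total} tokens exceeds budget {budget}"));
        }
    }
    Ok(())
}
```

Returning the failing case as an `Err` string rather than panicking makes the counterexample easy to report from a single assertion.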

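The three properties REQ-166 asks for (exponential growth, jitter within `[0.8, 1.2]`, and the `max_delay_ms` cap) are easiest to test when the jitter factor is passed in instead of drawn inside the function. The sketch below assumes that design. The field names `initial_delay_ms` and `multiplier`, the extra `jitter` parameter, and applying the cap both before and after jitter are assumptions; only `delay_for_attempt`, `RetryConfig`, and `max_delay_ms` come from the requirements.

```rust
use std::time::Duration;

/// Reduced retry configuration; `initial_delay_ms` and `multiplier` are hypothetical names.
#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub initial_delay_ms: u64,
    pub multiplier: f64,
    pub max_delay_ms: u64,
}

/// Delay before retry number `attempt` (0-based).
/// `jitter` is a multiplicative factor; values outside [0.8, 1.2] are clamped.
pub fn delay_for_attempt(cfg: &RetryConfig, attempt: u32, jitter: f64) -> Duration {
    let jitter = jitter.clamp(0.8, 1.2);
    let exponent = i32::try_from(attempt).unwrap_or(i32::MAX);
    let max = cfg.max_delay_ms as f64;

    // Exponential growth: initial * multiplier^attempt, capped before jitter is applied.
    let base = (cfg.initial_delay_ms as f64 * cfg.multiplier.powi(exponent)).min(max);

    // Cap again after jitter so the result never exceeds `max_delay_ms`.
    let jittered = (base * jitter).min(max);
    Duration::from_millis(jittered.round() as u64)
}
```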
---

### Milestone 6.2 — Load and Scale Testing

- [ ] **REQ-170:** Load test: run 100 parallel agents each with 10 concurrent tool calls using `MockProvider`. Verify no data races, no deadlocks, correct result ordering, no memory leaks. *(Source: [AR])*
  - Depends on: REQ-045, REQ-085
  - Definition of Done: All 1,000 tool calls complete correctly with no panics, and tool results are returned in original call order.

- [ ] **REQ-171:** Load test: run a single agent for 1,000 turns with compaction enabled. Verify token estimates stay bounded; no unbounded memory growth; compaction fires when expected. *(Source: [AR])*
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Memory usage stabilizes after compaction; no message protected by the `keep_first`/`keep_recent` invariants is dropped.

- [ ] **REQ-172:** Memory profile: verify `Agent.messages` does not grow unboundedly in a long conversation with compaction enabled. *(Source: [AR])*
  - Depends on: REQ-056, REQ-060
  - Definition of Done: Message count stays within `keep_first + keep_recent + small_constant` after steady state is reached.

---

### Milestone 6.3 — Public API Contract and Documentation

- [ ] **REQ-173:** Publish complete API reference documentation for all public types, traits, and functions with usage examples for each primary use case from `../reference/glossary.md`. *(Source: [OV])*
  - Depends on: REQ-001 through REQ-163
  - Definition of Done: A developer with no prior context can build a working coding assistant and CLI REPL from the docs alone.

- [ ] **REQ-174:** Document all 7 provider integration contracts: authentication method, endpoint pattern, request format, response parsing notes, any quirks (e.g., Bedrock ndjson, Google tool ID generation, Azure `api-key` header). *(Source: [AR])*
  - Depends on: REQ-040 through REQ-042, REQ-120 through REQ-124
  - Definition of Done: Each provider has a documentation page listing all fields from the integration contract table.

- [ ] **REQ-175:** Write and publish working example implementations: (1) CLI REPL with `/quit`, `/clear`, `/model` commands; (2) coding assistant with all built-in tools; (3) multi-agent pipeline with `SubAgentTool`. *(Source: [OV])*
  - Depends on: REQ-053, REQ-148
  - Definition of Done: All three examples compile and run end-to-end; the CLI REPL handles all three slash commands.

- [ ] **REQ-176:** Publish AgentSkills standard compliance documentation and MCP integration guide. *(Source: [OV])*
  - Depends on: REQ-109 through REQ-113, REQ-114 through REQ-119
  - Definition of Done: Both guides include a "getting started" section that results in a working integration.

---

### Milestone 6.4 — Developer Tooling and Operational Readiness

- [ ] **REQ-177:** Package and publish the library with proper semantic versioning. The `openapi` feature is opt-in. Document all feature flags. *(Source: [AR])*
  - Depends on: REQ-158
  - Definition of Done: Library installs as a dependency; `openapi` feature is absent from the default build; enabling it adds the adapter without breaking existing code.

- [ ] **REQ-178:** CI pipeline: run unit tests, integration tests (with mock servers), and `openapi`-feature tests on every commit. Gate provider live tests behind API key secrets. *(Source: [AR])*
  - Depends on: REQ-164 through REQ-169
  - Definition of Done: CI passes on every commit; provider live tests run in a separate gated workflow.

- [ ] **REQ-179:** Operational runbook covering: retry tuning (when to adjust `RetryConfig`); context overflow handling (choosing `ContextConfig` values); provider failover (switching providers on persistent failures); MCP server crash recovery; performance profiling guide. *(Source: [AR])*
  - Depends on: REQ-071 through REQ-077
  - Definition of Done: The runbook covers all five topics with actionable decision trees.

***

## Requirement Index

| REQ | Description | Level | Milestone | Source | Depends On |
|-----|-------------|-------|-----------|--------|------------|
| REQ-001 | `Content` enum (Text, Image, Thinking, ToolCall) | 1 | 1.1 | [AR] | — |
| REQ-002 | `Message` enum (User, Assistant, ToolResult) | 1 | 1.1 | [AR] | REQ-001, REQ-005, REQ-006 |
| REQ-003 | `AgentMessage` enum (Llm, Extension) | 1 | 1.1 | [AR] | REQ-002, REQ-004 |
| REQ-004 | `ExtensionMessage` struct | 1 | 1.1 | [AR] | — |
| REQ-005 | `StopReason` enum | 1 | 1.1 | [AR] | — |
| REQ-006 | `Usage` struct with `cache_hit_rate()` | 1 | 1.1 | [AR] | — |
| REQ-007 | `AgentEvent` enum (all variants) | 1 | 1.1 | [AR] | REQ-002, REQ-008 |
| REQ-008 | `StreamDelta` enum | 1 | 1.1 | [AR] | — |
| REQ-009 | `ToolContext` struct | 1 | 1.1 | [AR] | — |
| REQ-010 | `ToolResult` and `ToolError` types | 1 | 1.1 | [AR] | REQ-001 |
| REQ-011 | `ContextConfig` struct with defaults | 1 | 1.1 | [AR] | — |
| REQ-012 | `ExecutionLimits` and `ExecutionTracker` | 1 | 1.1 | [AR] | — |
| REQ-013 | `RetryConfig` with defaults | 1 | 1.1 | [AR] | — |
| REQ-014 | `CacheConfig` and `CacheStrategy` | 1 | 1.1 | [AR] | — |
| REQ-015 | `StreamConfig` struct | 1 | 1.1 | [AR] | REQ-014, REQ-016 |
| REQ-016 | `ToolDefinition` struct | 1 | 1.1 | [AR] | — |
| REQ-017 | `QueueMode` enum | 1 | 1.1 | [AR] | — |
| REQ-018 | Full `Serialize`/`Deserialize` on the `AgentMessage` tree | 1 | 1.1 | [OV] | REQ-001–017 |
| REQ-019 | `ThinkingLevel` enum | 1 | 1.1 | [OV] | — |
| REQ-020 | `StreamProvider` trait and `ProviderError` enum | 1 | 1.2 | [AR] | REQ-002, REQ-015 |
| REQ-021 | `AgentTool` trait | 1 | 1.2 | [AR] | REQ-009, REQ-010 |
| REQ-022 | `InputFilter` trait | 1 | 1.2 | [OV] | — |
| REQ-023 | `CompactionStrategy` trait | 1 | 1.2 | [AR] | REQ-003, REQ-011 |
| REQ-024 | `Agent::new()` with all field defaults | 1 | 1.3 | [PS] | REQ-011–017, REQ-019–020 |
| REQ-025 | Builder methods: system_prompt, model, api_key, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-026 | Builder methods: tools, context_config, limits, etc. | 1 | 1.3 | [PS] | REQ-024 |
| REQ-027 | Steering/follow-up queues as `Arc<Mutex<Vec>>` | 1 | 1.3 | [AR] | REQ-003, REQ-024 |
| REQ-028 | `AgentContext` struct | 1 | 1.4 | [AR] | REQ-003, REQ-021 |
| REQ-029 | `AgentLoopConfig` struct | 1 | 1.4 | [OV] | REQ-011–017, REQ-023 |
| REQ-030 | `MockProvider` implementation | 1 | 1.5 | [AR] | REQ-020 |
| REQ-031 | Smoke test: Agent constructs without error | 1 | 1.5 | [OV] | REQ-024–030 |
| REQ-032 | Unbounded async event channel | 2 | 2.1 | [AR] | REQ-007 |
| REQ-033 | `CancellationToken` with `child_token` propagation | 2 | 2.1 | [AR] | — |
| REQ-034 | `Agent::prompt()` entry point | 2 | 2.2 | [PS] | REQ-002, REQ-035 |
| REQ-035 | `Agent::prompt_messages_with_sender()` | 2 | 2.2 | [PS] | REQ-027–029, REQ-033, REQ-036 |
| REQ-036 | `agent_loop()` implementation | 2 | 2.3 | [PS] | REQ-032, REQ-037 |
| REQ-037 | `agent_loop_continue()` implementation | 2 | 2.3 | [PS] | REQ-036 |
| REQ-038 | `run_loop()` inner loop (happy path) | 2 | 2.3 | [PS] | REQ-039, REQ-045, REQ-060 |
| REQ-039 | `stream_assistant_response()` (no retry) | 2 | 2.4 | [PS] | REQ-007–008, REQ-015, REQ-020, REQ-032 |
| REQ-040 | `AnthropicProvider::stream()` | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-041 | `OpenAiCompatProvider::stream()` | 2 | 2.4 | [AR] | REQ-020, REQ-039 |
| REQ-042 | `ProviderRegistry` with `default()` | 2 | 2.4 | [AR] | REQ-040, REQ-041 |
| REQ-043 | `StopReason` determination in providers | 2 | 2.4 | [PS] | REQ-005, REQ-040–041 |
| REQ-044 | Filter Extension messages before LLM call | 2 | 2.4 | [AR] | REQ-003, REQ-015 |
| REQ-045 | `execute_tool_calls()` (Parallel dispatch) | 2 | 2.5 | [PS] | REQ-046 |
| REQ-046 | `execute_single_tool()` | 2 | 2.5 | [PS] | REQ-007, REQ-009–010, REQ-021, REQ-033 |
| REQ-047 | `BashTool::execute()` (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-048 | `ReadFileTool::execute()` (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-049 | `WriteFileTool::execute()` | 2 | 2.5 | [AR] | REQ-010, REQ-021 |
| REQ-050 | `EditFileTool::execute()` (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-051 | `ListFilesTool::execute()` (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-052 | `SearchTool::execute()` (basic) | 2 | 2.5 | [PS] | REQ-010, REQ-021 |
| REQ-053 | `default_tools()` returning all 6 tools | 2 | 2.5 | [AR] | REQ-047–052 |
| REQ-054 | `estimate_tokens()` heuristic | 2 | 2.6 | [PS] | — |
| REQ-055 | `content_tokens()` and `message_tokens()` | 2 | 2.6 | [PS] | REQ-001, REQ-003, REQ-054 |
| REQ-056 | `compact_messages()` 3-tier cascade | 2 | 2.6 | [PS] | REQ-055, REQ-057–059 |
| REQ-057 | `level1_truncate_tool_outputs()` | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-058 | `level2_summarize_old_turns()` | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-059 | `level3_drop_middle()` and `keep_within_budget()` | 2 | 2.6 | [PS] | REQ-003, REQ-054 |
| REQ-060 | Integrate compaction in `run_loop` | 2 | 2.6 | [PS] | REQ-038, REQ-056 |
| REQ-061 | `ExecutionTracker::record_turn()` and `check_limits()` | 2 | 2.7 | [AR] | REQ-012 |
| REQ-062 | Execution limit enforcement in `run_loop` | 2 | 2.7 | [PS] | REQ-038, REQ-061 |
| REQ-063 | `Agent::save_messages()` | 2 | 2.8 | [OV] | REQ-018 |
| REQ-064 | `Agent::restore_messages()` | 2 | 2.8 | [OV] | REQ-018, REQ-063 |
| REQ-065 | `Agent::reset()` | 2 | 2.8 | [AR] | REQ-033 |
| REQ-066 | `Agent::steer()` and `Agent::follow_up()` | 2 | 2.8 | [AR] | REQ-027 |
| REQ-067 | `Agent::abort()` | 2 | 2.8 | [AR] | REQ-033, REQ-035 |
| REQ-068 | Input filter chain execution | 3 | 3.1 | [PS] | REQ-022, REQ-036 |
| REQ-069 | `Reject` → emit `InputRejected` + `AgentEnd([])` | 3 | 3.1 | [PS] | REQ-068 |
| REQ-070 | `Warn` → append warning text to last user message | 3 | 3.1 | [PS] | REQ-068 |
| REQ-071 | `delay_for_attempt()` exponential backoff with jitter | 3 | 3.2 | [PS] | REQ-013 |
| REQ-072 | `is_retryable()` on `ProviderError` | 3 | 3.2 | [AR] | REQ-020 |
| REQ-073 | `retry_after()` on `ProviderError` | 3 | 3.2 | [AR] | REQ-020 |
| REQ-074 | Retry loop in `stream_assistant_response` | 3 | 3.2 | [PS] | REQ-039, REQ-071–073 |
| REQ-075 | `ProviderError::classify()` HTTP status routing | 3 | 3.3 | [PS] | REQ-020 |
| REQ-076 | `is_context_overflow()` phrase matching | 3 | 3.3 | [PS] | — |
| REQ-077 | Context overflow recovery trigger | 3 | 3.3 | [AR] | REQ-056, REQ-075–076 |
| REQ-078 | `ToolError::Failed`/`InvalidArgs` → error ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-079 | `ToolError::NotFound` → "Tool X not found" | 3 | 3.4 | [PS] | REQ-046 |
| REQ-080 | `ToolError::Cancelled` → "Skipped" ToolResult | 3 | 3.4 | [AR] | REQ-010, REQ-046 |
| REQ-081 | Error stop reason handling in `run_loop` | 3 | 3.5 | [PS] | REQ-038, REQ-082 |
| REQ-082 | Aborted stop reason handling in `run_loop` | 3 | 3.5 | [PS] | REQ-038 |
| REQ-083 | Synthetic error `Message::Assistant` on provider failure | 3 | 3.5 | [PS] | REQ-002, REQ-039 |
| REQ-084 | `execute_sequential()` with steering check | 3 | 3.6 | [PS] | REQ-046, REQ-080 |
| REQ-085 | `execute_batch()` (Parallel) with post-batch steering | 3 | 3.6 | [PS] | REQ-046 |
| REQ-086 | `Batched { size }` dispatch with inter-batch steering | 3 | 3.6 | [PS] | REQ-085 |
| REQ-087 | Drain steering queue at start of outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-088 | Inject steering messages into `pending` after tools | 3 | 3.7 | [PS] | REQ-038, REQ-084–085 |
| REQ-089 | Follow-up queue check re-enters outer loop | 3 | 3.7 | [PS] | REQ-038 |
| REQ-090 | `QueueMode::OneAtATime` and `QueueMode::All` | 3 | 3.7 | [AR] | REQ-017, REQ-027 |
| REQ-091 | `before_turn` callback with abort-if-false | 3 | 3.8 | [PS] | REQ-038 |
| REQ-092 | `after_turn` callback on every turn | 3 | 3.8 | [PS] | REQ-038 |
| REQ-093 | `on_error` callback on Error stop reason | 3 | 3.8 | [PS] | REQ-081 |
| REQ-094 | `BashTool` deny patterns | 3 | 3.9 | [PS] | REQ-047 |
| REQ-095 | `BashTool` timeout + cancellation race | 3 | 3.9 | [PS] | REQ-047 |
| REQ-096 | `BashTool` output truncation | 3 | 3.9 | [PS] | REQ-047 |
| REQ-097 | `BashTool` `confirm_fn` callback | 3 | 3.9 | [PS] | REQ-047 |
| REQ-098 | `ReadFileTool` size limits (1MB text, 20MB image) | 3 | 3.9 | [PS] | REQ-048 |
| REQ-099 | `ReadFileTool` image path (base64, MIME detection) | 3 | 3.9 | [PS] | REQ-001, REQ-048 |
| REQ-100 | `ReadFileTool` cancellation check | 3 | 3.9 | [PS] | REQ-048 |
| REQ-101 | `EditFileTool` zero-match error with fuzzy hint | 3 | 3.9 | [PS] | REQ-050 |
| REQ-102 | `EditFileTool` multiple-match error | 3 | 3.9 | [PS] | REQ-050 |
| REQ-103 | `EditFileTool` cancellation check | 3 | 3.9 | [PS] | REQ-050 |
| REQ-104 | `WriteFileTool` cancellation check | 3 | 3.9 | [AR] | REQ-049 |
| REQ-105 | `ListFilesTool` timeout + max_results truncation | 3 | 3.9 | [PS] | REQ-051 |
| REQ-106 | `SearchTool` rg→grep fallback + cancellation | 3 | 3.9 | [PS] | REQ-052 |
| REQ-107 | `is_streaming` guard in `prompt_messages_with_sender` | 3 | 3.10 | [PS] | REQ-035 |
| REQ-108 | `agent_loop_continue` precondition validation | 3 | 3.10 | [PS] | REQ-037 |
| REQ-109 | `SkillSet::load()` with collision handling | 3 | 3.11 | [PS] | REQ-110 |
| REQ-110 | `parse_frontmatter()` with error variants | 3 | 3.11 | [PS] | — |
| REQ-111 | `SkillSet::format_for_prompt()` XML output | 3 | 3.11 | [PS] | REQ-109 |
| REQ-112 | `SkillSet::load_dir()` and `SkillSet::merge()` | 3 | 3.11 | [AR] | REQ-109 |
| REQ-113 | `Agent::with_skills()` builder | 3 | 3.11 | [PS] | REQ-111 |
| REQ-114 | `McpClient::connect_stdio()` with handshake | 3 | 3.12 | [PS] | REQ-115, REQ-116 |
| REQ-115 | `McpClient::send_request()` JSON-RPC 2.0 | 3 | 3.12 | [PS] | — |
| REQ-116 | `McpClient::list_tools()` and `call_tool()` | 3 | 3.12 | [PS] | REQ-115 |
| REQ-117 | `McpToolAdapter` implementing `AgentTool` | 3 | 3.12 | [AR] | REQ-001, REQ-021, REQ-116 |
| REQ-118 | All `McpError` variants → `ToolError::Failed` | 3 | 3.12 | [AR] | REQ-117 |
| REQ-119 | `Agent::with_mcp_server_stdio()` builder | 3 | 3.12 | [AR] | REQ-114, REQ-117 |
| REQ-120 | `GoogleProvider::stream()` (Gemini API) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-121 | `GoogleVertexProvider::stream()` (Vertex AI) | 4 | 4.1 | [AR] | REQ-120 |
| REQ-122 | `BedrockProvider::stream()` (ConverseStream) | 4 | 4.1 | [AR] | REQ-020 |
| REQ-123 | `OpenAiResponsesProvider::stream()` | 4 | 4.1 | [AR] | REQ-020 |
| REQ-124 | `AzureOpenAiProvider::stream()` | 4 | 4.1 | [AR] | REQ-123 |
| REQ-125 | All 7 providers in `ProviderRegistry::default()` | 4 | 4.1 | [AR] | REQ-042, REQ-120–124 |
| REQ-126 | `CacheStrategy::Auto` breakpoint placement | 4 | 4.2 | [AR] | REQ-014, REQ-040 |
| REQ-127 | `CacheStrategy::Manual` and `Disabled` | 4 | 4.2 | [AR] | REQ-126 |
| REQ-128 | Cache token counts in `Usage` | 4 | 4.2 | [AR] | REQ-006, REQ-040 |
| REQ-129 | `ThinkingLevel` → Anthropic `thinking` parameter | 4 | 4.3 | [AR] | REQ-019, REQ-040 |
| REQ-130 | `ThinkingLevel` → OpenAI `reasoning_effort` | 4 | 4.3 | [AR] | REQ-019, REQ-041 |
| REQ-131 | Parse `Thinking` content from streaming responses | 4 | 4.3 | [AR] | REQ-001, REQ-008, REQ-040 |
| REQ-132 | `McpClient::connect_http()` | 4 | 4.4 | [AR] | REQ-115 |
| REQ-133 | `Agent::with_mcp_server_http()` with prefix support | 4 | 4.4 | [AR] | REQ-117, REQ-132 |
| REQ-134 | MCP stdio shutdown (EOF + kill) | 4 | 4.4 | [AR] | REQ-114 |
| REQ-135 | Structured retry logging | 4 | 4.5 | [PS] | REQ-074 |
| REQ-136 | `ContextTracker` hybrid token tracking | 4 | 4.5 | [AR] | REQ-054–055 |
| REQ-137 | `ToolResult.details` per-tool metadata | 4 | 4.5 | [AR] | REQ-047–052 |
| REQ-138 | `OpenApiAuth` credential redaction in debug | 4 | 4.6 | [AR] | — |
| REQ-139 | `BashTool` default deny-pattern list | 4 | 4.6 | [PS] | REQ-094 |
| REQ-140 | `CancellationToken::child_token()` propagation | 4 | 4.7 | [PS] | REQ-033, REQ-046 |
| REQ-141 | Sub-agent inherits parent cancel token | 4 | 4.7 | [PS] | REQ-033, REQ-140 |
| REQ-142 | `on_update` callback → `ToolExecutionUpdate` event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-143 | `on_progress` callback → `ProgressMessage` event | 4 | 4.8 | [AR] | REQ-007, REQ-046 |
| REQ-144 | `Agent::prompt_with_sender()` | 4 | 4.8 | [AR] | REQ-034 |
| REQ-145 | `transform_context`/`convert_to_llm` hooks | 4 | 4.8 | [PS] | REQ-039 |
| REQ-146 | `Agent::with_compaction_strategy()` builder | 4 | 4.8 | [AR] | REQ-023, REQ-060 |
| REQ-147 | `ModelConfig` struct and application in OpenAiCompat | 4 | 4.8 | [AR] | REQ-041 |
| REQ-148 | `SubAgentTool::execute()` | 5 | 5.1 | [PS] | REQ-036, REQ-157 |
| REQ-149 | `extract_final_text()` | 5 | 5.1 | [PS] | REQ-002 |
| REQ-150 | Sub-agent event forwarding to parent channel | 5 | 5.1 | [PS] | REQ-007, REQ-148 |
| REQ-151 | `SubAgentTool` builder API | 5 | 5.1 | [AR] | REQ-021, REQ-148 |
| REQ-152 | `OpenApiAdapter::from_str()` JSON/YAML parsing | 5 | 5.2 | [AR] | REQ-153–156 |
| REQ-153 | OpenAPI parameter classification | 5 | 5.2 | [AR] | REQ-021 |
| REQ-154 | OpenAPI HTTP execution pipeline | 5 | 5.2 | [AR] | REQ-021 |
| REQ-155 | `OperationFilter` variants | 5 | 5.2 | [AR] | REQ-152 |
| REQ-156 | `name_prefix` tool naming | 5 | 5.2 | [AR] | REQ-152 |
| REQ-157 | `from_file()` and `from_url()` spec sources | 5 | 5.2 | [AR] | REQ-152 |
| REQ-158 | OpenAPI builders on Agent + feature flag | 5 | 5.2 | [AR] | REQ-026, REQ-157 |
| REQ-159 | Anthropic OAuth auth path | 5 | 5.3 | [AR] | REQ-040 |
| REQ-160 | Anthropic `InputJsonDelta` tool-arg streaming | 5 | 5.3 | [AR] | REQ-040 |
| REQ-161 | [AMBIGUOUS] `AgentEnd` on abort policy | 5 | 5.4 | [PS] | REQ-067, REQ-082 |
| REQ-162 | [AMBIGUOUS] `TokenCounter` abstraction point | 5 | 5.4 | [OV] | REQ-054 |
| REQ-163 | [AMBIGUOUS] Sub-agent error propagation policy | 5 | 5.4 | [PS] | REQ-149 |
| REQ-164 | Compaction algorithm unit tests | 6 | 6.1 | [AR] | REQ-056–059 |
| REQ-165 | Property-based tests: budget invariant | 6 | 6.1 | [AR] | REQ-056 |
| REQ-166 | Retry backoff unit tests | 6 | 6.1 | [AR] | REQ-071 |
| REQ-167 | Provider integration tests (mock HTTP server) | 6 | 6.1 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-168 | MCP stdio integration test | 6 | 6.1 | [AR] | REQ-114–119 |
| REQ-169 | End-to-end agent loop tests (MockProvider) | 6 | 6.1 | [AR] | REQ-036–090 |
| REQ-170 | Load test: 100 parallel agents, 10 concurrent tools | 6 | 6.2 | [AR] | REQ-045, REQ-085 |
| REQ-171 | Load test: 1,000-turn single agent with compaction | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-172 | Memory profile: message growth is bounded | 6 | 6.2 | [AR] | REQ-056, REQ-060 |
| REQ-173 | Public API reference documentation | 6 | 6.3 | [OV] | REQ-001–163 |
| REQ-174 | Provider integration contract documentation | 6 | 6.3 | [AR] | REQ-040–042, REQ-120–124 |
| REQ-175 | Working example implementations | 6 | 6.3 | [OV] | REQ-053, REQ-148 |
| REQ-176 | AgentSkills + MCP integration guides | 6 | 6.3 | [OV] | REQ-109–119 |
| REQ-177 | Library packaging with feature flags | 6 | 6.4 | [AR] | REQ-158 |
| REQ-178 | CI pipeline with gated live tests | 6 | 6.4 | [AR] | REQ-164–169 |
| REQ-179 | Operational runbooks | 6 | 6.4 | [AR] | REQ-071–077 |
| REQ-180 | `ContinuationKind` enum (`Default`, `Rerun { tag }`, `Branch { tag }`) | 4 | 4.9 | [AR] | — |
| REQ-181 | `TurnTrigger` enum (`User`, `Continuation`, `SubAgent`, `Branch`) | 4 | 4.9 | [AR] | — |
| REQ-182 | `before_loop`/`after_loop` hooks on `AgentLoopConfig` | 4 | 4.9 | [AR] | REQ-029, REQ-036 |
| REQ-183 | `before_tool_execution`/`after_tool_execution` hooks on `AgentLoopConfig` | 4 | 4.9 | [AR] | REQ-029, REQ-046 |
| REQ-184 | `before_tool_execution_update`/`after_tool_execution_update` hooks | 4 | 4.9 | [AR] | REQ-142, REQ-183 |
| REQ-185 | Guaranteed event hook ordering invariant | 4 | 4.9 | [AR] | REQ-182–184, REQ-091–092 |
| REQ-186 | `provider_id() -> &str` required method on `StreamProvider`; implement in all 7 providers | 4 | 4.9 | [AR] | REQ-020, REQ-125 |
| REQ-187 | `config_id: Option<String>` on `AgentLoopConfig`; auto-derived when `None` | 4 | 4.9 | [AR] | REQ-029, REQ-186 |
| REQ-188 | `agent_id`/`session_id` UUID fields on `Agent`; stable for Agent lifetime | 4 | 4.9 | [AR] | REQ-024 |
| REQ-189 | `loop_counters` and `last_loop_id` on `Agent`; `next_loop_id()` helper | 4 | 4.9 | [AR] | REQ-024, REQ-187, REQ-188 |
| REQ-190 | `agent_id`, `session_id`, `loop_id`, `parent_loop_id`, `continuation_kind` on `AgentContext`; write-back in `agent_loop` | 4 | 4.9 | [AR] | REQ-028, REQ-180, REQ-188 |
| REQ-191 | Assert `agent_id`/`session_id` are `Some` in `agent_loop_continue` | 4 | 4.9 | [AR] | REQ-037, REQ-190 |
| REQ-192 | `AgentStart` event: `agent_id`, `session_id`, `loop_id`, `parent_loop_id`, `continuation_kind` fields | 4 | 4.9 | [AR] | REQ-007, REQ-180, REQ-190 |
| REQ-193 | `TurnStart.triggered_by: TurnTrigger`; Branch continuation uses `Branch` on first turn | 4 | 4.9 | [AR] | REQ-007, REQ-181, REQ-190 |
| REQ-194 | `child_loop_id: Option<String>` on `ToolResult` and `ToolExecutionEnd`; set by sub-agent tools | 4 | 4.9 | [AR] | REQ-010, REQ-007, REQ-148 |
| REQ-195 | `SubAgentTool::with_parent_loop_id(loop_id)` builder; child `AgentContext` includes `parent_loop_id` | 4 | 4.9 | [AR] | REQ-151, REQ-190 |

***

## Known Ambiguities

Items marked `[AMBIGUOUS]` in the spec that require a design decision
before implementation:

| ID | Description | Suggested Resolution | Level Introduced |
|----|-------------|----------------------|------------------|
| AMB-001 | `AgentEnd` emission on abort: the pseudocode says `AgentEnd` is NOT emitted on abort, but notes that this may vary depending on where in the loop cancellation is detected (provider `Start`/`Done` events may still arrive). | Define a clear policy: `AgentEnd` is ALWAYS emitted when the loop exits, including on abort, so callers can rely on the channel always closing cleanly. To guarantee this, every exit path must detect cancellation before it attempts to emit `AgentEnd`. | 5 |
| AMB-002 | Token counting precision: `estimate_tokens` uses a 4-chars-per-token heuristic explicitly noted as imprecise. No integration with tiktoken or similar is specified. | Introduce a `TokenCounter` trait (or function pointer) on `ContextConfig` that defaults to the 4-char heuristic but can be overridden by the caller. This keeps the default zero-dependency while enabling precision via injection. | 5 |
| AMB-003 | Sub-agent error propagation: when a child `agent_loop` produces only error or tool-only messages (no `Text` in the final assistant message), `extract_final_text` returns a fixed fallback string. It is unclear whether the calling tool should return `Ok(ToolResult { fallback })` or `Err(ToolError::Failed(...))`. | Always return `Ok(ToolResult)` with the fallback text. If the sub-agent produced an error assistant message, include the `error_message` field in the fallback text so the parent LLM can see and react to it. | 5 |
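
The suggested resolution for AMB-003 can be sketched as follows. The `Content` and `Message` types are heavily reduced stand-ins for the crate's enums, and the exact fallback wording in `FALLBACK` is hypothetical, since the fixed fallback string is not quoted in this document. The sketch only shows the policy: prefer text from the final assistant message, otherwise return the fallback and append `error_message` when one is present.

```rust
/// Reduced stand-in for the crate's `Content` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Text(String),
    ToolCall { name: String },
}

/// Reduced stand-in for the crate's `Message` enum.
#[derive(Debug, Clone)]
pub enum Message {
    User { content: Vec<Content> },
    Assistant { content: Vec<Content>, error_message: Option<String> },
}

/// Hypothetical fallback wording.
pub const FALLBACK: &str = "(sub-agent produced no text output)";

pub fn extract_final_text(messages: &[Message]) -> String {
    // Locate the last assistant message, if any.
    let last = messages.iter().rev().find_map(|m| match m {
        Message::Assistant { content, error_message } => Some((content, error_message)),
        _ => None,
    });
    let Some((content, error_message)) = last else {
        return FALLBACK.to_string();
    };

    // Collect non-empty text parts; tool calls are ignored.
    let text: Vec<&str> = content
        .iter()
        .filter_map(|c| match c {
            Content::Text(t) if !t.trim().is_empty() => Some(t.as_str()),
            _ => None,
        })
        .collect();
    if !text.is_empty() {
        return text.join("\n");
    }

    // No text: surface the child's error so the parent LLM can react to it.
    match error_message {
        Some(err) => format!("{FALLBACK} Error: {err}"),
        None => FALLBACK.to_string(),
    }
}
```

Under this policy the calling tool wraps the returned string in `Ok(ToolResult)` in every case, so a failed child never aborts the parent turn.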

***

## Level Completion Checklist

- [x] **Level 1 — Survive:** All core types, traits, and the Agent struct initialize without error; smoke test passes.
- [x] **Level 2 — Useful:** Text prompt → LLM call → tool execution → final response works end-to-end; all 6 built-in tools execute on valid input; message persistence round-trips correctly.
- [x] **Level 3 — Smart:** Input filters, retry, provider error classification, tool errors, execution limits, steering/follow-up queues, lifecycle callbacks, tool safety guards, skill loading, and MCP client all handle their error paths without panicking.
- [x] **Level 4 — Professional:** All 7 provider protocols implemented; prompt caching and extended thinking integrated; cancellation propagates to all I/O; structured logging in place; `ContextTracker` accurate.
- [x] **Level 5 — Creative:** Sub-agent delegation works end-to-end; OpenAPI adapter generates callable tools; Anthropic OAuth and `InputJsonDelta` streaming are correct; all three ambiguities have documented resolutions and implementations.
- [ ] **Level 6 — Boss:** All test suites pass (unit, property-based, integration, end-to-end, load); public API docs and examples are complete; CI runs automatically; operational runbooks are written.

***

## Session & Loop Identity — Future Scenarios

> Added: 2026-03-22
> Status: Foundation implemented (loop_id, ContinuationKind, parent_loop_id, child_loop_id).
> The scenarios below build on this foundation but are out of scope for the initial change.

The current implementation covers:
- `loop_id` derived from `session_id + config_id + counter` (config owns its identity)
- `ContinuationKind` enum: `Default`, `Rerun { tag }`, `Branch { tag }`
- `parent_loop_id` for ancestry tracking across reruns/branches
- `child_loop_id` on `ToolExecutionEnd` for parent→sub-agent traceability
- Asserts in `agent_loop_continue` requiring `agent_id`/`session_id` to be set
- `TurnTrigger::Branch` fires on first turn of a `Branch` continuation
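
As an illustration of the first bullet, a `loop_id` built from `session_id + config_id + counter` might be produced by a helper like the one below. The `LoopIds` struct, the `:` separator, and keeping one counter per `config_id` are assumptions; the names `loop_counters`, `last_loop_id`, and `next_loop_id()` follow REQ-189, where they live on `Agent`.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the identity fields that REQ-189 places on `Agent`.
pub struct LoopIds {
    session_id: String,
    loop_counters: HashMap<String, u64>,
    last_loop_id: Option<String>,
}

impl LoopIds {
    pub fn new(session_id: impl Into<String>) -> Self {
        Self {
            session_id: session_id.into(),
            loop_counters: HashMap::new(),
            last_loop_id: None,
        }
    }

    /// Returns the next loop ID for `config_id` and records it as the most recent one.
    pub fn next_loop_id(&mut self, config_id: &str) -> String {
        // Each config gets its own counter, starting at 1.
        let counter = self.loop_counters.entry(config_id.to_string()).or_insert(0);
        *counter += 1;
        let n = *counter;

        let id = format!("{}:{}:{}", self.session_id, config_id, n);
        self.last_loop_id = Some(id.clone());
        id
    }

    /// The most recently issued loop ID, usable as `parent_loop_id` for a continuation.
    pub fn last_loop_id(&self) -> Option<&str> {
        self.last_loop_id.as_deref()
    }
}
```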

### Future: HITL Resume

**Scenario:** In a human-in-the-loop (HITL) flow, the user cancels a loop mid-execution (via
`Agent::abort()`), reviews the partial output, then resumes. The loop was aborted at some known
message boundary.

**Mechanism:** Caller restores `context.messages` to the desired resume point, then calls
`agent_loop_continue(Rerun | Branch)`. The kind communicates intent:
- `Rerun` — resume from the same point (same logical path, treat as a retry)
- `Branch` — resume but with modifications (e.g., injected steering message, different system
  prompt, tweaked tool result) — a diverging path from the original

**What needs to be built:** A checkpoint API for `context.messages`. The current `Agent::messages()`
getter returns a slice, and the `save_messages` / `restore_messages` methods on `Agent` already let
the caller snapshot and restore the message list (JSON round-trip). The missing piece is a
higher-level `Agent::checkpoint() -> Checkpoint` and `Agent::restore(checkpoint)` that bundle the
full state (messages + loop_id + session_id) for clean HITL resume without manual field management.

### Future: Checkpoint Restore

**Scenario:** Context is serialized to persistent storage (database, file) and later loaded for
a new run — either by the same process after restart or by a different process instance.

**Mechanism:** Same as HITL resume at the loop level. The caller deserializes `context.messages`
and sets the identity fields (`agent_id`, `session_id`, `loop_id`) to their original values, then
calls `agent_loop_continue(Branch)`. The `parent_loop_id` points to the last loop ID from the
original session, maintaining the ancestry chain across process boundaries.

**What needs to be built:** A serializable `AgentSnapshot` type that captures everything needed
to resume (`messages`, `agent_id`, `session_id`, `last_loop_id`, and any relevant config fields),
plus `AgentSnapshot::save(path)` / `AgentSnapshot::load(path)` convenience methods. The snapshot does
NOT include the provider config (API keys, base URLs) — those remain in the caller's environment.

### Future: Parallel Exploration

**Scenario:** Multiple branches from the same checkpoint are run concurrently — e.g., A/B testing
two different tool result injections, or evaluating three different system prompt variants on the
same conversation prefix.

**Mechanism:** The caller snapshots the context at a branching point, then issues several
`agent_loop_continue(Branch)` calls concurrently, each on its own copy of the context with a
different modification applied to `context.messages` beforehand. Each call produces an independent
event stream with its own `loop_id` and a `parent_loop_id` pointing to the same branch-point loop.

**What needs to be built:** No new primitives are needed — `agent_loop_continue` and `AgentContext`
already support this. The caller is responsible for cloning the context and making independent calls.
A higher-level `Agent::explore_branches(Vec<BranchSpec>) -> Vec<Receiver<AgentEvent>>` convenience
method could simplify the pattern but is not required for correctness.

**Concurrency note:** Each branch needs its own `AgentContext` (owned), its own `CancellationToken`,
and its own `mpsc::UnboundedSender`. `tokio::spawn` each `agent_loop_continue` call independently.
The parent task collects results from all branch receivers.
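
The fan-out pattern can be sketched with the standard library alone. To stay dependency-free, this example swaps in `std::thread` and `std::sync::mpsc` for `tokio::spawn` and `mpsc::UnboundedSender`, omits the per-branch `CancellationToken`, and uses a trivial `run_branch` in place of `agent_loop_continue`. `AgentContext` and `AgentEvent` are reduced stand-ins, and the branch `loop_id` format is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Reduced stand-in for the crate's `AgentContext`.
#[derive(Debug, Clone)]
pub struct AgentContext {
    pub messages: Vec<String>,
    pub loop_id: String,
    pub parent_loop_id: Option<String>,
}

/// Reduced stand-in for the crate's `AgentEvent`.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    AgentStart { loop_id: String, parent_loop_id: Option<String> },
    AgentEnd { message_count: usize },
}

/// Placeholder for `agent_loop_continue`: emits a start and an end event.
fn run_branch(ctx: AgentContext, tx: mpsc::Sender<AgentEvent>) {
    let _ = tx.send(AgentEvent::AgentStart {
        loop_id: ctx.loop_id.clone(),
        parent_loop_id: ctx.parent_loop_id.clone(),
    });
    let _ = tx.send(AgentEvent::AgentEnd { message_count: ctx.messages.len() });
}

/// Runs one branch per injected message and returns each branch's event stream.
pub fn explore_branches(base: &AgentContext, injections: Vec<String>) -> Vec<Vec<AgentEvent>> {
    let mut pending = Vec::new();
    for (i, injected) in injections.into_iter().enumerate() {
        // Each branch owns an independent copy of the context.
        let mut ctx = base.clone();
        ctx.parent_loop_id = Some(base.loop_id.clone());
        ctx.loop_id = format!("{}-branch-{}", base.loop_id, i);
        ctx.messages.push(injected);

        // Each branch also gets its own channel.
        let (tx, rx) = mpsc::channel();
        let handle = thread::spawn(move || run_branch(ctx, tx));
        pending.push((handle, rx));
    }

    // The parent collects results from every branch receiver.
    pending
        .into_iter()
        .map(|(handle, rx)| {
            handle.join().expect("branch thread panicked");
            rx.into_iter().collect()
        })
        .collect()
}
```

Because every branch works on a clone, the base context is left untouched and can be reused for further exploration.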

### Future: Auto Origin/Continue Selection

**Scenario:** The caller wants to send a new message to the agent without knowing whether the
current context requires an origin call (`agent_loop`) or a continuation (`agent_loop_continue`).

**Mechanism:** Inspect `context.messages.last()`:
- No messages → `agent_loop` (fresh start)
- Last message is `User` or `ToolResult` → `agent_loop_continue` (already awaiting model response)
- Last message is `Assistant` → `agent_loop` with new prompt (start new turn)

**What needs to be built:** An `Agent::send(message)` method (or similar) that encapsulates
this logic. It would inspect the context state, build the appropriate call type, and dispatch.
This trades explicit caller control for convenience and is opt-in.
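
The selection rule above reduces to a single `match` on the last message. In this sketch the `Message` enum is a reduced stand-in with string payloads, and `Dispatch` and `select_dispatch` are hypothetical names for the decision an `Agent::send(message)` method would make internally.

```rust
/// Reduced stand-in for the crate's `Message` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
}

/// Which entry point to call.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// Call `agent_loop` (fresh start, or a new turn after an assistant reply).
    Origin,
    /// Call `agent_loop_continue` (the context is already awaiting a model response).
    Continue,
}

pub fn select_dispatch(messages: &[Message]) -> Dispatch {
    match messages.last() {
        None => Dispatch::Origin,
        Some(Message::User(_)) | Some(Message::ToolResult(_)) => Dispatch::Continue,
        Some(Message::Assistant(_)) => Dispatch::Origin,
    }
}
```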