do_it 0.3.2

Autonomous coding agent powered by local LLMs via Ollama. Cross-platform, no shell dependency, no cloud APIs required.
# do_it — Documentation

An autonomous coding agent powered by local or cloud LLMs.

Reads, writes, and fixes code in your repositories.

Runs on Windows and Linux with no shell dependency, no Python.

Supports **Ollama** (local), **OpenAI-compatible**, and **Anthropic-compatible** backends — including self-hosted services and providers such as MiniMax.

---

## Table of Contents

1. [Quick Start](#quick-start)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [CLI Reference](#cli-reference)
5. [Agent Roles](#agent-roles)
6. [Tools](#tools)
7. [Telegram — ask_human and notify](#telegram--ask_human-and-notify)
8. [Browser Tools](#browser-tools)
9. [Agent Self-Improvement](#agent-self-improvement)
10. [How the Agent Works](#how-the-agent-works)
11. [Model Selection and Routing](#model-selection-and-routing)
12. [Sub-agent Architecture](#sub-agent-architecture)
13. [Persistent Memory](#persistent-memory)
14. [Limitations](#limitations)
15. [Tips and Recommendations](#tips-and-recommendations)
16. [Troubleshooting](#troubleshooting)
17. [Project Structure](#project-structure)

---

## Quick Start

```bash
# 1. Install
cargo install do_it

# 2. Initialise project (interactive)
cd /path/to/project
do_it init

# 3. Run
do_it run --task "Find and fix the bug in src/parser.rs" --repo /path/to/project

# With a role (recommended for smaller models)
do_it run --task "Add input validation to handlers.rs" --role developer
```

With Ollama:
```bash
ollama pull qwen3.5:cloud
do_it init --backend ollama --model qwen3.5:cloud --yes
```

With OpenAI or compatible:
```bash
do_it init --backend openai --llm-url https://api.openai.com --model gpt-4o
```

Windows:
```powershell
.\target\release\do_it.exe run `
  --task "Find and fix the bug in src/parser.rs" `
  --repo C:\Projects\my-project `
  --role developer
```

---

## Installation

### Requirements

- [Rust](https://rustup.rs/) 1.85+ (edition 2021)
- An LLM backend: [Ollama](https://ollama.com) locally, or any OpenAI / Anthropic compatible service

### Build from source

```bash
git clone https://github.com/oleksandrpublic/doit
cd doit
cargo build --release
```

Binary: `target/release/do_it` (Linux) / `target\release\do_it.exe` (Windows).

### Install from crates.io

```bash
cargo install do_it
```

### Backend setup

**Ollama (local):**
```bash
ollama pull qwen3.5:cloud   # recommended default
ollama list                 # verify installed models
curl http://localhost:11434/api/tags  # verify Ollama is running
```

**OpenAI or compatible service:** set `llm_url` and `llm_api_key` in `config.toml`, or run `do_it init` to generate a config interactively.

**Anthropic or compatible service:** same — set `llm_backend = "anthropic"`, `llm_url`, and `llm_api_key`.

---

## Configuration

Configuration is loaded from the first file found in this priority order:

1. `--config <path>` — explicit path passed on the command line
2. `./config.toml` — local project config (in the working directory)
3. `~/.do_it/config.toml` — global user config (created automatically on first run)
4. Built-in defaults

On first run, `~/.do_it/` is created with a default `config.toml` and supporting files.

```toml
# ── LLM backend ───────────────────────────────────────────────────────────────
# llm_backend: "ollama" | "openai" | "anthropic"
llm_backend      = "ollama"
llm_url          = "http://localhost:11434"
# llm_api_key    = ""          # or set LLM_API_KEY environment variable

# Default model — used when no role-specific override is set
model            = "qwen3.5:cloud"

# Sampling temperature: 0.0 = deterministic
temperature      = 0.0

# Max tokens per LLM response
max_tokens       = 4096

# Number of recent steps kept in full in the context; older steps are collapsed to one line
history_window   = 8

# Max characters in tool output before truncation
max_output_chars = 6000

# Logging configuration
log_level  = "info"    # "error", "warn", "info", "debug", "trace"
log_format = "text"    # "text", "json"

# Optional: Telegram for ask_human / notify
# telegram_token   = "1234567890:ABCdef..."
# telegram_chat_id = "123456789"

# Optional: per-action-type model overrides
[models]
# thinking  = "qwen3.5:cloud"
# coding    = "qwen3-coder-next:cloud"
# search    = "qwen3.5:9b"
# execution = "qwen3.5:9b"
# vision    = "qwen3.5:cloud"
```

### Backend examples

**Local Ollama (default):**
```toml
llm_backend = "ollama"
llm_url     = "http://localhost:11434"
model       = "qwen3.5:cloud"
```

**Remote Ollama:**
```toml
llm_backend = "ollama"
llm_url     = "http://192.168.1.100:11434"
model       = "qwen3.5:35b"
```

**OpenAI:**
```toml
llm_backend = "openai"
llm_url     = "https://api.openai.com"
llm_api_key = "sk-..."
model       = "gpt-4o"
```

**Anthropic:**
```toml
llm_backend = "anthropic"
llm_url     = "https://api.anthropic.com"
llm_api_key = "sk-ant-..."
model       = "claude-sonnet-4-5-20251001"
```

**MiniMax (OpenAI-compatible):**
```toml
llm_backend = "openai"
llm_url     = "https://api.minimax.io"
llm_api_key = "..."
model       = "abab6.5s-chat"
```

**Local proxy without key:**
```toml
llm_backend = "openai"
llm_url     = "http://localhost:20128/v1"
model       = "my-local-model"
# no llm_api_key needed
```

The `llm_api_key` field can also be supplied via the `LLM_API_KEY` environment variable — useful in CI or when you prefer not to store keys in config files.

### Defaults

| Field | Default |
|---|---|
| `llm_backend` | `ollama` |
| `llm_url` | `http://localhost:11434` |
| `llm_api_key` | _(none — set via env if needed)_ |
| `model` | `qwen3.5:cloud` |
| `temperature` | `0.0` |
| `max_tokens` | `4096` |
| `history_window` | `8` |
| `max_output_chars` | `6000` |
| `log_level` | `info` |
| `log_format` | `text` |

```toml
# Enable optional tool groups (browser, background, github)
# tool_groups = ["browser", "github"]
```

| Group | Tools | Roles |
|---|---|---|
| `browser` | `screenshot`, `browser_get_text`, `browser_action`, `browser_navigate` | boss, developer, qa, reviewer |
| `background` | `run_background`, `process_status`, `process_list`, `process_kill` | boss, developer |
| `github` | `github_api` | developer, qa |

```bash
do_it config              # print resolved config
do_it config --config custom.toml
```

---

## CLI Reference

```
do_it run     Run the agent on a task
do_it init    Initialise a project workspace
do_it config  Print resolved config and exit
do_it roles   List all roles with their tool allowlists
do_it status  Show current project status
```

`do_it status` currently summarises:
- session report files in `.ai/logs/`
- structured trace files in `.ai/logs/`
- compact path-sensitivity diagnostics from the latest structured trace, when present
- `last_session.md`
- `current_plan.md`
- `~/.do_it/tool_wishlist.md`
- knowledge keys under `.ai/knowledge/`

### `run` arguments

```
--task, -t        Task text, path to a .md file, or path to an image
--repo, -r        Repository / working directory (default: .)
--config, -c      Path to config.toml (default: ./config.toml)
--role            Agent role: boss | research | developer | navigator | qa | reviewer | memory
--system-prompt   Override system prompt: inline text or path to a file
--max-steps       Maximum agent steps (default: 30)
```

### `init` arguments

```
--repo, -r        Repository / working directory (default: .)
--backend         LLM backend: ollama | openai | anthropic
--llm-url         LLM service URL
--model           Model name
--api-key         API key (alternative: LLM_API_KEY env var)
--yes, -y         Skip interactive prompts, use defaults
```

`do_it init` creates `.ai/` workspace, `config.toml`, `.ai/project.toml`, and `.gitignore` entries. If run without `--yes` it prompts for backend, URL, model, and API key interactively. Existing files are never overwritten.

### Examples

```bash
# Plain task
do_it run --task "Refactor the database module"

# With role (recommended)
do_it run --task "Add rate limiting" --role developer
do_it run --task "What does this project do?" --role navigator
do_it run --task "Find docs for tower-http middleware" --role research

# Orchestrate a complex task with sub-agents
do_it run --task "Add OAuth2 login" --role boss --max-steps 60

# Task from file
do_it run --task tasks/issue-42.md --role developer --repo ~/projects/my-app

# Screenshot with an error (vision mode)
do_it run --task error-screenshot.png --role developer

# Custom prompt for a specific task
do_it run --task "Review only, do not edit" --system-prompt prompts/reviewer.md

# More steps for complex tasks
do_it run --task "Refactor auth module" --role developer --max-steps 50

# Continue after interruption
do_it run --task "continue" --max-steps 50

```

`--task` and `--system-prompt`: if the value is a path to an existing file, its contents are read; if the extension is an image (`.png`, `.jpg`, `.webp`, etc.), vision mode is activated.

### Logging

```bash
RUST_LOG=info  do_it run --task "..."   # default
RUST_LOG=debug do_it run --task "..."   # includes raw LLM responses
RUST_LOG=error do_it run --task "..."   # errors only
```

---

## Agent Roles

Roles restrict the agent to a focused tool set and apply a role-specific system prompt. This is critical for smaller models — 6–8 tools instead of 20+ significantly improves output quality and reduces hallucinations.

```bash
do_it roles   # print all roles and their allowlists
```

### Role table

| Role | Purpose | Core tools (at most 12–14 per role) |
|---|---|---|
| `default` | No restrictions | all tools |
| `boss` | Orchestration — delegates everything, never writes code | `memory`, `tree`, `project_map`, `web_search`, `ask_human`, `notify`, `spawn_agent/s`, `tool_request`, `capability_gap` |
| `research` | Information gathering | `web_search`, `fetch_url`, `memory`, `ask_human` |
| `developer` | Write and run code — uses navigator sub-agent for exploration | `read_file`, `write_file`, `str_replace`, `apply_patch_preview`, `run_command`, `run_targeted_test`, `format_changed_files_only`, `run_script`, `git_*`, `memory`, `notify` |
| `navigator` | Explore codebase — read-only | `read_file`, `list_dir`, `find_files`, `search_in_files`, `tree`, `get_symbols`, `outline`, `find_references`, `project_map`, `trace_call_path`, `memory` |
| `qa` | Testing and verification | `read_file`, `search_in_files`, `run_command`, `run_script`, `diff_repo`, `read_test_failure`, `test_coverage`, `git_*`, `memory`, `notify` |
| `reviewer` | Static code review — no execution | `read_file`, `search_in_files`, `diff_repo`, `git_log`, `get_symbols`, `outline`, `get_signature`, `find_references`, `ask_human`, `memory` |
| `memory` | Managing `.ai/` state | `memory_read`, `memory_write` |

### System prompt priority

```
--system-prompt  (highest)
  ↓
--role           → looks for .ai/prompts/<role>.md, falls back to built-in
  ↓
system_prompt in config.toml
  ↓
~/.do_it/system_prompt.md  (global override)
  ↓
built-in DEFAULT_SYSTEM_PROMPT  (lowest)
```

### Custom role prompts

Create `.ai/prompts/<role>.md` in the repository root to override the built-in prompt for that role:

```bash
mkdir -p .ai/prompts
cat > .ai/prompts/developer.md << 'EOF'
You are a Rust expert working on this codebase.
Always run `cargo clippy -- -D warnings` after edits.
Use thiserror for error types, never anyhow in library code.
All public functions require doc comments.
EOF
```

The file is picked up automatically — no restart needed.

---

## Tools

All tools are implemented in native Rust under `src/tools/`.

The tool surface is no longer documented only by hand-written prompts:
- canonical names, aliases, role availability, dispatch kind, and capability status live in a shared registry (`src/tools/spec.rs`)
- role prompts inject their `## Available tools` section from that registry at runtime
- generated tool catalogs mark non-real tools as:
  - `[limited]` for stubbed tools
  - `[experimental]` for tools with a narrower or unstable runtime contract

This makes prompts, allowlists, and runtime dispatch significantly more consistent than before.

### Filesystem

| Tool | Arguments | Description |
|---|---|---|
| `read_file` | `path`, `start_line?`, `end_line?` | Read file with line numbers (default: first 100 lines) |
| `write_file` | `path`, `content` | Overwrite file, create directories if needed |
| `str_replace` | `path`, `old_str`, `new_str` | Replace a unique string in a file |
| `apply_patch_preview` | `path`, `content?` or `old_str` + `new_str` | Preview an edit as a unified diff without writing it |
| `list_dir` | `path?` | List directory contents (one level) |
| `find_files` | `pattern`, `dir?` | Find files by name: `*.rs`, `test*`, substring |
| `search_in_files` | `pattern`, `dir?`, `ext?` | Regex search across file contents |
| `tree` | `dir?`, `depth?`, `ignore?` | Recursive directory tree (ignores `target`, `.git`, etc. by default) |

`apply_patch_preview` is currently `[experimental]`. It supports:
- write-style preview via `path + content`
- targeted replace preview via `path + old_str + new_str`
- no filesystem writes; the current file stays untouched
- unified diff output intended for fast review before a real edit
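
A targeted replace preview can be requested like this (the path and strings are illustrative):

```json
{
  "tool": "apply_patch_preview",
  "args": {
    "path": "src/parser.rs",
    "old_str": "let value = input.trim();",
    "new_str": "let value = input.trim_start();"
  }
}
```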

### Execution

| Tool | Arguments | Description |
|---|---|---|
| `run_command` | `program`, `args[]`, `cwd?` | Run a program with explicit args array (no shell) |
| `format_changed_files_only` | `dir?`, `check_only?`, `timeout_secs?` | Format only changed Rust files detected from git status |
| `run_targeted_test` | `path?`, `test?`, `kind?`, `target?`, `dir?` | Run a narrow Rust test target inferred from file path or explicit target |
| `run_script` | `script`, `dir?` | Run a sandboxed Rhai script for lightweight parsing and summarization |
| `diff_repo` | `base?`, `staged?`, `stat?` | Git diff vs HEAD or any ref |

`run_script` is currently `[experimental]`. It exposes a deliberately small host API:
- `read_lines(path)` — read file lines within the workspace
- `regex_match(pattern, text)` — regex match helper
- `parse_json(text)` — parse JSON into Rhai values
- `log(msg)` — append to script logs returned in the tool result

Sandbox limits:
- 1 second wall-clock budget
- operation, depth, string, array, and map size limits
- output capped at 512 KB
- no network, process spawning, or unrestricted filesystem access
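
A sketch of a call using the host API above; the Rhai script body is illustrative:

```json
{
  "tool": "run_script",
  "args": {
    "script": "let lines = read_lines(\"Cargo.toml\"); log(`read ${lines.len()} lines`); lines.len()",
    "dir": "."
  }
}
```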

`run_targeted_test` is currently `[experimental]` and Rust-first:
- `path: "src/..."` narrows to `cargo test --lib`
- `path: "tests/<target>/..."` narrows to `cargo test --test <target>`
- `test` adds a standard cargo test filter
- `kind` can force `auto`, `lib`, `test`, or `package`
- Node/Python fallback branches are intentionally deferred until the Rust-first workflow is stable
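
For example, to run a single test inside an integration test target (the path and test name are illustrative):

```json
{ "tool": "run_targeted_test", "args": { "path": "tests/cli/main.rs", "test": "parses_flags" } }
```

Per the rules above, this narrows to `cargo test --test cli parses_flags`.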

`format_changed_files_only` is currently `[experimental]` and Rust-first:
- reads changed paths from `git status --porcelain`
- filters to changed `.rs` files only
- runs `rustfmt` just on those files instead of the whole repo
- `check_only: true` switches to `rustfmt --check`
- Node/Python formatter fallbacks are intentionally deferred until the Rust-first workflow is stable
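
A check-only invocation that verifies formatting of changed files without rewriting them:

```json
{ "tool": "format_changed_files_only", "args": { "check_only": true } }
```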

### Background Processes

| Tool | Arguments | Description |
|---|---|---|
| `run_background` | `program`, `args[]`, `cwd?`, `id`, `cmd?` | Start a named background process |
| `process_status` | `id` or `pid` | Check if a background process is still running |
| `process_kill` | `id` or `pid` | Terminate a background process |
| `process_list` |  | List all tracked background processes |

Current state: these tools are real. They spawn OS processes, track real PIDs, support lookup by stable `id`, and keep a recent stdout/stderr excerpt for status inspection.
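
A typical lifecycle looks like this (the program and `id` are illustrative):

```json
{ "tool": "run_background", "args": { "program": "trunk", "args": ["serve"], "id": "dev-server" } }
{ "tool": "process_status", "args": { "id": "dev-server" } }
{ "tool": "process_kill", "args": { "id": "dev-server" } }
```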

### Git

| Tool | Arguments | Description |
|---|---|---|
| `git_status` | `short?` | Working tree status and branch info |
| `git_commit` | `message`, `files?`, `allow_empty?` | Stage files and commit |
| `git_log` | `n?`, `path?`, `oneline?` | Commit history, optionally filtered by path |
| `git_stash` | `action`, `message?`, `index?` | Stash management: `push`, `pop`, `list`, `drop`, `show` |
| `git_pull` | `remote?`, `branch?` | Pull from remote repository |
| `git_push` | `remote?`, `branch?`, `force?` | Push to remote repository |

### Internet

| Tool | Arguments | Description |
|---|---|---|
| `web_search` | `query`, `max_results?` | Search via DuckDuckGo (no API key required) |
| `fetch_url` | `url`, `selector?` | Fetch a web page and return readable text |
| `github_api` | `method`, `endpoint`, `body?`, `token?` | GitHub REST API — issues, PRs, branches, commits, file contents |

`github_api` requires a `GITHUB_TOKEN` environment variable (or a `token` argument). Responses are filtered to keep context concise, and file contents are base64-decoded automatically.

Common endpoints:
```
GET  /repos/{owner}/{repo}/issues              — list open issues
GET  /repos/{owner}/{repo}/issues/{n}          — read single issue
POST /repos/{owner}/{repo}/issues/{n}/comments — post a comment
PATCH /repos/{owner}/{repo}/issues/{n}         — update issue (state/labels)
GET  /repos/{owner}/{repo}/pulls               — list open PRs
POST /repos/{owner}/{repo}/pulls               — create PR
PUT  /repos/{owner}/{repo}/pulls/{n}/merge     — merge PR
GET  /repos/{owner}/{repo}/branches            — list branches
GET  /repos/{owner}/{repo}/contents/{path}     — read file (auto-decoded)
```
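
For example, commenting on an issue (owner, repo, and text are illustrative):

```json
{
  "tool": "github_api",
  "args": {
    "method": "POST",
    "endpoint": "/repos/octocat/hello-world/issues/42/comments",
    "body": { "body": "Fix is in review; closing after CI passes." }
  }
}
```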

### Code Intelligence

Tree-sitter backend. Supports Rust, TypeScript/JavaScript, Python, C++, Kotlin. Detected by file extension.

| Tool | Arguments | Description |
|---|---|---|
| `get_symbols` | `path`, `kinds?` | List all symbols: fn, struct, class, impl, enum, trait, type, const |
| `outline` | `path` | Structural overview with signatures and line numbers |
| `get_signature` | `path`, `name`, `lines?` | Full signature + doc comment for a named symbol |
| `find_references` | `name`, `dir?`, `ext?` | All usages of a symbol across the codebase |

`kinds` filter examples: `"fn,struct"`, `"class,interface"`, `"fn,method"`.
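
For example, listing only functions and structs in a file (the path is illustrative):

```json
{ "tool": "get_symbols", "args": { "path": "src/agent/session.rs", "kinds": "fn,struct" } }
```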

### Testing

| Tool | Arguments | Description |
|---|---|---|
| `test_coverage` | `dir?`, `threshold?`, `timeout_secs?` | Run tests with coverage (auto-detects project type). Enforces threshold (default: 80%). |

Current state: Rust projects are supported. The tool prefers `cargo llvm-cov`, falls back to `cargo tarpaulin`, and falls back again to `cargo test` when no coverage backend is installed, returning a clear note that coverage numbers are unavailable. Non-Rust project types still return a clean "not implemented yet" result.

`run_targeted_test` complements this by making the inner edit/test loop cheaper before QA runs broader coverage or full-suite verification.

### Memory (`.ai/` hierarchy)

| Tool | Arguments | Description |
|---|---|---|
| `memory_read` | `key` | Read a memory entry |
| `memory_write` | `key`, `content`, `append?` | Write or append to a memory entry |

**Logical key mapping:**

| Key | File |
|---|---|
| `plan` | `.ai/state/current_plan.md` |
| `last_session` | `.ai/state/last_session.md` |
| `session_counter` | `.ai/state/session_counter.txt` |
| `external` | `.ai/state/external_messages.md` |
| `history` | `.ai/logs/history.md` |
| `knowledge/<n>` | `.ai/knowledge/<n>.md` |
| `prompts/<n>` | `.ai/prompts/<n>.md` |
| `user_profile` | `~/.do_it/user_profile.md` (global) |
| `boss_notes` | `~/.do_it/boss_notes.md` (global) |
| any other key | `.ai/knowledge/<key>.md` |

**Persistent knowledge keys used by built-in roles:**

| Key | Written by | Purpose |
|---|---|---|
| `knowledge/lessons_learned` | QA | Project-specific pitfalls and correct patterns |
| `knowledge/decisions` | Boss, Developer | Architectural decisions and rationale |
| `knowledge/qa_report` | QA | Latest test run report |
| `knowledge/review_report` | Reviewer | Latest static code review |
| `knowledge/<role>_result` | Sub-agents | Results from `spawn_agent` calls |
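
For example, appending an architectural decision to the shared knowledge base (the content is illustrative):

```json
{
  "tool": "memory_write",
  "args": {
    "key": "knowledge/decisions",
    "content": "## 2026-01-15\nRate limiting: chose tower-http middleware over a custom layer.",
    "append": true
  }
}
```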

### Communication

| Tool | Arguments | Description |
|---|---|---|
| `ask_human` | `question` | Send a question via Telegram and wait up to 5 min for reply; falls back to console |
| `notify` | `message`, `silent?` | Send a one-way Telegram notification (non-blocking, no waiting) |
| `finish` | `summary`, `success` | Signal task completion |

### Multi-agent

| Tool | Arguments | Description |
|---|---|---|
| `spawn_agent` | `role`, `task`, `memory_key?`, `max_steps?` | Delegate a subtask to a specialised sub-agent |
| `spawn_agents` | `agents[]` | Spawn multiple sub-agents in parallel |
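
For example, delegating read-only exploration to a navigator (the task text is illustrative):

```json
{
  "tool": "spawn_agent",
  "args": {
    "role": "navigator",
    "task": "Map how authentication flows through the codebase",
    "memory_key": "knowledge/navigator_result",
    "max_steps": 15
  }
}
```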

### Browser

Browser tools are currently `[experimental]`.

| Tool | Arguments | Description |
|---|---|---|
| `screenshot` | `url?`, `path?` | Optionally navigate, then save a PNG screenshot to a workspace path |
| `browser_get_text` | `url?`, `selector?` | Read rendered text from the current page or an optionally provided URL |
| `browser_action` | `action`, `selector?`, `value?`, `url?` | Interact with an element: `click`, `type`, `hover`, `clear`, `select`, `scroll` |
| `browser_navigate` | `url` | Navigate and wait for page load |

Current runtime behavior:
- the implementation launches `headless_chrome` directly
- `AgentConfig.browser` exists but is not yet the source of truth for browser startup
- if browser startup fails, the tool returns the real startup error from the browser backend

So the browser contract is usable, but the config-driven setup described below is still aspirational and not fully wired.

### Self-improvement

| Tool | Arguments | Description |
|---|---|---|
| `tool_request` | `name`, `description`, `motivation`, `priority?` | Request a new tool — appends to `~/.do_it/tool_wishlist.md` |
| `capability_gap` | `context`, `impact` | Report a structural blind spot without a specific solution — appends to wishlist |

`tool_request` and `capability_gap` are available to `boss` and `default` roles. The Boss calls `tool_request` when it encounters a missing capability for the second time in any session, and `capability_gap` when it observes something it structurally cannot do.

See [Sub-agent Architecture](#sub-agent-architecture) for details.

---

## Telegram — ask_human and notify

### Setup

1. Create a bot via [@BotFather](https://t.me/BotFather) → copy the token
2. Send `/start` to your bot
3. Find your `chat_id` via [@userinfobot](https://t.me/userinfobot)
4. Add to `config.toml`:

```toml
telegram_token   = "1234567890:ABCdef-ghijklmnop"
telegram_chat_id = "123456789"
```

### ask_human

Sends a question and waits up to 5 minutes for your reply. Your reply is returned to the agent as the tool result. Falls back to console stdin if Telegram is not configured or unreachable.

Use this when the agent needs a decision before continuing — it will not guess on important choices.
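
Example (the question text is illustrative):

```json
{ "tool": "ask_human", "args": { "question": "Migrate the database schema now, or open a PR for review first?" } }
```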

### notify

Sends a one-way message with no waiting. Used for progress updates and completion notices during long autonomous runs. Falls back to stdout if Telegram is not configured.

```json
{ "tool": "notify", "args": { "message": "OAuth implementation complete, running tests..." } }
{ "tool": "notify", "args": { "message": "All tests pass. PR created.", "silent": true } }
```

### External messages (inbox)

To send instructions to the agent before its next run, write to `.ai/state/external_messages.md`. On startup, the agent reads this file, injects it into context, and clears it. Any external process — a webhook, a cron script, another agent — can write to this file.

```bash
echo "## 2026-01-15 10:30
Please also update the README after the refactor." >> .ai/state/external_messages.md
```

The next run will show: `[inbox] 2 external message(s) received`.

---

## Browser Tools

Browser tools give the agent eyes — the ability to see rendered pages, interact with UI elements, and take screenshots. This is essential for JavaScript-heavy applications (React, Vue, Leptos) where `fetch_url` returns an empty shell.

### Setup

The agent speaks CDP (Chrome DevTools Protocol). The backend is transparent — swap `cdp_url` to change from Chrome to Lightpanda or any future CDP-compatible browser without changing any agent code.

```toml
# config.toml
[browser]
# Option 1: connect to a running CDP server
cdp_url = "ws://127.0.0.1:9222"

# Option 2: launch Chrome locally (requires chromiumoxide feature — coming in a future version)
# chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"

# Screenshot output directory (default: .ai/screenshots)
# screenshot_dir = ".ai/screenshots"
```

Start a CDP server before running the agent:

```bash
# Chrome (headless)
google-chrome --headless --remote-debugging-port=9222

# Lightpanda — lightweight, designed for AI, 9x less RAM than Chrome
# Linux/macOS binary:
lightpanda serve --host 127.0.0.1 --port 9222

# Docker:
docker run -d --name lightpanda -p 9222:9222 lightpanda/browser:nightly
```

### Planned browser direction

The intended direction is a config-driven CDP/browser backend with cleaner "browser not configured" messages. The current implementation is a simpler, direct `headless_chrome` integration, so treat the config-driven setup in this section as the target architecture rather than a finished contract.

### How the agent uses browser tools

```
screenshot(url)          → PNG saved to .ai/screenshots/
                           base64 returned for vision model input
browser_get_text(url)    → full page text after JS execution
browser_action(...)      → click/type/hover + implicit screenshot
browser_navigate(url)    → navigate + wait + screenshot
```

The Boss uses screenshots to verify UI work directly, without delegating:

```
boss: spawn_agent("developer", "add login form to /login")
boss: screenshot("http://localhost:3080/login")  ← sees result
boss: browser_action("type", "#email", "test@example.com")
boss: browser_action("click", "#submit")
boss: screenshot("http://localhost:3080/dashboard")  ← sees result after login
```

The Developer uses browser tools for visual feedback after UI changes:
```
developer: write_file("src/components/Login.rs", ...)
developer: run_command("trunk", ["build"])
developer: screenshot("http://localhost:3080/login")  ← verify rendering
```

### Vision model integration

`screenshot` returns the image as base64. Pass it to a vision-capable model for deeper analysis:

```bash
# Take a screenshot and describe it
do_it run --task screenshot.png --role developer
```

When browser startup fails, the error is surfaced explicitly. Unknown browser actions are rejected before startup so contract errors do not get masked by environment failures.

---

## Agent Self-Improvement

The Boss accumulates knowledge about missing capabilities across sessions. Two tools write to `~/.do_it/tool_wishlist.md`:

### tool_request

Called when the Boss encounters a missing capability for the **second time** in a session. It is not for first encounters — the agent tries to work around the gap once, then files a request.

```json
{
  "tool": "tool_request",
  "args": {
    "name": "run_background",
    "description": "Run a process in the background and keep it alive",
    "motivation": "Need to start trunk serve and then navigate to localhost, but run_command blocks",
    "priority": "high"
  }
}
```

### capability_gap

Called when the Boss observes a structural blind spot — it cannot see or reach something important — and has no specific solution to propose.

```json
{
  "tool": "capability_gap",
  "args": {
    "context": "Developer wrote a Leptos component but I cannot see how it renders without a browser",
    "impact": "Visual bugs and layout issues go undetected until manual review"
  }
}
```

### Reading the wishlist

```bash
cat ~/.do_it/tool_wishlist.md
```

Each entry is timestamped and structured. The wishlist is your primary source for understanding what the agent actually needs — derived from real tasks, not speculation.

---

## How the Agent Works

### Session initialisation (`src/agent/session.rs`)

```
session_init():
  → increment .ai/state/session_counter.txt
  → read last_session.md              → inject into history as step 0
  → read user_profile.md (Boss only)  → inject into history (global preferences)
  → read boss_notes.md   (Boss only)  → inject into history (cross-project insights)
  → read external_messages.md         → inject into history, then clear the file
  → read/scaffold .ai/project.toml   → inject into history as project context
```

On first run in a new repository, `.ai/project.toml` is scaffolded automatically by detecting `Cargo.toml` / `package.json` / `pyproject.toml` / `go.mod` and reading the GitHub remote from `.git/config`. Edit it freely — it will not be overwritten.

```toml
# .ai/project.toml (auto-generated, edit as needed)
[project]
name     = "my-project"
language = "rust"

[commands]
test  = "cargo test"
build = "cargo build --release"
lint  = "cargo clippy -- -D warnings"

[github]
repo = "owner/my-project"

[agent]
notes = """
- Always run clippy before committing
"""
```
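The marker-file detection behind the scaffolding can be sketched as follows. The file names come from the docs; the checking order and helper name are assumptions, not the real Rust implementation:

```python
from pathlib import Path

# Marker files the scaffolder looks for (names from the docs;
# the priority order here is an assumption).
MARKERS = [
    ("Cargo.toml", "rust"),
    ("package.json", "javascript"),
    ("pyproject.toml", "python"),
    ("go.mod", "go"),
]

def detect_language(repo_root: str):
    """Return the detected project language, or None when no marker exists."""
    for marker, language in MARKERS:
        if (Path(repo_root) / marker).exists():
            return language
    return None
```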

### Main loop

```
if --task is an image:
  → vision model describes it → description becomes effective_task

for each step 1..max_steps:
  0. build prompt from:
     - task
     - working memory (`goal`, `attempted_actions`, `artifacts_found`, `blocked_on`, `next_best_action`)
     - recent history
     - status-aware strategy notes for weak/repeated tools
     - project context
  1. thinking model → JSON { thought, tool, args }
  2. canonicalize tool name through the registry
  3. check role tool allowlist (if role != Default)
  4. if specialist model differs → re-call with specialist
  5. execute tool
  6. update working memory from the tool result
  7. record in history
  8. loop detection / anti-loop policy
  9. if tool == "finish" → done
```

### Loop detection and anti-loop policy

The runtime now combines raw history with tool capability status from the registry:

- repeated identical `[limited]` tool calls trip loop handling sooner
- repeated identical `[experimental]` tool calls trip loop handling sooner than normal tools
- prompts include `Strategy Notes` when a weak tool already returned the same result with the same args
- prompts also include `Working Memory`, which helps the model remember blockers and next-best actions

The goal is to interrupt low-value repetition before the agent burns many steps on the same failing tactic.

### Resume behavior

Top-level sessions persist structured working memory in `.ai/state/task_state.json`.

Current behavior:
- `continue` reuses the saved top-level goal when available
- the first prompt after restore includes resume guidance from the saved task state
- persisted task state is restored only for the top-level agent
- successful top-level completion clears the persisted snapshot
- failed or interrupted top-level runs keep the snapshot for resume

Sub-agents intentionally do not restore persisted task state. They start with fresh in-memory working memory so their behavior stays scoped to the current delegated task and remains reproducible.

Design constraint:
- `TaskState` is intentionally small and operational
- it is not a full narrative session transcript
- avoid adding extra resume metadata unless there is a proven benefit

This keeps prompt pressure lower and makes the state more usable on weaker models.

### Context window (`src/history.rs`)

- Last `history_window` steps in full
- Older steps collapsed to one line: `step N ✓ [tool] → first line of output`
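A minimal sketch of that collapsing rule — the step-dictionary layout is illustrative, not the real `History` type:

```python
def compact_history(steps, history_window):
    """Render history: last `history_window` steps in full, older ones
    collapsed to `step N ✓ [tool] → first line of output`."""
    cutoff = max(len(steps) - history_window, 0)
    lines = []
    for i, step in enumerate(steps):
        if i < cutoff:
            mark = "✓" if step["ok"] else "✗"
            first = step["output"].splitlines()[0] if step["output"] else ""
            lines.append(f"step {step['n']} {mark} [{step['tool']}] → {first}")
        else:
            lines.append(f"step {step['n']} [{step['tool']}]\n{step['output']}")
    return "\n".join(lines)
```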

### LLM response parsing

Finds the first `{` and last `}` — handles models that add prose before or after JSON. Strips ` ```json ` fences automatically.
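A rough sketch of that extraction — the function name and exact fence handling are assumptions:

```python
import json

def extract_action(raw: str):
    """Parse the first {...} span in a model reply, tolerating prose
    around the JSON and markdown code fences."""
    text = raw.replace("```json", "").replace("```", "")
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object present
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```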

### Tool error handling

A tool error does not stop the agent — it is recorded in history as a failed step. The agent can recover and try a different approach on the next step.

---

## Model Selection and Routing

### Recommended models (qwen3.5 family, for Ollama)

| Model | Size | Use case |
|---|---|---|
| `qwen3.5:cloud` | - | Default — good balance |
| `qwen3.5:35b` | 24 GB | Complex multi-step tasks |
| `qwen3.5:27b` | 17 GB | Complex multi-step tasks |
| `qwen3.5:9b` | 6.6 GB | Roles with ≤8 tools (navigator, research) |
| `qwen3-coder:30b` | 19 GB | Complex multi-step coding |
| `qwen3-coder-next:cloud` | - | Developer role, best code quality |

### Per-action-type model routing

```toml
model = "qwen3.5:cloud"   # fallback for all action types

[models]
coding    = "qwen3-coder-next:cloud"   # used for write_file, str_replace
search    = "qwen3.5:9b"               # used for read_file, find_files, etc.
execution = "qwen3.5:9b"               # used for run_command
vision    = "qwen3.5:cloud"            # used for image tasks
```

### LLM backends

Three wire protocols are supported. Set `llm_backend` in `config.toml` or via `do_it init`:

| Backend | `llm_backend` value | Auth |
|---|---|---|
| Ollama | `ollama` | none (local) |
| OpenAI / compatible | `openai` | `llm_api_key` or `LLM_API_KEY` env |
| Anthropic / compatible | `anthropic` | `llm_api_key` or `LLM_API_KEY` env |

Any service that implements the OpenAI `/v1/chat/completions` or Anthropic `/v1/messages` API works, including self-hosted proxies and third-party providers. The `llm_url` field sets the base URL — the protocol suffix (`/v1/chat/completions` etc.) is added automatically.

```toml
# OpenAI-compatible local proxy, no key needed
llm_backend = "openai"
llm_url     = "http://localhost:20128"
model       = "my-local-model"
```
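Endpoint resolution works roughly like this — the suffixes come from the docs, while the trailing-slash handling is an assumption:

```python
# Protocol suffix appended per backend, as documented.
SUFFIXES = {
    "ollama": "/api/chat",
    "openai": "/v1/chat/completions",
    "anthropic": "/v1/messages",
}

def chat_endpoint(backend: str, llm_url: str) -> str:
    """Append the backend's protocol suffix to the configured base URL."""
    return llm_url.rstrip("/") + SUFFIXES[backend]
```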

---

## Sub-agent Architecture

`spawn_agent` lets the `boss` role delegate subtasks to specialised sub-agents. Each sub-agent runs in-process with its own history, role prompt, fresh in-memory working memory, and tool allowlist. Communication between boss and sub-agents goes through the shared `.ai/knowledge/` memory.

Design rule:
- top-level agent: may restore persisted `TaskState`
- sub-agent: never restores persisted `TaskState`; always starts clean for the current delegated scope

### Usage

```json
{
  "tool": "spawn_agent",
  "args": {
    "role": "research",
    "task": "Find the best OAuth2 crates for Axum in 2026",
    "memory_key": "knowledge/oauth_research",
    "max_steps": 15
  }
}
```

After the sub-agent finishes, the boss reads the results:

```json
{ "tool": "memory_read", "args": { "key": "knowledge/oauth_research" } }
```

### Arguments

| Argument | Required | Description |
|---|---|---|
| `role` | yes | Sub-agent role: `research`, `developer`, `navigator`, `qa`, `reviewer`, `memory` |
| `task` | yes | Task description — what the sub-agent should do |
| `memory_key` | no | Where to write results (default: `knowledge/agent_result`) |
| `max_steps` | no | Step limit (default: parent's `max_steps / 2`, min 5) |

Boss cannot spawn another boss (recursion guard).
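The default step budget amounts to (integer halving is an assumption):

```python
def default_sub_agent_steps(parent_max_steps: int) -> int:
    """Default sub-agent budget: half the parent's max_steps, never below 5."""
    return max(parent_max_steps // 2, 5)
```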

### Full orchestration example

```bash
do_it run --task "Add OAuth2 login to the API" --role boss --max-steps 80
```

```
boss: reads last_session, plan, decisions, user_profile → writes task breakdown to plan
  │
  ├─ spawn_agent(role="research", task="find best OAuth crates for Axum",
  │              memory_key="knowledge/oauth_research")
  │    └─ research agent searches web, writes findings
  │
  ├─ memory_read("knowledge/oauth_research")
  │
  ├─ spawn_agent(role="navigator", task="find current auth middleware location",
  │              memory_key="knowledge/auth_structure")
  │    └─ navigator explores codebase, maps existing code
  │
  ├─ spawn_agent(role="developer", task="implement OAuth2 per the plan",
  │              memory_key="knowledge/impl_notes")
  │    └─ developer writes code, runs tests, commits
  │
  ├─ spawn_agent(role="reviewer", task="review the OAuth2 implementation",
  │              memory_key="knowledge/review_report")
  │    └─ reviewer reads code, checks decisions.md, writes structured report
  │
  ├─ spawn_agent(role="qa", task="verify all tests pass, check coverage",
  │              memory_key="knowledge/qa_report")
  │    └─ qa runs test_coverage, writes report, appends lessons_learned
  │
  └─ boss: reads review_report + qa_report → notify("OAuth2 complete") → finish
```

---

## Persistent Memory

The agent maintains persistent state in `.ai/` at the repository root. This directory is gitignored by default.

```
.ai/
├── project.toml           ← project config (auto-scaffolded, edit freely)
├── prompts/               ← custom role prompts (override built-ins per project)
│   ├── boss.md
│   ├── developer.md
│   └── qa.md
├── state/
│   ├── current_plan.md        ← boss writes the task plan here
│   ├── last_session.md        ← read on startup, written at end of session
│   ├── session_counter.txt
│   └── external_messages.md  ← external inbox, read and cleared on startup
├── logs/
│   ├── history.md
│   ├── session-NNN.md         ← per-session markdown report, including path-sensitivity summary when relevant
│   └── session-NNN.trace.json ← structured session trace (start, turns, finish, path-sensitivity diagnostics)
└── knowledge/                 ← agent-written project knowledge
    ├── lessons_learned.md     ← QA appends project-specific patterns after each session
    ├── decisions.md           ← Boss/Developer log architectural decisions + rationale
    ├── qa_report.md           ← latest test run
    └── review_report.md       ← latest structured review
```

### What each file is for

**`last_session.md`** — the agent writes a note to its future self at the end of every session: what was done, what is pending, any important context. Read automatically on next startup.

**`session-NNN.md`** — per-session markdown report written under `.ai/logs/`. It includes task, summary, tool usage, and a `Path sensitivity` section when the session touched config, prompts, repo metadata, or other tagged path categories.

**`session-NNN.trace.json`** — structured session trace written under `.ai/logs/`. Besides start/turn/finish events and per-tool call counts, it now records aggregated `path_sensitivity_stats` and embeds per-turn sensitivity hints when tool outputs include `[sensitivity: ...]`.

**Final session summary** — the plain-text close-out printed at the end of a run now includes a compact `Safety : ...` line when path-sensitive writes occurred, so the operator sees the same signal immediately without opening saved artifacts.

**Redaction** — before any text is written to `session-NNN.md`, `session-NNN.trace.json`, or `last_session.md`, the task description and final summary pass through a central redaction filter (`src/redaction.rs`). The filter replaces any line that contains a known sensitive token pattern with `[redacted]`. The same filter is applied to the output strings of all write-oriented tools (`write_file`, `str_replace`, `str_replace_multi`, `str_replace_fuzzy`, `memory_write`, `memory_delete`) before they are returned to the agent. Path-sensitivity annotations such as `[sensitivity: project_config]` are intentionally preserved — they do not match any sensitive-token pattern.

Covered patterns: PEM key headers (`-----BEGIN `), common API key prefixes (`sk-`, `ghp_`, `ghs_`, `glpat-`, `xoxb-`, `xoxp-`), HTTP auth headers (`Authorization: Bearer `, `Authorization: Basic `), and common env-var-style assignments (`password=`, `passwd=`, `secret=`, `api_key=`, `apikey=`, `access_token=`, `refresh_token=`, `private_key=`).
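A simplified sketch of the line-level filter — the pattern list mirrors the docs, but the exact regexes in `src/redaction.rs` may differ:

```python
import re

# Token patterns from the docs; word boundaries are an assumption.
SENSITIVE = re.compile(
    r"-----BEGIN "                                # PEM key headers
    r"|\b(sk-|ghp_|ghs_|glpat-|xoxb-|xoxp-)"      # API key prefixes
    r"|Authorization: (Bearer|Basic) "            # HTTP auth headers
    r"|\b(password|passwd|secret|api_key|apikey"  # env-var-style
    r"|access_token|refresh_token|private_key)="  #   assignments
)

def redact(text: str) -> str:
    """Replace every line containing a sensitive token with [redacted]."""
    return "\n".join(
        "[redacted]" if SENSITIVE.search(line) else line
        for line in text.splitlines()
    )
```

Note that a `[sensitivity: ...]` annotation passes through untouched, as described above.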

Intentionally not redacted: read-tool output (the agent needs file and memory content to act), subprocess stdout/stderr from `run_command` and git tools (compiler errors, test output), and HTTP response bodies from `fetch_url` / `github_api`. These are ephemeral in the LLM context window and not persisted to on-disk artifacts by the agent runtime.

**`external_messages.md`** — your inbox for the agent. Write anything here before a run; the agent will see it on startup and the file is cleared. Use this for instructions that don't fit as a `--task` flag, or to send notes from an external process.

**`project.toml`** — permanent project context injected every session. Commands for test/build/lint, GitHub repo name, agent conventions. Edit once, used forever.

**`lessons_learned.md`** — QA appends project-specific anti-patterns and correct approaches after each session. The agent reads this before starting work to avoid repeating mistakes.

**`decisions.md`** — Boss and Developer log significant architectural decisions: what was chosen, what alternatives were considered, and why. Consulted before redesigning anything.

**`review_report.md`** — Reviewer writes a structured static analysis after each review session: architectural issues, code smells, convention violations, potential bugs, with per-finding severity ratings.

### Global memory — `~/.do_it/`

Two files in `~/.do_it/` persist across all projects. The `boss` role reads them automatically at the start of every session (if they contain actual content, not just the default comments).

| File | Key | Purpose |
|---|---|---|
| `user_profile.md` | `user_profile` | Your preferences: communication language, tech stack, preferred crates, workflow style. Edit once, applies to all projects. |
| `boss_notes.md` | `boss_notes` | Cross-project insights the Boss accumulates — patterns that work, approaches to avoid, ideas for future projects. |

Boss reads these files and appends to them over time:
- When it learns something stable about you → updates `user_profile` via `memory_write("user_profile", ...)`
- When it discovers a cross-project insight → appends to `boss_notes` via `memory_write("boss_notes", ..., append=true)`

Both files are created with commented templates on first run. They are only injected into context when they contain actual content (not just `#` comment lines).
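The "actual content" check can be sketched as (helper name is illustrative):

```python
def has_actual_content(text: str) -> bool:
    """True when the file holds more than blank lines and '#' comments."""
    return any(
        stripped and not stripped.startswith("#")
        for stripped in (line.strip() for line in text.splitlines())
    )
```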

---

## Limitations

**Context window** — for long sessions reduce `history_window` to 4–5 and `max_output_chars` to 3000.

**`find_files`** — simple patterns only: `*.rs`, `test*`, substring. No `**`, `{a,b}`, or `?`.
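The supported pattern forms behave roughly like this sketch (the real matcher in `file_ops.rs` may differ in details):

```python
def simple_match(pattern: str, name: str) -> bool:
    """find_files semantics: leading '*' = suffix match, trailing '*' =
    prefix match, otherwise substring. No '**', '{a,b}', or '?'."""
    if pattern.startswith("*") and not pattern.endswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*") and not pattern.startswith("*"):
        return name.startswith(pattern[:-1])
    return pattern.strip("*") in name
```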

**`run_command`** — no `|`, `&&`, `>`. Each command is a separate call with an explicit args array.

Current policy hardening:
- `program` must be a bare executable name from `PATH`, not `./tool` or `C:\path\tool.exe`
- timeout is required to stay within the tool's max bound
- arg count / arg length are capped
- env var count / value length are capped
- risky env overrides such as `PATH`, `PATHEXT`, `COMSPEC`, `LD_PRELOAD`, `DYLD_INSERT_LIBRARIES`, `RUSTC_WRAPPER`, `CARGO_HOME`, `RUSTUP_HOME`, `HOME`, and `USERPROFILE` are rejected

**`run_targeted_test`** — currently optimized for Rust repositories. Node/Python fallback branches are intentionally deferred.

**`format_changed_files_only`** — currently optimized for Rust repositories and relies on `git` + `rustfmt` being available.

**`apply_patch_preview`** — preview-only helper. It intentionally does not edit files; follow with `str_replace` or `write_file` for the real change.

**`run_script`** — intentionally narrow. It is for small parsing/inspection tasks, not general-purpose program execution.

**`fetch_url`** — public URLs only, no authentication, no JavaScript rendering. For GitHub use `raw.githubusercontent.com`.

**`web_search`** — DuckDuckGo HTML endpoint, no API key required. Rate limiting possible under heavy use.

**Code intelligence** — tree-sitter backend covers ~95% of real-world cases. Does not handle macros, conditional compilation, or dynamically generated code.

**`test_coverage`** — implemented for Rust projects. Coverage numbers depend on `cargo llvm-cov` or `cargo tarpaulin`; without those tools it falls back to `cargo test` and reports that coverage is unavailable.

**Vision** — image input is implemented and tested. GGUF files in Ollama may not include the mmproj component for vision models. Use llama.cpp directly or verify your model has vision support.

**`spawn_agent` / `spawn_agents`** — implemented and tested. The main practical limit is step budget and model quality, not a stubbed runtime.

**`memory_write`** — supports `append=true` and namespaced keys (`knowledge/decisions`, `knowledge/qa_report`, etc.) via subdirectory layout in `.ai/memory/`.

**`memory_delete`** — removes a memory entry by key. Available to `boss` and `memory` roles.

---

## Tips and Recommendations

### Choose the right role

```bash
do_it run --task "What does this project do?"             --role navigator
do_it run --task "Find examples of using axum extractors" --role research
do_it run --task "Add validation to src/handlers.rs"      --role developer
do_it run --task "Do all tests pass?"                     --role qa
do_it run --task "Plan the auth module refactor"          --role boss
```

### Before running

```bash
git status              # start from a clean working tree
do_it config            # verify resolved config — check llm_url, llm_backend, model
```

### Write a good task description

```bash
# Vague — bad
do_it run --task "Improve the code"

# Specific — good
do_it run --task "In src/lexer.rs the tokenize function returns an empty Vec \
  for input containing only spaces. Fix it and add a regression test." --role developer

# From a file with full context
do_it run --task tasks/issue-42.md --role developer
```

A task file can include reproduction steps, logs, expected vs actual behaviour, and any relevant constraints.

### Project-specific role prompts

```bash
mkdir -p .ai/prompts
cat > .ai/prompts/developer.md << 'EOF'
You are a Rust expert on this codebase.
Always run `cargo clippy -- -D warnings` after edits.
Always run `cargo test` after edits.
Use thiserror for error types in library code.
All public items require doc comments.
EOF
```

### Using GitHub API

Set `GITHUB_TOKEN` in your environment (classic token with `repo` scope is sufficient):

```bash
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
do_it run --task "Find all open bugs and fix the highest priority one" --role developer
```

The agent can list issues, read them, post comments, create PRs, and read file contents directly from GitHub — useful when working across repositories.

---

## Troubleshooting

**`Tool 'X' is not allowed for role 'Y'`** — the model tried to use a tool outside the role's allowlist. Either switch to `--role default` or add the tool to `.ai/prompts/<role>.md` with an explicit mention.

**`Cannot reach LLM service`** — check that `llm_url` is correct and the service is running:
```bash
# Ollama
ollama serve
curl http://localhost:11434/api/tags

# OpenAI-compatible — check the URL and port
curl http://localhost:20128/v1/models
```

**`Model 'X' not found`** (Ollama only)
```bash
ollama pull qwen3.5:cloud
ollama list
```

**`HTTP 401 Unauthorized`** — API key missing or wrong. Set `llm_api_key` in `config.toml` or the `LLM_API_KEY` environment variable.

**`HTTP 404` on chat endpoint** — `llm_url` may include an extra path. Use the base URL only — the protocol suffix (`/v1/chat/completions` for OpenAI, `/api/chat` for Ollama, `/v1/messages` for Anthropic) is appended automatically. For example use `https://api.openai.com`, not `https://api.openai.com/v1`.

**`LLM response has no JSON`** — the model responded outside of JSON format. Try a larger model, set `temperature = 0.0`, or use a role to reduce the tool count.

**`str_replace: old_str found N times`** — provide more context to make `old_str` unique:
```json
{ "old_str": "fn process(x: i32) -> i32 {\n    x + 1", "new_str": "..." }
```
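The uniqueness rule amounts to an occurrence count before editing — a sketch, not the real error strings:

```python
def check_old_str(source: str, old_str: str) -> str:
    """str_replace precondition: old_str must occur exactly once."""
    n = source.count(old_str)
    if n == 0:
        return "error: old_str not found"
    if n > 1:
        return f"error: old_str found {n} times"
    return "ok"
```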

**`ask_human via Telegram: no reply received within 5 minutes`** — verify the bot token and chat_id, and that you have sent `/start` to the bot. The agent continues with an error and falls back to console.

**`fetch_url: HTTP 403` or empty result** — the site blocks bots or requires JavaScript. Use direct API endpoints: `raw.githubusercontent.com` instead of `github.com`.

**`github_api: no token found`** — set `GITHUB_TOKEN` environment variable or pass `"token"` in args.

**`test_coverage: coverage backend not found`** — install `cargo-llvm-cov` or `cargo-tarpaulin`. Without them, the tool falls back to `cargo test` and reports that coverage numbers are unavailable.

**`run_script` failed or timed out** — reduce the script size, avoid heavy loops, and keep it to parsing/transform tasks. For broader automation, prefer a dedicated Rust tool or `run_command`.

**`run_command` rejects a program or env override** — this is usually intentional policy enforcement, not a runtime bug. Use a bare executable name from `PATH`, keep `timeout_secs` modest, and avoid overriding blocked environment keys.

**`run_targeted_test` only reports Rust support** — this is expected right now. The helper is intentionally Rust-first; Node/Python fallbacks are tracked as later follow-up work.

**`format_changed_files_only` finds nothing to do** — it only formats changed `.rs` files visible to `git status`. If the repo is not a git repo or only non-Rust files changed, the no-op result is expected.

**`apply_patch_preview` says `old_str` is missing or not unique** — this mirrors `str_replace` rules on purpose, so the preview stays honest about whether the eventual targeted edit is safe.

**Agent is looping** — the runtime now injects `Working Memory` and `Strategy Notes` to reduce repeated weak-tool calls, but loops are still possible. Reduce `history_window`, restart with a more specific task, or use a larger model if needed.

**Sub-agent stuck** — if `spawn_agent` takes too long, the sub-agent is limited by `max_steps`. Pass a smaller `max_steps` explicitly, or break the task into smaller pieces.

---

## Project Structure

```
do_it/
├── Cargo.toml           name="do_it", edition="2021", version="0.3.2"
├── config.toml          runtime configuration (models, Telegram, [browser])
├── README.md            project overview
├── DOCS.md              this file
├── LICENSE              MIT
├── .gitignore
├── .ai/                 agent memory (created automatically, gitignored)
│   ├── project.toml     project config (auto-scaffolded)
│   ├── prompts/         custom role prompts
│   ├── state/           plan, last_session, session_counter, external_messages
│   ├── logs/            history.md
│   ├── screenshots/     browser tool output (PNG files)
│   └── knowledge/       lessons_learned, decisions, qa_report, review_report, sub-agent results
└── src/
    ├── main.rs              bin wrapper — calls start::run()
    ├── start.rs             CLI: run | config | roles; --role flag; image detection
    ├── lib.rs               library root for integration tests
    ├── agent/               agent module (replaces agent_core/loop/tools)
    │   ├── mod.rs           pub use core::{SweAgent, StepOutcome}
    │   ├── core.rs          SweAgent struct and all field accessors
    │   ├── tools.rs         parse_action, first_line, format_args_display, project detection
    │   ├── session.rs       session_init/finish, task_state persistence, resume logic
    │   ├── prompt.rs        build_prompt, build_strategy_notes, detect_loop
    │   ├── display.rs       console/TUI output helpers, boss task-source enforcement
    │   ├── spawn.rs         spawn_agent, spawn_agents, TUI sender bridge
    │   └── loops/           agent run/step loop module
    │       ├── mod.rs       run(), run_capture(), step(), loop entry points
    │       └── tests.rs     loops tests
    ├── config_struct.rs     AgentConfig, BrowserConfig, Role enum, model router, built-in prompts
    ├── config_loader.rs     config loading, global config bootstrap, prompt overrides
    ├── config_validation.rs runtime and static config validation
    ├── history.rs           sliding window context manager
    ├── task_state.rs        structured working memory for goal/actions/artifacts/blockers
    ├── redaction.rs         central redaction filter — scrubs sensitive tokens from artifact strings
    ├── shell.rs             LLM client: Ollama / OpenAI / Anthropic backends, BackendConfig, LlmClient
    ├── tui.rs               Ratatui TUI: three-panel live view, prompt widget, panic hook
    ├── validation.rs        path traversal protection and safe path resolution
    ├── tools/               runtime tools split by domain
    │   ├── core.rs          central dispatch (dispatch_with_depth) and shared helpers
    │   ├── spec.rs          tool registry: names, aliases, roles, status, prompt metadata
    │   ├── file_ops.rs      read_file, write_file, str_replace, apply_patch_preview, list_dir, find_files, search_in_files
    │   ├── commands.rs      run_command, run_targeted_test, format_changed_files_only
    │   ├── web.rs           web_search, fetch_url, github_api
    │   ├── human.rs         ask_human (TUI + Telegram), notify
    │   ├── memory.rs        memory_read, memory_write, memory_delete (namespaced keys)
    │   ├── code_analysis.rs get_symbols, outline, get_signature, find_references
    │   ├── git.rs           git_status, git_commit, git_log, git_stash, git_pull, git_push
    │   ├── browser.rs       screenshot, browser_get_text, browser_action, browser_navigate
    │   ├── workspace.rs     tree, project_map, find_entrypoints, trace_call_path, diff_repo
    │   ├── background.rs    run_background, process_status, process_list, process_kill
    │   ├── scripting.rs     run_script (Rhai sandbox)
    │   └── test_coverage.rs test_coverage with cargo llvm-cov / tarpaulin fallback
    └── prompts/             built-in role prompts compiled into the binary via include_str!
        ├── default.md
        ├── boss.md          orchestrator — memory, spawn, never writes code directly
        ├── developer.md
        ├── navigator.md
        ├── qa.md
        ├── reviewer.md
        ├── research.md
        └── memory.md
```

Global config at `~/.do_it/` (created on first run):

```
~/.do_it/
├── config.toml          global defaults (overridden by local config.toml)
├── system_prompt.md     global default prompt override
├── user_profile.md      your preferences — Boss reads on every session start
├── boss_notes.md        cross-project insights accumulated by Boss
└── tool_wishlist.md     agent-requested capabilities — review to prioritise dev work
```

### Dependencies

| Crate | Purpose |
|---|---|
| `tokio` | async runtime |
| `reqwest` | HTTP — Ollama, fetch_url, web_search, Telegram API, GitHub API |
| `serde` / `serde_json` | JSON serialization |
| `toml` | config.toml / project.toml parsing |
| `clap` | CLI argument parsing |
| `walkdir` | recursive filesystem traversal |
| `regex` | search_in_files, find_references, AST parsers |
| `base64` | image encoding for vision; GitHub file contents decoding |
| `anyhow` | error handling |
| `tracing` / `tracing-subscriber` | structured logging |