parecode 0.1.0

A terminal coding agent built for token efficiency and local model reliability
# PareCode — Implementation Plan

> Build a Rust CLI coding agent that matches OpenCode's baseline, then beats it on token efficiency and small-model reliability. Hyper-optimised orchestration + smart deterministic programming where a model call would be wasteful.

---

## Market Position

**The core bet:** context efficiency is the hard problem. Features are plumbing. A model drowning in 60k tokens of accumulated history fails. A model given 8k tokens of clean, relevant context succeeds — and on a 14B local model, this is the difference between working and not working.

**Why this wins:**

| Dimension | OpenCode / Cursor / Claude Code | PareCode |
|---|---|---|
| Token usage per task | 20k–60k (reactive compression, full file reads) | 3k–12k (proactive, compressed from the start) |
| Local model support | Broken on most OSS backends (Zod schemas, context bloat) | First-class — designed for Qwen3 14B, Ollama |
| Plan/execute isolation | Plans in conversation — model loses thread by step 3 | Each step: fresh context, bounded instruction, scaffold carries state |
| Loop detection | 3 identical calls before intervention | 2 calls — injects cached result immediately |
| Cost | Cloud API required; usage compounds | Works on free local inference; cloud optional |
| Enterprise / IP | Code leaves the building | Self-hosted, air-gapped capable |

**The efficiency story compounds over time.** As local models improve (Qwen4, etc.), PareCode gets better for free. We're not locked into any provider's pricing decisions. And every token saved is real money: for a team of 10 running 50 tasks/day, the gap between OpenCode's token rate and PareCode's is hundreds of dollars a month.

**What's genuinely novel:**
- Plan/execute separation where the scaffold owns state and the model only sees one bounded step at a time. No other agent does this.
- Tool output compression that is deterministic and immediate, not a reactive LLM call at 90% capacity.
- Per-step file symbol summaries carried forward between steps — the model knows what changed without seeing implementation detail.

---

## Why OpenCode Falls Over (Validated Against Their Codebase)

| Failure | Impact |
|---|---|
| System prompt bloat (one user hit 217,905 tokens) | Entire context consumed before conversation starts |
| Full file reads (up to 50KB per read) | Most content irrelevant, wastes model attention |
| Glob returns 100K+ tokens per call | Known issue, unfixed |
| Tool outputs never compressed mid-session | History balloons; blunt compaction fires at 90% |
| Compaction is reactive LLM call | Costs tokens to save tokens |
| Doom loop detection fires at 3 identical calls | Already wasted 3 tool round-trips |
| Zod schemas break on OSS backends (SGLang, K2.5) | Tools literally don't work on many local models |
| No per-step context isolation | Small models lose the plan by step 3 |
| Hidden cheap-model calls (Haiku) | Unexpected cost accumulation |
| No conversation persistence | Can't resume, roll back, or compare sessions |

---

## ✅ Phase 1 — Match OpenCode — COMPLETE

**`src/client.rs`** — Ollama/OpenAI-compatible HTTP client
- POST to `/v1/chat/completions` with streaming SSE
- Parse streamed tool call deltas into complete tool calls
- `stream_options: {include_usage: true}` for Ollama token counts
- Config: endpoint URL + model from `~/.config/parecode/config.toml`

**`src/tools/`** — Core tool set with lean handwritten JSON schemas
- `read_file`, `write_file`, `edit_file`, `bash`, `search`, `list_files`
- All schemas minimal — work correctly on Qwen3 14B, Ollama backends

**`src/agent.rs`** — Agent loop with streaming output

**`src/main.rs`** — CLI via `clap` — `parecode "task"`, `--dry-run`, `-v`, `--profile`, `--init`, `--profiles`

---

## ✅ Phase 2 — Easy Wins That Beat OpenCode — COMPLETE

### ✅ 2a. Tool Output Compression (`src/history.rs`)
- `read_file` content kept full in model context (needed for editing)
- Separate `display_summary` (one-liner) shown in TUI sidebar
- Budget enforcer compresses older read results when threshold hit
- On `edit_file` failure: file content injected into error response so model can self-correct without re-reading

### ✅ 2b. File Read Cache (`src/cache.rs`)
- All reads cached; cache-hit returns content instantly with age note
- Invalidated on write/edit

### ✅ 2c. Proactive Token Budget (`src/budget.rs`)
- Enforced before every API call (not reactive at 90%)
- Pass 1: compress older tool results, leave most recent intact
- Pass 2: trim oldest turns (protects index 0 — original task)
- Loop detection fires at 2 identical calls (vs OpenCode's 3)
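The two-pass enforcer can be sketched in a few lines. This is illustrative only: the `Turn` shape, thresholds, and compression strategy here are stand-ins, not the real `src/budget.rs` types.

```rust
// Illustrative shapes; the real types live in src/budget.rs.
struct Turn {
    is_tool_result: bool,
    content: String,
}

// ~4 chars per token, counted in chars (see 5f).
fn approx_tokens(s: &str) -> usize {
    s.chars().count() / 4
}

fn total_tokens(turns: &[Turn]) -> usize {
    turns.iter().map(|t| approx_tokens(&t.content)).sum()
}

/// Enforce the budget BEFORE the API call: compress older tool results
/// first, then trim the oldest turns, always protecting index 0 (the task).
fn enforce_budget(turns: &mut Vec<Turn>, budget: usize) {
    // Pass 1: compress older tool results; the most recent turn stays intact.
    let last = turns.len().saturating_sub(1);
    for i in 1..last {
        if total_tokens(turns) <= budget {
            return;
        }
        if turns[i].is_tool_result && approx_tokens(&turns[i].content) > 50 {
            let head: String = turns[i].content.chars().take(200).collect();
            turns[i].content = format!("{head}… [compressed]");
        }
    }
    // Pass 2: drop the oldest turns after the protected original task.
    while total_tokens(turns) > budget && turns.len() > 2 {
        turns.remove(1);
    }
}

fn main() {
    let mut turns = vec![
        Turn { is_tool_result: false, content: "task".into() },
        Turn { is_tool_result: true, content: "x".repeat(400) },
        Turn { is_tool_result: false, content: "done".into() },
    ];
    enforce_budget(&mut turns, 30);
    println!("{} tokens after enforcement", total_tokens(&turns));
}
```

The key property: enforcement is deterministic and runs before every call, so there is never a reactive "compress at 90%" cliff.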

### ✅ 2d. Smart File Excerpting (`src/tools/read.rs`)
- Max 150 lines by default; explicit `line_range` for full access
- `symbols=true` mode returns function/struct/class index with line numbers — lets model navigate large files without reading them

### ✅ 2e. Lean Tool Schemas
- Handwritten, minimal — no Zod, no extra metadata

### ✅ Additional: Ratatui TUI (`src/tui/`)
- Full alternate-screen TUI with conversation history, status bar, input
- Context % and token count in status bar
- `@` file picker overlay (fuzzy search)
- **Attached files panel** — `@` adds file as a pinned chip above input; content injected as preamble in every agent call; protected from budget eviction; Tab/Del to manage chips
- Ctrl+P command palette (`/cd`, `/profile`, `/profiles`, `/clear`, `/ts`, `/quit`)
- Agent cancellation (Ctrl+C)
- Conventions loading: auto-discovers `AGENTS.md` / `CLAUDE.md` / `.parecode/conventions.md`

### Observed results vs OpenCode
- ~2.3k tokens for a file analysis task that cost OpenCode 20k+ tokens
- ~443 tokens for a simple query (OpenCode spikes to 10k immediately)
- Model successfully self-corrects edit_file failures without re-reading
- Attached files prevent the "context forget" that caused OpenCode to loop

---

## ✅ Phase 3 — Multi-Turn Conversation Persistence — COMPLETE

### ✅ 3a. In-session conversation history (`src/sessions.rs`)
- `Vec<ConversationTurn>` in `AppState` accumulates across agent runs
- Each turn: user message, agent response text, tool summary
- Prior context injected as preamble on each new run (8k token cap — ~25% of a 32k window)
- Short reply hint: model told "yes/ok/go ahead" are responses to the previous message

### ✅ 3b. Persistent conversation storage
- JSONL files in `~/.local/share/parecode/sessions/{ts}_{basename}.jsonl`
- Auto-resumed on startup for the matching cwd

### ✅ 3c. Session management
- `/sessions`, `/resume [n]`, `/rollback [n]`, `/new` slash commands
- `Ctrl+H` session browser overlay — date, project, turn count, first message preview
- Status bar indicator: `◈ N↩` shows active turn count and resumed state

### ✅ 3d. Rollback
- Active turn pointer — rolling back branches without deleting archived turns

---

## ✅ Phase 4 — Plan/Execute Mode — COMPLETE

**The core architectural differentiator.** Plan is a data structure owned by the scaffold. Each step gets fresh, minimal context. The model only ever sees the current step. The scaffold carries all state.

### ✅ Plan data structure (`src/plan.rs`)
- `Plan { task, steps, current, status, created_at, project }`
- `PlanStep { description, instruction, files, verify, status, tool_budget, user_annotation, completed_summary }`
- `Verification`: None | FileChanged | PatternAbsent | CommandSuccess | BuildSuccess

### ✅ Per-step context isolation
- Fresh `messages` vec per step — zero bleed from previous steps
- Only `step.files` loaded as attached context
- Single bounded instruction to model

### ✅ Step carry-forward summaries
- After each step passes, `summarise_completed_step()` scans modified files deterministically
- Extracts top symbols (fn/struct/class/def) from recently modified files
- Result: `"modified src/auth.rs [validate_token, AuthError]; modified src/handler.rs [handle_request]"`
- Injected into next step's preamble — model knows exact function names without seeing implementation
- Zero model calls, ~5 lines of context per completed step
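The deterministic scan reduces to a line-prefix match. A minimal sketch (the real extractor covers more languages and declaration forms):

```rust
/// Extract top-level symbol names from source text with zero model calls.
/// Sketch only: matches Rust/Python-style declarations by line prefix.
fn extract_symbols(src: &str) -> Vec<String> {
    let prefixes = ["fn ", "pub fn ", "struct ", "pub struct ", "class ", "def "];
    src.lines()
        .filter_map(|line| {
            let t = line.trim_start();
            // Indented lines are nested; we only want top-level declarations.
            if t.len() != line.len() {
                return None;
            }
            prefixes.iter().find_map(|p| {
                t.strip_prefix(*p).map(|rest| {
                    // Take the identifier up to the first non-ident char.
                    rest.chars()
                        .take_while(|c| c.is_alphanumeric() || *c == '_')
                        .collect::<String>()
                })
            })
        })
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    let src = "fn validate_token(t: &str) {}\npub struct AuthError;\n    fn inner() {}";
    println!("{:?}", extract_symbols(src));
}
```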

### ✅ TUI plan review
- `/plan "task"` — generate plan, enter inline review mode
- `↑↓` navigate steps, `e` annotate, `a` approve, `Esc` cancel
- Annotations injected as `"\n\nUser note: {}"` into the step instruction
- All steps must be individually approved before execution begins
- Per-step ✓/✗ shown in conversation history during execution

### ✅ Plan persistence
- Plans saved to `.parecode/plans/{timestamp}-plan.json` (JSON, machine-readable)
- Plans written to `.parecode/plan.md` (Markdown, human-readable — open in editor while plan runs)
- Failed plans paused at the failing step, resumable

### ✅ Plan UX polish
- Overlay closes immediately on Enter confirm — mode transitions to `PlanRunning` synchronously, no async lag
- Planning message shows which model is thinking when `planner_model` is configured: `⟳ planning via claude-opus-4-6: task`

---

## ✅ Phase 5 — Agent Reliability — COMPLETE

### ✅ 5a. `recall` tool
- Schema: `{ tool_call_id?, tool_name? }` — either works
- Handled before dispatch in `agent.rs` — not recorded in history (prevents recursion)
- `recall_by_name()` fallback for local models that don't echo IDs reliably

### ✅ 5b. Bash timeout (async)
- `tokio::process::Command` + `tokio::time::timeout`
- `execute_tool` is now `async fn`
- `MAX_OUTPUT_LINES` = 200

### ✅ 5c. Smart bash summarisation
- Error-line aware: keeps `error:`, `FAILED`, `panic` lines (up to 20)
- Build check failures pass through history compression unchanged
- Build check success prompts model to verify via search before declaring done

### ✅ 5d. Fuzzy `edit_file` matching
- CRLF → LF → per-line trim() → per-line trim_end() cascade
- Only applies if exactly one candidate found
- On failure: ±15 line context hint instead of full file dump
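The cascade amounts to retrying the match under progressively looser per-line normalisation, accepting a stage only when it yields exactly one candidate. A sketch (function names illustrative, not the real `src/tools/edit.rs` API):

```rust
fn count_matches(haystack: &str, needle: &str) -> usize {
    if needle.is_empty() {
        return 0;
    }
    haystack.matches(needle).count()
}

/// Normalise CRLF to LF, optionally trimming each line.
fn normalise(s: &str, trim_ends: bool, trim_full: bool) -> String {
    s.replace("\r\n", "\n")
        .lines()
        .map(|l| {
            if trim_full { l.trim() } else if trim_ends { l.trim_end() } else { l }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Exact match first, then trim_end per line, then full trim per line.
/// A stage is accepted only when exactly one candidate exists; ambiguous
/// matches fall through and ultimately error.
fn fuzzy_find(haystack: &str, needle: &str) -> Option<(String, String)> {
    for (ends, full) in [(false, false), (true, false), (false, true)] {
        let h = normalise(haystack, ends, full);
        let n = normalise(needle, ends, full);
        if count_matches(&h, &n) == 1 {
            return Some((h, n));
        }
    }
    None
}

fn main() {
    let hay = "fn a() {\r\n    let x = 1;  \r\n}\n";
    println!("match found: {}", fuzzy_find(hay, "let x = 1;\n}").is_some());
}
```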

### ✅ 5e. `write_file` existence guard
- `overwrite: bool` required to replace existing files
- Prevents silent overwrites by local models that don't track what exists

### ✅ 5f. Token counting fix
- `s.chars().count() / 4` — correct for multi-byte Unicode
- Prevents premature compression on non-ASCII codebases

### ✅ 5g. Unicode panic fix
- `format_args_summary` now uses `.chars().take(N).collect()` not `&s[..N]`
- Prevents panic on multi-byte chars in tool arg display (∑, Chinese, emoji)
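Both 5f and 5g reduce to the same rule: never index a `String` by byte offset. A minimal sketch:

```rust
/// ~4 chars per token heuristic, counted in chars, not bytes.
/// Byte-based counting overestimates on non-ASCII codebases,
/// which triggered premature compression.
fn approx_tokens(s: &str) -> usize {
    s.chars().count() / 4
}

/// Truncate at a char boundary; `&s[..n]` panics mid-codepoint.
fn safe_truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // "héllo∑" is 6 chars but more bytes; byte slicing could panic.
    println!("{}", safe_truncate("héllo∑", 4)); // héll
    println!("{}", approx_tokens("日本語のコード")); // 1
}
```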

### ✅ 5h. System prompt hardening
- "Do not ask permission mid-task — make necessary changes and report what you did"
- "For replacement tasks, search to confirm no instances remain before declaring done"
- "Do not re-read files already read this session"
- Auto build-check after every file mutation (`cargo check -q` / `tsc --noEmit`)

---

## ✅ Phase 5i — Sub-agent model split — COMPLETE

`planner_model` config field per profile:
- If set, plan generation uses `planner_model`; step execution uses `model`
- Enables Opus plan + Haiku execute — high reasoning where it counts (planning), cheap tokens where they're plentiful (execution)
- Planning is ~1–2k tokens; execution is 10–40k. The split is economically significant.
- Falls back to `model` if `planner_model` not set — zero behaviour change for existing configs
- See `CONFIG.md` for full examples

---

## ✅ Phase 6a — MCP Client — COMPLETE

Full Model Context Protocol client (`src/mcp.rs`):
- Spawns any MCP server process (Node/Python/binary) configured per-profile
- JSON-RPC 2.0 over stdin/stdout with proper `initialize` / `notifications/initialized` handshake
- Dynamic tool discovery via `tools/list` — tools appear as `<server>.<tool>` (e.g. `brave.brave_web_search`)
- Dispatched transparently alongside native tools — model sees one unified tool list
- Multiple servers per profile, all running concurrently
- Silently skips servers that fail to start (logs to stderr)
- Config in `config.toml` per-profile:
  ```toml
  [[profiles.local.mcp_servers]]
  name    = "brave"
  command = ["npx", "-y", "@modelcontextprotocol/server-brave-search"]
  [profiles.local.mcp_servers.env]
  BRAVE_API_KEY = "BSA..."
  ```
- Commented examples in default config: Brave Search, filesystem, fetch (`uvx mcp-server-fetch`)
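For reference, the handshake on the wire is three client messages, roughly as follows. Field names follow the MCP spec; the `protocolVersion` date string shown is the revision in common use and may differ:

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize",
 "params": {"protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "parecode", "version": "0.1.0"}}}

{"jsonrpc": "2.0", "method": "notifications/initialized"}

{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
```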

---

## Phase 6b — Distribution

The Rust binary is PareCode's biggest distribution advantage. Every competitor requires a language runtime: OpenCode and Claude Code need Node.js, Aider needs Python, oh-my-opencode needs both. PareCode is a single static binary — zero dependencies, starts in <10ms. The goal: install to productive in under 60 seconds, better than any competitor.

### 6b-i. Binary releases with cargo-dist

> Status note: second in the queue. Test this myself first: install setup, Qwen scenarios, then Claude.

**cargo-dist** automates the entire release pipeline from a single `dist init`. On every version tag push, GitHub Actions builds all targets, produces platform installers, updates the Homebrew tap, and creates the GitHub Release — zero manual steps.

**Target matrix:**
| Target | Platform | Notes |
|---|---|---|
| `x86_64-unknown-linux-musl` | Linux x86_64 | Statically linked — works on any Linux, any glibc version |
| `aarch64-unknown-linux-musl` | Linux ARM64 | AWS Graviton, Raspberry Pi, ARM servers |
| `x86_64-apple-darwin` | macOS Intel | Older Macs |
| `aarch64-apple-darwin` | macOS Apple Silicon | M1/M2/M3 — now majority of Macs |
| `x86_64-pc-windows-msvc` | Windows x86_64 | Primary Windows target |

**musl is non-negotiable for Linux.** Statically linked = no "error while loading shared libraries" ever. This eliminates the most common class of post-install failures on Linux.

**Cargo.toml / dist.toml configuration:**
```toml
[workspace.metadata.dist]
cargo-dist-version = "0.30.4"
ci = ["github"]
installers = ["shell", "powershell", "homebrew"]
tap = "PartTimer1996/homebrew-parecode"
targets = [
    "x86_64-unknown-linux-musl",
    "aarch64-unknown-linux-musl",
    "x86_64-apple-darwin",
    "aarch64-apple-darwin",
    "x86_64-pc-windows-msvc",
]
publish-jobs = ["homebrew"]

[profile.dist]
inherits = "release"
lto = "thin"
```

**Release process:** `git tag v0.1.0 && git push --tags` — that's it.

**What cargo-dist produces automatically:**
- GitHub Release with 5 platform binaries + SHA256 checksums for each
- Shell installer script (`parecode-installer.sh`) with checksum validation
- PowerShell installer script (`parecode-installer.ps1`) for Windows
- Homebrew formula pushed to `PartTimer1996/homebrew-parecode` tap

### 6b-ii. Install methods (README-ready)

```bash
# macOS / Linux — one-liner, zero dependencies
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/PartTimer1996/parecode/releases/latest/download/parecode-installer.sh | sh

# macOS — Homebrew
brew install PartTimer1996/parecode/parecode

# Windows — PowerShell
irm https://github.com/PartTimer1996/parecode/releases/latest/download/parecode-installer.ps1 | iex
```

**Competitive install comparison:**
| Tool | Install command | Requires |
|---|---|---|
| **PareCode** | `curl ... \| sh` | Nothing |
| OpenCode | `npm install -g opencode` | Node.js |
| oh-my-opencode | npm + manual agent config | Node.js + setup time |
| Claude Code | `npm install -g @anthropic-ai/claude-code` | Node.js |
| Aider | `pip install aider-chat` | Python |
| Plandex | `curl ... \| bash` | Nothing (also compiled binary) |

PareCode and Plandex are the only zero-dependency installs in the category.

### 6b-iii. Distribution channel rollout

**Week 1 (ship with first release):**
- GitHub Releases (cargo-dist, automated)
- Shell installer (cargo-dist, automated)
- Homebrew tap (cargo-dist, automated)

**Week 2:**
- **AUR** (`parecode-bin`) — binary PKGBUILD, targets Arch Linux developers. Highly technical early-adopter audience. Minimal maintenance: update `pkgver` + `sha256sums` on each release.
- **WinGet** — pre-installed on Windows 11. `wingetcreate new <release-url>` generates the manifest; `vedantmgoyal9/winget-releaser` GitHub Action automates future updates.
- **Shell completions** — generate for bash/zsh/fish via clap's `generate` feature. Included in the tarball, install instructions in README. Makes PareCode feel native.

**Later (when users ask):**
- `flake.nix` for Nix users — provide in repo, they can `nix profile install github:PartTimer1996/parecode`
- nixpkgs submission — often happens organically when the tool gains traction
- deb/rpm — only worth building if significant Ubuntu/Fedora user base requests it

**Do not bother:**
- Snap (sandboxing breaks tool, wrong audience)
- Flatpak (designed for GUI apps)
- Docker (not a server application)
- npm/pip wrappers (adds maintenance surface for marginal gain)

### 6b-iv. `parecode update` self-upgrade command

curl-installed users have no package manager to update through. `parecode update` re-runs the install script against latest, replaces the binary in-place.

```
$ parecode update
Checking for updates... parecode 0.1.0 → 0.2.1 available
Downloading parecode 0.2.1 for aarch64-apple-darwin... ✓
Verifying checksum... ✓
Replacing /home/user/.local/bin/parecode... ✓
parecode 0.2.1 installed.
```

Implementation: `src/main.rs` — `--update` subcommand, fetches GitHub API `/releases/latest`, compares version, re-runs platform-specific installer script.

### 6b-v. Benchmarking suite

Run on the tasks that caused Qwen3 14B to loop in OpenCode. Record token counts, tool calls, success rate, wall time. Publish results — this is the "viral moment" that proves the token efficiency claim.

| Task | Target |
|---|---|
| `"remove all console.log from src/"` | ≤ 5 tool calls, < 5k tokens |
| `"rename columns → allColumns in data-table.component.ts"` | No re-reads, clean 1-shot |
| `"reorganise SCSS in header.component.scss"` | < 3k tokens |

Model matrix: Qwen3 14B (Ollama), Mistral 7B, DeepSeek-Coder, Claude Sonnet (API). Publish side-by-side with OpenCode numbers.

### 6b-vi. Expose PareCode as an MCP server (`--mcp` flag)
- JSON-RPC over stdin/stdout, `--mcp` flag
- Makes PareCode usable as a backend from any MCP-compatible IDE (Cursor, Zed, etc.)
- Reuses all existing tool infrastructure

### 6b-vii. VSCode extension (trivial packaging, large surface area)
- `package.json` + launch PareCode subprocess + pipe events to webview
- Reuses all existing TUI event infrastructure
- Gives access to VSCode's file tree, git integration, diff viewer

---

## ✅ Phase 6c — First-Run Experience (install → productive in 60 seconds) — COMPLETE

**The target flow:**
```
install → parecode → interactive setup → working
```

**Everyone else's current flow:**
```
install → run → error: no config → read docs → create config → run again → maybe works
```

PareCode should be the tool that just works.

### 6c-i. First-run detection and setup wizard

When `parecode` is launched with no config file present, run an interactive setup wizard instead of erroring:

```
Welcome to PareCode ⚒

No config found at ~/.config/parecode/config.toml. Let's get you set up.

? How do you want to run PareCode?
  ❯ Local (Ollama) — free, private, works offline
    Anthropic Claude — best quality, requires API key
    OpenAI — GPT-4o, requires API key
    OpenRouter — any model, one API key
    Skip — I'll configure manually

[If Ollama selected — after silently probing localhost:11434]
  Checking for Ollama... ✓ found (3 models installed)

? Which model?
  ❯ qwen3:14b   (recommended for coding tasks)
    qwen2.5-coder:14b
    llama3.1:8b

Config written to ~/.config/parecode/config.toml ✓
Running /init to detect project context... ✓ written to .parecode/conventions.md

Ready. What would you like to build?
▶
```

**Auto-detection shortcuts (skip the wizard entirely):**
- If `ANTHROPIC_API_KEY` env var present → auto-configure Claude profile, skip wizard
- If `OPENAI_API_KEY` env var present → auto-configure OpenAI profile, skip wizard
- If Ollama responds at `localhost:11434` with models → default to local, only ask which model
- If only one model installed → skip even that question, just use it

**Implementation:**
- `src/setup.rs` — `run_setup_wizard() -> ResolvedConfig` — terminal prompts (no TUI, runs before TUI starts)
- `src/main.rs` — check `config_path().exists()` before launching TUI; if missing, run wizard first
- Wizard uses `dialoguer` crate for interactive prompts (or hand-rolled crossterm prompts to avoid extra dependency)

### 6c-ii. Ollama auto-detection

On every startup (not just first run), silently probe `localhost:11434/api/tags` (100ms timeout). If Ollama is running:
- Show `◉ Ollama` indicator in TUI status bar when using local profile
- If user is on a cloud profile but Ollama is also running: show soft hint `◉ Local models available — /profile local to switch`
- On first run: Ollama presence triggers local-first default in the wizard
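The probe itself needs nothing more than a bounded TCP connect; a std-only sketch (the real check then GETs `/api/tags` for the model list — the HTTP step is omitted here):

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Return true if something is listening on the given address.
/// Bounded at 100ms so a missing daemon never delays startup.
fn ollama_reachable(addr: &str) -> bool {
    addr.parse::<SocketAddr>()
        .ok()
        .and_then(|a| TcpStream::connect_timeout(&a, Duration::from_millis(100)).ok())
        .is_some()
}

fn main() {
    if ollama_reachable("127.0.0.1:11434") {
        println!("◉ Ollama detected");
    } else {
        println!("no local Ollama; cloud profiles only");
    }
}
```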

### 6c-iii. `/init` auto-prompt on new project

On first `parecode` launch in a directory with no `.parecode/` folder:

```
No project conventions found.
Run /init to prime PareCode with your project's stack and style? [Y/n]
```

If Y: runs `/init` inline (see Phase 6i), shows result, asks to save. If N: continues normally, can run `/init` later.

### 6c-iv. `parecode update` and version awareness

Status bar shows version and available update indicator:
```
parecode 0.1.0 · new version 0.2.1 available — run `parecode update`
```

Checked once per session against GitHub API (cached for 24h in `~/.local/share/parecode/update-check`). Never blocks startup.

### 6c-v. Shell completion install hint

On first run after install, if completions aren't installed:
```
Tip: install shell completions for tab-completion of commands and flags:
  parecode --completions zsh > ~/.zfunc/_parecode   # zsh
  parecode --completions bash > ~/.bash_completion.d/parecode  # bash
  parecode --completions fish > ~/.config/fish/completions/parecode.fish  # fish
```

Shown once, suppressed after. Completions generated via clap's `generate` feature, shipped in release tarballs.

### ✅ 6d. Smarter file selection — COMPLETE

`src/index.rs` — project symbol index, built on every `/plan` invocation (zero model calls):
- Walks project files (Rust, TS/JS, Python, Go, C/C++), extracts top-level symbols: `fn`, `struct`, `enum`, `trait`, `impl`, `class`, `def`, `func`, `const`
- Caps at 500 files, < 100ms, pure regex/text scan
- Injected into plan prompt as a compact file map — model sees real symbol names and paths, not a directory listing
- Post-parse resolution: `files: ["validate_token"]` → scaffold resolves to `src/middleware/jwt.rs` via index
- Model names what it needs; scaffold resolves where it lives
- 7 unit tests: Rust/TS/Python extraction, symbol resolve, ident parsing
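The resolution step is a plain map lookup; a sketch with a hypothetical `resolve_files` helper (the real index maps symbols to paths with more care around collisions):

```rust
use std::collections::HashMap;

/// Resolve plan `files` entries that are bare symbol names into real
/// paths via the project index; path-like entries pass through as-is.
fn resolve_files(entries: &[&str], index: &HashMap<&str, &str>) -> Vec<String> {
    entries
        .iter()
        .map(|e| {
            if e.contains('/') || e.contains('.') {
                e.to_string() // already looks like a path
            } else {
                index.get(e).map_or_else(|| e.to_string(), |p| p.to_string())
            }
        })
        .collect()
}

fn main() {
    let index = HashMap::from([("validate_token", "src/middleware/jwt.rs")]);
    let files = resolve_files(&["validate_token", "src/main.rs"], &index);
    println!("{files:?}");
}
```

This is the "model names what it needs; scaffold resolves where it lives" split in miniature.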

### 6e. Mechanical mode (`--mechanical`)
- Pure grep/sed for pattern tasks, zero model calls
- `parecode --mechanical "replace foo with bar in src/"` — explicit flag only, never auto-routed
- For rename/replace tasks this is 100x faster and cheaper than any model approach

### ✅ 6f. Telemetry & analytics — COMPLETE
- `src/telemetry.rs` — `SessionStats` (live) + `TaskRecord` (persisted)
- Per-task: input/output tokens, tool calls, compression ratio, model, profile
- Flushed to `.parecode/telemetry.jsonl` after every completed agent run (JSONL, appendable, aggregatable)
- **Always-visible stats bar** in TUI — second line below status bar, no toggle needed:
  - `∑ N tasks  X.Xktok  avg Y/task  Z tool calls  W% compressed  peak P%`
  - Dimmed/purple palette so it doesn't compete with active status bar
  - Budget enforcement count and peak context % tracked separately
- Foundation for a hosted dashboard / benchmarking comparisons

---

## ✅ Phase 6g — Hash-Anchored Edits (correctness) — COMPLETE

**The single biggest correctness improvement available.** Inspired by oh-my-opencode's hash-anchored edit validation, which moved task success from 6.7% → 68.3% on complex tasks. Stale-line edits — where the file has shifted since it was read — are the most common silent failure mode.

**How it works:**
- `read_file` output annotates each line with a short content hash: `42#a3f: fn validate_token(...)`
- Hashes are compact (4–5 chars), placed at the start of the line number field — subtle, not noisy
- `edit_file` accepts an optional `anchor` hash alongside `old_str`
- Before applying: verify the hash still matches the line at the expected position
- If hash mismatch → return error: `"Anchor mismatch at line 42 — file has changed since last read. Re-read to get current hashes."`
- If no anchor provided → fall through to existing fuzzy matching (backwards compatible)

**Implementation:**
- `src/tools/read.rs` — hash generation (CRC32 or FNV-1a of the line content, base36, 4 chars)
- `src/tools/edit.rs` — anchor verification before fuzzy match
- `src/cache.rs` — cache stores hashes alongside content; invalidated on write/edit
- Hash format: `{line_num}#{hash}:` prefix — stripped before content is used

**Design constraints:**
- Hashes must be invisible to the model's reasoning (it should use them for anchoring, not describe them)
- System prompt addition: `"Each line in read_file output is prefixed {line}#{hash}: — use the hash as an anchor in edit_file calls to prevent stale-line errors"`
- Backwards compatible: anchor param is optional; existing edit calls continue to work
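A possible hash implementation, assuming the FNV-1a option folded to 4 base36 characters as described above (the exact fold is an implementation detail):

```rust
/// FNV-1a of the line content, folded to 4 base36 chars: enough to
/// detect a changed line, short enough to stay out of the model's way.
fn line_hash(line: &str) -> String {
    // Standard 32-bit FNV-1a constants.
    let mut h: u32 = 0x811c_9dc5;
    for b in line.bytes() {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    // Fold to 4 base36 digits.
    let digits = b"0123456789abcdefghijklmnopqrstuvwxyz";
    let mut out = String::new();
    let mut v = h;
    for _ in 0..4 {
        out.push(digits[(v % 36) as usize] as char);
        v /= 36;
    }
    out
}

fn main() {
    // read_file would emit: 42#{hash}: fn validate_token(...)
    println!("42#{}: fn validate_token(...)", line_hash("fn validate_token(...)"));
}
```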

---

## ✅ Phase 6h — Hooks System — COMPLETE

**First-class workflow automation.** Config-driven pre/post hooks that run deterministic shell commands at key points in the agent lifecycle. The key innovation beyond a simple CI config: `on_edit` output is **injected directly into the model's tool result**, so the model sees compile/lint errors immediately and can self-correct without an extra read-file round-trip.

**Hook events:**
| Event | Trigger | Injection | Common use |
|---|---|---|---|
| `on_edit` | After any `write_file` or `edit_file` call | ✓ Injected into tool result | `cargo check -q`, `tsc --noEmit` |
| `on_task_done` | After every completed agent run | TUI only | `cargo test -q 2>&1 \| tail -5` |
| `on_plan_step_done` | After each plan step completes | TUI only | lint, format |
| `on_session_start` | TUI startup | TUI only | `git pull`, environment check |
| `on_session_end` | TUI quit | stderr only | `git status --short` |

**Auto-detection (the key UX win):**

On first run with no hooks in config, PareCode scans the project root for language markers and auto-configures sensible defaults — no manual setup required:
| Marker | `on_edit` | `on_task_done` |
|---|---|---|
| `Cargo.toml` | `cargo check -q` | `cargo test -q 2>&1 \| tail -5` |
| `tsconfig.json` | `tsc --noEmit` | — |
| `go.mod` | `go build ./...` | — |
| `pyproject.toml` / `setup.py` + ruff in PATH | `ruff check .` | — |

Detection runs **once** then writes a `[profiles.{name}.hooks]` section into `~/.config/parecode/config.toml` (append-only, preserving all comments). The written block includes active detected commands plus all 5 event types commented out as examples — so users can see and edit every option. Subsequent startups read from config; detection never repeats.
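The marker-to-command mapping above can be sketched as a pure function. The real `detect_language_hooks()` scans the project root; passing the file list in keeps the sketch self-contained, and only the first three markers from the table are shown (the ruff case needs a PATH check and is omitted):

```rust
/// Sketch of marker-based hook auto-detection. Returns (on_edit, on_task_done)
/// command lists; the real code builds a full HookConfig.
fn detect_hooks(root_files: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut on_edit = Vec::new();
    let mut on_task_done = Vec::new();
    for f in root_files {
        match *f {
            "Cargo.toml" => {
                on_edit.push("cargo check -q".into());
                on_task_done.push("cargo test -q 2>&1 | tail -5".into());
            }
            "tsconfig.json" => on_edit.push("tsc --noEmit".into()),
            "go.mod" => on_edit.push("go build ./...".into()),
            _ => {}
        }
    }
    (on_edit, on_task_done)
}
```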

**Config (per-profile):**
```toml
[profiles.local.hooks]
on_edit      = ["cargo check -q"]
on_task_done = ["cargo test -q 2>&1 | tail -5"]
# on_plan_step_done = []
# on_session_start  = []
# on_session_end    = []
```

Set `hooks_disabled = true` in a profile to permanently suppress all hooks including auto-detected ones.

**UX behaviour:**
- Startup: `⚙ hooks  on_edit: cargo check -q  ·  on_task_done: cargo test -q …  (/list-hooks for details)` shown as a system message so hooks are never invisible
- `on_edit` output appended inline to the model's tool result — model sees `⚙ \`cargo check -q\` (exit 1): error[E0308]: …` and self-corrects immediately
- Hook output rendered in TUI as dimmed `⚙` block; amber on non-zero exit
- 30s timeout per hook; 50-line output cap to avoid context bloat
- `/hooks on|off` — per-session toggle (survives across tasks within a session)
- `/hooks` alone shows current status and usage hint
- `/list-hooks` — full breakdown of all 5 event types with their commands, toggle state, and profile-level disabled status; includes config file edit hint
- `hooks_disabled = true` in profile → permanent kill switch, overrides `/hooks on`
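The 50-line output cap can be sketched in a few lines — `cap_output` is a hypothetical helper name, and the truncation-marker wording is an assumption:

```rust
/// Truncate hook output to `max_lines` lines so a noisy hook
/// (e.g. a long test run) cannot bloat the model's context.
fn cap_output(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let kept = lines[..max_lines].join("\n");
    format!("{}\n… ({} more lines truncated)", kept, lines.len() - max_lines)
}
```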

**Implementation:**
- `src/hooks.rs` — `HookConfig { on_edit, on_task_done, on_plan_step_done, on_session_start, on_session_end }`, `HookResult { output, exit_code }`, `detect_language_hooks()`, `write_hooks_to_config(profile_name)`, `run_hook(cmd) -> HookResult`; `HookConfig::summary()` (one-liner for startup), `HookConfig::detail()` (multi-line for `/list-hooks`)
- `src/config.rs` — `hooks: HookConfig` and `hooks_disabled: bool` added to `Profile` and `ResolvedConfig`, both `#[serde(default)]` for backwards compatibility
- `src/agent.rs` — `AgentConfig { hooks: Arc<HookConfig>, hooks_enabled: bool }`; after each successful mutating tool call, runs `on_edit` hooks and appends output to `result_content`; after the main loop runs `on_task_done` hooks (TUI display only)
- `src/tui/mod.rs` — `UiEvent::HookOutput { event, output, exit_code }`, `ConversationEntry::HookOutput { event, output, success }`, `AppState.hooks_enabled`; hook bootstrap in `event_loop` (calls `write_hooks_to_config`, updates `resolved.hooks` in-place); `resolve_hooks()` helper gates on `hooks_enabled`/`hooks_disabled`; `on_session_start` hooks fire as `tokio::spawn` after `ui_tx` created; `on_session_end` hooks run synchronously before returning; `on_plan_step_done` hooks fire in `launch_plan` after each passing step
- `src/tui/render.rs` — `ConversationEntry::HookOutput` rendered as dimmed `⚙ on_edit ✓` / amber `⚙ on_edit ✗` with up to 10 lines of output

---

## ✅ Phase 6i — `/init` Command — COMPLETE

**One-shot project context priming.** Walks the project and auto-generates `.parecode/conventions.md` from existing project files. Eliminates manual conventions setup for new projects.

**Sources (in priority order):**
1. `README.md` — first 50 lines (project description, stack, install)
2. `Cargo.toml` / `package.json` / `pyproject.toml` / `go.mod` — name, language, key dependencies
3. `AGENTS.md` / `CLAUDE.md` — if already exists, merge rather than overwrite
4. `.eslintrc` / `rustfmt.toml` / `pyproject.toml [tool.ruff]` — style rules detected
5. Test directory structure — infer test runner from `jest.config`, `pytest.ini`, `#[cfg(test)]`

**Output format (`.parecode/conventions.md`):**
```markdown
# Project: my-app
Language: TypeScript (Bun runtime)
Test runner: `bun test` — tests in `src/__tests__/`
Lint: `eslint src/` — run after edits
Key dependencies: React 19, Drizzle ORM, Hono

## Style
- Prefer `const` over `let`
- No default exports
- Zod for all external input validation
```

**TUI integration:**
- `/init` slash command — runs inline, shows progress, opens result in pager overlay for review/edit before saving
- On first `parecode` run in a new directory (no `.parecode/` present): prompt "No conventions found. Run `/init` to prime project context? [y/N]"
- `parecode --init` CLI flag (already exists for config) — extend to also run project init if in a project directory

**Implementation:**
- `src/init.rs` — `run_project_init(cwd) -> String` — pure text extraction, no model calls
- `src/tui/mod.rs` — `/init` command handler, first-run prompt

---

## ✅ Phase 6j — Cost Estimation in Plan Overlay — COMPLETE

**Pre-task cost transparency.** Before running a plan, show an estimated token cost and (optionally) API cost. Nobody does this. Users burned $638+ in 6 weeks on AI agents without forewarning.

**Estimation method (no model call, heuristic):**
- Per step: `base_tokens (500) + sum(file_sizes_in_step / 4) + instruction_len / 4`
- Total: `sum(step_estimates) × 1.3` (overhead factor for tool results and responses)
- API cost: `total_tokens × rate_per_token` — rates configured per-profile, or use known defaults (Haiku: $0.25/Mtok input)
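As a sketch of the arithmetic (function names are illustrative; the real code is `estimate_plan_cost` in `src/plan.rs`):

```rust
/// Per-step heuristic: base_tokens (500) + sum(file_sizes / 4) + instruction_len / 4.
fn estimate_step_tokens(file_sizes_bytes: &[usize], instruction_len: usize) -> usize {
    let base = 500;
    let files: usize = file_sizes_bytes.iter().map(|s| s / 4).sum();
    base + files + instruction_len / 4
}

/// Plan total: sum of step estimates × 1.3 overhead factor
/// (tool results and model responses).
fn estimate_plan_tokens(step_estimates: &[usize]) -> usize {
    let sum: usize = step_estimates.iter().sum();
    (sum as f64 * 1.3) as usize
}
```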

**Plan overlay addition:**
```
┌─ Plan: add JWT authentication ────────────────────────┐
│ 4 steps  ·  est. 12k–18k tokens  ·  ~$0.004 at Haiku │
│                                                        │
│ ▶ Step 1: Add JWT dependency to Cargo.toml            │
│   Step 2: Implement token validation middleware        │
│   ...                                                  │
```

**Config:**
```toml
[profiles.claude]
cost_per_mtok_input  = 0.25   # optional, enables cost display
cost_per_mtok_output = 1.25
```

**Implementation:**
- `src/plan.rs` — `estimate_plan_cost(plan, index) -> CostEstimate { tokens_low, tokens_high, usd }`
- `src/tui/render.rs` — add estimate row to plan overlay header
- `src/config.rs` — `cost_per_mtok_input/output` optional fields on `Profile`

---

## ✅ Phase 6k — Quick Mode / Tiered Autonomy — COMPLETE

**Right-sized agent for right-sized tasks.** The full agent loop (plan → load context → multi-turn tool loop → verify) is overkill for a one-line fix. Quick mode skips the overhead entirely.

**Trigger:**
- `parecode --quick "task"` — explicit flag
- Auto-detect heuristic (opt-in via config `auto_quick = true`): task < 20 words, no file `@` attachments, no `/plan` prefix → quick mode
- `/quick "task"` in TUI

**Quick mode behaviour:**
- Single API call — no multi-turn loop
- No plan generation, no step isolation
- Context: system prompt + task only (no file loading, no session history)
- Tools available: `edit_file`, `bash` (read-only commands only), `search`
- Max 1 tool call before returning to user
- Token target: < 2k tokens total
- TUI: shows `⚡ quick` badge in status bar instead of spinner

**When NOT to use quick mode:**
- Task contains words like "refactor", "add feature", "implement", "plan" → warn and suggest normal mode
- Task references multiple files → warn
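The trigger heuristic and the warn list can be sketched together — the exact keyword list below is illustrative, not the shipped one:

```rust
/// Auto-detect heuristic: short task, no `@` attachments, no `/plan` prefix.
fn is_quick_candidate(task: &str) -> bool {
    let word_count = task.split_whitespace().count();
    word_count < 20 && !task.contains('@') && !task.starts_with("/plan")
}

/// Warn when a task sounds too large for a single-shot quick run.
fn quick_mode_warning(task: &str) -> Option<&'static str> {
    const HEAVY: [&str; 4] = ["refactor", "add feature", "implement", "plan"];
    let lower = task.to_lowercase();
    if HEAVY.iter().any(|w| lower.contains(w)) {
        Some("task looks non-trivial — consider normal mode")
    } else {
        None
    }
}
```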

**Implementation:**
- `src/agent.rs` — `run_quick(task, config) -> AgentResult` — simplified single-shot path
- `src/main.rs` — `--quick` flag, auto-detect logic
- `src/tui/mod.rs` — `/quick` command, badge in status bar


---

## ✅ Phase 6l — `/` Command Autocomplete — COMPLETE

Autocomplete for `/` commands — typing `/` shows the available options inline, mirroring the existing `@` file completion. Simple to build, yet a massive UX win.

---

## ✅ Phase 6m — Git Integration — COMPLETE

**Every competitor has git integration.** Aider's entire edit model is built on git diffs. Claude Code auto-commits. OpenCode has git tools. For a tool that modifies files, not having automatic checkpoints is a safety gap users will notice immediately — one bad edit with no easy undo and you've lost a user forever.

### ✅ 6m-i. Auto-checkpoint before tasks
- Before every agent run, `git add -A && git commit --no-verify -m "parecode: checkpoint before \"<task>\""` if tree is dirty
- Clean tree → record HEAD hash as checkpoint (zero cost, no commit created)
- `--no-verify` bypasses user pre-commit hooks — checkpoints must never be blocked by lint
- Skip silently if not in a git repo
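A sketch of the commit invocation. Building the argument vector separately makes the `--no-verify` and message-formatting behaviour testable without touching a real repo; the real code in `src/git.rs` runs these via `std::process::Command`:

```rust
/// Argument vector for the checkpoint commit. `--no-verify` comes before the
/// message so user pre-commit hooks can never block a checkpoint.
fn checkpoint_args(task: &str) -> Vec<String> {
    vec![
        "commit".into(),
        "--no-verify".into(),
        "-m".into(),
        format!("parecode: checkpoint before \"{}\"", task),
    ]
}
```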

### ✅ 6m-ii. Post-task diff display
- After every completed agent run, `⎇ N files changed — press 5 to review, d to diff, /undo to revert` in chat
- `d` key from any tab opens full-screen syntax-coloured diff overlay (green/red/cyan, scroll)
- `/diff` command switches to Git tab + opens diff overlay
- **Bug fixed**: diffs compare checkpoint against working tree (`git diff <hash>`), not commit-to-commit (`git diff <hash> HEAD`)

### ✅ 6m-iii. Undo via git
- `/undo` slash command — opens interactive checkpoint picker in Git tab (↑↓ select, Enter revert, Esc cancel)
- `u` key in Git tab opens the same picker
- `UndoPicker` mode: full-area checkpoint list with hash, age, message columns; amber/orange danger palette
- Warning bar: `⚠ git reset --hard — this cannot be undone`
- After undo: clears checkpoint hash, diff content, and stat so stale data doesn't linger

### ✅ 6m-iv. Auto-commit on task success (opt-in)
- Config: `auto_commit = true` in profile (default: false)
- On successful task completion: `git add -A && git commit --no-verify -m "<prefix><task summary>"`
- `auto_commit_prefix = "parecode: "` configurable

### ✅ 6m-v. Git-aware context
- `git status --short` injected into system prompt preamble when `git_context = true` (default)
- Lightweight — model knows which files have uncommitted changes without a tool call

**Implementation:**
- `src/git.rs` — `GitRepo { root: PathBuf }`, `checkpoint()`, `undo()`, `diff_stat_from()`, `diff_full_from()`, `auto_commit()`, `status_short()`, `list_checkpoints()`, `is_git_repo(path) -> bool`
- Uses `std::process::Command` — no libgit2, keeps binary lean
- `src/tui/git_view.rs` — Git tab: checkpoint header, diff stat, undo picker overlay
- `src/tui/overlays.rs` — `draw_diff_overlay()` — full-screen syntax-coloured diff viewer
- `src/tui/mod.rs` — `/undo`, `/diff` commands, `UndoPicker` mode, `UiEvent::GitChanges/GitAutoCommit/GitError`
- `src/config.rs` — `auto_commit`, `auto_commit_prefix`, `git_context` on `Profile`

**Config:**
```toml
[profiles.local]
git_context = true                # inject git status into system prompt; enables checkpoints
auto_commit = false               # default — don't auto-commit
auto_commit_prefix = "parecode: "   # prefix for auto-commit messages
```

---

## ✅ Phase 6n — Diff/Patch Edit Mode — COMPLETE

**More token-efficient editing for multi-hunk changes.** The current `edit_file` tool uses search-and-replace (`old_str` → `new_str`), which works well for single edits but becomes expensive for multi-hunk changes — the model must send the full old content and full new content for each hunk. A unified-diff mode sends only the changes, which aligns directly with PareCode's efficiency thesis.

**Aider proved this works.** Their unified-diff edit format reduced token usage by 30-50% on multi-hunk edits compared to search-and-replace, with comparable accuracy on capable models. The key insight: models are already trained on diff output — it's a natural format for them.

### ✅ 6n-ii. Adaptive tool selection
- System prompt guidance: "Use `edit_file` for single-location changes. Use `patch_file` for multi-hunk edits or when changing multiple related locations in the same file."
- Both tools remain available — model chooses based on task

### ✅ 6n-iii. Fuzzy patch application
- 3-tier cascade: exact match → whitespace-normalised → hint-biased on multiple candidates
- Context lines used for anchoring — if context matches but line numbers are off, apply at the matched location
- Critical for local models that produce slightly incorrect line numbers in `@@` headers
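The first two tiers of the cascade can be sketched as follows — function names mirror the implementation list below, but the bodies here are illustrative, and the third (hint-biased) tier is omitted:

```rust
/// Collapse whitespace runs so hunks with drifted indentation still anchor.
fn normalise_ws(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Find the first line index where `needle` matches `haystack`:
/// tier 1 exact, then tier 2 whitespace-normalised. None if neither matches.
fn find_needle(haystack: &[&str], needle: &[&str]) -> Option<usize> {
    let n = needle.len();
    if n == 0 || haystack.len() < n {
        return None;
    }
    for i in 0..=haystack.len() - n {
        if &haystack[i..i + n] == needle {
            return Some(i); // tier 1: exact
        }
    }
    let norm_needle: Vec<String> = needle.iter().map(|l| normalise_ws(l)).collect();
    for i in 0..=haystack.len() - n {
        if haystack[i..i + n]
            .iter()
            .map(|l| normalise_ws(l))
            .eq(norm_needle.iter().cloned())
        {
            return Some(i); // tier 2: whitespace-normalised
        }
    }
    None
}
```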

**Implementation:**
- `src/tools/patch.rs` — `parse_hunks()`, `apply_hunk()`, `find_needle()` with 3-tier fuzzy matching; 6 unit tests
- `src/tools/mod.rs` — registered in `all_definitions()`, `is_native()`, `dispatch()`
- `src/agent.rs` — system prompt guidance, `is_mutating` check, hook/telemetry arm

---

## Phase 6o — Multi-File Awareness via Git — deferred to last; the cargo and TypeScript compilers work well enough without it for now

**Leverages Phase 6m's git integration to detect and handle cross-file breakage.** Currently, when a model edits `auth.rs` and breaks `handler.rs`, the only detection mechanism is the `cargo check` hook — which only works for languages with fast type-checkers. This phase makes cross-file impact visible to the model proactively.

### 6o-i. Change-impact analysis (git-powered)
- After each file edit, run `git diff --name-only` against the checkpoint to get the full list of modified files
- Cross-reference modified files against the project symbol index (`src/index.rs`): which symbols in modified files are imported/used by other files?
- If a modified symbol is referenced in files not yet touched by the model → inject a warning into the tool result:
  `"⚠ Modified \`validate_token\` in src/auth.rs — referenced by: src/handler.rs:14, src/middleware.rs:8. Consider updating these files."`
- Zero model calls — pure deterministic analysis using the symbol index + basic import/use scanning

### 6o-ii. Scope-aware file loading in plan mode
- When generating a plan, use git history to identify co-change patterns: files that are frequently modified together
- `git log --name-only --pretty=format: -50` → parse file co-occurrence matrix
- If a plan step targets `auth.rs` and history shows `auth.rs` + `handler.rs` are modified together in 60%+ of commits → auto-include `handler.rs` in the step's file list
- Surfaces as a suggestion in the plan review overlay: `"history suggests handler.rs is usually modified alongside auth.rs — include? [y/N]"`
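The co-occurrence counting can be sketched as a pure parse — this assumes the `--pretty=format:` output reduces to blank-line-separated per-commit file lists, and the real `co_change_matrix()` would live in `src/git.rs`:

```rust
use std::collections::HashMap;

/// Count how often each pair of files appears in the same commit.
/// Pairs are stored in sorted order so (a, b) and (b, a) collapse.
fn co_change_counts(log: &str) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for commit in log.split("\n\n") {
        let files: Vec<&str> = commit.lines().filter(|l| !l.trim().is_empty()).collect();
        for i in 0..files.len() {
            for j in i + 1..files.len() {
                let (a, b) = if files[i] <= files[j] {
                    (files[i], files[j])
                } else {
                    (files[j], files[i])
                };
                *counts.entry((a.to_string(), b.to_string())).or_insert(0) += 1;
            }
        }
    }
    counts
}
```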

### 6o-iii. Post-task validation sweep
- After a full agent run or plan execution completes, run a lightweight validation:
  1. `git diff --name-only` → list all modified files
  2. For each modified file: check if any exported symbol's signature changed
  3. For each changed signature: grep for usages in non-modified files
  4. If stale references found → report: `"⚠ 3 files may need updates: src/handler.rs, src/test_auth.rs, src/middleware.rs"`
- Model can then be prompted to fix these, or user can review manually
- This catches the cross-file breakage that single-file hooks miss

### 6o-iv. Git blame for context
- When reading a file for editing, optionally show recent git blame annotations for the target region
- Helps the model understand code authorship and recency: recently-changed code is more likely to be the target of a bug fix
- Exposed as `read_file` parameter: `blame: true` → adds `(3 days ago, user)` annotations to relevant lines
- Lightweight: only fetches blame for the requested line range, not the entire file

**Implementation:**
- `src/git.rs` — `changed_files()`, `co_change_matrix()`, `blame_range()`, `changed_symbols()`
- `src/index.rs` — extend with `find_usages(symbol, exclude_files) -> Vec<(path, line)>` for cross-reference scanning
- `src/agent.rs` — post-edit change-impact warning injection, post-task validation sweep
- `src/plan.rs` — co-change suggestions in plan generation
- `src/tools/read.rs` — optional `blame` parameter


## Phase 6p — TUI Visual Overhaul

**Turn the TUI from "functional terminal app" into "this looks like a real product."** Ratatui was absolutely the right choice here — it has first-class `Tabs`, `Table`, split layouts, scrollable viewports, and inline syntax highlighting via `syntect`. Everything below is achievable without changing frameworks. This is the phase where PareCode stops looking like a dev tool and starts looking like a product.

### 6p-i. Tab bar (top of screen) - Working pretty nicely also

Replace the current single-view layout with a tab bar across the top. Each tab is a full-screen view. `1-5` number keys or `Ctrl+Tab` to switch.

```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─  Git ─┬─ 📊 Stats ─┬─ 📋 Plan ─┐
│                                                              │
```

| Tab | Contents | Key |
|---|---|---|
| DONE - **Chat** (default) | Current conversation view — what exists today | `1` |
| Mostly - DONE - **Config** | Profile switcher, hooks status, MCP servers, conventions preview | `2` |
| NOT DONE - **Git** | Diff viewer, commit history, checkpoint list, undo controls | `3` |
| Needs fixing - **Stats** | Telemetry dashboard — session totals, per-task breakdown, cost tracking | `4` |
| Needs testing - **Plan** | Plan viewer when a plan is active — step list, status, carry-forward summaries | `5` |

**Design notes:**
- Tabs use ratatui's `Tabs` widget — already built into the library, just needs importing
- Only the Chat tab exists at launch; other tabs appear contextually (Git tab only if in a git repo, Plan tab only when a plan is active)
- Tab bar is a single row — minimal vertical space cost
- Active tab highlighted, inactive tabs dimmed
- Each tab has its own scroll state — switching tabs preserves position

### DONE 6p-ii. Session sidebar (left panel, Chat tab) - Working pretty nicely 

A collapsible sidebar on the left showing session history — like the sidebar in ChatGPT/Claude web UI. This is the single biggest UX improvement for multi-session users.

```
┌──────────┬────────────────────────────────────────┐
│ Sessions │  Chat                                  │
│──────────│                                        │
│ ▶ Today  │  You: add auth to the API              │
│  jwt auth│  ⚒ reading src/routes.ts...            │
│  fix css │                                        │
│          │                                        │
│ ▶ Yday   │                                        │
│  refactor│                                        │
│  tests   │                                        │
│──────────│                                        │
│ [+] New  │                                        │
└──────────┴────────────────────────────────────────┘
```

**Behaviour:**
- `Ctrl+B` toggles sidebar visibility (like VSCode)
- Default: hidden on terminals < 120 cols, visible on wider terminals
- Sidebar width: 20 chars fixed, or configurable
- Sessions grouped by date (Today, Yesterday, This Week, Older)
- Click/Enter on a session to resume it — replaces `/resume` for most users
- Active session highlighted
- `[+] New` at bottom to start fresh session (replaces `/new` for most users)
- Session entries show: first message preview (truncated), turn count, model used

**Implementation:**
- `src/tui/render.rs` — `Layout::default().direction(Direction::Horizontal)` split: sidebar + main chat area
- `src/tui/mod.rs` — `AppState.sidebar_visible: bool`, `AppState.sidebar_selected: usize`
- Sessions loaded from existing `~/.local/share/parecode/sessions/` JSONL files

### 6p-iii. Git tab (full diff viewer)

**The terminal diff viewer.** This is the "mad but really cool" one — and it's very doable in ratatui. `delta` and `diff-so-fancy` proved terminal diffs can look great. We don't need to shell out — we can render it natively.

```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─  Git ─┬─ 📊 Stats ─┐
│                                                   │
│  Checkpoint: parecode: before "add JWT auth"         │
│  3 files changed, +42 -8                          │
│                                                   │
│  src/auth.rs ──────────────────────────────────── │
│  @@ -12,6 +12,14 @@                               │
│    fn validate_token(token: &str) -> Result<...>  │
│  - let claims = decode(token)?;                   │
│  + let claims = decode(token)                     │
│  +     .map_err(|e| AuthError::Invalid(...))?;    │
│  + log::info!("validated: {}", claims.sub);       │
│    Ok(claims)                                     │
│                                                   │
│  [u] Undo to checkpoint  [c] Commit  [s] Stash   │
└───────────────────────────────────────────────────┘
```

**Features:**
- Syntax-highlighted diff — added lines green, removed lines red, context lines dimmed
- File headers as collapsible sections (Enter to expand/collapse a file's hunks)
- Scrollable — `↑↓` to navigate, `Page Up/Down` for fast scroll
- Bottom action bar: `u` undo to checkpoint, `c` commit changes, `s` stash
- Checkpoint history list (left side or top selector): navigate between checkpoints
- `git diff --stat` summary at the top

**Implementation:**
- `src/tui/git_view.rs` — new module for git tab rendering
- Parse `git diff` output into structured hunks (or use `src/git.rs` from Phase 6m)
- Syntax colouring: line-prefix-based (`+` = green, `-` = red, `@@` = cyan) — no `syntect` needed for diffs
- Scrollable viewport: ratatui's built-in scroll support

### 6p-iv. Config tab (profile/hooks/MCP management) - Done, needs edit file functionality directly 

A read/edit view of the current configuration — eliminates the need to leave PareCode to edit `config.toml`.

```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─  Git ─┬─ 📊 Stats ─┐
│                                                   │
│  Profile: local (active)                          │
│  ─────────────────────────                        │
│  endpoint:       http://localhost:11434            │
│  model:          qwen3:14b                        │
│  context_tokens: 32768                            │
│  planner_model:  —                                │
│                                                   │
│  Hooks                                            │
│  ─────                                            │
│  on_edit:      cargo check -q  ✓ enabled          │
│  on_task_done: cargo test -q   ✓ enabled          │
│                                                   │
│  MCP Servers                                      │
│  ───────────                                      │
│  brave:  running (3 tools)                        │
│  fetch:  running (1 tool)                         │
│                                                   │
│  Conventions: .parecode/conventions.md (loaded)      │
│                                                   │
│  [p] Switch profile  [e] Edit config  [h] Toggle  │
└───────────────────────────────────────────────────┘
```

**Features:**
- Shows all profile fields, hooks, MCP server status (running/stopped/error + tool count)
- `p` to switch profile (triggers the existing `/profile` logic)
- `h` to toggle hooks on/off (existing `/hooks on|off`)
- `e` to open config file in `$EDITOR` (shell out, return to TUI after)
- Conventions preview — first 10 lines of loaded conventions file
- Profile list on the left if multiple profiles exist — highlight active, arrow keys to browse

### 6p-v. Stats tab (telemetry dashboard) - Generally not bad - reactivity to current session could be better

The existing stats bar is great. This tab expands it into a full dashboard — the kind of thing you screenshot and share.

```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─  Git ─┬─ 📊 Stats ─┐
│                                                   │
│  Session: 12 tasks · 4.2h · claude-sonnet         │
│                                                   │
│  Tokens        ████████████░░░░  74.2k (avg 6.2k) │
│  Tool calls    ████████░░░░░░░░  48 (avg 4/task)  │
│  Compression   ███░░░░░░░░░░░░░  22% avg          │
│  Budget hits   █░░░░░░░░░░░░░░░  3 enforcements   │
│                                                   │
│  Task breakdown:                                  │
│  ─────────────                                    │
│  #1  "add JWT auth"     12.4k tok  8 tools  ✓     │
│  #2  "fix CSS header"    3.1k tok  3 tools  ✓     │
│  #3  "rename columns"    1.8k tok  2 tools  ✓     │
│  ...                                              │
│                                                   │
│  Est. cost this session: $0.12                    │
│  vs estimated OpenCode equiv: ~$0.80              │
└───────────────────────────────────────────────────┘
```

**Features:**
- Bar charts using Unicode block characters (▏▎▍▌▋▊▉█) — no external charting needed
- Per-task breakdown with token count, tool calls, success/failure
- Running cost estimate (using profile's `cost_per_mtok` if configured)
- Comparative estimate ("vs OpenCode equivalent") — based on the 5-10x multiplier. This is the screenshot-worthy feature.
- Session totals and averages
- Export: `x` key to dump session stats to `.parecode/stats-export.json`
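The Unicode bars need no charting library — a minimal sketch, with `bar` as a hypothetical helper (the eight partial-block glyphs give sub-cell resolution, so a bar of `width` cells has `width × 8` levels):

```rust
/// Render a value as a fixed-width Unicode bar, padded with light shade
/// so every row in the dashboard lines up.
fn bar(value: f64, max: f64, width: usize) -> String {
    const BLOCKS: [char; 8] = ['▏', '▎', '▍', '▌', '▋', '▊', '▉', '█'];
    let ratio = (value / max).clamp(0.0, 1.0);
    let eighths = (ratio * (width * 8) as f64).round() as usize;
    let (full, rem) = (eighths / 8, eighths % 8);
    let mut s: String = "█".repeat(full);
    if rem > 0 {
        s.push(BLOCKS[rem - 1]);
    }
    while s.chars().count() < width {
        s.push('░');
    }
    s
}
```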

### 6p-vi. Plan tab (active plan viewer)

Only appears when a plan is active or was recently completed. Shows the full plan with live step status.

```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─  Git ─┬─ 📋 Plan ──┐
│                                                   │
│  Plan: add JWT authentication                     │
│  4 steps · est. 12k–18k tokens · ~$0.004         │
│                                                   │
│  ✓ Step 1: Add JWT dependency to Cargo.toml       │
│    └─ modified: Cargo.toml [jsonwebtoken]         │
│    └─ 2.1k tokens, 3 tool calls                  │
│                                                   │
│  ⟳ Step 2: Implement token validation middleware  │
│    └─ files: src/auth.rs, src/middleware.rs        │
│    └─ running... 4.2k tokens so far               │
│                                                   │
│  ○ Step 3: Add auth routes                        │
│  ○ Step 4: Integration tests                      │
│                                                   │
│  [a] Annotate step  [p] Pause  [Enter] View step  │
└───────────────────────────────────────────────────┘
```

**Features:**
- Live-updating step status (✓ complete, ⟳ running, ○ pending, ✗ failed)
- Expand a step (Enter) to see its carry-forward summary, tool calls, files modified
- Annotations visible inline
- Running token count per step and cumulative
- Plan review mode accessible from this tab (before execution starts)

### 6p-vii. Visual polish (cross-cutting)

**Syntax highlighting in chat:**
- Code blocks in model responses get language-aware syntax colouring
- Use `syntect` crate (commonly paired with ratatui) or `tree-sitter-highlight`
- Fallback: backtick-delimited blocks get monospace styling without colour

**Markdown rendering in chat:**
- Bold, italic, headers, bullet lists rendered with proper ratatui `Style`
- Links shown as underlined + blue
- Tables rendered with box-drawing characters
- This alone makes the chat output dramatically more readable

**Responsive layout:**
- < 80 cols: compact mode — no sidebar, abbreviated status bar, single-line tabs
- 80–120 cols: standard mode — current layout + tabs
- > 120 cols: full mode — sidebar visible by default, expanded stats
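The breakpoints above reduce to a single match — the enum name is illustrative:

```rust
/// Layout mode selected from terminal width (cols), per the breakpoints above.
#[derive(Debug, PartialEq)]
enum LayoutMode {
    Compact,  // < 80 cols: no sidebar, abbreviated status bar
    Standard, // 80–120 cols: current layout + tabs
    Full,     // > 120 cols: sidebar visible by default
}

fn layout_mode(cols: u16) -> LayoutMode {
    match cols {
        0..=79 => LayoutMode::Compact,
        80..=120 => LayoutMode::Standard,
        _ => LayoutMode::Full,
    }
}
```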

**Theme support (config-driven):**
- `theme = "dark"` (default), `"light"`, `"monokai"`, `"solarized"`
- Defined as named colour palettes in config — simple to add community themes later
- `[theme.colors]` table in config for per-element customisation

### 6p-viii. Ratatui feasibility notes

All of this is achievable with ratatui's built-in widget set:

| Feature | Ratatui widget/approach |
|---|---|
| Tab bar | `Tabs` widget (built-in) |
| Sidebar | `Layout::Horizontal` split |
| Diff viewer | `Paragraph` with styled `Span`s per line |
| Bar charts | `Paragraph` with Unicode block chars, or `BarChart` widget |
| Scrollable lists | `List` with `ListState` scroll tracking |
| Collapsible sections | Custom `StatefulWidget` tracking expanded state |
| Syntax highlighting | `syntect` → `Style` mapping, or manual keyword colouring |
| Markdown rendering | Parse to `Vec<Line<'_>>` with styled `Span`s |
| Responsive layout | `Constraint::Percentage` + terminal size check |

The tab architecture requires restructuring `draw_ui()` in `render.rs` from a single monolithic function to a dispatcher: `match active_tab { Tab::Chat => draw_chat(f, area, state), Tab::Git => draw_git(f, area, state), ... }`. Each tab becomes its own render function in its own module under `src/tui/`.

**Proposed file structure:**
```
src/tui/
├── mod.rs          # event loop, state, tab switching
├── render.rs       # top-level draw dispatcher, tab bar, status bar
├── chat.rs         # chat view (most of current render.rs moves here)
├── sidebar.rs      # session sidebar
├── git_view.rs     # git tab — diff viewer, checkpoint list
├── config_view.rs  # config tab — profile/hooks/MCP display
├── stats_view.rs   # stats tab — telemetry dashboard
├── plan_view.rs    # plan tab — step list, live status
├── markdown.rs     # markdown → ratatui Span/Line converter
└── theme.rs        # colour palette definitions
```

### Git warning
Git integration complexity. 6m is marked ESSENTIAL and it is, but git is a minefield. Dirty working trees, detached HEAD, submodules, shallow clones, worktrees, repos with 100k+ files. The "works automatically if in a git repo, skips silently if not" design is correct, but the edge cases will take real-world testing to flush out. Keep the initial implementation conservative — checkpoint via commit on a temp branch is safer than stash (stash has more failure modes).

### Check in on token usage — we are aiming to lead the market in efficiency
System prompt size. You're now injecting: conventions, session context, step carry-forward summaries, git status, change-impact warnings, hook descriptions, and tool schemas. On a 32k local model, that preamble could consume 20-30% of the window before the user even types. You may need a preamble budget that mirrors the token budget — prioritise and compress injected context, not just conversation history.

---

## Version 1 — Publish, Validate, and Gate Phase 7

> **This is the quality gate.** Phase 7 does not start until every benchmark category below passes. The goal is publishable evidence that PareCode's efficiency claims are real, and a regression baseline that protects them going forward.

**Prerequisites before starting validation:**
- Phase 6b (distribution / cargo-dist) complete — test on a clean install, not a dev build
- Phase 6c (first-run wizard) complete — test the real new-user flow, not a hand-configured setup
- All phases 6a–6o (ideally plus the best parts of 6p) building and shipping in the release binary — COMPLETE

**Metrics to record for every test run** (telemetry captures most of this automatically in `.parecode/telemetry.jsonl`):

| Metric | How to get it |
|---|---|
| Input tokens | `-v` flag or telemetry stats bar |
| Output tokens | same |
| Tool calls | telemetry `tool_calls` field |
| Wall time | telemetry `duration_secs` |
| Re-reads | count `read_file` calls on already-seen paths |
| Loops | count repeated `(tool, args)` pairs |
| Success | did the task complete correctly with no user intervention? |

Save the telemetry snapshot after each run. These become the regression baseline — any Phase 7 change that regresses these numbers by >10% is a blocker.

---

### V1-A. Baseline: Qwen3 14B (Ollama, local)

> The hardest test. If PareCode guides a messy 14B model better than OpenCode, that's the headline claim validated.

**Setup:** `tsc --noEmit` hook auto-detected and active for TypeScript tasks. Run the same tasks in OpenCode first and record its numbers — the diff is the publishable story.

| Task | OpenCode result (record before testing PareCode) | PareCode target |
|---|---|---|
| Replace all instances of a term project-wide | Loops, re-reads, often fails | ≤ 4 tool calls, 0 re-reads, correct |
| Update HTML + SCSS: change colours, improve styling | Loses context mid-task, wrong file edits | Completes in ≤ 6 tool calls, hook catches TSC errors |
| Angular: migrate `input` binding to `@input()` decorator | Classic OpenCode death — loops on search | ≤ 5 tool calls, uses search to verify no instances remain |

For each task record the full metric set above. The `tsc --noEmit` hook injection is the key thing to observe — does the model read the error output and self-correct in the same loop without a re-read?

---

### V1-B. Hooks self-correction validation (Claude Sonnet)

> This is the money shot for the hooks system. A capable model that reads `⚙ cargo check -q (exit 1): error[E0308]…` and self-corrects in the same tool loop — no extra read_file round-trip — is the proof that on_edit injection works as designed.

**Setup:** Claude Sonnet profile with `cargo check -q` hook (PareCode Rust codebase, or any real Rust project).

| Test | What to observe |
|---|---|
| Make a deliberate type error, ask PareCode to add a function | Does Claude see the hook output and fix the error without re-reading? |
| Multi-step plan on a real feature | Do all steps pass verification? Do step carry-forward summaries give Claude correct context? |
| Edit a file that has shifted since last read | Does the hash anchor mismatch fire? Does Claude re-read and retry correctly? |
| Compare token count: PareCode+Claude vs Claude Code on same task | Record both. This is the efficiency headline. |

Hash-anchored edits (Phase 6g) are specifically worth testing here — Claude will actually use the optional `anchor` parameter, Qwen 14B likely ignores it.

---

### V1-C. Cloud mid-range: Qwen3-Coder 72B (OpenRouter)

> The realistic ceiling for users who want local-model quality without Anthropic pricing. If PareCode makes 72B usable for complex multi-file tasks, that's a strong story for the cost-conscious segment.

**Setup:** OpenRouter profile. Tests validate that lean schemas and context management work across provider backends — OpenRouter wraps the API differently from Ollama.

| Test | Target |
|---|---|
| Same Angular migration task as V1-A | Compare tool call count and success rate vs Qwen3 14B. Expect meaningful improvement. |
| Multi-file refactor (rename a type used across 5+ files) | Should complete with plan mode. Record step count and carry-forward summary accuracy. |
| Schema compatibility | Confirm all tools dispatch correctly — OpenRouter backends sometimes reject strict schemas |

---

### V1-D. MCP integration (Claude Sonnet + web search)

> MCP is not validated by unit tests. The interesting failure mode is the model hitting a knowledge boundary mid-task and either not reaching for web search, or using it incorrectly. This must work cleanly before Phase 7 adds more complexity on top.

**Setup:** Claude Sonnet profile with `brave` or `fetch` MCP server configured.

| Test | What to validate |
|---|---|
| "Update this library to use the v4 API" (where v4 released after training cutoff) | Does Claude autonomously call web search? Does it use the result to inform the edit? |
| Multi-step plan where one step requires fetching a doc | Does MCP dispatch work correctly inside plan step context isolation? |
| Two MCP servers active simultaneously | No cross-contamination, both tools visible in tool list |
| MCP server that fails to start | Silently skipped, rest of session unaffected |

The key signal: web search should feel like a natural tool call, not a special case. If the model hesitates or fails to use it when it clearly should, that's a system prompt or tool schema issue to fix before Phase 7.

---

### V1-E. Regression baseline

After V1-A through V1-D pass:

1. **Save telemetry snapshots** — copy `.parecode/telemetry.jsonl` to `benchmarks/v1-baseline-{model}.jsonl` for each model tested
2. **Document the passing task set** — these become the fixed regression suite; any future change that causes a previously-passing task to fail or regress by >10% in tokens/tool-calls is a blocker before merge
3. **Publish results** — the token efficiency comparison (PareCode vs OpenCode on the same tasks) is the viral moment. Even a blog post or README table is enough for early traction.
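
The >10% gate can be enforced mechanically before merge. A minimal sketch, assuming each run's telemetry is reduced to named per-task aggregates (metric names here are illustrative, not the telemetry schema):

```rust
/// Hypothetical regression gate: flag any metric that worsens by more
/// than 10% versus the saved baseline. Metric names are illustrative.
fn regressions(baseline: &[(&str, f64)], current: &[(&str, f64)]) -> Vec<String> {
    current
        .iter()
        .filter_map(|(name, cur)| {
            // Skip metrics that have no baseline value yet.
            let base = baseline.iter().find(|(n, _)| n == name)?.1;
            // Lower is better for every metric here (tokens, tool calls, seconds).
            (*cur > base * 1.10).then(|| format!("{name}: {base} -> {cur}"))
        })
        .collect()
}

fn main() {
    let baseline = [("input_tokens", 12_000.0), ("tool_calls", 5.0)];
    let current = [("input_tokens", 14_000.0), ("tool_calls", 5.0)];
    // input_tokens grew ~17%, so only that metric is flagged.
    assert_eq!(
        regressions(&baseline, &current),
        vec!["input_tokens: 12000 -> 14000".to_string()]
    );
}
```

Wiring this into CI against `benchmarks/v1-baseline-{model}.jsonl` makes the blocker automatic rather than a convention.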

**Phase 7 is gated on:** all four test categories above showing clean results, regression baseline saved, and at least the Qwen3 14B + Claude Sonnet comparisons documented.

---

## Phase 7 — Advanced Orchestration

### 7a. Automatic model routing by category

Extend `planner_model` into a full `model_routes` table. Tasks and plan steps declare a category; the harness picks the right model automatically.

**Categories:**
| Category | Profile model example | When used |
|---|---|---|
| `deep` | `claude-opus-4-6` | Complex multi-file refactors, architecture decisions |
| `standard` | `claude-sonnet-4-6` | Default — most coding tasks |
| `quick` | `claude-haiku-4-5-20251001` | Single-file edits, quick queries |
| `search` | cheapest available | Web search, grep, read-only research |

**Config:**
```toml
[profiles.claude.model_routes]
deep     = "claude-opus-4-6"
standard = "claude-sonnet-4-6"
quick    = "claude-haiku-4-5-20251001"
search   = "claude-haiku-4-5-20251001"
```

**Integration with plan steps:**
- Plan generation adds a `category` field to each step based on instruction complexity
- Agent loop selects model per step rather than once per session
- Quick mode auto-routes to `quick` category
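
Route resolution itself is trivial once the table is deserialized; the one real design decision is the fallback. A minimal sketch, assuming a `ModelRoutes` struct mirroring the TOML above:

```rust
/// Hypothetical route table mirroring the [profiles.*.model_routes] TOML;
/// a real implementation would deserialize this from the profile config.
struct ModelRoutes {
    deep: String,
    standard: String,
    quick: String,
    search: String,
}

impl ModelRoutes {
    /// Unknown or missing categories fall back to `standard`, so a bad
    /// plan-generated category degrades gracefully rather than failing.
    fn resolve(&self, category: Option<&str>) -> &str {
        match category {
            Some("deep") => &self.deep,
            Some("quick") => &self.quick,
            Some("search") => &self.search,
            _ => &self.standard,
        }
    }
}

fn main() {
    let routes = ModelRoutes {
        deep: "claude-opus-4-6".into(),
        standard: "claude-sonnet-4-6".into(),
        quick: "claude-haiku-4-5-20251001".into(),
        search: "claude-haiku-4-5-20251001".into(),
    };
    assert_eq!(routes.resolve(Some("quick")), "claude-haiku-4-5-20251001");
    assert_eq!(routes.resolve(Some("unheard-of")), "claude-sonnet-4-6");
    assert_eq!(routes.resolve(None), "claude-sonnet-4-6");
}
```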

### 7b. Background parallel plan steps

Execute independent plan steps concurrently. Sequential by default; parallel only when steps have no file overlap.

**Dependency analysis (static, no model call):**
- Build a directed graph: step A → step B if B lists a file that A modifies
- Steps with no shared files and no dependency edge → eligible for parallel execution
- Max concurrency: configurable `parallel_steps = 3` in config (default: 1 = sequential)
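
An order-preserving version of this analysis can be a single greedy sweep: keep extending the current parallel group until a step touches a file the group already owns, then seal the group and start a new one. A sketch, assuming a hypothetical `Step` with a declared file set:

```rust
use std::collections::HashSet;

/// Hypothetical plan step: just the files it declares it will touch.
struct Step {
    files: HashSet<String>,
}

/// Greedy order-preserving partition: steps in the same group share no
/// files and may run concurrently; groups execute in sequence.
fn parallel_groups(steps: &[Step]) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_files: HashSet<&str> = HashSet::new();
    for (i, step) in steps.iter().enumerate() {
        let overlaps = step.files.iter().any(|f| current_files.contains(f.as_str()));
        if overlaps && !current.is_empty() {
            // Step depends on a file the current group modifies: seal the group.
            groups.push(std::mem::take(&mut current));
            current_files.clear();
        }
        current.push(i);
        current_files.extend(step.files.iter().map(String::as_str));
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}

fn main() {
    let s = |files: &[&str]| Step { files: files.iter().map(|f| f.to_string()).collect() };
    // Steps 0 and 1 touch disjoint files; step 2 re-touches a.rs.
    let steps = [s(&["a.rs"]), s(&["b.rs"]), s(&["a.rs"])];
    assert_eq!(parallel_groups(&steps), vec![vec![0, 1], vec![2]]);
}
```

The `parallel_steps` cap and the bash-with-side-effects rule would further split these groups; this sketch covers only the file-overlap criterion.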

**Execution:**
- `tokio::spawn` per eligible step group
- Each step gets its own `McpClient` scope (MCP connections not shared across parallel steps)
- Results collected in order; step summaries merged before next sequential step
- TUI shows parallel steps as a grouped block with individual ✓/✗ per step

**Constraints:**
- Steps that call `bash` with side effects are always sequential (conservative)
- File write conflicts → pause, surface to user for resolution
- Requires 7a (model routing) to be useful — parallel steps should use `quick`/`search` routes

### 7c. MCP skill scoping

Scope MCP servers to specific plan step categories or task keywords rather than loading all servers globally.

**Config:**
```toml
[[profiles.local.mcp_servers]]
name    = "playwright"
command = ["npx", "-y", "@playwright/mcp"]
scope   = ["visual", "frontend", "test-e2e"]   # only loaded for these categories
```

**Behaviour:**
- At plan step start: check step category against each server's `scope`
- Only matching servers included in tool list for that step
- Reduces tool list size by 60-80% for non-matching steps — keeps model focused
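
The filter itself is one predicate per server. A sketch, assuming servers with an empty `scope` stay globally loaded (an assumption; the config above does not specify the default for servers that omit `scope`):

```rust
/// Hypothetical server entry mirroring the `scope` key in the TOML above.
struct McpServer {
    name: String,
    scope: Vec<String>, // assumed: empty = always loaded (global)
}

/// Return the names of the servers active for one plan step's category.
fn servers_for_step<'a>(servers: &'a [McpServer], category: &str) -> Vec<&'a str> {
    servers
        .iter()
        .filter(|s| s.scope.is_empty() || s.scope.iter().any(|c| c == category))
        .map(|s| s.name.as_str())
        .collect()
}

fn main() {
    let servers = vec![
        McpServer {
            name: "playwright".into(),
            scope: vec!["visual".into(), "frontend".into(), "test-e2e".into()],
        },
        McpServer { name: "fetch".into(), scope: vec![] },
    ];
    assert_eq!(servers_for_step(&servers, "visual"), ["playwright", "fetch"]);
    assert_eq!(servers_for_step(&servers, "backend"), ["fetch"]);
}
```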

### 7d. Image/multimodal support

**Increasingly table-stakes.** "Fix this CSS — here's a screenshot" is a real workflow. Not critical for V1, but competitors are adding it and user expectations are shifting. Multimodal input turns PareCode from a text-only coding agent into a visually aware development partner.

**Core capabilities:**

**7d-i. Image input in TUI:**
- Drag-and-drop or paste image into the TUI input (terminal image protocols: iTerm2 inline images, Kitty graphics protocol, Sixel)
- `@screenshot.png` file attachment — same `@` picker as text files, but detected as image by extension
- `/screenshot` command — capture the current terminal or a region and attach automatically
- Images encoded as base64 and sent via the `image_url` content block in the OpenAI-compatible API (supported by Claude, GPT-4o, Gemini, and increasingly by local multimodal models)

**7d-ii. Use cases:**
| Scenario | Value |
|---|---|
| "Fix this CSS — here's what it looks like" | Visual debugging without describing layout issues in words |
| "Implement this design" (attach mockup) | Design-to-code from a screenshot or Figma export |
| "What's wrong with this error?" (attach terminal screenshot) | Non-text error formats (stack traces with colour, GUI error dialogs) |
| "Match the style of this component" (attach reference) | Visual consistency without manual style description |

**7d-iii. Implementation:**
- `src/client.rs` — extend `MessageContent` to support `image_url` content blocks alongside text
- `src/tui/mod.rs` — image attachment via `@` picker (filter by image extensions: png, jpg, jpeg, gif, webp, svg), base64 encoding on attach
- `src/agent.rs` — pass image content blocks through to API call, strip images from context on budget compression (images are expensive — ~1k tokens per image, and stale images should be evicted first)
- `src/budget.rs` — images get a higher compression priority (evict old images before old text)
- Fallback: if the model/endpoint doesn't support vision, return a clear error: `"This model does not support image input. Switch to a vision-capable model (Claude Sonnet, GPT-4o, etc.)"`
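
The fallback check is worth running before any base64 encoding happens. A sketch, assuming capability is inferred from the model name (a stand-in heuristic only; a real implementation would carry an explicit vision flag in the profile config rather than pattern-match names):

```rust
/// Stand-in heuristic: infer vision capability from the model name.
/// A real implementation would read an explicit flag from the profile
/// config instead of matching name substrings.
fn supports_vision(model: &str) -> bool {
    let m = model.to_lowercase();
    ["claude", "gpt-4o", "gemini", "-vl", "vision"]
        .iter()
        .any(|hint| m.contains(*hint))
}

/// Gate image attachment before encoding anything; the error text
/// matches the fallback message described above.
fn attach_image(model: &str) -> Result<(), String> {
    if supports_vision(model) {
        Ok(())
    } else {
        Err("This model does not support image input. Switch to a \
             vision-capable model (Claude Sonnet, GPT-4o, etc.)"
            .to_string())
    }
}

fn main() {
    assert!(attach_image("claude-sonnet-4-6").is_ok());
    assert!(attach_image("qwen2.5-vl:7b").is_ok()); // local Qwen-VL via Ollama
    assert!(attach_image("qwen3:14b").is_err()); // text-only: clear error
}
```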

**7d-iv. Model compatibility:**
| Model | Vision support |
|---|---|
| Claude Sonnet/Opus | ✓ |
| GPT-4o | ✓ |
| Gemini Pro/Flash | ✓ |
| Qwen-VL (local) | ✓ (Ollama) |
| Qwen3 14B (text-only) | ✗ — clear error message |
| Most local coding models | ✗ — clear error message |

---

## File Structure (target)

```
src/
├── main.rs           # clap CLI, single-shot + TUI dispatch
├── client.rs         # HTTP client, SSE streaming, tool call parsing
├── agent.rs          # agent loop, project map, conventions loading, build check
├── history.rs        # tool output compression (model vs display summaries)
├── cache.rs          # file read cache + re-read prevention
├── budget.rs         # proactive token budget, loop detection
├── sessions.rs       # session persistence, JSONL, context injection (8k cap)
├── ui.rs             # tool glyphs
├── config.rs         # profile system, config file load/write
├── mcp.rs            # MCP client — spawn servers, JSON-RPC, tool discovery + dispatch
├── index.rs          # project symbol index — fn/struct/class/impl → file path, used by plan gen
├── telemetry.rs      # SessionStats, TaskRecord, JSONL persistence
├── plan.rs           # plan data structure, step execution, step summaries
├── git.rs            # git integration — checkpoint, undo, diff, blame, co-change analysis
├── tools/
│   ├── mod.rs         # tool registry + dispatch
│   ├── read.rs        # read_file with smart excerpting + symbols=true index
│   ├── write.rs       # write_file (overwrite guard)
│   ├── edit.rs        # edit_file (fuzzy matching, ±15 line failure hint)
│   ├── bash.rs        # bash execution (async, timeout, 200-line cap)
│   ├── recall.rs      # retrieve full stored output by id or tool name
│   ├── patch.rs       # patch_file — unified diff application, fuzzy context matching
│   ├── search.rs      # ripgrep wrapper (zero-match → declare done)
│   └── list.rs        # list_files
└── tui/
    ├── mod.rs          # event loop, state, tab switching, input handling
    ├── render.rs       # top-level draw dispatcher, tab bar, status bar
    ├── chat.rs         # chat view — conversation history, streaming output
    ├── sidebar.rs      # session sidebar — grouped by date, resume on select
    ├── git_view.rs     # git tab — syntax-highlighted diff viewer, checkpoint list
    ├── config_view.rs  # config tab — profile/hooks/MCP status display
    ├── stats_view.rs   # stats tab — telemetry dashboard, bar charts, cost tracking
    ├── plan_view.rs    # plan tab — step list, live status, carry-forward summaries
    ├── markdown.rs     # markdown → ratatui Span/Line converter
    └── theme.rs        # colour palette definitions, theme switching
```