# PareCode — Implementation Plan
> Build a Rust CLI coding agent that matches OpenCode's baseline, then beats it on token efficiency and small-model reliability. Hyper-optimised orchestration + smart deterministic programming where a model call would be wasteful.
---
## Market Position
**The core bet:** context efficiency is the hard problem. Features are plumbing. A model drowning in 60k tokens of accumulated history fails. A model given 8k tokens of clean, relevant context succeeds — and on a 14B local model, this is the difference between working and not working.
**Why this wins:**
| Dimension | OpenCode / Cursor / Claude Code | PareCode |
|---|---|---|
| Token usage per task | 20k–60k (reactive compression, full file reads) | 3k–12k (proactive, compressed from the start) |
| Local model support | Broken on most OSS backends (Zod schemas, context bloat) | First-class — designed for Qwen3 14B, Ollama |
| Plan/execute isolation | Plans in conversation — model loses thread by step 3 | Each step: fresh context, bounded instruction, scaffold carries state |
| Loop detection | 3 identical calls before intervention | 2 calls — injects cached result immediately |
| Cost | Cloud API required; usage compounds | Works on free local inference; cloud optional |
| Enterprise / IP | Code leaves the building | Self-hosted, air-gapped capable |
**The efficiency story compounds over time.** As local models improve (Qwen4, etc), PareCode gets better for free. We're not locked to any provider's pricing decisions. And every token saved is real money: a team of 10 running 50 tasks/day at OpenCode's token rate vs PareCode's is hundreds of dollars a month difference.
**What's genuinely novel:**
- Plan/execute separation where the scaffold owns state and the model only sees one bounded step at a time. No other agent does this.
- Tool output compression that is deterministic and immediate, not a reactive LLM call at 90% capacity.
- Per-step file symbol summaries carried forward between steps — the model knows what changed without seeing implementation detail.
---
## Why OpenCode Falls Over (Validated Against Their Codebase)
| Failure | Impact |
|---|---|
| System prompt bloat (one user hit 217,905 tokens) | Entire context consumed before conversation starts |
| Full file reads (up to 50KB per read) | Most content irrelevant, wastes model attention |
| Glob returns 100K+ tokens per call | Known issue, unfixed |
| Tool outputs never compressed mid-session | History balloons; blunt compaction fires at 90% |
| Compaction is reactive LLM call | Costs tokens to save tokens |
| Doom loop detection fires at 3 identical calls | Already wasted 3 tool round-trips |
| Zod schemas break on OSS backends (SGLang, K2.5) | Tools literally don't work on many local models |
| No per-step context isolation | Small models lose the plan by step 3 |
| Hidden cheap-model calls (Haiku) | Unexpected cost accumulation |
| No conversation persistence | Can't resume, roll back, or compare sessions |
---
## ✅ Phase 1 — Match OpenCode — COMPLETE
**`src/client.rs`** — Ollama/OpenAI-compatible HTTP client
- POST to `/v1/chat/completions` with streaming SSE
- Parse streamed tool call deltas into complete tool calls
- `stream_options: {include_usage: true}` for Ollama token counts
- Config: endpoint URL + model from `~/.config/parecode/config.toml`
**`src/tools/`** — Core tool set with lean handwritten JSON schemas
- `read_file`, `write_file`, `edit_file`, `bash`, `search`, `list_files`
- All schemas minimal — work correctly on Qwen3 14B, Ollama backends
**`src/agent.rs`** — Agent loop with streaming output
**`src/main.rs`** — CLI via `clap` — `parecode "task"`, `--dry-run`, `-v`, `--profile`, `--init`, `--profiles`
---
## ✅ Phase 2 — Easy Wins That Beat OpenCode — COMPLETE
### ✅ 2a. Tool Output Compression (`src/history.rs`)
- `read_file` content kept full in model context (needed for editing)
- Separate `display_summary` (one-liner) shown in TUI sidebar
- Budget enforcer compresses older read results when threshold hit
- On `edit_file` failure: file content injected into error response so model can self-correct without re-reading
### ✅ 2b. File Read Cache (`src/cache.rs`)
- All reads cached; cache-hit returns content instantly with age note
- Invalidated on write/edit
### ✅ 2c. Proactive Token Budget (`src/budget.rs`)
- Enforced before every API call (not reactive at 90%)
- Pass 1: compress older tool results, leave most recent intact
- Pass 2: trim oldest turns (protects index 0 — original task)
- Loop detection fires at 2 identical calls (vs OpenCode's 3)
### ✅ 2d. Smart File Excerpting (`src/tools/read.rs`)
- Max 150 lines by default; explicit `line_range` for full access
- `symbols=true` mode returns function/struct/class index with line numbers — lets model navigate large files without reading them
### ✅ 2e. Lean Tool Schemas
- Handwritten, minimal — no Zod, no extra metadata
### ✅ Additional: Ratatui TUI (`src/tui/`)
- Full alternate-screen TUI with conversation history, status bar, input
- Context % and token count in status bar
- `@` file picker overlay (fuzzy search)
- **Attached files panel** — `@` adds file as a pinned chip above input; content injected as preamble in every agent call; protected from budget eviction; Tab/Del to manage chips
- Ctrl+P command palette (`/cd`, `/profile`, `/profiles`, `/clear`, `/ts`, `/quit`)
- Agent cancellation (Ctrl+C)
- Conventions loading: auto-discovers `AGENTS.md` / `CLAUDE.md` / `.parecode/conventions.md`
### Observed results vs OpenCode
- ~2.3k tokens for a file analysis task that cost OpenCode 20k+ tokens
- ~443 tokens for a simple query (OpenCode spikes to 10k immediately)
- Model successfully self-corrects edit_file failures without re-reading
- Attached files prevent the "context forget" that caused OpenCode to loop
---
## ✅ Phase 3 — Multi-Turn Conversation Persistence — COMPLETE
### ✅ 3a. In-session conversation history (`src/sessions.rs`)
- `Vec<ConversationTurn>` in `AppState` accumulates across agent runs
- Each turn: user message, agent response text, tool summary
- Prior context injected as preamble on each new run (8k token cap — ~25% of a 32k window)
- Short reply hint: model told "yes/ok/go ahead" are responses to the previous message
### ✅ 3b. Persistent conversation storage
- JSONL files in `~/.local/share/parecode/sessions/{ts}_{basename}.jsonl`
- Auto-resumed on startup for the matching cwd
### ✅ 3c. Session management
- `/sessions`, `/resume [n]`, `/rollback [n]`, `/new` slash commands
- `Ctrl+H` session browser overlay — date, project, turn count, first message preview
- Status bar indicator: `◈ N↩` shows active turn count and resumed state
### ✅ 3d. Rollback
- Active turn pointer — rolling back branches without deleting archived turns
---
## ✅ Phase 4 — Plan/Execute Mode — COMPLETE
**The core architectural differentiator.** Plan is a data structure owned by the scaffold. Each step gets fresh, minimal context. The model only ever sees the current step. The scaffold carries all state.
### ✅ Plan data structure (`src/plan.rs`)
- `Plan { task, steps, current, status, created_at, project }`
- `PlanStep { description, instruction, files, verify, status, tool_budget, user_annotation, completed_summary }`
- `Verification`: None | FileChanged | PatternAbsent | CommandSuccess | BuildSuccess
### ✅ Per-step context isolation
- Fresh `messages` vec per step — zero bleed from previous steps
- Only `step.files` loaded as attached context
- Single bounded instruction to model
### ✅ Step carry-forward summaries
- After each step passes, `summarise_completed_step()` scans modified files deterministically
- Extracts top symbols (fn/struct/class/def) from recently modified files
- Result: `"modified src/auth.rs [validate_token, AuthError]; modified src/handler.rs [handle_request]"`
- Injected into next step's preamble — model knows exact function names without seeing implementation
- Zero model calls, ~5 lines of context per completed step
### ✅ TUI plan review
- `/plan "task"` — generate plan, enter inline review mode
- `↑↓` navigate steps, `e` annotate, `a` approve, `Esc` cancel
- Annotations injected as `"\n\nUser note: {}"` into the step instruction
- All steps must be individually approved before execution begins
- Per-step ✓/✗ shown in conversation history during execution
### ✅ Plan persistence
- Plans saved to `.parecode/plans/{timestamp}-plan.json` (JSON, machine-readable)
- Plans written to `.parecode/plan.md` (Markdown, human-readable — open in editor while plan runs)
- Failed plans paused at the failing step, resumable
### ✅ Plan UX polish
- Overlay closes immediately on Enter confirm — mode transitions to `PlanRunning` synchronously, no async lag
- Planning message shows which model is thinking when `planner_model` is configured: `⟳ planning via claude-opus-4-6: task`
---
## ✅ Phase 5 — Agent Reliability — COMPLETE
### ✅ 5a. `recall` tool
- Schema: `{ tool_call_id?, tool_name? }` — either works
- Handled before dispatch in `agent.rs` — not recorded in history (prevents recursion)
- `recall_by_name()` fallback for local models that don't echo IDs reliably
### ✅ 5b. Bash timeout (async)
- `tokio::process::Command` + `tokio::time::timeout`
- `execute_tool` is now `async fn`
- `MAX_OUTPUT_LINES` = 200
### ✅ 5c. Smart bash summarisation
- Error-line aware: keeps `error:`, `FAILED`, `panic` lines (up to 20)
- Build check failures pass through history compression unchanged
- Build check success prompts model to verify via search before declaring done
### ✅ 5d. Fuzzy `edit_file` matching
- CRLF → LF → per-line trim() → per-line trim_end() cascade
- Only applies if exactly one candidate found
- On failure: ±15 line context hint instead of full file dump
### ✅ 5e. `write_file` existence guard
- `overwrite: bool` required to replace existing files
- Prevents silent overwrites by local models that don't track what exists
### ✅ 5f. Token counting fix
- `s.chars().count() / 4` — correct for multi-byte Unicode
- Prevents premature compression on non-ASCII codebases
### ✅ 5g. Unicode panic fix
- `format_args_summary` now uses `.chars().take(N).collect()` not `&s[..N]`
- Prevents panic on multi-byte chars in tool arg display (∑, Chinese, emoji)
### ✅ 5h. System prompt hardening
- "Do not ask permission mid-task — make necessary changes and report what you did"
- "For replacement tasks, search to confirm no instances remain before declaring done"
- "Do not re-read files already read this session"
- Auto build-check after every file mutation (`cargo check -q` / `tsc --noEmit`)
---
## ✅ Phase 5i — Sub-agent model split — COMPLETE
`planner_model` config field per profile:
- If set, plan generation uses `planner_model`; step execution uses `model`
- Enables Opus plan + Haiku execute — high reasoning where it counts (planning), cheap tokens where they're plentiful (execution)
- Planning is ~1–2k tokens; execution is 10–40k. The split is economically significant.
- Falls back to `model` if `planner_model` not set — zero behaviour change for existing configs
- See `CONFIG.md` for full examples
---
## ✅ Phase 6a — MCP Client — COMPLETE
Full Model Context Protocol client (`src/mcp.rs`):
- Spawns any MCP server process (Node/Python/binary) configured per-profile
- JSON-RPC 2.0 over stdin/stdout with proper `initialize` / `notifications/initialized` handshake
- Dynamic tool discovery via `tools/list` — tools appear as `<server>.<tool>` (e.g. `brave.brave_web_search`)
- Dispatched transparently alongside native tools — model sees one unified tool list
- Multiple servers per profile, all running concurrently
- Silently skips servers that fail to start (logs to stderr)
- Config in `config.toml` per-profile:
```toml
[[profiles.local.mcp_servers]]
name = "brave"
command = ["npx", "-y", "@modelcontextprotocol/server-brave-search"]
[profiles.local.mcp_servers.env]
BRAVE_API_KEY = "BSA..."
```
- Commented examples in default config: Brave Search, filesystem, fetch (`uvx mcp-server-fetch`)
---
## Phase 6b — Distribution
The Rust binary is PareCode's biggest distribution advantage. Every competitor requires a language runtime: OpenCode and Claude Code need Node.js, Aider needs Python, oh-my-opencode needs both. PareCode is a single static binary — zero dependencies, starts in <10ms. The goal: install to productive in under 60 seconds, better than any competitor.
### 6b-i. Binary releases with cargo-dist SECOND NEXT - TEST MYSELF - install setup, qwen scenarios, then Claude
**cargo-dist** automates the entire release pipeline from a single `dist init`. On every version tag push, GitHub Actions builds all targets, produces platform installers, updates the Homebrew tap, and creates the GitHub Release — zero manual steps.
**Target matrix:**
| Target | Platform | Notes |
|---|---|---|
| `x86_64-unknown-linux-musl` | Linux x86_64 | Statically linked — works on any Linux, any glibc version |
| `aarch64-unknown-linux-musl` | Linux ARM64 | AWS Graviton, Raspberry Pi, ARM servers |
| `x86_64-apple-darwin` | macOS Intel | Older Macs |
| `aarch64-apple-darwin` | macOS Apple Silicon | M1/M2/M3 — now majority of Macs |
| `x86_64-pc-windows-msvc` | Windows x86_64 | Primary Windows target |
**musl is non-negotiable for Linux.** Statically linked = no "error while loading shared libraries" ever. This eliminates the most common class of post-install failures on Linux.
**Cargo.toml / dist.toml configuration:**
```toml
[workspace.metadata.dist]
cargo-dist-version = "0.30.4"
ci = ["github"]
installers = ["shell", "powershell", "homebrew"]
tap = "PartTimer1996/homebrew-parecode"
targets = [
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-musl",
"x86_64-apple-darwin",
"aarch64-apple-darwin",
"x86_64-pc-windows-msvc",
]
publish-jobs = ["homebrew"]
[profile.dist]
inherits = "release"
lto = "thin"
```
**Release process:** `git tag v0.1.0 && git push --tags` — that's it.
**What cargo-dist produces automatically:**
- GitHub Release with 5 platform binaries + SHA256 checksums for each
- Shell installer script (`parecode-installer.sh`) with checksum validation
- PowerShell installer script (`parecode-installer.ps1`) for Windows
- Homebrew formula pushed to `PartTimer1996/homebrew-parecode` tap
### 6b-ii. Install methods (README-ready)
```bash
# macOS / Linux — one-liner, zero dependencies
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/PartTimer1996/parecode/releases/latest/download/parecode-installer.sh | sh
# macOS — Homebrew
brew install PartTimer1996/parecode/parecode
# Windows — PowerShell
irm https://github.com/PartTimer1996/parecode/releases/latest/download/parecode-installer.ps1 | iex
```
**Competitive install comparison:**
| Tool | Install command | Requires |
|---|---|---|
| **PareCode** | `curl ... \| sh` | Nothing |
| OpenCode | `npm install -g opencode` | Node.js |
| oh-my-opencode | npm + manual agent config | Node.js + setup time |
| Claude Code | `npm install -g @anthropic-ai/claude-code` | Node.js |
| Aider | `pip install aider-chat` | Python |
| Plandex | `curl ... \| bash` | Nothing (also compiled binary) |
PareCode and Plandex are the only zero-dependency installs in the category.
### 6b-iii. Distribution channel rollout
**Week 1 (ship with first release):**
- GitHub Releases (cargo-dist, automated)
- Shell installer (cargo-dist, automated)
- Homebrew tap (cargo-dist, automated)
**Week 2:**
- **AUR** (`parecode-bin`) — binary PKGBUILD, targets Arch Linux developers. Highly technical early-adopter audience. Minimal maintenance: update `pkgver` + `sha256sums` on each release.
- **WinGet** — pre-installed on Windows 11. `wingetcreate new <release-url>` generates the manifest; `vedantmgoyal9/winget-releaser` GitHub Action automates future updates.
- **Shell completions** — generate for bash/zsh/fish via clap's `generate` feature. Included in the tarball, install instructions in README. Makes PareCode feel native.
**Later (when users ask):**
- `flake.nix` for Nix users — provide in repo, they can `nix profile install github:PartTimer1996/parecode`
- nixpkgs submission — often happens organically when the tool gains traction
- deb/rpm — only worth building if significant Ubuntu/Fedora user base requests it
**Do not bother:**
- Snap (sandboxing breaks tool, wrong audience)
- Flatpak (designed for GUI apps)
- Docker (not a server application)
- npm/pip wrappers (adds maintenance surface for marginal gain)
### 6b-iv. `parecode update` self-upgrade command
curl-installed users have no package manager to update through. `parecode update` re-runs the install script against latest, replaces the binary in-place.
```
$ parecode update
Checking for updates... parecode 0.1.0 → 0.2.1 available
Downloading parecode 0.2.1 for aarch64-apple-darwin... ✓
Verifying checksum... ✓
Replacing /home/user/.local/bin/parecode... ✓
parecode 0.2.1 installed.
```
Implementation: `src/main.rs` — `--update` subcommand, fetches GitHub API `/releases/latest`, compares version, re-runs platform-specific installer script.
### 6b-v. Benchmarking suite
Run on the tasks that caused Qwen3 14B to loop in OpenCode. Record token counts, tool calls, success rate, wall time. Publish results — this is the "viral moment" that proves the token efficiency claim.
| Task | Target |
|---|---|
| `"remove all console.log from src/"` | ≤ 5 tool calls, < 5k tokens |
| `"rename columns → allColumns in data-table.component.ts"` | No re-reads, clean 1-shot |
| `"reorganise SCSS in header.component.scss"` | < 3k tokens |
Model matrix: Qwen3 14B (Ollama), Mistral 7B, DeepSeek-Coder, Claude Sonnet (API). Publish side-by-side with OpenCode numbers.
### 6b-vi. Expose PareCode as an MCP server (`--mcp` flag)
- JSON-RPC over stdin/stdout, `--mcp` flag
- Makes PareCode usable as a backend from any MCP-compatible IDE (Cursor, Zed, etc.)
- Reuses all existing tool infrastructure
### 6b-vii. VSCode extension (trivial packaging, large surface area)
- `package.json` + launch PareCode subprocess + pipe events to webview
- Reuses all existing TUI event infrastructure
- Gives access to VSCode's file tree, git integration, diff viewer
---
## Phase 6c DONE! — First-Run Experience (install → productive in 60 seconds)
**The target flow:**
```
install → parecode → interactive setup → working
```
**Nobody's current flow:**
```
install → run → error: no config → read docs → create config → run again → maybe works
```
PareCode should be the tool that just works.
### 6c-i. First-run detection and setup wizard
When `parecode` is launched with no config file present, run an interactive setup wizard instead of erroring:
```
Welcome to PareCode ⚒
No config found at ~/.config/parecode/config.toml. Let's get you set up.
? How do you want to run PareCode?
❯ Local (Ollama) — free, private, works offline
Anthropic Claude — best quality, requires API key
OpenAI — GPT-4o, requires API key
OpenRouter — any model, one API key
Skip — I'll configure manually
[If Ollama selected — after silently probing localhost:11434]
Checking for Ollama... ✓ found (3 models installed)
? Which model?
❯ qwen3:14b (recommended for coding tasks)
qwen2.5-coder:14b
llama3.1:8b
Config written to ~/.config/parecode/config.toml ✓
Running /init to detect project context... ✓ written to .parecode/conventions.md
Ready. What would you like to build?
▶
```
**Auto-detection shortcuts (skip the wizard entirely):**
- If `ANTHROPIC_API_KEY` env var present → auto-configure Claude profile, skip wizard
- If `OPENAI_API_KEY` env var present → auto-configure OpenAI profile, skip wizard
- If Ollama responds at `localhost:11434` with models → default to local, only ask which model
- If only one model installed → skip even that question, just use it
**Implementation:**
- `src/setup.rs` — `run_setup_wizard() -> ResolvedConfig` — terminal prompts (no TUI, runs before TUI starts)
- `src/main.rs` — check `config_path().exists()` before launching TUI; if missing, run wizard first
- Wizard uses `dialoguer` crate for interactive prompts (or hand-rolled crossterm prompts to avoid extra dependency)
### 6c-ii. Ollama auto-detection
On every startup (not just first run), silently probe `localhost:11434/api/tags` (100ms timeout). If Ollama is running:
- Show `◉ Ollama` indicator in TUI status bar when using local profile
- If user is on a cloud profile but Ollama is also running: show soft hint `◉ Local models available — /profile local to switch`
- On first run: Ollama presence triggers local-first default in the wizard
### 6c-iii. `/init` auto-prompt on new project
On first `parecode` launch in a directory with no `.parecode/` folder:
```
No project conventions found.
Run /init to prime PareCode with your project's stack and style? [Y/n]
```
If Y: runs `/init` inline (see Phase 6i), shows result, asks to save. If N: continues normally, can run `/init` later.
### 6c-iv. `parecode update` and version awareness
Status bar shows version and available update indicator:
```
parecode 0.1.0 · new version 0.2.1 available — run `parecode update`
```
Checked once per session against GitHub API (cached for 24h in `~/.local/share/parecode/update-check`). Never blocks startup.
### 6c-v. Shell completion install hint
On first run after install, if completions aren't installed:
```
Tip: install shell completions for tab-completion of commands and flags:
parecode --completions zsh > ~/.zfunc/_parecode # zsh
parecode --completions bash > ~/.bash_completion.d/parecode # bash
parecode --completions fish > ~/.config/fish/completions/parecode.fish # fish
```
Shown once, suppressed after. Completions generated via clap's `generate` feature, shipped in release tarballs.
### ✅ 6d. Smarter file selection — COMPLETE
`src/index.rs` — project symbol index, built on every `/plan` invocation (zero model calls):
- Walks project files (Rust, TS/JS, Python, Go, C/C++), extracts top-level symbols: `fn`, `struct`, `enum`, `trait`, `impl`, `class`, `def`, `func`, `const`
- Caps at 500 files, < 100ms, pure regex/text scan
- Injected into plan prompt as a compact file map — model sees real symbol names and paths, not a directory listing
- Post-parse resolution: `files: ["validate_token"]` → scaffold resolves to `src/middleware/jwt.rs` via index
- Model names what it needs; scaffold resolves where it lives
- 7 unit tests: Rust/TS/Python extraction, symbol resolve, ident parsing
### 6e. Mechanical mode (`--mechanical`)
- Pure grep/sed for pattern tasks, zero model calls
- `parecode --mechanical "replace foo with bar in src/"` — explicit flag only, never auto-routed
- For rename/replace tasks this is 100x faster and cheaper than any model approach
### ✅ 6f. Telemetry & analytics — COMPLETE
- `src/telemetry.rs` — `SessionStats` (live) + `TaskRecord` (persisted)
- Per-task: input/output tokens, tool calls, compression ratio, model, profile
- Flushed to `.parecode/telemetry.jsonl` after every completed agent run (JSONL, appendable, aggregatable)
- **Always-visible stats bar** in TUI — second line below status bar, no toggle needed:
- `∑ N tasks X.Xktok avg Y/task Z tool calls W% compressed peak P%`
- Dimmed/purple palette so it doesn't compete with active status bar
- Budget enforcement count and peak context % tracked separately
- Foundation for a hosted dashboard / benchmarking comparisons
---
## ✅ Phase 6g — Hash-Anchored Edits (correctness) — COMPLETE
**The single biggest correctness improvement available.** Inspired by oh-my-opencode's hash-anchored edit validation, which moved task success from 6.7% → 68.3% on complex tasks. Stale-line edits — where the file has shifted since it was read — are the most common silent failure mode.
**How it works:**a
- `read_file` output annotates each line with a short content hash: `42#a3f: fn validate_token(...)`
- Hashes are compact (4–5 chars), placed at the start of the line number field — subtle, not noisy
- `edit_file` accepts an optional `anchor` hash alongside `old_str`
- Before applying: verify the hash still matches the line at the expected position
- If hash mismatch → return error: `"Anchor mismatch at line 42 — file has changed since last read. Re-read to get current hashes."`
- If no anchor provided → fall through to existing fuzzy matching (backwards compatible)
**Implementation:**
- `src/tools/read.rs` — hash generation (CRC32 or FNV-1a of the line content, base36, 4 chars)
- `src/tools/edit.rs` — anchor verification before fuzzy match
- `src/cache.rs` — cache stores hashes alongside content; invalidated on write/edit
- Hash format: `{line_num}#{hash}:` prefix — stripped before content is used
**Design constraints:**
- Hashes must be invisible to the model's reasoning (it should use them for anchoring, not describe them)
- System prompt addition: `"Each line in read_file output is prefixed {line}#{hash}: — use the hash as an anchor in edit_file calls to prevent stale-line errors"`
- Backwards compatible: anchor param is optional; existing edit calls continue to work
---
## ✅ Phase 6h — Hooks System — COMPLETE
**First-class workflow automation.** Config-driven pre/post hooks that run deterministic shell commands at key points in the agent lifecycle. The key innovation beyond a simple CI config: `on_edit` output is **injected directly into the model's tool result**, so the model sees compile/lint errors immediately and can self-correct without an extra read-file round-trip.
**Hook events:**
| Event | Trigger | Injection | Common use |
|---|---|---|---|
| `on_edit` | After any `write_file` or `edit_file` call | ✓ Injected into tool result | `cargo check -q`, `tsc --noEmit` |
| `on_task_done` | After every completed agent run | TUI only | `cargo test -q 2>&1 \| tail -5` |
| `on_plan_step_done` | After each plan step completes | TUI only | lint, format |
| `on_session_start` | TUI startup | TUI only | `git pull`, environment check |
| `on_session_end` | TUI quit | stderr only | `git status --short` |
**Auto-detection (the key UX win):**
On first run with no hooks in config, PareCode scans the project root for language markers and auto-configures sensible defaults — no manual setup required:
| Marker | `on_edit` | `on_task_done` |
|---|---|---|
| `Cargo.toml` | `cargo check -q` | `cargo test -q 2>&1 \| tail -5` |
| `tsconfig.json` | `tsc --noEmit` | — |
| `go.mod` | `go build ./...` | — |
| `pyproject.toml` / `setup.py` + ruff in PATH | `ruff check .` | — |
Detection runs **once** then writes a `[profiles.{name}.hooks]` section into `~/.config/parecode/config.toml` (append-only, preserving all comments). The written block includes active detected commands plus all 5 event types commented out as examples — so users can see and edit every option. Subsequent startups read from config; detection never repeats.
**Config (per-profile):**
```toml
[profiles.local.hooks]
on_edit = ["cargo check -q"]
on_task_done = ["cargo test -q 2>&1 | tail -5"]
# on_plan_step_done = []
# on_session_start = []
# on_session_end = []
```
Set `hooks_disabled = true` in a profile to permanently suppress all hooks including auto-detected ones.
**UX behaviour:**
- Startup: `⚙ hooks on_edit: cargo check -q · on_task_done: cargo test -q … (/list-hooks for details)` shown as a system message so hooks are never invisible
- `on_edit` output appended inline to the model's tool result — model sees `⚙ \`cargo check -q\` (exit 1): error[E0308]: …` and self-corrects immediately
- Hook output rendered in TUI as dimmed `⚙` block; amber on non-zero exit
- 30s timeout per hook; 50-line output cap to avoid context bloat
- `/hooks on|off` — per-session toggle (survives across tasks within a session)
- `/hooks` alone shows current status and usage hint
- `/list-hooks` — full breakdown of all 5 event types with their commands, toggle state, and profile-level disabled status; includes config file edit hint
- `hooks_disabled = true` in profile → permanent kill switch, overrides `/hooks on`
**Implementation:**
- `src/hooks.rs` — `HookConfig { on_edit, on_task_done, on_plan_step_done, on_session_start, on_session_end }`, `HookResult { output, exit_code }`, `detect_language_hooks()`, `write_hooks_to_config(profile_name)`, `run_hook(cmd) -> HookResult`; `HookConfig::summary()` (one-liner for startup), `HookConfig::detail()` (multi-line for `/list-hooks`)
- `src/config.rs` — `hooks: HookConfig` and `hooks_disabled: bool` added to `Profile` and `ResolvedConfig`, both `#[serde(default)]` for backwards compatibility
- `src/agent.rs` — `AgentConfig { hooks: Arc<HookConfig>, hooks_enabled: bool }`; after each successful mutating tool call, runs `on_edit` hooks and appends output to `result_content`; after the main loop runs `on_task_done` hooks (TUI display only)
- `src/tui/mod.rs` — `UiEvent::HookOutput { event, output, exit_code }`, `ConversationEntry::HookOutput { event, output, success }`, `AppState.hooks_enabled`; hook bootstrap in `event_loop` (calls `write_hooks_to_config`, updates `resolved.hooks` in-place); `resolve_hooks()` helper gates on `hooks_enabled`/`hooks_disabled`; `on_session_start` hooks fire as `tokio::spawn` after `ui_tx` created; `on_session_end` hooks run synchronously before returning; `on_plan_step_done` hooks fire in `launch_plan` after each passing step
- `src/tui/render.rs` — `ConversationEntry::HookOutput` rendered as dimmed `⚙ on_edit ✓` / amber `⚙ on_edit ✗` with up to 10 lines of output
---
## ✅ Phase 6i — `/init` Command — COMPLETE
**One-shot project context priming.** Walks the project and auto-generates `.parecode/conventions.md` from existing project files. Eliminates manual conventions setup for new projects.
**Sources (in priority order):**
1. `README.md` — first 50 lines (project description, stack, install)
2. `Cargo.toml` / `package.json` / `pyproject.toml` / `go.mod` — name, language, key dependencies
3. `AGENTS.md` / `CLAUDE.md` — if already exists, merge rather than overwrite
4. `.eslintrc` / `rustfmt.toml` / `pyproject.toml [tool.ruff]` — style rules detected
5. Test directory structure — infer test runner from `jest.config`, `pytest.ini`, `#[cfg(test)]`
**Output format (`.parecode/conventions.md`):**
```markdown
# Project: my-app
Language: TypeScript (Bun runtime)
Test runner: `bun test` — tests in `src/__tests__/`
Lint: `eslint src/` — run after edits
Key dependencies: React 19, Drizzle ORM, Hono
## Style
- Prefer `const` over `let`
- No default exports
- Zod for all external input validation
```
**TUI integration:**
- `/init` slash command — runs inline, shows progress, opens result in pager overlay for review/edit before saving
- On first `parecode` run in a new directory (no `.parecode/` present): prompt "No conventions found. Run `/init` to prime project context? [y/N]"
- `parecode --init` CLI flag (already exists for config) — extend to also run project init if in a project directory
**Implementation:**
- `src/init.rs` — `run_project_init(cwd) -> String` — pure text extraction, no model calls
- `src/tui/mod.rs` — `/init` command handler, first-run prompt
---
## ✅ Phase 6j — Cost Estimation in Plan Overlay — COMPLETE
**Pre-task cost transparency.** Before running a plan, show an estimated token cost and (optionally) API cost. Nobody does this. Users burned $638+ in 6 weeks on AI agents without forewarning.
**Estimation method (no model call, heuristic):**
- Per step: `base_tokens (500) + sum(file_sizes_in_step / 4) + instruction_len / 4`
- Total: `sum(step_estimates) × 1.3` (overhead factor for tool results and responses)
- API cost: `total_tokens × rate_per_token` — rates configured per-profile, or use known defaults (Haiku: $0.25/Mtok input)
**Plan overlay addition:**
```
┌─ Plan: add JWT authentication ────────────────────────┐
│ 4 steps · est. 12k–18k tokens · ~$0.004 at Haiku │
│ │
│ ▶ Step 1: Add JWT dependency to Cargo.toml │
│ Step 2: Implement token validation middleware │
│ ... │
```
**Config:**
```toml
[profiles.claude]
cost_per_mtok_input = 0.25 # optional, enables cost display
cost_per_mtok_output = 1.25
```
**Implementation:**
- `src/plan.rs` — `estimate_plan_cost(plan, index) -> CostEstimate { tokens_low, tokens_high, usd }`
- `src/tui/render.rs` — add estimate row to plan overlay header
- `src/config.rs` — `cost_per_mtok_input/output` optional fields on `Profile`
---
## ✅ Phase 6k — Quick Mode / Tiered Autonomy — COMPLETE
**Right-sized agent for right-sized tasks.** The full agent loop (plan → load context → multi-turn tool loop → verify) is overkill for a one-line fix. Quick mode skips the overhead entirely.
**Trigger:**
- `parecode --quick "task"` — explicit flag
- Auto-detect heuristic (opt-in via config `auto_quick = true`): task < 20 words, no file `@` attachments, no `/plan` prefix → quick mode
- `/quick "task"` in TUI
**Quick mode behaviour:**
- Single API call — no multi-turn loop
- No plan generation, no step isolation
- Context: system prompt + task only (no file loading, no session history)
- Tools available: `edit_file`, `bash` (read-only commands only), `search`
- Max 1 tool call before returning to user
- Token target: < 2k tokens total
- TUI: shows `⚡ quick` badge in status bar instead of spinner
**When NOT to use quick mode:**
- Task contains words like "refactor", "add feature", "implement", "plan" → warn and suggest normal mode
- Task references multiple files → warn
**Implementation:**
- `src/agent.rs` — `run_quick(task, config) -> AgentResult` — simplified single-shot path
- `src/main.rs` — `--quick` flag, auto-detect logic
- `src/tui/mod.rs` — `/quick` command, badge in status bar
## Phase 6l - DONE
Simple for / autocomplete show options, similar to @, simple yet massive for UX
---
## ✅ Phase 6m — Git Integration — COMPLETE
**Every competitor has git integration.** Aider's entire edit model is built on git diffs. Claude Code auto-commits. OpenCode has git tools. For a tool that modifies files, not having automatic checkpoints is a safety gap users will notice immediately — one bad edit with no easy undo and you've lost a user forever.
### ✅ 6m-i. Auto-checkpoint before tasks
- Before every agent run, `git add -A && git commit --no-verify -m "parecode: checkpoint before \"<task>\""` if tree is dirty
- Clean tree → record HEAD hash as checkpoint (zero cost, no commit created)
- `--no-verify` bypasses user pre-commit hooks — checkpoints must never be blocked by lint
- Skip silently if not in a git repo
### ✅ 6m-ii. Post-task diff display
- After every completed agent run, `⎇ N files changed — press 5 to review, d to diff, /undo to revert` in chat
- `d` key from any tab opens full-screen syntax-coloured diff overlay (green/red/cyan, scroll)
- `/diff` command switches to Git tab + opens diff overlay
- **Bug fixed**: diffs compare checkpoint against working tree (`git diff <hash>`), not commit-to-commit (`git diff <hash> HEAD`)
### ✅ 6m-iii. Undo via git
- `/undo` slash command — opens interactive checkpoint picker in Git tab (↑↓ select, Enter revert, Esc cancel)
- `u` key in Git tab opens the same picker
- `UndoPicker` mode: full-area checkpoint list with hash, age, message columns; amber/orange danger palette
- Warning bar: `⚠ git reset --hard — this cannot be undone`
- After undo: clears checkpoint hash, diff content, and stat so stale data doesn't linger
### ✅ 6m-iv. Auto-commit on task success (opt-in)
- Config: `auto_commit = true` in profile (default: false)
- On successful task completion: `git add -A && git commit --no-verify -m "<prefix><task summary>"`
- `auto_commit_prefix = "parecode: "` configurable
### ✅ 6m-v. Git-aware context
- `git status --short` injected into system prompt preamble when `git_context = true` (default)
- Lightweight — model knows which files have uncommitted changes without a tool call
**Implementation:**
- `src/git.rs` — `GitRepo { root: PathBuf }`, `checkpoint()`, `undo()`, `diff_stat_from()`, `diff_full_from()`, `auto_commit()`, `status_short()`, `list_checkpoints()`, `is_git_repo(path) -> bool`
- Uses `std::process::Command` — no libgit2, keeps binary lean
- `src/tui/git_view.rs` — Git tab: checkpoint header, diff stat, undo picker overlay
- `src/tui/overlays.rs` — `draw_diff_overlay()` — full-screen syntax-coloured diff viewer
- `src/tui/mod.rs` — `/undo`, `/diff` commands, `UndoPicker` mode, `UiEvent::GitChanges/GitAutoCommit/GitError`
- `src/config.rs` — `auto_commit`, `auto_commit_prefix`, `git_context` on `Profile`
**Config:**
```toml
[profiles.local]
git_context = true # inject git status into system prompt; enables checkpoints
auto_commit = false # default — don't auto-commit
auto_commit_prefix = "parecode: " # prefix for auto-commit messages
```
---
## ✅ Phase 6n — Diff/Patch Edit Mode — COMPLETE
**More token-efficient editing for multi-hunk changes.** The current `edit_file` tool uses search-and-replace (`old_str` → `new_str`), which works well for single edits but becomes expensive for multi-hunk changes — the model must send the full old content and full new content for each hunk. A unified-diff mode sends only the changes, which aligns directly with PareCode's efficiency thesis.
**Aider proved this works.** Their unified-diff edit format reduced token usage by 30-50% on multi-hunk edits compared to search-and-replace, with comparable accuracy on capable models. The key insight: models are already trained on diff output — it's a natural format for them.
### ✅ 6n-ii. Adaptive tool selection
- System prompt guidance: "Use `edit_file` for single-location changes. Use `patch_file` for multi-hunk edits or when changing multiple related locations in the same file."
- Both tools remain available — model chooses based on task
### ✅ 6n-iii. Fuzzy patch application
- 3-tier cascade: exact match → whitespace-normalised → hint-biased on multiple candidates
- Context lines used for anchoring — if context matches but line numbers are off, apply at the matched location
- Critical for local models that produce slightly incorrect line numbers in `@@` headers
**Implementation:**
- `src/tools/patch.rs` — `parse_hunks()`, `apply_hunk()`, `find_needle()` with 3-tier fuzzy matching; 6 unit tests
- `src/tools/mod.rs` — registered in `all_definitions()`, `is_native()`, `dispatch()`
- `src/agent.rs` — system prompt guidance, `is_mutating` check, hook/telemetry arm
---
## Phase 6o — Multi-File Awareness via Git - We can do this last - cargo and typescript compilars will work quite well without this for now
**Leverages Phase 6m's git integration to detect and handle cross-file breakage.** Currently, when a model edits `auth.rs` and breaks `handler.rs`, the only detection mechanism is the `cargo check` hook — which only works for languages with fast type-checkers. This phase makes cross-file impact visible to the model proactively.
### 6o-i. Change-impact analysis (git-powered)
- After each file edit, run `git diff --name-only` against the checkpoint to get the full list of modified files
- Cross-reference modified files against the project symbol index (`src/index.rs`): which symbols in modified files are imported/used by other files?
- If a modified symbol is referenced in files not yet touched by the model → inject a warning into the tool result:
`"⚠ Modified \`validate_token\` in src/auth.rs — referenced by: src/handler.rs:14, src/middleware.rs:8. Consider updating these files."`
- Zero model calls — pure deterministic analysis using the symbol index + basic import/use scanning
### 6o-ii. Scope-aware file loading in plan mode
- When generating a plan, use git history to identify co-change patterns: files that are frequently modified together
- `git log --name-only --pretty=format: -50` → parse file co-occurrence matrix
- If a plan step targets `auth.rs` and history shows `auth.rs` + `handler.rs` are modified together in 60%+ of commits → auto-include `handler.rs` in the step's file list
- Surfaces as a suggestion in the plan review overlay: `"history suggests handler.rs is usually modified alongside auth.rs — include? [y/N]"`
### 6o-iii. Post-task validation sweep
- After a full agent run or plan execution completes, run a lightweight validation:
1. `git diff --name-only` → list all modified files
2. For each modified file: check if any exported symbol's signature changed
3. For each changed signature: grep for usages in non-modified files
4. If stale references found → report: `"⚠ 3 files may need updates: src/handler.rs, src/test_auth.rs, src/middleware.rs"`
- Model can then be prompted to fix these, or user can review manually
- This catches the cross-file breakage that single-file hooks miss
### 6o-iv. Git blame for context
- When reading a file for editing, optionally show recent git blame annotations for the target region
- Helps the model understand code authorship and recency: recently-changed code is more likely to be the target of a bug fix
- Exposed as `read_file` parameter: `blame: true` → adds `(3 days ago, user)` annotations to relevant lines
- Lightweight: only fetches blame for the requested line range, not the entire file
**Implementation:**
- `src/git.rs` — `changed_files()`, `co_change_matrix()`, `blame_range()`, `changed_symbols()`
- `src/index.rs` — extend with `find_usages(symbol, exclude_files) -> Vec<(path, line)>` for cross-reference scanning
- `src/agent.rs` — post-edit change-impact warning injection, post-task validation sweep
- `src/plan.rs` — co-change suggestions in plan generation
- `src/tools/read.rs` — optional `blame` parameter
## Phase 6p — TUI Visual Overhaul
**Turn the TUI from "functional terminal app" into "this looks like a real product."** Ratatui was absolutely the right choice here — it has first-class `Tabs`, `Table`, split layouts, scrollable viewports, and inline syntax highlighting via `syntect`. Everything below is achievable without changing framework. This is the phase where PareCode stops looking like a dev tool and starts looking like a product.
### 6p-i. Tab bar (top of screen) - Working pretty nicely also
Replace the current single-view layout with a tab bar across the top. Each tab is a full-screen view. `1-5` number keys or `Ctrl+Tab` to switch.
```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─ Git ─┬─ 📊 Stats ─┬─ 📋 Plan ─┐
│ │
```
| Tab | Contents | Key |
|---|---|---|
| DONE - **Chat** (default) | Current conversation view — what exists today | `1` |
| Mostly - DONE - **Config** | Profile switcher, hooks status, MCP servers, conventions preview | `2` |
| NOT DONE - **Git** | Diff viewer, commit history, checkpoint list, undo controls | `3` |
| Needs fixed - **Stats** | Telemetry dashboard — session totals, per-task breakdown, cost tracking | `4` |
| Needs tested - **Plan** | Plan viewer when a plan is active — step list, status, carry-forward summaries | `5` |
**Design notes:**
- Tabs use ratatui's `Tabs` widget — already built into the library, just needs importing
- Only the Chat tab exists at launch; other tabs appear contextually (Git tab only if in a git repo, Plan tab only when a plan is active)
- Tab bar is a single row — minimal vertical space cost
- Active tab highlighted, inactive tabs dimmed
- Each tab has its own scroll state — switching tabs preserves position
### DONE 6p-ii. Session sidebar (left panel, Chat tab) - Working pretty nicely
A collapsible sidebar on the left showing session history — like the sidebar in ChatGPT/Claude web UI. This is the single biggest UX improvement for multi-session users.
```
┌──────────┬────────────────────────────────────────┐
│ Sessions │ Chat │
│──────────│ │
│ ▶ Today │ You: add auth to the API │
│ jwt auth│ ⚒ reading src/routes.ts... │
│ fix css │ │
│ │ │
│ ▶ Yday │ │
│ refactor│ │
│ tests │ │
│──────────│ │
│ [+] New │ │
└──────────┴────────────────────────────────────────┘
```
**Behaviour:**
- `Ctrl+B` toggles sidebar visibility (like VSCode)
- Default: hidden on terminals < 120 cols, visible on wider terminals
- Sidebar width: 20 chars fixed, or configurable
- Sessions grouped by date (Today, Yesterday, This Week, Older)
- Click/Enter on a session to resume it — replaces `/resume` for most users
- Active session highlighted
- `[+] New` at bottom to start fresh session (replaces `/new` for most users)
- Session entries show: first message preview (truncated), turn count, model used
**Implementation:**
- `src/tui/render.rs` — `Layout::default().direction(Direction::Horizontal)` split: sidebar + main chat area
- `src/tui/mod.rs` — `AppState.sidebar_visible: bool`, `AppState.sidebar_selected: usize`
- Sessions loaded from existing `~/.local/share/parecode/sessions/` JSONL files
### 6p-iii. Git tab (full diff viewer)
**The terminal diff viewer.** This is the "mad but really cool" one — and it's very doable in ratatui. `delta` and `diff-so-fancy` proved terminal diffs can look great. We don't need to shell out — we can render it natively.
```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─ Git ─┬─ 📊 Stats ─┐
│ │
│ Checkpoint: parecode: before "add JWT auth" │
│ 3 files changed, +42 -8 │
│ │
│ src/auth.rs ──────────────────────────────────── │
│ @@ -12,6 +12,14 @@ │
│ fn validate_token(token: &str) -> Result<...> │
│ - let claims = decode(token)?; │
│ + let claims = decode(token) │
│ + .map_err(|e| AuthError::Invalid(...))?; │
│ + log::info!("validated: {}", claims.sub); │
│ Ok(claims) │
│ │
│ [u] Undo to checkpoint [c] Commit [s] Stash │
└───────────────────────────────────────────────────┘
```
**Features:**
- Syntax-highlighted diff — added lines green, removed lines red, context lines dimmed
- File headers as collapsible sections (Enter to expand/collapse a file's hunks)
- Scrollable — `↑↓` to navigate, `Page Up/Down` for fast scroll
- Bottom action bar: `u` undo to checkpoint, `c` commit changes, `s` stash
- Checkpoint history list (left side or top selector): navigate between checkpoints
- `git diff --stat` summary at the top
**Implementation:**
- `src/tui/git_view.rs` — new module for git tab rendering
- Parse `git diff` output into structured hunks (or use `src/git.rs` from Phase 6m)
- Syntax colouring: line-prefix-based (`+` = green, `-` = red, `@@` = cyan) — no `syntect` needed for diffs
- Scrollable viewport: ratatui's built-in scroll support
### 6p-iv. Config tab (profile/hooks/MCP management) - Done, needs edit file functionality directly
A read/edit view of the current configuration — eliminates the need to leave PareCode to edit `config.toml`.
```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─ Git ─┬─ 📊 Stats ─┐
│ │
│ Profile: local (active) │
│ ───────────────────────── │
│ endpoint: http://localhost:11434 │
│ model: qwen3:14b │
│ context_tokens: 32768 │
│ planner_model: — │
│ │
│ Hooks │
│ ───── │
│ on_edit: cargo check -q ✓ enabled │
│ on_task_done: cargo test -q ✓ enabled │
│ │
│ MCP Servers │
│ ─────────── │
│ brave: running (3 tools) │
│ fetch: running (1 tool) │
│ │
│ Conventions: .parecode/conventions.md (loaded) │
│ │
│ [p] Switch profile [e] Edit config [h] Toggle │
└───────────────────────────────────────────────────┘
```
**Features:**
- Shows all profile fields, hooks, MCP server status (running/stopped/error + tool count)
- `p` to switch profile (triggers the existing `/profile` logic)
- `h` to toggle hooks on/off (existing `/hooks on|off`)
- `e` to open config file in `$EDITOR` (shell out, return to TUI after)
- Conventions preview — first 10 lines of loaded conventions file
- Profile list on the left if multiple profiles exist — highlight active, arrow keys to browse
### 6p-v. Stats tab (telemetry dashboard) - Generally not bad - reactivity to current session could be better
The existing stats bar is great. This tab expands it into a full dashboard — the kind of thing you screenshot and share.
```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─ Git ─┬─ 📊 Stats ─┐
│ │
│ Session: 12 tasks · 4.2h · claude-sonnet │
│ │
│ Tokens ████████████░░░░ 74.2k (avg 6.2k) │
│ Tool calls ████████░░░░░░░░ 48 (avg 4/task) │
│ Compression ███░░░░░░░░░░░░░ 22% avg │
│ Budget hits █░░░░░░░░░░░░░░░ 3 enforcements │
│ │
│ Task breakdown: │
│ ───────────── │
│ #1 "add JWT auth" 12.4k tok 8 tools ✓ │
│ #2 "fix CSS header" 3.1k tok 3 tools ✓ │
│ #3 "rename columns" 1.8k tok 2 tools ✓ │
│ ... │
│ │
│ Est. cost this session: $0.12 │
│ vs estimated OpenCode equiv: ~$0.80 │
└───────────────────────────────────────────────────┘
```
**Features:**
- Bar charts using Unicode block characters (▏▎▍▌▋▊▉█) — no external charting needed
- Per-task breakdown with token count, tool calls, success/failure
- Running cost estimate (using profile's `cost_per_mtok` if configured)
- Comparative estimate ("vs OpenCode equivalent") — based on the 5-10x multiplier. This is the screenshot-worthy feature.
- Session totals and averages
- Export: `x` key to dump session stats to `.parecode/stats-export.json`
### 6p-vi. Plan tab (active plan viewer)
Only appears when a plan is active or was recently completed. Shows the full plan with live step status.
```
┌─ ⚒ Chat ─┬─ ⚙ Config ─┬─ Git ─┬─ 📋 Plan ──┐
│ │
│ Plan: add JWT authentication │
│ 4 steps · est. 12k–18k tokens · ~$0.004 │
│ │
│ ✓ Step 1: Add JWT dependency to Cargo.toml │
│ └─ modified: Cargo.toml [jsonwebtoken] │
│ └─ 2.1k tokens, 3 tool calls │
│ │
│ ⟳ Step 2: Implement token validation middleware │
│ └─ files: src/auth.rs, src/middleware.rs │
│ └─ running... 4.2k tokens so far │
│ │
│ ○ Step 3: Add auth routes │
│ ○ Step 4: Integration tests │
│ │
│ [a] Annotate step [p] Pause [Enter] View step │
└───────────────────────────────────────────────────┘
```
**Features:**
- Live-updating step status (✓ complete, ⟳ running, ○ pending, ✗ failed)
- Expand a step (Enter) to see its carry-forward summary, tool calls, files modified
- Annotations visible inline
- Running token count per step and cumulative
- Plan review mode accessible from this tab (before execution starts)
### 6p-vii. Visual polish (cross-cutting)
**Syntax highlighting in chat:**
- Code blocks in model responses get language-aware syntax colouring
- Use `syntect` crate (commonly paired with ratatui) or `tree-sitter-highlight`
- Fallback: backtick-delimited blocks get monospace styling without colour
**Markdown rendering in chat:**
- Bold, italic, headers, bullet lists rendered with proper ratatui `Style`
- Links shown as underlined + blue
- Tables rendered with box-drawing characters
- This alone makes the chat output dramatically more readable
**Responsive layout:**
- < 80 cols: compact mode — no sidebar, abbreviated status bar, single-line tabs
- 80–120 cols: standard mode — current layout + tabs
- > 120 cols: full mode — sidebar visible by default, expanded stats
**Theme support (config-driven):**
- `theme = "dark"` (default), `"light"`, `"monokai"`, `"solarized"`
- Defined as named colour palettes in config — simple to add community themes later
- `[theme.colors]` table in config for per-element customisation
### 6p-viii. Ratatui feasibility notes
All of this is achievable with ratatui's built-in widget set:
| Feature | Ratatui widget/approach |
|---|---|
| Tab bar | `Tabs` widget (built-in) |
| Sidebar | `Layout::Horizontal` split |
| Diff viewer | `Paragraph` with styled `Span`s per line |
| Bar charts | `Paragraph` with Unicode block chars, or `BarChart` widget |
| Scrollable lists | `List` with `ListState` scroll tracking |
| Collapsible sections | Custom `StatefulWidget` tracking expanded state |
| Syntax highlighting | `syntect` → `Style` mapping, or manual keyword colouring |
| Markdown rendering | Parse to `Vec<Line<'_>>` with styled `Span`s |
| Responsive layout | `Constraint::Percentage` + terminal size check |
The tab architecture requires restructuring `draw_ui()` in `render.rs` from a single monolithic function to a dispatcher: `match active_tab { Tab::Chat => draw_chat(f, area, state), Tab::Git => draw_git(f, area, state), ... }`. Each tab becomes its own render function in its own module under `src/tui/`.
**Proposed file structure:**
```
src/tui/
├── mod.rs # event loop, state, tab switching
├── render.rs # top-level draw dispatcher, tab bar, status bar
├── chat.rs # chat view (most of current render.rs moves here)
├── sidebar.rs # session sidebar
├── git_view.rs # git tab — diff viewer, checkpoint list
├── config_view.rs # config tab — profile/hooks/MCP display
├── stats_view.rs # stats tab — telemetry dashboard
├── plan_view.rs # plan tab — step list, live status
├── markdown.rs # markdown → ratatui Span/Line converter
└── theme.rs # colour palette definitions
```
**GIT WARNING**
Git integration complexity. 6m is marked ESSENTIAL and it is, but git is a minefield. Dirty working trees, detached HEAD, submodules, shallow clones, worktrees, repos with 100k+ files. The "works automatically if in a git repo, skips silently if not" design is correct, but the edge cases will take real-world testing to flush out. Keep the initial implementation conservative — checkpoint via commit on a temp branch is safer than stash (stash has more failure modes).
### Check in with token usage - we are aiming to lead the market in efficiency
System prompt size. You're now injecting: conventions, session context, step carry-forward summaries, git status, change-impact warnings, hook descriptions, and tool schemas. On a 32k local model, that preamble could consume 20-30% of the window before the user even types. You may need a preamble budget that mirrors the token budget — prioritise and compress injected context, not just conversation history.
---
## Version 1 — Publish, Validate, and Gate Phase 7
> **This is the quality gate.** Phase 7 does not start until every benchmark category below passes. The goal is publishable evidence that PareCode's efficiency claims are real, and a regression baseline that protects them going forward.
**Prerequisites before starting validation:**
- Phase 6b (distribution / cargo-dist) complete — test on a clean install, not a dev build
- Phase 6c (first-run wizard) complete — test the real new-user flow, not a hand-configured setup
- All 6a–6o (ideally some of the good parts of 6P) phases building and shipping in the release binary - COMPLETE
**Metrics to record for every test run** (telemetry captures most of this automatically in `.parecode/telemetry.jsonl`):
| Metric | How to get it |
|---|---|
| Input tokens | `-v` flag or telemetry stats bar |
| Output tokens | same |
| Tool calls | telemetry `tool_calls` field |
| Wall time | telemetry `duration_secs` |
| Re-reads | count `read_file` calls on already-seen paths |
| Loops | count repeated `(tool, args)` pairs |
| Success | did the task complete correctly with no user intervention? |
Save the telemetry snapshot after each run. These become the regression baseline — any Phase 7 change that regresses these numbers by >10% is a blocker.
---
### V1-A. Baseline: Qwen3 14B (Ollama, local)
> The hardest test. If PareCode guides a messy 14B model better than OpenCode, that's the headline claim validated.
**Setup:** `tsc --noEmit` hook auto-detected and active for TypeScript tasks. Run the same tasks in OpenCode first and record its numbers — the diff is the publishable story.
| Task | OpenCode result (record before testing PareCode) | PareCode target |
|---|---|---|
| Replace all instances of a term project-wide | Loops, re-reads, often fails | ≤ 4 tool calls, 0 re-reads, correct |
| Update HTML + SCSS: change colours, improve styling | Loses context mid-task, wrong file edits | Completes in ≤ 6 tool calls, hook catches TSC errors |
| Angular: migrate `input` binding to `@input()` decorator | Classic OpenCode death — loops on search | ≤ 5 tool calls, uses search to verify no instances remain |
For each task record the full metric set above. The `tsc --noEmit` hook injection is the key thing to observe — does the model read the error output and self-correct in the same loop without a re-read?
---
### V1-B. Hooks self-correction validation (Claude Sonnet)
> This is the money shot for the hooks system. A capable model that reads `⚙ cargo check -q (exit 1): error[E0308]…` and self-corrects in the same tool loop — no extra read_file round-trip — is the proof that on_edit injection works as designed.
**Setup:** Claude Sonnet profile with `cargo check -q` hook (PareCode Rust codebase, or any real Rust project).
| Test | What to observe |
|---|---|
| Make a deliberate type error, ask PareCode to add a function | Does Claude see the hook output and fix the error without re-reading? |
| Multi-step plan on a real feature | Do all steps pass verification? Do step carry-forward summaries give Claude correct context? |
| Edit a file that has shifted since last read | Does the hash anchor mismatch fire? Does Claude re-read and retry correctly? |
| Compare token count: PareCode+Claude vs Claude Code on same task | Record both. This is the efficiency headline. |
Hash-anchored edits (Phase 6g) are specifically worth testing here — Claude will actually use the optional `anchor` parameter, Qwen 14B likely ignores it.
---
### V1-C. Cloud mid-range: Qwen3-Coder 72B (OpenRouter)
> The realistic ceiling for users who want local-model quality without Anthropic pricing. If PareCode makes 72B usable for complex multi-file tasks, that's a strong story for the cost-conscious segment.
**Setup:** OpenRouter profile. Tests validate that lean schemas and context management work across provider backends — OpenRouter wraps the API differently from Ollama.
| Test | Target |
|---|---|
| Same Angular migration task as V1-A | Compare tool call count and success rate vs Qwen3 14B. Expect meaningful improvement. |
| Multi-file refactor (rename a type used across 5+ files) | Should complete with plan mode. Record step count and carry-forward summary accuracy. |
| Schema compatibility | Confirm all tools dispatch correctly — OpenRouter backends sometimes reject strict schemas |
---
### V1-D. MCP integration (Claude Sonnet + web search)
> MCP is not validated by unit tests. The interesting failure mode is the model hitting a knowledge boundary mid-task and either not reaching for web search, or using it incorrectly. This must work cleanly before Phase 7 adds more complexity on top.
**Setup:** Claude Sonnet profile with `brave` or `fetch` MCP server configured.
| Test | What to validate |
|---|---|
| "Update this library to use the v4 API" (where v4 released after training cutoff) | Does Claude autonomously call web search? Does it use the result to inform the edit? |
| Multi-step plan where one step requires fetching a doc | Does MCP dispatch work correctly inside plan step context isolation? |
| Two MCP servers active simultaneously | No cross-contamination, both tools visible in tool list |
| MCP server that fails to start | Silently skipped, rest of session unaffected |
The key signal: web search should feel like a natural tool call, not a special case. If the model hesitates or fails to use it when it clearly should, that's a system prompt or tool schema issue to fix before Phase 7.
---
### V1-E. Regression baseline
After V1-A through V1-D pass:
1. **Save telemetry snapshots** — copy `.parecode/telemetry.jsonl` to `benchmarks/v1-baseline-{model}.jsonl` for each model tested
2. **Document the passing task set** — these become the fixed regression suite; any future change that causes a previously-passing task to fail or regress by >10% in tokens/tool-calls is a blocker before merge
3. **Publish results** — the token efficiency comparison (PareCode vs OpenCode on the same tasks) is the viral moment. Even a blog post or README table is enough for early traction.
**Phase 7 is gated on:** all four test categories above showing clean results, regression baseline saved, and at least the Qwen3 14B + Claude Sonnet comparisons documented.
---
## Phase 7 — Advanced Orchestration
### 7a. Automatic model routing by category
Extend `planner_model` into a full `model_routes` table. Tasks and plan steps declare a category; the harness picks the right model automatically.
**Categories:**
| Category | Profile model example | When used |
|---|---|---|
| `deep` | `claude-opus-4-6` | Complex multi-file refactors, architecture decisions |
| `standard` | `claude-sonnet-4-6` | Default — most coding tasks |
| `quick` | `claude-haiku-4-5-20251001` | Single-file edits, quick queries |
| `search` | cheapest available | Web search, grep, read-only research |
**Config:**
```toml
[profiles.claude.model_routes]
deep = "claude-opus-4-6"
standard = "claude-sonnet-4-6"
quick = "claude-haiku-4-5-20251001"
search = "claude-haiku-4-5-20251001"
```
**Integration with plan steps:**
- Plan generation adds a `category` field to each step based on instruction complexity
- Agent loop selects model per step rather than once per session
- Quick mode auto-routes to `quick` category
### 7b. Background parallel plan steps
Execute independent plan steps concurrently. Sequential by default; parallel only when steps have no file overlap.
**Dependency analysis (static, no model call):**
- Build a directed graph: step A → step B if B lists a file that A modifies
- Steps with no shared files and no dependency edge → eligible for parallel execution
- Max concurrency: configurable `parallel_steps = 3` in config (default: 1 = sequential)
**Execution:**
- `tokio::spawn` per eligible step group
- Each step gets its own `McpClient` scope (MCP connections not shared across parallel steps)
- Results collected in order; step summaries merged before next sequential step
- TUI shows parallel steps as a grouped block with individual ✓/✗ per step
**Constraints:**
- Steps that call `bash` with side effects are always sequential (conservative)
- File write conflicts → pause, surface to user for resolution
- Requires 7a (model routing) to be useful — parallel steps should use `quick`/`search` routes
### 7c. MCP skill scoping
Scope MCP servers to specific plan step categories or task keywords rather than loading all servers globally.
**Config:**
```toml
[[profiles.local.mcp_servers]]
name = "playwright"
command = ["npx", "-y", "@playwright/mcp"]
scope = ["visual", "frontend", "test-e2e"] # only loaded for these categories
```
**Behaviour:**
- At plan step start: check step category against each server's `scope`
- Only matching servers included in tool list for that step
- Reduces tool list size by 60-80% for non-matching steps — keeps model focused
### 7d. Image/multimodal support
**Increasingly table-stakes.** "Fix this CSS — here's a screenshot" is a real workflow. Not critical for V1, but competitors are adding it and user expectations are shifting. Multimodal input turns PareCode from a text-only coding agent into a visual-aware development partner.
**Core capabilities:**
**7d-i. Image input in TUI:**
- Drag-and-drop or paste image into the TUI input (terminal image protocols: iTerm2 inline images, Kitty graphics protocol, Sixel)
- `@screenshot.png` file attachment — same `@` picker as text files, but detected as image by extension
- `/screenshot` command — capture the current terminal or a region and attach automatically
- Images encoded as base64 and sent via the `image_url` content block in the OpenAI-compatible API (supported by Claude, GPT-4o, Gemini, and increasingly by local multimodal models)
**7d-ii. Use cases:**
| Scenario | Value |
|---|---|
| "Fix this CSS — here's what it looks like" | Visual debugging without describing layout issues in words |
| "Implement this design" (attach mockup) | Design-to-code from a screenshot or Figma export |
| "What's wrong with this error?" (attach terminal screenshot) | Non-text error formats (stack traces with colour, GUI error dialogs) |
| "Match the style of this component" (attach reference) | Visual consistency without manual style description |
**7d-iii. Implementation:**
- `src/client.rs` — extend `MessageContent` to support `image_url` content blocks alongside text
- `src/tui/mod.rs` — image attachment via `@` picker (filter by image extensions: png, jpg, jpeg, gif, webp, svg), base64 encoding on attach
- `src/agent.rs` — pass image content blocks through to API call, strip images from context on budget compression (images are expensive — ~1k tokens per image, and stale images should be evicted first)
- `src/budget.rs` — images get a higher compression priority (evict old images before old text)
- Fallback: if the model/endpoint doesn't support vision, return a clear error: `"This model does not support image input. Switch to a vision-capable model (Claude Sonnet, GPT-4o, etc.)"`
**7d-iv. Model compatibility:**
| Model | Vision support |
|---|---|
| Claude Sonnet/Opus | ✓ |
| GPT-4o | ✓ |
| Gemini Pro/Flash | ✓ |
| Qwen-VL (local) | ✓ (Ollama) |
| Qwen3 14B (text-only) | ✗ — clear error message |
| Most local coding models | ✗ — clear error message |
---
## File Structure (target)
```
src/
├── main.rs # clap CLI, single-shot + TUI dispatch
├── client.rs # HTTP client, SSE streaming, tool call parsing
├── agent.rs # agent loop, project map, conventions loading, build check
├── history.rs # tool output compression (model vs display summaries)
├── cache.rs # file read cache + re-read prevention
├── budget.rs # proactive token budget, loop detection
├── sessions.rs # session persistence, JSONL, context injection (8k cap)
├── ui.rs # tool glyphs
├── config.rs # profile system, config file load/write
├── mcp.rs # MCP client — spawn servers, JSON-RPC, tool discovery + dispatch
├── index.rs # project symbol index — fn/struct/class/impl → file path, used by plan gen
├── telemetry.rs # SessionStats, TaskRecord, JSONL persistence
├── plan.rs # plan data structure, step execution, step summaries
├── git.rs # git integration — checkpoint, undo, diff, blame, co-change analysis
├── tools/
│ ├── mod.rs # tool registry + dispatch
│ ├── read.rs # read_file with smart excerpting + symbols=true index
│ ├── write.rs # write_file (overwrite guard)
│ ├── edit.rs # edit_file (fuzzy matching, ±15 line failure hint)
│ ├── bash.rs # bash execution (async, timeout, 200-line cap)
│ ├── recall.rs # retrieve full stored output by id or tool name
│ ├── patch.rs # patch_file — unified diff application, fuzzy context matching
│ ├── search.rs # ripgrep wrapper (zero-match → declare done)
│ └── list.rs # list_files
└── tui/
├── mod.rs # event loop, state, tab switching, input handling
├── render.rs # top-level draw dispatcher, tab bar, status bar
├── chat.rs # chat view — conversation history, streaming output
├── sidebar.rs # session sidebar — grouped by date, resume on select
├── git_view.rs # git tab — syntax-highlighted diff viewer, checkpoint list
├── config_view.rs # config tab — profile/hooks/MCP status display
├── stats_view.rs # stats tab — telemetry dashboard, bar charts, cost tracking
├── plan_view.rs # plan tab — step list, live status, carry-forward summaries
├── markdown.rs # markdown → ratatui Span/Line converter
└── theme.rs # colour palette definitions, theme switching
```