nitpicker 0.5.1

# nitpicker

Multi-reviewer code review using LLMs. Spawns parallel agents with different models/prompts and aggregates their feedback into a final verdict.

## Contributor memo

- Before opening a PR, update `README.md` and `CLAUDE.md` for any user-facing or architecture-relevant changes.
- If you bump the version, add a short summary entry to the changelog in `README.md`.

## Quick start

```bash
# Review current PR/diff (debate by default)
cargo run -- --repo .

# Use parallel aggregation instead of debate
cargo run -- --repo . --no-debate
cargo run -- --repo . --no-debate --max-turns 40

# Static analysis of existing code
cargo run -- --repo . --analyze
cargo run -- --repo . --analyze src/db/

# Custom focus
cargo run -- --repo . --prompt "focus on SQL injection"

# Ask a free-form question (debate by default)
cargo run -- ask "should we use eyre or thiserror?"

# Ask with parallel aggregation instead
cargo run -- ask --no-debate "should we use eyre or thiserror?"

# Review current branch's open PR and post result as a comment (requires gh CLI)
cargo run -- pr

# Review a remote PR by URL
cargo run -- pr https://github.com/owner/repo/pull/42

# Reflect on saved sessions
cargo run -- reflect
cargo run -- reflect --n 10

# Gemini OAuth (first-time setup)
cargo run -- --gemini-oauth

# Generate config preferring OpenRouter experimental free models
cargo run -- init --free

# Alloy mode: pool all reviewer models into one shared random-selection client
cargo run -- --alloy
cargo run -- ask --alloy "should we use eyre or thiserror?"
```

## Architecture

```
main.rs         CLI, config loading, wires everything together
config.rs       TOML config deserialization (Config, ReviewerConfig, AggregatorConfig)
review.rs       orchestrates parallel reviewers → aggregation
debate.rs       sequential actor/critic debate loop → meta-review
agent.rs        agentic tool-use loop for a single reviewer
llm.rs          LLM client trait, per-provider impls, retry wrapper
tools.rs        tool definitions: read_file, glob, grep, git
pr.rs           GitHub PR subcommand: fetch metadata via gh, review, post comment
reflect.rs      Reflect subcommand: analyze saved session trajectories and synthesize improvements
gemini_proxy/   local HTTP proxy that translates Gemini API calls to Google Code Assist
```

### Review flow

1. `review.rs` spawns one `tokio::task` per `[[reviewer]]` in config
2. Each task runs `agent.rs::run_agent`, an agentic loop: call LLM → execute tool calls → feed results back → repeat until the model returns text (default max 100 turns, overridable via config/CLI)
3. All reviewer outputs are collected, concatenated, and sent to the aggregator model in a single completion call
4. The aggregator's response is printed to stdout

### Debate flow (default review mode and `ask`)

1. `reviewer[0]` = Actor/Reviewer, `reviewer[1]` = Critic/Validator, `aggregator` = Meta-reviewer
2. Each round: Actor turn → Critic turn. Both have access to all file/git tools plus `submit_verdict(verdict, agree)`
3. `agree=true` from Critic → convergence, loop ends early
4. After all rounds: meta-reviewer synthesizes the full dialogue in a single non-agentic completion
5. Default stdout shows only the final synthesized result; `--verbose` also prints the intermediate debate text and transcript path
6. Transcript saved to the OS temp dir as `debate-{ts}.md` (topic) or `review-debate-{ts}.md` (code review)
7. `DebateMode::Topic` (from `ask`) uses Actor/Critic roles and general debate prompts
8. `DebateMode::Review` (from default review mode) uses Reviewer/Validator roles and code-review-focused prompts
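
The round structure and early convergence described above can be sketched as follows. This is an assumption-laden simplification: the `Verdict` struct and the closure-based `actor`/`critic` parameters are hypothetical, and the real loop in `debate.rs` drives LLM agents with tools rather than plain closures.

```rust
/// What a participant hands back through `submit_verdict(verdict, agree)`.
struct Verdict {
    text: String,
    agree: bool,
}

/// Run up to `max_rounds` Actor → Critic rounds. Returns the transcript and
/// whether the Critic agreed before the rounds ran out.
fn run_debate<A, C>(max_rounds: usize, mut actor: A, mut critic: C) -> (Vec<String>, bool)
where
    A: FnMut(&[String]) -> Verdict,
    C: FnMut(&[String]) -> Verdict,
{
    let mut transcript = Vec::new();
    for round in 1..=max_rounds {
        // Actor turn: sees everything said so far.
        let actor_turn = actor(&transcript);
        transcript.push(format!("round {round} actor: {}", actor_turn.text));
        // Critic turn: sees the Actor's latest statement as well.
        let critic_turn = critic(&transcript);
        transcript.push(format!("round {round} critic: {}", critic_turn.text));
        // Only the Critic's `agree` flag ends the loop early.
        if critic_turn.agree {
            return (transcript, true);
        }
    }
    (transcript, false)
}
```

The returned transcript is what a meta-reviewer would then synthesize in one non-agentic completion.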

**Alloy mode** (`--alloy` / `defaults.alloy = true`): instead of pinning actor and critic to `reviewer[0]`/`reviewer[1]`, builds an `AlloyClient` that randomly selects from all configured reviewer models each turn. Requires ≥ 2 reviewers.
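
A rough sketch of per-turn random selection with the two-reviewer minimum is shown below. `AlloyPool` is a hypothetical name, and the xorshift generator is only there to keep the example dependency-free; the actual `AlloyClient` wraps LLM clients and may well use a proper RNG crate.

```rust
/// Hypothetical pool of model names that picks one slot per call.
struct AlloyPool {
    models: Vec<String>,
    state: u64, // xorshift64 state; must be non-zero
}

impl AlloyPool {
    /// Alloy mode needs at least two reviewers, so smaller pools are rejected.
    fn new(models: Vec<String>, seed: u64) -> Result<Self, String> {
        if models.len() < 2 {
            return Err(format!(
                "alloy mode requires >= 2 reviewers, got {}",
                models.len()
            ));
        }
        Ok(Self { models, state: seed.max(1) })
    }

    /// Advance the xorshift64 generator and pick a model for this turn.
    fn pick(&mut self) -> &str {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        let index = (self.state % self.models.len() as u64) as usize;
        &self.models[index]
    }
}
```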

### Agent execution (`agent.rs`)

- Each reviewer runs an agentic loop with file/git tools until it returns text or reaches the turn limit
- Review prompts encourage a quick local map, a short working plan, and an early wave of subagents for bounded disjoint investigations
- Reviewers can delegate deeper investigations via `spawn_subagent`
- Subagent depth is capped at 2 to bound recursion and cost
- Subagents return results through a hidden `finish(result)` tool; debate agents use `submit_verdict(verdict, agree)` instead
- Repetitive tool-call cycles are blocked, and the agent can force a context reset to break out of loops
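
One way to enforce the subagent depth cap is to carry the current depth in the agent context and refuse to spawn past the limit. The sketch below assumes the root reviewer sits at depth 0 and that the refusal is reported as an `"Error: ..."` string; `AgentCtx` and its methods are hypothetical names.

```rust
const MAX_SUBAGENT_DEPTH: usize = 2;

/// Hypothetical agent context tracking how deep in the delegation tree we are.
struct AgentCtx {
    depth: usize,
}

impl AgentCtx {
    /// The top-level reviewer starts at depth 0 (an assumption of this sketch).
    fn root() -> Self {
        Self { depth: 0 }
    }

    /// Create a child context one level deeper, or refuse once the cap is hit.
    fn spawn_subagent(&self) -> Result<AgentCtx, String> {
        if self.depth >= MAX_SUBAGENT_DEPTH {
            return Err(format!(
                "Error: subagent depth limit ({}) reached",
                MAX_SUBAGENT_DEPTH
            ));
        }
        Ok(AgentCtx { depth: self.depth + 1 })
    }
}
```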

### PR flow (`pr.rs`)

1. `check_gh()` verifies the `gh` CLI is available
2. `PrFlow` enum picks the path: `CurrentBranch` (no URL), `InPlace` (URL + origin matches + no `--clone`), or `TempClone`. `PrLock` is acquired BEFORE any git mutation for the first two; `TempClone` is lock-free (unique temp dir per process). Lock-holder liveness is checked with `libc::kill(pid, 0)`.
3. In-place: refresh remote-tracking branches, skip checkout if `HEAD == headRefOid`, otherwise require a clean working tree and `git switch -c` to a namespaced `nitpicker/pr-N` from `FETCH_HEAD`. Restored on exit via `git switch --`.
4. Temp clone: `git clone --filter=blob:none` (partial clone, so merge-base is reachable) then fetch + switch to the PR head; `TempDir` drops at the end.
5. `fetch_pr_meta` retrieves title, body, and `headRefOid` via `gh pr view --json`; `fetch_pr_comments` pulls issue-level comments separately.
6. `build_pr_prompt` assembles the review prompt from PR title + body + PR comments + diff context + optional `--prompt`.
7. Review runs via `debate::run_debate` by default, or `review::run_review` with `--no-debate`. Unless `--no-comment`, result is posted back via `gh pr comment`.
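
The path selection in step 2 reduces to a small decision function. The sketch below mirrors the three variants named above; the `select` and `needs_lock` helpers and their parameter names are hypothetical, not the signatures in `pr.rs`.

```rust
/// The three checkout strategies described above.
#[derive(Debug, PartialEq)]
enum PrFlow {
    CurrentBranch,
    InPlace,
    TempClone,
}

impl PrFlow {
    /// Pick a flow from the CLI inputs: no URL means the current branch,
    /// a URL matching `origin` without `--clone` is handled in place,
    /// and everything else goes through a temporary clone.
    fn select(url: Option<&str>, origin_matches: bool, force_clone: bool) -> Self {
        match url {
            None => PrFlow::CurrentBranch,
            Some(_) if origin_matches && !force_clone => PrFlow::InPlace,
            Some(_) => PrFlow::TempClone,
        }
    }

    /// Flows that touch the user's working copy take the lock first.
    fn needs_lock(&self) -> bool {
        match self {
            PrFlow::CurrentBranch | PrFlow::InPlace => true,
            PrFlow::TempClone => false,
        }
    }
}
```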

### Reflect flow (`reflect.rs`)

1. Load recent session directories from `~/.nitpicker/sessions` or explicit `--session` paths
2. Parse per-agent JSONL tool traces and `aggregation.json` into typed session records
3. Format each session into a compact markdown summary of agents, tool activity, and final verdict
4. Run one analysis task per session using the first reviewer model
5. Synthesize the per-session analyses into a final report using the second reviewer model when available, otherwise reuse the first
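
The model choice in steps 4 and 5 amounts to a fallback rule, sketched here with a hypothetical `reflect_models` helper that works on plain model names rather than configured clients.

```rust
/// Pick (analysis_model, synthesis_model) from the configured reviewer models.
/// The second reviewer synthesizes when present; otherwise the first is reused.
fn reflect_models(reviewers: &[String]) -> Option<(&str, &str)> {
    let first = reviewers.first()?;
    let second = reviewers.get(1).unwrap_or(first);
    Some((first.as_str(), second.as_str()))
}
```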

### LLM abstraction (`llm.rs`)

- `LLMClient` trait: one method, `completion(Completion) -> Result<CompletionResponse>`
- Per-provider impls: `anthropic::Client`, `gemini::Client`, `openai::CompletionsClient`
- `AlloyClient` wraps a vec of `(Arc<dyn LLMClientDyn>, model_name)` slots and picks one at random per call (XBOW Alloy technique)
- `RetryingLLM<C>` wraps any client with jittered exponential backoff (4 attempts, 250ms–5s). Skips retry on 4xx errors.
- Always wrap clients with `.with_retry()` — the OAuth Gemini path is no exception
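
The retry policy above (4 attempts, delays between 250ms and 5s, no retry on 4xx) could look roughly like this. It is a synchronous sketch under stated assumptions: `LlmError`, `backoff_delay`, and `with_retry` are hypothetical names, the exact jitter formula is a guess, and the jitter source and sleep function are injected so the example stays dependency-free.

```rust
use std::time::Duration;

const MAX_ATTEMPTS: u32 = 4;
const BASE_DELAY: Duration = Duration::from_millis(250);
const MAX_DELAY: Duration = Duration::from_secs(5);

/// Hypothetical error type carrying an optional HTTP status.
#[derive(Debug)]
struct LlmError {
    status: Option<u16>,
}

/// 4xx responses are treated as permanent; everything else may be retried.
fn is_retryable(err: &LlmError) -> bool {
    !matches!(err.status, Some(400..=499))
}

/// Delay before retry `attempt` (0-based): the base doubles each attempt,
/// is stretched by `jitter` in [0, 1], and is capped at MAX_DELAY.
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    let exponential = BASE_DELAY.saturating_mul(1u32 << attempt.min(16));
    exponential
        .mul_f64(1.0 + jitter.clamp(0.0, 1.0))
        .min(MAX_DELAY)
}

/// Call `call` until it succeeds, hits a non-retryable error, or runs out of attempts.
fn with_retry<T>(
    mut call: impl FnMut() -> Result<T, LlmError>,
    mut jitter: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
) -> Result<T, LlmError> {
    let mut attempt: u32 = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= MAX_ATTEMPTS || !is_retryable(&err) => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, jitter()));
                attempt += 1;
            }
        }
    }
}
```

In real use, `sleep` would be `std::thread::sleep` (or an async timer) and `jitter` would draw from a random number generator.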

### Tools (`tools.rs`)

Tools return `String` rather than `Err`: failures come back as `"Error: ..."` strings so the LLM can self-correct. The only exception is truly unrecoverable errors (e.g. a missing required argument), which are surfaced as `Err`.

`GitTool` only allows a fixed allowlist of read-only subcommands. Commands are passed directly to `Command::new("git").args(tokens)` — no shell involved.
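
Both conventions (errors as strings, and an allowlist checked before `git` is executed without a shell) can be sketched together. The allowlist contents below are illustrative only, since the real set is not listed here, and `git_tool` is a hypothetical free function rather than the actual `GitTool` API.

```rust
use std::process::Command;

/// Illustrative allowlist of read-only subcommands (an assumption of this sketch).
const ALLOWED_SUBCOMMANDS: &[&str] = &["blame", "diff", "log", "show", "status"];

/// Run an allowlisted git subcommand in `repo`.
/// Recoverable failures become `Ok("Error: ...")` so the model can adjust;
/// only a missing required argument is reported as `Err`.
fn git_tool(repo: &str, tokens: &[&str]) -> Result<String, String> {
    let subcommand = match tokens.first() {
        Some(sub) => *sub,
        None => return Err("missing required argument: git subcommand".to_string()),
    };
    if !ALLOWED_SUBCOMMANDS.contains(&subcommand) {
        return Ok(format!("Error: git subcommand '{subcommand}' is not allowed"));
    }
    // Tokens go straight to the process as argv entries; no shell parses them.
    let text = match Command::new("git").current_dir(repo).args(tokens).output() {
        Ok(out) if out.status.success() => String::from_utf8_lossy(&out.stdout).into_owned(),
        Ok(out) => format!("Error: {}", String::from_utf8_lossy(&out.stderr).trim()),
        Err(err) => format!("Error: failed to run git: {err}"),
    };
    Ok(text)
}
```

Because arguments are never interpolated into a shell string, input such as `"log; rm -rf /"` is just an unknown subcommand and is rejected by the allowlist.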

`GrepTool` recursively searches files and skips binary files. Context loading for `CLAUDE.md` / `AGENTS.md` also skips binary files.

Tool outputs are intentionally self-describing: `read_file` includes file/range headers, `glob`/`grep` return explicit no-match messages, and truncation messages say when output is partial.

### Session artifacts (`session.rs`)

- When `[defaults].log_trajectories = true`, nitpicker writes session artifacts under `~/.nitpicker/sessions/session-<timestamp>-<pid>/`
- Reviewer and debate-turn traces are stored as per-agent JSONL files
- Final synthesized output is saved as `aggregation.json`

### Gemini AG2 proxy (`gemini_proxy/`)

When `auth = "agy-keyring"` is set for a Gemini reviewer/aggregator, nitpicker:
1. Runs a local axum HTTP server on a random port
2. Translates incoming Gemini API requests to Google Code Assist API format
3. Attaches the Antigravity OAuth Bearer token read from the system keyring
4. Sends chat through `v1internal:streamGenerateContent?alt=sse` and folds SSE chunks back into Gemini-style JSON
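
Step 4 starts by splitting the SSE stream into per-event payloads. The sketch below covers only that part, following the standard SSE framing (`data:` lines, blank line ends an event); `sse_data_payloads` is a hypothetical helper, and merging the JSON chunks into a Gemini-style response is left out.

```rust
/// Collect the `data:` payload of each event in an SSE body.
/// Multiple `data:` lines within one event are joined with newlines.
fn sse_data_payloads(body: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in body.lines() {
        if let Some(rest) = line.strip_prefix("data:") {
            // A single leading space after the colon is not part of the payload.
            current.push(rest.strip_prefix(' ').unwrap_or(rest));
        } else if line.is_empty() && !current.is_empty() {
            // A blank line terminates the current event.
            events.push(current.join("\n"));
            current.clear();
        }
        // Other lines (comments, `event:`, `id:`) are ignored in this sketch.
    }
    // Flush a trailing event that was not followed by a blank line.
    if !current.is_empty() {
        events.push(current.join("\n"));
    }
    events
}
```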

The token is read via the `keyring` crate (Secret Service on Linux, Keychain on macOS, Credential Manager on Windows) at `service=gemini`, `account=antigravity`, decoding the optional `go-keyring-base64:` wrapper. Refresh is delegated to `agy` — if the token is expired the proxy bails with "run `agy` to refresh it". `fetchAvailableModels` is called on proxy startup to discover available model IDs; tested AG2 models are `gemini-3.1-pro-low` and `gemini-3.5-flash-low` (others like `gemini-3-flash-agent` should work but are untested).

This auth path is explicitly disallowed by AG2 ToS Section 6 ("using the Service in connection with products not provided by us") and Google is actively suspending paid accounts for third-party OAuth bridges — keep it framed as research only in any user-facing copy.

The legacy `auth = "oauth"` (browser PKCE flow with file-backed token storage) was removed in 0.5.0 — the proxy was retargeted at AG2 endpoints whose matching client_secret is not public, so the flow could not complete. The config validator now rejects `auth = "oauth"` with a migration hint to `agy-keyring` or `GEMINI_API_KEY`.

## Configuration

Config hierarchy (first wins):
1. `--config <path>` (explicit)
2. `nitpicker.toml` in repo root
3. `~/.nitpicker/config.toml` (global)
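
The first-wins lookup can be sketched as below. `resolve_config` is a hypothetical helper; the `exists` predicate is injected so the example does not touch the filesystem, and returning an explicit `--config` path without checking it is an assumption of this sketch.

```rust
use std::path::{Path, PathBuf};

/// Return the first config path that applies, in priority order.
fn resolve_config(
    explicit: Option<&Path>,
    repo_root: &Path,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1. An explicit --config path always wins.
    if let Some(path) = explicit {
        return Some(path.to_path_buf());
    }
    // 2. Repo-local config, then 3. the global config.
    let candidates = vec![
        repo_root.join("nitpicker.toml"),
        home.join(".nitpicker").join("config.toml"),
    ];
    candidates.into_iter().find(|path| exists(path))
}
```

In production code the predicate would simply be `|p| p.is_file()`.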

Reviewers automatically load project context from `CLAUDE.md` or `AGENTS.md` if present in the repo root.

`nitpicker init --free` prefers OpenRouter in the generated config and writes `model = "free"` for OpenRouter slots when `OPENROUTER_API_KEY` is set. When the generated config uses two reviewer slots, it emits two OpenRouter free reviewers so both slots get free-model auto-selection. If the key is missing, init warns and falls back to the normal provider order.

## Adding a new provider

1. Add a variant to `ProviderType` in `config.rs` with a `#[serde(rename = "...")]`
2. Add a new arm to `provider_from_config` in `review.rs`
3. Add a new variant to `LLMProvider` in `llm.rs` and implement `client_from_env`
4. Implement `LLMClient` for the provider's client type
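
The shape of those four steps is illustrated below with a made-up `Echo` provider. Everything here is simplified for illustration: the trait is synchronous and string-based rather than `completion(Completion) -> Result<CompletionResponse>`, the config names are assumed, a plain string match stands in for the serde rename, and `client_for` is a hypothetical stand-in for `provider_from_config` / `client_from_env`.

```rust
/// Simplified, synchronous stand-in for the `LLMClient` trait.
trait LLMClient {
    fn completion(&self, prompt: &str) -> Result<String, String>;
}

/// Step 1: a new variant (`Echo`) next to the existing providers.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderType {
    Anthropic,
    Gemini,
    Echo,
}

impl ProviderType {
    /// Stand-in for `#[serde(rename = "...")]`: map a config string to a variant.
    fn from_config_name(name: &str) -> Option<Self> {
        match name {
            "anthropic" => Some(Self::Anthropic),
            "gemini" => Some(Self::Gemini),
            "echo" => Some(Self::Echo),
            _ => None,
        }
    }
}

/// Step 4: the provider's client type implements the trait.
struct EchoClient;

impl LLMClient for EchoClient {
    fn completion(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Steps 2 and 3: one new match arm constructs the client for the new variant.
fn client_for(provider: ProviderType) -> Result<Box<dyn LLMClient>, String> {
    match provider {
        ProviderType::Echo => Ok(Box::new(EchoClient)),
        other => Err(format!("{other:?} is not wired up in this sketch")),
    }
}
```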

## Key constraints

- Reviewers run concurrently — reviewer code must be `Send + Sync`
- Parallel review execution is capped at 8 concurrent reviewers
- Tool results are truncated to 50k bytes before being sent to the LLM
- Git tool output is truncated to 50k chars
- Agent and debate turn loops default to 100 turns and can be overridden via config or CLI
- Context files (`CLAUDE.md`, `AGENTS.md`) are limited to 50k chars
- Prefer `match` over `if let` for better exhaustiveness checking, even if it requires a `_ => unreachable!()` arm
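
As an illustration of the byte-based truncation limit on tool results, the sketch below cuts on a UTF-8 character boundary and appends a note that the output is partial. The helper names, the note wording, and reading "50k" as 50,000 bytes are assumptions.

```rust
const MAX_TOOL_OUTPUT_BYTES: usize = 50_000;

/// Truncate `output` to at most `limit` bytes without splitting a UTF-8
/// character, and append a note so the model knows the result is partial.
fn truncate_output(output: &str, limit: usize) -> String {
    if output.len() <= limit {
        return output.to_string();
    }
    // Walk back to the nearest character boundary at or below the limit.
    let mut end = limit;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!(
        "{}\n[output truncated: showing {} of {} bytes]",
        &output[..end],
        end,
        output.len()
    )
}

/// Apply the default tool-result limit.
fn truncate_tool_output(output: &str) -> String {
    truncate_output(output, MAX_TOOL_OUTPUT_BYTES)
}
```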