oven-cli 0.9.0

CLI that runs Claude Code agent pipelines against GitHub issues
# Oven CLI

Oven (`oven-cli` on crates.io, `oven` binary) is a Rust CLI that orchestrates Claude Code agent pipelines against GitHub issues. The kitchen theme runs throughout. Inspired by [ocak](~/dev/ocak), rewritten with a cleaner design.

## What this project does

Users label GitHub issues (or local issues) with `o-ready`. Oven picks them up (oldest first), plans them as a dependency graph (DAG), creates draft PRs, and runs a pipeline of Claude Code agents against each PR: implement -> review -> fix (up to 2 cycles) -> merge. All agent comments go on the PR, not the issue. The runner continuously polls for new issues and can parallelize independent work mid-run.

Issue source is configurable: GitHub (default) or local `.oven/issues/` files. PRs are always created on GitHub regardless of issue source. Local issue PRs say "From local issue #N" instead of "Resolves #N".

## Architecture

### CLI commands (clap)
- `oven prep` - scaffold project (recipe.toml, .claude/agents/, .oven/)
- `oven on [IDS]` - start pipeline. IDS are comma-separated issue numbers. Flags: `-d` (detached), `-m` (auto-merge), `--trust` (skip author validation). Prints a run ID (8 hex chars from uuid).
- `oven off` - kill detached process (reads .oven/oven.pid)
- `oven look [RUN_ID]` - view logs. Tails if active, dumps if done. `--agent <NAME>` filters. `--stream` shows agent progress/status from DB.
- `oven report [RUN_ID]` - cost, runtime, summary. `--all` for history, `--json` for machine output, `--graph` for dependency graph.
- `oven clean` - remove worktrees, logs, merged branches. `--only-logs`, `--only-trees`, `--only-branches`.
- `oven ticket create|list|view|close|label|edit` - local issue management in .oven/issues/

### Agents (4, all invoked via `claude -p --output-format stream-json`)
1. **Planner** - read-only. Analyzes issues and produces a dependency graph (DAG) with `nodes` and `depends_on` edges. Each node has issue metadata, complexity, and predicted files. Also accepts `graph_context` about in-flight issues for incremental re-planning. Legacy `batches` format auto-converts to DAG.
2. **Implementer** - full access. Writes code + tests in a worktree. PR is marked ready-for-review and description is filled after implementation. Also used for agent-assisted rebase conflict resolution.
3. **Reviewer** - read-only. Covers code quality, security, and simplification in one pass. Outputs structured findings (critical/warning/info). Receives prior disputed/unresolved findings to avoid re-raising them. Cycle 2+ reviews are scoped to fixer changes only.
4. **Fixer** - full access. Addresses critical + warning findings from reviewer. Outputs structured JSON with resolved/disputed findings and a summary. Pipeline is resilient to fixer failures (silent commits, no output).
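
The legacy conversion mentioned for the planner could look roughly like the sketch below. The text does not say how `batches` map to edges, so this assumes each issue in a batch depends on every issue in the previous batch. `PlanNode` and `batches_to_dag` are hypothetical names for illustration, not Oven's actual types.

```rust
/// Hypothetical planner node: an issue number plus the issues it depends on.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlanNode {
    pub issue: u32,
    pub depends_on: Vec<u32>,
}

/// Convert ordered batches into DAG nodes.
/// Assumption: every issue in batch N depends on every issue in batch N-1.
pub fn batches_to_dag(batches: &[Vec<u32>]) -> Vec<PlanNode> {
    let mut nodes = Vec::new();
    // The first batch has no predecessors, so it starts with no dependencies.
    let mut previous: &[u32] = &[];
    for batch in batches {
        for &issue in batch {
            nodes.push(PlanNode { issue, depends_on: previous.to_vec() });
        }
        previous = batch.as_slice();
    }
    nodes
}
```

Under this assumption, batches `[[1, 2], [3]]` yield three nodes, with issue 3 depending on both 1 and 2.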

### Agent tool scoping
| Agent | Allowed tools |
|-------|---------------|
| Planner | Read, Glob, Grep |
| Implementer | Read, Write, Edit, Glob, Grep, Bash |
| Reviewer | Read, Glob, Grep |
| Fixer | Read, Write, Edit, Glob, Grep, Bash |

### Review-fix loop
Max 2 cycles: implement -> review -> fix -> review -> fix -> final review. The fixer can dispute findings it believes are incorrect; disputed findings are passed back to the reviewer so they aren't re-raised. Unresolved findings (fixer produced no output) are also tracked to prevent goalpost-moving. Cycle 2+ reviews are scoped to fixer changes via `pre_fix_ref`. If still broken after max cycles, stop and comment on PR with unresolved findings.
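
As a rough illustration of the cycle limit, the sketch below drives a review/fix loop with closures standing in for the reviewer and fixer agents. All names here (`Finding`, `LoopOutcome`, `review_fix_loop`) are hypothetical, and dispute tracking and `pre_fix_ref` scoping are left out to keep it short.

```rust
/// Severity levels the reviewer reports (critical/warning/info).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    Warning,
    Info,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub severity: Severity,
    pub message: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoopOutcome {
    /// A review came back with nothing critical or warning-level.
    Clean { fix_cycles: u32 },
    /// The final review still had actionable findings.
    Unresolved { findings: Vec<Finding> },
}

pub const MAX_FIX_CYCLES: u32 = 2;

/// `review` receives the number of fix cycles completed so far.
/// Only critical and warning findings are handed to `fix`.
pub fn review_fix_loop(
    mut review: impl FnMut(u32) -> Vec<Finding>,
    mut fix: impl FnMut(&[Finding]),
) -> LoopOutcome {
    let mut fix_cycles = 0;
    loop {
        let open: Vec<Finding> = review(fix_cycles)
            .into_iter()
            .filter(|f| f.severity != Severity::Info)
            .collect();
        if open.is_empty() {
            return LoopOutcome::Clean { fix_cycles };
        }
        if fix_cycles == MAX_FIX_CYCLES {
            // This was the final review: stop and report what is left.
            return LoopOutcome::Unresolved { findings: open };
        }
        fix(&open[..]);
        fix_cycles += 1;
    }
}
```

With a reviewer that never stops complaining, this runs three reviews and two fixes, matching the sequence review -> fix -> review -> fix -> final review.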

The pipeline tolerates fixer failures: if the fixer produces no structured output but makes commits, findings are inferred from git. If it does nothing, findings are marked not actionable. Merging is done directly via `gh pr merge` (no merger agent).
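
The three fixer outcomes described above can be written as a small classification step. This is a sketch under assumptions: `FixerOutcome` and `classify_fixer_run` are hypothetical names, findings are reduced to plain strings, and the commit count is assumed to come from comparing git history before and after the fixer ran.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FixerOutcome {
    /// The fixer returned structured output.
    Reported { resolved: Vec<String>, disputed: Vec<String> },
    /// No structured output, but new commits exist: infer results from git.
    InferredFromGit { commits: usize },
    /// No output and no commits.
    NotActionable,
}

/// `report` is the parsed (resolved, disputed) pair, if the fixer produced one.
/// `new_commits` is the number of commits the fixer added to the branch.
pub fn classify_fixer_run(
    report: Option<(Vec<String>, Vec<String>)>,
    new_commits: usize,
) -> FixerOutcome {
    match (report, new_commits) {
        (Some((resolved, disputed)), _) => FixerOutcome::Reported { resolved, disputed },
        (None, 0) => FixerOutcome::NotActionable,
        (None, commits) => FixerOutcome::InferredFromGit { commits },
    }
}
```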

Hard caps: cycle cap (2 fix rounds), cost cap (configurable per-pipeline budget), turn cap (max turns per agent invocation).

### Issue source abstraction
`IssueProvider` trait abstracts over GitHub and local issue sources. Both produce `PipelineIssue` structs. The pipeline doesn't care which source is active. The `/cook` skill reads `recipe.toml` and creates issues via `oven ticket create` or `gh issue create` accordingly.
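
A simplified, synchronous sketch of this abstraction follows. The real trait uses async-trait, and its methods and the fields of `PipelineIssue` are not listed here, so `ready_issues`, `StaticProvider`, `pr_reference`, and the struct fields are all assumptions for illustration. Only the two PR wording variants come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueSource {
    Github,
    Local,
}

/// Hypothetical shape of the shared issue type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PipelineIssue {
    pub number: u32,
    pub title: String,
    pub source: IssueSource,
}

pub trait IssueProvider {
    /// Issues ready to be picked up, oldest first.
    fn ready_issues(&self) -> Vec<PipelineIssue>;
}

/// Stand-in provider backed by a fixed list, useful for tests.
pub struct StaticProvider {
    pub source: IssueSource,
    pub issues: Vec<(u32, String)>,
}

impl IssueProvider for StaticProvider {
    fn ready_issues(&self) -> Vec<PipelineIssue> {
        self.issues
            .iter()
            .map(|(number, title)| PipelineIssue {
                number: *number,
                title: title.clone(),
                source: self.source,
            })
            .collect()
    }
}

/// The PR body line that links back to the issue.
pub fn pr_reference(issue: &PipelineIssue) -> String {
    match issue.source {
        IssueSource::Github => format!("Resolves #{}", issue.number),
        IssueSource::Local => format!("From local issue #{}", issue.number),
    }
}

/// Pipeline-side code only sees the trait object, never the concrete source.
pub fn pr_references(provider: &dyn IssueProvider) -> Vec<String> {
    provider.ready_issues().iter().map(pr_reference).collect()
}
```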

### Config (TOML, two levels)
- User: `~/.config/oven/recipe.toml` - defaults and multi-repo path mappings (for security, only user config can set `[repos]`)
- Project: `recipe.toml` in repo root - overrides user config
- `[models]` section: per-agent model overrides (`default`, `planner`, `implementer`, `reviewer`, `fixer`). Values passed as `--model` flag to claude CLI.
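
The layering rules above might be sketched as follows, using plain maps in place of the real `Config` struct. Two points are assumptions rather than documented behavior: that project values override user values key by key, and that an agent without its own entry falls back to `default`. The model names in the usage note are placeholders.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced config: only the `[models]` and `[repos]` tables.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Config {
    /// `[models]`: agent name (or `default`) -> model name.
    pub models: BTreeMap<String, String>,
    /// `[repos]`: repo name -> local path.
    pub repos: BTreeMap<String, String>,
}

/// Layer project config over user config.
pub fn layer(user: &Config, project: &Config) -> Config {
    let mut models = user.models.clone();
    // Assumed merge rule: project entries replace user entries with the same key.
    models.extend(project.models.clone());
    // `[repos]` is taken from user config only; project values are ignored.
    Config { models, repos: user.repos.clone() }
}

/// Model to pass as `--model` for an agent, if any is configured.
pub fn model_for<'a>(config: &'a Config, agent: &str) -> Option<&'a str> {
    config
        .models
        .get(agent)
        .or_else(|| config.models.get("default"))
        .map(String::as_str)
}
```

For example, with user config `default = "model-a"` and project config `reviewer = "model-c"`, `model_for` returns `model-c` for the reviewer and `model-a` for every other agent.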

### Multi-repo
Issues in a "god repo" can target other repos via `target_repo` frontmatter (in issue body for GitHub, in YAML frontmatter for local). Repo name -> local path mappings live in user config. PRs and worktrees go to the target repo; labels and comments stay on the god repo.
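
Reading `target_repo` out of an issue body could be done along these lines. The exact frontmatter syntax is not specified here, so this sketch assumes a leading block delimited by `---` lines containing a `target_repo: <name>` entry; the function name is hypothetical and a real implementation would likely use a YAML parser.

```rust
/// Extract `target_repo` from a leading `---` delimited frontmatter block.
/// Returns `None` if there is no frontmatter or no non-empty `target_repo` key.
pub fn target_repo(body: &str) -> Option<&str> {
    let mut lines = body.lines();
    // Frontmatter must start on the very first line.
    if lines.next()?.trim() != "---" {
        return None;
    }
    for line in lines {
        let line = line.trim();
        if line == "---" {
            // End of frontmatter: anything below is ordinary issue text.
            break;
        }
        if let Some(value) = line.strip_prefix("target_repo:") {
            let value = value.trim().trim_matches('"');
            return (!value.is_empty()).then_some(value);
        }
    }
    None
}
```

The returned name would then be looked up in the user-level `[repos]` mapping to find the local path.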

### State
- `.oven/oven.db` - SQLite. Pipeline state, cost tracking, agent run history, issue_source per run.
- `.oven/logs/<run_id>/` - per-run log files
- `.oven/worktrees/` - git worktrees per issue
- `.oven/issues/` - local issue markdown files
- `.oven/oven.pid` - PID file for detached mode

### Labels
`o-ready`, `o-cooking`, `o-complete`, `o-failed`

### Dependency graph
The planner outputs a DAG of issues with explicit `depends_on` edges. The runner executes issues layer-by-layer: all issues with no unmet dependencies run in parallel, then the next layer, etc. Cycle detection prevents invalid graphs. The graph is persisted to SQLite (`graph_nodes` + `graph_edges` tables) so it survives restarts.
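
Layer-by-layer scheduling with cycle detection can be sketched as below. This is an illustrative version, not the code in `pipeline/graph.rs`: the graph is a plain map from issue number to its `depends_on` list, and the function name and error shape are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split a dependency graph into execution layers.
///
/// `graph` maps each issue to the issues it depends on. Each returned layer
/// contains issues whose dependencies all appear in earlier layers, so a
/// layer's issues can run in parallel.
///
/// Returns `Err` with the stuck issues if no progress can be made, which
/// happens on a cycle or on a dependency that is not in the graph.
pub fn layers(graph: &BTreeMap<u32, Vec<u32>>) -> Result<Vec<Vec<u32>>, Vec<u32>> {
    let mut done: BTreeSet<u32> = BTreeSet::new();
    let mut result = Vec::new();
    while done.len() < graph.len() {
        // Every not-yet-scheduled issue whose dependencies are all done.
        let layer: Vec<u32> = graph
            .iter()
            .filter(|(issue, deps)| {
                !done.contains(*issue) && deps.iter().all(|dep| done.contains(dep))
            })
            .map(|(issue, _)| *issue)
            .collect();
        if layer.is_empty() {
            let stuck = graph.keys().filter(|k| !done.contains(*k)).copied().collect();
            return Err(stuck);
        }
        done.extend(layer.iter().copied());
        result.push(layer);
    }
    Ok(result)
}
```

For a diamond-shaped graph (2 and 3 depend on 1, 4 depends on 2 and 3) this yields the layers `[1]`, `[2, 3]`, `[4]`.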

### Continuous polling
The polling loop spawns tasks without blocking, using a `JoinSet` and `Semaphore` shared across poll cycles. In-flight tracking prevents double-spawning. Labels (`o-ready` -> `o-cooking`) provide GitHub-level dedup. The runner polls PR merge state for `AwaitingMerge` nodes to detect external merges.
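
The in-flight tracking idea can be shown without an async runtime. In the sketch below a plain set and a counter stand in for tokio's `JoinSet` and `Semaphore`; `InFlight`, `admit`, and `finish` are hypothetical names, and actual task spawning is omitted.

```rust
use std::collections::BTreeSet;

/// Tracks which issues are currently running, shared across poll cycles.
pub struct InFlight {
    running: BTreeSet<u32>,
    max_parallel: usize,
}

impl InFlight {
    pub fn new(max_parallel: usize) -> Self {
        Self { running: BTreeSet::new(), max_parallel }
    }

    /// Decide which of the `ready` issues to start in this poll cycle.
    /// Issues already running are skipped, and the parallelism cap is respected.
    pub fn admit(&mut self, ready: &[u32]) -> Vec<u32> {
        let mut spawned = Vec::new();
        for &issue in ready {
            if self.running.len() >= self.max_parallel {
                break;
            }
            // `insert` returns false if the issue was already in flight.
            if self.running.insert(issue) {
                spawned.push(issue);
            }
        }
        spawned
    }

    /// Mark an issue as finished, freeing a slot for the next poll cycle.
    pub fn finish(&mut self, issue: u32) {
        self.running.remove(&issue);
    }
}
```

Polling the same ready list twice starts each issue only once, and a slot freed by `finish` is picked up on the following cycle.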

## Tech stack
- Rust (edition 2024), tokio async runtime
- clap 4 derive API (CLI), rusqlite with bundled SQLite (state), toml (config)
- askama for compile-time agent prompt templates (`.txt` files in `templates/`)
- async-trait for `IssueProvider` dyn dispatch
- tracing + tracing-subscriber + tracing-appender (logging)
- gh CLI for all GitHub operations
- claude CLI for all agent invocations (`claude -p --output-format stream-json`)
- Git worktrees for isolation

## Project structure
```
src/
  main.rs                   thin entry point, clap parse + delegate
  lib.rs                    module declarations
  cli/
    mod.rs                  Cli struct, Commands enum (clap derive)
    prep.rs                 oven prep
    on.rs                   oven on
    off.rs                  oven off
    look.rs                 oven look
    report.rs               oven report
    clean.rs                oven clean
    ticket.rs               oven ticket subcommands
  config/
    mod.rs                  Config struct, layered loading
  db/
    mod.rs                  connection setup, migrations, pragmas
    runs.rs                 pipeline run CRUD
    agent_runs.rs           agent execution records
    graph.rs                dependency graph persistence (nodes + edges)
  git/
    mod.rs                  worktree management, branch ops
  process/
    mod.rs                  subprocess runner (tokio::process)
    stream.rs               claude stream-json parser, cost extraction
  github/
    mod.rs                  gh CLI wrapper
    labels.rs               label create/add/remove
    issues.rs               issue fetch/comment/transition
    prs.rs                  PR create/update/merge
  issues/
    mod.rs                  IssueProvider trait, PipelineIssue type
    local.rs                LocalIssueProvider (reads .oven/issues/)
    github.rs               GithubIssueProvider (wraps GhClient)
  agents/
    mod.rs                  AgentRole enum, invocation logic
    planner.rs
    implementer.rs
    reviewer.rs             structured findings output
    fixer.rs
  pipeline/
    mod.rs                  module declarations
    graph.rs                in-memory dependency graph, cycle detection, layer scheduling
    runner.rs               orchestration, polling loop, DAG-driven batch execution
    state.rs                status transitions, state machine
    executor.rs             step execution, review-fix loop, PR description building
  logging.rs                tracing setup (file + stderr)
templates/
  planner.txt               askama prompt templates per agent
  implementer.txt
  reviewer.txt
  fixer.txt
  skills/
    cook.md                 /cook skill template (scaffolded by oven prep)
    refine.md               /refine skill template
action/
  action.yml                GitHub Action metadata (node20 runtime)
  src/
    index.ts                action entry point
    install.ts              oven + claude CLI installer
    run.ts                  pipeline runner, issue comment posting
  __tests__/                vitest unit tests
  dist/                     ncc-compiled bundle (committed)
tests/
  common/
    mod.rs                  shared test helpers, fixtures
  cli_tests.rs              assert_cmd integration tests
  pipeline_tests.rs         pipeline integration tests
  db_tests.rs               database tests
```

## Dependencies
```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
clap = { version = "4", features = ["derive"] }
rusqlite = { version = "0.32", features = ["bundled"] }
rusqlite_migration = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
toml = "0.8"
anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
tracing-appender = "0.2"
uuid = { version = "1", features = ["v4"] }
dirs = "6"
chrono = { version = "0.4", features = ["serde"] }
tokio-util = "0.7"
async-trait = "0.1"
askama = "0.14.0"

[dev-dependencies]
assert_cmd = "2"
predicates = "3"
assert_fs = "1"
tempfile = "3"
rstest = "0.23"
mockall = "0.13"
proptest = "1"
```

## Formatting
`rustfmt.toml` at repo root. Run `cargo +nightly fmt` (import grouping requires nightly).
```toml
edition = "2024"
max_width = 100
tab_spaces = 4
use_small_heuristics = "Max"
imports_granularity = "Crate"
group_imports = "StdExternalCrate"
```

## Linting
Configured in `Cargo.toml`. Run `cargo clippy --all-targets -- -D warnings`.
```toml
[lints.rust]
unsafe_code = "forbid"

[lints.clippy]
all = { level = "deny", priority = -1 }
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
module_name_repetitions = "allow"
must_use_candidate = "allow"
missing_errors_doc = "allow"
missing_panics_doc = "allow"
```

## Code conventions
- No unnecessary abstractions. Three similar lines > premature helper function.
- Error handling: anyhow for all errors. Use `.context("what you were doing")?` for rich errors.
- No unwrap() in non-test code.
- `unsafe` is forbidden via lint. No exceptions.
- All SQL queries use parameterized statements with `params![]`. Never interpolate.
- Keep modules focused. One responsibility per file.
- The bread emoji (🍞) is used in PR comments and footers as the project's brand mark. No other emojis in code or user-facing output.
- Run `cargo clippy` and `cargo +nightly fmt` before committing.

## Testing
- **Unit tests**: `#[cfg(test)] mod tests` inline in every module with logic.
- **Integration tests**: `tests/` directory using assert_cmd + predicates for CLI commands.
- **Database tests**: `Connection::open_in_memory()` with migrations applied. Real SQLite, no mocks.
- **Async tests**: `#[tokio::test]` for anything async.
- **External CLI mocking**: Define traits for gh/claude interactions, mock with mockall. Never call real CLIs in tests.
- **Property tests**: proptest for config parsing, ID generation, serialization roundtrips.
- **Filesystem tests**: assert_fs or tempfile for temp directories with auto-cleanup.
- **Test runner**: cargo-nextest.
- **Coverage**: cargo-llvm-cov, 85% line coverage minimum.
- **Shared helpers**: `tests/common/mod.rs` for fixtures and builders.

## Database conventions
SQLite with these pragmas on every connection:
```rust
conn.pragma_update(None, "journal_mode", "WAL")?;
conn.pragma_update(None, "synchronous", "NORMAL")?;
conn.pragma_update(None, "busy_timeout", "5000")?;
conn.pragma_update(None, "foreign_keys", "ON")?;
```
Migrations via rusqlite_migration (user_version based, no migration table). Test migrations with `MIGRATIONS.validate()`.

## Process management
- `tokio::process::Command` with `kill_on_drop(true)` for all subprocesses.
- Graceful shutdown: `CancellationToken` from tokio-util combined with `tokio::signal::ctrl_c()`.
- Agent invocation: `claude -p --verbose --output-format stream-json --allowedTools <TOOLS> [--model <MODEL>]`.
- Detached mode: spawn new process with `std::process::Command` (not fork). Write PID to `.oven/oven.pid`.
- Always `wait()` on child processes to prevent zombies.
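
Combining the invocation line above with the tool-scoping table, argument construction might look like this sketch. `AgentRole` is the enum named in `agents/mod.rs`, but its methods here, the `claude_args` helper, and joining the tool list with commas are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentRole {
    Planner,
    Implementer,
    Reviewer,
    Fixer,
}

impl AgentRole {
    /// Tools each agent may use, per the tool-scoping table.
    pub fn allowed_tools(self) -> &'static [&'static str] {
        match self {
            AgentRole::Planner | AgentRole::Reviewer => &["Read", "Glob", "Grep"],
            AgentRole::Implementer | AgentRole::Fixer => {
                &["Read", "Write", "Edit", "Glob", "Grep", "Bash"]
            }
        }
    }
}

/// Build the argument list for a `claude` invocation.
/// Assumption: the tool list is passed as one comma-separated value.
pub fn claude_args(role: AgentRole, model: Option<&str>) -> Vec<String> {
    let mut args: Vec<String> =
        ["-p", "--verbose", "--output-format", "stream-json", "--allowedTools"]
            .iter()
            .map(|s| s.to_string())
            .collect();
    args.push(role.allowed_tools().join(","));
    // `--model` is only added when a model override is configured.
    if let Some(model) = model {
        args.push("--model".to_string());
        args.push(model.to_string());
    }
    args
}
```

The resulting vector would be handed to `tokio::process::Command::new("claude").args(...)` with `kill_on_drop(true)` set, as described in the list above.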

## CI pipeline
GitHub Actions with dtolnay/rust-toolchain + Swatinem/rust-cache. Run locally with `just ci` (or `just check` for the quick subset without coverage/deny). Jobs:
- `fmt` - nightly rustfmt check
- `clippy` - lint with -D warnings
- `test` - cargo-nextest on stable + MSRV (1.85)
- `coverage` - cargo-llvm-cov with 85% threshold
- `deny` - cargo-deny for license/advisory/source audits

## Releasing
1. Merge the fix/feature PR into `main`.
2. Create a release branch (`release/vX.Y.Z`), bump the version in `Cargo.toml`, commit, push, open PR, and merge.
3. Tag the merge on `main`: `git tag vX.Y.Z && git push origin vX.Y.Z`.
4. The `release.yml` workflow handles the rest: runs CI, publishes to crates.io, and creates a GitHub Release with auto-generated notes.

Do NOT run `cargo publish` manually. The release workflow owns that step.

## Skills (Claude Code skills, not oven commands)
- `/cook` - interactive issue design. Researches the codebase, asks targeted questions, drafts an implementation-ready issue, creates it via `gh issue create` or `oven ticket create` depending on `issue_source` config. Scaffolded into `.claude/skills/` by `oven prep`.
- `/refine` - codebase audit across 6 dimensions (security, errors, patterns, tests, data, dependencies). Produces a prioritized findings report and offers to create issues from critical/high findings. Scaffolded by `oven prep`.

## GitHub Action
JS/TS action in `action/` directory, compiled to a single `dist/index.js` via `@vercel/ncc`. Node 20 runtime (action.yml), Node >= 23 for development (engines in package.json).

- Triggers on `issues: [labeled]` (filters to `o-ready` only) and `workflow_dispatch` with issue number input
- Per-issue concurrency groups prevent duplicate runs
- GitHub App auth via `actions/create-github-app-token` (short-lived tokens, bot identity)
- Installs oven (cargo install or pre-built binary) and claude CLI (npm)
- Runs `oven on <issue-number>`, posts summary comment on the issue
- Outputs: `run-id`, `status`, `cost`, `pr-number`
- SIGTERM handler for graceful shutdown
- See `action/README.md` for consumer setup