agentchrome 1.51.2

# AgentChrome

**Give your AI agent browser superpowers.**

![CI](https://github.com/Nunley-Media-Group/AgentChrome/actions/workflows/ci.yml/badge.svg)
![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)
![Crates.io](https://img.shields.io/crates/v/agentchrome)

AgentChrome is a native CLI tool for browser automation via the Chrome DevTools Protocol, designed for AI coding agents such as Codex. Every command outputs structured JSON, uses accessibility-tree UIDs for element targeting, and returns typed exit codes for programmatic error handling. No Node.js, no Python, no MCP server — just a fast Rust binary your agent calls from the shell.

## Codex Integration

**1. Install AgentChrome** (see [Installation](#installation) for more options)

```sh
cargo install agentchrome
```

**2. Add AgentChrome guidance to Codex**

```sh
agentchrome skill install --tool codex
```

This writes the AgentChrome Codex skill to `$CODEX_HOME/skills/agentchrome/SKILL.md`, or to `~/.codex/skills/agentchrome/SKILL.md` when `CODEX_HOME` is not set. AgentChrome is self-documenting through `agentchrome --help`, `agentchrome capabilities`, and `agentchrome examples`, so Codex can discover the current CLI surface directly from the installed binary.

For project-local guidance, you can also add the AGENTS template:

```sh
curl -fsSL https://raw.githubusercontent.com/Nunley-Media-Group/AgentChrome/main/examples/AGENTS.md.example > AGENTS.md
```

**3. Ask Codex to use the browser**

> "Check if the login form at localhost:3000 works correctly"

Behind the scenes, Codex will run commands like:

```sh
agentchrome connect --launch --headless
agentchrome navigate http://localhost:3000/login --wait-until networkidle
agentchrome page snapshot
agentchrome form fill-many '[{"uid": "s2", "value": "test@example.com"}, {"uid": "s3", "value": "password123"}]'
agentchrome interact click s4 --include-snapshot
agentchrome console read --errors-only
```

See the full [Codex Integration Guide](docs/codex.md) for workflows, efficiency tips, and error handling patterns.

For agentic tools supported by AgentChrome's built-in skill installer, use `agentchrome skill install` to write the tool-specific guidance file and `agentchrome skill update` after upgrading AgentChrome. Supported targets include Claude Code, Windsurf, Aider, Continue.dev, GitHub Copilot, Cursor, Gemini CLI, and Codex; Codex users can target it explicitly with `agentchrome skill install --tool codex`.

## Features

### Built for AI Agents

- **JSON output by default** — every command returns structured, parseable output
- **Accessibility tree snapshots with UIDs** — `page snapshot` assigns stable UIDs (e.g., `s1`, `s5`) to interactive elements for reliable targeting
- **Structured exit codes** — 0 (success), 1 (general error), 2 (connection error), 3 (target error), 4 (timeout), 5 (protocol error) for programmatic error handling
- **Self-documenting CLI** — `agentchrome capabilities` outputs a machine-readable JSON manifest of every command, flag, and argument
- **`--include-snapshot` on interactions** — get the updated accessibility tree in the same response as a click or form fill, cutting round-trips in half

<details>
<summary><strong>Full Browser Control</strong></summary>

- **Tab management** — list, create, close, and activate browser tabs
- **URL navigation** — navigate to URLs, go back/forward, reload with wait strategies
- **Page inspection** — accessibility trees, text extraction, element search
- **Screenshots** — full-page, viewport, element, or region captures
- **JavaScript execution** — run scripts in page context, return results as JSON
- **User interactions** — click, hover, drag, type, press keys, scroll, coordinate-based drag and decomposed mouse actions
- **Form filling** — fill inputs, select options, upload files, batch fill with `fill-many`, ARIA combobox support
- **Iframe/frame targeting** — list frames, target specific iframes with `--frame` on all commands, cross-origin OOPIF support
- **Media control** — list, play, pause, seek audio/video elements with CSS selector or bulk targeting
- **Page analysis** — structure discovery with iframe detection, framework identification, overlay/blocker detection, and hit testing for click debugging
- **DOM event introspection** — inspect event listeners on any element via CDP
- **Cookie management** — list, set, delete, and clear browser cookies
- **Network monitoring** — list, inspect, and follow requests in real time
- **Console capture** — read and follow console messages with type filtering
- **Device emulation** — mobile devices, network/CPU throttling, geolocation, color scheme
- **Performance tracing** — record traces, analyze insights, measure Core Web Vitals
- **Lighthouse auditing** — run audits returning structured category scores with filtering
- **Dialog handling** — accept, dismiss, or respond to alert/confirm/prompt dialogs

</details>

### Comparison

How AgentChrome stacks up for giving AI agents browser access:

| | AgentChrome | Puppeteer / Playwright | Chrome DevTools MCP |
|---|---|---|---|
| **AI agent integration** | CLI — works with any agent that runs shell commands | Requires JavaScript wrapper | MCP protocol (specific client required) |
| **Runtime** | No Node.js — native Rust binary | Node.js | Node.js |
| **Install** | Single binary, `cargo install` | `npm install` | `npx` |
| **Interface** | CLI / shell scripts | JavaScript API | MCP protocol |
| **Startup time** | < 50ms | ~500ms+ | Varies |
| **Binary size** | < 10 MB | ~100 MB+ (with deps) | Varies |
| **Shell pipelines** | First-class (`| jq`, `| grep`) | Requires wrapper scripts | Not designed for CLI |

## Architecture

```
Agent → CLI (clap) → Command Dispatch → CDP Client (WebSocket) → Chrome (DevTools Protocol)
```

AgentChrome communicates with Chrome using the [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/) (CDP) over WebSocket. Run `agentchrome connect --launch --headless` to start a session — subsequent commands reuse the connection automatically. Native Rust binary, <50ms startup, <10MB on disk.

## Installation

### Cargo install

```sh
cargo install agentchrome
```

### Pre-built binaries

Download the latest release for your platform from [GitHub Releases](https://github.com/Nunley-Media-Group/AgentChrome/releases).

<details>
<summary>Quick install via curl (macOS / Linux)</summary>

```sh
# macOS (Apple Silicon)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-aarch64-apple-darwin.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# macOS (Intel)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-x86_64-apple-darwin.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# Linux (x86_64)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-x86_64-unknown-linux-gnu.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# Linux (ARM64)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-aarch64-unknown-linux-gnu.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/
```

</details>

<details>
<summary>Build from source</summary>

```sh
git clone https://github.com/Nunley-Media-Group/AgentChrome.git
cd AgentChrome
cargo build --release
# Binary is at target/release/agentchrome
```

</details>

### Supported platforms

| Platform | Target | Archive |
|---|---|---|
| macOS (Apple Silicon) | `aarch64-apple-darwin` | `.tar.gz` |
| macOS (Intel) | `x86_64-apple-darwin` | `.tar.gz` |
| Linux (x86_64) | `x86_64-unknown-linux-gnu` | `.tar.gz` |
| Linux (ARM64) | `aarch64-unknown-linux-gnu` | `.tar.gz` |
| Windows (x86_64) | `x86_64-pc-windows-msvc` | `.zip` |

## CLI Quick Start

For shell scripting and manual use:

**1. Install AgentChrome** (see [Installation](#installation) above)

**2. Start Chrome with remote debugging enabled:**

```sh
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222

# Or launch headless Chrome directly via AgentChrome:
agentchrome connect --launch --headless
```

**3. Connect to Chrome:**

```sh
agentchrome connect
```

**4. Navigate to a URL:**

```sh
agentchrome navigate https://example.com
```

**5. Inspect the page:**

```sh
agentchrome page snapshot
```

## Command Reference

| Command | Description |
|---|---|
| `connect` | Connect to or launch a Chrome instance |
| `tabs` | Tab management (list, create, close, activate) |
| `navigate` | URL navigation and history |
| `page` | Page inspection (screenshot, text, accessibility tree, find, analyze, hittest, frames, workers) |
| `dom` | DOM inspection, manipulation, and event listener introspection |
| `js` | JavaScript execution in page context |
| `console` | Console message reading and monitoring |
| `network` | Network request monitoring and interception |
| `interact` | Mouse, keyboard, scroll, and coordinate-based drag interactions |
| `form` | Form input, submission, and ARIA combobox support |
| `emulate` | Device and network emulation |
| `perf` | Performance tracing and metrics |
| `cookie` | Browser cookie management (list, set, delete, clear) |
| `dialog` | Browser dialog handling (alert, confirm, prompt, beforeunload) |
| `media` | Media element control (list, play, pause, seek) |
| `audit` | Run audits against the current page (Lighthouse) |
| `diagnose` | Pre-automation challenge scan (iframes, overlays, media gates, frameworks, patterns) |
| `skill` | Agentic tool skill installation and management |
| `config` | Configuration file management (show, init, path) |
| `completions` | Generate shell completion scripts |
| `examples` | Show usage examples for commands |
| `capabilities` | Output a machine-readable manifest of all CLI capabilities |
| `man` | Display man pages for AgentChrome commands |

Run `agentchrome <command> --help` for detailed usage, `agentchrome examples <command>` for practical examples, or `agentchrome capabilities` for the full machine-readable command manifest.

## Session resilience

Long-running automation sessions survive transient WebSocket drops and idle timeouts without manual intervention.

**Auto-reconnect.** When a command's stored `ws_url` is no longer reachable but Chrome is still running on the recorded port, AgentChrome transparently re-discovers the current browser-level WebSocket URL, rewrites the session file (preserving `pid`, `port`, and `active_tab_id`), and retries the command within the same invocation. Diagnostics are stderr-only; stdout stays pure JSON.

**Keep-alive.** While a command holds the CDP session open longer than the keep-alive interval (default `30000` ms), AgentChrome sends a WebSocket Ping frame to prevent idle-timeout disconnects. A missing Pong within 10 s triggers the existing reconnect path.

**Configuration precedence:** `--keepalive-interval <ms>` flag > `AGENTCHROME_KEEPALIVE_INTERVAL` env var > `[keepalive] interval_ms` in `config.toml` > built-in default (30000 ms).

```sh
# Run a long command with a 60-second keep-alive
agentchrome --keepalive-interval 60000 console follow

# Disable keep-alive (--no-keepalive or interval 0)
agentchrome --no-keepalive page snapshot --json
```

**Scripting against connection-loss errors.** When auto-reconnect cannot recover the session, the CLI emits a structured JSON error on stderr with a `kind` discriminator and a `recoverable` boolean:

| Condition | `kind` | `recoverable` | Suggested remediation |
|-----------|--------|---------------|-----------------------|
| Chrome process is gone | `"chrome_terminated"` | `false` | `agentchrome connect --launch` |
| Probe failed but Chrome may still be alive | `"transient"` | `true` | `agentchrome connect` |

Scripts can branch on these fields to decide whether to relaunch Chrome or simply re-discover the connection.

## Usage Examples

<details>
<summary><strong>Page inspection</strong></summary>

```sh
# Get the accessibility tree with element UIDs
agentchrome page snapshot

# Extract visible text content
agentchrome page text

# Find elements by text or role
agentchrome page find "Submit" --role button
```

</details>

<details>
<summary><strong>Form filling</strong></summary>

```sh
# Snapshot to discover form field UIDs
agentchrome page snapshot

# Fill multiple fields at once (returns updated snapshot)
agentchrome form fill-many --include-snapshot \
  '[{"uid": "s5", "value": "hello@example.com"}, {"uid": "s8", "value": "MyPassword123"}]'

# Or fill fields individually
agentchrome form fill s5 "hello@example.com"

# Click the submit button
agentchrome interact click s10
```

</details>

<details>
<summary><strong>Screenshots</strong></summary>

```sh
# Viewport screenshot
agentchrome page screenshot --file screenshot.png

# Full-page screenshot
agentchrome page screenshot --full-page --file full-page.png
```

</details>

<details>
<summary><strong>JavaScript execution</strong></summary>

```sh
# Run a JavaScript expression and get the result
agentchrome js exec "document.title"

# Run JavaScript from a file
agentchrome js exec --file script.js
```

</details>

<details>
<summary><strong>Network monitoring</strong></summary>

```sh
# List recent network requests
agentchrome network list

# Filter requests by URL pattern
agentchrome network list --filter "api"

# Get details for a specific request
agentchrome network get <request-id>
```

</details>

<details>
<summary><strong>Pre-automation diagnostics</strong></summary>

```sh
# Scan a new URL for automation challenges before you begin
agentchrome diagnose https://example.com/course

# Diagnose the already-loaded page in place (no navigation)
agentchrome diagnose --current

# Extract strategy suggestions for each matched pattern
agentchrome diagnose --current | jq -r '.patterns[].suggestion'

# Check if the page is straightforward to automate
agentchrome diagnose --current | jq '.summary.straightforward'
```

</details>

<details>
<summary><strong>Interaction strategy guides</strong></summary>

```sh
# List all scenario-based interaction strategy guides
agentchrome examples strategies

# Show the full guide for iframe automation
agentchrome examples strategies iframes

# Get all strategy guides as JSON (for AI agent discovery)
agentchrome examples strategies --json

# Get the full SCORM/LMS strategy as JSON
agentchrome examples strategies scorm --json
```

Available strategies: `iframes`, `overlays`, `scorm`, `drag-and-drop`, `shadow-dom`, `spa-navigation-waits`, `react-controlled-inputs`, `debugging-failed-interactions`, `authentication-cookie-reuse`, `multi-tab-workflows`.

</details>

## Related Projects

- **Chrome DevTools MCP servers** — If you need MCP-based browser control rather than a CLI tool, use a CDP-backed MCP server.

## Contributing

All contributions must follow the [NMG-SDLC](https://github.com/Nunley-Media-Group/nmg-plugins) workflow without deviation. NMG-SDLC is a BDD spec-driven development toolkit that enforces a structured delivery lifecycle: issue creation, specification writing, implementation, verification, and PR creation. Contributions that bypass the SDLC process will not be accepted.

### Prerequisites

- [Rust](https://rustup.rs/) 1.93.0 or later (pinned via `rust-toolchain.toml`)
- Chrome or Chromium (for integration testing)
- Codex with the [NMG-SDLC plugin](https://github.com/Nunley-Media-Group/nmg-plugins) installed

### Build and test

```sh
git clone https://github.com/Nunley-Media-Group/AgentChrome.git
cd AgentChrome

# Build
cargo build

# Run tests
cargo test

# Lint
cargo clippy -- -D warnings
cargo fmt --check

# Generate man pages
cargo xtask man
```

### Code style

This project uses strict Clippy configuration (`all = "deny"`, `pedantic = "warn"`) and rustfmt with the 2024 edition. All warnings must be resolved before merging.

## License

Licensed under either of [MIT License](LICENSE-MIT) or [Apache License, Version 2.0](LICENSE-APACHE) at your option.