agentchrome 1.51.2

A CLI tool for browser automation via the Chrome DevTools Protocol
# AgentChrome

**Give your AI agent browser superpowers.**

![CI](https://github.com/Nunley-Media-Group/AgentChrome/actions/workflows/ci.yml/badge.svg)
![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)
![Crates.io](https://img.shields.io/crates/v/agentchrome)

AgentChrome is a native CLI tool for browser automation via the Chrome DevTools Protocol, designed for AI coding agents such as Codex. Every command outputs structured JSON, uses accessibility-tree UIDs for element targeting, and returns typed exit codes for programmatic error handling. No Node.js, no Python, no MCP server — just a fast Rust binary your agent calls from the shell.

## Codex Integration

**1. Install AgentChrome** (see [Installation](#installation) for more options)

```sh
cargo install agentchrome
```

**2. Add AgentChrome guidance to Codex**

```sh
agentchrome skill install --tool codex
```

This writes the AgentChrome Codex skill to `$CODEX_HOME/skills/agentchrome/SKILL.md`, or to `~/.codex/skills/agentchrome/SKILL.md` when `CODEX_HOME` is not set. AgentChrome is self-documenting through `agentchrome --help`, `agentchrome capabilities`, and `agentchrome examples`, so Codex can discover the current CLI surface directly from the installed binary.
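
The path rule above can be sketched in shell, for example to check from a script whether the skill file is present. `codex_skill_path` is a hypothetical helper name, not an AgentChrome command, and treating an empty `CODEX_HOME` the same as an unset one is an assumption of this sketch.

```sh
# Hypothetical helper (not part of AgentChrome): print the location where
# the Codex skill file is expected, following the rule described above.
codex_skill_path() {
  # Assumption: an empty CODEX_HOME is treated like an unset one.
  printf '%s\n' "${CODEX_HOME:-$HOME/.codex}/skills/agentchrome/SKILL.md"
}

# Example use:
#   [ -f "$(codex_skill_path)" ] && echo "skill installed"
```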

For project-local guidance, you can also add the AGENTS template:

```sh
curl -fsSL https://raw.githubusercontent.com/Nunley-Media-Group/AgentChrome/main/examples/AGENTS.md.example > AGENTS.md
```

**3. Ask Codex to use the browser**

> "Check if the login form at localhost:3000 works correctly"

Behind the scenes, Codex will run commands like:

```sh
agentchrome connect --launch --headless
agentchrome navigate http://localhost:3000/login --wait-until networkidle
agentchrome page snapshot
agentchrome form fill-many '[{"uid": "s2", "value": "test@example.com"}, {"uid": "s3", "value": "password123"}]'
agentchrome interact click s4 --include-snapshot
agentchrome console read --errors-only
```

See the full [Codex Integration Guide](docs/codex.md) for workflows, efficiency tips, and error handling patterns.

For agentic tools supported by AgentChrome's built-in skill installer, use `agentchrome skill install` to write the tool-specific guidance file and `agentchrome skill update` after upgrading AgentChrome. Supported targets include Claude Code, Windsurf, Aider, Continue.dev, GitHub Copilot, Cursor, Gemini CLI, and Codex; Codex users can target it explicitly with `agentchrome skill install --tool codex`.

## Features

### Built for AI Agents

- **JSON output by default** — every command returns structured, parseable output
- **Accessibility tree snapshots with UIDs** — `page snapshot` assigns stable UIDs (e.g., `s1`, `s5`) to interactive elements for reliable targeting
- **Structured exit codes** — 0 (success), 1 (general error), 2 (connection error), 3 (target error), 4 (timeout), 5 (protocol error) for programmatic error handling
- **Self-documenting CLI** — `agentchrome capabilities` outputs a machine-readable JSON manifest of every command, flag, and argument
- **`--include-snapshot` on interactions** — get the updated accessibility tree in the same response as a click or form fill, cutting round-trips in half
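
As a rough sketch of how a script might act on the exit codes listed above, the helpers below map a code to its category and report it on stderr while preserving the original status. `describe_exit_code` and `run_and_classify` are hypothetical names introduced here for illustration; only the code-to-category mapping comes from the list.

```sh
# Map an exit code to the category named in the feature list.
describe_exit_code() {
  case "$1" in
    0) echo "success" ;;
    1) echo "general error" ;;
    2) echo "connection error" ;;
    3) echo "target error" ;;
    4) echo "timeout" ;;
    5) echo "protocol error" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical wrapper: run any command, print the failure category on
# stderr, and return the command's own exit code to the caller.
run_and_classify() {
  code=0
  "$@" || code=$?
  if [ "$code" -ne 0 ]; then
    echo "failed: $(describe_exit_code "$code")" >&2
  fi
  return "$code"
}

# Example use:
#   run_and_classify agentchrome page snapshot
```

Keeping the diagnostic on stderr leaves stdout untouched, so the wrapped command's JSON can still be piped onward.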

<details>
<summary><strong>Full Browser Control</strong></summary>

- **Tab management** — list, create, close, and activate browser tabs
- **URL navigation** — navigate to URLs, go back/forward, reload with wait strategies
- **Page inspection** — accessibility trees, text extraction, element search
- **Screenshots** — full-page, viewport, element, or region captures
- **JavaScript execution** — run scripts in page context, return results as JSON
- **User interactions** — click, hover, drag, type, press keys, scroll, coordinate-based drag and decomposed mouse actions
- **Form filling** — fill inputs, select options, upload files, batch fill with `fill-many`, ARIA combobox support
- **Iframe/frame targeting** — list frames, target specific iframes with `--frame` on all commands, cross-origin OOPIF support
- **Media control** — list, play, pause, seek audio/video elements with CSS selector or bulk targeting
- **Page analysis** — structure discovery with iframe detection, framework identification, overlay/blocker detection, and hit testing for click debugging
- **DOM event introspection** — inspect event listeners on any element via CDP
- **Cookie management** — list, set, delete, and clear browser cookies
- **Network monitoring** — list, inspect, and follow requests in real time
- **Console capture** — read and follow console messages with type filtering
- **Device emulation** — mobile devices, network/CPU throttling, geolocation, color scheme
- **Performance tracing** — record traces, analyze insights, measure Core Web Vitals
- **Lighthouse auditing** — run audits returning structured category scores with filtering
- **Dialog handling** — accept, dismiss, or respond to alert/confirm/prompt dialogs

</details>

### Comparison

How AgentChrome stacks up for giving AI agents browser access:

| | AgentChrome | Puppeteer / Playwright | Chrome DevTools MCP |
|---|---|---|---|
| **AI agent integration** | CLI — works with any agent that runs shell commands | Requires JavaScript wrapper | MCP protocol (specific client required) |
| **Runtime** | No Node.js — native Rust binary | Node.js | Node.js |
| **Install** | Single binary, `cargo install` | `npm install` | `npx` |
| **Interface** | CLI / shell scripts | JavaScript API | MCP protocol |
| **Startup time** | < 50ms | ~500ms+ | Varies |
| **Binary size** | < 10 MB | ~100 MB+ (with deps) | Varies |
| **Shell pipelines** | First-class (`\| jq`, `\| grep`) | Requires wrapper scripts | Not designed for CLI |

## Architecture

```
Agent → CLI (clap) → Command Dispatch → CDP Client (WebSocket) → Chrome (DevTools Protocol)
```

AgentChrome communicates with Chrome using the [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/) (CDP) over WebSocket. Run `agentchrome connect --launch --headless` to start a session; subsequent commands reuse the connection automatically. The native Rust binary starts in under 50 ms and occupies less than 10 MB on disk.

## Installation

### Cargo install

```sh
cargo install agentchrome
```

### Pre-built binaries

Download the latest release for your platform from [GitHub Releases](https://github.com/Nunley-Media-Group/AgentChrome/releases).

<details>
<summary>Quick install via curl (macOS / Linux)</summary>

```sh
# macOS (Apple Silicon)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-aarch64-apple-darwin.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# macOS (Intel)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-x86_64-apple-darwin.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# Linux (x86_64)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-x86_64-unknown-linux-gnu.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/

# Linux (ARM64)
curl -fsSL https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-aarch64-unknown-linux-gnu.tar.gz \
  | tar xz && mv agentchrome-*/agentchrome /usr/local/bin/
```

</details>

<details>
<summary>Build from source</summary>

```sh
git clone https://github.com/Nunley-Media-Group/AgentChrome.git
cd AgentChrome
cargo build --release
# Binary is at target/release/agentchrome
```

</details>

### Supported platforms

| Platform | Target | Archive |
|---|---|---|
| macOS (Apple Silicon) | `aarch64-apple-darwin` | `.tar.gz` |
| macOS (Intel) | `x86_64-apple-darwin` | `.tar.gz` |
| Linux (x86_64) | `x86_64-unknown-linux-gnu` | `.tar.gz` |
| Linux (ARM64) | `aarch64-unknown-linux-gnu` | `.tar.gz` |
| Windows (x86_64) | `x86_64-pc-windows-msvc` | `.zip` |
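
An install script could pick the right archive from this table with something like the sketch below. `release_target` and `release_url` are hypothetical helper names; the mapping from `uname` values (for example `arm64` on Apple Silicon) is an assumption of the sketch, and the URL pattern is copied from the curl commands above. Windows is left out because it uses a `.zip` archive.

```sh
# Hypothetical helper: map `uname -s` / `uname -m` values to a target triple
# from the table above. Assumes Apple Silicon reports arm64 and Linux ARM64
# reports aarch64 (arm64 is also accepted).
release_target() {
  case "$1/$2" in
    Darwin/arm64)              echo "aarch64-apple-darwin" ;;
    Darwin/x86_64)             echo "x86_64-apple-darwin" ;;
    Linux/x86_64)              echo "x86_64-unknown-linux-gnu" ;;
    Linux/aarch64|Linux/arm64) echo "aarch64-unknown-linux-gnu" ;;
    *) echo "unsupported platform: $1/$2" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the latest-release download URL for a target.
release_url() {
  printf '%s\n' "https://github.com/Nunley-Media-Group/AgentChrome/releases/latest/download/agentchrome-$1.tar.gz"
}

# Example use:
#   target=$(release_target "$(uname -s)" "$(uname -m)") && release_url "$target"
```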

## CLI Quick Start

For shell scripting and manual use:

**1. Install AgentChrome** (see [Installation](#installation) above)

**2. Start Chrome with remote debugging enabled:**

```sh
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222

# Or launch headless Chrome directly via AgentChrome:
agentchrome connect --launch --headless
```

**3. Connect to Chrome:**

```sh
agentchrome connect
```

**4. Navigate to a URL:**

```sh
agentchrome navigate https://example.com
```

**5. Inspect the page:**

```sh
agentchrome page snapshot
```

## Command Reference

| Command | Description |
|---|---|
| `connect` | Connect to or launch a Chrome instance |
| `tabs` | Tab management (list, create, close, activate) |
| `navigate` | URL navigation and history |
| `page` | Page inspection (screenshot, text, accessibility tree, find, analyze, hittest, frames, workers) |
| `dom` | DOM inspection, manipulation, and event listener introspection |
| `js` | JavaScript execution in page context |
| `console` | Console message reading and monitoring |
| `network` | Network request monitoring and interception |
| `interact` | Mouse, keyboard, scroll, and coordinate-based drag interactions |
| `form` | Form input, submission, and ARIA combobox support |
| `emulate` | Device and network emulation |
| `perf` | Performance tracing and metrics |
| `cookie` | Browser cookie management (list, set, delete, clear) |
| `dialog` | Browser dialog handling (alert, confirm, prompt, beforeunload) |
| `media` | Media element control (list, play, pause, seek) |
| `audit` | Run audits against the current page (Lighthouse) |
| `diagnose` | Pre-automation challenge scan (iframes, overlays, media gates, frameworks, patterns) |
| `skill` | Agentic tool skill installation and management |
| `config` | Configuration file management (show, init, path) |
| `completions` | Generate shell completion scripts |
| `examples` | Show usage examples for commands |
| `capabilities` | Output a machine-readable manifest of all CLI capabilities |
| `man` | Display man pages for AgentChrome commands |

Run `agentchrome <command> --help` for detailed usage, `agentchrome examples <command>` for practical examples, or `agentchrome capabilities` for the full machine-readable command manifest.

## Session resilience

Long-running automation sessions survive transient WebSocket drops and idle timeouts without manual intervention.

**Auto-reconnect.** When a command's stored `ws_url` is no longer reachable but Chrome is still running on the recorded port, AgentChrome transparently re-discovers the current browser-level WebSocket URL, rewrites the session file (preserving `pid`, `port`, and `active_tab_id`), and retries the command within the same invocation. Diagnostics are stderr-only; stdout stays pure JSON.

**Keep-alive.** While a command holds the CDP session open longer than the keep-alive interval (default `30000` ms), AgentChrome sends a WebSocket Ping frame to prevent idle-timeout disconnects. A missing Pong within 10 s triggers the existing reconnect path.

**Configuration precedence:** `--keepalive-interval <ms>` flag > `AGENTCHROME_KEEPALIVE_INTERVAL` env var > `[keepalive] interval_ms` in `config.toml` > built-in default (30000 ms).

```sh
# Run a long command with a 60-second keep-alive
agentchrome --keepalive-interval 60000 console follow

# Disable keep-alive (--no-keepalive or interval 0)
agentchrome --no-keepalive page snapshot --json
```
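
The precedence order can be illustrated with a small shell function. This is only a model of the documented order, not AgentChrome's implementation; `resolve_keepalive_interval` is a hypothetical name, and passing the flag and `config.toml` values as arguments is a simplification made for the sketch.

```sh
# Hypothetical model of the documented precedence:
#   flag > AGENTCHROME_KEEPALIVE_INTERVAL > config.toml > built-in default.
# $1 = value of --keepalive-interval (empty if not given)
# $2 = [keepalive] interval_ms from config.toml (empty if not set)
resolve_keepalive_interval() {
  flag_value=${1:-}
  config_value=${2:-}
  if [ -n "$flag_value" ]; then
    echo "$flag_value"
  elif [ -n "${AGENTCHROME_KEEPALIVE_INTERVAL:-}" ]; then
    echo "$AGENTCHROME_KEEPALIVE_INTERVAL"
  elif [ -n "$config_value" ]; then
    echo "$config_value"
  else
    echo 30000   # built-in default in milliseconds
  fi
}
```

For instance, with the environment variable set to `45000` and a config value of `20000`, the function prints `45000` unless a flag value is supplied.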

**Scripting against connection-loss errors.** When auto-reconnect cannot recover the session, the CLI emits a structured JSON error on stderr with a `kind` discriminator and a `recoverable` boolean:

| Condition | `kind` | `recoverable` | Suggested remediation |
|-----------|--------|---------------|-----------------------|
| Chrome process is gone | `"chrome_terminated"` | `false` | `agentchrome connect --launch` |
| Probe failed but Chrome may still be alive | `"transient"` | `true` | `agentchrome connect` |

Scripts can branch on these fields to decide whether to relaunch Chrome or simply re-discover the connection.
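
A minimal sketch of such branching is shown below. `suggest_remediation` is a hypothetical helper, and it assumes the error JSON carries `kind` as a plain string field; the exact shape of the error object beyond `kind` and `recoverable` is not specified here. It uses simple pattern matching to stay dependency-free, and a real script would more likely parse the JSON with a tool such as `jq`.

```sh
# Hypothetical helper: choose a remediation command from the error JSON text.
# Assumption: the JSON contains "kind" as a string field, as in the table.
suggest_remediation() {
  # Strip whitespace so "kind": "x" and "kind":"x" match the same pattern.
  compact=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$compact" in
    *'"kind":"chrome_terminated"'*) echo "agentchrome connect --launch" ;;
    *'"kind":"transient"'*)         echo "agentchrome connect" ;;
    *) return 1 ;;  # not a recognized connection-loss error
  esac
}

# Example use (captures stderr, writes stdout to a file):
#   err=$(agentchrome page snapshot 2>&1 >snapshot.json) || suggest_remediation "$err"
```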

## Usage Examples

<details>
<summary><strong>Page inspection</strong></summary>

```sh
# Get the accessibility tree with element UIDs
agentchrome page snapshot

# Extract visible text content
agentchrome page text

# Find elements by text or role
agentchrome page find "Submit" --role button
```

</details>

<details>
<summary><strong>Form filling</strong></summary>

```sh
# Snapshot to discover form field UIDs
agentchrome page snapshot

# Fill multiple fields at once (returns updated snapshot)
agentchrome form fill-many --include-snapshot \
  '[{"uid": "s5", "value": "hello@example.com"}, {"uid": "s8", "value": "MyPassword123"}]'

# Or fill fields individually
agentchrome form fill s5 "hello@example.com"

# Click the submit button
agentchrome interact click s10
```
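
When field values come from shell variables, hand-writing the JSON array is error-prone. The sketch below builds a `fill-many` payload from `uid=value` arguments; `json_escape` and `build_fill_payload` are hypothetical helpers, not AgentChrome commands, and the escaping only covers backslashes and double quotes (control characters and newlines are not handled).

```sh
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: turn uid=value arguments into a fill-many JSON array.
build_fill_payload() {
  out='['
  sep=''
  for pair in "$@"; do
    uid=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    out="$out$sep{\"uid\": \"$(json_escape "$uid")\", \"value\": \"$(json_escape "$value")\"}"
    sep=', '
  done
  printf '%s]\n' "$out"
}

# Example use:
#   agentchrome form fill-many "$(build_fill_payload s5=hello@example.com s8=MyPassword123)"
```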

</details>

<details>
<summary><strong>Screenshots</strong></summary>

```sh
# Viewport screenshot
agentchrome page screenshot --file screenshot.png

# Full-page screenshot
agentchrome page screenshot --full-page --file full-page.png
```

</details>

<details>
<summary><strong>JavaScript execution</strong></summary>

```sh
# Run a JavaScript expression and get the result
agentchrome js exec "document.title"

# Run JavaScript from a file
agentchrome js exec --file script.js
```

</details>

<details>
<summary><strong>Network monitoring</strong></summary>

```sh
# List recent network requests
agentchrome network list

# Filter requests by URL pattern
agentchrome network list --filter "api"

# Get details for a specific request
agentchrome network get <request-id>
```

</details>

<details>
<summary><strong>Pre-automation diagnostics</strong></summary>

```sh
# Scan a new URL for automation challenges before you begin
agentchrome diagnose https://example.com/course

# Diagnose the already-loaded page in place (no navigation)
agentchrome diagnose --current

# Extract strategy suggestions for each matched pattern
agentchrome diagnose --current | jq -r '.patterns[].suggestion'

# Check if the page is straightforward to automate
agentchrome diagnose --current | jq '.summary.straightforward'
```

</details>

<details>
<summary><strong>Interaction strategy guides</strong></summary>

```sh
# List all scenario-based interaction strategy guides
agentchrome examples strategies

# Show the full guide for iframe automation
agentchrome examples strategies iframes

# Get all strategy guides as JSON (for AI agent discovery)
agentchrome examples strategies --json

# Get the full SCORM/LMS strategy as JSON
agentchrome examples strategies scorm --json
```

Available strategies: `iframes`, `overlays`, `scorm`, `drag-and-drop`, `shadow-dom`, `spa-navigation-waits`, `react-controlled-inputs`, `debugging-failed-interactions`, `authentication-cookie-reuse`, `multi-tab-workflows`.

</details>

## Related Projects

- **Chrome DevTools MCP servers** — If you need MCP-based browser control rather than a CLI tool, use a CDP-backed MCP server.

## Contributing

All contributions must follow the [NMG-SDLC](https://github.com/Nunley-Media-Group/nmg-plugins) workflow without deviation. NMG-SDLC is a BDD spec-driven development toolkit that enforces a structured delivery lifecycle: issue creation, specification writing, implementation, verification, and PR creation. Contributions that bypass the SDLC process will not be accepted.

### Prerequisites

- [Rust](https://rustup.rs/) 1.93.0 or later (pinned via `rust-toolchain.toml`)
- Chrome or Chromium (for integration testing)
- Codex with the [NMG-SDLC plugin](https://github.com/Nunley-Media-Group/nmg-plugins) installed

### Build and test

```sh
git clone https://github.com/Nunley-Media-Group/AgentChrome.git
cd AgentChrome

# Build
cargo build

# Run tests
cargo test

# Lint
cargo clippy -- -D warnings
cargo fmt --check

# Generate man pages
cargo xtask man
```

### Code style

This project uses strict Clippy configuration (`all = "deny"`, `pedantic = "warn"`) and rustfmt with the 2024 edition. All warnings must be resolved before merging.

## License

Licensed under either of [MIT License](LICENSE-MIT) or [Apache License, Version 2.0](LICENSE-APACHE) at your option.