AgentChrome
Give your AI agent browser superpowers.
AgentChrome is a native CLI tool for browser automation via the Chrome DevTools Protocol, designed for AI coding agents such as Codex. Every command outputs structured JSON, uses accessibility-tree UIDs for element targeting, and returns typed exit codes for programmatic error handling. No Node.js, no Python, no MCP server — just a fast Rust binary your agent calls from the shell.
Codex Integration
1. Install AgentChrome (see Installation for more options)
2. Add AgentChrome guidance to Codex
This writes the AgentChrome Codex skill to $CODEX_HOME/skills/agentchrome/SKILL.md, or to ~/.codex/skills/agentchrome/SKILL.md when CODEX_HOME is not set. AgentChrome is self-documenting through agentchrome --help, agentchrome capabilities, and agentchrome examples, so Codex can discover the current CLI surface directly from the installed binary.
For project-local guidance, you can also add the AGENTS template:
3. Ask Codex to use the browser
"Check if the login form at localhost:3000 works correctly"
Behind the scenes, Codex will run commands like:
See the full Codex Integration Guide for workflows, efficiency tips, and error handling patterns.
For agentic tools supported by AgentChrome's built-in skill installer, use agentchrome skill install to write the tool-specific guidance file and agentchrome skill update after upgrading AgentChrome. Supported targets include Claude Code, Windsurf, Aider, Continue.dev, GitHub Copilot, Cursor, Gemini CLI, and Codex; Codex users can target it explicitly with agentchrome skill install --tool codex.
Features
Built for AI Agents
- JSON output by default — every command returns structured, parseable output
- Accessibility tree snapshots with UIDs —
page snapshotassigns stable UIDs (e.g.,s1,s5) to interactive elements for reliable targeting - Structured exit codes — 0 (success), 1 (general error), 2 (connection error), 3 (target error), 4 (timeout), 5 (protocol error) for programmatic error handling
- Self-documenting CLI —
agentchrome capabilitiesoutputs a machine-readable JSON manifest of every command, flag, and argument --include-snapshoton interactions — get the updated accessibility tree in the same response as a click or form fill, cutting round-trips in half
- Tab management — list, create, close, and activate browser tabs
- URL navigation — navigate to URLs, go back/forward, reload with wait strategies
- Page inspection — accessibility trees, text extraction, element search
- Screenshots — full-page, viewport, element, or region captures
- JavaScript execution — run scripts in page context, return results as JSON
- User interactions — click, hover, drag, type, press keys, scroll, coordinate-based drag and decomposed mouse actions
- Form filling — fill inputs, select options, upload files, batch fill with
fill-many, ARIA combobox support - Iframe/frame targeting — list frames, target specific iframes with
--frameon all commands, cross-origin OOPIF support - Media control — list, play, pause, seek audio/video elements with CSS selector or bulk targeting
- Page analysis — structure discovery with iframe detection, framework identification, overlay/blocker detection, and hit testing for click debugging
- DOM event introspection — inspect event listeners on any element via CDP
- Cookie management — list, set, delete, and clear browser cookies
- Network monitoring — list, inspect, and follow requests in real time
- Console capture — read and follow console messages with type filtering
- Device emulation — mobile devices, network/CPU throttling, geolocation, color scheme
- Performance tracing — record traces, analyze insights, measure Core Web Vitals
- Lighthouse auditing — run audits returning structured category scores with filtering
- Dialog handling — accept, dismiss, or respond to alert/confirm/prompt dialogs
Comparison
How AgentChrome stacks up for giving AI agents browser access:
| AgentChrome | Puppeteer / Playwright | Chrome DevTools MCP | |
|---|---|---|---|
| AI agent integration | CLI — works with any agent that runs shell commands | Requires JavaScript wrapper | MCP protocol (specific client required) |
| Runtime | No Node.js — native Rust binary | Node.js | Node.js |
| Install | Single binary, cargo install |
npm install |
npx |
| Interface | CLI / shell scripts | JavaScript API | MCP protocol |
| Startup time | < 50ms | ~500ms+ | Varies |
| Binary size | < 10 MB | ~100 MB+ (with deps) | Varies |
| Shell pipelines | First-class (` | jq, |
grep`) |
Architecture
Agent → CLI (clap) → Command Dispatch → CDP Client (WebSocket) → Chrome (DevTools Protocol)
AgentChrome communicates with Chrome using the Chrome DevTools Protocol (CDP) over WebSocket. Run agentchrome connect --launch --headless to start a session — subsequent commands reuse the connection automatically. Native Rust binary, <50ms startup, <10MB on disk.
Installation
Cargo install
Pre-built binaries
Download the latest release for your platform from GitHub Releases.
# macOS (Apple Silicon)
| &&
# macOS (Intel)
| &&
# Linux (x86_64)
| &&
# Linux (ARM64)
| &&
# Binary is at target/release/agentchrome
Supported platforms
| Platform | Target | Archive |
|---|---|---|
| macOS (Apple Silicon) | aarch64-apple-darwin |
.tar.gz |
| macOS (Intel) | x86_64-apple-darwin |
.tar.gz |
| Linux (x86_64) | x86_64-unknown-linux-gnu |
.tar.gz |
| Linux (ARM64) | aarch64-unknown-linux-gnu |
.tar.gz |
| Windows (x86_64) | x86_64-pc-windows-msvc |
.zip |
CLI Quick Start
For shell scripting and manual use:
1. Install AgentChrome (see Installation above)
2. Start Chrome with remote debugging enabled:
# macOS
# Linux
# Or launch headless Chrome directly via AgentChrome:
3. Connect to Chrome:
4. Navigate to a URL:
5. Inspect the page:
Command Reference
| Command | Description |
|---|---|
connect |
Connect to or launch a Chrome instance |
tabs |
Tab management (list, create, close, activate) |
navigate |
URL navigation and history |
page |
Page inspection (screenshot, text, accessibility tree, find, analyze, hittest, frames, workers) |
dom |
DOM inspection, manipulation, and event listener introspection |
js |
JavaScript execution in page context |
console |
Console message reading and monitoring |
network |
Network request monitoring and interception |
interact |
Mouse, keyboard, scroll, and coordinate-based drag interactions |
form |
Form input, submission, and ARIA combobox support |
emulate |
Device and network emulation |
perf |
Performance tracing and metrics |
cookie |
Browser cookie management (list, set, delete, clear) |
dialog |
Browser dialog handling (alert, confirm, prompt, beforeunload) |
media |
Media element control (list, play, pause, seek) |
audit |
Run audits against the current page (Lighthouse) |
diagnose |
Pre-automation challenge scan (iframes, overlays, media gates, frameworks, patterns) |
skill |
Agentic tool skill installation and management |
config |
Configuration file management (show, init, path) |
completions |
Generate shell completion scripts |
examples |
Show usage examples for commands |
capabilities |
Output a machine-readable manifest of all CLI capabilities |
man |
Display man pages for AgentChrome commands |
Run agentchrome <command> --help for detailed usage, agentchrome examples <command> for practical examples, or agentchrome capabilities for the full machine-readable command manifest.
Session resilience
Long-running automation sessions survive transient WebSocket drops and idle timeouts without manual intervention.
Auto-reconnect. When a command's stored ws_url is no longer reachable but Chrome is still running on the recorded port, AgentChrome transparently re-discovers the current browser-level WebSocket URL, rewrites the session file (preserving pid, port, and active_tab_id), and retries the command within the same invocation. Diagnostics are stderr-only; stdout stays pure JSON.
Keep-alive. While a command holds the CDP session open longer than the keep-alive interval (default 30000 ms), AgentChrome sends a WebSocket Ping frame to prevent idle-timeout disconnects. A missing Pong within 10 s triggers the existing reconnect path.
Configuration precedence: --keepalive-interval <ms> flag > AGENTCHROME_KEEPALIVE_INTERVAL env var > [keepalive] interval_ms in config.toml > built-in default (30000 ms).
# Run a long command with a 60-second keep-alive
# Disable keep-alive (--no-keepalive or interval 0)
Scripting against connection-loss errors. When auto-reconnect cannot recover the session, the CLI emits a structured JSON error on stderr with a kind discriminator and a recoverable boolean:
| Condition | kind |
recoverable |
Suggested remediation |
|---|---|---|---|
| Chrome process is gone | "chrome_terminated" |
false |
agentchrome connect --launch |
| Probe failed but Chrome may still be alive | "transient" |
true |
agentchrome connect |
Scripts can branch on these fields to decide whether to relaunch Chrome or simply re-discover the connection.
Usage Examples
# Get the accessibility tree with element UIDs
# Extract visible text content
# Find elements by text or role
# Snapshot to discover form field UIDs
# Fill multiple fields at once (returns updated snapshot)
# Or fill fields individually
# Click the submit button
# Viewport screenshot
# Full-page screenshot
# Run a JavaScript expression and get the result
# Run JavaScript from a file
# List recent network requests
# Filter requests by URL pattern
# Get details for a specific request
# Scan a new URL for automation challenges before you begin
# Diagnose the already-loaded page in place (no navigation)
# Extract strategy suggestions for each matched pattern
|
# Check if the page is straightforward to automate
|
# List all scenario-based interaction strategy guides
# Show the full guide for iframe automation
# Get all strategy guides as JSON (for AI agent discovery)
# Get the full SCORM/LMS strategy as JSON
Available strategies: iframes, overlays, scorm, drag-and-drop, shadow-dom, spa-navigation-waits, react-controlled-inputs, debugging-failed-interactions, authentication-cookie-reuse, multi-tab-workflows.
Related Projects
- Chrome DevTools MCP servers — If you need MCP-based browser control rather than a CLI tool, use a CDP-backed MCP server.
Contributing
All contributions must follow the NMG-SDLC workflow without deviation. NMG-SDLC is a BDD spec-driven development toolkit that enforces a structured delivery lifecycle: issue creation, specification writing, implementation, verification, and PR creation. Contributions that bypass the SDLC process will not be accepted.
Prerequisites
- Rust 1.85.0 or later (pinned via
rust-toolchain.toml) - Chrome or Chromium (for integration testing)
- Codex with the NMG-SDLC plugin installed
Build and test
# Build
# Run tests
# Lint
# Generate man pages
Code style
This project uses strict Clippy configuration (all = "deny", pedantic = "warn") and rustfmt with the 2024 edition. All warnings must be resolved before merging.
License
Licensed under either of MIT License or Apache License, Version 2.0 at your option.