chromium-bridge 0.2.1

CLI bridging agents to Chromium-based browsers via Chrome DevTools Protocol
# chromium-bridge — Specification

## Purpose

A Rust CLI that bridges AI agents and scripts to Chromium-based browsers via the Chrome DevTools Protocol (CDP). Provides deterministic, scriptable operations for browser automation without requiring a full MCP server.

## Architecture

```
Agent / Script
chromium-bridge CLI
    ▼ HTTP + WebSocket
CDP endpoint (127.0.0.1:9222)
Brave / Chrome / Chromium
```

### CDP Communication

- **HTTP API** (`/json/*` endpoints): Tab listing, version info, health checks
- **WebSocket API** (`ws://...`): Page-level commands (navigate, evaluate, screenshot)

## Commands

### `check`
Health check. Hits `/json/version`, prints browser name and protocol version. Exit 0 if responding, exit 1 if not.

### `list`
Lists open tabs. Hits `/json/list`, outputs tab title + URL. Supports `--json` for machine-readable output.

### `navigate <url>`
Opens a URL. Creates a new tab via `/json/new?<url>` or navigates the active tab via WebSocket `Page.navigate`.

### `evaluate <expression>`
Runs JavaScript in the active tab via WebSocket `Runtime.evaluate`. Prints the return value to stdout.

### `screenshot [url]`
Captures a screenshot via WebSocket `Page.captureScreenshot`. If URL is provided, navigates first. Outputs PNG to stdout or file via `--output`.

### `markdown <url>`
Navigates to URL, extracts page content, converts to clean markdown. Uses `Readability`-style extraction via injected JS.

### `click <selector>`
Clicks an element by CSS selector. Resolves the element via `DOM.querySelector`, computes the center coordinates from the content quad via `DOM.getBoxModel`, and dispatches real mouse events (mouseMoved, mousePressed, mouseReleased) at those coordinates.

### `type <selector> <text>`
Types text into an element. Focuses the element via CSS selector, then sends text via CDP `Input.insertText`. Double-newlines (`\n\n`) are converted to Shift+Enter keypresses for paragraph breaks in contenteditable fields (LinkedIn, Gmail, Slack, Messenger).

### `select-tab <pattern>`
Activates a browser tab by bringing it to the foreground. Accepts a numeric index or substring pattern matching against URL or title. Uses HTTP `GET /json/activate/{tab_id}`.

### `wait <selector>`
Polls for a CSS selector to appear in the DOM. Default timeout: 10 seconds (`--wait-timeout <ms>`). Polls every 250ms.

### `snapshot`
Dumps the page accessibility tree via CDP `Accessibility.getFullAXTree`. Human-readable output shows `[role] name` for each node. Supports `--depth <n>` to limit tree depth and `--json` for full AXNode array.

### `skill install`
Installs the bundled SKILL.md to the project's `.claude/skills/chromium-bridge/` directory. Detects the project root via git superproject or toplevel. The SKILL.md is embedded at build time via `include_str!()`.

### `setup`
Interactive setup wizard:
1. Detects installed Chromium browsers (Brave, Chrome, Chromium)
2. Checks for `--remote-debugging-port` in the appropriate flags file
3. Verifies the port is responding

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CHROMIUM_BRIDGE_PORT` | `9222` | CDP port |
| `CHROMIUM_BRIDGE_HOST` | `127.0.0.1` | CDP host |

### CLI Flags

- `--port <N>` — override CDP port
- `--host <addr>` — override CDP host
- `--json` — machine-readable JSON output
- `--timeout <ms>` — connection timeout (default: 5000)

## Dependencies

- `tokio` — async runtime
- `reqwest` — HTTP client for CDP REST API
- `cdpkit` — type-safe CDP client (auto-generated typed bindings for all CDP domains)
- `clap` — CLI argument parsing
- `serde` / `serde_json` — JSON serialization
- `base64` — screenshot decoding
- `futures` — event stream subscription

## Error Handling

- Connection refused → "Browser not responding on {host}:{port}. Is remote debugging enabled?"
- WebSocket timeout → "Command timed out after {timeout}ms"
- Invalid JS → Forward CDP error message

## Security

- CDP port binds to localhost only — no external exposure
- No secrets handled by the CLI itself
- Screenshot output goes to local files only