# tauri-plugin-agent-control
[](https://opensource.org/licenses/MIT)
[](https://v2.tauri.app)
**Dev-only HTTP bridge for Tauri webviews — like Chrome DevTools Protocol, but for Tauri.**
## What it does
`tauri-plugin-agent-control` exposes your Tauri webview over a local HTTP API at `http://localhost:9876`. You can snapshot the DOM, click buttons, fill inputs, evaluate JavaScript, intercept network requests, take screenshots, and more — all via simple `curl` commands or any HTTP client. The server only runs in debug builds, so it's completely stripped from production.
## Installation
### 1. Add the Rust dependency
**From crates.io:**
```toml
[dependencies]
tauri-plugin-agent-control = "0.1"
```
**Or from git:**
```toml
[dependencies]
tauri-plugin-agent-control = { git = "https://github.com/palanik1/tauri-plugin-agent-control" }
```
### 2. Register the plugin
In your `src-tauri/src/lib.rs` (where `tauri::Builder` is configured):
```rust
tauri::Builder::default()
.plugin(tauri_plugin_agent_control::init())
// ... your other plugins and handlers
.run(tauri::generate_context!())
.expect("error while running tauri application");
```
### 3. Add capability permissions
In your `src-tauri/capabilities/default.json` (create it if it doesn't exist):
```json
{
"identifier": "default",
"windows": ["main"],
"permissions": [
"agent-control:default"
]
}
```
> `agent-control:default` grants `allow-agent-control-respond` — the only Tauri command the plugin registers. It allows the JS shim to send eval results back to the Rust HTTP server.
### 4. Run in dev mode
```bash
cargo tauri dev
```
The HTTP bridge starts automatically on `http://localhost:9876` in debug builds. Verify it's running:
```bash
curl -s http://localhost:9876/health
# {"ok":true}
```
> **Production builds:** The server is completely stripped — it's gated behind `#[cfg(debug_assertions)]`. No code, no port, no surface area.
## AI Agent Setup
### Install the Amp/AI agent skill
Copy the `SKILL.md` from this package into your project's skill directory:
```bash
# Project-level (recommended)
mkdir -p .agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md
# Or user-level (available across all projects)
mkdir -p ~/.config/agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md ~/.config/agents/skills/tauri-agent-control/SKILL.md
```
Or install via npm and copy:
```bash
npm install tauri-plugin-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md
```
Once installed, your AI agent (Amp, Claude Code, etc.) can interact with your running Tauri app via natural language:
- *"Take a screenshot of the app"*
- *"Click the Submit button"*
- *"Fill in the email field with test@example.com"*
- *"Check if the sidebar is visible"*
The agent uses the skill to translate these into the appropriate `curl` commands to the HTTP bridge.
## Quick Start
**1. Snapshot the page to see what's on screen:**
```bash
curl http://localhost:9876/snapshot?format=compact
```
```
[page] My App — http://localhost:1420/
@e1 [button] "Submit" role="button"
@e2 [input type="text"] placeholder="Enter name" name="username"
@e3 [a] "Home" href="/"
```
**2. Interact with elements using their refs:**
```bash
# Fill an input
curl -X POST http://localhost:9876/fill \
-d '{"ref":"@e2","text":"hello world"}'
# Click a button
curl -X POST http://localhost:9876/click \
-d '{"ref":"@e1"}'
```
**3. Re-snapshot to see the updated state:**
```bash
curl http://localhost:9876/snapshot?format=compact
```
**4. Evaluate arbitrary JavaScript:**
```bash
curl -X POST http://localhost:9876/eval \
-d '{"code":"document.title"}'
```
## API Reference
### Snapshot
| `GET` | `/snapshot` | `?format=compact\|json` `&scope=<selector>` `&depth=<n>` `&interactive=true\|false` | Snapshot visible elements. `compact` returns a text tree, `json` returns full element data. `interactive=false` includes all elements, not just interactive ones. |
### Interactions
| `POST` | `/click` | `{"ref":"@e1"}` | Click an element |
| `POST` | `/dblclick` | `{"ref":"@e1"}` | Double-click an element |
| `POST` | `/hover` | `{"ref":"@e1"}` | Hover over an element (mouseenter + mouseover) |
| `POST` | `/focus` | `{"ref":"@e1"}` | Focus an element |
| `POST` | `/fill` | `{"ref":"@e1","text":"value"}` | Set input/textarea value directly (fires input + change) |
| `POST` | `/type` | `{"ref":"@e1","text":"value"}` | Type text character-by-character (fires keydown/keypress/input/keyup per char) |
| `POST` | `/press` | `{"key":"Enter"}` | Press a key. Supports modifiers: `"Control+a"`, `"Meta+Shift+z"` |
| `POST` | `/check` | `{"ref":"@e1"}` | Check a checkbox |
| `POST` | `/uncheck` | `{"ref":"@e1"}` | Uncheck a checkbox |
| `POST` | `/select` | `{"ref":"@e1","values":["opt1","opt2"]}` | Select options in a `<select>` element |
| `POST` | `/scroll` | `{"direction":"down","amount":500}` | Scroll the page. Optional `"selector"` to scroll a container. Directions: `up`, `down`, `left`, `right` |
| `POST` | `/scrollintoview` | `{"ref":"@e1"}` | Scroll an element into view (smooth, centered) |
| `POST` | `/drag` | `{"from":"@e1","to":"@e2"}` | Drag and drop between two elements |
| `POST` | `/upload` | `{"ref":"@e1","dataUrl":"data:..."}` | Upload a file to a file input via base64 data URL |
### Getters
| `GET` | `/get/title` | Get `document.title` |
| `GET` | `/get/url` | Get `location.href` |
| `GET` | `/get/text/{ref}` | Get `textContent` of an element |
| `GET` | `/get/html/{ref}` | Get `innerHTML` of an element |
| `GET` | `/get/value/{ref}` | Get `value` of an input/textarea |
| `GET` | `/get/attr/{ref}/{attr}` | Get an attribute value |
| `GET` | `/get/styles/{ref}` | Get computed styles (font, color, display, position, etc.) |
| `GET` | `/get/box/{ref}` | Get bounding box `{x, y, width, height}` |
| `GET` | `/get/count/{selector}` | Count elements matching a CSS selector (URL-encode the selector) |
### State Checks
| `GET` | `/is/visible/{ref}` | Check if element is visible |
| `GET` | `/is/enabled/{ref}` | Check if element is enabled (not disabled) |
| `GET` | `/is/checked/{ref}` | Check if checkbox/radio is checked |
### Wait
| `POST` | `/wait` | `{"ms":1000}` | Wait for a fixed delay |
| `POST` | `/wait` | `{"selector":".loaded"}` | Wait for a CSS selector to appear |
| `POST` | `/wait` | `{"text":"Success"}` | Wait for text to appear on the page |
| `POST` | `/wait` | `{"url":"**/dashboard"}` | Wait for URL to match (glob patterns supported) |
| `POST` | `/wait` | `{"fn":"document.querySelector('.x')"}` | Wait for a JS expression to be truthy |
| `POST` | `/wait` | `{"selector":".x","timeout":10000}` | Custom timeout (default: 5000ms) |
### Eval
| `POST` | `/eval` | `{"code":"document.title"}` | Evaluate arbitrary JavaScript and return the result |
### Console / Errors
| `GET` | `/console` | Get and drain captured console logs (log, warn, error, info, debug) |
| `GET` | `/errors` | Get and drain captured console errors only |
### Cookies
| `GET` | `/cookies` | — | Get all cookies as `{name: value}` |
| `POST` | `/cookies/set` | `{"name":"x","value":"y"}` | Set a cookie |
| `POST` | `/cookies/clear` | — | Clear all cookies |
### Storage
| `GET` | `/storage/local` | — | Get all localStorage entries |
| `GET` | `/storage/session` | — | Get all sessionStorage entries |
| `GET` | `/storage/local/{key}` | — | Get a single localStorage key |
| `GET` | `/storage/session/{key}` | — | Get a single sessionStorage key |
| `POST` | `/storage/set` | `{"type":"local","key":"k","value":"v"}` | Set a storage entry. `type` is `"local"` or `"session"` |
| `POST` | `/storage/clear` | `{"type":"local"}` | Clear all entries for a storage type |
### Navigation
| `POST` | `/back` | Navigate back (`history.back()`) |
| `POST` | `/forward` | Navigate forward (`history.forward()`) |
| `POST` | `/reload` | Reload the page (`location.reload()`) |
### Find (Semantic Locators)
| `POST` | `/find` | `{"by":"role","value":"button","name":"Submit"}` | Find by ARIA role with optional accessible name |
| `POST` | `/find` | `{"by":"text","value":"Hello"}` | Find by visible text content |
| `POST` | `/find` | `{"by":"label","value":"Email"}` | Find by associated label text or `aria-label` |
| `POST` | `/find` | `{"by":"placeholder","value":"Search..."}` | Find by placeholder attribute |
| `POST` | `/find` | `{"by":"testid","value":"submit-btn"}` | Find by `data-testid` |
| `POST` | `/find` | `{"by":"alt","value":"Logo"}` | Find by `alt` attribute |
| `POST` | `/find` | `{"by":"title","value":"Close"}` | Find by `title` attribute |
| `POST` | `/find` | `{"by":"first","value":"button"}` | First element matching selector |
| `POST` | `/find` | `{"by":"last","value":".item"}` | Last element matching selector |
| `POST` | `/find` | `{"by":"nth","value":"li","index":2}` | Nth element matching selector (0-based) |
All `/find` calls return `{"ref":"@eN","tag":"...","text":"..."}` and accept optional `"action"` (`click`, `fill`, `type`, `hover`, `focus`), `"text"` (for fill/type), and `"exact":true` for exact text matching.
### Mouse / Keyboard
| `POST` | `/mouse` | `{"action":"move","x":100,"y":200}` | Move mouse to coordinates |
| `POST` | `/mouse` | `{"action":"down","x":100,"y":200,"button":"left"}` | Mouse button down |
| `POST` | `/mouse` | `{"action":"up","x":100,"y":200}` | Mouse button up |
| `POST` | `/mouse` | `{"action":"wheel","deltaY":100}` | Mouse wheel scroll |
| `POST` | `/keydown` | `{"key":"Shift"}` | Dispatch keydown event |
| `POST` | `/keyup` | `{"key":"Shift"}` | Dispatch keyup event |
### Highlight
| `POST` | `/highlight` | `{"ref":"@e1"}` | Visually highlight an element with a red overlay (2s duration) |
### Dialogs
| `POST` | `/dialog` | `{"action":"accept"}` | Auto-accept future alert/confirm/prompt dialogs |
| `POST` | `/dialog` | `{"action":"accept","text":"input"}` | Accept with custom prompt text |
| `POST` | `/dialog` | `{"action":"dismiss"}` | Auto-dismiss future dialogs |
| `GET` | `/dialogs` | — | Get and drain captured dialog events |
### Viewport
| `POST` | `/viewport` | `{"width":375,"height":812}` | Resize the webview window |
### State
| `POST` | `/state/save` | — | Save current localStorage, sessionStorage, and cookies |
| `POST` | `/state/load` | `{"localStorage":{...},"sessionStorage":{...},"cookies":{...}}` | Restore previously saved state |
### Screenshot
| `GET` | `/screenshot` | `?path=/tmp/shot.png` `&full=true` | Take a screenshot. Default path: `/tmp/agent-control-screenshot-<ts>.png`. `full=true` resizes to full page dimensions first. |
### Recording
| `POST` | `/record/start` | Start screen recording (macOS `screencapture -v`). Returns `{pid, path}` |
| `POST` | `/record/stop` | Stop recording (sends SIGINT to screencapture process). Returns `{path}` |
### Download
| `POST` | `/download` | `{"url":"https://...","path":"/tmp/file.zip"}` | Download a file using `curl` |
### Network
| `POST` | `/network/intercept` | — | Start intercepting fetch/XHR requests |
| `POST` | `/network/reset` | — | Stop intercepting and restore original fetch/XHR |
| `GET` | `/network/requests` | — | Get captured network request log |
| `POST` | `/network/route` | `{"url":"/api/data","body":"{\"mock\":true}"}` | Mock responses for matching URLs |
| `POST` | `/network/route` | `{"url":"/api/data","abort":true}` | Block requests to matching URLs |
### Geolocation
| `POST` | `/geo` | `{"lat":37.7749,"lng":-122.4194}` | Override `navigator.geolocation` |
### Offline
| `POST` | `/offline` | `{"enabled":true}` | Simulate offline mode (`navigator.onLine` → false) |
### Headers
| `POST` | `/headers` | `{"headers":{"Authorization":"Bearer token"}}` | Inject custom headers into all future fetch requests |
### Trace
| `POST` | `/trace/start` | Start a `PerformanceObserver` trace (resource, paint, LCP, layout-shift, etc.) |
| `POST` | `/trace/stop` | Stop tracing and return collected performance entries |
### Health
| `GET` | `/health` | Returns `{"ok":true}` — useful to check if the agent-control server is running |
## Ref Lifecycle
Every call to `/snapshot` or `/find` assigns elements a ref like `@e1`, `@e2`, etc. These refs are **invalidated whenever you take a new snapshot** — the ref map is cleared and rebuilt from scratch. If the page navigates or the DOM changes significantly, take a fresh snapshot before interacting.
```bash
# 1. Snapshot → get refs
curl http://localhost:9876/snapshot?format=compact
# @e1 [button] "Save"
# 2. Use the ref
curl -X POST http://localhost:9876/click -d '{"ref":"@e1"}'
# 3. After DOM changes, snapshot again for fresh refs
curl http://localhost:9876/snapshot?format=compact
# @e1 [button] "Saved ✓" ← refs are reassigned
```
## Semantic Locators (Find)
The `/find` endpoint lets you locate elements without needing a snapshot first. It assigns a new ref to the found element and optionally performs an action in one call.
```bash
# Find a button by role and click it
curl -X POST http://localhost:9876/find \
-d '{"by":"role","value":"button","name":"Submit","action":"click"}'
# Find an input by label and fill it
curl -X POST http://localhost:9876/find \
-d '{"by":"label","value":"Email","action":"fill","text":"user@example.com"}'
# Find by test ID
curl -X POST http://localhost:9876/find \
-d '{"by":"testid","value":"search-input","action":"type","text":"query"}'
# Find the 3rd list item (0-based index)
curl -X POST http://localhost:9876/find \
-d '{"by":"nth","value":"li","index":2}'
```
Supported locator strategies: `role`, `text`, `label`, `placeholder`, `testid`, `alt`, `title`, `first`, `last`, `nth`.
Use `"exact":true` for exact text/name matching instead of substring matching.
## Network Interception
Intercept, mock, and block network requests made by the webview:
```bash
# Start intercepting
curl -X POST http://localhost:9876/network/intercept
# Mock an API response
curl -X POST http://localhost:9876/network/route \
-d '{"url":"/api/users","body":"{\"users\":[{\"name\":\"Alice\"}]}"}'
# Block requests to analytics
curl -X POST http://localhost:9876/network/route \
-d '{"url":"analytics.example.com","abort":true}'
# View captured requests
curl http://localhost:9876/network/requests
# Reset (restore original fetch/XHR)
curl -X POST http://localhost:9876/network/reset
```
The interceptor patches `window.fetch` and `XMLHttpRequest`. Tauri IPC calls (`ipc://` URLs) are automatically excluded to avoid breaking `invoke()`.
## Security
- **Debug-only**: The HTTP server is gated behind `#[cfg(debug_assertions)]` and is completely compiled out of release builds.
- **Localhost-only**: The server listens on `localhost:9876`. It is not accessible from other machines.
- **Not for production**: This plugin gives full control over the webview — eval, DOM manipulation, cookie/storage access. Never ship it in a release build.
## Platform Notes
| Screenshot (`/screenshot`) | macOS only | `screencapture` (built-in) |
| Recording (`/record/start`, `/record/stop`) | macOS only | `screencapture -v` (built-in) |
| Download (`/download`) | Cross-platform | `curl` (must be on PATH) |
## Architecture
The plugin is composed of three parts:
1. **Rust HTTP server** — A raw TCP server built with `tokio::net::TcpListener` that manually parses HTTP/1.1 requests. It runs as a spawned async task during plugin setup (debug builds only).
2. **JS shim** (`guest-js/index.js`) — Embedded at compile time via `include_str!("../guest-js/index.js")`. It's injected into the webview on every navigation using Tauri's `on_navigation` hook with a 200ms delay to let the new page settle. The shim exposes `window.__DEVTOOLS__` with all DOM interaction methods.
3. **Eval bridge** — The communication channel between Rust and JS:
- Rust generates a unique 16-hex-char request ID
- Rust calls `webview.eval()` with wrapped JS that executes the expression
- The JS result is sent back via `window.__TAURI_INTERNALS__.invoke('plugin:agent-control|agent_control_respond', {requestId, data})`
- Rust receives the result through a `tokio::sync::oneshot` channel
- A configurable timeout (default 5s) cleans up pending requests
```
┌──────────┐ HTTP ┌────────────┐ eval() ┌───────────┐
│ Client │ ───────── │ Rust TCP │ ────────── │ Webview │
│ (curl) │ │ Server │ │ (JS) │
└──────────┘ └────────────┘ invoke() └───────────┘
▲ ──────────────────────┘
│ agent_control_respond(id, data)
│ via oneshot channel
```
## License
[MIT](https://opensource.org/licenses/MIT)