tauri-plugin-agent-control 0.1.0

Dev-only HTTP bridge for observing and controlling Tauri webviews — like Chrome DevTools Protocol, but for Tauri.
docs.rs failed to build tauri-plugin-agent-control-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

tauri-plugin-agent-control

License: MIT Tauri v2

Dev-only HTTP bridge for Tauri webviews — like Chrome DevTools Protocol, but for Tauri.

What it does

tauri-plugin-agent-control exposes your Tauri webview over a local HTTP API at http://localhost:9876. You can snapshot the DOM, click buttons, fill inputs, evaluate JavaScript, intercept network requests, take screenshots, and more — all via simple curl commands or any HTTP client. The server only runs in debug builds, so it's completely stripped from production.

Installation

1. Add the Rust dependency

From crates.io:

[dependencies]
tauri-plugin-agent-control = "0.1"

Or from git:

[dependencies]
tauri-plugin-agent-control = { git = "https://github.com/palanik1/tauri-plugin-agent-control" }

2. Register the plugin

In your src-tauri/src/main.rs (or lib.rs):

fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_agent_control::init())
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}

3. Add capability permissions

In your src-tauri/capabilities/default.json (create it if it doesn't exist):

{
  "identifier": "default",
  "windows": ["main"],
  "permissions": [
    "agent-control:default"
  ]
}

agent-control:default grants allow-agent-control-respond — the only Tauri command the plugin registers. It allows the JS shim to send eval results back to the Rust HTTP server.

4. Run in dev mode

cargo tauri dev

The HTTP bridge starts automatically on http://localhost:9876 in debug builds. Verify it's running:

curl -s http://localhost:9876/health
# {"ok":true}

Production builds: The server is completely stripped — it's gated behind #[cfg(debug_assertions)]. No code, no port, no surface area.

AI Agent Setup

Install the Amp/AI agent skill

Copy the SKILL.md from this package into your project's skill directory:

# Project-level (recommended)
mkdir -p .agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md

# Or user-level (available across all projects)
mkdir -p ~/.config/agents/skills/tauri-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md ~/.config/agents/skills/tauri-agent-control/SKILL.md

Or install via npm and copy:

npm install tauri-plugin-agent-control
cp node_modules/tauri-plugin-agent-control/SKILL.md .agents/skills/tauri-agent-control/SKILL.md

Once installed, your AI agent (Amp, Claude Code, etc.) can interact with your running Tauri app via natural language:

  • "Take a screenshot of the app"
  • "Click the Submit button"
  • "Fill in the email field with test@example.com"
  • "Check if the sidebar is visible"

The agent uses the skill to translate these into the appropriate curl commands to the HTTP bridge.

Quick Start

1. Snapshot the page to see what's on screen:

curl http://localhost:9876/snapshot?format=compact
[page] My App — http://localhost:1420/
  @e1 [button] "Submit" role="button"
  @e2 [input type="text"] placeholder="Enter name" name="username"
  @e3 [a] "Home" href="/"

2. Interact with elements using their refs:

# Fill an input
curl -X POST http://localhost:9876/fill \
  -d '{"ref":"@e2","text":"hello world"}'

# Click a button
curl -X POST http://localhost:9876/click \
  -d '{"ref":"@e1"}'

3. Re-snapshot to see the updated state:

curl http://localhost:9876/snapshot?format=compact

4. Evaluate arbitrary JavaScript:

curl -X POST http://localhost:9876/eval \
  -d '{"code":"document.title"}'

API Reference

Snapshot

Method Path Query / Body Description
GET /snapshot ?format=compact|json &scope=<selector> &depth=<n> &interactive=true|false Snapshot visible elements. compact returns a text tree, json returns full element data. interactive=false includes all elements, not just interactive ones.

Interactions

Method Path Body Description
POST /click {"ref":"@e1"} Click an element
POST /dblclick {"ref":"@e1"} Double-click an element
POST /hover {"ref":"@e1"} Hover over an element (mouseenter + mouseover)
POST /focus {"ref":"@e1"} Focus an element
POST /fill {"ref":"@e1","text":"value"} Set input/textarea value directly (fires input + change)
POST /type {"ref":"@e1","text":"value"} Type text character-by-character (fires keydown/keypress/input/keyup per char)
POST /press {"key":"Enter"} Press a key. Supports modifiers: "Control+a", "Meta+Shift+z"
POST /check {"ref":"@e1"} Check a checkbox
POST /uncheck {"ref":"@e1"} Uncheck a checkbox
POST /select {"ref":"@e1","values":["opt1","opt2"]} Select options in a <select> element
POST /scroll {"direction":"down","amount":500} Scroll the page. Optional "selector" to scroll a container. Directions: up, down, left, right
POST /scrollintoview {"ref":"@e1"} Scroll an element into view (smooth, centered)
POST /drag {"from":"@e1","to":"@e2"} Drag and drop between two elements
POST /upload {"ref":"@e1","dataUrl":"data:..."} Upload a file to a file input via base64 data URL

Getters

Method Path Description
GET /get/title Get document.title
GET /get/url Get location.href
GET /get/text/{ref} Get textContent of an element
GET /get/html/{ref} Get innerHTML of an element
GET /get/value/{ref} Get value of an input/textarea
GET /get/attr/{ref}/{attr} Get an attribute value
GET /get/styles/{ref} Get computed styles (font, color, display, position, etc.)
GET /get/box/{ref} Get bounding box {x, y, width, height}
GET /get/count/{selector} Count elements matching a CSS selector (URL-encode the selector)

State Checks

Method Path Description
GET /is/visible/{ref} Check if element is visible
GET /is/enabled/{ref} Check if element is enabled (not disabled)
GET /is/checked/{ref} Check if checkbox/radio is checked

Wait

Method Path Body Description
POST /wait {"ms":1000} Wait for a fixed delay
POST /wait {"selector":".loaded"} Wait for a CSS selector to appear
POST /wait {"text":"Success"} Wait for text to appear on the page
POST /wait {"url":"**/dashboard"} Wait for URL to match (glob patterns supported)
POST /wait {"fn":"document.querySelector('.x')"} Wait for a JS expression to be truthy
POST /wait {"selector":".x","timeout":10000} Custom timeout (default: 5000ms)

Eval

Method Path Body Description
POST /eval {"code":"document.title"} Evaluate arbitrary JavaScript and return the result

Console / Errors

Method Path Description
GET /console Get and drain captured console logs (log, warn, error, info, debug)
GET /errors Get and drain captured console errors only

Cookies

Method Path Body Description
GET /cookies Get all cookies as {name: value}
POST /cookies/set {"name":"x","value":"y"} Set a cookie
POST /cookies/clear Clear all cookies

Storage

Method Path Body Description
GET /storage/local Get all localStorage entries
GET /storage/session Get all sessionStorage entries
GET /storage/local/{key} Get a single localStorage key
GET /storage/session/{key} Get a single sessionStorage key
POST /storage/set {"type":"local","key":"k","value":"v"} Set a storage entry. type is "local" or "session"
POST /storage/clear {"type":"local"} Clear all entries for a storage type

Navigation

Method Path Description
POST /back Navigate back (history.back())
POST /forward Navigate forward (history.forward())
POST /reload Reload the page (location.reload())

Find (Semantic Locators)

Method Path Body Description
POST /find {"by":"role","value":"button","name":"Submit"} Find by ARIA role with optional accessible name
POST /find {"by":"text","value":"Hello"} Find by visible text content
POST /find {"by":"label","value":"Email"} Find by associated label text or aria-label
POST /find {"by":"placeholder","value":"Search..."} Find by placeholder attribute
POST /find {"by":"testid","value":"submit-btn"} Find by data-testid
POST /find {"by":"alt","value":"Logo"} Find by alt attribute
POST /find {"by":"title","value":"Close"} Find by title attribute
POST /find {"by":"first","value":"button"} First element matching selector
POST /find {"by":"last","value":".item"} Last element matching selector
POST /find {"by":"nth","value":"li","index":2} Nth element matching selector (0-based)

All /find calls return {"ref":"@eN","tag":"...","text":"..."} and accept optional "action" (click, fill, type, hover, focus), "text" (for fill/type), and "exact":true for exact text matching.

Mouse / Keyboard

Method Path Body Description
POST /mouse {"action":"move","x":100,"y":200} Move mouse to coordinates
POST /mouse {"action":"down","x":100,"y":200,"button":"left"} Mouse button down
POST /mouse {"action":"up","x":100,"y":200} Mouse button up
POST /mouse {"action":"wheel","deltaY":100} Mouse wheel scroll
POST /keydown {"key":"Shift"} Dispatch keydown event
POST /keyup {"key":"Shift"} Dispatch keyup event

Highlight

Method Path Body Description
POST /highlight {"ref":"@e1"} Visually highlight an element with a red overlay (2s duration)

Dialogs

Method Path Body Description
POST /dialog {"action":"accept"} Auto-accept future alert/confirm/prompt dialogs
POST /dialog {"action":"accept","text":"input"} Accept with custom prompt text
POST /dialog {"action":"dismiss"} Auto-dismiss future dialogs
GET /dialogs Get and drain captured dialog events

Viewport

Method Path Body Description
POST /viewport {"width":375,"height":812} Resize the webview window

State

Method Path Body Description
POST /state/save Save current localStorage, sessionStorage, and cookies
POST /state/load {"localStorage":{...},"sessionStorage":{...},"cookies":{...}} Restore previously saved state

Screenshot

Method Path Query Description
GET /screenshot ?path=/tmp/shot.png &full=true Take a screenshot. Default path: /tmp/agent-control-screenshot-<ts>.png. full=true resizes to full page dimensions first.

Recording

Method Path Description
POST /record/start Start screen recording (macOS screencapture -v). Returns {pid, path}
POST /record/stop Stop recording (sends SIGINT to screencapture process). Returns {path}

Download

Method Path Body Description
POST /download {"url":"https://...","path":"/tmp/file.zip"} Download a file using curl

Network

Method Path Body Description
POST /network/intercept Start intercepting fetch/XHR requests
POST /network/reset Stop intercepting and restore original fetch/XHR
GET /network/requests Get captured network request log
POST /network/route {"url":"/api/data","body":"{\"mock\":true}"} Mock responses for matching URLs
POST /network/route {"url":"/api/data","abort":true} Block requests to matching URLs

Geolocation

Method Path Body Description
POST /geo {"lat":37.7749,"lng":-122.4194} Override navigator.geolocation

Offline

Method Path Body Description
POST /offline {"enabled":true} Simulate offline mode (navigator.onLine → false)

Headers

Method Path Body Description
POST /headers {"headers":{"Authorization":"Bearer token"}} Inject custom headers into all future fetch requests

Trace

Method Path Description
POST /trace/start Start a PerformanceObserver trace (resource, paint, LCP, layout-shift, etc.)
POST /trace/stop Stop tracing and return collected performance entries

Health

Method Path Description
GET /health Returns {"ok":true} — useful to check if the agent-control server is running

Ref Lifecycle

Every call to /snapshot or /find assigns elements a ref like @e1, @e2, etc. These refs are invalidated whenever you take a new snapshot — the ref map is cleared and rebuilt from scratch. If the page navigates or the DOM changes significantly, take a fresh snapshot before interacting.

# 1. Snapshot → get refs
curl http://localhost:9876/snapshot?format=compact
# @e1 [button] "Save"

# 2. Use the ref
curl -X POST http://localhost:9876/click -d '{"ref":"@e1"}'

# 3. After DOM changes, snapshot again for fresh refs
curl http://localhost:9876/snapshot?format=compact
# @e1 [button] "Saved ✓"   ← refs are reassigned

Semantic Locators (Find)

The /find endpoint lets you locate elements without needing a snapshot first. It assigns a new ref to the found element and optionally performs an action in one call.

# Find a button by role and click it
curl -X POST http://localhost:9876/find \
  -d '{"by":"role","value":"button","name":"Submit","action":"click"}'

# Find an input by label and fill it
curl -X POST http://localhost:9876/find \
  -d '{"by":"label","value":"Email","action":"fill","text":"user@example.com"}'

# Find by test ID
curl -X POST http://localhost:9876/find \
  -d '{"by":"testid","value":"search-input","action":"type","text":"query"}'

# Find the 3rd list item (0-based index)
curl -X POST http://localhost:9876/find \
  -d '{"by":"nth","value":"li","index":2}'

Supported locator strategies: role, text, label, placeholder, testid, alt, title, first, last, nth.

Use "exact":true for exact text/name matching instead of substring matching.

Network Interception

Intercept, mock, and block network requests made by the webview:

# Start intercepting
curl -X POST http://localhost:9876/network/intercept

# Mock an API response
curl -X POST http://localhost:9876/network/route \
  -d '{"url":"/api/users","body":"{\"users\":[{\"name\":\"Alice\"}]}"}'

# Block requests to analytics
curl -X POST http://localhost:9876/network/route \
  -d '{"url":"analytics.example.com","abort":true}'

# View captured requests
curl http://localhost:9876/network/requests

# Reset (restore original fetch/XHR)
curl -X POST http://localhost:9876/network/reset

The interceptor patches window.fetch and XMLHttpRequest. Tauri IPC calls (ipc:// URLs) are automatically excluded to avoid breaking invoke().

Security

  • Debug-only: The HTTP server is gated behind #[cfg(debug_assertions)] and is completely compiled out of release builds.
  • Localhost-only: The server listens on localhost:9876. It is not accessible from other machines.
  • Not for production: This plugin gives full control over the webview — eval, DOM manipulation, cookie/storage access. Never ship it in a release build.

Platform Notes

Feature Platform Dependency
Screenshot (/screenshot) macOS only screencapture (built-in)
Recording (/record/start, /record/stop) macOS only screencapture -v (built-in)
Download (/download) Cross-platform curl (must be on PATH)

Architecture

The plugin is composed of three parts:

  1. Rust HTTP server — A raw TCP server built with tokio::net::TcpListener that manually parses HTTP/1.1 requests. It runs as a spawned async task during plugin setup (debug builds only).

  2. JS shim (guest-js/index.js) — Embedded at compile time via include_str!("../guest-js/index.js"). It's injected into the webview on every navigation using Tauri's on_navigation hook with a 200ms delay to let the new page settle. The shim exposes window.__DEVTOOLS__ with all DOM interaction methods.

  3. Eval bridge — The communication channel between Rust and JS:

    • Rust generates a unique 16-hex-char request ID
    • Rust calls webview.eval() with wrapped JS that executes the expression
    • The JS result is sent back via window.__TAURI_INTERNALS__.invoke('plugin:agent-control|agent_control_respond', {requestId, data})
    • Rust receives the result through a tokio::sync::oneshot channel
    • A configurable timeout (default 5s) cleans up pending requests
┌──────────┐   HTTP    ┌────────────┐  eval()   ┌───────────┐
│  Client  │ ───────── │ Rust TCP   │ ────────── │  Webview  │
│ (curl)   │           │ Server     │            │  (JS)     │
└──────────┘           └────────────┘  invoke()  └───────────┘
                             ▲ ──────────────────────┘
                             │  agent_control_respond(id, data)
                             │  via oneshot channel

License

MIT