adk-browser
Browser automation tools for ADK-Rust agents using WebDriver (via thirtyfour).
Overview
This crate provides 46 browser automation tools as ADK Tool implementations, allowing LLM agents to interact with web pages. Tools are organized into categories and can be selectively enabled via profiles or builder toggles.
Requirements
A WebDriver-compatible server must be running:
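# ChromeDriver
chromedriver --port=4444
# Selenium (Docker)
docker run -d -p 4444:4444 selenium/standalone-chrome
# With noVNC viewer (port 7900); use the observable() config
docker run -d -p 4444:4444 -p 7900:7900 --shm-size=2g selenium/standalone-chrome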
Quick Start
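use adk_browser::{BrowserConfig, BrowserProfile, BrowserSession, BrowserToolset, minimal_browser_tools};
use std::sync::Arc;
// Create and start a browser session
let config = BrowserConfig::new().headless(true).viewport(1280, 720);
let browser = Arc::new(BrowserSession::new(config));
browser.start().await?;
// Use a profile to limit tools (recommended)
let toolset = BrowserToolset::with_profile(browser.clone(), BrowserProfile::FormFilling);
let tools = toolset.all_tools();
// Or use minimal_browser_tools() for the smallest set
let tools = minimal_browser_tools(browser.clone());
// Clean up
browser.stop().await?;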
Tool Profiles
Instead of using all 46 tools (which can overwhelm LLM context windows), use a profile:
| Profile | Tools | Use Case |
|---|---|---|
| Minimal | 19 | Navigation + interaction + extraction + wait + screenshot |
| FormFilling | 19 | Same as Minimal; intended for form-filling agents |
| Scraping | 13 | Navigation + extraction + screenshot + scroll (no interaction) |
| Full | 46 | All tools; use only when full browser control is needed |
let toolset = BrowserToolset::with_profile(browser.clone(), BrowserProfile::Scraping);
For even fewer tools, use the helper functions:
// 6 tools: navigate, click, type, extract_text, wait_for_element, screenshot
let tools = minimal_browser_tools(browser.clone());
// 7 tools: navigate, extract_text, extract_attribute, extract_links, page_info, screenshot, scroll
let tools = readonly_browser_tools(browser.clone());
Or use the builder for fine-grained control:
let toolset = BrowserToolset::new(browser.clone())
    .with_navigation(true)
    .with_interaction(true)
    .with_extraction(true)
    .with_wait(true)
    .with_screenshot(true)
    .with_js(true)
    .with_cookies(false)
    .with_windows(false)
    .with_frames(false)
    .with_actions(false);
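The profile totals can be cross-checked against the per-category sizes listed under Available Tools. The sketch below is a standalone model, not part of the crate: the `Category` enum and `total_tools` helper are hypothetical names used only to show the arithmetic.

```rust
/// Hypothetical stand-in for the tool categories named in this README.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    Navigation,
    Interaction,
    Extraction,
    Wait,
    Screenshot,
    Js,
    Cookies,
    Windows,
    Frames,
    Actions,
}

impl Category {
    /// Every category, in the order the builder toggles are listed.
    pub const ALL: [Category; 10] = [
        Category::Navigation,
        Category::Interaction,
        Category::Extraction,
        Category::Wait,
        Category::Screenshot,
        Category::Js,
        Category::Cookies,
        Category::Windows,
        Category::Frames,
        Category::Actions,
    ];

    /// Number of tools per category, copied from the "Available Tools" headings.
    pub const fn tool_count(self) -> usize {
        match self {
            Category::Navigation => 4,
            Category::Interaction => 5,
            Category::Extraction => 5,
            Category::Wait => 4,
            Category::Screenshot => 1,
            Category::Js => 4,
            Category::Cookies => 5,
            Category::Windows => 8,
            Category::Frames => 3,
            Category::Actions => 7,
        }
    }
}

/// Sum the tool counts of the enabled categories.
pub fn total_tools(enabled: &[Category]) -> usize {
    enabled.iter().map(|c| c.tool_count()).sum()
}
```

Enabling all ten categories gives the 46 tools of the Full profile, and the five categories listed for Minimal (navigation, interaction, extraction, wait, screenshot) give 19.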
Tool Response Format
All navigation tools (browser_navigate, browser_back, browser_forward, browser_refresh) and interaction tools (browser_click, browser_type, browser_clear, browser_select) include a "page" field in their JSON response containing the current page context (URL, title, and truncated page text). This gives the LLM consistent situational awareness after any browser operation.
If page context capture fails after a successful operation, the response includes a "page_context_error" field instead of "page".
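A caller that inspects tool responses has to handle both outcomes. The following is a minimal standalone sketch of that either/or shape, assuming a character-based truncation limit; `PageContext`, `PageField`, `truncate_text`, and `page_field` are illustrative names, not the crate's actual types, and the real truncation rule may differ.

```rust
/// Hypothetical model of the context reported under the "page" key.
#[derive(Debug, PartialEq)]
pub struct PageContext {
    pub url: String,
    pub title: String,
    pub text: String,
}

/// A response carries either the captured context or the capture error.
#[derive(Debug, PartialEq)]
pub enum PageField {
    Page(PageContext),
    PageContextError(String),
}

/// Keep at most `max_chars` characters, never splitting a UTF-8 code point.
pub fn truncate_text(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Build the field from a capture attempt, truncating the page text on success.
pub fn page_field(captured: Result<PageContext, String>, max_chars: usize) -> PageField {
    match captured {
        Ok(mut ctx) => {
            ctx.text = truncate_text(&ctx.text, max_chars);
            PageField::Page(ctx)
        }
        Err(reason) => PageField::PageContextError(reason),
    }
}

/// JSON key under which the field would be reported.
pub fn field_name(field: &PageField) -> &'static str {
    match field {
        PageField::Page(_) => "page",
        PageField::PageContextError(_) => "page_context_error",
    }
}
```

Modelling the two keys as one enum makes it impossible to forget the failure branch when matching on a response.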
Multi-Tenant Browser Agents
For production multi-tenant use, create a pool-backed BrowserToolset and register it with LlmAgentBuilder via .toolset(). The toolset resolves a per-user BrowserSession from the pool at each invocation using the context's user_id.
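use adk_browser::{BrowserConfig, BrowserProfile, BrowserSessionPool, BrowserToolset};
use std::sync::Arc;
// Create a session pool (shared across all invocations)
let pool = Arc::new(BrowserSessionPool::new(BrowserConfig::new(), 10));
// Pool-backed toolset: sessions are resolved per user at runtime
let toolset = BrowserToolset::with_pool(pool.clone());
// Or with a profile to limit tool categories
let toolset = BrowserToolset::with_pool_and_profile(pool.clone(), BrowserProfile::FormFilling);
// Register with an agent via .toolset()
let agent = LlmAgentBuilder::new("browser_agent")
    .model(model)
    .toolset(Arc::new(toolset))
    .build()?;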
Pool-backed toolsets resolve sessions lazily — pool.get_or_create(user_id) is called inside Toolset::tools(ctx). The synchronous all_tools() method returns an empty vec for pool-backed toolsets (with a warning log). Use Toolset::tools(ctx) or try_all_tools() instead.
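The lazy per-user lookup can be pictured with a small standalone model. This is a synchronous sketch using only the standard library; the crate's pool is async and backed by real WebDriver sessions, and the `SessionPool` and `Session` types here are hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a per-user browser session.
#[derive(Debug)]
pub struct Session {
    pub user_id: String,
}

/// Hypothetical pool keyed by user ID. Sessions are created on first use.
#[derive(Default)]
pub struct SessionPool {
    sessions: Mutex<HashMap<String, Arc<Session>>>,
}

impl SessionPool {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the existing session for `user_id`, or create one lazily.
    pub fn get_or_create(&self, user_id: &str) -> Arc<Session> {
        let mut sessions = self.sessions.lock().expect("pool mutex poisoned");
        sessions
            .entry(user_id.to_string())
            .or_insert_with(|| {
                Arc::new(Session {
                    user_id: user_id.to_string(),
                })
            })
            .clone()
    }

    /// Number of sessions currently held by the pool.
    pub fn session_count(&self) -> usize {
        self.sessions.lock().expect("pool mutex poisoned").len()
    }

    /// Drop every session, as a graceful shutdown would.
    pub fn cleanup_all(&self) {
        self.sessions.lock().expect("pool mutex poisoned").clear();
    }
}
```

Because nothing is created until a user ID is known, a method that runs without an invocation context has no session to bind tools to, which is consistent with all_tools() returning an empty vec for pool-backed toolsets.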
For direct pool access without the toolset abstraction:
let pool = BrowserSessionPool::new(BrowserConfig::new(), 10);
let session = pool.get_or_create("user_123").await?;
session.navigate("https://example.com").await?;
// Graceful shutdown
pool.cleanup_all().await;
Session Lifecycle
BrowserSession automatically starts or reconnects the WebDriver when any browser method is called. You do not need to call start() manually — all public methods that access the WebDriver go through an internal live_driver() path that calls ensure_started() first.
let browser = BrowserSession::new(BrowserConfig::new());
// No need to call start(): navigate will auto-start the session
browser.navigate("https://example.com").await?;
// If the WebDriver dies (Selenium restart, timeout, etc.),
// the next operation transparently recreates the session
browser.click("#login-button").await?; // auto-reconnects if stale
// Explicit start/stop are still available for manual control
browser.start().await?;
browser.stop().await?;
// Check health (pings WebDriver, not just Option::is_some)
if browser.is_active().await {
    // The session is healthy
}
// Always stop before dropping to avoid orphaned WebDriver sessions
browser.stop().await?;
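The auto-start and reconnect behaviour described above follows a common guard pattern: every public operation obtains the driver through one accessor that first checks health. Below is a minimal synchronous sketch of that pattern, assuming a fake in-memory driver; only the method names live_driver and ensure_started come from this README, and everything else (`Driver`, `generation`, `simulate_crash`) is hypothetical.

```rust
/// Hypothetical stand-in for a WebDriver connection.
#[derive(Debug)]
pub struct Driver {
    pub generation: u32,
    alive: bool,
}

/// Hypothetical session that starts or replaces its driver on demand.
#[derive(Debug, Default)]
pub struct Session {
    driver: Option<Driver>,
    starts: u32,
}

impl Session {
    pub fn new() -> Self {
        Self::default()
    }

    /// Create a driver if none exists or the current one is no longer alive.
    fn ensure_started(&mut self) {
        let healthy = matches!(&self.driver, Some(d) if d.alive);
        if !healthy {
            self.starts += 1;
            self.driver = Some(Driver {
                generation: self.starts,
                alive: true,
            });
        }
    }

    /// Single access path to the driver: always checks health first.
    fn live_driver(&mut self) -> &Driver {
        self.ensure_started();
        self.driver
            .as_ref()
            .expect("ensure_started always leaves a driver in place")
    }

    /// Example public operation that goes through live_driver().
    pub fn navigate(&mut self, url: &str) -> String {
        let driver = self.live_driver();
        format!("driver #{} -> {}", driver.generation, url)
    }

    /// Test helper: mark the current driver as dead.
    pub fn simulate_crash(&mut self) {
        if let Some(d) = self.driver.as_mut() {
            d.alive = false;
        }
    }

    /// How many times a driver has been created.
    pub fn start_count(&self) -> u32 {
        self.starts
    }
}
```

Routing every operation through one accessor means callers never see a stale driver, at the cost of a health check per call.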
Observable Mode (noVNC)
When using Selenium's noVNC viewer for debugging:
let config = BrowserConfig::new().observable(); // headless=false, 1280x720
Then open http://localhost:7900 to watch the browser in real-time.
Category Filtering
Fine-tune which tool categories are included:
let toolset = BrowserToolset::new(browser.clone())
    .with_navigation(true)   // navigate, back, forward, refresh
    .with_interaction(true)  // click, double_click, type, clear, select
    .with_extraction(true)   // extract_text, extract_attribute, extract_links, page_info, page_source
    .with_wait(true)         // wait_for_element, wait, wait_for_page_load, wait_for_text
    .with_screenshot(true)   // screenshot
    .with_js(true)           // evaluate_js, scroll, hover, handle_alert
    .with_cookies(false)     // get_cookies, get_cookie, add_cookie, delete_cookie, delete_all_cookies
    .with_windows(false)     // list_windows, new_tab, new_window, switch_window, close_window, etc.
    .with_frames(false)      // switch_to_frame, switch_to_parent_frame, switch_to_default_content
    .with_actions(false);    // drag_and_drop, right_click, focus, element_state, press_key, etc.
let tools = toolset.all_tools();
Available Tools (46)
Navigation (4 tools)
| Tool | Description |
|---|---|
| browser_navigate | Navigate to a URL |
| browser_back | Go back in history |
| browser_forward | Go forward in history |
| browser_refresh | Refresh current page |
Interaction (5 tools)
| Tool | Description |
|---|---|
| browser_click | Click an element (waits for clickable, returns page context) |
| browser_double_click | Double-click an element |
| browser_type | Type text into an input (optional clear_first, press_enter) |
| browser_clear | Clear an input field |
| browser_select | Select from dropdown by value, text, or index |
Extraction (5 tools)
| Tool | Description |
|---|---|
| browser_extract_text | Extract text from one or all matching elements |
| browser_extract_attribute | Get an attribute value (href, src, value, etc.) |
| browser_extract_links | Extract all links from page or container |
| browser_page_info | Get current URL and title |
| browser_page_source | Get HTML source (with max_length truncation) |
Screenshots (1 tool)
| Tool | Description |
|---|---|
| browser_screenshot | Capture page or element screenshot (optional artifact save) |
Waiting (4 tools)
| Tool | Description |
|---|---|
| browser_wait_for_element | Wait for element to appear (optional visible check) |
| browser_wait | Wait for a fixed duration (max 30s) |
| browser_wait_for_page_load | Wait for document.readyState === 'complete' |
| browser_wait_for_text | Wait for specific text to appear on page |
JavaScript (4 tools)
| Tool | Description |
|---|---|
| browser_evaluate_js | Execute JavaScript (sync or async) |
| browser_scroll | Scroll by direction, amount, or to element |
| browser_hover | Hover over an element (dispatches mouseenter + mouseover) |
| browser_handle_alert | Handle alerts/confirms/prompts (accept or dismiss) |
Cookies (5 tools)
| Tool | Description |
|---|---|
| browser_get_cookies | Get all cookies |
| browser_get_cookie | Get a specific cookie by name |
| browser_add_cookie | Add a cookie (with optional domain, path, secure, expiry) |
| browser_delete_cookie | Delete a cookie by name |
| browser_delete_all_cookies | Delete all cookies |
Windows/Tabs (8 tools)
| Tool | Description |
|---|---|
| browser_list_windows | List all windows/tabs |
| browser_new_tab | Open a new tab (optional URL) |
| browser_new_window | Open a new window (optional URL) |
| browser_switch_window | Switch to a window by handle |
| browser_close_window | Close current window |
| browser_maximize_window | Maximize window |
| browser_minimize_window | Minimize window |
| browser_set_window_size | Set window size and position |
Frames (3 tools)
| Tool | Description |
|---|---|
| browser_switch_to_frame | Switch to iframe by index or selector |
| browser_switch_to_parent_frame | Exit current iframe |
| browser_switch_to_default_content | Exit all iframes |
Advanced Actions (7 tools)
| Tool | Description |
|---|---|
| browser_drag_and_drop | Drag element to target |
| browser_right_click | Right-click (context menu) |
| browser_focus | Focus an element |
| browser_element_state | Check displayed/enabled/selected/clickable state |
| browser_press_key | Press keyboard key with optional modifiers (Ctrl, Alt, Shift, Meta) |
| browser_file_upload | Upload file to input element |
| browser_print_to_pdf | Print page to PDF (base64) |
Configuration
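let config = BrowserConfig::new()
    .webdriver_url("http://localhost:4444")
    .browser(BrowserType::Chrome)
    .headless(true)
    .viewport(1280, 720)
    .page_load_timeout(30)
    .user_agent("Custom Agent/1.0")
    .add_arg("--disable-gpu");
// For noVNC-compatible viewing (headless=false, 1280x720)
let observable_config = BrowserConfig::new().observable();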
Element Selectors
Tools that target elements accept CSS selectors:
#login-button // By ID
.submit-btn // By class
input[type='email'] // By attribute
[data-testid='search'] // By data attribute
form.login input[name='password'] // Complex selector
WebDriver Servers
| Server | Command |
|---|---|
| Selenium (Chrome) | docker run -d -p 4444:4444 selenium/standalone-chrome |
| Selenium + noVNC | docker run -d -p 4444:4444 -p 7900:7900 --shm-size=2g selenium/standalone-chrome |
| Selenium (Firefox) | docker run -d -p 4444:4444 selenium/standalone-firefox |
| ChromeDriver | chromedriver --port=4444 |
| GeckoDriver | geckodriver --port=4444 |
Architecture
┌─────────────────────────────────────────────────┐
│ LlmAgent │
│ .toolset(browser_toolset) or .tool(...) │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ BrowserToolset (impl Toolset) │
│ Fixed session or pool-backed per-user session │
│ Profile / builder-based tool selection │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ BrowserSession / BrowserSessionPool │
│ Auto-start and reconnect via ensure_started() │
└─────────────────────────────────────────────────┘
│
▼
WebDriver Server
(ChromeDriver, Selenium)
Shutdown
Always stop sessions before exiting to avoid orphaned WebDriver processes:
// Single session
browser.stop().await?;
// Session pool
pool.cleanup_all().await;
// With tokio shutdown signal
tokio::select! {
    _ = tokio::signal::ctrl_c() => {
        pool.cleanup_all().await;
    }
}
License
Apache-2.0
Part of ADK-Rust
This crate is part of the ADK-Rust framework for building AI agents in Rust.