Browsing
Lightweight MCP/API for browser automation
A concise MCP server and Rust library providing navigate, get_links, follow_link, list_content (links and images), get_content, get_image, save_content, and screenshot (full page or a single element). The browser is initialized lazily on first use, and read operations can run in parallel behind an RwLock.
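The lazy-initialization and RwLock pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `LazyBrowser` and `Session` are hypothetical names, and a real implementation would launch or connect to a browser inside the initializer instead of building an in-memory struct.

```rust
use std::sync::{OnceLock, RwLock};
use std::thread;

/// Hypothetical stand-in for browser session state (not the crate's real type).
struct Session {
    current_url: String,
}

/// Sketch: the session is created on first access, and all later access goes
/// through an RwLock so readers can proceed in parallel.
struct LazyBrowser {
    session: OnceLock<RwLock<Session>>,
}

impl LazyBrowser {
    const fn new() -> Self {
        Self { session: OnceLock::new() }
    }

    /// Initializes the session on the first call; later calls reuse it.
    fn session(&self) -> &RwLock<Session> {
        self.session.get_or_init(|| {
            // A real implementation would launch or connect to a browser here.
            RwLock::new(Session { current_url: "about:blank".to_string() })
        })
    }

    fn is_started(&self) -> bool {
        self.session.get().is_some()
    }

    /// Read path: takes a shared lock, so many callers can read at once.
    fn current_url(&self) -> String {
        self.session().read().unwrap().current_url.clone()
    }

    /// Write path: takes the exclusive lock.
    fn navigate(&self, url: &str) {
        self.session().write().unwrap().current_url = url.to_string();
    }
}

fn main() {
    let browser = LazyBrowser::new();
    println!("started before first use? {}", browser.is_started());
    browser.navigate("https://www.rust-lang.org");
    // Several threads read concurrently under the shared lock.
    thread::scope(|s| {
        for i in 0..4 {
            let browser = &browser;
            s.spawn(move || println!("reader {i}: {}", browser.current_url()));
        }
    });
}
```

`OnceLock` runs the initializer at most once even if several threads race on first access; after that, reads share the lock and only writes such as navigation wait for exclusive access.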
🎯 Usage Modes
- MCP Server (primary) - navigate, get_links, follow_link, list_content, get_content, get_image, save_content, screenshot, and generate_sitemap tools for AI assistants
- ⌨️ CLI - Autonomous browsing tasks
- 📦 Library - Full agent system with LLM, custom actions
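MCP clients talk to a server over JSON-RPC 2.0 and invoke tools with the `tools/call` method. The sketch below only builds the request strings an assistant might send for a navigate, get_links, follow_link, screenshot sequence. The argument names (`url`, `index`, `selector`) are assumptions, since the tools' input schemas are not documented here.

```rust
/// Builds an MCP `tools/call` request (JSON-RPC 2.0).
/// `arguments_json` must already be a JSON object.
fn tool_call(id: u64, name: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{name}","arguments":{arguments_json}}}}}"#
    )
}

fn main() {
    // Hypothetical argument names; a client should read the real input
    // schemas from the server's `tools/list` response.
    let plan = [
        tool_call(1, "navigate", r#"{"url":"https://www.rust-lang.org"}"#),
        tool_call(2, "get_links", "{}"),
        tool_call(3, "follow_link", r#"{"index":1}"#),
        tool_call(4, "screenshot", r#"{"selector":"main"}"#),
    ];
    for request in &plan {
        println!("{request}");
    }
}
```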
✨ Why Browsing?
Building AI agents that can navigate and interact with websites is challenging. You need to:
- Extract structured data from unstructured HTML - Parse complex DOM trees and make them LLM-readable
- Handle browser automation reliably - Manage browser lifecycle, CDP connections, and process management
- Coordinate multiple subsystems - Orchestrate DOM extraction, LLM inference, and action execution
- Maintain testability - Mock components for unit testing without real browsers
- Support extensibility - Add custom actions, browser backends, and LLM providers
Browsing addresses each of these with a modular, trait-based architecture backed by an extensive test suite.
🎯 Key Features
🏗️ Trait-Based Architecture
- BrowserClient trait - Abstract browser operations for easy mocking and alternative backends
- DOMProcessor trait - Pluggable DOM processing implementations
- ActionHandler trait - Extensible action system for custom behaviors
🤖 Autonomous Agent System
- Complete agent execution loop with LLM integration
- Robust action parsing with JSON repair
- History tracking with state snapshots
- Graceful error handling and recovery
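Action parsing has to cope with LLM replies that wrap JSON in prose, or that stop mid-object. The following is a simplified sketch of one such strategy and is not the crate's `json_extractor.rs`: find the first `{`, track nesting while skipping string contents, and if the text ends early, close any open string and containers. The function name is hypothetical, and the repair only covers truncation at a value boundary and a dangling trailing comma.

```rust
/// Extracts the first top-level JSON object from free-form LLM output.
/// If the output is cut off mid-object, closes any open string, array, and object.
/// Simplified sketch: the name and the repair rules are illustrative only.
fn extract_json_object(text: &str) -> Option<String> {
    let start = text.find('{')?;
    let mut closers: Vec<char> = Vec::new(); // expected closing brackets, innermost last
    let mut in_string = false;
    let mut escaped = false;

    for (i, c) in text[start..].char_indices() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => closers.push('}'),
            '[' => closers.push(']'),
            '}' | ']' => {
                if closers.pop() != Some(c) {
                    return None; // mismatched bracket: give up rather than guess
                }
                if closers.is_empty() {
                    return Some(text[start..start + i + c.len_utf8()].to_string());
                }
            }
            _ => {}
        }
    }

    // Reached the end with the object still open: repair the truncation.
    let mut repaired = text[start..].to_string();
    if in_string {
        if escaped {
            repaired.pop(); // drop a dangling backslash so the closing quote is not escaped
        }
        repaired.push('"');
    } else {
        // Drop trailing whitespace and a dangling comma, e.g. `{"a": 1,`.
        let keep = repaired.trim_end().trim_end_matches(',').len();
        repaired.truncate(keep);
    }
    while let Some(closer) = closers.pop() {
        repaired.push(closer);
    }
    Some(repaired)
}

fn main() {
    let reply = "Sure! Here is the action: {\"action\": \"navigate\", \"url\": \"https://example.com\"} Done.";
    println!("{:?}", extract_json_object(reply));
    println!("{:?}", extract_json_object("{\"a\": {\"b\": [1, 2"));
}
```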
Full Browser Automation
- Cross-platform support (macOS, Linux, Windows)
- Automatic browser detection
- Chrome DevTools Protocol (CDP) integration
- Tab management (create, switch, close)
- Screenshot capture (page and element-level)
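At the wire level, CDP is a JSON message protocol carried over a WebSocket. Each command has an `id`, a `method` such as `Page.navigate`, and a `params` object, and the browser echoes the `id` in its response. The sketch below only builds those messages. `CdpCommands` and `json_escape` are hypothetical helpers, and the WebSocket transport is omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Escapes a string for embedding inside a JSON string literal (minimal, for illustration).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical command builder. Each command gets a unique, increasing `id`
/// so responses read from the WebSocket can be matched to their requests.
struct CdpCommands {
    next_id: AtomicU64,
}

impl CdpCommands {
    fn new() -> Self {
        Self { next_id: AtomicU64::new(1) }
    }

    /// `method` is assumed to be a trusted constant; `params_json` must be a JSON object.
    fn command(&self, method: &str, params_json: &str) -> (u64, String) {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        (id, format!(r#"{{"id":{id},"method":"{method}","params":{params_json}}}"#))
    }

    fn navigate(&self, url: &str) -> (u64, String) {
        self.command("Page.navigate", &format!(r#"{{"url":"{}"}}"#, json_escape(url)))
    }
}

fn main() {
    let cdp = CdpCommands::new();
    let (id, msg) = cdp.navigate("https://www.rust-lang.org");
    println!("#{id}: {msg}");
}
```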
Advanced DOM Processing
- Full CDP integration (DOM, AX tree, Snapshot)
- LLM-ready serialization with interactive element indices
- Accessibility tree support for better semantic understanding
- Optimized for token efficiency
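One way to make a DOM "LLM-ready" with interactive element indices is to walk the tree and give each clickable or typeable element a number the model can cite in its next action, while dropping empty containers to save tokens. This is a hypothetical, simplified sketch: the `Node` type and the `[index]<tag>text</tag>` line format are illustrative assumptions, not the crate's serializer output.

```rust
/// Hypothetical simplified DOM node; the crate's real tree types will differ.
struct Node {
    tag: &'static str,
    text: String,
    interactive: bool,
    children: Vec<Node>,
}

fn el(tag: &'static str, text: &str, interactive: bool, children: Vec<Node>) -> Node {
    Node { tag, text: text.to_string(), interactive, children }
}

/// Depth-first walk that emits one compact line per useful node:
/// interactive elements get a numeric index the LLM can refer to,
/// plain text is kept without markup, and empty containers are skipped.
fn serialize(node: &Node, next_index: &mut usize, out: &mut Vec<String>) {
    let text = node.text.trim();
    if node.interactive {
        out.push(format!("[{}]<{}>{}</{}>", next_index, node.tag, text, node.tag));
        *next_index += 1;
    } else if !text.is_empty() {
        out.push(text.to_string());
    }
    for child in &node.children {
        serialize(child, next_index, out);
    }
}

fn main() {
    let page = el("body", "", false, vec![
        el("h1", "Welcome", false, vec![]),
        el("a", "Docs", true, vec![]),
        el("div", "", false, vec![el("button", " Sign in ", true, vec![])]),
    ]);
    let mut next_index = 0;
    let mut lines = Vec::new();
    serialize(&page, &mut next_index, &mut lines);
    println!("{}", lines.join("\n"));
}
```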
🔧 Extensible & Maintainable
- Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
- Custom action registration
- Utility traits for reduced code duplication
- Comprehensive test coverage (300+ tests)
📦 Installation
As a Library
[dependencies]
browsing = "0.1"
tokio = { version = "1.40", features = ["full"] }
As a CLI Tool
As an MCP Server
Quick Start
1️⃣ CLI Usage
# Run an autonomous browsing task
# Launch a browser and get CDP URL
# Connect to existing browser
2️⃣ MCP Server Usage
Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
Then ask Claude:
"Navigate to rust-lang.org, get the links, follow the second link, and screenshot the main content area"
3️⃣ Library Usage
use Result;
use ;
async
Full Library Documentation
Browser Launch Options
use ;
// Option 1: Auto-launch browser (default)
let profile = default;
let browser = new;
// Option 2: Connect to existing browser
let browser = new
.with_cdp_url;
// Option 3: Custom browser executable
use BrowserLauncher;
let launcher = new
.with_executable_path;
Using Traits for Testing
use ;
use Agent;
use Arc;
// Create mock browser for testing
async
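The point of the `BrowserClient` abstraction is that code under test depends on a trait object, so a mock can stand in for a real browser. The sketch below uses a hypothetical, synchronous two-method trait to stay dependency-free; the crate's real trait is likely async and has a different method set.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, synchronous stand-in for a BrowserClient-style trait.
trait BrowserClient: Send + Sync {
    fn navigate(&self, url: &str) -> Result<(), String>;
    fn page_title(&self) -> Result<String, String>;
}

/// Test double that records calls instead of driving a real browser.
#[derive(Default)]
struct MockBrowser {
    visited: Mutex<Vec<String>>,
}

impl BrowserClient for MockBrowser {
    fn navigate(&self, url: &str) -> Result<(), String> {
        self.visited.lock().unwrap().push(url.to_string());
        Ok(())
    }

    fn page_title(&self) -> Result<String, String> {
        match self.visited.lock().unwrap().last() {
            Some(url) => Ok(format!("Mock page for {url}")),
            None => Err("no page loaded".to_string()),
        }
    }
}

/// Code under test depends only on the trait object, so any implementation can be injected.
struct TitleFetcher {
    browser: Arc<dyn BrowserClient>,
}

impl TitleFetcher {
    fn fetch_title(&self, url: &str) -> Result<String, String> {
        self.browser.navigate(url)?;
        self.browser.page_title()
    }
}

fn main() {
    let mock = Arc::new(MockBrowser::default());
    let fetcher = TitleFetcher { browser: mock.clone() };
    println!("{:?}", fetcher.fetch_title("https://example.com"));
    println!("calls recorded: {:?}", mock.visited.lock().unwrap());
}
```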
Usage Examples
Content Download
use ;
use DOMProcessorImpl;
use DOMProcessor;
async
Run this example:
Screenshot Capture
use Browser;
let browser = new;
browser.start.await?;
// Full page screenshot
let screenshot_data = browser.take_screenshot.await?;
// Viewport only
let viewport = browser.take_screenshot.await?;
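At the protocol level, the difference between a viewport, full-page, and element screenshot comes down to the parameters sent with CDP's `Page.captureScreenshot`. This sketch builds those parameters. The `ScreenshotTarget` enum is hypothetical, and it assumes the page's content size (for example from `Page.getLayoutMetrics`) and the element's bounding box have already been fetched.

```rust
/// What to capture. Hypothetical type mirroring the full/element choice.
enum ScreenshotTarget {
    /// Only what is currently visible.
    Viewport,
    /// Whole document; size taken from e.g. `Page.getLayoutMetrics`.
    FullPage { content_width: f64, content_height: f64 },
    /// An element's bounding box in page coordinates.
    Element { x: f64, y: f64, width: f64, height: f64 },
}

/// Builds the `params` object for CDP's `Page.captureScreenshot`.
fn capture_params(target: &ScreenshotTarget) -> String {
    let with_clip = |x: f64, y: f64, w: f64, h: f64| {
        format!(
            r#"{{"format":"png","captureBeyondViewport":true,"clip":{{"x":{x},"y":{y},"width":{w},"height":{h},"scale":1}}}}"#
        )
    };
    match *target {
        ScreenshotTarget::Viewport => r#"{"format":"png"}"#.to_string(),
        ScreenshotTarget::FullPage { content_width, content_height } => {
            with_clip(0.0, 0.0, content_width, content_height)
        }
        ScreenshotTarget::Element { x, y, width, height } => with_clip(x, y, width, height),
    }
}

fn main() {
    println!("{}", capture_params(&ScreenshotTarget::Viewport));
    println!("{}", capture_params(&ScreenshotTarget::FullPage { content_width: 1280.0, content_height: 4000.0 }));
    println!("{}", capture_params(&ScreenshotTarget::Element { x: 10.5, y: 200.0, width: 300.0, height: 150.0 }));
}
```

A viewport capture omits `clip`; the other two pass a `clip` rectangle and set `captureBeyondViewport` so content outside the visible area is rendered.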
Direct Browser Control
use ;
async
Custom Actions
use ;
use ActionModel;
use Result;
;
// Register custom action
agent.tools.register_custom_action;
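Custom action registration boils down to a name-to-handler map consulted at execution time. This is a minimal sketch with hypothetical types: parameters are plain strings here, whereas the crate's `ActionModel`-based handlers will look different.

```rust
use std::collections::HashMap;

/// Hypothetical handler signature: string parameters in, result message out.
type ActionFn = Box<dyn Fn(&HashMap<String, String>) -> Result<String, String> + Send + Sync>;

#[derive(Default)]
struct ActionRegistry {
    actions: HashMap<String, ActionFn>,
}

impl ActionRegistry {
    /// Registers (or replaces) a handler under `name`.
    fn register(
        &mut self,
        name: &str,
        handler: impl Fn(&HashMap<String, String>) -> Result<String, String> + Send + Sync + 'static,
    ) {
        self.actions.insert(name.to_string(), Box::new(handler));
    }

    /// Looks up the handler by name and runs it.
    fn execute(&self, name: &str, params: &HashMap<String, String>) -> Result<String, String> {
        let handler = self
            .actions
            .get(name)
            .ok_or_else(|| format!("unknown action: {name}"))?;
        handler(params)
    }
}

fn main() {
    let mut registry = ActionRegistry::default();
    registry.register("greet", |params| {
        let name = params.get("name").map(String::as_str).unwrap_or("world");
        Ok(format!("Hello, {name}!"))
    });
    let mut params = HashMap::new();
    params.insert("name".to_string(), "Rust".to_string());
    println!("{:?}", registry.execute("greet", &params));
    println!("{:?}", registry.execute("missing", &params));
}
```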
🏗️ Architecture
Browsing follows SOLID principles with a focus on separation of concerns, testability, and maintainability.
┌─────────────────────────────────────────────────────────────┐
│                            Agent                            │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐    │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │    │
│  │   (trait)   │   (trait)    │   (trait)    │         │    │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘    │
│         │             │              │            │         │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
    ┌─────▼──────┐  ┌───▼────┐     ┌───▼────┐  ┌────▼──────┐
    │ Browser    │  │ DomSvc │     │  LLM   │  │ Handlers  │
    │            │  │        │     │        │  │           │
    │ TabManager │  │ CDP    │     │ Chat   │  │ Navigation│
    │ NavManager │  │ HTML   │     │ Model  │  │Interaction│
    │ Screenshot │  │ Tree   │     │        │  │ Tabs      │
    │            │  │ Builder│     │        │  │ Content   │
    └────────────┘  └────────┘     └────────┘  └───────────┘
Key Components
| Component | Responsibility | Trait-Based |
|---|---|---|
| Agent | Orchestrates browser, LLM, and DOM processing | Uses BrowserClient, DOMProcessor |
| Browser | Manages browser session and lifecycle | Implements BrowserClient |
| DOMProcessor | Extracts and serializes DOM | Implements DOMProcessor |
| Tools | Action registry and execution | Uses BrowserClient trait |
| Handlers | Specific action implementations | Use ActionHandler trait |
Project Structure
browsing/
├── src/
│   ├── agent/                    # Agent orchestration
│   │   ├── service.rs            # Main agent implementation
│   │   └── json_extractor.rs     # JSON parsing utilities
│   ├── browser/                  # Browser management
│   │   ├── session.rs            # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs        # Tab operations
│   │   ├── navigation.rs         # Navigation operations
│   │   ├── screenshot.rs         # Screenshot operations
│   │   ├── cdp.rs                # CDP WebSocket client
│   │   ├── launcher.rs           # Browser launcher
│   │   └── profile.rs            # Browser configuration
│   ├── dom/                      # DOM processing
│   │   ├── processor.rs          # DOMProcessor trait impl
│   │   ├── serializer.rs         # LLM-ready serialization
│   │   ├── tree_builder.rs       # DOM tree construction
│   │   ├── cdp_client.rs         # CDP wrapper for DOM
│   │   └── html_converter.rs     # HTML to markdown
│   ├── tools/                    # Action system
│   │   ├── service.rs            # Tools registry
│   │   ├── handlers/             # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs             # Parameter extraction
│   ├── traits/                   # Core trait abstractions
│   │   ├── browser_client.rs     # BrowserClient trait
│   │   └── dom_processor.rs      # DOMProcessor trait
│   ├── llm/                      # LLM integration
│   │   └── base.rs               # ChatModel trait
│   ├── actor/                    # Low-level interactions
│   │   ├── page.rs               # Page operations
│   │   ├── element.rs            # Element operations
│   │   └── mouse.rs              # Mouse interactions
│   ├── config/                   # Configuration
│   ├── error/                    # Error types
│   └── utils/                    # Utilities
└── Cargo.toml
🎨 Design Principles
Trait-Based Design
- BrowserClient - Abstract browser operations for testing and alternative backends
- DOMProcessor - Pluggable DOM processing implementations
- ActionHandler - Extensible action system
- ChatModel - LLM provider abstraction
Separation of Concerns
- TabManager - Tab operations (create, switch, close)
- NavigationManager - Navigation logic
- ScreenshotManager - Screenshot capture
- Handlers - Focused action implementations
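As an illustration of keeping tab logic in its own manager, the bookkeeping behind create, switch, and close fits in one small type. This is a hypothetical sketch that tracks integer tab ids only; a real `TabManager` would also issue CDP target commands, and the rule for picking the next active tab after a close is an assumption.

```rust
/// Hypothetical tab bookkeeping: open tab ids plus the index of the active one.
#[derive(Default)]
struct TabManager {
    tabs: Vec<u32>,
    active: Option<usize>,
    next_id: u32,
}

impl TabManager {
    /// Opens a tab and makes it active; returns its id.
    fn create(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.tabs.push(id);
        self.active = Some(self.tabs.len() - 1);
        id
    }

    fn switch(&mut self, id: u32) -> Result<(), String> {
        let pos = self.position(id)?;
        self.active = Some(pos);
        Ok(())
    }

    /// Closes a tab. If it was active, focus moves to the tab that took its
    /// place, or to the last tab if the closed one was at the end.
    fn close(&mut self, id: u32) -> Result<(), String> {
        let pos = self.position(id)?;
        self.tabs.remove(pos);
        self.active = match self.active {
            _ if self.tabs.is_empty() => None,
            Some(a) if a > pos => Some(a - 1),
            Some(a) if a == pos => Some(pos.min(self.tabs.len() - 1)),
            other => other,
        };
        Ok(())
    }

    fn active_tab(&self) -> Option<u32> {
        self.active.map(|i| self.tabs[i])
    }

    fn position(&self, id: u32) -> Result<usize, String> {
        self.tabs
            .iter()
            .position(|&t| t == id)
            .ok_or_else(|| format!("no tab {id}"))
    }
}

fn main() {
    let mut tabs = TabManager::default();
    let first = tabs.create();
    let _second = tabs.create();
    tabs.switch(first).unwrap();
    tabs.close(first).unwrap();
    println!("active tab: {:?}", tabs.active_tab());
}
```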
DRY (Don't Repeat Yourself)
- ActionParams - Reusable parameter extraction
- JSONExtractor - Centralized JSON parsing
- SessionGuard - Unified session access
KISS (Keep It Simple, Stupid)
- Split complex methods into focused helpers
- Clear naming and single responsibility
- Minimal dependencies between modules
🧪 Testing
# Run all tests
# Run with output
# Run specific test
# Run integration tests only
Test Coverage
- 317 tests across all modules (all passing)
- 50+ integration tests for full workflow
- 150+ unit tests for individual components
- Test files:
  - actor_test.rs - Page, Element, Mouse, Keyboard operations (23 passed)
  - browser_managers_test.rs - Navigation, Screenshot, Tab managers
  - tools_handlers_test.rs - All action handlers (49 passed)
  - agent_service_test.rs - Agent execution logic (32 passed)
  - agent_execution_test.rs - Agent workflow tests (11 passed)
  - traits_test.rs - BrowserClient, DOMProcessor traits (24 passed)
  - utils_test.rs - URL extraction, signal handling (49 passed)
- Mock implementations for deterministic testing
- Trait-based mocking for browser/DOM components
⚠️ Data Retention Policy
Browser Data is NEVER Deleted
IMPORTANT: The browsing library never deletes browser data for safety reasons.
What This Means:
| Data Type | Behavior |
|---|---|
| Bookmarks | Never deleted |
| History | Never deleted |
| Cookies | Never deleted |
| Passwords | Never deleted |
| Extensions | Never deleted |
| Cache | Never deleted |
| Temp Directories | Never deleted (left in /tmp/) |
Why This Policy Exists:
- User Safety: Users may specify a custom user_data_dir pointing to their real browser profile
- Catastrophe Prevention: Accidentally deleting a user's real browser data (bookmarks, history, passwords) would be devastating
- Debugging: Leaving temp directories allows inspection after crashes or failures
- User Control: Users are responsible for managing their own browser data
How It Works:
When no user_data_dir is specified:
let profile = BrowserProfile ;
When browser.stop() is called:
- Browser process is killed
- In-memory state is cleared
- User data directory is NOT deleted
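The retention policy above comes down to three steps: choose a directory, use it, and never remove it on shutdown. This is a hypothetical sketch: `resolve_user_data_dir`, the `browsing-profile-<pid>` directory name, and the `Session` struct are illustrative only, and killing the browser process is left as a comment.

```rust
use std::path::{Path, PathBuf};

/// Returns the profile directory to use and whether it was auto-created.
/// Hypothetical helper; the `browsing-profile-<pid>` name is an assumption.
fn resolve_user_data_dir(configured: Option<&Path>) -> std::io::Result<(PathBuf, bool)> {
    match configured {
        // A caller-supplied directory is used as-is; nothing here ever deletes it.
        Some(dir) => Ok((dir.to_path_buf(), false)),
        // Otherwise create a fresh directory under std::env::temp_dir().
        None => {
            let dir = std::env::temp_dir().join(format!("browsing-profile-{}", std::process::id()));
            std::fs::create_dir_all(&dir)?;
            Ok((dir, true))
        }
    }
}

/// Hypothetical session state.
struct Session {
    user_data_dir: PathBuf,
    open_tabs: Vec<String>,
}

impl Session {
    /// Shuts down without touching the profile directory.
    fn stop(&mut self) {
        // A real implementation would kill the browser process here.
        self.open_tabs.clear();
        // Deliberately no std::fs::remove_dir_all(&self.user_data_dir).
    }
}

fn main() -> std::io::Result<()> {
    let (dir, auto_created) = resolve_user_data_dir(None)?;
    let mut session = Session { user_data_dir: dir, open_tabs: vec!["about:blank".to_string()] };
    session.stop();
    println!("auto-created: {auto_created}; still on disk: {}", session.user_data_dir.display());
    Ok(())
}
```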
Managing Temporary Data:
Users are responsible for cleanup:
# List browser temp directories
# Delete old temp directories (optional, manual cleanup)
Using a Custom Data Directory:
let profile = BrowserProfile ;
Warning: If you point to your real browser profile, the library will NOT protect it. You're responsible for that directory.
🔧 Configuration
Browser Profile
use BrowserProfile;
let profile = BrowserProfile ;
Agent Settings
use AgentSettings;
let agent = new
.with_max_steps
.with_settings;
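A step limit such as `with_max_steps` bounds the agent's execution loop: ask the model for the next action, execute it, record the observation, and stop on a completion action or when the step budget runs out. The simplified, synchronous sketch below uses hypothetical closures standing in for the LLM call and the tool registry; `Action`, `Outcome`, and `run_agent` are not the crate's API.

```rust
/// Hypothetical action set produced by the "LLM".
#[derive(Debug, PartialEq)]
enum Action {
    Navigate(String),
    Done(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished { result: String, steps: usize },
    MaxStepsReached,
}

/// Runs at most `max_steps` iterations and returns the outcome plus the observation history.
fn run_agent(
    max_steps: usize,
    mut next_action: impl FnMut(&[String]) -> Action,
    mut execute: impl FnMut(&Action) -> Result<String, String>,
) -> (Outcome, Vec<String>) {
    let mut history: Vec<String> = Vec::new();
    for step in 1..=max_steps {
        let action = next_action(history.as_slice());
        if let Action::Done(result) = &action {
            return (Outcome::Finished { result: result.clone(), steps: step }, history);
        }
        // Errors are recorded instead of aborting, so the model can see them and recover.
        match execute(&action) {
            Ok(observation) => history.push(observation),
            Err(e) => history.push(format!("error: {e}")),
        }
    }
    (Outcome::MaxStepsReached, history)
}

fn main() {
    // A scripted stand-in for the LLM: navigate once, then report completion.
    let (outcome, history) = run_agent(
        5,
        |history| {
            if history.is_empty() {
                Action::Navigate("https://example.com".to_string())
            } else {
                Action::Done("page visited".to_string())
            }
        },
        |action| match action {
            Action::Navigate(url) => Ok(format!("navigated to {url}")),
            Action::Done(_) => unreachable!("Done is handled by the loop"),
        },
    );
    println!("{outcome:?} after {history:?}");
}
```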
API Documentation
Generate and view API docs: