# Browsing
**Lightweight MCP/API for browser automation**
A concise MCP server and Rust library: **navigate**, **get_links**, **follow_link**, **list_content** (links+images), **get_content**, **get_image**, **save_content**, **screenshot** (full or element). Lazy browser init. Parallel reads via RwLock.
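The "lazy browser init" and "parallel reads via RwLock" ideas can be illustrated with the standard library alone. The sketch below is not the crate's implementation: `LazyBrowser` and `Session` are hypothetical stand-ins, and it only assumes that the session is created on first use and that read-only calls share a lock while navigation takes it exclusively.

```rust
use std::sync::{Arc, OnceLock, RwLock};
use std::thread;

/// Hypothetical stand-in for a browser session; not the crate's real type.
#[derive(Debug, Default)]
struct Session {
    current_url: String,
}

/// Holds a session that is created on first use and shared behind an RwLock.
#[derive(Default)]
struct LazyBrowser {
    session: OnceLock<RwLock<Session>>,
}

impl LazyBrowser {
    /// Returns the session, initializing it only on the first call.
    fn session(&self) -> &RwLock<Session> {
        self.session.get_or_init(|| RwLock::new(Session::default()))
    }

    /// Write path: takes the exclusive lock.
    fn navigate(&self, url: &str) {
        self.session().write().unwrap().current_url = url.to_string();
    }

    /// Read path: many callers can hold the shared lock at the same time.
    fn current_url(&self) -> String {
        self.session().read().unwrap().current_url.clone()
    }

    /// True once the session has been created.
    fn is_started(&self) -> bool {
        self.session.get().is_some()
    }
}

fn main() {
    let browser = Arc::new(LazyBrowser::default());
    assert!(!browser.is_started()); // nothing has been initialized yet

    browser.navigate("https://example.com"); // first use creates the session

    // Several readers query the URL concurrently through the shared lock.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let b = Arc::clone(&browser);
            thread::spawn(move || b.current_url())
        })
        .collect();
    for reader in readers {
        println!("{}", reader.join().unwrap());
    }
}
```

In this arrangement a tool such as `get_links` would only need the read lock, while `navigate` would need the write lock.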
## 🎯 Usage Modes
1. **MCP Server** (primary) - `navigate`, `get_links`, `follow_link`, `list_content`, `get_content`, `get_image`, `save_content`, `screenshot`, `generate_sitemap` tools for AI assistants
2. **⌨️ CLI** - Autonomous browsing tasks
3. **📦 Library** - Full agent system with LLM, custom actions
## ✨ Why Browsing?
Building AI agents that can navigate and interact with websites is challenging. You need to:
- **Extract structured data from unstructured HTML** - Parse complex DOM trees and make them LLM-readable
- **Handle browser automation reliably** - Manage browser lifecycle, CDP connections, and process management
- **Coordinate multiple subsystems** - Orchestrate DOM extraction, LLM inference, and action execution
- **Maintain testability** - Mock components for unit testing without real browsers
- **Support extensibility** - Add custom actions, browser backends, and LLM providers
**Browsing addresses each of these** with a modular, trait-based architecture and an extensive test suite.
## 🎯 Key Features
### 🏗️ Trait-Based Architecture
- **BrowserClient trait** - Abstract browser operations for easy mocking and alternative backends
- **DOMProcessor trait** - Pluggable DOM processing implementations
- **ActionHandler trait** - Extensible action system for custom behaviors
### 🤖 Autonomous Agent System
- Complete agent execution loop with LLM integration
- Robust action parsing with JSON repair
- History tracking with state snapshots
- Graceful error handling and recovery
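LLM replies often wrap the action JSON in prose or Markdown fences, so one common repair step is to pull out the first balanced object before parsing it. The function below is a minimal sketch of that single step, assuming nothing about the crate's own JSON extractor; `extract_json_object` is a hypothetical name.

```rust
/// Returns the first balanced `{ ... }` span in `text`,
/// ignoring braces that appear inside JSON string literals.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    let end = start + offset + ch.len_utf8();
                    return Some(&text[start..end]);
                }
            }
            _ => {}
        }
    }
    None // the object never closed
}

fn main() {
    let reply = "Sure! Here is the action:\n{\"action\": \"click\", \"note\": \"mind the } brace\"}\nDone.";
    println!("{:?}", extract_json_object(reply));
}
```

The returned slice can then be handed to a JSON parser; an unbalanced reply yields `None`, which a caller could treat as a signal to ask the model again.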
### Full Browser Automation
- Cross-platform support (macOS, Linux, Windows)
- Automatic browser detection
- Chrome DevTools Protocol (CDP) integration
- Tab management (create, switch, close)
- Screenshot capture (page and element-level)
### Advanced DOM Processing
- Full CDP integration (DOM, AX tree, Snapshot)
- LLM-ready serialization with interactive element indices
- Accessibility tree support for better semantic understanding
- Optimized for token efficiency
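To make "serialization with interactive element indices" concrete, here is a small sketch of the general idea: interactive elements get a numeric index the model can refer to, and everything else is reduced to its text. The `Element` struct and the output format are assumptions for illustration, not the crate's actual types or wire format.

```rust
/// Hypothetical, simplified element record; the crate's real DOM types are richer.
struct Element {
    tag: &'static str,
    text: &'static str,
    interactive: bool,
}

/// Renders elements as compact lines, numbering only the interactive ones
/// so that an LLM can answer with "click 2" instead of a selector.
fn serialize_for_llm(elements: &[Element]) -> String {
    let mut out = String::new();
    let mut next_index = 1;
    for el in elements {
        if el.interactive {
            out.push_str(&format!("[{}]<{}>{}</{}>\n", next_index, el.tag, el.text, el.tag));
            next_index += 1;
        } else if !el.text.is_empty() {
            // Non-interactive content keeps only its text to save tokens.
            out.push_str(el.text);
            out.push('\n');
        }
    }
    out
}

fn main() {
    let page = [
        Element { tag: "h1", text: "Welcome", interactive: false },
        Element { tag: "a", text: "Docs", interactive: true },
        Element { tag: "div", text: "", interactive: false },
        Element { tag: "button", text: "Sign in", interactive: true },
    ];
    print!("{}", serialize_for_llm(&page));
}
```

Dropping empty containers and tag names for static text is one simple way to keep the prompt short while preserving the indices that actions depend on.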
### 🔧 Extensible & Maintainable
- Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
- Custom action registration
- Utility traits for reduced code duplication
- Comprehensive test coverage (300+ tests)
## 📦 Installation
### As a Library
```toml
[dependencies]
browsing = "0.1"
tokio = { version = "1.40", features = ["full"] }
```
### As a CLI Tool
```bash
cargo install --path . --bin browsing
```
### As an MCP Server
```bash
cargo build --release --bin browsing-mcp
```
## Quick Start
### 1️⃣ CLI Usage
```bash
# Run an autonomous browsing task
browsing run "Find the latest news about AI" --url https://news.ycombinator.com --headless
# Launch a browser and get CDP URL
browsing launch --headless
# Connect to existing browser
browsing connect ws://localhost:9222/devtools/browser/abc123
```
**[Full CLI Documentation](docs/CLI_USAGE.md)**
### 2️⃣ MCP Server Usage
Configure in Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "browsing": {
      "command": "/path/to/browsing/target/release/browsing-mcp",
      "env": {
        "BROWSER_USE_HEADLESS": "true"
      }
    }
  }
}
```
Then ask Claude:
```
"Navigate to rust-lang.org, get the links, follow the second link, and screenshot the main content area"
```
**[Full MCP Documentation](docs/MCP_USAGE.md)**
### 3️⃣ Library Usage
```rust
use anyhow::Result;
use browsing::{Browser, Config};

#[tokio::main]
async fn main() -> Result<()> {
    browsing::init();
    let config = Config::from_env();
    // `mut` matches the other examples, where navigation borrows the browser mutably.
    let mut browser = Browser::launch(config.browser_profile).await?;
    browser.navigate("https://example.com").await?;
    let state = browser.get_browser_state_summary(true).await?;
    println!("Title: {}", state.title);
    Ok(())
}
```
**[Full Library Documentation](docs/LIBRARY_USAGE.md)**
### Browser Launch Options
```rust
use browsing::browser::launcher::BrowserLauncher;
use browsing::{Browser, BrowserProfile};

// Each option builds its own profile, because `Browser::new` and
// `BrowserLauncher::new` take the profile by value.

// Option 1: Auto-launch browser (default)
let browser = Browser::new(BrowserProfile::default());

// Option 2: Connect to existing browser
let browser = Browser::new(BrowserProfile::default())
    .with_cdp_url("http://localhost:9222".to_string());

// Option 3: Custom browser executable
let launcher = BrowserLauncher::new(BrowserProfile::default())
    .with_executable_path(std::path::PathBuf::from("/path/to/chrome"));
```
### Using Traits for Testing
```rust
use browsing::agent::Agent;
use browsing::error::BrowsingError; // assumed path; adjust to wherever the crate exports it
use browsing::traits::{BrowserClient, DOMProcessor};
use std::sync::atomic::{AtomicUsize, Ordering};

// Create mock browser for testing
struct MockBrowser {
    navigation_count: AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<(), BrowsingError> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<(), BrowsingError> {
        // Count navigations so the test can assert on them later
        self.navigation_count.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
    // ... implement other trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: AtomicUsize::new(0),
    });
    // Test agent behavior without real browser.
    // MockDOMProcessor and MockLLM are assumed to be defined elsewhere in your tests.
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();
    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... test agent
}
```
## Usage Examples
### Content Download
```rust
use browsing::{Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
use browsing::traits::DOMProcessor;

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate to website
    browser.navigate("https://www.ibm.com").await?;
    tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;

    // Extract content
    let cdp_client = browser.get_cdp_client()?;
    let session_id = browser.get_session_id()?;
    let target_id = browser.get_current_target_id()?;
    let dom_processor = DOMProcessorImpl::new()
        .with_cdp_client(cdp_client, session_id)
        .with_target_id(target_id);
    let page_content = dom_processor.get_page_state_string().await?;
    println!("Extracted {} bytes of content", page_content.len());

    // Save to file
    std::fs::write("ibm_content.txt", page_content)?;
    Ok(())
}
```
**Run this example:**
```bash
cargo run --example ibm_content_download
```
### Screenshot Capture
```rust
use browsing::{Browser, BrowserProfile};

let mut browser = Browser::new(BrowserProfile::default());
browser.start().await?;

// Full page screenshot
let screenshot_data = browser.take_screenshot(
    Some("screenshot.png"), // path
    true,                   // full_page
).await?;

// Viewport only
let viewport = browser.take_screenshot(
    Some("viewport.png"),
    false,
).await?;
```
### Direct Browser Control
```rust
use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate
    browser.navigate("https://example.com").await?;

    // Get current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);

    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());

    // Switch tabs
    browser.switch_to_tab(&tabs[0].target_id).await?;
    Ok(())
}
```
### Custom Actions
```rust
use browsing::tools::views::{ActionHandler, ActionParams, ActionContext, ActionResult};
use browsing::agent::views::ActionModel;
use browsing::error::Result;

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        params: &ActionParams<'_>,
        context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic here
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register custom action
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None, // domains
    CustomActionHandler,
);
```
## 🏗️ Architecture
Browsing follows **SOLID principles** with a focus on separation of concerns, testability, and maintainability.
```
┌─────────────────────────────────────────────────────────────┐
│                            Agent                            │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐    │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │    │
│  │   (trait)   │   (trait)    │   (trait)    │         │    │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘    │
│         │             │              │            │         │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
    ┌─────▼──────┐  ┌───▼────┐    ┌────▼───┐   ┌────▼──────┐
    │  Browser   │  │DomSvc  │    │  LLM   │   │ Handlers  │
    │            │  │        │    │        │   │           │
    │TabManager  │  │CDP     │    │Chat    │   │Navigation │
    │NavManager  │  │HTML    │    │Model   │   │Interaction│
    │Screenshot  │  │Tree    │    │        │   │Tabs       │
    │            │  │Builder │    │        │   │Content    │
    └────────────┘  └────────┘    └────────┘   └───────────┘
```
### Key Components
| Component | Responsibility | Traits |
|-----------|----------------|--------|
| **Agent** | Orchestrates browser, LLM, and DOM processing | Uses `BrowserClient`, `DOMProcessor` |
| **Browser** | Manages browser session and lifecycle | Implements `BrowserClient` |
| **DOMProcessor** | Extracts and serializes DOM | Implements `DOMProcessor` |
| **Tools** | Action registry and execution | Uses `BrowserClient` trait |
| **Handlers** | Specific action implementations | Use `ActionHandler` trait |
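How these components cooperate can be sketched as a loop: observe the page, ask the model for the next action, apply it, and stop when the model reports it is done. The example below is a deliberately simplified, synchronous illustration. Its traits, the `Action` enum, and the mock types are hypothetical and much smaller than the crate's async `BrowserClient`, `DOMProcessor`, and `ChatModel` traits.

```rust
use std::collections::VecDeque;

// Hypothetical, synchronous stand-ins for the crate's async traits.
trait BrowserClient {
    fn navigate(&mut self, url: &str);
    fn current_url(&self) -> String;
}

trait DomProcessor {
    fn page_state<B: BrowserClient>(&self, browser: &B) -> String;
}

trait ChatModel {
    fn next_action(&mut self, task: &str, page_state: &str) -> Action;
}

#[derive(Debug)]
enum Action {
    Navigate(String),
    Done(String),
}

struct Agent<B, D, L> {
    task: String,
    browser: B,
    dom: D,
    llm: L,
    max_steps: usize,
}

impl<B: BrowserClient, D: DomProcessor, L: ChatModel> Agent<B, D, L> {
    /// Observe, decide, act; returns the result, or None if the step budget runs out.
    fn run(&mut self) -> Option<String> {
        for _ in 0..self.max_steps {
            let state = self.dom.page_state(&self.browser);
            match self.llm.next_action(&self.task, &state) {
                Action::Navigate(url) => self.browser.navigate(&url),
                Action::Done(result) => return Some(result),
            }
        }
        None
    }
}

// Mock implementations: no real browser or model involved.
struct MockBrowser {
    url: String,
}

impl BrowserClient for MockBrowser {
    fn navigate(&mut self, url: &str) {
        self.url = url.to_string();
    }
    fn current_url(&self) -> String {
        self.url.clone()
    }
}

struct UrlOnlyDom;

impl DomProcessor for UrlOnlyDom {
    fn page_state<B: BrowserClient>(&self, browser: &B) -> String {
        format!("url: {}", browser.current_url())
    }
}

/// Replays a fixed script, then finishes by echoing the page state.
struct ScriptedLlm {
    script: VecDeque<Action>,
}

impl ChatModel for ScriptedLlm {
    fn next_action(&mut self, _task: &str, page_state: &str) -> Action {
        self.script
            .pop_front()
            .unwrap_or_else(|| Action::Done(page_state.to_string()))
    }
}

fn demo_agent(max_steps: usize) -> Agent<MockBrowser, UrlOnlyDom, ScriptedLlm> {
    Agent {
        task: "Open the example page".to_string(),
        browser: MockBrowser { url: "about:blank".to_string() },
        dom: UrlOnlyDom,
        llm: ScriptedLlm {
            script: VecDeque::from(vec![Action::Navigate("https://example.com".to_string())]),
        },
        max_steps,
    }
}

fn main() {
    println!("{:?}", demo_agent(5).run()); // enough steps to finish
    println!("{:?}", demo_agent(1).run()); // budget exhausted before `Done`
}
```

Because the agent only depends on traits, the same loop runs against mocks in tests and against real implementations in production.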
## Project Structure
```
browsing/
├── src/
│   ├── agent/                    # Agent orchestration
│   │   ├── service.rs            # Main agent implementation
│   │   └── json_extractor.rs     # JSON parsing utilities
│   ├── browser/                  # Browser management
│   │   ├── session.rs            # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs        # Tab operations
│   │   ├── navigation.rs         # Navigation operations
│   │   ├── screenshot.rs         # Screenshot operations
│   │   ├── cdp.rs                # CDP WebSocket client
│   │   ├── launcher.rs           # Browser launcher
│   │   └── profile.rs            # Browser configuration
│   ├── dom/                      # DOM processing
│   │   ├── processor.rs          # DOMProcessor trait impl
│   │   ├── serializer.rs         # LLM-ready serialization
│   │   ├── tree_builder.rs       # DOM tree construction
│   │   ├── cdp_client.rs         # CDP wrapper for DOM
│   │   └── html_converter.rs     # HTML to markdown
│   ├── tools/                    # Action system
│   │   ├── service.rs            # Tools registry
│   │   ├── handlers/             # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs             # Parameter extraction
│   ├── traits/                   # Core trait abstractions
│   │   ├── browser_client.rs     # BrowserClient trait
│   │   └── dom_processor.rs      # DOMProcessor trait
│   ├── llm/                      # LLM integration
│   │   └── base.rs               # ChatModel trait
│   ├── actor/                    # Low-level interactions
│   │   ├── page.rs               # Page operations
│   │   ├── element.rs            # Element operations
│   │   └── mouse.rs              # Mouse interactions
│   ├── config/                   # Configuration
│   ├── error/                    # Error types
│   └── utils/                    # Utilities
└── Cargo.toml
```
## 🎨 Design Principles
### Trait-Facing Design
- **BrowserClient** - Abstract browser operations for testing and alternative backends
- **DOMProcessor** - Pluggable DOM processing implementations
- **ActionHandler** - Extensible action system
- **ChatModel** - LLM provider abstraction
### Separation of Concerns
- **TabManager** - Tab operations (create, switch, close)
- **NavigationManager** - Navigation logic
- **ScreenshotManager** - Screenshot capture
- **Handlers** - Focused action implementations
### DRY (Don't Repeat Yourself)
- **ActionParams** - Reusable parameter extraction
- **JSONExtractor** - Centralized JSON parsing
- **SessionGuard** - Unified session access
### KISS (Keep It Simple, Stupid)
- Split complex methods into focused helpers
- Clear naming and single responsibility
- Minimal dependencies between modules
## 🧪 Testing
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_agent_workflow
# Run integration tests only
cargo test --test integration
```
### Test Coverage
- **317 tests** across all modules (all passing)
- **50+ integration tests** for full workflow
- **150+ unit tests** for individual components
- **Test files**:
  - [actor_test.rs](tests/actor_test.rs) - Page, Element, Mouse, Keyboard operations (23 passed)
  - [browser_managers_test.rs](tests/browser_managers_test.rs) - Navigation, Screenshot, Tab managers
  - [tools_handlers_test.rs](tests/tools_handlers_test.rs) - All action handlers (49 passed)
  - [agent_service_test.rs](tests/agent_service_test.rs) - Agent execution logic (32 passed)
  - [agent_execution_test.rs](tests/agent_execution_test.rs) - Agent workflow tests (11 passed)
  - [traits_test.rs](tests/traits_test.rs) - BrowserClient, DOMProcessor traits (24 passed)
  - [utils_test.rs](tests/utils_test.rs) - URL extraction, signal handling (49 passed)
- **Mock implementations** for deterministic testing
- **Trait-based mocking** for browser/DOM components
## ⚠️ Data Retention Policy
### Browser Data is NEVER Deleted
**IMPORTANT**: The `browsing` library **never deletes browser data** for safety reasons.
#### What This Means:
| Data Type | Policy |
|-----------|--------|
| **Bookmarks** | Never deleted |
| **History** | Never deleted |
| **Cookies** | Never deleted |
| **Passwords** | Never deleted |
| **Extensions** | Never deleted |
| **Cache** | Never deleted |
| **Temp Directories** | Never deleted (left in `/tmp/`) |
#### Why This Policy Exists:
1. **User Safety**: Users may specify a custom `user_data_dir` pointing to their real browser profile
2. **Catastrophe Prevention**: Accidentally deleting a user's real browser data (bookmarks, history, passwords) would be devastating
3. **Debugging**: Leaving temp directories allows inspection after crashes or failures
4. **User Control**: Users are responsible for managing their own browser data
#### How It Works:
When no `user_data_dir` is specified:
```rust
let profile = BrowserProfile {
    user_data_dir: None, // Uses temp directory: /tmp/browser-use-1738369200000/
    ..Default::default()
};
```
When `browser.stop()` is called:
- ✅ Browser process is killed
- ✅ In-memory state is cleared
- ❌ User data directory is **NOT** deleted
#### Managing Temporary Data:
Users are responsible for cleanup:
```bash
# List browser temp directories
ls -la /tmp/browser-use-*
# Delete old temp directories (optional, manual cleanup)
rm -rf /tmp/browser-use-1738369200000/
```
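For scripted cleanup, the same check can be done from Rust. The sketch below only lists candidates and deletes nothing. It assumes that the numeric suffix in names like `browser-use-1738369200000` is a creation time in milliseconds since the Unix epoch, which the example path suggests but this README does not state; `created_at_millis` and `is_stale` are hypothetical helpers.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Parses the numeric suffix of a name like `browser-use-1738369200000`.
/// Assumption: the suffix is milliseconds since the Unix epoch.
fn created_at_millis(dir_name: &str) -> Option<u128> {
    dir_name.strip_prefix("browser-use-")?.parse().ok()
}

/// True when the name encodes a timestamp older than `max_age`.
/// Names that do not match the pattern are never reported as stale.
fn is_stale(dir_name: &str, now_millis: u128, max_age: Duration) -> bool {
    match created_at_millis(dir_name) {
        Some(created) => now_millis.saturating_sub(created) > max_age.as_millis(),
        None => false,
    }
}

fn main() -> std::io::Result<()> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_millis();
    let one_week = Duration::from_secs(7 * 24 * 60 * 60);

    // Print matching directories older than a week; removal is left to the user.
    for entry in std::fs::read_dir(std::env::temp_dir())? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if entry.file_type()?.is_dir() && is_stale(&name, now, one_week) {
            println!("stale: {}", entry.path().display());
        }
    }
    Ok(())
}
```

Keeping the deletion itself manual stays in line with the policy above: nothing is removed unless the user decides to remove it.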
#### Using a Custom Data Directory:
```rust
let profile = BrowserProfile {
    user_data_dir: Some("/path/to/custom/profile".into()),
    ..Default::default()
};
```
**Warning**: If you point to your real browser profile, the library will NOT protect it. You're responsible for that directory.
## 🔧 Configuration
### Browser Profile
```rust
use browsing::BrowserProfile;

let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};
```
### Agent Settings
```rust
use browsing::agent::views::AgentSettings;

let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });
```
## API Documentation
Generate and view API docs:
```bash
cargo doc --open
```