# Browsing
**Autonomous web browsing for AI agents - Rust implementation**
Browsing is a powerful Rust library that enables AI agents to autonomously interact with web pages. It provides a clean, trait-based architecture for browser automation, DOM extraction, and LLM-driven web interactions.
## Why Browsing?
Building AI agents that can navigate and interact with websites is challenging. You need to:
- **Extract structured data from unstructured HTML** - Parse complex DOM trees and make them LLM-readable
- **Handle browser automation reliably** - Manage browser lifecycle, CDP connections, and process management
- **Coordinate multiple subsystems** - Orchestrate DOM extraction, LLM inference, and action execution
- **Maintain testability** - Mock components for unit testing without real browsers
- **Support extensibility** - Add custom actions, browser backends, and LLM providers
**Browsing addresses each of these** with a modular, trait-based architecture whose components can be swapped for mocks in tests.
## Key Features
### Trait-Based Architecture
- **BrowserClient trait** - Abstract browser operations for easy mocking and alternative backends
- **DOMProcessor trait** - Pluggable DOM processing implementations
- **ActionHandler trait** - Extensible action system for custom behaviors
### Autonomous Agent System
- Complete agent execution loop with LLM integration
- Robust action parsing with JSON repair
- History tracking with state snapshots
- Graceful error handling and recovery
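"Robust action parsing with JSON repair" typically means coping with LLM replies that wrap the action JSON in prose or markdown fences, or leave a trailing comma that strict parsers reject. The sketch below shows two such steps in plain `std` Rust. `extract_json_object` and `strip_trailing_commas` are hypothetical helpers for illustration, not the crate's `JSONExtractor` API.

```rust
/// Hypothetical helper: return the first balanced `{...}` object in raw LLM output,
/// skipping any prose or markdown fences around it. Braces inside JSON strings are ignored.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in raw[start..].char_indices() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // '}' is one byte, so +1 includes it in the slice.
                    return Some(&raw[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None // braces never balanced, e.g. a truncated reply
}

/// Hypothetical repair step: drop commas that directly precede `}` or `]`
/// (ignoring whitespace), leaving commas inside strings untouched.
fn strip_trailing_commas(json: &str) -> String {
    let chars: Vec<char> = json.chars().collect();
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for (i, &ch) in chars.iter().enumerate() {
        if in_string {
            out.push(ch);
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        if ch == '"' {
            in_string = true;
        } else if ch == ',' {
            // Look past whitespace for the next significant character.
            let next = chars[i + 1..].iter().copied().find(|c| !c.is_whitespace());
            if matches!(next, Some('}') | Some(']')) {
                continue; // skip the trailing comma
            }
        }
        out.push(ch);
    }
    out
}
```

A real pipeline would hand the cleaned string to a JSON parser such as `serde_json::from_str` afterwards.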
### Full Browser Automation
- Cross-platform support (macOS, Linux, Windows)
- Automatic browser detection
- Chrome DevTools Protocol (CDP) integration
- Tab management (create, switch, close)
- Screenshot capture (page and element-level)
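Automatic browser detection can be as simple as probing a per-OS list of well-known install locations. The following is a minimal sketch under that assumption; `detect_browser` and its candidate list are hypothetical and may not match the launcher's actual search order. The existence check is injected as a closure so the search can be tested without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical per-OS list of common Chrome/Chromium install locations.
/// `os` uses the same values as `std::env::consts::OS`.
fn candidate_paths(os: &str) -> &'static [&'static str] {
    match os {
        "macos" => &[
            "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "/Applications/Chromium.app/Contents/MacOS/Chromium",
        ],
        "linux" => &[
            "/usr/bin/google-chrome",
            "/usr/bin/chromium",
            "/usr/bin/chromium-browser",
        ],
        "windows" => &[
            r"C:\Program Files\Google\Chrome\Application\chrome.exe",
            r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
        ],
        _ => &[],
    }
}

/// Return the first candidate for which `exists` reports true, in list order.
fn detect_browser(os: &str, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    candidate_paths(os)
        .iter()
        .map(|s| Path::new(*s))
        .find(|&p| exists(p))
        .map(Path::to_path_buf)
}
```

In real use the call would look like `detect_browser(std::env::consts::OS, |p| p.exists())`.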
### Advanced DOM Processing
- Full CDP integration (DOM, AX tree, Snapshot)
- LLM-ready serialization with interactive element indices
- Accessibility tree support for better semantic understanding
- Optimized for token efficiency
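"LLM-ready serialization with interactive element indices" can be pictured as numbering only the clickable or typeable nodes and emitting one compact line per node, so the model can answer "click element 1" instead of producing a selector. This sketch assumes a flattened, hypothetical `Element` record and an invented `[index]<tag>text</tag>` line format; the crate's serializer output may differ.

```rust
/// Hypothetical flattened DOM node; a real tree builder tracks far more
/// (attributes, bounding boxes, accessibility roles, visibility).
struct Element {
    tag: &'static str,
    text: &'static str,
    interactive: bool,
}

/// Number interactive elements in document order and emit one line per node.
/// Returns the prompt text plus `index_to_pos`, which maps an index the LLM
/// cites back to the element's position in `elements`.
fn serialize_for_llm(elements: &[Element]) -> (String, Vec<usize>) {
    let mut out = String::new();
    let mut index_to_pos = Vec::new();
    for (pos, el) in elements.iter().enumerate() {
        if el.interactive {
            // The next free index is the current length of the map.
            let index = index_to_pos.len();
            out.push_str(&format!("[{}]<{}>{}</{}>\n", index, el.tag, el.text.trim(), el.tag));
            index_to_pos.push(pos);
        } else if !el.text.trim().is_empty() {
            // Plain text is kept for context but gets no index.
            out.push_str(el.text.trim());
            out.push('\n');
        }
        // Whitespace-only, non-interactive nodes are dropped to save tokens.
    }
    (out, index_to_pos)
}
```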
### Extensible & Maintainable
- Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
- Custom action registration
- Utility traits for reduced code duplication
- Comprehensive test coverage (74+ tests)
## Installation
```toml
# Add to Cargo.toml
[dependencies]
browsing = "0.1"
```
```bash
# Or clone from source
git clone <repository>
cd browsing-rs
cargo build
```
## Quick Start
### Basic Example
```rust
use browsing::{Agent, Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
use browsing::error::Result;
// Assumed paths: ChatMessage and ChatInvokeCompletion alongside ChatModel,
// and Stream from the `futures` crate
use browsing::llm::{ChatInvokeCompletion, ChatMessage, ChatModel};
use futures::Stream;

// Implement your own LLM by implementing the ChatModel trait
struct MyLLM;

#[async_trait::async_trait]
impl ChatModel for MyLLM {
    fn model(&self) -> &str { "my-model" }
    fn provider(&self) -> &str { "my-provider" }

    async fn chat(&self, _messages: &[ChatMessage]) -> Result<ChatInvokeCompletion<String>> {
        // Call your LLM provider here
        todo!()
    }

    async fn chat_stream(&self, _messages: &[ChatMessage])
        -> Result<Box<dyn Stream<Item = Result<String>> + Send + Unpin>> {
        // Your streaming implementation here
        todo!()
    }
}

#[tokio::main]
async fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    // 1. Create a browser with the default profile
    let profile = BrowserProfile::default();
    let browser = Box::new(Browser::new(profile));

    // 2. Create your LLM implementation
    let llm = MyLLM;

    // 3. Create the DOM processor
    let dom_processor = Box::new(DOMProcessorImpl::new());

    // 4. Create and run the agent
    let mut agent = Agent::new(
        "Find the top post on Hacker News".to_string(),
        browser,
        dom_processor,
        llm,
    );
    let history = agent.run().await?;
    println!("Completed in {} steps", history.history.len());
    Ok(())
}
```
### Browser Launch Options
```rust
use browsing::{Browser, BrowserProfile};
use browsing::browser::launcher::BrowserLauncher;
use std::path::PathBuf;

// Option 1: Auto-launch a browser (default)
let browser = Browser::new(BrowserProfile::default());

// Option 2: Connect to an already-running browser over CDP
// (e.g. Chrome started with --remote-debugging-port=9222)
let browser = Browser::new(BrowserProfile::default())
    .with_cdp_url("http://localhost:9222".to_string());

// Option 3: Launch a specific browser executable
let launcher = BrowserLauncher::new(BrowserProfile::default())
    .with_executable_path(PathBuf::from("/path/to/chrome"));
```
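When connecting through a CDP URL, Chrome exposes an HTTP endpoint (`/json/version`) whose `webSocketDebuggerUrl` field points at the WebSocket used for the actual protocol traffic. Over that socket, each Chrome DevTools Protocol request is a JSON message with a numeric `id`, a `method` such as `Page.navigate`, and a `params` object, and the browser echoes the `id` in its response. The builder below is a hypothetical sketch of that framing only (no network I/O), not the crate's `cdp.rs` client.

```rust
/// Hypothetical CDP request builder: assigns increasing ids and frames
/// `{"id":..,"method":..,"params":..}` messages.
struct CdpCommandBuilder {
    next_id: u64,
}

impl CdpCommandBuilder {
    fn new() -> Self {
        // Ids only need to be unique per connection; start at 1.
        Self { next_id: 1 }
    }

    /// `params_json` must already be a serialized JSON object.
    /// Returns the id so the caller can match the browser's response to this request.
    fn build(&mut self, method: &str, params_json: &str) -> (u64, String) {
        let id = self.next_id;
        self.next_id += 1;
        let msg = format!(r#"{{"id":{},"method":"{}","params":{}}}"#, id, method, params_json);
        (id, msg)
    }
}

/// Minimal JSON string escaping for values interpolated into `params_json`.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            // Other control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```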
### Using Traits for Testing
```rust
use browsing::agent::Agent;
use browsing::error::BrowsingError; // assumed location of the crate's error type
use browsing::traits::BrowserClient;
use std::sync::atomic::{AtomicUsize, Ordering};

// A mock browser that counts navigations instead of driving Chrome
struct MockBrowser {
    navigation_count: AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<(), BrowsingError> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<(), BrowsingError> {
        self.navigation_count.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    // ... implement the remaining trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: AtomicUsize::new(0),
    });
    // MockDOMProcessor and MockLLM are test doubles you define the same way,
    // by implementing DOMProcessor and ChatModel
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();
    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... exercise the agent without a real browser
}
```
## Usage Examples
### Screenshot Capture
```rust
use browsing::{Browser, BrowserProfile};

// Inside an async fn that returns a Result
let mut browser = Browser::new(BrowserProfile::default());
browser.start().await?;

// Full page screenshot
let screenshot_data = browser.take_screenshot(
    Some("screenshot.png"), // path
    true,                   // full_page
).await?;

// Viewport only
let viewport = browser.take_screenshot(
    Some("viewport.png"),
    false,
).await?;
```
### Direct Browser Control
```rust
use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate
    browser.navigate("https://example.com").await?;

    // Get the current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);

    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());

    // Switch back to the first tab
    browser.switch_to_tab(&tabs[0].target_id).await?;

    // Go back in the current tab's history
    browser.go_back().await?;

    Ok(())
}
```
### Custom Actions
```rust
use browsing::error::Result;
use browsing::tools::views::{ActionContext, ActionHandler, ActionParams, ActionResult};

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        _params: &ActionParams<'_>,
        _context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic here
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register the custom action on an existing `agent` (see Quick Start)
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None, // domains
    CustomActionHandler,
);
```
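The `domains` argument to `register_custom_action` suggests that actions can be restricted to certain sites. One plausible way a registry could filter actions by the current host is sketched below, with `None` meaning "available everywhere". `Registry`, its synchronous `Handler` type, and the subdomain-matching rule are all assumptions for illustration; the crate's handlers are async trait objects.

```rust
use std::collections::HashMap;

/// Hypothetical synchronous handler type.
type Handler = Box<dyn Fn(&str) -> String>;

struct RegisteredAction {
    description: String,
    domains: Option<Vec<String>>,
    handler: Handler,
}

#[derive(Default)]
struct Registry {
    actions: HashMap<String, RegisteredAction>,
}

impl Registry {
    fn register(&mut self, name: &str, description: &str, domains: Option<Vec<String>>, handler: Handler) {
        let action = RegisteredAction { description: description.to_string(), domains, handler };
        self.actions.insert(name.to_string(), action);
    }

    /// (name, description) pairs the LLM may use on `host`. `None` means unrestricted;
    /// otherwise `host` must equal a listed domain or be a subdomain of it.
    fn available_on(&self, host: &str) -> Vec<(&str, &str)> {
        let mut out: Vec<(&str, &str)> = self
            .actions
            .iter()
            .filter(|(_, a)| match &a.domains {
                None => true,
                Some(list) => list
                    .iter()
                    .any(|d| host == d.as_str() || host.ends_with(&format!(".{}", d))),
            })
            .map(|(name, a)| (name.as_str(), a.description.as_str()))
            .collect();
        // HashMap iteration order is unspecified; sort so prompts are stable.
        out.sort();
        out
    }

    fn execute(&self, name: &str, input: &str) -> Option<String> {
        self.actions.get(name).map(|a| (a.handler)(input))
    }
}
```

Filtering before building the prompt keeps site-specific actions out of the model's context on unrelated pages.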
## Architecture
Browsing follows **SOLID principles** with a focus on separation of concerns, testability, and maintainability.
```
┌──────────────────────────────────────────────────────────────┐
│                            Agent                             │
│  ┌──────────────┬──────────────┬──────────────┬───────────┐  │
│  │   Browser    │ DOMProcessor │     LLM      │   Tools   │  │
│  │   (trait)    │   (trait)    │   (trait)    │           │  │
│  └──────┬───────┴──────┬───────┴──────┬───────┴─────┬─────┘  │
│         │              │              │             │        │
└─────────┼──────────────┼──────────────┼─────────────┼────────┘
          │              │              │             │
   ┌──────┴──────┐ ┌─────┴─────┐  ┌─────┴─────┐ ┌─────┴───────┐
   │   Browser   │ │  DomSvc   │  │    LLM    │ │  Handlers   │
   │             │ │           │  │           │ │             │
   │ TabManager  │ │ CDP       │  │ Chat      │ │ Navigation  │
   │ NavManager  │ │ HTML      │  │ Model     │ │ Interaction │
   │ Screenshot  │ │ Tree      │  │           │ │ Tabs        │
   │             │ │ Builder   │  │           │ │ Content     │
   └─────────────┘ └───────────┘  └───────────┘ └─────────────┘
```
### Key Components
| Component | Responsibility | Trait relationship |
|-----------|----------------|--------------------|
| **Agent** | Orchestrates browser, LLM, and DOM processing | Uses `BrowserClient`, `DOMProcessor` |
| **Browser** | Manages browser session and lifecycle | Implements `BrowserClient` |
| **DOMProcessor** | Extracts and serializes DOM | Implements `DOMProcessor` |
| **Tools** | Action registry and execution | Uses `BrowserClient` trait |
| **Handlers** | Specific action implementations | Use `ActionHandler` trait |
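At its core, the **Agent** row is an observe, decide, act loop. The simplified synchronous sketch below uses hypothetical `Page` and `Planner` traits as stand-ins for `BrowserClient`/`DOMProcessor` and `ChatModel`; the crate's real loop is async and carries much richer state. It also illustrates a step budget (compare `with_max_steps`) and a history of state snapshots.

```rust
/// Hypothetical synchronous stand-ins for the async browser and LLM traits.
trait Page {
    /// Serialized page state, as a DOM processor would produce it.
    fn observe(&mut self) -> String;
    fn act(&mut self, action: &str) -> Result<(), String>;
}

trait Planner {
    fn next_action(&mut self, task: &str, state: &str) -> String;
}

/// One history entry: what the agent saw, what it chose, and whether it failed.
#[derive(Debug)]
struct Step {
    state: String,
    action: String,
    error: Option<String>,
}

/// Observe, decide, act until the planner says "done" or `max_steps` runs out.
fn run_agent(task: &str, page: &mut dyn Page, planner: &mut dyn Planner, max_steps: usize) -> Vec<Step> {
    let mut history = Vec::new();
    for _ in 0..max_steps {
        let state = page.observe();
        let action = planner.next_action(task, &state);
        if action == "done" {
            history.push(Step { state, action, error: None });
            break;
        }
        // Failures are recorded rather than aborting the run,
        // so a later prompt could include them.
        let error = page.act(&action).err();
        history.push(Step { state, action, error });
    }
    history
}

// Example test doubles: a page whose state is just its URL, and a planner
// that replays a fixed script (panics if the script runs out).
struct FakePage {
    url: String,
}

impl Page for FakePage {
    fn observe(&mut self) -> String {
        self.url.clone()
    }
    fn act(&mut self, action: &str) -> Result<(), String> {
        match action.strip_prefix("goto ") {
            Some(url) => {
                self.url = url.to_string();
                Ok(())
            }
            None => Err(format!("unknown action: {}", action)),
        }
    }
}

struct Scripted(Vec<&'static str>);

impl Planner for Scripted {
    fn next_action(&mut self, _task: &str, _state: &str) -> String {
        self.0.remove(0).to_string()
    }
}
```

Keeping the failed step in history and continuing is one simple reading of "graceful error handling and recovery"; the budget guarantees termination even if the planner never finishes.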
## Project Structure
```
browsing/
├── src/
│   ├── agent/                 # Agent orchestration
│   │   ├── service.rs         # Main agent implementation
│   │   └── json_extractor.rs  # JSON parsing utilities
│   ├── browser/               # Browser management
│   │   ├── session.rs         # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs     # Tab operations
│   │   ├── navigation.rs      # Navigation operations
│   │   ├── screenshot.rs      # Screenshot operations
│   │   ├── cdp.rs             # CDP WebSocket client
│   │   ├── launcher.rs        # Browser launcher
│   │   └── profile.rs         # Browser configuration
│   ├── dom/                   # DOM processing
│   │   ├── processor.rs       # DOMProcessor trait impl
│   │   ├── serializer.rs      # LLM-ready serialization
│   │   ├── tree_builder.rs    # DOM tree construction
│   │   ├── cdp_client.rs      # CDP wrapper for DOM
│   │   └── html_converter.rs  # HTML to markdown
│   ├── tools/                 # Action system
│   │   ├── service.rs         # Tools registry
│   │   ├── handlers/          # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs          # Parameter extraction
│   ├── traits/                # Core trait abstractions
│   │   ├── browser_client.rs  # BrowserClient trait
│   │   └── dom_processor.rs   # DOMProcessor trait
│   ├── llm/                   # LLM integration
│   │   └── base.rs            # ChatModel trait
│   ├── actor/                 # Low-level interactions
│   │   ├── page.rs            # Page operations
│   │   ├── element.rs         # Element operations
│   │   └── mouse.rs           # Mouse interactions
│   ├── config/                # Configuration
│   ├── error/                 # Error types
│   └── utils/                 # Utilities
└── Cargo.toml
```
## Design Principles
### Trait-Based Design
- **BrowserClient** - Abstract browser operations for testing and alternative backends
- **DOMProcessor** - Pluggable DOM processing implementations
- **ActionHandler** - Extensible action system
- **ChatModel** - LLM provider abstraction
### Separation of Concerns
- **TabManager** - Tab operations (create, switch, close)
- **NavigationManager** - Navigation logic
- **ScreenshotManager** - Screenshot capture
- **Handlers** - Focused action implementations
### DRY (Don't Repeat Yourself)
- **ActionParams** - Reusable parameter extraction
- **JSONExtractor** - Centralized JSON parsing
- **SessionGuard** - Unified session access
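Reusable parameter extraction, as in the `ActionParams` entry above, can be illustrated with a minimal sketch that assumes parameters arrive as a string map: one generic accessor parses any `FromStr` type and produces uniform error messages, so each handler stops re-implementing "missing" and "invalid" checks. `Params`, `required`, and `optional` are hypothetical names, not the crate's API.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for ActionParams: typed lookups over raw string
/// parameters with uniform error messages.
struct Params<'a> {
    action: &'a str,
    raw: &'a HashMap<String, String>,
}

impl<'a> Params<'a> {
    /// Parse a parameter that must be present.
    fn required<T: FromStr>(&self, key: &str) -> Result<T, String> {
        let value = self
            .raw
            .get(key)
            .ok_or_else(|| format!("{}: missing required parameter '{}'", self.action, key))?;
        value
            .parse::<T>()
            .map_err(|_| format!("{}: invalid value '{}' for parameter '{}'", self.action, value, key))
    }

    /// Parse a parameter that may be absent; a present but malformed value is still an error.
    fn optional<T: FromStr>(&self, key: &str) -> Result<Option<T>, String> {
        if self.raw.contains_key(key) {
            self.required::<T>(key).map(Some)
        } else {
            Ok(None)
        }
    }
}
```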
### KISS (Keep It Simple, Stupid)
- Split complex methods into focused helpers
- Clear naming and single responsibility
- Minimal dependencies between modules
## Testing
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_agent_workflow
# Run integration tests only
cargo test --test integration
```
### Test Coverage
- **74+ tests** across all modules
- **24 integration tests** for full workflow
- **50+ unit tests** for individual components
- **Mock LLM** for deterministic testing
- **Trait-based mocking** for browser/DOM components
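A mock LLM for deterministic testing can replay scripted completions in order and record the prompts it received. This synchronous sketch is an assumption about the approach, not the crate's `MockLLM`; a real mock would implement the async `ChatModel` trait, which is why the state sits behind a `Mutex` and `chat` takes `&self`.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical deterministic LLM double: replays scripted completions in
/// order and records every prompt it was given.
struct MockLlm {
    responses: Mutex<VecDeque<String>>,
    seen_prompts: Mutex<Vec<String>>,
}

impl MockLlm {
    fn new<I: IntoIterator<Item = &'static str>>(responses: I) -> Self {
        Self {
            responses: Mutex::new(responses.into_iter().map(String::from).collect()),
            seen_prompts: Mutex::new(Vec::new()),
        }
    }

    /// Returns an error once the script is exhausted, so a test fails loudly
    /// if the agent makes more LLM calls than expected.
    fn chat(&self, prompt: &str) -> Result<String, String> {
        self.seen_prompts.lock().unwrap().push(prompt.to_string());
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "mock LLM: no scripted response left".to_string())
    }

    /// Snapshot of every prompt seen so far, for assertions.
    fn prompts(&self) -> Vec<String> {
        self.seen_prompts.lock().unwrap().clone()
    }
}
```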
## Configuration
### Browser Profile
```rust
use browsing::BrowserProfile;
let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};
```
### Agent Settings
```rust
use browsing::agent::views::AgentSettings;

let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });
```
## API Documentation
Generate and view API docs:
```bash
cargo doc --open
```