browsing 0.1.0

Autonomous web browsing for AI agents - Rust implementation

Browsing

Browsing is a Rust library that lets AI agents interact with web pages autonomously. It provides a trait-based architecture for browser automation, DOM extraction, and LLM-driven web interaction.

✨ Why Browsing?

Building AI agents that can navigate and interact with websites is challenging. You need to:

  • Extract structured data from unstructured HTML - Parse complex DOM trees and make them LLM-readable
  • Handle browser automation reliably - Manage browser lifecycle, CDP connections, and process management
  • Coordinate multiple subsystems - Orchestrate DOM extraction, LLM inference, and action execution
  • Maintain testability - Mock components for unit testing without real browsers
  • Support extensibility - Add custom actions, browser backends, and LLM providers

Browsing addresses each of these needs with a modular, trait-based architecture and an accompanying test suite.

🎯 Key Features

🏗️ Trait-Based Architecture

  • BrowserClient trait - Abstract browser operations for easy mocking and alternative backends
  • DOMProcessor trait - Pluggable DOM processing implementations
  • ActionHandler trait - Extensible action system for custom behaviors

🤖 Autonomous Agent System

  • Complete agent execution loop with LLM integration
  • Robust action parsing with JSON repair
  • History tracking with state snapshots
  • Graceful error handling and recovery
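LLM replies often wrap the action JSON in prose or code fences and may leave a trailing comma behind. The sketch below shows one way such output could be repaired before parsing. It is an illustration only, not the crate's JSONExtractor: the function names `extract_json_object`, `strip_trailing_commas`, and `repair_action_json` are hypothetical, and it handles just these two defects.

```rust
/// Returns the first balanced `{ ... }` object found in `raw`, or `None`.
/// Braces that appear inside JSON string literals are ignored.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in raw[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // '}' is one byte wide, so +1 ends the slice just after it.
                    return Some(&raw[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Drops commas that directly precede `}` or `]` (ignoring whitespace),
/// leaving string literals untouched.
fn strip_trailing_commas(json: &str) -> String {
    let chars: Vec<char> = json.chars().collect();
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for (i, &ch) in chars.iter().enumerate() {
        if in_string {
            out.push(ch);
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        if ch == '"' {
            in_string = true;
        }
        if ch == ',' {
            let next = chars[i + 1..].iter().copied().find(|c| !c.is_whitespace());
            if matches!(next, Some('}') | Some(']')) {
                continue; // skip the trailing comma
            }
        }
        out.push(ch);
    }
    out
}

/// Extracts the first JSON object from an LLM reply and removes trailing commas.
fn repair_action_json(raw: &str) -> Option<String> {
    extract_json_object(raw).map(strip_trailing_commas)
}
```

The repaired string can then be handed to a regular JSON parser; replies with no object at all yield `None`, which the caller can treat as a recoverable step failure.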

🌐 Full Browser Automation

  • Cross-platform support (macOS, Linux, Windows)
  • Automatic browser detection
  • Chrome DevTools Protocol (CDP) integration
  • Tab management (create, switch, close)
  • Screenshot capture (page and element-level)
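How Browsing locates a browser is not described here. As a rough illustration of what automatic detection can look like, the sketch below probes a list of common default Chrome and Chromium install paths for the current platform. The function names `default_candidates` and `find_browser` are hypothetical, and the path list is an assumption that will not cover every installation.

```rust
use std::path::{Path, PathBuf};

/// Common default Chrome/Chromium install locations for the current platform.
/// The list is illustrative and intentionally short.
fn default_candidates() -> Vec<PathBuf> {
    let paths: Vec<&str> = if cfg!(target_os = "macos") {
        vec![
            "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "/Applications/Chromium.app/Contents/MacOS/Chromium",
        ]
    } else if cfg!(target_os = "windows") {
        vec![
            r"C:\Program Files\Google\Chrome\Application\chrome.exe",
            r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
        ]
    } else {
        vec![
            "/usr/bin/google-chrome",
            "/usr/bin/google-chrome-stable",
            "/usr/bin/chromium",
            "/usr/bin/chromium-browser",
        ]
    };
    paths.into_iter().map(PathBuf::from).collect()
}

/// Returns the first candidate that exists on disk as a regular file.
fn find_browser(candidates: &[PathBuf]) -> Option<&Path> {
    candidates
        .iter()
        .map(PathBuf::as_path)
        .find(|path| path.is_file())
}
```

Calling `find_browser(&default_candidates())` returns `None` when nothing is installed in the probed locations, in which case an explicit executable path (see Browser Launch Options) is the fallback.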

📊 Advanced DOM Processing

  • Full CDP integration (DOM, AX tree, Snapshot)
  • LLM-ready serialization with interactive element indices
  • Accessibility tree support for better semantic understanding
  • Optimized for token efficiency
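To make "interactive element indices" concrete, here is a minimal sketch of the idea: walk a DOM tree, number each interactive element, and skip empty wrapper nodes to save tokens. The `Node` type, the tag list in `is_interactive`, and the output format are assumptions for illustration; the crate's serializer works from CDP data and will differ.

```rust
/// A simplified DOM node; a hypothetical stand-in for the crate's node type.
struct Node {
    tag: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

/// Tags treated as interactive in this sketch (an assumed, incomplete list).
fn is_interactive(tag: &str) -> bool {
    matches!(tag, "a" | "button" | "input" | "select" | "textarea")
}

/// Appends one line per interactive or text-bearing node to `out`.
/// Interactive nodes get the next free index so the LLM can refer to them.
fn serialize_node(node: &Node, depth: usize, next_index: &mut usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    let emitted = if is_interactive(node.tag) {
        out.push_str(&format!(
            "{}[{}]<{}>{}</{}>\n",
            indent, *next_index, node.tag, node.text, node.tag
        ));
        *next_index += 1;
        true
    } else if !node.text.is_empty() {
        out.push_str(&format!("{}{}\n", indent, node.text));
        true
    } else {
        false // empty wrappers produce no output
    };
    // Only indent children when the parent actually produced a line.
    let child_depth = if emitted { depth + 1 } else { depth };
    for child in &node.children {
        serialize_node(child, child_depth, next_index, out);
    }
}

/// Serializes a whole tree, numbering interactive elements from 0.
fn serialize_tree(root: &Node) -> String {
    let mut out = String::new();
    let mut next_index = 0;
    serialize_node(root, 0, &mut next_index, &mut out);
    out
}
```

With this representation the model can answer with something like "click index 1" instead of a CSS selector, and the agent maps the index back to the element.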

🔧 Extensible & Maintainable

  • Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
  • Custom action registration
  • Utility traits for reduced code duplication
  • Comprehensive test coverage (74+ tests)

📦 Installation

# Add to Cargo.toml
[dependencies]
browsing = "0.1"

# Or clone and build from source
git clone <repository>
cd browsing-rs
cargo build

🚀 Quick Start

Basic Example

use browsing::{Agent, Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
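use browsing::llm::ChatModel;
// ChatMessage, ChatInvokeCompletion, Stream, and the crate's one-parameter
// Result alias (browsing::error::Result) must also be in scope. main() below
// uses the two-parameter std Result, so qualify one of the two.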

// Implement your own LLM by implementing the ChatModel trait
struct MyLLM;

#[async_trait::async_trait]
impl ChatModel for MyLLM {
    fn model(&self) -> &str { "my-model" }
    fn provider(&self) -> &str { "my-provider" }
    
    async fn chat(&self, messages: &[ChatMessage]) -> Result<ChatInvokeCompletion<String>> {
        // Your LLM implementation here
        todo!()
    }
    
    async fn chat_stream(&self, messages: &[ChatMessage]) 
        -> Result<Box<dyn Stream<Item = Result<String>> + Send + Unpin>> {
        // Your streaming implementation here
        todo!()
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create browser profile
    let profile = BrowserProfile::default();
    let browser = Box::new(Browser::new(profile));

    // 2. Create your LLM implementation
    let llm = MyLLM;

    // 3. Create DOM processor
    let dom_processor = Box::new(DOMProcessorImpl::new());

    // 4. Create and run agent
    let mut agent = Agent::new(
        "Find the top post on Hacker News".to_string(),
        browser,
        dom_processor,
        llm,
    );

    let history = agent.run().await?;
    println!("✅ Completed in {} steps", history.history.len());

    Ok(())
}

Browser Launch Options

use browsing::{Browser, BrowserProfile};

// Option 1: Auto-launch browser (default)
let browser = Browser::new(BrowserProfile::default());

// Option 2: Connect to existing browser
// (each option builds its own profile, because Browser::new takes it by value)
let browser = Browser::new(BrowserProfile::default())
    .with_cdp_url("http://localhost:9222".to_string());

// Option 3: Custom browser executable
use browsing::browser::launcher::BrowserLauncher;
let launcher = BrowserLauncher::new(BrowserProfile::default())
    .with_executable_path(std::path::PathBuf::from("/path/to/chrome"));

Using Traits for Testing

use browsing::traits::{BrowserClient, DOMProcessor};
use browsing::agent::Agent;
// BrowsingError must also be in scope; its module path is not shown here.

// Create mock browser for testing
struct MockBrowser {
    navigation_count: std::sync::atomic::AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<(), BrowsingError> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<(), BrowsingError> {
        self.navigation_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        Ok(())
    }

    // ... implement other trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: std::sync::atomic::AtomicUsize::new(0),
    });

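    // Test agent behavior without a real browser. MockDOMProcessor and MockLLM
    // are test doubles you define yourself, in the same way as MockBrowser above.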
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();

    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... test agent
}

📚 Usage Examples

Screenshot Capture

use browsing::{Browser, BrowserProfile};

// `mut` is required because start() takes &mut self
let mut browser = Browser::new(BrowserProfile::default());
browser.start().await?;

// Full page screenshot
let screenshot_data = browser.take_screenshot(
    Some("screenshot.png"),  // path
    true,                      // full_page
).await?;

// Viewport only
let viewport = browser.take_screenshot(
    Some("viewport.png"),
    false,
).await?;

Direct Browser Control

use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate
    browser.navigate("https://example.com").await?;

    // Get current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);

    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());

    // Switch tabs
    browser.switch_to_tab(&tabs[0].target_id).await?;

    // Go back
    browser.go_back().await?;

    Ok(())
}

Custom Actions

use browsing::tools::views::{ActionHandler, ActionParams, ActionContext, ActionResult};
use browsing::agent::views::ActionModel;
use browsing::error::Result;

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        params: &ActionParams<'_>,
        context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic here
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register custom action
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None,  // domains
    CustomActionHandler,
);

🏗️ Architecture

Browsing follows SOLID principles with a focus on separation of concerns, testability, and maintainability.

┌───────────────────────────────────────────────────────────┐
│                           Agent                           │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐  │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │  │
│  │   (trait)   │   (trait)    │   (trait)    │         │  │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘  │
│         │             │              │            │       │
└─────────┼─────────────┼──────────────┼────────────┼───────┘
          │             │              │            │
    ┌─────▼──────┐  ┌───▼────┐    ┌────▼───┐   ┌────▼──────┐
    │  Browser   │  │DomSvc  │    │  LLM   │   │ Handlers  │
    │            │  │        │    │        │   │           │
    │TabManager  │  │CDP     │    │Chat    │   │Navigation │
    │NavManager  │  │HTML    │    │Model   │   │Interaction│
    │Screenshot  │  │Tree    │    │        │   │Tabs       │
    │            │  │Builder │    │        │   │Content    │
    └────────────┘  └────────┘    └────────┘   └───────────┘

Key Components

| Component    | Responsibility                                | Trait-Based                      |
|--------------|-----------------------------------------------|----------------------------------|
| Agent        | Orchestrates browser, LLM, and DOM processing | Uses BrowserClient, DOMProcessor |
| Browser      | Manages browser session and lifecycle         | Implements BrowserClient         |
| DOMProcessor | Extracts and serializes DOM                   | Implements DOMProcessor          |
| Tools        | Action registry and execution                 | Uses BrowserClient trait         |
| Handlers     | Specific action implementations               | Use ActionHandler trait          |
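The way these components cooperate can be sketched as a small observe, decide, act loop. The code below is a simplified, synchronous illustration, not the crate's implementation: the traits share names with the real ones but have invented, much smaller signatures, and the `Action` enum, `MockBrowser`, `UrlOnlyDom`, and `ScriptedLlm` are hypothetical.

```rust
use std::collections::VecDeque;

/// The two actions this sketch understands (the real action set is larger).
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Navigate(String),
    Done(String),
}

// Simplified, synchronous stand-ins for the crate's async traits.
trait BrowserClient {
    fn navigate(&mut self, url: &str);
    fn current_url(&self) -> String;
}
trait DomProcessor {
    fn page_state(&self, url: &str) -> String;
}
trait ChatModel {
    fn next_action(&mut self, task: &str, page_state: &str) -> Action;
}

struct Agent<B, D, L> {
    task: String,
    browser: B,
    dom: D,
    llm: L,
    max_steps: usize,
}

impl<B: BrowserClient, D: DomProcessor, L: ChatModel> Agent<B, D, L> {
    /// Runs until the model reports `Done` or `max_steps` is reached.
    /// Each history entry pairs the URL seen at that step with the chosen action.
    fn run(&mut self) -> Vec<(String, Action)> {
        let mut history = Vec::new();
        for _ in 0..self.max_steps {
            let url = self.browser.current_url();                    // observe
            let state = self.dom.page_state(&url);
            let action = self.llm.next_action(&self.task, &state);   // decide
            history.push((url, action.clone()));
            match action {                                           // act
                Action::Navigate(target) => self.browser.navigate(&target),
                Action::Done(_) => break,
            }
        }
        history
    }
}

// Minimal test doubles so the loop can run without a browser or a model.
struct MockBrowser {
    url: String,
}
impl BrowserClient for MockBrowser {
    fn navigate(&mut self, url: &str) {
        self.url = url.to_string();
    }
    fn current_url(&self) -> String {
        self.url.clone()
    }
}

struct UrlOnlyDom;
impl DomProcessor for UrlOnlyDom {
    fn page_state(&self, url: &str) -> String {
        format!("page at {}", url)
    }
}

/// Replays a fixed list of actions, then reports `Done`.
struct ScriptedLlm {
    script: VecDeque<Action>,
}
impl ChatModel for ScriptedLlm {
    fn next_action(&mut self, _task: &str, _page_state: &str) -> Action {
        self.script
            .pop_front()
            .unwrap_or(Action::Done("script exhausted".to_string()))
    }
}
```

Because the agent only sees traits, swapping `MockBrowser` for a real CDP-backed browser, or `ScriptedLlm` for a hosted model, leaves the loop itself unchanged.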

📁 Project Structure

browsing/
├── src/
│   ├── agent/              # Agent orchestration
│   │   ├── service.rs      # Main agent implementation
│   │   └── json_extractor.rs # JSON parsing utilities
│   ├── browser/            # Browser management
│   │   ├── session.rs      # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs  # Tab operations
│   │   ├── navigation.rs   # Navigation operations
│   │   ├── screenshot.rs   # Screenshot operations
│   │   ├── cdp.rs          # CDP WebSocket client
│   │   ├── launcher.rs     # Browser launcher
│   │   └── profile.rs      # Browser configuration
│   ├── dom/                # DOM processing
│   │   ├── processor.rs    # DOMProcessor trait impl
│   │   ├── serializer.rs   # LLM-ready serialization
│   │   ├── tree_builder.rs # DOM tree construction
│   │   ├── cdp_client.rs   # CDP wrapper for DOM
│   │   └── html_converter.rs # HTML to markdown
│   ├── tools/              # Action system
│   │   ├── service.rs      # Tools registry
│   │   ├── handlers/       # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs       # Parameter extraction
│   ├── traits/             # Core trait abstractions
│   │   ├── browser_client.rs  # BrowserClient trait
│   │   └── dom_processor.rs   # DOMProcessor trait
│   ├── llm/                # LLM integration
│   │   └── base.rs         # ChatModel trait
│   ├── actor/              # Low-level interactions
│   │   ├── page.rs         # Page operations
│   │   ├── element.rs      # Element operations
│   │   └── mouse.rs        # Mouse interactions
│   ├── config/             # Configuration
│   ├── error/              # Error types
│   └── utils/              # Utilities
└── Cargo.toml

🎨 Design Principles

Trait-Facing Design

  • BrowserClient - Abstract browser operations for testing and alternative backends
  • DOMProcessor - Pluggable DOM processing implementations
  • ActionHandler - Extensible action system
  • ChatModel - LLM provider abstraction

Separation of Concerns

  • TabManager - Tab operations (create, switch, close)
  • NavigationManager - Navigation logic
  • ScreenshotManager - Screenshot capture
  • Handlers - Focused action implementations

DRY (Don't Repeat Yourself)

  • ActionParams - Reusable parameter extraction
  • JSONExtractor - Centralized JSON parsing
  • SessionGuard - Unified session access
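As an illustration of what reusable parameter extraction buys, the sketch below centralizes lookup, type conversion, and error messages in one helper so that individual handlers do not repeat them. `Params` is a hypothetical stand-in that reads from a string map; the crate's ActionParams has its own fields and methods, which are not shown in this document.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for a shared parameter-extraction helper.
struct Params<'a> {
    /// Name of the action, used to make error messages self-explanatory.
    action: &'a str,
    values: &'a HashMap<String, String>,
}

impl<'a> Params<'a> {
    /// Looks up `key` and parses it as `T`, or explains what went wrong.
    fn required<T: FromStr>(&self, key: &str) -> Result<T, String> {
        let raw = self
            .values
            .get(key)
            .ok_or_else(|| format!("{}: missing parameter '{}'", self.action, key))?;
        raw.parse::<T>().map_err(|_| {
            format!(
                "{}: parameter '{}' has invalid value '{}'",
                self.action, key, raw
            )
        })
    }

    /// Like `required`, but falls back to `default` when the key is absent.
    /// A present but malformed value is still reported as an error.
    fn optional<T: FromStr>(&self, key: &str, default: T) -> Result<T, String> {
        if self.values.contains_key(key) {
            self.required(key)
        } else {
            Ok(default)
        }
    }
}
```

A click handler could then read `params.required::<usize>("index")?` and a navigation handler `params.optional::<bool>("new_tab", false)?`, with both getting the same error wording for free.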

KISS (Keep It Simple, Stupid)

  • Split complex methods into focused helpers
  • Clear naming and single responsibility
  • Minimal dependencies between modules

🧪 Testing

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_agent_workflow

# Run integration tests only
cargo test --test integration

Test Coverage

  • 74+ tests across all modules
  • 24 integration tests for full workflow
  • 50+ unit tests for individual components
  • Mock LLM for deterministic testing
  • Trait-based mocking for browser/DOM components

🔧 Configuration

Browser Profile

use browsing::BrowserProfile;

let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};

Agent Settings

use browsing::Agent;
use browsing::agent::views::AgentSettings;

// Pass the same four arguments as in Quick Start in place of `...`
let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });

📖 API Documentation

Generate and view API docs:

cargo doc --open