# Architecture Documentation
This document provides a comprehensive overview of the Browsing project architecture, design patterns, and testing strategy.
## Table of Contents
- [Overview](#overview)
- [Design Principles](#design-principles)
- [Architecture Layers](#architecture-layers)
- [Module Structure](#module-structure)
- [Key Traits](#key-traits)
- [Testing Strategy](#testing-strategy)
- [Data Flow](#data-flow)
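- [Error Handling](#error-handling)
- [Design Patterns Used](#design-patterns-used)
- [Performance Considerations](#performance-considerations)
- [Security Considerations](#security-considerations)
- [Future Enhancements](#future-enhancements)
- [References](#references)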
## Overview
Browsing is a Rust-based autonomous web browsing library for AI agents. It follows a **trait-based architecture** that enables:
- **Testability**: Mock implementations for unit testing
- **Extensibility**: Custom actions, browsers, and LLM providers
- **Maintainability**: Clear separation of concerns
- **Performance**: Efficient DOM processing and token usage
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                         User Layer                          │
│  ┌─────────────┬──────────────┬──────────────┬──────────┐   │
│  │  CLI Tool   │  MCP Server  │   Library    │ Examples │   │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬─────┘   │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
┌─────────┼─────────────┼──────────────┼────────────┼─────────┐
│         │        Agent Layer         │            │         │
│  ┌──────▼──────┬──────────────┬──────────────┬────▼─────┐   │
│  │    Agent    │ DOMProcessor │     LLM      │  Tools   │   │
│  │   Service   │   (trait)    │   (trait)    │ Service  │   │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬─────┘   │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
┌─────────┼─────────────┼──────────────┼────────────┼─────────┐
│         │       Browser Layer        │            │         │
│  ┌──────▼──────┬──────────────┬──────────────┬────▼─────┐   │
│  │   Browser   │  TabManager  │  Navigation  │  Actor   │   │
│  │   (trait)   │  Screenshot  │   Manager    │  System  │   │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬─────┘   │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
┌─────────┼─────────────┼──────────────┼────────────┼─────────┐
│         │    Infrastructure Layer    │            │         │
│  ┌──────▼──────┬──────────────┬──────────────┬────▼─────┐   │
│  │ CDP Client  │    Config    │    Error     │  Utils   │   │
│  │             │   Logging    │   Handling   │          │   │
│  └─────────────┴──────────────┴──────────────┴──────────┘   │
└─────────────────────────────────────────────────────────────┘
```
## Design Principles
### 1. Trait-Based Design
All major components are defined as traits to enable:
- **Dependency Injection**: Inject mock implementations for testing
- **Multiple Implementations**: Support different browsers, LLMs, etc.
- **Polymorphism**: Use different implementations interchangeably
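As a rough illustration of the dependency-injection point, the sketch below uses a simplified, synchronous trait. `PageNavigator`, `MockNavigator`, and `open_all` are hypothetical names invented for this example; the project's real `BrowserClient` trait is async and has more methods.

```rust
/// Hypothetical, simplified stand-in for the project's async `BrowserClient` trait.
pub trait PageNavigator {
    fn navigate(&mut self, url: &str) -> Result<(), String>;
    fn current_url(&self) -> String;
}

/// Test double that records navigations instead of driving a real browser.
#[derive(Default)]
pub struct MockNavigator {
    pub visited: Vec<String>,
}

impl PageNavigator for MockNavigator {
    fn navigate(&mut self, url: &str) -> Result<(), String> {
        if url.is_empty() {
            return Err("empty URL".to_string());
        }
        self.visited.push(url.to_string());
        Ok(())
    }

    fn current_url(&self) -> String {
        // Empty string until the first successful navigation.
        self.visited.last().cloned().unwrap_or_default()
    }
}

/// Depends only on the trait, so a real browser or a mock can be injected.
pub fn open_all<N: PageNavigator>(nav: &mut N, urls: &[&str]) -> Result<usize, String> {
    for url in urls {
        nav.navigate(url)?;
    }
    Ok(urls.len())
}
```

Because `open_all` is generic over the trait, a unit test can pass `MockNavigator` and inspect `visited` afterwards without launching Chrome.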
### 2. Separation of Concerns
Each module has a single, well-defined responsibility:
- **Agent**: Orchestrates the browsing task
- **Browser**: Manages browser lifecycle
- **DOM**: Extracts and processes page content
- **Tools**: Executes specific actions
- **Actor**: Low-level browser interactions
### 3. SOLID Principles
- **S**ingle Responsibility: Each struct/trait has one reason to change
- **O**pen/Closed: Open for extension (custom actions), closed for modification
- **L**iskov Substitution: Mock implementations can replace real ones
- **I**nterface Segregation: Small, focused traits
- **D**ependency Inversion: Depend on abstractions (traits), not concretions
### 4. DRY (Don't Repeat Yourself)
Shared utilities and traits reduce code duplication:
- **ActionParams**: Reusable parameter extraction
- **JSONExtractor**: Centralized JSON parsing with repair
- **SessionGuard**: Unified session access pattern
### 5. KISS (Keep It Simple, Stupid)
Complex functionality is broken into simple, focused components:
- **Managers**: Single-purpose managers (TabManager, NavigationManager, etc.)
- **Handlers**: Individual action handlers
- **Clear naming**: Self-documenting code
## Architecture Layers
### 1. User Interface Layer
Provides multiple ways to use the library:
#### CLI Tool (`src/bin/cli.rs`)
- Command-line interface for autonomous browsing
- Commands: `run`, `launch`, `connect`
- Configuration via environment variables and CLI arguments
#### MCP Server (`src/bin/mcp_server.rs`)
- Model Context Protocol server for AI assistants
- Tools: navigate, get_content, click, input, screenshot
- Prompts: browse_task template
- Resources: browser://current for page content
#### Library Interface (`src/lib.rs`)
- Public API exports
- Re-exports main types: Browser, Agent, Config
- Examples for common use cases
### 2. Agent Layer
Coordinates the autonomous browsing task.
#### Agent Service (`src/agent/service.rs`)
- **Purpose**: Orchestrates browsing tasks
- **Responsibilities**:
  - Task execution loop
  - LLM interaction
  - Action parsing and execution
  - History tracking
  - Usage tracking (tokens, cost)
- **Key Methods**:
  - `run()`: Execute the agent
  - `step()`: Execute one agent step
  - `build_messages()`: Construct LLM messages
#### DOM Processor (`src/dom/processor.rs`)
- **Purpose**: Extract and process page content
- **Trait**: `DOMProcessor`
- **Capabilities**:
  - CDP-based DOM extraction
  - LLM-ready serialization
  - Selector map generation
#### LLM Integration (`src/llm/base.rs`)
- **Purpose**: Abstract LLM provider
- **Trait**: `ChatModel`
- **Supports**:
  - Chat completions
  - Streaming responses
  - Token usage tracking
  - Tool/function calling
#### Tools Service (`src/tools/service.rs`)
- **Purpose**: Registry and executor for actions
- **Components**:
  - Action model parsing
  - Handler registry
  - Action execution
  - Custom action support
### 3. Browser Layer
Manages browser lifecycle and interactions.
#### Browser Session (`src/browser/session.rs`)
- **Purpose**: Main browser implementation
- **Trait**: `BrowserClient`
- **Responsibilities**:
  - Browser lifecycle (start, stop)
  - Navigation
  - Tab management
  - Screenshot capture
  - State retrieval
#### Managers
- **TabManager** (`src/browser/tab_manager.rs`): Tab operations
- **NavigationManager** (`src/browser/navigation.rs`): Navigation operations
- **ScreenshotManager** (`src/browser/screenshot.rs`): Screenshot operations
#### Actor System (`src/actor/`)
Low-level browser interactions:
- **Page** (`page.rs`): Page-level operations
- **Element** (`element.rs`): Element operations
- **Mouse** (`mouse.rs`): Mouse interactions
- **Keyboard** (`keyboard.rs`): Keyboard input
### 4. Infrastructure Layer
Provides foundational utilities and services.
#### CDP Client (`src/browser/cdp.rs`)
- **Purpose**: Chrome DevTools Protocol client
- **Features**:
  - WebSocket communication
  - Command execution (no memory leaks: uses `&str` directly)
  - Session management
  - Event handling
  - Retry logic with exponential backoff
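The retry behaviour could look roughly like the following. This is a generic, synchronous sketch of exponential backoff, not the CDP client's actual code; `retry_with_backoff` and its parameters are assumptions made for illustration, and an async client would await a timer rather than block the thread.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retries `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay after each failure. `op` receives the 1-based attempt number and is
/// always called at least once.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                // Exponential growth: 1x, 2x, 4x, ... of the initial delay.
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Passing the operation as a closure keeps the backoff policy separate from whatever command is being retried.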
#### Configuration (`src/config/mod.rs`)
- **Purpose**: Configuration management
- **Sources**:
  - Environment variables
  - `.env` files
  - Configuration structs
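One way to layer environment variables over struct defaults is sketched below. The `AgentSettings` struct, the `BROWSING_HEADLESS` and `BROWSING_MAX_STEPS` variable names, and the default of 100 steps are all hypothetical; only the headless-by-default behaviour is taken from this document. `.env` loading is left out because it would need an external crate.

```rust
use std::env;

/// Hypothetical settings struct; field and variable names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct AgentSettings {
    pub headless: bool,
    pub max_steps: u32,
}

impl AgentSettings {
    /// Builds settings from a lookup function so tests can supply values
    /// without touching the process environment.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Headless unless explicitly disabled.
        let headless = lookup("BROWSING_HEADLESS")
            .map(|v| !matches!(v.trim().to_ascii_lowercase().as_str(), "0" | "false" | "no"))
            .unwrap_or(true);
        // Unparseable values fall back to the default instead of failing.
        let max_steps = lookup("BROWSING_MAX_STEPS")
            .and_then(|v| v.trim().parse::<u32>().ok())
            .unwrap_or(100);
        Self { headless, max_steps }
    }

    /// Reads the same keys from real environment variables.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Routing every read through `from_lookup` keeps the parsing rules in one place and makes them testable with a plain closure.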
#### Error Handling (`src/error.rs`)
- **Purpose**: Centralized error types
- **Error Types**:
  - `BrowsingError`: Main error enum
  - Specific variants: `Browser`, `Dom`, `Tool`, `LLM`, `Config`, `Other`
#### Utilities (`src/utils.rs`)
- **Purpose**: Shared utility functions
- **Features**:
  - URL extraction
  - Domain matching
  - Signal handling
## Module Structure
```
src/
├── agent/ # Agent orchestration
│ ├── service.rs # Main agent implementation
│ ├── json_extractor.rs # JSON parsing utilities
│ ├── views.rs # Data types
│ └── mod.rs
├── browser/ # Browser management
│ ├── session.rs # Browser session (BrowserClient impl)
│ ├── tab_manager.rs # Tab operations
│ ├── navigation.rs # Navigation operations
│ ├── screenshot.rs # Screenshot operations
│ ├── cdp.rs # CDP WebSocket client
│ ├── launcher.rs # Browser launcher
│ ├── profile.rs # Browser configuration
│ ├── views.rs # Data types
│ └── mod.rs
├── dom/ # DOM processing
│ ├── processor.rs # DOMProcessor trait impl
│ ├── serializer.rs # LLM-ready serialization
│ ├── tree_builder.rs # DOM tree construction
│ ├── cdp_client.rs # CDP wrapper for DOM
│ ├── html_converter.rs # HTML to markdown
│ ├── views.rs # Data types
│ └── mod.rs
├── tools/ # Action system
│ ├── service.rs # Tools registry
│ ├── handlers/ # Action handlers
│ │ ├── navigation.rs
│ │ ├── interaction.rs
│ │ ├── tabs.rs
│ │ ├── content.rs
│ │ ├── advanced.rs
│ │ └── mod.rs
│ ├── views.rs # Data types
│ └── mod.rs
├── traits/ # Core trait abstractions
│ ├── browser_client.rs # BrowserClient trait
│ ├── dom_processor.rs # DOMProcessor trait
│ └── mod.rs
├── llm/ # LLM integration
│ ├── base.rs # ChatModel trait
│ └── mod.rs
├── actor/ # Low-level interactions
│ ├── page.rs # Page operations
│ ├── element.rs # Element operations
│ ├── mouse.rs # Mouse interactions
│ ├── keyboard.rs # Keyboard input
│ └── mod.rs
├── config/ # Configuration
│ └── mod.rs
├── error.rs # Error types
├── logging.rs # Logging setup
├── utils.rs # Utilities
├── views.rs # Shared data types
└── lib.rs # Public API
```
## Key Traits
### BrowserClient
Abstracts browser operations for testing and alternative backends.
```rust
#[async_trait]
pub trait BrowserClient: Send + Sync {
    async fn start(&mut self) -> Result<()>;
    async fn navigate(&mut self, url: &str) -> Result<()>;
    async fn get_current_url(&self) -> Result<String>;
    async fn create_tab(&mut self, url: Option<&str>) -> Result<String>;
    async fn switch_to_tab(&mut self, target_id: &str) -> Result<()>;
    async fn close_tab(&mut self, target_id: &str) -> Result<()>;
    async fn get_tabs(&self) -> Result<Vec<TabInfo>>;
    fn get_page(&self) -> Result<Page>;
    async fn take_screenshot(&self, path: Option<&str>, full_page: bool) -> Result<Vec<u8>>;
    // ... more methods
}
```
### DOMProcessor
Abstracts DOM processing operations.
```rust
#[async_trait]
pub trait DOMProcessor: Send + Sync {
    async fn get_serialized_dom(&self) -> Result<SerializedDOMState>;
    async fn get_page_state_string(&self) -> Result<String>;
    async fn get_selector_map(&self) -> Result<HashMap<u32, DOMInteractedElement>>;
}
```
### ChatModel
Abstracts LLM provider interactions.
```rust
#[async_trait]
pub trait ChatModel: Send + Sync {
    async fn chat(&self, messages: Vec<ChatMessage>) -> Result<ChatInvokeCompletion>;
    async fn chat_stream(&self, messages: Vec<ChatMessage>) -> Result<BoxStream<ChatInvokeCompletion>>;
}
```
### ActionHandler
Abstracts action implementations.
```rust
#[async_trait]
pub trait ActionHandler: Send + Sync {
    async fn execute(
        &self,
        params: &ActionParams<'_>,
        context: &mut ActionContext<'_>,
    ) -> Result<ActionResult>;
}
```
## Testing Strategy
### Test Organization
```
tests/
├── actor_test.rs # Actor module tests (60+ tests)
├── browser_managers_test.rs # Browser managers tests (50+ tests)
├── tools_handlers_test.rs # Tools handlers tests (50+ tests)
├── agent_service_test.rs # Agent service tests (40+ tests)
├── traits_test.rs # Traits tests (30+ tests)
├── utils_test.rs # Utilities tests (50+ tests)
├── browser_test.rs # Browser integration tests
├── dom_test.rs # DOM integration tests
├── agent_test.rs # Agent integration tests
├── tools_test.rs # Tools integration tests
├── integration_test.rs # Full workflow integration tests
├── integration_workflow_test.rs # End-to-end workflow tests
└── ... (additional test files)
```
### Test Categories
#### 1. Unit Tests
- **Purpose**: Test individual functions and methods
- **Examples**:
  - Key code mapping
  - URL parsing
  - Domain matching
  - Data structure validation
#### 2. Integration Tests
- **Purpose**: Test multiple components working together
- **Examples**:
  - Browser navigation
  - DOM extraction
  - Agent execution flow
  - Tool execution
#### 3. Trait Tests
- **Purpose**: Test trait implementations and mock objects
- **Examples**:
  - Mock BrowserClient
  - Mock DOMProcessor
  - Trait method validation
#### 4. Property-Based Tests
- **Purpose**: Verify invariants across many inputs
- **Examples**:
  - URL encoding/decoding
  - Domain pattern matching
  - Data structure consistency
### Mock Implementations
The project provides mock implementations for testing:
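```rust
use async_trait::async_trait;
use std::sync::atomic::AtomicUsize;

/// Test double for `BrowserClient` that records state instead of driving Chrome.
struct MockBrowserClient {
    started: bool,
    current_url: String,
    // Atomic counter, so it can be incremented through a shared reference.
    navigation_count: AtomicUsize,
}

#[async_trait]
impl BrowserClient for MockBrowserClient {
    async fn start(&mut self) -> Result<()> {
        // No browser is launched; the mock only flips a flag.
        self.started = true;
        Ok(())
    }
    // ... other methods
}
```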
### Test Coverage Summary
| Module | Tests | Coverage |
|--------|-------|----------|
| Actor | 29 | Keyboard, Mouse, Page, Element operations |
| Browser | 10 | Navigation, Screenshot, Tab management |
| Tools | 56 | All action handlers |
| Agent | 48 | Execution logic, history, usage tracking |
| Traits | 24 | BrowserClient, DOMProcessor implementations |
| Utils | 52 | URL extraction, domain matching, signals |
| DOM | 4 | DOM extraction and processing |
| Error | 13 | Error handling and recovery |
| Security | 4 | Security validation |
| Integration | 115+ | End-to-end workflows |
| **Total** | **355** | All passing, 0 ignored |
## Data Flow
### Agent Execution Flow
```
┌─────────────┐
│    Task     │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    Agent    │
│   .run()    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────┐
│ For each step (max_steps):  │
└────────────┬────────────────┘
             │
             ▼
    ┌────────────────┐
    │ Get Page State │
    │ (DOMProcessor) │
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │ Build Messages │
    │  (with state)  │
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │    Call LLM    │
    │  (ChatModel)   │
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │  Parse Action  │
    │ (JSONExtractor)│
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │ Execute Action │
    │    (Tools)     │
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │ Track History  │
    │    & Usage     │
    └────────┬───────┘
             │
             ▼
    ┌────────────────┐
    │   Check Done   │
    │   Condition    │
    └────────────────┘
```
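The loop in the diagram can be approximated in a few lines. This is only a sketch under simplifying assumptions: `StateSource`, `Model`, and `Executor` are hypothetical synchronous stand-ins for `DOMProcessor`, `ChatModel`, and the tools service, actions are plain strings, and the JSON parsing and usage-tracking steps are omitted.

```rust
/// Hypothetical stand-in for `DOMProcessor`.
pub trait StateSource {
    fn page_state(&self) -> String;
}

/// Hypothetical stand-in for `ChatModel`; returns the next action as text.
pub trait Model {
    fn next_action(&self, task: &str, state: &str, history: &[String]) -> String;
}

/// Hypothetical stand-in for the tools service.
pub trait Executor {
    /// Returns `true` when the action signals that the task is done.
    fn execute(&mut self, action: &str) -> bool;
}

/// Runs the step loop and returns the action history.
pub fn run_agent(
    task: &str,
    max_steps: usize,
    dom: &impl StateSource,
    llm: &impl Model,
    tools: &mut impl Executor,
) -> Vec<String> {
    let mut history = Vec::new();
    for _ in 0..max_steps {
        // Get page state, then build messages and call the model.
        let state = dom.page_state();
        let action = llm.next_action(task, &state, &history);
        // Execute the action and record it before checking the done condition.
        let done = tools.execute(&action);
        history.push(action);
        if done {
            break;
        }
    }
    history
}

/// Test doubles that exercise the loop without a browser or an LLM.
pub struct FixedState;
impl StateSource for FixedState {
    fn page_state(&self) -> String {
        "[1] <button>Submit</button>".to_string()
    }
}

pub struct ScriptedModel;
impl Model for ScriptedModel {
    fn next_action(&self, _task: &str, _state: &str, history: &[String]) -> String {
        // Click first, then declare the task done.
        if history.is_empty() { "click:1".to_string() } else { "done".to_string() }
    }
}

#[derive(Default)]
pub struct RecordingExecutor {
    pub executed: Vec<String>,
}
impl Executor for RecordingExecutor {
    fn execute(&mut self, action: &str) -> bool {
        self.executed.push(action.to_string());
        action == "done"
    }
}
```

With the scripted doubles, `run_agent` stops after two steps even when `max_steps` is larger, and a `max_steps` of 1 caps the history at a single action.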
### Action Execution Flow
```
┌─────────────┐
│   Action    │
│ Parameters  │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ Get Handler      │
│ (from registry)  │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Execute Handler  │
│ - BrowserClient  │
│ - DOMProcessor   │
│ - ActionContext  │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Return Result    │
│ - Content        │
│ - Memory         │
│ - State          │
└──────────────────┘
```
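The lookup-then-execute flow above can be sketched with a map from action names to boxed handlers. `Handler`, `NavigateHandler`, and `Registry` are hypothetical, synchronous simplifications; the project's `ActionHandler` is async and takes `ActionParams` and `ActionContext` rather than a plain string map.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous analogue of the `ActionHandler` trait.
pub trait Handler {
    fn execute(&self, params: &HashMap<String, String>) -> Result<String, String>;
}

/// Example handler: reports the URL it would navigate to.
pub struct NavigateHandler;

impl Handler for NavigateHandler {
    fn execute(&self, params: &HashMap<String, String>) -> Result<String, String> {
        let url = params.get("url").ok_or("missing 'url' parameter")?;
        Ok(format!("navigated to {url}"))
    }
}

/// Maps action names to handlers; custom actions are added with `register`.
#[derive(Default)]
pub struct Registry {
    handlers: HashMap<String, Box<dyn Handler>>,
}

impl Registry {
    pub fn register(&mut self, name: &str, handler: Box<dyn Handler>) {
        self.handlers.insert(name.to_string(), handler);
    }

    /// Looks up the handler by name, then executes it with the given parameters.
    pub fn dispatch(&self, name: &str, params: &HashMap<String, String>) -> Result<String, String> {
        let handler = self
            .handlers
            .get(name)
            .ok_or_else(|| format!("unknown action: {name}"))?;
        handler.execute(params)
    }
}
```

Registering a new action is a single `register` call, which is what keeps the dispatcher closed for modification while remaining open for extension.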
## Error Handling
### Error Hierarchy
```
BrowsingError
├── Browser(String) # Browser-related errors
├── Dom(String) # DOM processing errors
├── Tool(String) # Tool/action errors
├── LLM(String) # LLM provider errors
├── Config(String) # Configuration errors
└── Other(String) # Other errors
```
### Error Propagation
Errors are propagated using Rust's `Result<T>` type and the `?` operator:
```rust
async fn navigate(&mut self, url: &str) -> Result<()> {
    self.validate_url(url)?;        // Returns early on error
    self.cdp_navigate(url).await?;  // Propagates CDP errors
    Ok(())
}
```
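For readers who want to see how the pieces fit, here is a self-contained sketch of an error enum with the variants listed in the hierarchy, plus a `From` conversion that lets `?` translate a standard-library error. The `Display` messages, the `From<ParseIntError>` impl, and `parse_port` are assumptions for illustration; the real `src/error.rs` may be defined differently (for example with a derive macro).

```rust
use std::fmt;
use std::num::ParseIntError;

/// Mirrors the variants in the hierarchy above; details are assumed.
#[derive(Debug, PartialEq)]
pub enum BrowsingError {
    Browser(String),
    Dom(String),
    Tool(String),
    LLM(String),
    Config(String),
    Other(String),
}

impl fmt::Display for BrowsingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Browser(m) => write!(f, "browser error: {m}"),
            Self::Dom(m) => write!(f, "DOM error: {m}"),
            Self::Tool(m) => write!(f, "tool error: {m}"),
            Self::LLM(m) => write!(f, "LLM error: {m}"),
            Self::Config(m) => write!(f, "config error: {m}"),
            Self::Other(m) => write!(f, "error: {m}"),
        }
    }
}

impl std::error::Error for BrowsingError {}

/// Lets `?` convert integer parse failures into a `Config` error.
impl From<ParseIntError> for BrowsingError {
    fn from(err: ParseIntError) -> Self {
        Self::Config(format!("invalid number: {err}"))
    }
}

/// Hypothetical helper: `?` converts the parse error via `From` and returns early.
pub fn parse_port(raw: &str) -> Result<u16, BrowsingError> {
    let port: u16 = raw.trim().parse()?;
    if port == 0 {
        return Err(BrowsingError::Config("port must be non-zero".to_string()));
    }
    Ok(port)
}
```

Callers can then match on the variant to decide whether a failure is worth retrying or should abort the task.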
### Error Recovery
The agent implements several recovery strategies:
1. **JSON Repair**: Repairs malformed JSON from LLMs
2. **Retry Logic**: Retries failed actions
3. **Graceful Degradation**: Continues on non-critical errors
4. **Error Logging**: Logs errors for debugging
## Design Patterns Used
### 1. Strategy Pattern
- **Traits**: BrowserClient, DOMProcessor, ChatModel
- **Purpose**: Enable different implementations
### 2. Builder Pattern
- **Types**: Browser, Agent, Config
- **Purpose**: Fluent configuration
### 3. Factory Pattern
- **Types**: BrowserLauncher, Handlers
- **Purpose**: Create instances with context
### 4. Repository Pattern
- **Types**: TabManager, NavigationManager, ScreenshotManager
- **Purpose**: Manage domain operations
### 5. Observer Pattern
- **Types**: CDP event handling
- **Purpose**: React to browser events
### 6. Command Pattern
- **Types**: ActionHandler, Tools
- **Purpose**: Encapsulate actions as objects
## Performance Considerations
### Headless-First Design
- **Default headless**: `BrowserProfile::default()` runs `--headless=new` (modern Chrome headless)
- **17+ lightweight flags**: Disables extensions, sync, background networking, logging, notifications, etc.
- **Fast startup**: 200ms initial sleep + 200ms CDP polling interval (vs 1000ms + 500ms previously)
### Memory Efficiency
- **No `Box::leak`**: CDP client uses `&str` directly instead of leaking heap strings per command
- **Cached regexes**: `JSONExtractor` uses `LazyLock<Regex>` — compiled once, reused forever
- **No `Vec<char>`**: JSON brace counting iterates `char_indices()` directly on `&str`
- **Normalized `tag_name`**: `node_name` lowercased once at construction, not per-call
- **Ownership over cloning**: `get_serialized_dom` passes tree ownership to serializer
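The brace-counting point can be illustrated with a small function that scans `char_indices()` and returns a borrowed slice, so no intermediate `Vec<char>` or `String` is allocated. `first_json_object` is a hypothetical name and this is not the project's `JSONExtractor`; it only shows the technique.

```rust
/// Returns the first balanced `{...}` span in `text`, or `None` if no object closes.
/// Braces inside JSON string literals are ignored.
pub fn first_json_object(text: &str) -> Option<&str> {
    let mut start: Option<usize> = None;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (idx, ch) in text.char_indices() {
        if start.is_none() {
            // Skip any prose before the first opening brace.
            if ch == '{' {
                start = Some(idx);
                depth = 1;
            }
            continue;
        }
        if in_string {
            match ch {
                // The character after a backslash never ends the string.
                _ if escaped => escaped = false,
                '\\' => escaped = true,
                '"' => in_string = false,
                _ => {}
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // `}` is one byte, so the inclusive range ends on a char boundary.
                    return start.map(|s| &text[s..=idx]);
                }
            }
            _ => {}
        }
    }
    None
}
```

The returned slice borrows from the input, so the caller can hand it straight to a JSON parser and fall back to repair only if parsing fails.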
### Token Optimization
- **Selective Extraction**: Only extract interactive elements
- **Content Pruning**: Limit text content length
- **Tree Pruning**: Remove irrelevant DOM nodes
- **Single-pass JSON parsing**: Agent output parsed once; repair only attempted on failure
### Serde Efficiency
- **No double round-trips**: `to_dict()` converts `serde_json::Value::Object` directly to `HashMap`
- **No triple parsing**: `parse_agent_output` parses once, repairs only if initial parse fails
### Caching
- **CDP Sessions**: Reuse sessions for multiple commands
- **Selector Maps**: Cache element mappings
- **DOM State**: Cache serialized state
- **Static Regexes**: `LazyLock` for compile-once regex patterns
### Concurrency
- **Async/Await**: Non-blocking I/O operations
- **Tokio Runtime**: Efficient async runtime
- **Parallel Requests**: Concurrent CDP commands
### Deduplication
- **Browser::start()**: Single unified session setup path (launch vs connect), shared target discovery code
## Security Considerations
### Input Validation
- **URL Validation**: Validate and sanitize URLs
- **Parameter Validation**: Validate action parameters
- **File Path Validation**: Validate file paths for screenshots
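As a hedged example of what URL validation might involve, the function below allows only absolute `http`/`https` URLs with a non-empty host. The rules and the `validate_url` signature are assumptions, not the project's actual checks, and the parsing is hand-rolled for illustration; a dedicated URL parser would be more robust.

```rust
/// Accepts only absolute `http`/`https` URLs with a non-empty host.
pub fn validate_url(url: &str) -> Result<(), String> {
    let url = url.trim();
    if url.chars().any(|c| c.is_whitespace() || c.is_control()) {
        return Err("URL contains whitespace or control characters".to_string());
    }
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| format!("missing scheme: {url}"))?;
    // Allow-list of schemes; rejects `file`, `javascript`, `data`, and others.
    if !matches!(scheme.to_ascii_lowercase().as_str(), "http" | "https") {
        return Err(format!("scheme not allowed: {scheme}"));
    }
    // The authority ends at the first '/', '?', or '#'.
    let authority = rest.split(['/', '?', '#']).next().unwrap_or("");
    // Drop optional userinfo ("user:pass@") before checking the host.
    let host = authority.rsplit('@').next().unwrap_or("");
    if host.is_empty() || host.starts_with(':') {
        return Err(format!("missing host: {url}"));
    }
    Ok(())
}
```

An allow-list of schemes is generally easier to reason about than a deny-list, since unknown schemes are rejected by default.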
### Sandboxing
- **Headless by Default**: Runs without visible UI, reducing attack surface
- **Lightweight Flags**: 17+ flags disabling unnecessary Chrome features (extensions, sync, etc.)
- **Browser Flags**: Use Chrome's sandbox flags
- **User Data Dir**: Isolated browser profile
- **No Root**: Don't run as root
### Data Privacy
- **Local Processing**: Process data locally
- **No Telemetry**: No data collection
- **Configurable LLM**: User controls LLM provider
## Future Enhancements
### Planned Features
- [ ] Performance benchmarks
- [ ] More browser backends (Firefox, Safari)
- [ ] Enhanced DOM processing (paint order)
- [ ] Distributed agent execution
- [ ] Advanced error recovery
- [x] Headless-first design with lightweight flags
- [x] Memory leak elimination (CDP client)
- [x] Regex caching for JSON extraction
- [x] DOM tree cloning reduction
### Community Contributions
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## References
- [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/)
- [Rust Async Book](https://rust-lang.github.io/async-book/)
- [Rust Design Patterns](https://rust-unofficial.github.io/patterns/)