browsing 0.1.2

Lightweight MCP/API for browser automation: navigate, get content (text), screenshot. Parallelism via RwLock.
# Browsing

**Lightweight MCP/API for browser automation**

A concise MCP server and Rust library: **navigate**, **get_links**, **follow_link**, **list_content** (links+images), **get_content**, **get_image**, **save_content**, **screenshot** (full or element). The browser starts lazily on first use, and read operations can run in parallel behind an `RwLock`.
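
The lazy-start and `RwLock` design can be pictured with a small std-only sketch. `FakeBrowser` and `SharedBrowser` are hypothetical stand-ins, not types from this crate, and an async server would more likely use `tokio::sync::RwLock`. The idea is the same: the browser slot stays empty until the first write, and any number of readers can inspect it at once.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for a real browser session (not the crate's type).
struct FakeBrowser {
    current_url: String,
}

// Hypothetical shared handle: the inner `Option` stays `None` until a browser is needed.
#[derive(Clone, Default)]
struct SharedBrowser {
    inner: Arc<RwLock<Option<FakeBrowser>>>,
}

impl SharedBrowser {
    /// Write path: takes the exclusive lock and launches the browser on first use.
    fn navigate(&self, url: &str) {
        let mut guard = self.inner.write().unwrap();
        let browser = guard.get_or_insert_with(|| FakeBrowser {
            current_url: String::new(),
        });
        browser.current_url = url.to_string();
    }

    /// Read path: any number of callers can hold the shared lock at once.
    fn current_url(&self) -> Option<String> {
        let guard = self.inner.read().unwrap();
        (*guard).as_ref().map(|b| b.current_url.clone())
    }
}

fn main() {
    let shared = SharedBrowser::default();
    println!("before: {:?}", shared.current_url()); // None: nothing launched yet

    shared.navigate("https://example.com");

    // Readers on separate threads share the read lock concurrently.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let handle = shared.clone();
            thread::spawn(move || handle.current_url())
        })
        .collect();
    for reader in readers {
        println!("after: {:?}", reader.join().unwrap());
    }
}
```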

## 🎯 Usage Modes

1. **🔌 MCP Server** (primary) - `navigate`, `get_links`, `follow_link`, `list_content`, `get_content`, `get_image`, `save_content`, `screenshot`, `generate_sitemap` tools for AI assistants
2. **⌨️ CLI** - Autonomous browsing tasks
3. **📦 Library** - Full agent system with LLM integration and custom actions

## ✨ Why Browsing?

Building AI agents that can navigate and interact with websites is challenging. You need to:

- **Extract structured data from unstructured HTML** - Parse complex DOM trees and make them LLM-readable
- **Handle browser automation reliably** - Manage browser lifecycle, CDP connections, and process management
- **Coordinate multiple subsystems** - Orchestrate DOM extraction, LLM inference, and action execution
- **Maintain testability** - Mock components for unit testing without real browsers
- **Support extensibility** - Add custom actions, browser backends, and LLM providers

**Browsing solves all of this** with a clean, modular, and well-tested architecture.

## 🎯 Key Features

### 🏗️ Trait-Based Architecture
- **BrowserClient trait** - Abstract browser operations for easy mocking and alternative backends
- **DOMProcessor trait** - Pluggable DOM processing implementations
- **ActionHandler trait** - Extensible action system for custom behaviors

### 🤖 Autonomous Agent System
- Complete agent execution loop with LLM integration
- Robust action parsing with JSON repair
- History tracking with state snapshots
- Graceful error handling and recovery
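
"Action parsing with JSON repair" matters because model replies often wrap the JSON in prose or leave trailing commas. A minimal sketch of that kind of repair, assuming only those two failure modes; `extract_json_object` and `remove_trailing_commas` are hypothetical helpers, not the API of the crate's `json_extractor.rs`.

```rust
/// Hypothetical helper: returns the first balanced `{...}` object in `text`,
/// ignoring braces that appear inside JSON strings.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in text[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..=start + i]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced: the reply was probably truncated
}

/// Hypothetical helper: drops commas that directly precede `}` or `]`.
fn remove_trailing_commas(json: &str) -> String {
    let chars: Vec<char> = json.chars().collect();
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for (i, &c) in chars.iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
        } else if c == ',' {
            // Look past whitespace: a comma right before a closer is dropped.
            let next = chars[i + 1..].iter().copied().find(|ch| !ch.is_whitespace());
            if matches!(next, Some('}') | Some(']')) {
                continue;
            }
        }
        out.push(c);
    }
    out
}

fn main() {
    let reply = "Sure! Next action: {\"click\": {\"index\": 3,},} Hope that helps.";
    match extract_json_object(reply) {
        Some(raw) => println!("{}", remove_trailing_commas(raw)),
        None => println!("no complete JSON object found"),
    }
}
```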

### 🌐 Full Browser Automation
- Cross-platform support (macOS, Linux, Windows)
- Automatic browser detection
- Chrome DevTools Protocol (CDP) integration
- Tab management (create, switch, close)
- Screenshot capture (page and element-level)

### 📊 Advanced DOM Processing
- Full CDP integration (DOM, AX tree, Snapshot)
- LLM-ready serialization with interactive element indices
- Accessibility tree support for better semantic understanding
- Optimized for token efficiency
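
To make "interactive element indices" concrete: a serializer can number only the elements an agent can act on, so the model answers with an index instead of a selector. This sketch uses a hypothetical `Node` type and an illustrative `[n]<tag>text</tag>` line format; the crate's actual serialization format may differ.

```rust
/// Hypothetical, simplified view of a DOM node (not the crate's type).
struct Node<'a> {
    tag: &'a str,
    text: &'a str,
    interactive: bool,
}

/// Serializes nodes one per line. Interactive nodes get a `[n]` index the
/// LLM can cite in an action; the returned map takes n back to a node position.
fn serialize(nodes: &[Node<'_>]) -> (String, Vec<usize>) {
    let mut out = String::new();
    let mut index_map = Vec::new();
    for (pos, node) in nodes.iter().enumerate() {
        if node.interactive {
            out.push_str(&format!(
                "[{}]<{}>{}</{}>\n",
                index_map.len(),
                node.tag,
                node.text.trim(),
                node.tag
            ));
            index_map.push(pos);
        } else if !node.text.trim().is_empty() {
            // Plain text is kept without an index; whitespace-only nodes are
            // dropped to save tokens.
            out.push_str(node.text.trim());
            out.push('\n');
        }
    }
    (out, index_map)
}

fn main() {
    let nodes = [
        Node { tag: "h1", text: "Welcome", interactive: false },
        Node { tag: "a", text: "Docs", interactive: true },
        Node { tag: "div", text: "   ", interactive: false },
        Node { tag: "button", text: "Sign in", interactive: true },
    ];
    let (text, map) = serialize(&nodes);
    print!("{}", text);
    println!("index -> node position: {:?}", map);
}
```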

### 🔧 Extensible & Maintainable
- Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
- Custom action registration
- Utility traits for reduced code duplication
- Comprehensive test coverage (200+ tests)

## 📦 Installation

### As a Library

```toml
[dependencies]
browsing = "0.1"
tokio = { version = "1.40", features = ["full"] }
```

### As a CLI Tool

```bash
cargo install --path . --bin browsing
```

### As an MCP Server

```bash
cargo build --release --bin browsing-mcp
```

## 🚀 Quick Start

### 1️⃣ CLI Usage

```bash
# Run an autonomous browsing task
browsing run "Find the latest news about AI" --url https://news.ycombinator.com --headless

# Launch a browser and get CDP URL
browsing launch --headless

# Connect to existing browser
browsing connect ws://localhost:9222/devtools/browser/abc123
```

**📖 [Full CLI Documentation](docs/CLI_USAGE.md)**

### 2️⃣ MCP Server Usage

Add the server to Claude Desktop's config file (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "browsing": {
      "command": "/path/to/browsing/target/release/browsing-mcp",
      "env": {
        "BROWSER_USE_HEADLESS": "true"
      }
    }
  }
}
```

Then ask Claude:
```
"Navigate to rust-lang.org, get the links, follow the second link, and screenshot the main content area"
```

**📖 [Full MCP Documentation](docs/MCP_USAGE.md)**
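
When Claude uses one of these tools, the client sends the server an MCP `tools/call` request over JSON-RPC 2.0 (newline-delimited on the stdio transport). The sketch below builds such a request by hand to show its shape. The `url` argument name for `navigate` is an assumption; check the schema the server reports via `tools/list`.

```rust
/// Escapes a string for embedding in JSON (quotes, backslashes, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds an MCP `tools/call` request whose arguments are all strings.
fn tools_call(id: u64, tool: &str, args: &[(&str, &str)]) -> String {
    let fields: Vec<String> = args
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{{}}}}}}}",
        id,
        json_escape(tool),
        fields.join(",")
    )
}

fn main() {
    // Hypothetical argument name `url`; the real tool schema may differ.
    println!("{}", tools_call(1, "navigate", &[("url", "https://www.rust-lang.org")]));
}
```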

### 3️⃣ Library Usage

```rust
use anyhow::Result;
use browsing::{Browser, Config};

#[tokio::main]
async fn main() -> Result<()> {
    browsing::init();
    
    let config = Config::from_env();
    let mut browser = Browser::launch(config.browser_profile).await?;
    
    browser.navigate("https://example.com").await?;
    
    let state = browser.get_browser_state_summary(true).await?;
    println!("Title: {}", state.title);
    
    Ok(())
}
```

**📖 [Full Library Documentation](docs/LIBRARY_USAGE.md)**

### Browser Launch Options

```rust
use browsing::{Browser, BrowserProfile};
use browsing::browser::launcher::BrowserLauncher;
use std::path::PathBuf;

// Each option builds its own profile, because `Browser::new` takes it by value.

// Option 1: Auto-launch a browser (default)
let browser = Browser::new(BrowserProfile::default());

// Option 2: Connect to an existing browser via its CDP endpoint
let browser = Browser::new(BrowserProfile::default())
    .with_cdp_url("http://localhost:9222".to_string());

// Option 3: Use a custom browser executable
let launcher = BrowserLauncher::new(BrowserProfile::default())
    .with_executable_path(PathBuf::from("/path/to/chrome"));
```
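
Option 2 takes Chrome's HTTP debugging endpoint, while `browsing connect` expects the browser's WebSocket URL. When Chrome runs with `--remote-debugging-port=9222`, it publishes that WebSocket URL as `webSocketDebuggerUrl` at `/json/version`. This std-only sketch fetches it; whether `with_cdp_url` performs the same lookup internally is an assumption, and the naive string search stands in for a real JSON parser.

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Pulls the `webSocketDebuggerUrl` value out of a `/json/version` response.
/// Naive string search for illustration; a real client would parse the JSON.
fn extract_ws_url(body: &str) -> Option<String> {
    let key = "\"webSocketDebuggerUrl\"";
    let after_key = &body[body.find(key)? + key.len()..];
    let start = after_key.find('"')? + 1;
    let len = after_key[start..].find('"')?;
    Some(after_key[start..start + len].to_string())
}

/// Asks a locally running Chrome for its browser-level WebSocket URL.
fn fetch_ws_url(port: u16) -> std::io::Result<Option<String>> {
    let mut stream = TcpStream::connect(("127.0.0.1", port))?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    write!(
        stream,
        "GET /json/version HTTP/1.0\r\nHost: 127.0.0.1:{}\r\n\r\n",
        port
    )?;
    let mut raw = Vec::new();
    // On a timeout, `raw` still holds whatever arrived before it.
    if let Err(e) = stream.read_to_end(&mut raw) {
        if e.kind() != ErrorKind::WouldBlock && e.kind() != ErrorKind::TimedOut {
            return Err(e);
        }
    }
    Ok(extract_ws_url(&String::from_utf8_lossy(&raw)))
}

fn main() {
    match fetch_ws_url(9222) {
        Ok(Some(url)) => println!("CDP WebSocket URL: {}", url),
        Ok(None) => println!("response had no webSocketDebuggerUrl"),
        Err(e) => println!(
            "could not reach port 9222 ({}); was Chrome started with --remote-debugging-port=9222?",
            e
        ),
    }
}
```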

### Using Traits for Testing

```rust
use browsing::agent::Agent;
use browsing::error::Result;
use browsing::traits::{BrowserClient, DOMProcessor};
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock browser that counts navigations instead of driving a real browser
struct MockBrowser {
    navigation_count: AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<()> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<()> {
        self.navigation_count.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    // ... implement the remaining trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: AtomicUsize::new(0),
    });

    // MockDOMProcessor and MockLLM are test doubles you define yourself,
    // e.g. a DOMProcessor impl that returns canned page state
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();

    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... run the agent and assert on its behavior, without a real browser
}
```
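
The same dependency-injection idea, reduced to something that compiles and runs with only the standard library. `Navigator` is a hypothetical synchronous stand-in for `BrowserClient`. Because `navigate` takes `&mut self`, a plain `usize` counter is enough; an `AtomicUsize` only becomes necessary if the mock is shared behind `&self`.

```rust
/// Hypothetical, synchronous cut-down of a browser trait. The crate's
/// `BrowserClient` is async (via `async_trait`) and has more methods.
trait Navigator {
    fn navigate(&mut self, url: &str) -> Result<(), String>;
}

/// Test double: records calls instead of driving a browser.
#[derive(Default)]
struct MockBrowser {
    navigation_count: usize,
    visited: Vec<String>,
}

impl Navigator for MockBrowser {
    fn navigate(&mut self, url: &str) -> Result<(), String> {
        if !url.starts_with("http://") && !url.starts_with("https://") {
            return Err(format!("invalid URL: {}", url));
        }
        self.navigation_count += 1;
        self.visited.push(url.to_string());
        Ok(())
    }
}

/// Code under test depends only on the trait, so a mock or a real
/// backend can be injected.
fn visit_all<N: Navigator>(nav: &mut N, urls: &[&str]) -> usize {
    let mut succeeded = 0;
    for url in urls {
        if nav.navigate(url).is_ok() {
            succeeded += 1;
        }
    }
    succeeded
}

fn main() {
    let mut mock = MockBrowser::default();
    let ok = visit_all(&mut mock, &["https://example.com", "not-a-url", "https://www.rust-lang.org"]);
    println!(
        "{} succeeded, {} navigations recorded: {:?}",
        ok, mock.navigation_count, mock.visited
    );
}
```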

## πŸ“š Usage Examples

### Content Download

```rust
use browsing::{Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
use browsing::traits::DOMProcessor;

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate to website
    browser.navigate("https://www.ibm.com").await?;
    tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;

    // Extract content
    let cdp_client = browser.get_cdp_client()?;
    let session_id = browser.get_session_id()?;
    let target_id = browser.get_current_target_id()?;

    let dom_processor = DOMProcessorImpl::new()
        .with_cdp_client(cdp_client, session_id)
        .with_target_id(target_id);

    let page_content = dom_processor.get_page_state_string().await?;
    println!("Extracted {} bytes of content", page_content.len());

    // Save to file
    std::fs::write("ibm_content.txt", page_content)?;
    Ok(())
}
```

**Run this example:**
```bash
cargo run --example ibm_content_download
```

### Screenshot Capture

```rust
use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;
    browser.navigate("https://example.com").await?;

    // Full-page screenshot
    let screenshot_data = browser.take_screenshot(
        Some("screenshot.png"), // path
        true,                   // full_page
    ).await?;

    // Viewport only
    let viewport = browser.take_screenshot(
        Some("viewport.png"),
        false,
    ).await?;

    Ok(())
}
```
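
In a CDP-based browser, screenshots come from the `Page.captureScreenshot` command, which returns the image as base64 in `result.data`. The sketch builds that command and assumes, without confirmation from the crate, that `full_page` maps to `captureBeyondViewport`; the optional top-level `sessionId` is how flattened CDP sessions address a specific tab. Element-level captures would add a `clip` rectangle, omitted here.

```rust
/// Hypothetical builder for a CDP `Page.captureScreenshot` command.
/// `full_page` is assumed to map to `captureBeyondViewport`.
fn capture_screenshot_cmd(id: u64, session_id: Option<&str>, full_page: bool) -> String {
    // Flattened sessions put `sessionId` at the top level of the message.
    let session = match session_id {
        Some(s) => format!(",\"sessionId\":\"{}\"", s),
        None => String::new(),
    };
    format!(
        "{{\"id\":{},\"method\":\"Page.captureScreenshot\",\"params\":{{\"format\":\"png\",\"captureBeyondViewport\":{}}}{}}}",
        id, full_page, session
    )
}

fn main() {
    println!("{}", capture_screenshot_cmd(1, None, true));
    println!("{}", capture_screenshot_cmd(2, Some("SESSION_ID"), false));
}
```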

### Direct Browser Control

```rust
use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate
    browser.navigate("https://example.com").await?;

    // Get current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);

    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());

    // Switch tabs
    browser.switch_to_tab(&tabs[0].target_id).await?;

    Ok(())
}
```
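
Tab operations like these usually correspond to the CDP `Target` domain: `Target.createTarget` opens a tab, `Target.activateTarget` brings one to the front, and `Target.closeTarget` closes it. The `TabOp` enum and `tab_cmd` function are hypothetical, and that the crate issues exactly these commands is an assumption. Values are not JSON-escaped, so only pass trusted strings.

```rust
/// Hypothetical tab operations mapped onto real CDP `Target` domain methods.
enum TabOp<'a> {
    Create { url: &'a str },
    Switch { target_id: &'a str },
    Close { target_id: &'a str },
}

/// Builds the CDP command JSON for a tab operation (no escaping of values).
fn tab_cmd(id: u64, op: &TabOp<'_>) -> String {
    let (method, params) = match op {
        TabOp::Create { url } => ("Target.createTarget", format!("{{\"url\":\"{}\"}}", url)),
        TabOp::Switch { target_id } => {
            ("Target.activateTarget", format!("{{\"targetId\":\"{}\"}}", target_id))
        }
        TabOp::Close { target_id } => {
            ("Target.closeTarget", format!("{{\"targetId\":\"{}\"}}", target_id))
        }
    };
    format!("{{\"id\":{},\"method\":\"{}\",\"params\":{}}}", id, method, params)
}

fn main() {
    let ops = [
        TabOp::Create { url: "https://news.ycombinator.com" },
        TabOp::Switch { target_id: "T1" },
        TabOp::Close { target_id: "T1" },
    ];
    for (i, op) in ops.iter().enumerate() {
        println!("{}", tab_cmd(i as u64 + 1, op));
    }
}
```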

### Custom Actions

```rust
use browsing::tools::views::{ActionHandler, ActionParams, ActionContext, ActionResult};
use browsing::error::Result;

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        _params: &ActionParams<'_>,
        _context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic here
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register the custom action on an existing `agent`
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None, // domains
    CustomActionHandler,
);
```
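
Conceptually, registration stores a name, a description the LLM can read, an optional domain allow-list, and a handler. This std-only sketch models that with a `HashMap`; `Registry`, `Handler`, and the subdomain-matching rule are illustrative assumptions rather than the documented semantics of `register_custom_action`.

```rust
use std::collections::HashMap;

/// Hypothetical handler type: raw string params in, extracted content out.
type Handler = Box<dyn Fn(&HashMap<String, String>) -> Result<String, String>>;

struct RegisteredAction {
    description: String,
    domains: Option<Vec<String>>, // None: usable on any site (assumed semantics)
    handler: Handler,
}

/// Hypothetical action registry (not the crate's `Tools` type).
#[derive(Default)]
struct Registry {
    actions: HashMap<String, RegisteredAction>,
}

impl Registry {
    fn register(&mut self, name: &str, description: &str, domains: Option<Vec<String>>, handler: Handler) {
        self.actions.insert(
            name.to_string(),
            RegisteredAction { description: description.to_string(), domains, handler },
        );
    }

    /// Runs `name` if it is registered and permitted on `host`.
    fn execute(&self, name: &str, host: &str, params: &HashMap<String, String>) -> Result<String, String> {
        let action = self
            .actions
            .get(name)
            .ok_or_else(|| format!("unknown action: {}", name))?;
        if let Some(domains) = &action.domains {
            // Exact match or subdomain match (docs.example.com matches example.com).
            let allowed = domains
                .iter()
                .any(|d| host == d.as_str() || host.ends_with(&format!(".{}", d)));
            if !allowed {
                return Err(format!("{} is not allowed on {}", name, host));
            }
        }
        (action.handler)(params)
    }

    /// One "name: description" line per action, e.g. for the LLM's prompt.
    fn describe(&self) -> Vec<String> {
        let mut lines: Vec<String> = self
            .actions
            .iter()
            .map(|(name, a)| format!("{}: {}", name, a.description))
            .collect();
        lines.sort();
        lines
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register(
        "word_count",
        "Count the words in the `text` parameter",
        Some(vec!["example.com".to_string()]),
        Box::new(|params: &HashMap<String, String>| -> Result<String, String> {
            let text = params.get("text").ok_or("missing `text` parameter")?;
            Ok(text.split_whitespace().count().to_string())
        }),
    );

    let mut params = HashMap::new();
    params.insert("text".to_string(), "hello from a custom action".to_string());
    println!("{:?}", registry.execute("word_count", "docs.example.com", &params));
    println!("{:?}", registry.execute("word_count", "www.rust-lang.org", &params));
    println!("{:?}", registry.describe());
}
```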

## 🏗️ Architecture

Browsing follows **SOLID principles** with a focus on separation of concerns, testability, and maintainability.

```
┌───────────────────────────────────────────────────────────┐
│                           Agent                           │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐  │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │  │
│  │   (trait)   │    (trait)   │  (trait)     │         │  │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘  │
│         │             │              │            │       │
└─────────┼─────────────┼──────────────┼────────────┼───────┘
          │             │              │            │
    ┌─────▼──────┐ ┌────▼───┐     ┌────▼───┐  ┌─────▼─────┐
    │  Browser   │ │ DomSvc │     │  LLM   │  │ Handlers  │
    │            │ │        │     │        │  │           │
    │TabManager  │ │CDP     │     │Chat    │  │Navigation │
    │NavManager  │ │HTML    │     │Model   │  │Interaction│
    │Screenshot  │ │Tree    │     │        │  │Tabs       │
    │            │ │Builder │     │        │  │Content    │
    └────────────┘ └────────┘     └────────┘  └───────────┘
```

### Key Components

| Component | Responsibility | Trait Relationship |
|-----------|----------------|--------------------|
| **Agent** | Orchestrates browser, LLM, and DOM processing | Uses `BrowserClient`, `DOMProcessor` |
| **Browser** | Manages browser session and lifecycle | Implements `BrowserClient` |
| **DOMProcessor** | Extracts and serializes DOM | Implements `DOMProcessor` |
| **Tools** | Action registry and execution | Uses `BrowserClient` |
| **Handlers** | Specific action implementations | Implement `ActionHandler` |

## 📁 Project Structure

```
browsing/
├── src/
│   ├── agent/              # Agent orchestration
│   │   ├── service.rs      # Main agent implementation
│   │   └── json_extractor.rs # JSON parsing utilities
│   ├── browser/            # Browser management
│   │   ├── session.rs      # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs  # Tab operations
│   │   ├── navigation.rs   # Navigation operations
│   │   ├── screenshot.rs   # Screenshot operations
│   │   ├── cdp.rs          # CDP WebSocket client
│   │   ├── launcher.rs     # Browser launcher
│   │   └── profile.rs      # Browser configuration
│   ├── dom/                # DOM processing
│   │   ├── processor.rs    # DOMProcessor trait impl
│   │   ├── serializer.rs   # LLM-ready serialization
│   │   ├── tree_builder.rs # DOM tree construction
│   │   ├── cdp_client.rs   # CDP wrapper for DOM
│   │   └── html_converter.rs # HTML to markdown
│   ├── tools/              # Action system
│   │   ├── service.rs      # Tools registry
│   │   ├── handlers/       # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs       # Parameter extraction
│   ├── traits/             # Core trait abstractions
│   │   ├── browser_client.rs  # BrowserClient trait
│   │   └── dom_processor.rs   # DOMProcessor trait
│   ├── llm/                # LLM integration
│   │   └── base.rs         # ChatModel trait
│   ├── actor/              # Low-level interactions
│   │   ├── page.rs         # Page operations
│   │   ├── element.rs      # Element operations
│   │   └── mouse.rs        # Mouse interactions
│   ├── config/             # Configuration
│   ├── error/              # Error types
│   └── utils/              # Utilities
└── Cargo.toml
```

## 🎨 Design Principles

### Trait-Based Design
- **BrowserClient** - Abstract browser operations for testing and alternative backends
- **DOMProcessor** - Pluggable DOM processing implementations
- **ActionHandler** - Extensible action system
- **ChatModel** - LLM provider abstraction

### Separation of Concerns
- **TabManager** - Tab operations (create, switch, close)
- **NavigationManager** - Navigation logic
- **ScreenshotManager** - Screenshot capture
- **Handlers** - Focused action implementations

### DRY (Don't Repeat Yourself)
- **ActionParams** - Reusable parameter extraction
- **JSONExtractor** - Centralized JSON parsing
- **SessionGuard** - Unified session access

### KISS (Keep It Simple, Stupid)
- Split complex methods into focused helpers
- Clear naming and single responsibility
- Minimal dependencies between modules

## 🧪 Testing

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_agent_workflow

# Run integration tests only
cargo test --test integration
```

### Test Coverage
- **317 tests** across all modules (all passing)
- **50+ integration tests** for full workflow
- **150+ unit tests** for individual components
- **Test files**:
  - [actor_test.rs](tests/actor_test.rs) - Page, Element, Mouse, Keyboard operations (23 passed)
  - [browser_managers_test.rs](tests/browser_managers_test.rs) - Navigation, Screenshot, Tab managers
  - [tools_handlers_test.rs](tests/tools_handlers_test.rs) - All action handlers (49 passed)
  - [agent_service_test.rs](tests/agent_service_test.rs) - Agent execution logic (32 passed)
  - [agent_execution_test.rs](tests/agent_execution_test.rs) - Agent workflow tests (11 passed)
  - [traits_test.rs](tests/traits_test.rs) - BrowserClient, DOMProcessor traits (24 passed)
  - [utils_test.rs](tests/utils_test.rs) - URL extraction, signal handling (49 passed)
- **Mock implementations** for deterministic testing
- **Trait-based mocking** for browser/DOM components

## ⚠️ Data Retention Policy

### Browser Data is NEVER Deleted

**IMPORTANT**: The `browsing` library **never deletes browser data** for safety reasons.

#### What This Means:

| Data Type | Behavior |
|-----------|----------|
| **Bookmarks** | Never deleted |
| **History** | Never deleted |
| **Cookies** | Never deleted |
| **Passwords** | Never deleted |
| **Extensions** | Never deleted |
| **Cache** | Never deleted |
| **Temp Directories** | Never deleted (left in `/tmp/`) |

#### Why This Policy Exists:

1. **User Safety**: Users may specify a custom `user_data_dir` pointing to their real browser profile
2. **Catastrophe Prevention**: Accidentally deleting a user's real browser data (bookmarks, history, passwords) would be devastating
3. **Debugging**: Leaving temp directories allows inspection after crashes or failures
4. **User Control**: Users are responsible for managing their own browser data

#### How It Works:

When no `user_data_dir` is specified:
```rust
let profile = BrowserProfile {
    user_data_dir: None,  // Uses temp directory: /tmp/browser-use-1738369200000/
    ..Default::default()
};
```

When `browser.stop()` is called:
- ✅ Browser process is killed
- ✅ In-memory state is cleared
- ❌ User data directory is **NOT** deleted

#### Managing Temporary Data:

Users are responsible for cleanup:

```bash
# List browser temp directories
ls -la /tmp/browser-use-*

# Delete old temp directories (optional, manual cleanup)
rm -rf /tmp/browser-use-1738369200000/
```
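
To decide what to clean up, a dry-run helper can list old temp profiles without touching them. It assumes directory names follow the `browser-use-<milliseconds since the Unix epoch>` pattern seen in the example path (inferred, not confirmed), and it only prints candidates, in keeping with the no-deletion policy. `parse_millis` and `stale_profile_dirs` are hypothetical names.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: reads the timestamp suffix of a name like
/// `browser-use-1738369200000` (assumed to be ms since the Unix epoch).
fn parse_millis(name: &str) -> Option<u128> {
    name.strip_prefix("browser-use-")?.parse().ok()
}

/// Lists, but never deletes, `browser-use-*` directories in `dir` older than `max_age`.
fn stale_profile_dirs(dir: &Path, max_age: Duration) -> std::io::Result<Vec<PathBuf>> {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    let mut stale = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if let Some(created_ms) = parse_millis(&name) {
            let path = entry.path();
            if path.is_dir() && now_ms.saturating_sub(created_ms) > max_age.as_millis() {
                stale.push(path);
            }
        }
    }
    stale.sort();
    Ok(stale)
}

fn main() -> std::io::Result<()> {
    let week = Duration::from_secs(7 * 24 * 60 * 60);
    // Dry run: print candidates only; removing them stays a manual step.
    for path in stale_profile_dirs(Path::new("/tmp"), week)? {
        println!("{}", path.display());
    }
    Ok(())
}
```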

#### Using a Custom Data Directory:

```rust
let profile = BrowserProfile {
    user_data_dir: Some("/path/to/custom/profile".into()),
    ..Default::default()
};
```

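**Warning**: If you point `user_data_dir` at your real browser profile, the automated browser reads and writes it directly (history, cookies, sessions). Beyond never deleting it, the library does not protect that directory; you are responsible for it.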



## 🔧 Configuration

### Browser Profile

```rust
use browsing::BrowserProfile;

let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};
```
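
Profile fields like these typically end up as Chrome command-line switches. A sketch of that translation, with `ProfileSketch` as a hypothetical mirror of a few `BrowserProfile` fields; the switches themselves (`--headless=new`, `--disable-gpu`, `--user-data-dir`, `--remote-debugging-port`) are real Chrome flags, but the exact set the crate passes is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical mirror of a few `BrowserProfile` fields (not the crate's type).
struct ProfileSketch {
    headless: bool,
    disable_gpu: bool,
    user_data_dir: Option<PathBuf>,
    remote_debugging_port: u16,
}

/// Translates the profile into Chrome/Chromium command-line switches.
fn chrome_args(profile: &ProfileSketch, default_data_dir: &Path) -> Vec<String> {
    // No explicit directory: fall back to the given temp profile path.
    let data_dir = profile.user_data_dir.as_deref().unwrap_or(default_data_dir);
    let mut args = vec![
        format!("--remote-debugging-port={}", profile.remote_debugging_port),
        format!("--user-data-dir={}", data_dir.display()),
    ];
    if profile.headless {
        args.push("--headless=new".to_string());
    }
    if profile.disable_gpu {
        args.push("--disable-gpu".to_string());
    }
    args
}

fn main() {
    let profile = ProfileSketch {
        headless: true,
        disable_gpu: true,
        user_data_dir: None,
        remote_debugging_port: 9222,
    };
    let args = chrome_args(&profile, Path::new("/tmp/browser-use-1738369200000"));
    println!("chrome {}", args.join(" "));
}
```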

### Agent Settings

```rust
use browsing::agent::Agent;
use browsing::agent::views::AgentSettings;

let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });
```
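
`with_max_steps(50)` caps how many observe, decide, act iterations the agent runs before giving up. A minimal sketch of such a bounded loop, not the crate's `Agent`: the hypothetical `Planner` trait stands in for the LLM, and a scripted planner lets it run without a model or browser.

```rust
use std::collections::VecDeque;

/// Hypothetical action vocabulary for this sketch.
#[derive(Debug)]
enum Action {
    Navigate(String),
    Click(usize),
    Done(String),
}

/// Hypothetical stand-in for the LLM: picks the next action from the page state.
trait Planner {
    fn next_action(&mut self, page_state: &str) -> Action;
}

/// Runs observe -> decide -> act until `Done` or until `max_steps` is used up.
fn run_agent<P: Planner>(planner: &mut P, max_steps: usize) -> Result<String, String> {
    let mut page_state = String::from("about:blank");
    for step in 1..=max_steps {
        match planner.next_action(&page_state) {
            Action::Done(answer) => return Ok(answer),
            Action::Navigate(url) => page_state = format!("loaded {}", url),
            Action::Click(index) => {
                page_state = format!("clicked element [{}] (step {})", index, step)
            }
        }
    }
    Err(format!("gave up after {} steps", max_steps))
}

/// Scripted planner so the loop runs offline; finishes when the script is empty.
struct Scripted(VecDeque<Action>);

impl Planner for Scripted {
    fn next_action(&mut self, _page_state: &str) -> Action {
        self.0
            .pop_front()
            .unwrap_or_else(|| Action::Done("script exhausted".to_string()))
    }
}

fn main() {
    let mut planner = Scripted(VecDeque::from(vec![
        Action::Navigate("https://example.com".to_string()),
        Action::Click(0),
        Action::Done("found the pricing page".to_string()),
    ]));
    println!("{:?}", run_agent(&mut planner, 50));
}
```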

## 📖 API Documentation

Generate and view API docs:

```bash
cargo doc --open
```