# TODO
## Summary
**Overall Progress**: Core functionality is complete. Remaining work focuses on feature parity with browser-use.
## Recent Updates (May 2026)
- [x] Added `go_back` / `go_forward` / `reload` navigation actions via CDP
- [x] Added file operations -- `write_file`, `read_file`, and `replace_file` with path traversal protection
- [x] Added `FileHandler` module for disk file I/O within agent workflows
- [x] Added minimal CLI browser render command: `browsing render <url>`
- [x] Added CLI render test cases (argument parsing + render text formatting)
- [x] Headless browser by default -- `BrowserProfile::default()` uses `headless: true`
- [x] Modern headless mode -- `--headless=new` (faster than legacy Chrome headless)
- [x] 17+ lightweight Chrome flags -- disables extensions, sync, background networking, logging, etc.
- [x] Eliminated `Box::leak` memory leak in CDP client
- [x] Cached regex compilation in `JSONExtractor` with `LazyLock`
- [x] Eliminated `Vec<char>` allocation -- uses `char_indices()` directly on `&str`
- [x] Fixed triple JSON parsing in `parse_agent_output`
- [x] Normalized `tag_name` at construction
- [x] Removed double serde round-trip in `to_dict()`
- [x] Deduplicated `Browser::start()` -- unified session setup (~40 lines removed)
- [x] Reduced DOM tree cloning -- `get_serialized_dom` passes ownership to serializer
- [x] Optimized DOM tree builder -- final node moved into lookup instead of cloned
- [x] Reduced launch sleep 1000ms -> 200ms, CDP polling 30s -> 6s max
- [x] Updated all documentation (README, ARCHITECTURE, TODO, examples)
- [x] Removed all ignored tests -- 355+ tests passing (0 failures, 0 ignored)
- [x] Fixed JSON extractor to handle nested JSON structures using brace counting
- [x] Added IBM content download example demonstrating web scraping
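
The bounded CDP polling noted in the list above (30s reduced to 6s max) can be illustrated with a small deadline loop. This is a minimal sketch, not the crate's actual code: `wait_until` is a hypothetical helper, and it blocks with `std::thread::sleep` for simplicity, whereas the real client is presumably async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `ready` every `interval` until it returns true
/// or `max_wait` has elapsed. Returns whether the check ever succeeded.
fn wait_until(mut ready: impl FnMut() -> bool, interval: Duration, max_wait: Duration) -> bool {
    let deadline = Instant::now() + max_wait;
    loop {
        // Check first so an already-ready endpoint costs no sleep at all.
        if ready() {
            return true;
        }
        // Give up once the overall budget is spent.
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A caller would pass a closure that probes the debugging endpoint, together with a 6-second `max_wait`; the polling interval is left to the caller and is not specified by this document.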
## Capability Gaps vs browser-use
browser-use (Python) has several capabilities that this Rust implementation lacked or still lacks. They are tracked below by priority; checked items are already implemented.
### Navigation (Priority: High)
- [x] `go_back` -- Navigate back in browser history (browser-use: `go_back`)
- [x] `go_forward` -- Navigate forward in browser history (browser-use: not listed, but a standard browser action)
- [x] `reload` -- Reload the current page (browser-use: not listed, but a standard browser action)
### File Operations (Priority: High)
- [x] `write_file` -- Write content to a file on disk (browser-use: `write_file`)
- [x] `read_file` -- Read content from a file on disk (browser-use: `read_file`)
- [x] `replace_file` -- Replace text within a file (browser-use: `replace_file`)
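
The path traversal protection mentioned for these file actions could look roughly like the sketch below. It is an illustration under stated assumptions, not the `FileHandler` implementation: `resolve_within` is a hypothetical name, and the check is purely lexical, so it does not resolve symlinks or touch the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: join `requested` onto `base`, returning `None` if the
/// result would escape `base`. Lexical only; symlinks are not resolved.
fn resolve_within(base: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = base.to_path_buf();
    let mut depth = 0usize; // number of components currently below `base`
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                // Stepping above `base` is treated as a traversal attempt.
                if depth == 0 {
                    return None;
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows drive prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}
```

With a base of `/tmp/agent`, a request for `notes/a.txt` resolves inside the base, while `../etc/passwd` and `/etc/passwd` are both rejected. A stricter variant could canonicalize the result and compare prefixes to cover symlinks as well.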
### Agent Execution (Priority: Medium)
- [ ] `max_actions_per_step` -- Execute multiple actions in a single LLM step
- [ ] `initial_actions` -- Pre-run actions before the main agent task loop
- [ ] Domain restriction enforcement -- Currently only filters prompts; it does not block action execution
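
One possible way to close the domain restriction gap is to check the target host immediately before a navigation action executes, rather than only when building prompts. The sketch below is an assumption about how that could work, not existing code: `host_allowed` and `check_navigation` are hypothetical names, and the host is assumed to have been extracted already by a proper URL parser.

```rust
/// Hypothetical check: true if `host` equals an allowed domain or is a
/// subdomain of one. Comparison is ASCII case-insensitive.
fn host_allowed(host: &str, allowed: &[&str]) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    allowed.iter().any(|domain| {
        let domain = domain.trim_end_matches('.').to_ascii_lowercase();
        if domain.is_empty() {
            return false;
        }
        // Require a dot boundary so "evilexample.com" does not match "example.com".
        host == domain
            || host
                .strip_suffix(domain.as_str())
                .is_some_and(|prefix| prefix.ends_with('.'))
    })
}

/// Hypothetical gate to call before a navigation action runs.
fn check_navigation(host: &str, allowed: &[&str]) -> Result<(), String> {
    if host_allowed(host, allowed) {
        Ok(())
    } else {
        Err(format!("navigation to {host} blocked by domain restriction"))
    }
}
```

Returning an error from the gate lets the agent loop record the blocked action in its history instead of silently skipping it.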
### Element Finding (Priority: Medium)
- [ ] `get_element_by_prompt` -- Find element by natural language description using LLM
- [ ] `must_get_element_by_prompt` -- Same but errors if not found
### Visual Recording (Priority: Low)
- [ ] `generate_gif` -- Record agent actions as GIF for visual playback
- [ ] Screenshot sequence capture during agent execution
### Stealth / Anti-Detection (Priority: Low)
- [ ] Stealth mode -- Hide automation fingerprints from detection
- [ ] User-agent rotation
- [ ] WebDriver property masking
## Completed
### Core Library
- [x] Scaffold Rust project structure (single crate)
- [x] Basic module structure (agent, browser, llm, tools, dom, config, error, utils, views)
- [x] Core error types
- [x] Configuration system (from env vars)
- [x] Basic type definitions
- [x] Core view types (Agent, Browser, DOM, Token views)
- [x] Tools/actions registry system
- [x] Browser session and CDP client integration
- [x] DOM serialization and extraction (complete implementation with CDP)
- [x] DOM service integration with agent service
- [x] LLM base trait (ChatModel trait complete)
- [x] Agent service basic structure (execution loop skeleton)
- [x] Agent service implementation (action parsing, execution, history tracking)
- [x] Logging setup (tracing integration)
- [x] Configuration with .env support
- [x] JSON repair for LLM responses (anyrepair integration)
- [x] Actor implementation (page, element, mouse interactions)
- [x] Keyboard input support (key combinations, modifiers)
- [x] Action execution in tools service (search, navigate, click, input, done)
- [x] Element clicking and input using Page/Element actors
- [x] Selector map integration for element lookup by index
### CLI Interface
- [x] CLI binary with clap argument parsing
- [x] Run command for autonomous browsing tasks
- [x] Launch command for browser management
- [x] Connect command for existing browser instances
- [x] Render command for minimal webpage text output from URL input
- [x] Configuration via environment variables and files
- [x] CLI documentation (docs/CLI_USAGE.md)
### MCP Server
- [x] MCP server binary using rmcp
- [x] Tools: navigate, get_content, click, input, screenshot
- [x] Prompts: browse_task template
- [x] Resources: browser://current for page content
- [x] Lazy browser initialization
- [x] MCP documentation (docs/MCP_USAGE.md)
### Library Interface
- [x] Public API exports in lib.rs
- [x] Browser, Agent, Config re-exports
- [x] ChatModel trait for LLM integration
- [x] Example: library_usage.rs - Basic library usage
- [x] Example: custom_actions.rs - Custom action handlers
- [x] Example: ibm_content_download.rs - Web scraping demo
- [x] Example: comprehensive_showcase.rs - Full feature demonstration
- [x] Example: basic_navigation.rs - Simple navigation example
- [x] Example: simple_navigation.rs - Navigation-focused demo
- [x] Library documentation (docs/LIBRARY_USAGE.md)
### Browser Session
- [x] CDP client implementation (WebSocket connection)
- [x] Browser connection via CDP URL
- [x] Navigation handling
- [x] Page actor access
- [x] Browser launch and management (local browser - basic implementation)
- [x] Browser launcher (executable detection, port finding, process management)
- [x] Page state capture (full DOM extraction via get_serialized_dom_tree)
- [x] Screenshot support (Page, Element, and Browser session)
- [x] Tab management (list, switch, close, create tabs with actions)
- [x] Browser state summary (get_browser_state_summary with DOM, tabs, screenshot)
### DOM Service
- [x] Basic HTML parsing and text extraction
- [x] Core CDP tree extraction (_get_all_trees - snapshot, DOM tree, AX tree, device pixel ratio)
- [x] Viewport ratio calculation (_get_viewport_ratio)
- [x] CDP client session_id support (send_command_with_session)
- [x] Full DOM tree building (get_dom_tree, enhanced node construction)
- [x] Enhanced snapshot lookup (build_snapshot_lookup)
- [x] Enhanced DOM tree node types (EnhancedDOMTreeNode, EnhancedSnapshotNode, EnhancedAXNode)
- [x] DOM serializer for LLM representation (basic implementation)
- [x] get_serialized_dom_tree method
- [x] Element extraction with indices (selector map)
- [x] JSON extraction with brace counting (handles nested objects/arrays)
- [ ] Markdown extraction (enhanced)
- [ ] Paint order filtering (advanced)
- [ ] Enhanced DOM snapshot optimizations
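
The brace-counting JSON extraction listed above can be sketched as follows. This is an illustration of the general technique, not the crate's `JSONExtractor`: `extract_first_json` is a hypothetical name, and the sketch only balances brackets (it does not verify that `{` is closed by `}` rather than `]`, nor that the span is valid JSON). It iterates with `char_indices()` directly on the `&str`, so no `Vec<char>` is allocated.

```rust
/// Hypothetical helper: return the first balanced `{...}` or `[...]` span in
/// `text`, skipping brackets that appear inside string literals.
fn extract_first_json(text: &str) -> Option<&str> {
    let start = text.find(|c: char| c == '{' || c == '[')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Inside a string only the closing quote matters; honor backslash escapes.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => {
                // depth is at least 1 here because the span starts with an opener.
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + offset + ch.len_utf8()]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced input: no complete span found
}
```

The returned slice would then typically be handed to a real JSON parser (or to the repair step) for validation.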
### LLM Integration
- [x] LLM base trait (ChatModel)
- [x] ChatMessage types
- [x] ChatInvokeCompletion types
- [x] ChatModel trait with streaming support
- [x] Token counting (usage information extraction from response)
### Tools/Actions
- [x] Action registry system
- [x] Default actions (click, input, navigate, search, done, switch, close, scroll, wait, send_keys, evaluate, find_text, dropdown_options, select_dropdown, upload_file, extract)
- [x] Action execution (basic implementation)
- [x] Element interaction (click, input using Page/Element actors)
- [x] Selector map integration (get element by index, lookup backend_node_id)
- [x] Custom action registration (ActionHandler trait and registration system)
### Agent Service
- [x] Agent execution loop (complete)
- [x] Step management
- [x] LLM interaction
- [x] Action parsing from LLM response (with JSON repair)
- [x] Action execution via tools
- [x] History tracking
- [x] Task completion detection
### Actor (Low-level browser interactions)
- [x] Page actor (navigation, evaluation, screenshot, keyboard)
- [x] Element actor (click, fill, text extraction, screenshot, bounding box)
- [x] Mouse actor (click, move, scroll)
- [x] Keyboard input (press keys, key combinations)
### Utilities
- [x] URL detection and parsing
- [x] Logging setup (tracing)
- [x] Configuration with .env support
- [x] Signal handling (SIGINT/SIGTERM for graceful shutdown)
- [ ] Telemetry (optional)
### Testing
- [x] Unit tests for core modules (317+ tests passing)
- [x] Integration tests (50+ comprehensive tests passing)
- [x] Actor module tests (page, element, mouse, keyboard operations)
- [x] Browser managers tests (navigation, screenshot, tab management)
- [x] Tools handlers tests (navigation, interaction, content, tabs, advanced, file)
- [x] Agent service tests (execution logic, history tracking, usage tracking)
- [x] Agent execution tests (workflow, configuration, token tracking)
- [x] Traits tests (BrowserClient, DOMProcessor implementations)
- [x] Utilities tests (URL extraction, domain matching, signal handling)
- [x] Signal handling tests
- [x] JSON extraction tests (nested JSON support)
- [x] CLI render tests (argument parsing, text formatting)
### Documentation
- [x] API documentation (cargo doc --open)
- [x] Examples (6 working examples, listed under Library Interface)
- [x] README.md - Complete with architecture and usage
- [x] CLI, MCP, and Library usage guides in docs/
## Notes
- Using single crate structure (not multi-crate workspace)
- LLM integration via ChatModel trait
- Using anyrepair for JSON repair
- Using rmcp for MCP support
- Using clap for CLI argument parsing
- Rust edition 2024
## Usage Modes
### 1. CLI Tool
```bash
# Install
cargo install --path . --bin browsing
# Run autonomous task
browsing run "Find the latest news" --url https://news.ycombinator.com --headless
# Launch browser
browsing launch --headless
# Connect to existing browser
browsing connect ws://localhost:9222/devtools/browser/abc123
# Minimal CLI browser render
browsing render https://example.com --max-chars 1200
```
### 2. MCP Server
```bash
# Build
cargo build --release --bin browsing-mcp
# Run (communicates via stdio)
./target/release/browsing-mcp
# Configure in Claude Desktop
# See docs/MCP_USAGE.md for configuration details
```
### 3. Rust Library
```rust
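use browsing::{Browser, Config};
// Assumes a Tokio runtime; swap in whichever async runtime the project uses.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(Config::from_env().browser_profile);
    browser.start().await?;
    browser.navigate("https://example.com").await?;
    Ok(())
}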
```
See `docs/LIBRARY_USAGE.md` for complete API documentation.