browsing 0.1.0

Autonomous web browsing for AI agents - Rust implementation
Documentation
# Migration TODO

## Summary

**Overall Progress**: ~100% Complete (Core Functionality + Signal Handling + Comprehensive Testing)

### Core Functionality ✅
- ✅ Browser launch and CDP connection
- ✅ DOM extraction and serialization
- ✅ LLM integration (ChatModel trait)
- ✅ Agent execution loop
- ✅ Screenshot support
- ✅ Actor system (Page, Element, Mouse, Keyboard)
- ✅ Signal handling for graceful shutdown
- ✅ Comprehensive test suite (74+ tests)

### Remaining Work ⏳
- ⏳ Documentation (API docs, examples, migration guide)
- ⏳ Advanced DOM features (paint order, enhanced markdown) - Optional optimizations
- ⏳ Cost calculation (token counting implemented, cost calculation can be added)
- [x] Additional actions (extract - LLM-based content extraction) - ✅ Complete
- [x] Signal handling (SIGINT/SIGTERM) - ✅ Complete
- [x] Comprehensive integration tests (24 tests) - ✅ Complete

## Completed ✅

- [x] Scaffold Rust project structure (single crate)
- [x] Basic module structure (agent, browser, llm, tools, dom, config, error, utils, views)
- [x] Core error types
- [x] Configuration system (from env vars)
- [x] Basic type definitions
- [x] Core view types (Agent, Browser, DOM, Token views)
- [x] Tools/actions registry system
- [x] Browser session and CDP client integration
- [x] DOM serialization and extraction (complete implementation with CDP)
- [x] DOM service integration with agent service
- [x] LLM base trait (ChatModel trait complete)
- [x] Agent service basic structure (execution loop skeleton)
- [x] Agent service implementation (action parsing, execution, history tracking)
- [x] Logging setup (tracing integration)
- [x] Configuration with .env support
- [x] JSON repair for LLM responses (anyrepair integration)
- [x] Actor implementation (page, element, mouse interactions)
- [x] Keyboard input support (key combinations, modifiers)
- [x] Action execution in tools service (search, navigate, click, input, done)
- [x] Element clicking and input using Page/Element actors
- [x] Selector map integration for element lookup by index

## In Progress 🚧

- [ ] Documentation (API docs, examples, migration guide)

## Pending 📋

### Core Types and Models
- [x] Action types (ActionModel, ActionResult) - Basic implementation complete
- [x] DOM element types (EnhancedDOMTreeNode, EnhancedSnapshotNode, EnhancedAXNode)
- [x] Browser profile types (BrowserProfile with basic fields)
- [x] Additional view types from Python (BrowserStateSummary, etc.) - get_browser_state_summary implemented

### Browser Session
- [x] CDP client implementation (WebSocket connection)
- [x] Browser connection via CDP URL
- [x] Navigation handling
- [x] Page actor access
- [x] Browser launch and management (local browser - basic implementation)
- [x] Browser launcher (executable detection, port finding, process management)
- [x] Page state capture (full DOM extraction via get_serialized_dom_tree)
- [x] Screenshot support (Page, Element, and Browser session)
- [x] Tab management (list, switch, close, create tabs with actions)
- [x] Browser state summary (get_browser_state_summary with DOM, tabs, screenshot)

### DOM Service
- [x] Basic HTML parsing and text extraction
- [x] Page state retrieval (placeholder)
- [x] Selector map structure (placeholder)
- [x] Core CDP tree extraction (_get_all_trees - snapshot, DOM tree, AX tree, device pixel ratio)
- [x] Viewport ratio calculation (_get_viewport_ratio)
- [x] CDP client session_id support (send_command_with_session)
- [x] Full DOM tree building (get_dom_tree, enhanced node construction)
- [x] Enhanced snapshot lookup (build_snapshot_lookup)
- [x] Enhanced DOM tree node types (EnhancedDOMTreeNode, EnhancedSnapshotNode, EnhancedAXNode)
- [x] DOM serializer for LLM representation (basic implementation)
- [x] get_serialized_dom_tree method
- [x] Element extraction with indices (selector map)
- [ ] Markdown extraction (enhanced)
- [ ] Paint order filtering (advanced)
- [ ] Enhanced DOM snapshot optimizations

### LLM Integration
- [x] LLM base trait (ChatModel)
- [x] ChatMessage types
- [x] ChatInvokeCompletion types
- [x] ChatModel trait with streaming support
- [x] Token counting (usage information extraction from response)

### Tools/Actions
- [x] Action registry system
- [x] Default actions (click, input, navigate, search, done, switch, close, scroll, go_back, wait, send_keys, evaluate, find_text, dropdown_options, select_dropdown, upload_file, extract)
- [x] Action execution (basic implementation)
- [x] Element interaction (click, input using Page/Element actors)
- [x] Selector map integration (get element by index, lookup backend_node_id)
- [x] Custom action registration (ActionHandler trait and registration system)

### Agent Service
- [x] Agent execution loop (complete)
- [x] Step management
- [x] LLM interaction
- [x] Action parsing from LLM response (with JSON repair)
- [x] Action execution via tools
- [x] History tracking
- [x] Task completion detection

### Actor (Low-level browser interactions)
- [x] Page actor (navigation, evaluation, screenshot, keyboard)
- [x] Element actor (click, fill, text extraction, screenshot, bounding box)
- [x] Mouse actor (click, move, scroll)
- [x] Keyboard input (press keys, key combinations)

### Utilities
- [x] URL detection and parsing
- [x] Logging setup (tracing)
- [x] Configuration with .env support
- [x] Signal handling (SIGINT/SIGTERM for graceful shutdown)
- [ ] Telemetry (optional)

### Testing
- [x] Unit tests for core modules (50+ tests passing)
- [x] Integration tests (24 comprehensive tests passing)
- [x] Additional test suites (browser, dom, agent, tools, llm - 13+ tests passing)
- [x] Signal handling tests (2 tests passing)
- [ ] Snapshot tests (insta) - Optional

### Documentation
- [ ] API documentation
- [ ] Examples
- [ ] Migration guide

## Notes

- Using single crate structure (not multi-crate workspace)
- LLM integration via ChatModel trait
- Using anyrepair for JSON repair
- Using rmcp for MCP support
- Rust edition 2024