Features
Core Capabilities
- Dual Implementation: Native Rust library and Python wrapper with CLI tools
- Content Conversion: Convert raw HTML to Markdown, JSON, or YAML formats
- Web Fetching: Fetch web pages with optional JavaScript rendering support
- Search Integration: Query search engines using browser mode (headless/headed/existing. NO API KEY) or API mode
- Multiple Search Engines: Support for Bing, Google, DuckDuckGo, Brave Search, Tavily, and custom engines
- Proxy Support: Use proxies in both browser-based and API-based operations
- End-to-End Pipeline: Complete workflow from search queries to content extraction for AI applications
Advanced Features (coming soon)
- Customizable Browser Controls: Control screen resolution, viewport, user-agent, and locale to mimic real users.
- Automatic CAPTCHA Handling: Detect and bypass CAPTCHAs using solvers or manual fallback.
- Intelligent Query Formulation: Enhance search queries with prompt rewriting or intent-aware generation.
- Stealth & Anti-Bot Evasion: Use fingerprint spoofing, proxy rotation, and human-like interaction patterns.
- Workflow-Oriented Task Automation: Chain multiple actions like search, click, form fill, and scrape into workflows.
- Multi-Channel Processing (MCP) Integration: Integrate with agent frameworks for context-aware, distributed task execution.
- Search Metrics & Observability: Track success rate, latency, CAPTCHA rate, and export logs to observability tools.
Architecture
Tarzi is built with a modular architecture consisting of three core components:
- Converter Module: Converts raw HTML content into structured formats
- Fetcher Module: Handles web page retrieval with multiple strategies
- Search Module: Provides search engine integration and result processing
Usage Examples
Basic Content Conversion
use ;
let converter = new;
let html = "<h1>Hello World</h1><p>This is content.</p>";
let markdown = converter.convert.await?;
Web Page Fetching
use ;
let mut fetcher = new;
// Simple HTTP request
let content = fetcher.fetch.await?;
// With JavaScript rendering
let content = fetcher.fetch.await?;
Search and Content Extraction
use ;
let mut search_engine = new;
// Search and fetch content for each result
let results_with_content = search_engine.search_and_fetch.await?;
Python Integration
# Convert HTML to Markdown
=
# Fetch web page
=
# Search web
=
CLI Usage
# Convert HTML to Markdown
# Fetch web page with JavaScript rendering
# Search and fetch content
License
Apache License 2.0 - see LICENSE file for details.
Contributors
Thank you ❤ all human and non-human contributors.