🐒 Tarzi
Tarzi is a unified search interface designed for Retrieval-Augmented Generation (RAG) and agentic systems built on large language models. Search is a core function of these systems, yet most search engine providers put their APIs behind paywalls or strict rate limits, even for light or research-driven usage.
Tarzi removes these barriers by supporting both token-based APIs and free web queries across multiple search engines. With a single dependency, you can integrate several Search Engine Providers (SEPs) and switch between them as needed.
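To make the idea of switching providers behind one interface concrete, here is a minimal Python sketch. It is not Tarzi's actual API: `SearchResult`, `UnifiedSearch`, `make_stub_provider`, and the engine names `api_mode` and `browser_mode` are all hypothetical, and the providers return canned data instead of touching the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


# A provider is any callable that maps (query, limit) to a list of results.
Provider = Callable[[str, int], List[SearchResult]]


class UnifiedSearch:
    """Hypothetical facade: one entry point, interchangeable providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def search(self, query: str, engine: str, limit: int = 5) -> List[SearchResult]:
        if engine not in self._providers:
            raise ValueError(f"unknown engine: {engine!r}")
        # Truncate defensively in case a provider returns more than requested.
        return self._providers[engine](query, limit)[:limit]


def make_stub_provider(label: str) -> Provider:
    """Build a stand-in provider that returns canned results (no network)."""

    def provider(query: str, limit: int) -> List[SearchResult]:
        return [
            SearchResult(
                title=f"{label} result {i} for {query}",
                url=f"https://example.com/{label}/{i}",
                snippet="placeholder snippet",
            )
            for i in range(1, limit + 1)
        ]

    return provider


search = UnifiedSearch()
search.register("api_mode", make_stub_provider("api"))
search.register("browser_mode", make_stub_provider("browser"))

# Switching providers is a one-argument change for the caller.
api_hits = search.search("rust html parser", engine="api_mode", limit=2)
browser_hits = search.search("rust html parser", engine="browser_mode", limit=2)
```

Because callers depend only on the facade and the shared result type, a token-based backend can be replaced by a token-free one without changing the calling code.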
⚙️ Core Capabilities
- 🦀 Dual Implementation: Native Rust library and Python wrapper with CLI tools
- 🔄 Content Conversion: Convert raw HTML into Markdown, JSON, or YAML
- 🌐 Web Fetching: Fetch web pages with optional JavaScript rendering
- 🔍 Search Integration: Query search engines via browser (token-free) or API (token-required) mode
- 🧠 Multi-Engine Support: Works with Bing, Google, DuckDuckGo, Brave Search, Tavily, and custom engines
- 🛡️ Proxy Support: Route requests through a proxy to work around network bans
- 🚀 End-to-End Workflow: Full pipeline from search to content extraction for AI and automation use cases
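As a rough illustration of the content-conversion step listed above, the sketch below turns a small HTML fragment into Markdown and wraps the result in JSON. It uses only the Python standard library and is not Tarzi's converter; `MiniMarkdown` and `html_to_markdown` are hypothetical names, and only headings, paragraphs, links, and list items are handled.

```python
import json
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, links, list items."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []       # finished Markdown blocks
        self._buf = []         # text fragments of the block being built
        self._prefix = ""      # Markdown prefix for the current block
        self._href = None      # target of the link being read, if any
        self._link_text = []   # text collected inside the current link

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]
        elif tag == "li":
            self._prefix = "- "
        elif tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        else:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Emit the link inline as [text](target).
            self._buf.append(f"[{''.join(self._link_text)}]({self._href})")
            self._href = None
        elif tag in self.HEADINGS or tag in ("p", "li"):
            # Close the block and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._buf, self._prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


html = (
    "<h1>Title</h1>"
    '<p>See <a href="https://example.com">the docs</a> first.</p>'
    "<ul><li>one</li><li>two</li></ul>"
)
markdown = html_to_markdown(html)
as_json = json.dumps({"markdown": markdown})
```

A production converter also has to deal with nested lists, tables, code blocks, and malformed markup, which this sketch deliberately ignores.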
🧪 Advanced Features (Coming Soon)
- 🖥️ Custom Browser Controls: Set screen size, viewport, and locale for realistic behavior
- 🕵️ Anti-Bot Evasion: Use fingerprint spoofing, proxy rotation, and human-like actions to avoid detection
- 🧠 Smarter Queries: Improve search results with prompt rewriting and intent-aware queries
- 🔗 Workflow Automation: Chain steps like search, click, form fill, and scraping into automated flows
- 🤖 Agent Integration (MCP): Connect with agent frameworks for context-aware, distributed task execution
- 📊 Observability: Monitor success rate, latency, CAPTCHA frequency, and export logs for analysis
Usage Examples
- Examples in Python and Rust: see `examples`
License
Apache License 2.0 - see LICENSE file for details.
Contributors
Thank you ❤ to all human and non-human contributors.