🕷️ crawly
A lightweight and efficient web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules.
🚀 Features
- Concurrent crawling: Takes advantage of concurrency for efficient scraping across multiple cores;
- Respects robots.txt: Automatically fetches and adheres to website scraping guidelines;
- DFS algorithm: Uses a depth-first search algorithm to crawl web links;
- Customizable with the Builder pattern: Tailor the depth of crawling, rate limits, and other parameters effortlessly;
- Cloudflare detection: If the destination URL is hosted behind Cloudflare and a mitigation is detected, the URL is skipped;
- Built with Rust: Guarantees memory safety and top-notch speed.
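To illustrate the depth-first strategy and the configurable depth limit mentioned above, here is a minimal, std-only sketch of a depth-limited DFS over a link graph. The in-memory `links` map stands in for fetching a page and extracting its links; function and variable names are illustrative, not part of the crate's API.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first traversal over a link graph with a depth limit,
/// mirroring the crawling strategy described above.
fn dfs_crawl(
    links: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut order = Vec::new();
    // A stack of (url, depth) pairs drives the depth-first order.
    let mut stack = vec![(start.to_string(), 0)];

    while let Some((url, depth)) = stack.pop() {
        // Skip anything past the depth limit or already visited.
        if depth > max_depth || !visited.insert(url.clone()) {
            continue;
        }
        order.push(url.clone());
        if let Some(children) = links.get(url.as_str()) {
            // Push in reverse so links are visited in document order.
            for child in children.iter().rev() {
                stack.push((child.to_string(), depth + 1));
            }
        }
    }
    order
}

fn main() {
    let mut links: HashMap<&str, Vec<&str>> = HashMap::new();
    links.insert("/", vec!["/a", "/b"]);
    links.insert("/a", vec!["/a/1"]);
    // With max_depth = 1, "/a/1" (at depth 2) is never fetched.
    let order = dfs_crawl(&links, "/", 1);
    println!("{:?}", order);
}
```

The `visited` set is what keeps the crawler from looping on cyclic link graphs, which real websites routinely contain.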
📦 Installation
Add crawly to your Cargo.toml:
```toml
[dependencies]
crawly = "^0.1"
```
🛠️ Usage
A simple usage example:
```rust
use anyhow::Result;
use crawly::Crawler;

#[tokio::main]
async fn main() -> Result<()> {
    let crawler = Crawler::new()?;

    // Crawl the target and collect the discovered pages.
    let results = crawler.start("https://example.com").await?;

    for (url, content) in &results {
        println!("URL: {} - Content length: {}", url, content.len());
    }

    Ok(())
}
```
Using the Builder
For more refined control over the crawler's behavior, the CrawlerBuilder comes in handy:
```rust
use anyhow::Result;
use crawly::CrawlerBuilder;

#[tokio::main]
async fn main() -> Result<()> {
    let crawler = CrawlerBuilder::new()
        .with_max_depth(10)
        .with_max_pages(100)
        .with_max_concurrent_requests(50)
        .with_rate_limit_wait_seconds(2)
        .with_robots(true)
        .build()?;

    let results = crawler.start("https://www.example.com").await?;

    for (url, content) in &results {
        println!("URL: {} - Content length: {}", url, content.len());
    }

    Ok(())
}
```
🛡️ Cloudflare
This crate detects Cloudflare-hosted sites: if the `cf-mitigated` header is present in the response, the URL is skipped without raising an error.
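The skip logic boils down to a header check. The sketch below shows the idea with a plain `HashMap` standing in for a real HTTP response's headers; the function name is illustrative and not part of the crate's public API.

```rust
use std::collections::HashMap;

/// Returns true when the response headers indicate a Cloudflare
/// mitigation, i.e. the `cf-mitigated` header is present.
/// HTTP header names are case-insensitive, so compare accordingly.
fn is_cloudflare_mitigated(headers: &HashMap<String, String>) -> bool {
    headers
        .keys()
        .any(|name| name.eq_ignore_ascii_case("cf-mitigated"))
}

fn main() {
    let mut headers = HashMap::new();
    // A Cloudflare-served page without a mitigation is still crawled.
    headers.insert("Server".to_string(), "cloudflare".to_string());
    assert!(!is_cloudflare_mitigated(&headers));

    // Once a mitigation (e.g. a challenge) is signalled, the URL is skipped.
    headers.insert("cf-mitigated".to_string(), "challenge".to_string());
    assert!(is_cloudflare_mitigated(&headers));
    println!("mitigation detected; URL would be skipped");
}
```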
📜 Tracing
Every function is instrumented with `tracing`; the crate also emits DEBUG-level messages to make the crawling flow easier to follow.
🤝 Contributing
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
📝 License
This project is MIT licensed.
💌 Contact
- Author: Dario Cancelliere
- Email: dario.cancelliere@gmail.com
- Company Website: https://www.crystalsoft.it