gar-crawl
A high-level HTML crawler with a concise builder.
The goal of this crate is to accomplish crawling and scraping tasks with minimal boilerplate. Default propagators are provided, or you can write your own, and you can customize the Reqwest client before building a crawler.
Examples
Basic usage with default options (crawl depth: 2, workers: 40, revisit: false):
builder
.add_default_propagators // crawl to href and src links
.add_handler
.build? // construct crawler
.crawl // begin crawl
.await?;
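To make the defaults listed above concrete, here is a minimal, self-contained sketch of a consuming builder that starts from those values (depth 2, 40 workers, no revisits). The names `CrawlSettings` and `SettingsBuilder` are hypothetical and exist only for illustration; they are not gar-crawl types, and the crate's real builder may be structured differently.

```rust
/// Hypothetical settings struct for illustration; not a gar-crawl type.
#[derive(Debug, Clone, PartialEq)]
struct CrawlSettings {
    depth: usize,
    workers: usize,
    revisit: bool,
}

impl Default for CrawlSettings {
    fn default() -> Self {
        // Defaults quoted in this README: depth 2, 40 workers, revisit off.
        Self { depth: 2, workers: 40, revisit: false }
    }
}

/// Hypothetical consuming builder: each setter takes `self` and returns it,
/// so calls can be chained and untouched options keep their defaults.
#[derive(Debug, Default)]
struct SettingsBuilder {
    settings: CrawlSettings,
}

impl SettingsBuilder {
    fn depth(mut self, depth: usize) -> Self {
        self.settings.depth = depth;
        self
    }

    fn workers(mut self, workers: usize) -> Self {
        self.settings.workers = workers;
        self
    }

    fn revisit(mut self, revisit: bool) -> Self {
        self.settings.revisit = revisit;
        self
    }

    /// Finish building and hand back the final settings.
    fn build(self) -> CrawlSettings {
        self.settings
    }
}
```

Under these assumptions, `SettingsBuilder::default().build()` yields the default settings, while `SettingsBuilder::default().depth(3).build()` overrides only the depth and leaves the worker count and revisit flag untouched.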
A fuller example that uses more of the builder's options:
builder
.add_default_propagators // crawl to href and src links
.revisit // default false
.whitelist // stay on this site
.user_agent // set user agent
.proxy? // set up https proxy
.add_handler
.on_page
.depth // default 2
.workers // default 40
.timeout // time out requests after 5 seconds
.build? // construct crawler
.crawl // begin crawl
.await?;
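The depth, whitelist, and revisit options above all influence which links get followed. The sketch below shows one plausible way such options could combine into a single decision. It is an assumption for illustration, not gar-crawl's implementation: the `Policy` struct, the `should_visit` function, the prefix-match whitelist, and the `depth > max_depth` cutoff are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical crawl policy; field names echo the builder options,
/// but this struct is not part of gar-crawl.
struct Policy {
    max_depth: usize,
    revisit: bool,
    /// Assumed semantics: a URL is allowed if it starts with any entry.
    /// An empty whitelist allows every URL.
    whitelist: Vec<String>,
}

impl Policy {
    /// Decide whether `url`, discovered at `depth`, should be fetched.
    /// URLs that pass the depth and whitelist checks are recorded in `visited`.
    fn should_visit(&self, url: &str, depth: usize, visited: &mut HashSet<String>) -> bool {
        // Stop once the configured depth is exceeded.
        if depth > self.max_depth {
            return false;
        }
        // Stay on whitelisted sites when a whitelist is configured.
        let allowed = self.whitelist.is_empty()
            || self.whitelist.iter().any(|prefix| url.starts_with(prefix.as_str()));
        if !allowed {
            return false;
        }
        // `insert` returns false when the URL was already present.
        let first_time = visited.insert(url.to_string());
        first_time || self.revisit
    }
}
```

With `revisit` set to `false`, a second call for the same URL returns `false`; with `revisit` set to `true`, the visited set is still updated but never blocks a fetch.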
See examples/ or gar-crawl-cli/ for more examples.