Crate spider

Website crawling library that rapidly crawls all pages to gather links via isolated contexts.

Spider is a multi-threaded crawler that can be configured to scrape web pages. It can gather tens of thousands of pages within seconds.

How to use Spider

There are a few ways to use Spider:

  • Concurrent is the fastest way to start crawling a web page and typically the most efficient.
    • crawl is used to crawl concurrently.
  • Sequential lets you crawl the web pages one after another, respecting delay sequences.
  • Scrape fetches each page and holds onto the raw HTML string for parsing.
    • scrape is used to gather the HTML.
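
To make the difference between the concurrent and sequential modes concrete, here is a minimal sketch that walks a small in-memory link graph both ways using only the standard library. It is not Spider's implementation and makes no network requests: the names `Site`, `mock_site`, `crawl_sequential`, and `crawl_concurrent` are hypothetical and exist only for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for a website: each path maps to the links on that page.
type Site = HashMap<&'static str, Vec<&'static str>>;

fn mock_site() -> Site {
    HashMap::from([
        ("/", vec!["/about", "/blog"]),
        ("/about", vec!["/"]),
        ("/blog", vec!["/blog/post-1", "/about"]),
        ("/blog/post-1", vec![]),
    ])
}

/// Sequential: visit one page at a time, in breadth-first order.
fn crawl_sequential(site: &Site, start: &'static str) -> HashSet<&'static str> {
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(url) = queue.pop_front() {
        for &link in site.get(url).into_iter().flatten() {
            // `insert` returns true only for links not seen before.
            if seen.insert(link) {
                queue.push_back(link);
            }
        }
    }
    seen
}

/// Concurrent: visit every page of the current frontier on its own thread.
fn crawl_concurrent(site: &Site, start: &'static str) -> HashSet<&'static str> {
    let seen = Mutex::new(HashSet::from([start]));
    let mut frontier = vec![start];
    while !frontier.is_empty() {
        let next = Mutex::new(Vec::new());
        // Scoped threads may borrow `site`, `seen`, and `next` safely.
        thread::scope(|s| {
            for &url in &frontier {
                let (seen, next) = (&seen, &next);
                s.spawn(move || {
                    for &link in site.get(url).into_iter().flatten() {
                        if seen.lock().unwrap().insert(link) {
                            next.lock().unwrap().push(link);
                        }
                    }
                });
            }
        });
        frontier = next.into_inner().unwrap();
    }
    seen.into_inner().unwrap()
}

fn main() {
    let site = mock_site();
    let sequential = crawl_sequential(&site, "/");
    let concurrent = crawl_concurrent(&site, "/");
    // Both strategies discover the same set of pages.
    assert_eq!(sequential, concurrent);
    println!("both strategies found {} pages", sequential.len());
}
```

Both functions return the same set of pages; only the order and parallelism of the visits differ. A real crawler would typically also add a delay between requests in the sequential case, which this sketch omits.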

Basic usage

First, you will need to add spider to your Cargo.toml.
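
A minimal manifest entry might look like the following. The `*` version requirement is a placeholder assumption that accepts any published release; pin the version you actually intend to use.

```toml
[dependencies]
spider = "*"
```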

Next, pass the website URL to the Website struct and call crawl. You can also crawl sequentially.

Re-exports

Modules