spider 1.1.1

Multithreaded Web spider crawler written in Rust.

Documentation

Dependencies

$ apt install openssl libssl-dev

Usage

Add this dependency to your Cargo.toml file.

[dependencies]
spider = "1.1.1"

and then you'll be able to use the library. Here is a simple example:

extern crate spider;

use spider::website::Website;

fn main() {
    let mut website: Website = Website::new("https://choosealicense.com");
    website.crawl();

    for page in website.get_pages() {
        println!("- {}", page.get_url());
    }
}

You can use the Configuration object to configure your crawler:

// ..
let mut website: Website = Website::new("https://choosealicense.com");
website.configuration.blacklist_url.push("https://choosealicense.com/licenses/".to_string());
website.configuration.respect_robots_txt = true;
website.configuration.verbose = true;
website.crawl();
// ..

TODO

  • multi-threaded system
  • respect robots.txt file
  • add configuration object for polite delay, etc.
  • add polite delay
  • parse command line arguments

Contribute

I am open to any contribution. Just fork and commit on another branch.