# Spider

A multithreaded web crawler written in Rust.
## Dependencies

Add this dependency to your `Cargo.toml` file:

```toml
[dependencies]
spider = "1.0.2"
```

## Usage

Then you'll be able to use the library. Here is a simple example:
```rust
extern crate spider;

use spider::website::Website;

fn main() {
    // Replace the URL with the site you want to crawl.
    let mut website: Website = Website::new("https://example.com");
    website.crawl();
}
```
You can use the `Configuration` object to configure your crawler:
```rust
// ..
let mut website: Website = Website::new("https://example.com");
// URLs are illustrative; blacklist any paths you do not want crawled.
website.configuration.blacklist_url.push("https://example.com/private/".to_string());
website.configuration.respect_robots_txt = true;
website.configuration.verbose = true;
website.configuration.delay = 2000;
website.crawl();
// ..
```
## TODO

- multithreaded system
- respect the robots.txt file
- add a configuration object for polite delay, etc.
- add polite delay
- parse command-line arguments
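The command-line parsing item above could start as a std-only sketch. The flag names and interface below are hypothetical, not the crate's actual CLI:

```rust
use std::env;

// Hypothetical interface: spider <url> [--delay <ms>].
// Returns the target URL and a polite delay in milliseconds.
fn parse_args(args: &[String]) -> Option<(String, u64)> {
    let mut url = None;
    let mut delay = 0u64;
    let mut i = 0;
    while i < args.len() {
        match args[i].as_str() {
            "--delay" => {
                // The flag consumes the next argument as a number of ms.
                i += 1;
                delay = args.get(i)?.parse().ok()?;
            }
            other => url = Some(other.to_string()),
        }
        i += 1;
    }
    Some((url?, delay))
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_args(&args) {
        Some((url, delay)) => println!("crawling {} with a {} ms delay", url, delay),
        None => eprintln!("usage: spider <url> [--delay <ms>]"),
    }
}
```

A real version would likely hand the parsed values to `website.configuration.delay` before calling `crawl()`.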
## Contribute

I am open to any contribution. Just fork and commit on another branch.