Spider
A web spider framework that can crawl a domain and collect the pages it visits.
Dependencies
Usage
From source, for command-line usage
This will produce output like the following:
- http://localhost:4000/
- http://localhost:4000/portfolio
- http://localhost:4000/resume
- http://localhost:4000/blog
As a crate, for library usage
Add this dependency to your Cargo.toml file.
[dependencies]
spider = "1.0.2"
You will then be able to use the library. Here is a simple example:
extern crate spider;
// Import the Website type from the crate's website module.
use spider::website::Website;
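To make the idea of crawling a single domain more concrete, here is a minimal sketch that does not use this crate at all. It assumes a hypothetical in-memory link graph (`demo_site`) in place of real HTTP requests, and the `crawl` function is an illustrative name, not part of the library's API. It only shows one plausible way to visit each page once while ignoring links that leave the domain.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first crawl over an in-memory link graph (a stand-in for HTTP fetches).
/// Only URLs that start with `domain` are followed, and each page is visited once.
/// Hypothetical helper for illustration; not the crate's implementation.
fn crawl(domain: &str, links: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut pages: Vec<String> = Vec::new();

    visited.insert(domain.to_string());
    queue.push_back(domain.to_string());

    while let Some(url) = queue.pop_front() {
        pages.push(url.clone());
        if let Some(targets) = links.get(url.as_str()) {
            for target in targets {
                // Skip external links and pages that were already queued.
                if target.starts_with(domain) && visited.insert(target.to_string()) {
                    queue.push_back(target.to_string());
                }
            }
        }
    }
    pages
}

/// Hypothetical site layout: each key is a page, each value lists the links found on it.
fn demo_site() -> HashMap<&'static str, Vec<&'static str>> {
    let mut site = HashMap::new();
    site.insert(
        "http://localhost:4000/",
        vec![
            "http://localhost:4000/portfolio",
            "http://localhost:4000/resume",
            "http://example.com/",
        ],
    );
    site.insert(
        "http://localhost:4000/portfolio",
        vec!["http://localhost:4000/", "http://localhost:4000/blog"],
    );
    site
}
```

Calling `crawl("http://localhost:4000/", &demo_site())` returns the four local pages in the order they were discovered and leaves out `http://example.com/`. A real spider would replace the map lookup with a page download and link extraction step.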
TODO
- [ ]: multi-threaded system
- [ ]: respect the robots.txt file
- [ ]: add a configuration object for polite delay, etc.
- [ ]: parse command line arguments
Contribute
I am open to any contribution. Just fork the repository and commit
on a separate branch.