Crate url_crawler
Configurable parallel web crawler
Example
extern crate url_crawler;
use url_crawler::*;

/// Function for filtering content in the crawler before a HEAD request.
///
/// Only allow directory entries, and files that have the `deb` extension.
fn apt_filter(url: &Url) -> bool {
    let url = url.as_str();
    url.ends_with("/") || url.ends_with(".deb")
}

pub fn main() {
    // Create a crawler designed to crawl the given website.
    let crawler = Crawler::new("http://apt.pop-os.org/".into())
        // Use four threads for fetching.
        .threads(4)
        // Check if a URL matches this filter before performing a HEAD request on it.
        .pre_fetch(apt_filter)
        // Initialize the crawler and begin crawling. This returns immediately.
        .crawl();

    // Process URL entries as they become available.
    for file in crawler {
        println!("{:#?}", file);
    }
}
Modules
header | HTTP header types
Structs
CrawlIter | Iterator that returns fetched URL entries.
Crawler | A configurable parallel web crawler.
Flags | Flags for controlling the behavior of the crawler.
Url | A parsed URL record (a hedged filter sketch follows this list).
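Since `Url` appears to be a re-export of the `url` crate's parsed-URL record, its accessors can drive a `pre_fetch` filter directly. A minimal sketch, assuming the standard `url` crate methods `path()` and `query()` are available (the re-export itself is an assumption, not confirmed by this page):

extern crate url_crawler;
use url_crawler::*;

/// Keep only entries under `/pool/` and skip URLs with query strings.
/// Assumes `Url` is the `url` crate's type, so `path()` and `query()` exist.
fn pool_filter(url: &Url) -> bool {
    url.query().is_none() && url.path().starts_with("/pool/")
}

fn main() {
    let crawler = Crawler::new("http://apt.pop-os.org/".into())
        .pre_fetch(pool_filter)
        .crawl();

    for entry in crawler {
        println!("{:#?}", entry);
    }
}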
Enums
Error |
UrlEntry | URLs discovered by the web crawler (a hedged match sketch follows this list).
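Iterating the crawler yields `UrlEntry` values. This page does not list the enum's variants; the sketch below assumes an `Html` variant for scraped pages and a `File` variant carrying content metadata, and should be checked against the actual definition:

extern crate url_crawler;
use url_crawler::*;

fn main() {
    let crawler = Crawler::new("http://apt.pop-os.org/".into()).crawl();

    for entry in crawler {
        // Variant names and fields here are assumptions about `UrlEntry`,
        // not confirmed by this page.
        match entry {
            UrlEntry::Html { url } => println!("page: {}", url),
            UrlEntry::File { url, content_type, .. } => {
                println!("file: {} ({})", url, content_type)
            }
        }
    }
}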
Functions
filename_from_url | Convenience function for getting the filename from a URL (usage sketch below).
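A usage sketch, assuming the signature `fn filename_from_url(url: &str) -> &str` (the signature is an assumption, not confirmed by this page):

extern crate url_crawler;
use url_crawler::*;

fn main() {
    // Assumed signature: `fn filename_from_url(url: &str) -> &str`.
    let url = "http://apt.pop-os.org/pool/example_1.0_amd64.deb";
    let name = filename_from_url(url);
    println!("{}", name); // expected: example_1.0_amd64.deb
}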