Struct url_crawler::Crawler
pub struct Crawler { /* fields omitted */ }
A configurable parallel web crawler.
Crawling does not occur until this type is consumed by the crawl method.
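A minimal usage sketch of the builder flow described above. The example URL is illustrative, the `String`-to-`CrawlerSource` conversion and the `Arc`-wrapped closure shape of `PreFetchCallback` are assumptions about this crate's type aliases, and the program requires network access:

```rust
use std::sync::Arc;
use url_crawler::prelude::*;

fn main() {
    // Configuration is pure builder calls; no requests are made until
    // the crawler is consumed by `crawl()`.
    let crawler = Crawler::new("http://apt.pop-os.org/".to_owned())
        .threads(4)
        // Assumed callback shape: descend into directories, fetch .deb files.
        .pre_fetch(Arc::new(|url: &Url| {
            let url = url.as_str();
            url.ends_with('/') || url.ends_with(".deb")
        }))
        .crawl();

    // The iterator yields files as background threads discover them.
    for file in crawler {
        println!("{:#?}", file);
    }
}
```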
Methods

impl Crawler
pub fn new(source: impl Into<CrawlerSource>) -> Self

Initializes a new crawler with a default thread count of 4.
pub fn flags(self, flags: Flags) -> Self

Sets flags for configuring the crawler.
pub fn threads(self, threads: usize) -> Self

Specifies the number of fetcher threads to use.

Notes
- If the input is 0, 1 thread will be used.
- The default thread count is 4 when this method is not used.
pub fn errors(self, errors: ErrorsCallback) -> Self

Sets a callback for handling errors encountered while crawling.
pub fn pre_fetch(self, pre_fetch: PreFetchCallback) -> Self

Enables filtering items based on their filename.

Notes
Returning false from the callback will prevent the item from being fetched.
pub fn post_fetch(self, post_fetch: PostFetchCallback) -> Self

Enables filtering items based on their filename and requested headers.

Notes
Returning false from the callback will prevent the item from being scraped / returned.
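A sketch of the two filtering stages. The `Arc`-wrapped closure shapes of `PreFetchCallback` and `PostFetchCallback` (and the exact header argument type of the latter) are assumptions about this crate's type aliases; the URL is illustrative:

```rust
use std::sync::Arc;
use url_crawler::prelude::*;

let crawler = Crawler::new("http://example.com/".to_owned())
    // Stage 1 (assumed shape): decide from the URL alone whether to
    // fetch at all — returning false skips the request entirely.
    .pre_fetch(Arc::new(|url: &Url| {
        url.as_str().ends_with('/') || url.as_str().ends_with(".tar.gz")
    }))
    // Stage 2 (assumed shape): decide from the URL plus the response
    // headers whether to scrape / yield the item.
    .post_fetch(Arc::new(|_url, _headers| true))
    .crawl();
```

pre_fetch is the cheaper filter, since a rejected item is never requested; post_fetch is for decisions that need server-provided metadata such as content type or length.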
pub fn crawl(self) -> CrawlIter
Initializes the crawling, returning an iterator of discovered files.
The crawler will continue to crawl in background threads even while the iterator is not being pulled from.