§Builder Module
Provides the CrawlerBuilder, a fluent API for constructing and configuring
Crawler instances with customizable settings and components.
§Overview
The CrawlerBuilder simplifies the process of assembling various spider-core
components into a fully configured web crawler. It provides a flexible,
ergonomic interface for setting up all aspects of the crawling process.
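A fluent builder of this kind typically works by having each setter consume the builder and return it, so configuration calls chain until a final `build()`. The following is a minimal, self-contained sketch of that pattern using toy types invented for illustration; it is not spider-core's actual implementation:

```rust
// Toy illustration of the fluent-builder pattern (not spider-core's real types):
// each setter takes `self` by value and returns it, enabling method chaining.
#[derive(Debug)]
pub struct Crawler {
    pub max_concurrent_downloads: usize,
    pub max_parser_workers: usize,
}

pub struct CrawlerBuilder {
    max_concurrent_downloads: usize,
    max_parser_workers: usize,
}

impl CrawlerBuilder {
    pub fn new() -> Self {
        // Defaults chosen for illustration only.
        Self { max_concurrent_downloads: 16, max_parser_workers: 4 }
    }

    pub fn max_concurrent_downloads(mut self, n: usize) -> Self {
        self.max_concurrent_downloads = n;
        self
    }

    pub fn max_parser_workers(mut self, n: usize) -> Self {
        self.max_parser_workers = n;
        self
    }

    // Consumes the builder and produces the configured instance.
    pub fn build(self) -> Crawler {
        Crawler {
            max_concurrent_downloads: self.max_concurrent_downloads,
            max_parser_workers: self.max_parser_workers,
        }
    }
}

fn main() {
    let crawler = CrawlerBuilder::new()
        .max_concurrent_downloads(10)
        .max_parser_workers(4)
        .build();
    println!("{:?}", crawler);
}
```

Because each setter takes `self` by value, a half-configured builder cannot be reused after `build()`, which keeps the configuration step linear and hard to misuse.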
§Key Features
- Concurrency Configuration: Control the number of concurrent downloads, parsing workers, and pipeline processors
- Component Registration: Attach custom downloaders, middlewares, and pipelines
- Checkpoint Management: Configure automatic saving and loading of crawl state (requires the checkpoint feature)
- Statistics Integration: Initialize and connect the StatCollector
- Default Handling: Automatic addition of essential middlewares when needed
§Example
```rust
use spider_core::CrawlerBuilder;
use spider_middleware::rate_limit::RateLimitMiddleware;
use spider_pipeline::console::ConsolePipeline;
use spider_util::error::SpiderError;

// `MySpider` is the caller's own spider implementation (not shown here).
async fn setup_crawler() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider)
        .max_concurrent_downloads(10)
        .max_parser_workers(4)
        .add_middleware(RateLimitMiddleware::default())
        .add_pipeline(ConsolePipeline::new())
        .with_checkpoint_path("./crawl.checkpoint")
        .build()
        .await?;
    crawler.start_crawl().await
}
```

§Structs
- CrawlerBuilder - A fluent builder for constructing Crawler instances.