Module builder


§Builder Module

Provides the CrawlerBuilder, a fluent API for constructing and configuring Crawler instances with customizable settings and components.

§Overview

The CrawlerBuilder simplifies assembling the various spider-core components into a fully configured web crawler, providing a flexible, ergonomic interface for setting up every aspect of the crawling process.

§Key Features

  • Concurrency Configuration: Control the number of concurrent downloads, parsing workers, and pipeline processors
  • Component Registration: Attach custom downloaders, middlewares, and pipelines
  • Checkpoint Management: Configure automatic saving and loading of crawl state (requires checkpoint feature)
  • Statistics Integration: Initialize and connect the StatCollector
  • Default Handling: Automatic addition of essential middlewares when needed
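The features above all hang off the same fluent-builder pattern: each setter consumes `self` and returns it, so calls chain until a final `build()`. The following is a minimal, self-contained sketch of that pattern; the field names, defaults, and the string-based middleware/pipeline registration are illustrative simplifications, not the crate's real API.

```rust
// Illustrative sketch of a fluent builder; NOT spider-core's actual types.
#[derive(Debug)]
struct Crawler {
    max_concurrent_downloads: usize,
    max_parser_workers: usize,
    middlewares: Vec<String>,
    pipelines: Vec<String>,
}

struct CrawlerBuilder {
    max_concurrent_downloads: usize,
    max_parser_workers: usize,
    middlewares: Vec<String>,
    pipelines: Vec<String>,
}

impl CrawlerBuilder {
    fn new() -> Self {
        Self {
            // Hypothetical defaults, applied when a setter is never called.
            max_concurrent_downloads: 8,
            max_parser_workers: 2,
            middlewares: Vec::new(),
            pipelines: Vec::new(),
        }
    }

    // Each setter takes `self` by value and returns it, enabling chaining.
    fn max_concurrent_downloads(mut self, n: usize) -> Self {
        self.max_concurrent_downloads = n;
        self
    }

    fn max_parser_workers(mut self, n: usize) -> Self {
        self.max_parser_workers = n;
        self
    }

    fn add_middleware(mut self, name: &str) -> Self {
        self.middlewares.push(name.to_string());
        self
    }

    fn add_pipeline(mut self, name: &str) -> Self {
        self.pipelines.push(name.to_string());
        self
    }

    // "Default Handling": build() can inject essentials that were omitted.
    fn build(mut self) -> Crawler {
        if self.middlewares.is_empty() {
            self.middlewares.push("default".to_string());
        }
        Crawler {
            max_concurrent_downloads: self.max_concurrent_downloads,
            max_parser_workers: self.max_parser_workers,
            middlewares: self.middlewares,
            pipelines: self.pipelines,
        }
    }
}

fn main() {
    let crawler = CrawlerBuilder::new()
        .max_concurrent_downloads(10)
        .max_parser_workers(4)
        .add_pipeline("console")
        .build();
    assert_eq!(crawler.max_concurrent_downloads, 10);
    assert_eq!(crawler.middlewares, vec!["default".to_string()]);
    println!("built: {:?}", crawler);
}
```

Because each setter takes ownership of the builder, a half-configured builder cannot be reused after `build()`, which is what makes the chained style both safe and ergonomic in Rust.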

§Example

use spider_core::CrawlerBuilder;
use spider_middleware::rate_limit::RateLimitMiddleware;
use spider_pipeline::console::ConsolePipeline;
use spider_util::error::SpiderError;

// `MySpider` is a user-defined spider type (definition not shown here).
async fn setup_crawler() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider)
        .max_concurrent_downloads(10)
        .max_parser_workers(4)
        .add_middleware(RateLimitMiddleware::default())
        .add_pipeline(ConsolePipeline::new())
        .with_checkpoint_path("./crawl.checkpoint")
        .build()
        .await?;

    crawler.start_crawl().await
}

Structs§

CrawlerBuilder
A fluent builder for constructing Crawler instances.