Module builder

Builder for constructing and configuring the Crawler instance.

This module provides CrawlerBuilder, a fluent API for setting up and customizing a web crawler. It simplifies assembling the spider-lib components a crawl needs, including:

  • Defining concurrency settings for downloads, parsing, and pipelines.
  • Attaching custom Downloader implementations.
  • Registering Middlewares to process requests and responses.
  • Adding Pipelines to process scraped items.
  • Configuring checkpointing for persistence and fault tolerance.
  • Initializing and integrating a StatCollector for gathering crawl statistics.

The builder also applies default configuration (for example, adding a default User-Agent middleware if none is specified) and loads existing checkpoints.
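
The fluent style and the default-filling behaviour described above can be illustrated with a minimal, self-contained sketch. This is not spider-lib's actual API: `Builder`, `Crawler`, `Config`, `Middleware`, every method name, the default values, and the `example-crawler/0.1` User-Agent string are hypothetical stand-ins chosen only to show the pattern of chained setters followed by a `build` step that validates input and inserts a default User-Agent middleware when none was registered.

```rust
// Hypothetical sketch of the builder pattern; these are NOT spider-lib types.

/// Stand-in for a request/response middleware (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum Middleware {
    UserAgent(String),
    Retry { max_attempts: u32 },
}

/// Stand-in for concurrency settings (field names and defaults are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Config {
    max_concurrent_downloads: usize,
    parser_workers: usize,
}

impl Default for Config {
    fn default() -> Self {
        Self { max_concurrent_downloads: 4, parser_workers: 2 }
    }
}

/// The fully assembled object produced by `Builder::build`.
#[derive(Debug)]
struct Crawler {
    config: Config,
    middlewares: Vec<Middleware>,
}

/// Collects settings through chained calls before the crawler exists.
#[derive(Debug, Default)]
struct Builder {
    config: Config,
    middlewares: Vec<Middleware>,
}

impl Builder {
    fn new() -> Self {
        Self::default()
    }

    /// Each setter takes `self` by value and returns it, enabling chaining.
    fn max_concurrent_downloads(mut self, n: usize) -> Self {
        self.config.max_concurrent_downloads = n;
        self
    }

    fn add_middleware(mut self, middleware: Middleware) -> Self {
        self.middlewares.push(middleware);
        self
    }

    /// Validates the settings and fills in defaults the caller left out.
    fn build(mut self) -> Result<Crawler, String> {
        if self.config.max_concurrent_downloads == 0 {
            return Err("max_concurrent_downloads must be greater than 0".to_string());
        }

        // Add a default User-Agent middleware only if none was registered.
        let has_user_agent = self
            .middlewares
            .iter()
            .any(|m| matches!(m, Middleware::UserAgent(_)));
        if !has_user_agent {
            self.middlewares
                .insert(0, Middleware::UserAgent("example-crawler/0.1".to_string()));
        }

        Ok(Crawler { config: self.config, middlewares: self.middlewares })
    }
}
```

With these stand-in types, `Builder::new().max_concurrent_downloads(8).add_middleware(Middleware::Retry { max_attempts: 3 }).build()` yields a crawler whose middleware list starts with the default User-Agent entry, while a caller who registers their own `Middleware::UserAgent` gets exactly what they supplied. Deferring validation and defaults to `build` keeps the individual setters trivial and order-independent.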

Structs§

CrawlerBuilder
CrawlerConfig
Configuration for the crawler’s concurrency settings.