Builder for constructing and configuring the Crawler instance.
This module provides the CrawlerBuilder, a fluent API for
setting up and customizing a web crawler. It simplifies the process of
assembling various spider-lib components, including:
- Defining concurrency settings for downloads, parsing, and pipelines.
- Attaching custom Downloader implementations.
- Registering Middlewares to process requests and responses.
- Adding Pipelines to process scraped items.
- Configuring checkpointing for persistence and fault tolerance.
- Initializing and integrating a StatCollector for gathering crawl statistics.
The builder handles default configurations (e.g., adding a default User-Agent middleware if none is specified) and loading existing checkpoints.
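The fluent builder flow described above can be sketched with a minimal, self-contained example. Note that all type and method names here (`download_concurrency`, `middleware`, `pipeline`, `build`, and the `UserAgentMiddleware` default) are illustrative assumptions, not spider-lib's actual API; they only demonstrate the pattern of chaining configuration calls and falling back to a default User-Agent middleware when none is registered.

```rust
// Hypothetical sketch of the fluent builder pattern; NOT spider-lib's real API.

#[derive(Debug, Clone)]
struct CrawlerConfig {
    download_concurrency: usize,
    parse_concurrency: usize,
    pipeline_concurrency: usize,
}

impl Default for CrawlerConfig {
    fn default() -> Self {
        // Illustrative defaults for the three concurrency settings.
        Self { download_concurrency: 8, parse_concurrency: 4, pipeline_concurrency: 2 }
    }
}

struct Crawler {
    config: CrawlerConfig,
    middlewares: Vec<String>,
    pipelines: Vec<String>,
}

struct CrawlerBuilder {
    config: CrawlerConfig,
    middlewares: Vec<String>,
    pipelines: Vec<String>,
}

impl CrawlerBuilder {
    fn new() -> Self {
        Self { config: CrawlerConfig::default(), middlewares: Vec::new(), pipelines: Vec::new() }
    }

    // Each setter consumes and returns `self`, enabling method chaining.
    fn download_concurrency(mut self, n: usize) -> Self {
        self.config.download_concurrency = n;
        self
    }

    fn middleware(mut self, name: &str) -> Self {
        self.middlewares.push(name.to_string());
        self
    }

    fn pipeline(mut self, name: &str) -> Self {
        self.pipelines.push(name.to_string());
        self
    }

    fn build(mut self) -> Crawler {
        // Mirror the documented behavior: add a default User-Agent
        // middleware when none was registered explicitly.
        if self.middlewares.is_empty() {
            self.middlewares.push("UserAgentMiddleware".to_string());
        }
        Crawler { config: self.config, middlewares: self.middlewares, pipelines: self.pipelines }
    }
}

fn main() {
    let crawler = CrawlerBuilder::new()
        .download_concurrency(16)
        .pipeline("JsonLinesPipeline")
        .build();
    assert_eq!(crawler.config.download_concurrency, 16);
    // No middleware was registered, so the default one is filled in.
    assert_eq!(crawler.middlewares, vec!["UserAgentMiddleware".to_string()]);
}
```

The consuming-`self` setter style shown here is the common Rust idiom for one-shot builders: the chain ends in `build()`, which takes ownership and produces the finished `Crawler`.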
Structs§
- CrawlerBuilder
- CrawlerConfig - Configuration for the crawler’s concurrency settings.