Module crawler


The core Crawler implementation for the spider-lib framework.

This module defines the Crawler struct, which acts as the central orchestrator for the web scraping process. It ties together the scheduler, downloader, middlewares, spiders, and item pipelines to execute a crawl. The crawler manages the lifecycle of requests and items, handles concurrency, supports checkpointing for fault tolerance, and collects statistics for monitoring.

It uses a task-based asynchronous model, spawning separate tasks to handle initial requests, download web pages, parse responses, and process scraped items.

Structs§

Crawler
The central orchestrator for the web scraping process, handling requests, responses, items, concurrency, checkpointing, and statistics collection.