The core Crawler implementation for the spider-lib framework.
This module defines the Crawler struct, which acts as the central orchestrator
for the web scraping process. It ties together the scheduler, downloader,
middlewares, spiders, and item pipelines to execute a crawl. The crawler
manages the lifecycle of requests and items, handles concurrency, and supports
checkpointing for fault tolerance.
The crawler uses a task-based asynchronous model, spawning separate tasks to handle initial requests, download web pages, parse responses, and process scraped items.