Module checkpoint

Module for managing crawler checkpoints.

This module defines the data structures (SchedulerCheckpoint, Checkpoint) and functions for saving and loading the state of a crawler. Checkpoints enable the crawler to gracefully recover from interruptions or to resume a crawl at a later time. They capture the state of the scheduler (pending requests, visited URLs, salvaged requests) and the item pipelines.
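The description above can be made concrete with a minimal sketch. Everything in it is an assumption for illustration, not this module's actual API: the field names (`pending`, `visited`, `salvaged`, `pipelines`), the helper names `save_sketch` and `load_sketch`, and the tab-separated text format are all hypothetical. The real module may well use a serialization library instead. The sketch uses only the standard library and assumes URLs and pipeline state contain no tabs or newlines.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical scheduler snapshot; the real fields and types may differ.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SchedulerCheckpoint {
    pub pending: Vec<String>,      // requests not yet fetched
    pub visited: BTreeSet<String>, // URLs already crawled
    pub salvaged: Vec<String>,     // requests recovered for retry
}

/// Hypothetical full checkpoint: scheduler state plus opaque per-pipeline state.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Checkpoint {
    pub scheduler: SchedulerCheckpoint,
    pub pipelines: BTreeMap<String, String>, // pipeline name -> serialized state
}

/// Encode one record per line: a one-letter tag, a tab, then the value.
/// Assumes values contain no tabs or newlines.
pub fn encode(cp: &Checkpoint) -> String {
    let mut out = String::new();
    for url in &cp.scheduler.pending {
        out.push_str(&format!("P\t{}\n", url));
    }
    for url in &cp.scheduler.visited {
        out.push_str(&format!("V\t{}\n", url));
    }
    for url in &cp.scheduler.salvaged {
        out.push_str(&format!("S\t{}\n", url));
    }
    for (name, state) in &cp.pipelines {
        out.push_str(&format!("I\t{}\t{}\n", name, state));
    }
    out
}

/// Parse the format produced by `encode`, rejecting malformed records.
pub fn decode(text: &str) -> io::Result<Checkpoint> {
    let mut cp = Checkpoint::default();
    for line in text.lines() {
        let bad = || io::Error::new(io::ErrorKind::InvalidData, format!("bad record: {}", line));
        let (tag, rest) = line.split_once('\t').ok_or_else(bad)?;
        match tag {
            "P" => cp.scheduler.pending.push(rest.to_string()),
            "V" => {
                cp.scheduler.visited.insert(rest.to_string());
            }
            "S" => cp.scheduler.salvaged.push(rest.to_string()),
            "I" => {
                let (name, state) = rest.split_once('\t').ok_or_else(bad)?;
                cp.pipelines.insert(name.to_string(), state.to_string());
            }
            _ => return Err(bad()),
        }
    }
    Ok(cp)
}

/// Write to a sibling temporary file, then rename it over `path`, so an
/// interrupted write does not leave a truncated checkpoint at `path`.
pub fn save_sketch(path: &Path, cp: &Checkpoint) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, encode(cp))?;
    fs::rename(&tmp, path)
}

/// Read a checkpoint previously written by `save_sketch`.
pub fn load_sketch(path: &Path) -> io::Result<Checkpoint> {
    decode(&fs::read_to_string(path)?)
}
```

In this sketch, a crawler would call `save_sketch` periodically or on shutdown and call `load_sketch` at startup to restore the pending queue and visited set before scheduling new requests. Writing to a temporary file and renaming it is one common way to keep the previous checkpoint intact if the process is interrupted mid-save.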

Structs§

Checkpoint
A complete checkpoint of the crawler’s state.
SchedulerCheckpoint
A snapshot of the scheduler’s state.

Functions§

save_checkpoint