Module pipelines

Module for spider-lib item pipeline implementations.

This module serves as a container for various concrete implementations of the Pipeline trait. Each submodule within this module provides a specific mechanism for processing, storing, or transforming ScrapedItems after they have been extracted by a spider.

It re-exports several built-in pipelines, including:

  • console_writer: For printing items to the console (debugging).
  • csv_exporter: For exporting items to CSV files.
  • deduplication: For filtering out duplicate items.
  • json_writer: For exporting items to a single JSON file.
  • jsonl_writer: For exporting items to JSON Lines files.
  • sqlite_writer: For persisting items to a SQLite database.
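
As a rough illustration of how pipelines like these can be chained, here is a minimal, self-contained sketch. It does not use spider-lib's actual API: the `Item` struct, the simplified `Pipeline` trait with its `process_item` method, the `Deduplicate` and `ConsoleEcho` stages, and the `run_pipelines` helper are all hypothetical stand-ins invented for this example. The real `Pipeline` trait and `ScrapedItem` type may look quite different (for instance, they may be async or return a `Result`).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a scraped item (not spider-lib's `ScrapedItem`).
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub url: String,
    pub title: String,
}

/// Hypothetical, simplified pipeline trait (an assumption, not the real one).
/// Returning `None` drops the item; returning `Some` passes it downstream.
pub trait Pipeline {
    fn process_item(&mut self, item: Item) -> Option<Item>;
}

/// Drops any item whose URL has already been seen.
#[derive(Default)]
pub struct Deduplicate {
    seen: HashSet<String>,
}

impl Pipeline for Deduplicate {
    fn process_item(&mut self, item: Item) -> Option<Item> {
        // `HashSet::insert` returns false when the value was already present.
        if self.seen.insert(item.url.clone()) {
            Some(item)
        } else {
            None
        }
    }
}

/// Prints each item to stdout and passes it through unchanged.
pub struct ConsoleEcho;

impl Pipeline for ConsoleEcho {
    fn process_item(&mut self, item: Item) -> Option<Item> {
        println!("{} | {}", item.url, item.title);
        Some(item)
    }
}

/// Runs every item through the stages in order and returns the survivors.
pub fn run_pipelines(pipelines: &mut [Box<dyn Pipeline>], items: Vec<Item>) -> Vec<Item> {
    let mut kept = Vec::new();
    for item in items {
        let mut current = Some(item);
        for stage in pipelines.iter_mut() {
            current = current.and_then(|i| stage.process_item(i));
            if current.is_none() {
                break; // A stage dropped the item; skip the remaining stages.
            }
        }
        if let Some(i) = current {
            kept.push(i);
        }
    }
    kept
}
```

In this sketch, placing `Deduplicate` ahead of `ConsoleEcho` in the slice passed to `run_pipelines` means duplicates are filtered before anything is printed. Whether stage order matters in the same way in spider-lib depends on how the crate composes its pipelines.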

Modules

console_writer
Item Pipeline for writing scraped items to the console.
deduplication
Item Pipeline for deduplicating scraped items.