spider-pipeline 0.3.4

Pipeline implementations for the spider-lib web scraping framework.

spider-pipeline

Item pipelines for processing, filtering, and exporting scraped data in spider-lib.

Depend on this crate directly when you want the pipeline features without pulling in the full spider-lib facade crate.

Installation

[dependencies]
spider-pipeline = "0.3.4"

Built-in Pipelines

Core (always available):

  • ConsolePipeline
  • DeduplicationPipeline

Optional (feature-gated):

  • pipeline-json -> JsonPipeline
  • pipeline-jsonl -> JsonlPipeline
  • pipeline-csv -> CsvPipeline
  • pipeline-sqlite -> SqlitePipeline
  • pipeline-stream-json -> StreamJsonPipeline

Usage

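use spider_pipeline::{console::ConsolePipeline, dedup::DeduplicationPipeline};

// Run this inside an async function that returns a Result, because the
// builder is awaited and its error is propagated with `?`.
// MySpider is your own spider type, defined elsewhere.
let crawler = spider_core::CrawlerBuilder::new(MySpider)
    // Deduplicate items using their "url" field as the key.
    .add_pipeline(DeduplicationPipeline::new(&["url"]))
    // Print items to the console.
    .add_pipeline(ConsolePipeline::new())
    .build()
    .await?;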
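
To make the idea of deduplicating on key fields concrete, here is a minimal standalone sketch that uses only the standard library. It is not the crate's DeduplicationPipeline: the `Item` alias, the `DedupSketch` type, and the `item` helper are hypothetical names introduced for illustration, and the real pipeline may build its keys differently.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for a scraped item: field name -> value.
type Item = BTreeMap<String, String>;

/// Hypothetical filter that drops items whose key fields were already seen.
/// Illustration only; this is not spider-pipeline's DeduplicationPipeline.
struct DedupSketch {
    key_fields: Vec<String>,
    seen: HashSet<Vec<Option<String>>>,
}

impl DedupSketch {
    fn new(key_fields: &[&str]) -> Self {
        Self {
            key_fields: key_fields.iter().map(|f| f.to_string()).collect(),
            seen: HashSet::new(),
        }
    }

    /// Returns Some(item) the first time a key is seen, None for repeats.
    fn process(&mut self, item: Item) -> Option<Item> {
        // The key is the list of values for the configured fields;
        // a missing field contributes None.
        let key: Vec<Option<String>> = self
            .key_fields
            .iter()
            .map(|f| item.get(f).cloned())
            .collect();
        // HashSet::insert returns false when the key was already present.
        if self.seen.insert(key) {
            Some(item)
        } else {
            None
        }
    }
}

/// Helper for building items from (field, value) pairs.
fn item(pairs: &[(&str, &str)]) -> Item {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With `DedupSketch::new(&["url"])`, two items that share a `url` value count as duplicates even if their other fields differ, so only the first one is passed on.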

Feature Flags

  • core (default)
  • pipeline-csv
  • pipeline-json
  • pipeline-jsonl
  • pipeline-sqlite
  • pipeline-stream-json

[dependencies]
spider-pipeline = { version = "0.3.4", features = ["pipeline-jsonl", "pipeline-csv"] }

When you use these pipelines through spider-lib, enable the features of the same names on the spider-lib crate instead.

Related Crates

License

MIT. See LICENSE.