Crate spider_pipeline

§spider-pipeline

Item pipelines for cleanup, validation, deduplication, and output.

Pipelines run after parsing. This crate provides two kinds of stage: in-memory stages, such as transforms and validators, and output backends, such as CSV, JSON, SQLite, and streaming JSON.

§Example

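use spider_pipeline::console::ConsolePipeline;
use spider_pipeline::json::JsonPipeline;

// `CrawlerBuilder` and `MySpider` are not imported in this snippet and must be
// brought into scope separately. The `?` and `.await` below assume an enclosing
// async function that returns a `Result`.
let crawler = CrawlerBuilder::new(MySpider)
    // Output backend: write items to a JSON file.
    .add_pipeline(JsonPipeline::new("output.json")?)
    // Log each item as it passes through.
    .add_pipeline(ConsolePipeline::new())
    .build()
    .await?;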
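
To make the idea of chained in-memory stages concrete, here is a minimal, standard-library-only sketch of a transform followed by a validator. Everything in it (`Item`, `Stage`, `TrimValues`, `RequireField`, `run_stages`, `item`) is a hypothetical name invented for illustration. It is not the crate's `Pipeline` trait or its item type, and the real lifecycle hooks may look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical item type for this sketch (not the crate's item type).
type Item = BTreeMap<String, String>;

/// Hypothetical stage trait: return `Some(item)` to pass the item on,
/// or `None` to drop it.
trait Stage {
    fn process(&mut self, item: Item) -> Option<Item>;
}

/// In-memory transform: trims whitespace from every field value.
struct TrimValues;

impl Stage for TrimValues {
    fn process(&mut self, mut item: Item) -> Option<Item> {
        for value in item.values_mut() {
            *value = value.trim().to_string();
        }
        Some(item)
    }
}

/// In-memory validator: drops items whose required field is missing or empty.
struct RequireField(&'static str);

impl Stage for RequireField {
    fn process(&mut self, item: Item) -> Option<Item> {
        match item.get(self.0) {
            Some(value) if !value.is_empty() => Some(item),
            _ => None,
        }
    }
}

/// Runs each item through the stages in order, stopping at the first drop.
fn run_stages(stages: &mut [Box<dyn Stage>], items: Vec<Item>) -> Vec<Item> {
    let mut kept = Vec::new();
    'items: for mut item in items {
        for stage in stages.iter_mut() {
            match stage.process(item) {
                Some(next) => item = next,
                None => continue 'items,
            }
        }
        kept.push(item);
    }
    kept
}

/// Small helper for building items from string pairs.
fn item(pairs: &[(&str, &str)]) -> Item {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let mut stages: Vec<Box<dyn Stage>> =
        vec![Box::new(TrimValues), Box::new(RequireField("title"))];
    let items = vec![
        item(&[("title", "  Hello ")]),
        item(&[("title", "   ")]),
        item(&[("url", "https://example.com")]),
    ];
    let kept = run_stages(&mut stages, items);
    println!("{kept:?}");
}
```

Order matters in a chain like this: trimming first means a whitespace-only `title` is empty by the time the validator sees it, so the item is dropped rather than passed to an output backend.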

Modules§

console
Pipeline that logs items as they pass through.
dedup
Pipeline that drops duplicate items by selected fields.
pipeline
Pipeline trait and lifecycle hooks.
schema
Schema-aware item workflows and export helpers.
transform
Item transformation pipeline.
validation
Item validation pipeline.
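
As a rough illustration of what "drops duplicate items by selected fields" can mean, the sketch below builds a key from the chosen fields and keeps only the first item seen for each key. It uses the standard library only. `Item`, `DedupByFields`, `is_new`, and `dedup` are hypothetical names for this example and are not the API of the `dedup` module.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical item type for this sketch (not the crate's item type).
type Item = BTreeMap<String, String>;

/// Hypothetical deduplicator keyed on a chosen subset of fields.
struct DedupByFields {
    fields: Vec<String>,
    // A missing field is recorded as `None`, so "absent" and "empty string"
    // produce different keys.
    seen: HashSet<Vec<Option<String>>>,
}

impl DedupByFields {
    fn new(fields: &[&str]) -> Self {
        Self {
            fields: fields.iter().map(|f| f.to_string()).collect(),
            seen: HashSet::new(),
        }
    }

    /// Returns `true` the first time an item's key is seen, `false` afterwards.
    fn is_new(&mut self, item: &Item) -> bool {
        let key: Vec<Option<String>> =
            self.fields.iter().map(|f| item.get(f).cloned()).collect();
        // `HashSet::insert` returns `false` if the key was already present.
        self.seen.insert(key)
    }
}

/// Keeps the first item for each distinct key, preserving input order.
fn dedup(items: Vec<Item>, fields: &[&str]) -> Vec<Item> {
    let mut filter = DedupByFields::new(fields);
    items.into_iter().filter(|it| filter.is_new(it)).collect()
}

fn main() {
    let a: Item = [("url", "https://example.com/1"), ("title", "First")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let mut b = a.clone();
    // Same `url`, different `title`: still a duplicate when keyed on `url`.
    b.insert("title".to_string(), "Second".to_string());

    let kept = dedup(vec![a, b], &["url"]);
    println!("{} item(s) kept", kept.len());
}
```

Keying on a subset of fields lets two items with the same `url` but different scraped titles count as one item; keying on every field would treat them as distinct.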