# spider-lib

A modular Rust web scraping framework inspired by Scrapy.

`spider-lib` is the facade crate for the workspace. It re-exports the core crawling, downloader, middleware, pipeline, utility, and macro APIs, so you can start with one dependency and enable only the features you need.
## Workspace Crates

- `spider-core`: crawler runtime, spider trait, scheduler, builder, state, and stats.
- `spider-downloader`: downloader traits and a reqwest-based downloader implementation.
- `spider-macro`: procedural macros such as `#[scraped_item]`.
- `spider-middleware`: retry, rate limiting, robots, cookies, proxy, cache, and user-agent middleware.
- `spider-pipeline`: item processing and output pipelines (JSON, JSONL, CSV, SQLite, stream JSON).
- `spider-util`: shared request/response/item/error types and helper utilities.
## Installation

```toml
[dependencies]
spider-lib = "2.0.4"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
```
`serde` and `serde_json` are required when using `#[scraped_item]`.
## Quick Start

```rust
use spider_lib::*;

// Illustrative outline; see the maintained examples for a complete spider.
#[scraped_item]
struct Quote { text: String }

#[tokio::main]
async fn main() { /* build a spider and run the crawler here */ }
```
Try the maintained examples in the repository.
## Feature Flags

Default feature: `core`.

Middleware features:

- `middleware-cache`
- `middleware-autothrottle`
- `middleware-proxy`
- `middleware-user-agent`
- `middleware-robots`
- `middleware-cookies`

Pipeline features:

- `pipeline-csv`
- `pipeline-json`
- `pipeline-jsonl`
- `pipeline-sqlite`
- `pipeline-stream-json`

Core features:

- `live-stats`: enables in-place terminal stat updates.
- `checkpoint`: enables checkpoint/resume support.
- `cookie-store`: enables cookie store integration (also enables `middleware-cookies`).
Example:

```toml
[dependencies]
spider-lib = { version = "2.0.4", features = ["middleware-robots", "pipeline-jsonl"] }
```
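If you want only a subset of the default functionality, Cargo's usual `default-features = false` switch applies here as well. As a sketch, a build with just the core runtime plus checkpointing (feature names taken from the list above):

```toml
[dependencies]
spider-lib = { version = "2.0.4", default-features = false, features = ["core", "checkpoint"] }
```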
## Using Workspace Crates Directly

Use `spider-lib` when you want the integrated API surface. Use the sub-crates directly if you need tighter dependency control or only one subsystem (for example, a custom downloader integration with `spider-downloader`, or the shared types from `spider-util`).
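For instance, a crate that only needs the shared request/response/item types might depend on `spider-util` alone. A minimal sketch, assuming the sub-crates are published in lockstep with `spider-lib`:

```toml
[dependencies]
spider-util = "2.0.4"
```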
## Development
## Documentation
- API docs: https://docs.rs/spider-lib
- Contribution guide: CONTRIBUTING.md
## License
MIT. See LICENSE.