spider-lib 🕷️🕸️
spider-lib is an asynchronous web scraping library for Rust, inspired by Scrapy. Its modular architecture is designed for high-performance data fetching. The library is under active development, so its API may still change.
Architecture
spider-lib uses Rust's async capabilities for efficient I/O and parallel scraping. It follows a modular, actor-based design built around three core components: the Downloader, the Scheduler, and the ItemPipeline. Each component can be customized to adapt the scraping workflow.
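To make the division of responsibilities concrete, here is a minimal sketch of how a Scheduler, a Downloader, and an ItemPipeline could fit together. It is not spider-lib's actual API: every type, trait, and function name below (`Request`, `Item`, `StubDownloader`, `CollectPipeline`, `parse_title`, `crawl`) is hypothetical. To stay dependency-free, the sketch is synchronous and single-threaded rather than async and actor-based, and the downloader returns a canned body instead of making network requests.

```rust
use std::collections::{HashSet, VecDeque};

/// A request for one page (hypothetical type, not spider-lib's).
#[derive(Debug, Clone, PartialEq)]
struct Request {
    url: String,
}

/// A scraped item (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Item {
    title: String,
}

/// Scheduler: queues requests and drops URLs it has already seen.
#[derive(Default)]
struct Scheduler {
    queue: VecDeque<Request>,
    seen: HashSet<String>,
}

impl Scheduler {
    /// Returns true if the URL was new and has been queued.
    fn enqueue(&mut self, url: &str) -> bool {
        if self.seen.insert(url.to_string()) {
            self.queue.push_back(Request { url: url.to_string() });
            true
        } else {
            false
        }
    }

    fn next(&mut self) -> Option<Request> {
        self.queue.pop_front()
    }
}

/// Downloader: turns a request into a response body.
trait Downloader {
    fn fetch(&self, req: &Request) -> String;
}

/// Stub that fabricates a body so the sketch runs offline.
struct StubDownloader;

impl Downloader for StubDownloader {
    fn fetch(&self, req: &Request) -> String {
        format!("<title>Page at {}</title>", req.url)
    }
}

/// ItemPipeline: a stage that receives every scraped item.
trait ItemPipeline {
    fn process(&mut self, item: Item);
}

/// Pipeline stage that simply collects items in memory.
struct CollectPipeline {
    items: Vec<Item>,
}

impl ItemPipeline for CollectPipeline {
    fn process(&mut self, item: Item) {
        self.items.push(item);
    }
}

/// Extracts the text between <title> and </title>, if both are present.
fn parse_title(body: &str) -> Option<Item> {
    let start = body.find("<title>")? + "<title>".len();
    let end = body[start..].find("</title>")? + start;
    Some(Item { title: body[start..end].to_string() })
}

/// Drives the loop: schedule, download, parse, hand off to the pipeline.
/// Returns the number of pages fetched.
fn crawl(seeds: &[&str], downloader: &dyn Downloader, pipeline: &mut dyn ItemPipeline) -> usize {
    let mut scheduler = Scheduler::default();
    for seed in seeds {
        scheduler.enqueue(seed);
    }
    let mut fetched = 0;
    while let Some(req) = scheduler.next() {
        let body = downloader.fetch(&req);
        fetched += 1;
        if let Some(item) = parse_title(&body) {
            pipeline.process(item);
        }
    }
    fetched
}

fn main() {
    let mut pipeline = CollectPipeline { items: Vec::new() };
    // The duplicate seed is filtered out by the scheduler.
    let seeds = ["https://example.com/a", "https://example.com/a", "https://example.com/b"];
    let fetched = crawl(&seeds, &StubDownloader, &mut pipeline);
    println!("fetched {} pages", fetched);
    for item in &pipeline.items {
        println!("item: {}", item.title);
    }
}
```

Because the downloader and the pipeline are trait objects, either one can be swapped without touching the crawl loop, which is the kind of customization a modular design allows.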
Quick Start
To begin, clone the repository and execute an example:
This command runs a spider that collects book data from a sample website and exports it to a CSV file.
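As a rough illustration of the export step, the sketch below writes scraped book records to CSV using only the standard library. It does not show how spider-lib itself exports data: the `Book` struct, its `title` and `price` fields, and the `csv_field` and `write_csv` helpers are assumptions made for this example, and the sample values are invented.

```rust
use std::io::{self, Write};

/// Hypothetical shape of a scraped book record.
struct Book {
    title: String,
    price: String,
}

/// Quotes a field when it contains a comma, quote, or line break,
/// doubling any embedded quotes (RFC 4180 style).
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Writes a header row followed by one row per book.
fn write_csv<W: Write>(mut out: W, books: &[Book]) -> io::Result<()> {
    writeln!(out, "title,price")?;
    for book in books {
        writeln!(out, "{},{}", csv_field(&book.title), csv_field(&book.price))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Invented sample data; a real run would receive items from the spider.
    let books = vec![
        Book { title: "Example Book".to_string(), price: "9.99".to_string() },
        Book { title: "Commas, and \"quotes\"".to_string(), price: "1.00".to_string() },
    ];
    // Writing to stdout here; a std::fs::File works the same way.
    write_csv(io::stdout().lock(), &books)
}
```

Accepting any `Write` implementation keeps the export logic independent of the destination, so the same function can target a file, standard output, or an in-memory buffer.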
Contribution
For ideas or bug reports, please open an issue or submit a pull request.