spider-core
The core engine of the spider-lib web scraping framework.
Overview
The spider-core crate provides the fundamental components for building web scrapers, including the main Crawler, the Scheduler, the Spider trait, and other essential infrastructure for managing the crawling process.
This crate implements the central orchestration layer of the web scraping framework. It manages the flow of requests and responses, coordinates concurrent operations, and provides the foundation for middleware and pipeline systems.
Key Components
- Crawler: The main orchestrator that manages the crawling process
- Scheduler: Handles request queuing and duplicate detection
- Spider: Trait defining the interface for custom scraping logic
- CrawlerBuilder: Fluent API for configuring and building crawlers
- Middleware: Interceptors for processing requests and responses
- Pipeline: Processors for scraped items
- Stats: Collection and reporting of crawl statistics
- Checkpoint: Support for resuming crawls from saved state
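To make the division of labor concrete, here is a minimal, self-contained sketch of what implementing custom scraping logic against a Spider-style trait looks like. The trait and types below are illustrative stand-ins, not the actual spider-core API: the real trait's method names and signatures may differ.

```rust
// Illustrative sketch only -- these types are local stand-ins for the
// real spider-core API, which may differ in names and signatures.
struct Response {
    url: String,
    body: String,
}

struct Item {
    title: String,
}

// A Spider supplies the seed URLs and turns responses into items.
trait Spider {
    fn start_urls(&self) -> Vec<String>;
    fn parse(&self, response: &Response) -> Vec<Item>;
}

struct QuoteSpider;

impl Spider for QuoteSpider {
    fn start_urls(&self) -> Vec<String> {
        vec!["https://example.com".to_string()]
    }

    fn parse(&self, response: &Response) -> Vec<Item> {
        // Toy extraction logic: one item per line mentioning "quote".
        response
            .body
            .lines()
            .filter(|line| line.contains("quote"))
            .map(|line| Item { title: line.trim().to_string() })
            .collect()
    }
}

fn main() {
    let spider = QuoteSpider;
    let resp = Response {
        url: spider.start_urls().remove(0),
        body: "a quote here\nnothing\nanother quote".to_string(),
    };
    let items = spider.parse(&resp);
    assert_eq!(items.len(), 2);
    assert_eq!(items[0].title, "a quote here");
    println!("scraped {} items from {}", items.len(), resp.url);
}
```

In this model the crawler owns scheduling and I/O, while the spider stays a pure function from responses to items, which keeps custom scraping logic easy to test in isolation.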
Architecture
The spider-core crate serves as the central hub connecting all other components of the spider framework. It handles:
- Request scheduling and execution
- Response processing
- Concurrent crawling operations
- State management
- Statistics collection
- Checkpoint and resume functionality
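The request-scheduling and duplicate-detection responsibilities above can be sketched with a toy scheduler: a FIFO queue plus a set of already-seen URLs. The Scheduler below is a simplified illustration, not the actual spider-core implementation.

```rust
use std::collections::{HashSet, VecDeque};

// Toy scheduler illustrating queuing plus duplicate detection;
// the real spider-core Scheduler is more elaborate.
struct Scheduler {
    queue: VecDeque<String>,
    seen: HashSet<String>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler {
            queue: VecDeque::new(),
            seen: HashSet::new(),
        }
    }

    // Enqueue a URL unless it was seen before; returns whether it was accepted.
    fn enqueue(&mut self, url: &str) -> bool {
        if self.seen.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    // Hand the next pending request to the crawler, FIFO order.
    fn next(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut sched = Scheduler::new();
    assert!(sched.enqueue("https://example.com/a"));
    assert!(sched.enqueue("https://example.com/b"));
    assert!(!sched.enqueue("https://example.com/a")); // duplicate rejected
    assert_eq!(sched.next().as_deref(), Some("https://example.com/a"));
    println!("remaining in queue: {}", sched.queue.len());
}
```

Tracking seen URLs separately from the queue is what lets a crawler follow links freely without revisiting pages, and it is also the kind of state that checkpointing must persist to support resume.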
Usage
Most users will interact with the components re-exported from this crate through the main spider-lib facade. However, this crate can be used independently for fine-grained control over the crawling process.
The original example here was truncated; the following is a minimal sketch of direct usage, in which the module paths, builder methods, and tokio runtime are assumptions rather than the verified spider-core API:

```rust
// Illustrative only: import paths and method names are assumed.
use spider_core::{CrawlerBuilder, Spider};

struct MySpider;

impl Spider for MySpider {
    // Custom scraping logic: start URLs, parse callbacks, etc.
}

#[tokio::main]
async fn main() {
    // Configure a crawler through the fluent builder API,
    // then drive the crawl to completion.
    let crawler = CrawlerBuilder::new()
        .spider(MySpider)
        .build();
    crawler.run().await;
}
```
Dependencies
This crate depends on:
- spider-util: For basic data structures and utilities
- spider-middleware: For request/response processing
- spider-downloader: For HTTP request execution
- spider-pipeline: For item processing
- Various external crates for async processing, serialization, and data structures
License
This project is licensed under the MIT License - see the LICENSE file for details.