spider_core/engine/mod.rs
//! # Engine Module
//!
//! Implements the core crawling engine that orchestrates the web scraping process.
//!
//! ## Overview
//!
//! The engine module provides the main `Crawler` struct and its associated
//! components, which manage the entire scraping workflow. It handles requests,
//! responses, and items, and coordinates the various subsystems, including
//! downloaders, middlewares, parsers, and pipelines.
//!
//! ## Key Components
//!
//! - **Crawler**: The central orchestrator that manages the crawling lifecycle
//! - **Downloader Task**: Sends HTTP requests and retrieves responses
//! - **Parser Task**: Processes responses and extracts data according to spider logic
//! - **Item Processor**: Passes scraped items through the registered pipelines
//! - **Middleware Manager**: Coordinates request/response processing through middlewares
//!
//! ## Architecture
//!
//! The engine uses an asynchronous, task-based model where different operations
//! run concurrently in separate Tokio tasks. Communication between components
//! happens through async channels, allowing for high-throughput processing.
//!
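//!
//! The task-and-channel layout described above can be sketched roughly as
//! follows. This is an illustration only, not the engine's implementation: it
//! swaps Tokio tasks and async channels for `std::thread` and `std::sync::mpsc`
//! so that it has no dependencies, and `run_pipeline` along with the plain
//! string request, response, and item values are hypothetical stand-ins.
//!
//! ```rust
//! use std::sync::mpsc;
//! use std::thread;
//!
//! /// Hypothetical three-stage pipeline: downloader -> parser -> item processor.
//! /// Each stage runs in its own thread and talks to the next over a channel.
//! fn run_pipeline(urls: Vec<String>) -> Vec<String> {
//!     let (req_tx, req_rx) = mpsc::channel::<String>();
//!     let (resp_tx, resp_rx) = mpsc::channel::<(String, String)>();
//!     let (item_tx, item_rx) = mpsc::channel::<String>();
//!
//!     // "Downloader": turns each URL into a fake response body (no real HTTP).
//!     let downloader = thread::spawn(move || {
//!         for url in req_rx {
//!             let body = format!("<html>{}</html>", url);
//!             if resp_tx.send((url, body)).is_err() {
//!                 break; // The parser has gone away; stop downloading.
//!             }
//!         }
//!     });
//!
//!     // "Parser": extracts one item per response.
//!     let parser = thread::spawn(move || {
//!         for (url, body) in resp_rx {
//!             let item = format!("{} ({} bytes)", url, body.len());
//!             if item_tx.send(item).is_err() {
//!                 break; // The item processor has gone away; stop parsing.
//!             }
//!         }
//!     });
//!
//!     // "Item processor": collects items until the parser closes its channel.
//!     let processor = thread::spawn(move || item_rx.into_iter().collect::<Vec<String>>());
//!
//!     for url in urls {
//!         req_tx.send(url).expect("downloader stopped early");
//!     }
//!     // Dropping the sender closes the request channel, which lets each
//!     // stage finish in turn once it has drained its input.
//!     drop(req_tx);
//!
//!     downloader.join().expect("downloader panicked");
//!     parser.join().expect("parser panicked");
//!     processor.join().expect("item processor panicked")
//! }
//! ```
//!
//! Under these assumptions, `run_pipeline` returns one item per input URL, in
//! input order, because each stage is a single thread reading from a FIFO
//! channel. Shutdown propagates by closing channels: each stage exits when the
//! sender feeding it is dropped.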
//! ## Internal Components
//!
//! These items are crate-private (`pub(crate)`) implementation details and are
//! not part of the public API:
//!
//! - `spawn_downloader_task`: Creates the task responsible for downloading web pages
//! - `spawn_parser_task`: Creates the task responsible for parsing responses
//! - `spawn_item_processor_task`: Creates the task responsible for processing items
//! - `SharedMiddlewareManager`: Manages concurrent access to middlewares

mod context;
mod crawler;
mod handler;
mod middleware;
mod parser;
mod processor;

pub use context::CrawlerContext;
pub use crawler::Crawler;
pub(crate) use handler::spawn_downloader_task;
pub(crate) use middleware::SharedMiddlewareManager;
pub(crate) use parser::spawn_parser_task;
pub(crate) use processor::spawn_item_processor_task;