spider_middleware/lib.rs
//! # spider-middleware
//!
//! Provides built-in middleware implementations for the `spider-lib` framework.
//!
//! ## Overview
//!
//! The `spider-middleware` crate contains a comprehensive collection of middleware
//! implementations that extend the functionality of web crawlers. Middlewares
//! intercept and process requests and responses, enabling features like rate
//! limiting, retries, user-agent rotation, and more.
//!
//! ## Available Middlewares
//!
//! - **Rate Limiting**: Controls request rates to prevent server overload
//! - **Retries**: Automatically retries failed or timed-out requests
//! - **User-Agent Rotation**: Manages and rotates user agents
//! - **Referer Management**: Handles the `Referer` header
//! - **Cookies**: Persists cookies across requests to maintain sessions
//! - **HTTP Caching**: Caches responses to accelerate development
//! - **Robots.txt**: Adheres to `robots.txt` rules
//! - **Proxy**: Manages and rotates proxy servers
//!
//! ## Architecture
//!
//! Each middleware implements the `Middleware` trait, allowing them to intercept
//! requests before they're sent and responses after they're received. This
//! enables flexible, composable behavior customization for crawlers.
//!
//! ## Example
//!
//! ```rust,ignore
//! use spider_middleware::rate_limit::RateLimitMiddleware;
//! use spider_middleware::retry::RetryMiddleware;
//!
//! // Add middlewares to your crawler
//! let crawler = CrawlerBuilder::new(MySpider)
//!     .add_middleware(RateLimitMiddleware::default())
//!     .add_middleware(RetryMiddleware::new())
//!     .build()
//!     .await?;
//! ```

pub mod cookies;
pub mod http_cache;
pub mod middleware;
pub mod prelude;
pub mod proxy;
pub mod rate_limit;
pub mod referer;
pub mod request;
pub mod retry;
pub mod robots_txt;
pub mod user_agent;

pub use spider_util::request::Request;
pub use spider_util::response::Response;