spider_middleware/lib.rs
//! # spider-middleware
//!
//! Provides built-in middleware implementations for the `spider-lib` framework.
//!
//! ## Overview
//!
//! The `spider-middleware` crate contains a collection of middleware
//! implementations that extend the functionality of web crawlers. Middlewares
//! intercept and process requests and responses, enabling features such as rate
//! limiting, retries, user-agent rotation, and more.
//!
//! ## Available Middlewares
//!
//! - **Rate Limiting**: Controls request rates to prevent server overload
//! - **Retries**: Automatically retries failed or timed-out requests
//! - **Referer Management**: Handles the `Referer` header
//! - **User-Agent Rotation**: Manages and rotates user agents (feature: `middleware-user-agent`)
//! - **Cookies**: Persists cookies across requests to maintain sessions (feature: `middleware-cookies`)
//! - **HTTP Caching**: Caches responses to accelerate development (feature: `middleware-cache`)
//! - **Robots.txt**: Adheres to `robots.txt` rules (feature: `middleware-robots`)
//! - **Proxy**: Manages and rotates proxy servers (feature: `middleware-proxy`)
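//!
//! Middlewares marked with a feature above are compiled only when that Cargo
//! feature is enabled. The following is a minimal sketch of turning on two of
//! them when depending on this crate directly. It assumes the crate is pulled
//! from crates.io as `spider-middleware`, and the `"*"` version requirement is
//! a placeholder for a real one:
//!
//! ```toml
//! [dependencies]
//! # "*" is a placeholder: pin the version you actually use.
//! spider-middleware = { version = "*", features = ["middleware-cookies", "middleware-robots"] }
//! ```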
//!
//! ## Architecture
//!
//! Each middleware implements the `Middleware` trait, which lets it intercept
//! requests before they are sent and responses after they are received.
//! Because every middleware exposes the same hooks, several can be stacked on
//! one crawler to combine their behaviors.
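//!
//! The request/response flow above can be illustrated with a minimal,
//! self-contained sketch. It is **not** this crate's API:
//! `SketchMiddleware`, `SketchRequest`, `SketchResponse`, `SetHeader`,
//! `run_chain`, and `demo` are hypothetical names. The hooks are synchronous,
//! and the fetch step is a closure rather than a real HTTP client. The sketch
//! passes responses back through the chain in reverse order, which mirrors
//! Scrapy's downloader-middleware convention. That ordering is an assumption
//! here, not documented behavior of `spider-middleware`.
//!
//! ```rust
//! /// Hypothetical stand-in for an outgoing request (not `spider_util::request::Request`).
//! struct SketchRequest {
//!     url: String,
//!     headers: Vec<(String, String)>,
//! }
//!
//! /// Hypothetical stand-in for a received response.
//! struct SketchResponse {
//!     status: u16,
//! }
//!
//! /// Hypothetical hook trait; the real `Middleware` trait's signature may differ
//! /// (for example, it may be async or return a `Result`).
//! trait SketchMiddleware {
//!     /// Called before the request is sent; may modify it.
//!     fn process_request(&self, req: &mut SketchRequest, log: &mut Vec<String>);
//!     /// Called after the response is received; may modify it.
//!     fn process_response(&self, resp: &mut SketchResponse, log: &mut Vec<String>);
//! }
//!
//! /// Example middleware: adds one header and records each hook call.
//! struct SetHeader {
//!     name: &'static str,
//!     value: &'static str,
//! }
//!
//! impl SketchMiddleware for SetHeader {
//!     fn process_request(&self, req: &mut SketchRequest, log: &mut Vec<String>) {
//!         req.headers.push((self.name.to_string(), self.value.to_string()));
//!         log.push(format!("request:{}", self.name));
//!     }
//!
//!     fn process_response(&self, _resp: &mut SketchResponse, log: &mut Vec<String>) {
//!         log.push(format!("response:{}", self.name));
//!     }
//! }
//!
//! /// Runs request hooks in chain order, fetches, then runs response hooks in reverse.
//! fn run_chain(
//!     chain: &[Box<dyn SketchMiddleware>],
//!     mut req: SketchRequest,
//!     fetch: impl Fn(&SketchRequest) -> SketchResponse,
//! ) -> (SketchRequest, SketchResponse, Vec<String>) {
//!     let mut log = Vec::new();
//!     for mw in chain {
//!         mw.process_request(&mut req, &mut log);
//!     }
//!     let mut resp = fetch(&req);
//!     for mw in chain.iter().rev() {
//!         mw.process_response(&mut resp, &mut log);
//!     }
//!     (req, resp, log)
//! }
//!
//! /// Builds a two-middleware chain and runs one request through it.
//! fn demo() -> (SketchRequest, SketchResponse, Vec<String>) {
//!     let chain: Vec<Box<dyn SketchMiddleware>> = vec![
//!         Box::new(SetHeader { name: "User-Agent", value: "sketch-bot" }),
//!         Box::new(SetHeader { name: "Referer", value: "https://example.com/" }),
//!     ];
//!     let req = SketchRequest {
//!         url: "https://example.com/page".to_string(),
//!         headers: Vec::new(),
//!     };
//!     // Fake network call standing in for a real HTTP client.
//!     run_chain(&chain, req, |r| SketchResponse {
//!         status: if r.url.starts_with("https://") { 200 } else { 400 },
//!     })
//! }
//! ```
//!
//! Calling `demo()` records `request:User-Agent`, `request:Referer`,
//! `response:Referer`, `response:User-Agent`. Each middleware sees the request
//! on the way out and the response on the way back.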
//!
//! ## Example
//!
//! ```rust,ignore
//! use spider_middleware::rate_limit::RateLimitMiddleware;
//! use spider_middleware::retry::RetryMiddleware;
//!
//! // `CrawlerBuilder` and `MySpider` are not defined in this crate; `MySpider`
//! // stands for your own spider type. Because of `.await?`, this must run
//! // inside an async function that returns a `Result`.
//! let crawler = CrawlerBuilder::new(MySpider)
//!     .add_middleware(RateLimitMiddleware::default())
//!     .add_middleware(RetryMiddleware::new())
//!     .build()
//!     .await?;
//! ```

pub mod middleware;
pub mod rate_limit;
pub mod referer;
pub mod request;
pub mod retry;

pub use spider_util::request::Request;
pub use spider_util::response::Response;

pub mod prelude;

#[cfg(feature = "middleware-user-agent")]
pub mod user_agent;

#[cfg(feature = "middleware-cookies")]
pub mod cookies;

#[cfg(feature = "middleware-cache")]
pub mod http_cache;

#[cfg(feature = "middleware-proxy")]
pub mod proxy;

#[cfg(feature = "middleware-robots")]
pub mod robots_txt;