§spider-middleware
Provides built-in middleware implementations for the spider-lib framework.
§Overview
The spider-middleware crate contains a comprehensive collection of middleware
implementations that extend the functionality of web crawlers. Middlewares
intercept and process requests and responses, enabling features like rate
limiting, retries, user-agent rotation, and more.
§Available Middlewares
- Rate Limiting: Controls request rates to prevent server overload
- Retries: Automatically retries failed or timed-out requests
- User-Agent Rotation: Manages and rotates user agents
- Referer Management: Handles the Referer header
- Cookies: Persists cookies across requests to maintain sessions
- HTTP Caching: Caches responses to accelerate development
- Robots.txt: Adheres to robots.txt rules
- Proxy: Manages and rotates proxy servers
§Architecture
Each middleware implements the Middleware trait, allowing them to intercept
requests before they’re sent and responses after they’re received. This
enables flexible, composable behavior customization for crawlers.
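For illustration, a minimal custom middleware might look like the sketch below. The trait shape, the on_request/on_response method names, and the placeholder Request and Response types are assumptions made for this example; the authoritative definitions live in the middleware and request modules.

use async_trait::async_trait;

// Placeholder types standing in for the crate's real request/response
// structures (see the `request` module); the fields are illustrative only.
struct Request { url: String }
struct Response { url: String, status: u16 }

// Hypothetical shape of the Middleware trait; the real definition in the
// `middleware` module may differ in method names and signatures.
#[async_trait]
trait Middleware: Send + Sync {
    // Inspect or rewrite a request before it is sent.
    async fn on_request(&self, request: Request) -> Request { request }
    // Inspect or transform a response after it is received.
    async fn on_response(&self, response: Response) -> Response { response }
}

// A trivial middleware that logs traffic on both sides of the exchange.
struct LoggingMiddleware;

#[async_trait]
impl Middleware for LoggingMiddleware {
    async fn on_request(&self, request: Request) -> Request {
        println!("-> {}", request.url);
        request
    }
    async fn on_response(&self, response: Response) -> Response {
        println!("<- {} (status {})", response.url, response.status);
        response
    }
}

Because each hook consumes and returns its value, a middleware in this style can rewrite traffic as well as observe it.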
§Example
use spider_middleware::rate_limit::RateLimitMiddleware;
use spider_middleware::retry::RetryMiddleware;

// `CrawlerBuilder` and `MySpider` are provided by the surrounding
// spider-lib project and are assumed to be in scope here.
// Add middlewares to your crawler.
let crawler = CrawlerBuilder::new(MySpider)
    .add_middleware(RateLimitMiddleware::default())
    .add_middleware(RetryMiddleware::new())
    .build()
    .await?;
Modules§
- cookies: Cookie Middleware to manage the Set-Cookie header.
- http_cache: HTTP Cache Middleware for caching web responses.
- middleware: Core Middleware trait and related types for the spider-core framework.
- prelude: Commonly used items from the spider-middleware crate.
- proxy: Auto-Rotate Proxy Middleware for rotating proxies during crawling.
- rate_limit: Rate Limit Middleware for controlling request frequency.
- referer: Referer Middleware for managing HTTP Referer headers.
- request: Data structures for representing HTTP requests in spider-lib.
- retry: Retry Middleware for handling failed requests.
- robots_txt: Robots.txt Middleware for respecting website crawling policies.
- user_agent: User-Agent Middleware for rotating User-Agents during crawling.
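The prelude module exists so the common types can be pulled in with a single import. Assuming it re-exports the middleware types listed above, typical usage would be:

// Assumed re-exports; the prelude's exact contents are listed on its module page.
use spider_middleware::prelude::*;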