§spider-core
spider-core is the runtime crate behind the rest of the workspace.
It owns the crawler loop, scheduling, shared runtime state, statistics, and
the Spider trait used to describe crawl behavior.
If you are building an application, spider-lib is usually the easier
starting point. Depend on spider-core directly when you want the runtime
API without the facade crate.
§Example
use spider_core::{async_trait, CrawlerBuilder, Spider};
use spider_util::{response::Response, error::SpiderError, item::ParseOutput};

// Item type produced by the spider.
#[spider_macro::scraped_item]
struct Item {
    title: String,
}

struct MySpider;

#[async_trait]
impl Spider for MySpider {
    type Item = Item;
    type State = ();

    // Seed the crawl with a single start URL.
    fn start_requests(&self) -> Result<spider_core::StartRequests<'_>, SpiderError> {
        Ok(spider_core::StartRequests::Urls(vec!["https://example.com"]))
    }

    // Called for each downloaded response; this example returns a fresh, empty output.
    async fn parse(
        &self,
        _response: Response,
        _state: &Self::State,
    ) -> Result<ParseOutput<Self::Item>, SpiderError> {
        Ok(ParseOutput::new())
    }
}

// Build a crawler around the spider and run it to completion.
async fn run() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider).build().await?;
    crawler.start_crawl().await
}

Re-exports§
pub use builder::CrawlerBuilder;
pub use engine::Crawler;
pub use scheduler::Scheduler;
pub use spider::Spider;
pub use spider::StartRequestIter;
pub use spider::StartRequests;
pub use state::ConcurrentMap;
pub use state::ConcurrentVec;
pub use state::Counter;
pub use state::Counter64;
pub use state::Flag;
pub use state::StateAccessMetrics;
pub use state::VisitedUrls;
pub use tokio;
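In the example, `parse` receives its state as `&Self::State`, so any state that changes during a crawl has to be updated through a shared reference. The following is a minimal sketch of that pattern using only the standard library. `SharedCrawlState`, its methods, and `count_from_workers` are hypothetical names chosen for illustration; they are not the `Counter` or `Flag` types re-exported here, whose actual APIs may differ.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state: a page counter and a stop flag.
/// Both fields are atomics, so they can be updated through `&self`.
#[derive(Default)]
pub struct SharedCrawlState {
    pages_seen: AtomicUsize,
    stop_requested: AtomicBool,
}

impl SharedCrawlState {
    /// Record one processed page and return the updated total.
    pub fn record_page(&self) -> usize {
        self.pages_seen.fetch_add(1, Ordering::Relaxed) + 1
    }

    /// Current number of recorded pages.
    pub fn pages_seen(&self) -> usize {
        self.pages_seen.load(Ordering::Relaxed)
    }

    /// Ask workers to stop.
    pub fn request_stop(&self) {
        self.stop_requested.store(true, Ordering::Release);
    }

    /// Whether a stop has been requested.
    pub fn should_stop(&self) -> bool {
        self.stop_requested.load(Ordering::Acquire)
    }
}

/// Spawn `workers` threads that each record `pages_each` pages,
/// then return the total seen by the shared state.
pub fn count_from_workers(workers: usize, pages_each: usize) -> usize {
    let state = Arc::new(SharedCrawlState::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..pages_each {
                    state.record_page();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    state.pages_seen()
}
```

Atomics keep the sketch lock-free for simple counters and flags; richer shared collections would need a mutex or a concurrent map instead.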
Modules§
- builder
- Builder API for assembling a Crawler.
- config
- Configuration types used by the crawler runtime.
- engine
- Internal engine pieces used by crate::Crawler.
- prelude
- Convenient re-exports for code that depends on spider-core directly.
- scheduler
- Request scheduling and duplicate detection.
- spider
- The spider trait and request bootstrap types.
- state
- Runtime state helpers.
- stats
- Runtime statistics and reporting helpers.
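The scheduler module is described above as handling request scheduling and duplicate detection. A minimal sketch of that idea, assuming a FIFO queue paired with a set of already-seen URLs, might look like the following. `DedupScheduler` and its methods are hypothetical names for illustration only and do not reflect the real `Scheduler` API or its implementation.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical FIFO scheduler that drops URLs it has already accepted.
#[derive(Default)]
pub struct DedupScheduler {
    seen: HashSet<String>,
    queue: VecDeque<String>,
}

impl DedupScheduler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queue `url` unless it was enqueued before.
    /// Returns `true` when the URL was accepted.
    pub fn enqueue(&mut self, url: &str) -> bool {
        // `HashSet::insert` returns `false` if the value was already present.
        if self.seen.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    /// Take the oldest pending URL, if any.
    pub fn next_request(&mut self) -> Option<String> {
        self.queue.pop_front()
    }

    /// Number of URLs waiting to be fetched.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}
```

The seen set is kept separate from the queue so that a URL stays rejected even after it has been dequeued and fetched.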
Structs§
- DashMap
- DashMap is an implementation of a concurrent associative array/hashmap in Rust.
- ReqwestClientDownloader
- Downloader implementation backed by reqwest::Client.
Traits§
- Downloader
- Trait implemented by HTTP downloaders used by the crawler runtime.
- HttpClient
- Minimal HTTP client trait for middleware that needs direct fetches.
Attribute Macros§
- async_trait
- scraped_item
- Attribute macro for defining a scraped item type.