§spider-core
Core engine of the spider-lib web scraping framework.
Provides the main components: Crawler, Scheduler, Spider trait, and infrastructure.
§Example
use spider_core::{Crawler, CrawlerBuilder, Spider};
use spider_util::{response::Response, error::SpiderError, item::ParseOutput};

#[spider_macro::scraped_item]
struct MyItem {
    title: String,
    url: String,
}

struct MySpider;

#[async_trait::async_trait]
impl Spider for MySpider {
    type Item = MyItem;

    fn start_urls(&self) -> Vec<&'static str> {
        vec!["https://example.com"]
    }

    async fn parse(&mut self, response: Response) -> Result<ParseOutput<Self::Item>, SpiderError> {
        todo!()
    }
}

async fn run_crawler() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider).build().await?;
    crawler.start_crawl().await
}

Re-exports§
pub use builder::CrawlerBuilder;
pub use crawler::Crawler;
pub use scheduler::Scheduler;
pub use spider::Spider;
pub use tokio;
Modules§
- builder
- Builder Module
- crawler
- Crawler Module
- prelude
- A “prelude” for users of the spider-core crate.
- scheduler
- Scheduler Module
- spider
- Spider Module
- state
- Module for tracking the operational state of the crawler.
- stats
- Statistics Module
Structs§
- DashMap
- DashMap is an implementation of a concurrent associative array/hashmap in Rust.
- ReqwestClientDownloader
- Concrete implementation of Downloader using the reqwest client.
Traits§
- Downloader
- A trait for HTTP downloaders that can fetch web pages and apply middleware
- SimpleHttpClient
- A simple HTTP client trait for fetching web content.
Attribute Macros§
- async_trait
- scraped_item
- A procedural macro to derive the ScrapedItem trait.