Crate spider_core

§spider-core

Core engine of the spider-lib web scraping framework.

Provides the main components: the Crawler, the Scheduler, the Spider trait, and supporting infrastructure such as the downloader, state tracking, and statistics.

§Example

use spider_core::{Crawler, CrawlerBuilder, Spider};
use spider_util::{response::Response, error::SpiderError, item::ParseOutput};

#[spider_macro::scraped_item]
struct MyItem {
    title: String,
    url: String,
}

struct MySpider;

#[async_trait::async_trait]
impl Spider for MySpider {
    type Item = MyItem;
    fn start_urls(&self) -> Vec<&'static str> { vec!["https://example.com"] }
    async fn parse(&mut self, response: Response) -> Result<ParseOutput<Self::Item>, SpiderError> {
        // Placeholder: build the ParseOutput from `response` here.
        todo!()
    }
}

async fn run_crawler() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider).build().await?;
    crawler.start_crawl().await
}

Re-exports§

pub use builder::CrawlerBuilder;
pub use crawler::Crawler;
pub use scheduler::Scheduler;
pub use spider::Spider;
pub use tokio;

Modules§

builder
Builder Module
crawler
Crawler Module
prelude
A “prelude” for users of the spider-core crate.
scheduler
Scheduler Module
spider
Spider Module
state
Module for tracking the operational state of the crawler.
stats
Statistics Module

Structs§

DashMap
DashMap is an implementation of a concurrent associative array/hashmap in Rust.
ReqwestClientDownloader
Concrete implementation of Downloader using the reqwest client.
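
The re-exported DashMap is described above as a concurrent hashmap. To illustrate why a crawler might want shared concurrent state at all, here is a minimal sketch that deduplicates visited URLs across threads. It deliberately uses only the standard library (a `Mutex<HashSet<String>>`) as a stand-in: it does not show DashMap's API, and it is not a description of how spider-core uses DashMap internally. The names `mark_visited` and `count_unique` are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: records `url` as visited and returns `true`
/// only the first time that URL is seen.
fn mark_visited(seen: &Mutex<HashSet<String>>, url: &str) -> bool {
    // `HashSet::insert` returns `false` when the value was already present.
    seen.lock().expect("lock poisoned").insert(url.to_string())
}

/// Hypothetical helper: spawns one thread per URL and counts how many
/// URLs were seen for the first time.
fn count_unique(urls: &[&str]) -> usize {
    // Shared set of visited URLs, guarded by a single lock.
    let seen: Arc<Mutex<HashSet<String>>> = Arc::new(Mutex::new(HashSet::new()));

    let handles: Vec<_> = urls
        .iter()
        .map(|u| {
            let seen = Arc::clone(&seen);
            let url = u.to_string();
            thread::spawn(move || mark_visited(&seen, &url))
        })
        .collect();

    // Each thread reports whether its URL was new; count the `true` results.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .filter(|first_time| *first_time)
        .count()
}
```

A single lock serializes every access, which is the contention a concurrent map is meant to reduce. The sketch shows only the shared-state pattern, not the performance characteristics.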

Traits§

Downloader
A trait for HTTP downloaders that can fetch web pages and apply middleware.
SimpleHttpClient
A simple HTTP client trait for fetching web content.

Attribute Macros§

async_trait
scraped_item
A procedural macro to derive the ScrapedItem trait.