Crate spider_core

§spider-core

spider-core is the runtime crate behind the rest of the workspace. It owns the crawler loop, scheduling, shared runtime state, statistics, and the Spider trait used to describe crawl behavior.

If you are building an application, spider-lib is usually the easier starting point. Depend on spider-core directly when you want the runtime API without the facade crate.

§Example

use spider_core::{async_trait, CrawlerBuilder, Spider};
use spider_util::{response::Response, error::SpiderError, item::ParseOutput};

// Item type produced by the spider, defined with the `scraped_item` attribute macro.
#[spider_macro::scraped_item]
struct Item {
    title: String,
}

struct MySpider;

#[async_trait]
impl Spider for MySpider {
    type Item = Item;
    // This example keeps no shared state.
    type State = ();

    // Seed the crawl with a single start URL.
    fn start_requests(&self) -> Result<spider_core::StartRequests<'_>, SpiderError> {
        Ok(spider_core::StartRequests::Urls(vec!["https://example.com"]))
    }

    // Receives a response and the shared state; this example returns an empty output.
    async fn parse(
        &self,
        _response: Response,
        _state: &Self::State,
    ) -> Result<ParseOutput<Self::Item>, SpiderError> {
        Ok(ParseOutput::new())
    }
}

// Build a crawler for the spider, then run the crawl to completion.
async fn run() -> Result<(), SpiderError> {
    let crawler = CrawlerBuilder::new(MySpider).build().await?;
    crawler.start_crawl().await
}

Re-exports§

pub use builder::CrawlerBuilder;
pub use engine::Crawler;
pub use scheduler::Scheduler;
pub use spider::Spider;
pub use spider::StartRequestIter;
pub use spider::StartRequests;
pub use state::ConcurrentMap;
pub use state::ConcurrentVec;
pub use state::Counter;
pub use state::Counter64;
pub use state::Flag;
pub use state::StateAccessMetrics;
pub use state::VisitedUrls;
pub use tokio;

Modules§

builder
Builder API for assembling a Crawler.
config
Configuration types used by the crawler runtime.
engine
Internal engine pieces used by crate::Crawler.
prelude
Convenient re-exports for code that depends on spider-core directly.
scheduler
Request scheduling and duplicate detection.
spider
The spider trait and request bootstrap types.
state
Runtime state helpers.
stats
Runtime statistics and reporting helpers.
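
The scheduler module above is described as handling request scheduling and duplicate detection. As a rough illustration of that idea only, the sketch below pairs a FIFO queue with a set of already-seen URLs using the standard library. It is not spider-core's Scheduler: SketchScheduler, enqueue, pop, and drain_unique are hypothetical names introduced here, and the real type may differ in both API and behavior.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical FIFO scheduler that skips URLs it has already seen.
/// Illustrative only; this is NOT spider-core's `Scheduler`.
#[derive(Default)]
struct SketchScheduler {
    queue: VecDeque<String>,
    seen: HashSet<String>,
}

impl SketchScheduler {
    fn new() -> Self {
        Self::default()
    }

    /// Queue `url` unless it was enqueued before. Returns `true` when queued.
    fn enqueue(&mut self, url: &str) -> bool {
        // `HashSet::insert` returns `false` when the value was already present.
        if self.seen.insert(url.to_owned()) {
            self.queue.push_back(url.to_owned());
            true
        } else {
            false
        }
    }

    /// Take the oldest pending URL, if any.
    fn pop(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

/// Enqueue every URL, then drain the queue in order (hypothetical helper).
fn drain_unique(urls: &[&str]) -> Vec<String> {
    let mut scheduler = SketchScheduler::new();
    for url in urls {
        scheduler.enqueue(url);
    }
    let mut out = Vec::new();
    while let Some(url) = scheduler.pop() {
        out.push(url);
    }
    out
}
```

Keeping the seen set separate from the queue means a URL stays deduplicated after it has been popped, which is the property a crawler usually wants. A concurrent runtime would need a thread-safe structure in place of the plain HashSet.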

Structs§

DashMap
DashMap is an implementation of a concurrent associative array/hashmap in Rust.
ReqwestClientDownloader
Downloader implementation backed by reqwest::Client.

Traits§

Downloader
Trait implemented by HTTP downloaders used by the crawler runtime.
HttpClient
Minimal HTTP client trait for middleware that needs direct fetches.

Attribute Macros§

async_trait
scraped_item
Attribute macro for defining a scraped item type.