Crate spider_client

The spider_client crate provides the primary interface to the Spider web crawler library, which is designed for fast, efficient crawling of web pages and gathers links using isolated contexts.

§Features

  • Multi-threaded Crawling: Spider can parallelize crawling across multiple threads, which improves throughput and makes it possible to gather millions of pages in a short time.

  • Configurable: The library provides options to configure crawling behavior, such as crawl depth, user-agent strings, delays between requests, and more.

  • Link Gathering: One of the primary objectives of Spider is to gather and manage links from the web pages it crawls, compiling them into a structured format for further use.
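To make the link-gathering idea concrete, here is a minimal, standard-library-only sketch of collecting distinct links from a page into an ordered set. It is not how Spider itself is implemented: `gather_links` is a hypothetical helper, and the naive `href="` scan stands in for a real HTML parser.

```rust
use std::collections::BTreeSet;

/// Collects the distinct `href="..."` values found in an HTML string.
/// Hypothetical helper for illustration; it is not part of spider_client.
fn gather_links(html: &str) -> BTreeSet<String> {
    let marker = "href=\"";
    let mut links = BTreeSet::new();
    let mut rest = html;

    // Walk the string, jumping from one `href="` marker to the next.
    while let Some(start) = rest.find(marker) {
        let after = &rest[start + marker.len()..];
        match after.find('"') {
            Some(end) => {
                // Skip empty attributes; the set drops duplicates for us.
                if end > 0 {
                    links.insert(after[..end].to_string());
                }
                rest = &after[end + 1..];
            }
            // Unterminated attribute value: stop scanning.
            None => break,
        }
    }

    links
}
```

A `BTreeSet` is used here so that repeated links collapse into one entry and iteration order is deterministic; a `HashSet` would work equally well if ordering does not matter.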

§Examples

Basic usage of the Spider client might look like this:

// Requires the `tokio` and `serde_json` crates alongside `spider_client`.
use spider_client::{RequestParams, RequestType, Spider};

#[tokio::main]
async fn main() {
    let spider = Spider::new(Some("myspiderapikey".into())).expect("API key must be provided");

    let url = "https://spider.cloud";

    // Scrape a single URL
    let scraped_data = spider.scrape_url(url, None, "application/json").await.expect("Failed to scrape the URL");

    println!("Scraped Data: {:?}", scraped_data);

    // Crawl a website
    let crawler_params = RequestParams {
        limit: Some(1),
        proxy_enabled: Some(true),
        store_data: Some(false),
        metadata: Some(false),
        request: Some(RequestType::Http),
        ..Default::default()
    };

    let crawl_result = spider.crawl_url(url, Some(crawler_params), false, "application/json", None::<fn(serde_json::Value)>).await.expect("Failed to crawl the URL");

    println!("Crawl Result: {:?}", crawl_result);
}
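The crawl parameters above rely on Rust's struct update syntax: only the fields of interest are set, and `..Default::default()` fills in the rest. The sketch below shows that pattern in isolation. `DemoParams` and `single_page_params` are hypothetical stand-ins, and the field list is an assumption that only mirrors the names used in the example, not the real `RequestParams` definition.

```rust
/// Hypothetical stand-in for a parameter struct. The fields are
/// illustrative and do not reproduce the real `RequestParams`.
#[derive(Debug, Default, PartialEq)]
struct DemoParams {
    limit: Option<u32>,
    proxy_enabled: Option<bool>,
    store_data: Option<bool>,
    metadata: Option<bool>,
}

/// Sets only `limit`; every other field keeps its `Default` value,
/// which is `None` for an `Option`.
fn single_page_params() -> DemoParams {
    DemoParams {
        limit: Some(1),
        ..Default::default()
    }
}
```

With this pattern, call sites name only the options they care about, so adding a new optional field to the struct does not force changes in existing callers.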

§Modules

  • config: Contains the configuration options for the Spider client.
  • utils: Utility functions used by the Spider client.

Structs§

CSSSelector
ChunkingAlgDict
Structure representing the Chunking algorithm dictionary.
DataParam
Delay
EventTracker
IdleNetwork
QueryRequest
Query request to get a document.
RequestParams
Structure representing request parameters.
SearchRequestParams
The structure representing request parameters for a search request.
Selector
Spider
Represents a Spider with API key and HTTP client.
Timeout
TransformParams
Structure representing request parameters for transforming files.
Viewport
Viewport handling for Chrome.
WaitFor
WebhookSettings
Represents the settings for a webhook configuration.

Enums§

ChunkingType
Enum representing different types of Chunking.
RedirectPolicy
RequestType
The request type to perform.
ReturnFormat
Enum representing different return formats.
ReturnFormatHandling
Send multiple return formats.
WebAutomation

Type Aliases§

CSSExtractionMap
ExecutionScriptsMap
WebAutomationMap