The spider-client module provides the primary interface and functionality for the Spider web crawler library, which is designed for rapid and efficient crawling of web pages to gather links using isolated contexts.
§Features
- Multi-threaded Crawling: Spider can use multiple threads to parallelize the crawling process, drastically improving performance and making it possible to gather millions of pages in a short time.
- Configurable: The library provides various options to configure the crawling behavior, such as the crawl depth, user-agent strings, delays between requests, and more.
- Link Gathering: One of the primary objectives of Spider is to gather and manage links from the web pages it crawls, compiling them into a structured format for further use.
§Examples
Basic usage of the Spider client might look like this:
    use spider_client::{Spider, RequestType, RequestParams};
    use tokio;

    #[tokio::main]
    async fn main() {
        let spider = Spider::new(Some("myspiderapikey".into())).expect("API key must be provided");
        let url = "https://spider.cloud";

        // Scrape a single URL
        let scraped_data = spider
            .scrape_url(url, None, "application/json")
            .await
            .expect("Failed to scrape the URL");
        println!("Scraped Data: {:?}", scraped_data);

        // Crawl a website
        let crawler_params = RequestParams {
            limit: Some(1),
            proxy_enabled: Some(true),
            store_data: Some(false),
            metadata: Some(false),
            request: Some(RequestType::Http),
            ..Default::default()
        };
        let crawl_result = spider
            .crawl_url(url, Some(crawler_params), false, "application/json", None::<fn(serde_json::Value)>)
            .await
            .expect("Failed to crawl the URL");
        println!("Crawl Result: {:?}", crawl_result);
    }
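The example above sets only a few fields and fills in the rest with `..Default::default()`. The following minimal sketch shows that struct update pattern in isolation. It uses a hypothetical stand-in type, `CrawlOptions`, and a hypothetical helper, `single_page_options`; neither is part of spider_client, and the field set here is only illustrative.

```rust
// Hypothetical stand-in for a parameter struct. This is NOT the real
// `RequestParams`; it only mirrors a few of the fields used in the example.
#[derive(Debug, Default, Clone, PartialEq)]
struct CrawlOptions {
    limit: Option<u32>,
    proxy_enabled: Option<bool>,
    store_data: Option<bool>,
    metadata: Option<bool>,
}

/// Builds options for a one-page crawl (hypothetical helper).
fn single_page_options() -> CrawlOptions {
    CrawlOptions {
        limit: Some(1),
        proxy_enabled: Some(true),
        // Every field not listed above takes its `Default` value,
        // which is `None` for an `Option`.
        ..Default::default()
    }
}

fn main() {
    let opts = single_page_options();

    // Fields set explicitly keep their values.
    assert_eq!(opts.limit, Some(1));
    assert_eq!(opts.proxy_enabled, Some(true));

    // Fields left out fall back to `None`.
    assert!(opts.store_data.is_none());
    assert!(opts.metadata.is_none());

    println!("{:?}", opts);
}
```

With optional fields modeled as `Option`, a caller can leave a setting unset (`None`) rather than explicitly disabling it (`Some(false)`), which is presumably why the example wraps each value in `Some`.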
§Modules
- config: Contains the configuration options for the Spider client.
- utils: Utility functions used by the Spider client.
Structs§
- CSSSelector
- ChunkingAlgDict: Structure representing the Chunking algorithm dictionary.
- DataParam
- Delay
- EventTracker
- IdleNetwork
- QueryRequest: Query request to get a document.
- RequestParams: Structure representing request parameters.
- SearchRequestParams: Structure representing request parameters for a search request.
- Selector
- Spider: Represents a Spider with API key and HTTP client.
- Timeout
- TransformParams: Structure representing request parameters for transforming files.
- Viewport: Viewport handling for Chrome.
- WaitFor
- WebhookSettings: Represents the settings for a webhook configuration.
Enums§
- ChunkingType: Enum representing different types of Chunking.
- RedirectPolicy
- RequestType: The request type to perform.
- ReturnFormat: Enum representing different return formats.
- ReturnFormatHandling: Send multiple return formats.
- WebAutomation