Struct spider_client::RequestParams
pub struct RequestParams {
pub url: Option<String>,
pub request: Option<RequestType>,
pub limit: Option<u32>,
pub return_format: Option<ReturnFormat>,
pub tld: Option<bool>,
pub depth: Option<u32>,
pub cache: Option<bool>,
pub budget: Option<HashMap<String, u32>>,
pub black_list: Option<Vec<String>>,
pub white_list: Option<Vec<String>>,
pub locale: Option<String>,
pub cookies: Option<String>,
pub stealth: Option<bool>,
pub headers: Option<HashMap<String, String>>,
pub anti_bot: Option<bool>,
pub metadata: Option<bool>,
pub viewport: Option<HashMap<String, i32>>,
pub encoding: Option<String>,
pub subdomains: Option<bool>,
pub user_agent: Option<String>,
pub store_data: Option<bool>,
pub gpt_config: Option<Vec<String>>,
pub fingerprint: Option<bool>,
pub storageless: Option<bool>,
pub readability: Option<bool>,
pub proxy_enabled: Option<bool>,
pub respect_robots: Option<bool>,
pub query_selector: Option<String>,
pub full_resources: Option<bool>,
pub sitemap: Option<bool>,
pub page_insights: Option<bool>,
pub return_embeddings: Option<bool>,
pub request_timeout: Option<u32>,
pub run_in_background: Option<bool>,
pub skip_config_checks: Option<bool>,
pub chunking_alg: Option<ChunkingAlgDict>,
}
Structure representing request parameters.
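Because every field is an `Option`, a request usually sets only the few fields it needs and leaves the rest as `None`. The sketch below illustrates that pattern with a hypothetical, trimmed-down stand-in (`RequestParamsSketch`, mirroring four of the fields) so that it compiles without the crate. Whether the real `RequestParams` implements `Default` is an assumption to check against the crate before relying on `..Default::default()`.

```rust
// Hypothetical stand-in mirroring a few fields of `RequestParams`.
// It is not the crate's type; it exists only so this sketch compiles alone.
#[derive(Debug, Default, Clone, PartialEq)]
struct RequestParamsSketch {
    url: Option<String>,
    limit: Option<u32>,
    depth: Option<u32>,
    respect_robots: Option<bool>,
}

// Build parameters for a small crawl, setting only what is needed.
fn small_crawl(url: &str) -> RequestParamsSketch {
    RequestParamsSketch {
        url: Some(url.to_string()),
        limit: Some(5),
        respect_robots: Some(true),
        // Every field not listed above stays `None`.
        ..Default::default()
    }
}

fn main() {
    let params = small_crawl("https://example.com");
    println!("url: {:?}, limit: {:?}", params.url, params.limit);
    println!("depth: {:?}, respect_robots: {:?}", params.depth, params.respect_robots);
}
```

Leaving a field as `None` lets the caller omit it entirely rather than pick a value; how the service treats omitted fields is not stated here.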
Fields
url: Option<String>
The URL to be crawled.
request: Option<RequestType>
The type of request to be made.
limit: Option<u32>
The maximum number of pages the crawler should visit.
return_format: Option<ReturnFormat>
The format in which the result should be returned.
tld: Option<bool>
Specifies whether to only visit the top-level domain.
depth: Option<u32>
The depth of the crawl.
cache: Option<bool>
Specifies whether the request should be cached.
budget: Option<HashMap<String, u32>>
The budget for various resources.
black_list: Option<Vec<String>>
The blacklist of routes to ignore. Entries can be regex string patterns.
white_list: Option<Vec<String>>
The whitelist of routes to crawl exclusively. Entries can be regex string patterns, and the list can be used together with black_list.
locale: Option<String>
The locale to be used during the crawl.
cookies: Option<String>
The cookies to be set for the request, formatted as a single string.
stealth: Option<bool>
Specifies whether to use stealth techniques to avoid detection.
headers: Option<HashMap<String, String>>
The headers to be used for the request.
anti_bot: Option<bool>
Specifies whether anti-bot measures should be used.
metadata: Option<bool>
Specifies whether to include metadata in the response.
viewport: Option<HashMap<String, i32>>
The dimensions of the viewport.
encoding: Option<String>
The encoding to be used for the request.
subdomains: Option<bool>
Specifies whether to include subdomains in the crawl.
user_agent: Option<String>
The user agent string to be used for the request.
store_data: Option<bool>
Specifies whether the response data should be stored.
gpt_config: Option<Vec<String>>
Configuration settings for GPT.
fingerprint: Option<bool>
Specifies whether to use fingerprinting protection.
storageless: Option<bool>
Specifies whether to perform the request without using storage.
readability: Option<bool>
Specifies whether readability optimizations should be applied.
proxy_enabled: Option<bool>
Specifies whether to use a proxy for the request.
respect_robots: Option<bool>
Specifies whether to respect the site's robots.txt file.
query_selector: Option<String>
The CSS selector used to filter the content.
full_resources: Option<bool>
Specifies whether to load all resources of the crawl target.
sitemap: Option<bool>
Specifies whether to use the sitemap links.
page_insights: Option<bool>
Specifies whether to collect page insights such as request duration, accessibility, and other web vitals. Requires the metadata parameter to be set to true.
return_embeddings: Option<bool>
Specifies whether to return the OpenAI embeddings for the title and description. Other values, such as keywords, may also be included. Requires the metadata parameter to be set to true.
request_timeout: Option<u32>
The timeout for the request, in milliseconds.
run_in_background: Option<bool>
Specifies whether to run the request in the background.
skip_config_checks: Option<bool>
Specifies whether to skip configuration checks.
chunking_alg: Option<ChunkingAlgDict>
The chunking algorithm to use.
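Two of the fields above, page_insights and return_embeddings, are documented as requiring metadata to be true, and budget and viewport are plain maps. The following is a minimal sketch of a client-side check for that dependency. The struct `InsightParamsSketch`, the helper `check_metadata_dependency`, and the map keys (`"*"`, `"/docs"`, `"width"`, `"height"`) are all hypothetical choices made for illustration; the keys the service actually accepts are not stated here.

```rust
use std::collections::HashMap;

// Hypothetical stand-in mirroring five fields of `RequestParams`.
#[derive(Debug, Default)]
struct InsightParamsSketch {
    metadata: Option<bool>,
    page_insights: Option<bool>,
    return_embeddings: Option<bool>,
    budget: Option<HashMap<String, u32>>,
    viewport: Option<HashMap<String, i32>>,
}

// Reject combinations where a metadata-dependent flag is enabled
// but `metadata` itself is not set to true.
fn check_metadata_dependency(p: &InsightParamsSketch) -> Result<(), String> {
    let metadata_on = p.metadata == Some(true);
    if p.page_insights == Some(true) && !metadata_on {
        return Err("page_insights requires metadata to be true".to_string());
    }
    if p.return_embeddings == Some(true) && !metadata_on {
        return Err("return_embeddings requires metadata to be true".to_string());
    }
    Ok(())
}

// Build a request that asks for page insights with metadata enabled.
fn insight_request() -> InsightParamsSketch {
    // Assumed keys: a path pattern mapped to a page count.
    let mut budget = HashMap::new();
    budget.insert("*".to_string(), 100);
    budget.insert("/docs".to_string(), 10);

    // Assumed keys: viewport dimensions in pixels.
    let mut viewport = HashMap::new();
    viewport.insert("width".to_string(), 1280);
    viewport.insert("height".to_string(), 720);

    InsightParamsSketch {
        metadata: Some(true),
        page_insights: Some(true),
        budget: Some(budget),
        viewport: Some(viewport),
        ..Default::default()
    }
}

fn main() {
    let params = insight_request();
    println!("budget: {:?}", params.budget);
    println!("viewport: {:?}", params.viewport);
    println!("check: {:?}", check_metadata_dependency(&params));
}
```

Validating before sending avoids a round trip for a request the documentation already describes as incomplete; how the service itself responds to such a request is not covered here.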