Struct spider_client::RequestParams

pub struct RequestParams {
    pub url: Option<String>,
    pub request: Option<RequestType>,
    pub limit: Option<u32>,
    pub return_format: Option<ReturnFormat>,
    pub tld: Option<bool>,
    pub depth: Option<u32>,
    pub cache: Option<bool>,
    pub budget: Option<HashMap<String, u32>>,
    pub black_list: Option<Vec<String>>,
    pub white_list: Option<Vec<String>>,
    pub locale: Option<String>,
    pub cookies: Option<String>,
    pub stealth: Option<bool>,
    pub headers: Option<HashMap<String, String>>,
    pub anti_bot: Option<bool>,
    pub metadata: Option<bool>,
    pub viewport: Option<HashMap<String, i32>>,
    pub encoding: Option<String>,
    pub subdomains: Option<bool>,
    pub user_agent: Option<String>,
    pub store_data: Option<bool>,
    pub gpt_config: Option<Vec<String>>,
    pub fingerprint: Option<bool>,
    pub storageless: Option<bool>,
    pub readability: Option<bool>,
    pub proxy_enabled: Option<bool>,
    pub respect_robots: Option<bool>,
    pub query_selector: Option<String>,
    pub full_resources: Option<bool>,
    pub sitemap: Option<bool>,
    pub page_insights: Option<bool>,
    pub return_embeddings: Option<bool>,
    pub request_timeout: Option<u32>,
    pub run_in_background: Option<bool>,
    pub skip_config_checks: Option<bool>,
    pub chunking_alg: Option<ChunkingAlgDict>,
}

Structure representing request parameters.
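
Because every field is an Option and the struct implements Default, a caller can set only the fields of interest and leave the rest as None using struct update syntax. The sketch below illustrates that pattern on a hypothetical stand-in type, ParamsSketch, which mirrors four of the fields so the example compiles without the spider_client crate. The helper name small_crawl and the limit of 5 are illustrative assumptions.

```rust
/// Hypothetical stand-in mirroring a few `RequestParams` fields.
/// It is not the real type; it only keeps the sketch self-contained.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ParamsSketch {
    pub url: Option<String>,
    pub limit: Option<u32>,
    pub depth: Option<u32>,
    pub cache: Option<bool>,
}

/// Sets `url` and `limit`; every other field keeps its default of `None`.
pub fn small_crawl(url: &str) -> ParamsSketch {
    ParamsSketch {
        url: Some(url.to_string()),
        limit: Some(5),
        // Struct update syntax fills the remaining fields from `Default`.
        ..Default::default()
    }
}
```

With the real type, the same shape would presumably read `RequestParams { limit: Some(5), ..Default::default() }`.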

Fields§

§url: Option<String>

The URL to be crawled.

§request: Option<RequestType>

The type of request to be made.

§limit: Option<u32>

The maximum number of pages the crawler should visit.

§return_format: Option<ReturnFormat>

The format in which the result should be returned.

§tld: Option<bool>

Specifies whether to only visit the top-level domain.

§depth: Option<u32>

The maximum depth of the crawl.

§cache: Option<bool>

Specifies whether the request should be cached.

§budget: Option<HashMap<String, u32>>

The budget for various resources.
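
The description above does not say what the map keys mean. As a rough sketch of building a value of this field's type, the example below assumes the keys are path patterns and the values are page counts. The keys "*" and "/docs" and the helper name example_budget are hypothetical and not confirmed by this page.

```rust
use std::collections::HashMap;

/// Builds a value with the same type as the `budget` field.
/// Assumption: keys are path patterns, values are page limits.
pub fn example_budget() -> Option<HashMap<String, u32>> {
    let budget = HashMap::from([
        ("*".to_string(), 100),    // hypothetical: overall cap
        ("/docs".to_string(), 10), // hypothetical: cap for one path
    ]);
    Some(budget)
}
```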

§black_list: Option<Vec<String>>

The routes to exclude from the crawl. Each entry can be a regex pattern string.

§white_list: Option<Vec<String>>

The routes to restrict the crawl to. Each entry can be a regex pattern string, and the list can be used together with black_list.

§locale: Option<String>

The locale to be used during the crawl.

§cookies: Option<String>

The cookies to be set for the request, formatted as a single string.
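
Since the field takes a single string, several cookies have to be joined before they are assigned. The sketch below assumes the `name=value; name=value` layout of the HTTP Cookie header. Whether the API expects exactly this separator is an assumption, and cookie_string is a hypothetical helper.

```rust
/// Joins name/value pairs into one string for the `cookies` field.
/// Assumption: the `; ` separator of the HTTP `Cookie` header is accepted.
pub fn cookie_string(pairs: &[(&str, &str)]) -> Option<String> {
    if pairs.is_empty() {
        // Nothing to send, so leave the field unset.
        return None;
    }
    let joined = pairs
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("; ");
    Some(joined)
}
```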

§stealth: Option<bool>

Specifies whether to use stealth techniques to avoid detection.

§headers: Option<HashMap<String, String>>

The headers to be used for the request.

§anti_bot: Option<bool>

Specifies whether anti-bot measures should be used.

§metadata: Option<bool>

Specifies whether to include metadata in the response.

§viewport: Option<HashMap<String, i32>>

The dimensions of the viewport.
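
The viewport is a string-keyed map rather than a dedicated struct, so the key names matter. The sketch below assumes the keys are "width" and "height". This page does not list the accepted keys, and the helper name viewport_dims is hypothetical.

```rust
use std::collections::HashMap;

/// Builds a value with the same type as the `viewport` field.
/// Assumption: the expected keys are "width" and "height".
pub fn viewport_dims(width: i32, height: i32) -> Option<HashMap<String, i32>> {
    let mut dims = HashMap::new();
    dims.insert("width".to_string(), width);
    dims.insert("height".to_string(), height);
    Some(dims)
}
```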

§encoding: Option<String>

The encoding to be used for the request.

§subdomains: Option<bool>

Specifies whether to include subdomains in the crawl.

§user_agent: Option<String>

The user agent string to be used for the request.

§store_data: Option<bool>

Specifies whether the response data should be stored.

§gpt_config: Option<Vec<String>>

Configuration settings for GPT.

§fingerprint: Option<bool>

Specifies whether to use fingerprinting protection.

§storageless: Option<bool>

Specifies whether to perform the request without using storage.

§readability: Option<bool>

Specifies whether readability optimizations should be applied.

§proxy_enabled: Option<bool>

Specifies whether to use a proxy for the request.

§respect_robots: Option<bool>

Specifies whether to respect the site’s robots.txt file.

§query_selector: Option<String>

CSS selector to be used to filter the content.

§full_resources: Option<bool>

Specifies whether to load all resources of the crawl target.

§sitemap: Option<bool>

Specifies whether to use the sitemap links.

§page_insights: Option<bool>

Get page insights to determine information like request duration, accessibility, and other web vitals. Requires the metadata parameter to be set to true.

§return_embeddings: Option<bool>

Returns the OpenAI embeddings for the title and description. Other values, such as keywords, may also be included. Requires the metadata parameter to be set to true.
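
Both page_insights and return_embeddings depend on metadata being true, and the type system does not enforce that. A caller could check it before sending, as in the sketch below. InsightFlags is a hypothetical stand-in for the three fields and missing_metadata is an illustrative helper, not part of spider_client.

```rust
/// Hypothetical stand-in for the three related `RequestParams` fields.
#[derive(Debug, Default)]
pub struct InsightFlags {
    pub metadata: Option<bool>,
    pub page_insights: Option<bool>,
    pub return_embeddings: Option<bool>,
}

/// Returns the names of fields that are enabled while `metadata`
/// is not `Some(true)`. An empty result means the flags are consistent.
pub fn missing_metadata(flags: &InsightFlags) -> Vec<&'static str> {
    let metadata_on = flags.metadata == Some(true);
    let mut offenders = Vec::new();
    if flags.page_insights == Some(true) && !metadata_on {
        offenders.push("page_insights");
    }
    if flags.return_embeddings == Some(true) && !metadata_on {
        offenders.push("return_embeddings");
    }
    offenders
}
```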

§request_timeout: Option<u32>

The timeout for the request, in milliseconds.
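
Taking the unit stated above at face value, a std::time::Duration can be converted to the u32 the field expects. The sketch below returns None when the duration does not fit in a u32. The helper name timeout_millis is hypothetical.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Converts a `Duration` to whole milliseconds for `request_timeout`.
/// Returns `None` if the value does not fit in a `u32`.
pub fn timeout_millis(timeout: Duration) -> Option<u32> {
    // `as_millis` yields a u128, so the narrowing conversion is checked.
    u32::try_from(timeout.as_millis()).ok()
}
```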

§run_in_background: Option<bool>

Specifies whether to run the request in the background.

§skip_config_checks: Option<bool>

Specifies whether to skip configuration checks.

§chunking_alg: Option<ChunkingAlgDict>

The chunking algorithm to use.

Trait Implementations§

impl Debug for RequestParams

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for RequestParams

fn default() -> RequestParams

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for RequestParams

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for RequestParams

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,