Struct Request 

Source
pub struct Request {
    pub url: String,
    pub sid: String,
    pub callback: Option<Callback>,
    pub callback_name: Option<String>,
    pub priority: i32,
    pub dont_filter: bool,
    pub meta: HashMap<String, Value>,
    pub retry_count: u32,
    pub session_kwargs: HashMap<String, Value>,
    /* private fields */
}
A crawl request with URL, priority, metadata, and optional callback.

Request is the unit of work in the crawl pipeline. Create one with Request::new, customize it with the builder methods (with_priority, with_sid, with_callback, etc.), and return it from your spider’s parse method wrapped in SpiderOutput::FollowRequest.

Two requests are considered equal if their fingerprints match (or, if no fingerprint has been computed, if their URLs match). Ordering is by priority (higher values are dequeued first).
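The equality rule above can be sketched with a small stand-in type. `SketchRequest` and `same_request` are hypothetical names, not part of the crate, and falling back to URL comparison whenever either side lacks a fingerprint is an assumption about how the fallback behaves.

```rust
/// Hypothetical stand-in for `Request`, reduced to the fields that matter here.
struct SketchRequest {
    url: String,
    fingerprint: Option<Vec<u8>>,
}

/// Compares fingerprints when both are cached; otherwise compares URLs.
/// (The exact fallback condition is an assumption.)
fn same_request(a: &SketchRequest, b: &SketchRequest) -> bool {
    match (&a.fingerprint, &b.fingerprint) {
        (Some(fa), Some(fb)) => fa == fb,
        _ => a.url == b.url,
    }
}
```

Under this reading, two requests for the same URL can still compare unequal once fingerprints exist, for example when they differ in method or body.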

Fields§

§url: String

The URL to fetch. This is the only required field; everything else has sensible defaults set by Request::new.

§sid: String

The session identifier used to select a fetcher from the SessionManager. An empty string means “use the default session.”

§callback: Option<Callback>

An optional callback to process the response. When present, the engine calls this closure instead of Spider::parse. Because the boxed closure cannot be cloned, use copy_without_callback when you need to duplicate a request, for example for a retry.

§callback_name: Option<String>

The name of the callback, kept for debugging output and checkpoint serialization. It has no effect on routing; the actual closure in callback is what gets invoked.

§priority: i32

The scheduling priority. Higher values are dequeued first by the Scheduler. The default is 0. Use negative values to de-prioritize retries or background pages.
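A minimal sketch of "higher values are dequeued first", using the standard library's max-heap. `Queued` and `dequeue_order` are hypothetical names, and the tie-break on URL is an assumption added only to make the ordering total; how the real Scheduler breaks ties is not documented here.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical queue entry: just a priority and a URL.
#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: i32,
    url: String,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Priority decides; the URL tie-break is an assumption.
        self.priority
            .cmp(&other.priority)
            .then_with(|| self.url.cmp(&other.url))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pushes every item into a max-heap and returns the URLs in pop order.
fn dequeue_order(items: Vec<(i32, &str)>) -> Vec<String> {
    let mut heap: BinaryHeap<Queued> = items
        .into_iter()
        .map(|(priority, url)| Queued { priority, url: url.to_string() })
        .collect();
    let mut order = Vec::new();
    while let Some(entry) = heap.pop() {
        order.push(entry.url);
    }
    order
}
```

With priorities 0, 10, and -5, the entry with priority 10 is popped first and the one with -5 last.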

§dont_filter: bool

Whether to bypass the duplicate-request filter. Set this to true when you intentionally want to re-fetch a URL – for example, to poll a page for updates or to retry after a transient failure.
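The effect of the flag can be illustrated with a toy duplicate filter. `SketchFilter` is hypothetical; it keys on the raw URL for brevity, whereas the documentation elsewhere describes fingerprint-based deduplication, and whether bypassed requests are recorded as seen is an assumption (here they are not).

```rust
use std::collections::HashSet;

/// Hypothetical duplicate filter keyed on the URL string.
struct SketchFilter {
    seen: HashSet<String>,
}

impl SketchFilter {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true if the request should be scheduled.
    fn should_schedule(&mut self, url: &str, dont_filter: bool) -> bool {
        if dont_filter {
            // Bypass the filter entirely; nothing is recorded.
            return true;
        }
        // `insert` returns false when the URL was already present.
        self.seen.insert(url.to_string())
    }
}
```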

§meta: HashMap<String, Value>

Arbitrary metadata passed through the crawl pipeline. Whatever you put here is available on the request when it reaches your callback, which is useful for carrying context (e.g., a parent-page ID) between parse stages.

§retry_count: u32

The number of times this request has been retried after receiving a blocked response. The engine increments this automatically and stops retrying once it exceeds Spider::max_blocked_retries.
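One way to read the retry rule is sketched below. `next_retry` is a hypothetical helper, not an engine API; it assumes the incremented count is compared against the limit, so a limit of 3 allows three retries.

```rust
/// Returns the retry count for the next attempt, or None when the
/// incremented count would exceed `max_blocked_retries`.
fn next_retry(retry_count: u32, max_blocked_retries: u32) -> Option<u32> {
    // `checked_add` avoids overflow on a pathological count.
    let next = retry_count.checked_add(1)?;
    if next > max_blocked_retries {
        None
    } else {
        Some(next)
    }
}
```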

§session_kwargs: HashMap<String, Value>

Additional keyword arguments forwarded to the session fetcher. Common keys include "method", "headers", "data", and "json". These are also factored into the deduplication fingerprint when Spider::fp_include_kwargs is enabled.

Implementations§

Source§

impl Request

Source

pub fn new(url: impl Into<String>) -> Self

Creates a new request for the given URL with default settings.

All fields other than url start at their zero or empty values: priority 0, an empty sid (the default session), no callback, empty meta and session_kwargs, retry_count 0, and dont_filter set to false, so duplicate filtering is enabled. Use the with_* builder methods to customize.
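The defaults and the consuming builder style can be sketched with a stand-in type that mirrors the documented signatures. `SketchRequest` is hypothetical and uses `String` metadata values in place of `Value` to stay within the standard library; the real struct has more fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Request` with a subset of its fields.
#[derive(Debug, Default)]
struct SketchRequest {
    url: String,
    sid: String,
    priority: i32,
    dont_filter: bool,
    meta: HashMap<String, String>,
}

impl SketchRequest {
    /// Only the URL is supplied; everything else takes its default.
    fn new(url: impl Into<String>) -> Self {
        Self { url: url.into(), ..Default::default() }
    }

    /// Each builder takes `self` by value and returns it, so calls chain.
    fn with_sid(mut self, sid: impl Into<String>) -> Self {
        self.sid = sid.into();
        self
    }

    fn with_priority(mut self, priority: i32) -> Self {
        self.priority = priority;
        self
    }

    fn with_dont_filter(mut self, dont_filter: bool) -> Self {
        self.dont_filter = dont_filter;
        self
    }
}

/// Example chain: a high-priority request routed to a named session.
fn build_example() -> SketchRequest {
    SketchRequest::new("https://example.com/products/1")
        .with_sid("logged-in")
        .with_priority(10)
}
```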

Source

pub fn with_sid(self, sid: impl Into<String>) -> Self

Sets the session identifier for this request, routing it to a specific fetcher registered in the SessionManager. Use this when your spider manages multiple sessions with different cookies, proxies, or authentication contexts.

Source

pub fn with_priority(self, priority: i32) -> Self

Sets the scheduling priority for this request. Higher values are dequeued first. Use positive values for important pages (e.g., product detail pages) and negative values for low-priority background work.

Source

pub fn with_dont_filter(self, dont_filter: bool) -> Self

Sets whether this request should bypass the duplicate-request filter. Pass true to allow re-fetching a URL that has already been seen. This is useful for polling pages that change over time or for manual retries.

Source

pub fn with_meta(self, meta: HashMap<String, Value>) -> Self

Attaches arbitrary metadata to this request. The metadata map is carried through the entire crawl pipeline and is accessible in your callback or parse implementation, making it the standard way to pass context (such as a parent URL or category label) between crawl stages.
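As an illustration of carrying context between stages, the sketch below builds a metadata map in one stage and reads it back in the next. The key names `parent_url` and `category`, the helper names, and the use of `String` values instead of `Value` are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Builds the metadata a listing page might attach to a follow-up request.
fn child_meta(parent_url: &str, category: &str) -> HashMap<String, String> {
    let mut meta = HashMap::new();
    meta.insert("parent_url".to_string(), parent_url.to_string());
    meta.insert("category".to_string(), category.to_string());
    meta
}

/// Reads the context back in a later stage, tolerating missing keys.
fn describe(meta: &HashMap<String, String>) -> String {
    let category = meta.get("category").map(String::as_str).unwrap_or("unknown");
    let parent = meta.get("parent_url").map(String::as_str).unwrap_or("unknown");
    format!("{} (from {})", category, parent)
}
```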

Source

pub fn with_callback(self, name: &str, callback: Callback) -> Self

Attaches a named callback to process the response for this request. When the engine receives the response, it will call this closure instead of Spider::parse. The name is stored for debugging and checkpoint serialization; it does not affect dispatch.
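The dispatch rule (callback if present, otherwise the spider's parse) can be sketched as follows. `SketchCallback`, `default_parse`, and `dispatch` are hypothetical; the real `Callback` type and `Spider::parse` almost certainly take a response object rather than a string.

```rust
/// Hypothetical callback type; the real `Callback` signature is not shown here.
type SketchCallback = Box<dyn Fn(&str) -> String>;

/// Stand-in for `Spider::parse`.
fn default_parse(body: &str) -> String {
    format!("parse:{}", body)
}

/// Uses the per-request callback when one is attached, else the default parser.
fn dispatch(callback: Option<&SketchCallback>, body: &str) -> String {
    match callback {
        Some(cb) => cb(body),
        None => default_parse(body),
    }
}
```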

Source

pub fn domain(&self) -> String

Extracts the domain (host) from the request URL. Returns an empty string if the URL cannot be parsed. This is used internally for domain allowlisting and per-domain statistics, but you can also call it in your own code to inspect which host a request targets.
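For intuition, here is a standard-library-only approximation of host extraction. `sketch_domain` is hypothetical and simplified: it does not handle IPv6 literals, and whether the real method lowercases the host or strips the port is an assumption. A real implementation would more likely rely on a URL parser.

```rust
/// Returns the host part of `url`, or an empty string if no scheme is found.
fn sketch_domain(url: &str) -> String {
    // Require "scheme://"; anything else is treated as unparseable.
    let rest = match url.split_once("://") {
        Some((scheme, rest)) if !scheme.is_empty() => rest,
        _ => return String::new(),
    };
    // The authority ends at the first '/', '?', or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop any "user:password@" prefix.
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    // Drop a trailing ":port" (IPv6 literals are not handled).
    let host = host_port.split(':').next().unwrap_or("");
    host.to_ascii_lowercase()
}
```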

Source

pub fn update_fingerprint(&mut self, include_kwargs: bool, include_headers: bool, keep_fragments: bool) -> &[u8]

Computes and caches a SHA-1 fingerprint for deduplication, returning it as a byte slice.

The fingerprint is derived from the session ID, HTTP method, URL, and request body. The boolean flags control whether session kwargs, headers, and URL fragments are also included. Once computed, the fingerprint is cached, so subsequent calls return the stored value instead of recomputing it. The Scheduler calls this automatically when a request is enqueued.
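The compute-once-then-cache pattern can be sketched as below. This is not the crate's implementation: `SketchRequest` is hypothetical, the standard library's `DefaultHasher` is swapped in for SHA-1 so the example needs no external crate, the `"GET"` default method is an assumption, and only the `keep_fragments` flag is modelled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in with an explicit cache field.
struct SketchRequest {
    sid: String,
    method: String,
    url: String,
    body: Vec<u8>,
    fingerprint: Option<Vec<u8>>,
}

impl SketchRequest {
    fn new(url: &str) -> Self {
        Self {
            sid: String::new(),
            method: "GET".to_string(), // assumed default method
            url: url.to_string(),
            body: Vec::new(),
            fingerprint: None,
        }
    }

    /// Computes the fingerprint on the first call and reuses it afterwards.
    fn update_fingerprint(&mut self, keep_fragments: bool) -> &[u8] {
        if self.fingerprint.is_none() {
            // Drop the "#fragment" part unless the caller asked to keep it.
            let url = if keep_fragments {
                self.url.as_str()
            } else {
                self.url.split('#').next().unwrap_or("")
            };
            // DefaultHasher stands in for SHA-1; it is not a cryptographic hash.
            let mut hasher = DefaultHasher::new();
            self.sid.hash(&mut hasher);
            self.method.hash(&mut hasher);
            url.hash(&mut hasher);
            self.body.hash(&mut hasher);
            self.fingerprint = Some(hasher.finish().to_be_bytes().to_vec());
        }
        self.fingerprint.as_deref().unwrap_or_default()
    }

    /// Returns the cached value without computing anything.
    fn fingerprint(&self) -> Option<&[u8]> {
        self.fingerprint.as_deref()
    }
}
```

In this sketch, two URLs that differ only in their fragment produce the same fingerprint when `keep_fragments` is false.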

Source

pub fn fingerprint(&self) -> Option<&[u8]>

Returns the cached fingerprint, if one has been computed via update_fingerprint. Returns None if the fingerprint has never been calculated. The cache manager uses this to look up previously stored responses.

Source

pub fn copy_without_callback(&self) -> Self

Creates a clone of this request without the callback closure.

Because Callback is a boxed dyn Fn and cannot be cloned, this method copies every field except callback (which is set to None). The engine uses this when creating retry requests for blocked responses.
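A minimal sketch of the field-by-field copy, using a stand-in type. `SketchRequest` and `SketchCallback` are hypothetical, and the closure signature is invented for illustration; the point is only that a `Box<dyn Fn…>` has no `Clone` implementation, so the copy sets the callback to `None` and clones the rest.

```rust
/// Hypothetical callback type; a boxed trait object cannot be cloned.
type SketchCallback = Box<dyn Fn(&str) -> usize>;

/// Hypothetical stand-in for `Request` with a subset of its fields.
struct SketchRequest {
    url: String,
    priority: i32,
    retry_count: u32,
    callback: Option<SketchCallback>,
    callback_name: Option<String>,
}

impl SketchRequest {
    /// Copies every field except `callback`, which becomes `None`.
    fn copy_without_callback(&self) -> Self {
        Self {
            url: self.url.clone(),
            priority: self.priority,
            retry_count: self.retry_count,
            callback: None,
            callback_name: self.callback_name.clone(),
        }
    }
}
```

Note that the name survives the copy while the closure does not, so a caller that needs the callback on the copy has to attach it again.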

Trait Implementations§

Source§

impl Debug for Request

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Display for Request

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Ord for Request

Source§

fn cmp(&self, other: &Self) -> Ordering

This method returns an Ordering between self and other.
1.21.0 · Source§

fn max(self, other: Self) -> Self
where Self: Sized,

Compares and returns the maximum of two values.
1.21.0 · Source§

fn min(self, other: Self) -> Self
where Self: Sized,

Compares and returns the minimum of two values.
1.50.0 · Source§

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

Restrict a value to a certain interval.
Source§

impl PartialEq for Request

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl PartialOrd for Request

Source§

fn partial_cmp(&self, other: &Self) -> Option<Ordering>

This method returns an ordering between self and other values if one exists.
1.0.0 · Source§

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator.
1.0.0 · Source§

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator.
1.0.0 · Source§

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator.
1.0.0 · Source§

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator.
Source§

impl Eq for Request

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<Q, K> Comparable<K> for Q
where Q: Ord + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn compare(&self, key: &K) -> Ordering

Compare self to key and return their ordering.
Source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.
Source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
Source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToString for T
where T: Display + ?Sized,

Source§

fn to_string(&self) -> String

Converts the given value to a String.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.