pub struct Request {
    pub url: String,
    pub sid: String,
    pub callback: Option<Callback>,
    pub callback_name: Option<String>,
    pub priority: i32,
    pub dont_filter: bool,
    pub meta: HashMap<String, Value>,
    pub retry_count: u32,
    pub session_kwargs: HashMap<String, Value>,
    /* private fields */
}
A crawl request with URL, priority, metadata, and optional callback.
Request is the unit of work in the crawl pipeline. Create one with
Request::new, customize it with the builder methods (with_priority,
with_sid, with_callback, etc.), and return it from your spider’s parse
method wrapped in SpiderOutput::FollowRequest.
Two requests are considered equal if their fingerprints match (or, if no fingerprint has been computed, if their URLs match). Ordering is by priority (higher values are dequeued first).
Fields
url: String
The URL to fetch. This is the only required field; everything else has
sensible defaults set by Request::new.
sid: String
The session identifier used to select a fetcher from the SessionManager.
An empty string means “use the default session.”
callback: Option<Callback>
An optional callback to process the response. When present, the engine
calls this closure instead of Spider::parse.
Because the boxed closure cannot be cloned, use copy_without_callback
when you need to duplicate a request for retries.
callback_name: Option<String>
The name of the callback, kept for debugging output and checkpoint
serialization. It has no effect on routing; the actual closure in
callback is what gets invoked.
priority: i32
The scheduling priority. Higher values are dequeued first by the
Scheduler. The default is 0. Use negative values to de-prioritize
retries or background pages.
dont_filter: bool
Whether to bypass the duplicate-request filter. Set this to true when
you intentionally want to re-fetch a URL, for example to poll a page
for updates or to retry after a transient failure.
meta: HashMap<String, Value>
Arbitrary metadata passed through the crawl pipeline. Whatever you put
here is available on the request when it reaches your callback, which is
useful for carrying context (e.g., a parent-page ID) between parse stages.
retry_count: u32
The number of times this request has been retried after receiving a
blocked response. The engine increments this automatically and stops
retrying once it exceeds Spider::max_blocked_retries.
session_kwargs: HashMap<String, Value>
Additional keyword arguments forwarded to the session fetcher. Common
keys include "method", "headers", "data", and "json". These are
also factored into the deduplication fingerprint when
Spider::fp_include_kwargs is enabled.
Implementations
impl Request
pub fn new(url: impl Into<String>) -> Self
Creates a new request for the given URL with default settings.
All fields are initialized to their zero/empty values: priority 0, no
session override, no callback, no metadata, and duplicate filtering
enabled. Use the with_* builder methods to customize.
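The defaults and the consuming builder style described above can be sketched with a small stand-in type. This is only an illustration: the `Request` below is a hypothetical, trimmed-down mirror of the documented struct (four fields, three builders), not the crate's own definition, and `build_example` and the URL are made up for the sketch.

```rust
// Hypothetical stand-in for the documented type; only a few fields are mirrored.
#[derive(Debug, Default)]
pub struct Request {
    pub url: String,
    pub sid: String,
    pub priority: i32,
    pub dont_filter: bool,
}

impl Request {
    /// Every field except `url` starts at its zero/empty value.
    pub fn new(url: impl Into<String>) -> Self {
        Request { url: url.into(), ..Default::default() }
    }

    // Each builder takes `self` by value and returns it, so calls can be chained.
    pub fn with_sid(mut self, sid: impl Into<String>) -> Self {
        self.sid = sid.into();
        self
    }

    pub fn with_priority(mut self, priority: i32) -> Self {
        self.priority = priority;
        self
    }

    pub fn with_dont_filter(mut self, dont_filter: bool) -> Self {
        self.dont_filter = dont_filter;
        self
    }
}

/// Builds a request the way a spider might (illustrative values only).
pub fn build_example() -> Request {
    Request::new("https://example.com/products/42")
        .with_sid("logged-in")
        .with_priority(10)
        .with_dont_filter(true)
}
```

Because each builder consumes and returns the request, a half-configured value never needs to be stored in a mutable binding.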
pub fn with_sid(self, sid: impl Into<String>) -> Self
Sets the session identifier for this request, routing it to a specific
fetcher registered in the SessionManager.
Use this when your spider manages multiple sessions with different cookies,
proxies, or authentication contexts.
pub fn with_priority(self, priority: i32) -> Self
Sets the scheduling priority for this request. Higher values are dequeued first. Use positive values for important pages (e.g., product detail pages) and negative values for low-priority background work.
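To see what "higher values are dequeued first" means in practice, here is a minimal sketch that assumes the queue behaves like a max-heap keyed on priority. The Scheduler's actual data structure is not shown on this page; `dequeue_order` is a hypothetical helper, and requests are reduced to `(priority, url)` tuples.

```rust
use std::collections::BinaryHeap;

/// Returns the URLs in the order a max-heap keyed on priority would pop them.
/// Tuples compare by their first element first, so priority decides the order.
pub fn dequeue_order(requests: &[(i32, &'static str)]) -> Vec<&'static str> {
    let mut heap: BinaryHeap<(i32, &'static str)> = requests.iter().copied().collect();
    let mut order = Vec::new();
    while let Some((_priority, url)) = heap.pop() {
        order.push(url);
    }
    order
}
```

With priorities 10, 0, and -5, the priority-10 request is popped first and the negative-priority one last.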
pub fn with_dont_filter(self, dont_filter: bool) -> Self
Sets whether this request should bypass the duplicate-request filter.
Pass true to allow re-fetching a URL that has already been seen. This
is useful for polling pages that change over time or for manual retries.
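The effect of the flag can be illustrated with a toy duplicate filter. This is a sketch under stated assumptions: `SeenFilter` and `admit` are hypothetical names, the filter is keyed on the raw URL rather than on the request fingerprint, and a bypassed request is assumed not to be recorded as seen.

```rust
use std::collections::HashSet;

// Hypothetical duplicate filter; the crate's real filter is not shown on this page.
#[derive(Default)]
pub struct SeenFilter {
    seen: HashSet<String>,
}

impl SeenFilter {
    /// Returns true if the request should be scheduled.
    pub fn admit(&mut self, url: &str, dont_filter: bool) -> bool {
        if dont_filter {
            // Bypass: the seen set is neither consulted nor updated (an assumption).
            return true;
        }
        // `insert` returns false when the URL was already present.
        self.seen.insert(url.to_string())
    }
}
```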
pub fn with_meta(self, meta: HashMap<String, Value>) -> Self
Attaches arbitrary metadata to this request. The metadata map is carried
through the entire crawl pipeline and is accessible in your callback or
parse implementation, making it the standard way to pass context (such
as a parent URL or category label) between crawl stages.
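As a rough illustration of carrying context between stages, the sketch below builds a metadata map in one stage and reads it back in the next. The `Value` enum is a stand-in, since this page does not define the crate's `Value` type; `child_meta`, `depth_of`, and the key names are hypothetical.

```rust
use std::collections::HashMap;

// Stand-in for the crate's `Value` type, which this page does not define.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
}

/// Builds the metadata a listing page might attach to its follow-up requests.
pub fn child_meta(parent_url: &str, depth: i64) -> HashMap<String, Value> {
    let mut meta = HashMap::new();
    meta.insert("parent_url".to_string(), Value::Str(parent_url.to_string()));
    meta.insert("depth".to_string(), Value::Int(depth));
    meta
}

/// Reads the depth back in a later stage; returns None if absent or mistyped.
pub fn depth_of(meta: &HashMap<String, Value>) -> Option<i64> {
    match meta.get("depth") {
        Some(Value::Int(d)) => Some(*d),
        _ => None,
    }
}
```

The map returned by `child_meta` is what you would hand to `with_meta`; reading keys defensively, as `depth_of` does, keeps a later stage from panicking on requests that never had the key set.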
pub fn with_callback(self, name: &str, callback: Callback) -> Self
Attaches a named callback to process the response for this request.
When the engine receives the response, it will call this closure instead
of Spider::parse. The name is stored
for debugging and checkpoint serialization; it does not affect dispatch.
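The dispatch rule described here (call the closure if present, otherwise fall back to the spider's parse method) might look roughly like the following. The `Callback` signature is an assumption: this page only says it is a boxed `dyn Fn`, so the sketch uses a `&str` body and a `String` result as placeholders. `default_parse` stands in for `Spider::parse`, and `dispatch` is a hypothetical name.

```rust
// Assumed shape; the real `Callback` signature is not shown on this page.
pub type Callback = Box<dyn Fn(&str) -> String + Send + Sync>;

// Trimmed-down stand-in for the documented struct.
pub struct Request {
    pub callback: Option<Callback>,
    pub callback_name: Option<String>,
}

/// Stand-in for `Spider::parse`.
pub fn default_parse(body: &str) -> String {
    format!("parse:{}", body)
}

/// Routes on the closure itself; `callback_name` plays no part in the decision.
pub fn dispatch(req: &Request, body: &str) -> String {
    match &req.callback {
        Some(cb) => cb(body),
        None => default_parse(body),
    }
}
```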
pub fn domain(&self) -> String
Extracts the domain (host) from the request URL. Returns an empty string if the URL cannot be parsed. This is used internally for domain allowlisting and per-domain statistics, but you can also call it in your own code to inspect which host a request targets.
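For intuition, host extraction with the empty-string fallback can be sketched with plain string splitting. This is not the crate's implementation, which is not shown here and may rely on a full URL parser; `domain_of` is a hypothetical helper that lowercases the host as an extra assumption and does not handle bracketed IPv6 literals.

```rust
/// Hypothetical helper: returns the host of an absolute URL, or "" on failure.
pub fn domain_of(url: &str) -> String {
    // Require a non-empty "scheme://" prefix.
    let rest = match url.split_once("://") {
        Some((scheme, rest)) if !scheme.is_empty() => rest,
        _ => return String::new(),
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop optional userinfo ("user:pass@") and port (":8080").
    let host_port = authority.rsplit('@').next().unwrap_or("");
    let host = host_port.split(':').next().unwrap_or("");
    host.to_ascii_lowercase()
}
```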
pub fn update_fingerprint(
    &mut self,
    include_kwargs: bool,
    include_headers: bool,
    keep_fragments: bool,
) -> &[u8]
Computes and caches a SHA-1 fingerprint for deduplication, returning it as a byte slice.
The fingerprint is derived from the session ID, HTTP method, URL, and
request body. The boolean flags control whether session kwargs, headers,
and URL fragments are also included. Once computed, the fingerprint is
cached, so subsequent calls return it without recomputing. The Scheduler
calls this automatically when a request is enqueued.
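The compute-once-then-cache behaviour, and the effect of `keep_fragments`, can be sketched as follows. Several simplifications are assumed: the stand-in `Request` hashes only the session ID and URL, takes a single flag, and uses the standard library's `DefaultHasher` in place of SHA-1 (so the result is 8 bytes and unsuitable for persisted fingerprints).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in; the real method uses SHA-1 and takes more inputs.
pub struct Request {
    pub url: String,
    pub sid: String,
    fingerprint: Option<Vec<u8>>, // private cache, filled on first use
}

impl Request {
    pub fn new(url: &str) -> Self {
        Request { url: url.to_string(), sid: String::new(), fingerprint: None }
    }

    /// Computes the fingerprint on the first call and returns the cached bytes afterwards.
    pub fn update_fingerprint(&mut self, keep_fragments: bool) -> &[u8] {
        if self.fingerprint.is_none() {
            // Strip "#fragment" unless the caller asked to keep it.
            let url = if keep_fragments {
                self.url.as_str()
            } else {
                self.url.split('#').next().unwrap_or("")
            };
            let mut hasher = DefaultHasher::new();
            self.sid.hash(&mut hasher);
            url.hash(&mut hasher);
            self.fingerprint = Some(hasher.finish().to_be_bytes().to_vec());
        }
        self.fingerprint.as_deref().unwrap_or(&[])
    }

    /// Returns the cached fingerprint without computing anything.
    pub fn fingerprint(&self) -> Option<&[u8]> {
        self.fingerprint.as_deref()
    }
}
```

In this sketch, two URLs that differ only in their fragment produce the same fingerprint when `keep_fragments` is false and different ones when it is true.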
pub fn fingerprint(&self) -> Option<&[u8]>
Returns the cached fingerprint, if one has been computed via
update_fingerprint. Returns None if
the fingerprint has never been calculated. The cache manager uses this to
look up previously stored responses.
pub fn copy_without_callback(&self) -> Self
Creates a clone of this request without the callback closure.
Because Callback is a boxed dyn Fn and cannot be cloned, this
method copies every field except callback (which is set to None).
The engine uses this when creating retry requests for blocked responses.
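A minimal sketch of the copy-everything-but-the-closure pattern, and of how a retry might be derived from it, is shown below. The struct is a trimmed-down stand-in, the `Callback` signature is assumed, and `make_retry` is a hypothetical helper; where exactly the engine increments `retry_count` is not stated on this page.

```rust
// Assumed shape; this page only says `Callback` is a boxed `dyn Fn`.
pub type Callback = Box<dyn Fn(&str) + Send + Sync>;

// Trimmed-down stand-in for the documented struct.
pub struct Request {
    pub url: String,
    pub priority: i32,
    pub retry_count: u32,
    pub callback: Option<Callback>,
    pub callback_name: Option<String>,
}

impl Request {
    /// Copies every field except the closure, which cannot be cloned.
    pub fn copy_without_callback(&self) -> Self {
        Request {
            url: self.url.clone(),
            priority: self.priority,
            retry_count: self.retry_count,
            callback: None,
            callback_name: self.callback_name.clone(),
        }
    }
}

/// Hypothetical retry helper: duplicate the request and bump its retry counter.
pub fn make_retry(original: &Request) -> Request {
    let mut retry = original.copy_without_callback();
    retry.retry_count += 1;
    retry
}
```

Note that the copy keeps `callback_name` even though the closure is gone, which is consistent with the name being used only for debugging and checkpoint serialization.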
Trait Implementations
impl Ord for Request
impl PartialOrd for Request
impl Eq for Request
Auto Trait Implementations
impl Freeze for Request
impl !RefUnwindSafe for Request
impl Send for Request
impl Sync for Request
impl Unpin for Request
impl UnsafeUnpin for Request
impl !UnwindSafe for Request
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<Q, K> Comparable<K> for Q
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<Q, K> Equivalent<K> for Q
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.