pub struct Request {
pub url: Url,
pub method: Method,
pub headers: HeaderMap,
pub body: Option<Body>,
/* private fields */
}
Shared runtime data types and convenience helpers. Outgoing HTTP request used by the crawler runtime.
Request is the handoff type between spiders, middleware, the scheduler,
and the downloader. It is transport-neutral enough to be shared across the
workspace, but expressive enough for custom methods, headers, bodies, and
request-scoped metadata.
§Example
use spider_util::request::Request;
use url::Url;
// Create a basic GET request
let request = Request::new(Url::parse("https://example.com").unwrap());
// Build a request with headers and method
let post_request = Request::new(Url::parse("https://api.example.com").unwrap())
.with_method(reqwest::Method::POST)
.with_header("Accept", "application/json")
.unwrap();
Fields§
url: Url
The target URL for this request.
method: Method
The HTTP method (GET, POST, etc.).
headers: HeaderMap
HTTP headers for the request.
body: Option<Body>
Optional request body.
Implementations§
impl Request
pub fn new(url: Url) -> Request
Creates a new Request with the given URL.
This is the most common constructor used by spiders when enqueueing
follow-up pages. It does not allocate metadata storage unless
with_meta is called.
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com").unwrap());
pub fn with_method(self, method: Method) -> Request
pub fn with_header(self, name: &str, value: &str) -> Result<Request, SpiderError>
Adds a header to the request.
Returns an error if the header name or value is invalid.
§Errors
Returns a SpiderError::HeaderValueError if the header name or value is invalid.
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com").unwrap())
.with_header("Accept", "application/json")
.unwrap();
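Because with_header returns a Result, callers that build headers from untrusted input may prefer to propagate the error with `?` instead of calling unwrap. The sketch below mimics that builder shape using only the standard library. `HeaderSketch`, `HeaderError`, `build_api_request`, and the validation rules are hypothetical stand-ins for illustration, not the crate's types or its actual checks.

```rust
use std::fmt;

/// Hypothetical error type standing in for the crate's header error.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    InvalidName(String),
    InvalidValue(String),
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::InvalidName(n) => write!(f, "invalid header name: {:?}", n),
            HeaderError::InvalidValue(v) => write!(f, "invalid header value: {:?}", v),
        }
    }
}

impl std::error::Error for HeaderError {}

/// Hypothetical stand-in for a request builder; only headers are modeled.
#[derive(Debug, Default)]
pub struct HeaderSketch {
    pub headers: Vec<(String, String)>,
}

impl HeaderSketch {
    /// Consumes the builder and returns it on success, so calls chain with `?`.
    pub fn with_header(mut self, name: &str, value: &str) -> Result<Self, HeaderError> {
        // Assumed rule: names are non-empty ASCII letters, digits, '-' or '_'.
        let name_ok = !name.is_empty()
            && name
                .bytes()
                .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
        if !name_ok {
            return Err(HeaderError::InvalidName(name.to_string()));
        }
        // Assumed rule: values must not contain control characters (CR, LF, ...).
        if value.chars().any(|c| c.is_control()) {
            return Err(HeaderError::InvalidValue(value.to_string()));
        }
        self.headers
            .push((name.to_ascii_lowercase(), value.to_string()));
        Ok(self)
    }
}

/// Propagates header errors to the caller instead of panicking.
pub fn build_api_request(token: &str) -> Result<HeaderSketch, HeaderError> {
    let request = HeaderSketch::default()
        .with_header("Accept", "application/json")?
        .with_header("Authorization", token)?;
    Ok(request)
}
```

With this shape, a token containing a line break surfaces as an `Err` at the call site rather than as a panic inside the spider.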
pub fn with_json(self, json: Value) -> Request
Sets the body of the request to a JSON value and defaults the method to POST.
This helper stores the payload body only. Add content-type headers explicitly when the target service expects them.
§Example
use spider_util::request::Request;
use url::Url;
use serde_json::json;
let request = Request::new(Url::parse("https://api.example.com").unwrap())
.with_json(json!({"name": "test"}));
pub fn with_form(self, form: DashMap<String, String>) -> Request
Sets the body of the request to form data and defaults the method to POST.
§Example
use spider_util::request::Request;
use url::Url;
use dashmap::DashMap;
let form = DashMap::new();
form.insert("key".to_string(), "value".to_string());
let request = Request::new(Url::parse("https://api.example.com").unwrap())
.with_form(form);
pub fn with_bytes(self, bytes: Bytes) -> Request
pub fn with_meta(self, key: &str, value: Value) -> Request
Adds a value to the request’s metadata.
Lazily allocates the metadata map on first use. Metadata is commonly used to carry crawl context such as pagination state, source URLs, or retry bookkeeping across middleware and parsing stages.
§Example
use spider_util::request::Request;
use url::Url;
use serde_json::json;
let request = Request::new(Url::parse("https://example.com").unwrap())
.with_meta("priority", json!(1))
.with_meta("source", json!("manual"));
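The lazy allocation described for with_meta can be illustrated with a std-only sketch. It assumes the metadata slot is an `Option` around a shared map that is filled on first write; `MetaSketch`, its `String` values, and the `Mutex<HashMap>` (standing in for the concurrent map) are hypothetical simplifications, not the crate's implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a request; only metadata is modeled.
#[derive(Debug, Default)]
pub struct MetaSketch {
    // `None` until the first metadata write, so plain requests allocate nothing.
    meta: Option<Arc<Mutex<HashMap<String, String>>>>,
}

impl MetaSketch {
    /// Adds a metadata entry, creating the map on first use.
    pub fn with_meta(mut self, key: &str, value: &str) -> Self {
        let map = self
            .meta
            .get_or_insert_with(|| Arc::new(Mutex::new(HashMap::new())));
        map.lock().unwrap().insert(key.to_string(), value.to_string());
        self
    }

    /// Returns an owned copy of the value; a borrow could not outlive the lock guard.
    pub fn get_meta(&self, key: &str) -> Option<String> {
        let map = self.meta.as_ref()?;
        let guard = map.lock().unwrap();
        let value = guard.get(key).cloned();
        value
    }

    /// Reports whether the metadata map has been allocated yet.
    pub fn has_meta_storage(&self) -> bool {
        self.meta.is_some()
    }
}
```

A request that never carries metadata keeps the slot at `None`, while the first `with_meta` call creates the map and later calls reuse it.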
pub fn get_meta(&self, key: &str) -> Option<Value>
Gets a metadata value by key, if it exists.
Returns a cloned JSON value because metadata is stored in a shared
concurrent map. Returns None if the key doesn’t exist or if metadata
hasn’t been set.
pub fn meta_map(&self) -> Option<&Arc<DashMap<String, Value>>>
Returns a reference to the internal metadata map, if it exists.
pub fn insert_meta(&mut self, key: String, value: Value)
Inserts a value into metadata, creating the map if needed.
This is intended for internal framework use.
pub fn get_meta_ref(&self, key: &str) -> Option<Ref<'_, String, Value>>
Gets a value from metadata using DashMap’s API.
This is intended for internal framework use where direct access is needed.
pub fn set_meta_from_option(&mut self, meta: Option<Arc<DashMap<String, Value>>>)
Sets the metadata map directly.
Used for internal framework operations.
pub fn clone_meta(&self) -> Option<Arc<DashMap<String, Value>>>
Clones the metadata map.
Used for internal framework operations where metadata needs to be copied.
pub fn take_meta(&mut self) -> Option<Arc<DashMap<String, Value>>>
Takes the metadata map, leaving None in its place.
Used for internal framework operations.
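The difference between cloning and taking an `Option<Arc<...>>` slot can be sketched with the standard library alone. This assumes clone_meta clones the `Arc` handle (so both handles point at the same map) and take_meta behaves like `Option::take`; the page does not confirm either detail. `HandoffSketch`, `MetaMap`, and `peek` are hypothetical names, and `Mutex<HashMap>` stands in for the concurrent map.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical alias standing in for the shared metadata map.
pub type MetaMap = Arc<Mutex<HashMap<String, u32>>>;

/// Hypothetical stand-in for a request; only the metadata slot is modeled.
pub struct HandoffSketch {
    meta: Option<MetaMap>,
}

impl HandoffSketch {
    pub fn new(meta: Option<MetaMap>) -> Self {
        HandoffSketch { meta }
    }

    /// Clones the `Arc` handle: cheap, and both handles share one map.
    pub fn clone_meta(&self) -> Option<MetaMap> {
        self.meta.clone()
    }

    /// Moves the handle out, leaving `None` behind.
    pub fn take_meta(&mut self) -> Option<MetaMap> {
        self.meta.take()
    }

    /// Reads a value through the request's own handle.
    pub fn peek(&self, key: &str) -> Option<u32> {
        let map = self.meta.as_ref()?;
        let guard = map.lock().unwrap();
        let value = guard.get(key).copied();
        value
    }
}
```

Under these assumptions, a write made through a cloned handle is visible through the request, whereas after a take the request no longer holds any metadata.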
pub fn meta_inner(&self) -> &Option<Arc<DashMap<String, Value>>>
Returns a reference to the metadata Arc for internal framework use.
pub fn get_retry_attempts(&self) -> u32
Gets the number of times the request has been retried.
Returns 0 if no retry attempts have been recorded.
pub fn increment_retry_attempts(&mut self)
Increments the retry count for the request.
Lazily allocates the metadata map if not already present.
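A retry middleware might use the two retry helpers roughly as follows. This std-only sketch assumes the counter lives in the metadata map under some key. The key name `retry_attempts`, the `RetrySketch` type, and the `should_retry` helper are all hypothetical, chosen only to illustrate the read-then-increment pattern.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the real key name is not documented here.
const RETRY_KEY: &str = "retry_attempts";

/// Hypothetical stand-in for a request; only retry bookkeeping is modeled.
#[derive(Debug, Default)]
pub struct RetrySketch {
    meta: Option<HashMap<String, u32>>,
}

impl RetrySketch {
    /// Returns 0 when no metadata or no counter has been recorded.
    pub fn get_retry_attempts(&self) -> u32 {
        self.meta
            .as_ref()
            .and_then(|m| m.get(RETRY_KEY))
            .copied()
            .unwrap_or(0)
    }

    /// Creates the map on first use, then bumps the counter.
    pub fn increment_retry_attempts(&mut self) {
        let map = self.meta.get_or_insert_with(HashMap::new);
        *map.entry(RETRY_KEY.to_string()).or_insert(0) += 1;
    }
}

/// Records another attempt and returns true while the budget allows it.
pub fn should_retry(request: &mut RetrySketch, max_retries: u32) -> bool {
    if request.get_retry_attempts() >= max_retries {
        return false;
    }
    request.increment_retry_attempts();
    true
}
```

Keeping the counter on the request itself means the budget follows the request through the scheduler without any external bookkeeping.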
pub fn fingerprint(&self) -> String
Generates a fingerprint for the request based on its URL, method, and body.
This is the stable identity used by runtime deduplication, caching, and related components that need to recognize equivalent requests: two requests with the same URL, method, and body share a fingerprint.
The fingerprint combines:
- The request URL
- The HTTP method
- The request body (if present)
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com").unwrap());
let fingerprint = request.fingerprint();
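How a fingerprint can combine URL, method, and body is sketched below with a hand-rolled 64-bit FNV-1a hash, chosen because its output does not vary between runs or toolchains. The hash function, the length-prefixed field encoding, and the hex output format are assumptions for illustration; the crate's actual algorithm is not stated on this page.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feeds bytes into a running FNV-1a state and returns the new state.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Feeds a length-prefixed field so that field boundaries affect the hash.
fn feed_field(hash: u64, field: &[u8]) -> u64 {
    let hash = fnv1a(hash, &(field.len() as u64).to_le_bytes());
    fnv1a(hash, field)
}

/// Hypothetical fingerprint over URL, method, and optional body.
pub fn fingerprint(url: &str, method: &str, body: Option<&[u8]>) -> String {
    let mut hash = FNV_OFFSET;
    hash = feed_field(hash, url.as_bytes());
    hash = feed_field(hash, method.as_bytes());
    match body {
        // A marker byte distinguishes "no body" from "empty body".
        Some(bytes) => {
            hash = fnv1a(hash, &[1]);
            hash = feed_field(hash, bytes);
        }
        None => hash = fnv1a(hash, &[0]),
    }
    // Fixed-width lowercase hex keeps the string form uniform.
    format!("{:016x}", hash)
}
```

A deduplication filter could then keep these strings in a set and skip any request whose fingerprint has already been seen.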