Struct HttpClient 

pub struct HttpClient { /* private fields */ }

Workspace-wide HTTP client with the security defaults applied.

Internally holds one reqwest::Client per source. Construct via HttpClient::new with the full set of allowlists the calling process will need.

Implementations

impl HttpClient

pub fn new(allowlists: Vec<SourceAllowlist>) -> Result<Self, Error>

Build a client with rustls TLS, a per-source redirect allowlist, a response-body size cap, and timeouts.

allowlists MUST cover every source whose URL might be passed in; fetches against unregistered sources return HttpError::UnknownSource.

Errors

Returns the underlying reqwest::Error if ClientBuilder::build fails (typically a TLS-backend init failure).
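Example (sketch). The per-source lookup behind HttpError::UnknownSource can be modelled with the standard library alone. This is an illustration under stated assumptions, not the crate's code: Registry, Allowlist, RegistryError and the source names used below are hypothetical, and where the real client keeps a reqwest::Client per source, this sketch keeps only the allowlist.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `SourceAllowlist`; the real type's fields are not documented here.
#[derive(Debug, PartialEq)]
pub struct Allowlist {
    pub hosts: Vec<String>,
}

/// Hypothetical mirror of `HttpError::UnknownSource`.
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    UnknownSource(String),
}

/// Per-source state keyed by source name, built once at construction time.
pub struct Registry {
    per_source: HashMap<String, Allowlist>,
}

impl Registry {
    /// Register every source the process will ever fetch from, up front.
    pub fn new(entries: Vec<(String, Allowlist)>) -> Self {
        Registry { per_source: entries.into_iter().collect() }
    }

    /// Look up a source's state. An unregistered source is an error,
    /// never a silent fallback to some default client.
    pub fn lookup(&self, source: &str) -> Result<&Allowlist, RegistryError> {
        self.per_source
            .get(source)
            .ok_or_else(|| RegistryError::UnknownSource(source.to_string()))
    }
}
```

Failing loudly on an unregistered source, rather than falling back to a shared default, keeps every fetch tied to an explicitly configured allowlist.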

pub fn source_allowlist(&self, source: &str) -> Option<&SourceAllowlist>

The SourceAllowlist this client was built with for source, or None if source was not registered.

This is the same value captured by the per-source redirect closure (see the doc on HttpClient's allowlists field). It exists so the orchestrator can apply the pre-fetch host check from docs/REDIRECT_ALLOWLIST.md §1 to a metadata-discovered OA URL, that is, a URL that may be fetched without passing through any redirect hop. Because the check uses the same source of truth as the redirect closure, the two can never disagree. Callers MUST apply this check to the "oa-publisher" leg only; the initial template-constructed URL is exempt per docs/REDIRECT_ALLOWLIST.md §6.
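Example (sketch). A pre-fetch host check driven by the Option this method returns might look like the following. It assumes an exact, ASCII-case-insensitive host match; the actual matching rules are specified in docs/REDIRECT_ALLOWLIST.md and may differ (for example, in how subdomains are treated). Allowlist, PrecheckError and precheck_host are hypothetical names.

```rust
/// Hypothetical stand-in for `SourceAllowlist`: a flat list of permitted hosts.
pub struct Allowlist {
    pub hosts: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
pub enum PrecheckError {
    /// The source was never registered (the lookup returned `None`).
    UnknownSource,
    /// The URL's host is not on the source's allowlist.
    HostNotAllowed(String),
}

/// Check a metadata-discovered URL's host against the same allowlist the
/// redirect closure uses. Exact, ASCII-case-insensitive matching is an assumption.
pub fn precheck_host(allowlist: Option<&Allowlist>, host: &str) -> Result<(), PrecheckError> {
    let list = allowlist.ok_or(PrecheckError::UnknownSource)?;
    if list.hosts.iter().any(|h| h.eq_ignore_ascii_case(host)) {
        Ok(())
    } else {
        Err(PrecheckError::HostNotAllowed(host.to_string()))
    }
}
```

In the real client, the first argument would be whatever source_allowlist returns for the relevant source.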

pub async fn fetch_bytes(&self, source: &str, url: Url) -> Result<(Bytes, Url), HttpError>

Fetch a URL whose response is expected to be a JSON or text body. The body is capped at PDF_MAX_BYTES.

Returns the response body bytes plus the effective final URL after redirects (post-allowlist verification — every hop has already been validated by the time this returns).

Errors

Any HttpError variant.
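Example (sketch). The body-size cap can be illustrated with std::io::Read::take. This is a synchronous stand-in: the real method is async and reads a reqwest response, and the HttpError variant it uses for an oversized body is not named on this page. CapError and read_capped are hypothetical, and the limit is passed in rather than read from PDF_MAX_BYTES.

```rust
use std::io::{self, Read};

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub enum CapError {
    TooLarge { limit: u64 },
    Io(io::Error),
}

/// Read at most `limit` bytes from `body`. One extra byte is requested so that
/// a body of exactly `limit` bytes is accepted while anything longer is
/// rejected without buffering the rest of the stream.
pub fn read_capped<R: Read>(body: R, limit: u64) -> Result<Vec<u8>, CapError> {
    let mut buf = Vec::new();
    body.take(limit.saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(CapError::Io)?;
    if buf.len() as u64 > limit {
        return Err(CapError::TooLarge { limit });
    }
    Ok(buf)
}
```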

pub async fn fetch_bytes_with_headers(&self, source: &str, url: Url, headers: &[(&str, &str)]) -> Result<(Bytes, Url), HttpError>

Like Self::fetch_bytes but attaches additional request headers to the outgoing GET. The headers are validated up-front against the visible-ASCII subset (RFC 7230 §3.2); any failure returns HttpError::InvalidHeader before the request is sent.

Used by Tier-3 TDM sources that authenticate via a header (APS Harvest X-API-Key, Elsevier ScienceDirect X-ELS-APIKey). Header values appear on the wire only — they are never logged.

Errors

Any HttpError variant including HttpError::InvalidHeader.
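Example (sketch). Up-front header validation might look like this. Assumptions: header names are checked against the RFC 7230 §3.2.6 tchar set, and values are restricted to visible ASCII (0x21 to 0x7E, which u8::is_ascii_graphic tests); the crate's exact rule may differ. InvalidHeader and validate_headers are hypothetical. The error carries only the header name, consistent with header values never being logged.

```rust
/// Hypothetical mirror of `HttpError::InvalidHeader`. Only the header *name* is
/// kept, so a secret value cannot leak through error logging.
#[derive(Debug, PartialEq)]
pub struct InvalidHeader {
    pub name: String,
}

/// RFC 7230 §3.2.6 `tchar`: characters allowed in a header-name token.
fn is_tchar(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// Validate every (name, value) pair before any request is sent.
/// Rejecting CR, LF and other control bytes in values also rules out header injection.
pub fn validate_headers(headers: &[(&str, &str)]) -> Result<(), InvalidHeader> {
    for (name, value) in headers {
        let name_ok = !name.is_empty() && name.bytes().all(is_tchar);
        let value_ok = value.bytes().all(|b| b.is_ascii_graphic());
        if !name_ok || !value_ok {
            return Err(InvalidHeader { name: name.to_string() });
        }
    }
    Ok(())
}
```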

pub async fn fetch_pdf(&self, source: &str, url: Url) -> Result<(Bytes, Url), HttpError>

Fetch a URL expected to be a PDF. Same as Self::fetch_bytes plus the magic-byte check on the first 5 bytes (%PDF- = [0x25, 0x50, 0x44, 0x46, 0x2D]). Mismatch returns HttpError::NotAPdf.

Errors

Any HttpError variant including HttpError::NotAPdf.
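Example (sketch). The magic-byte check reduces to a prefix comparison against the five signature bytes listed above. NotAPdf and check_pdf_magic are hypothetical stand-ins for the crate's HttpError::NotAPdf path.

```rust
/// The five-byte PDF signature `%PDF-`, i.e. [0x25, 0x50, 0x44, 0x46, 0x2D].
pub const PDF_MAGIC: &[u8; 5] = b"%PDF-";

/// Hypothetical mirror of `HttpError::NotAPdf`.
#[derive(Debug, PartialEq)]
pub struct NotAPdf;

/// Accept a body only if it starts with the PDF signature. Bodies shorter than
/// five bytes are rejected as well, because `starts_with` needs all five.
pub fn check_pdf_magic(body: &[u8]) -> Result<(), NotAPdf> {
    if body.starts_with(PDF_MAGIC) {
        Ok(())
    } else {
        Err(NotAPdf)
    }
}
```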

impl HttpClient

Test-oriented HttpClient constructor. Originally cfg(test); now also reachable from the doiget-cli orchestrator’s integration tests (which live outside this crate and therefore cannot see cfg(test)-gated items). The constructor name retains its for_tests_allow_http signal — production code MUST use HttpClient::new with tier_1_allowlist.

pub fn new_for_tests_allow_http(source: &str, allowlist_host: &str) -> Self

Build a test-oriented HttpClient against an http:// wiremock origin. The redirect closure still rejects insecure schemes; only the connection-level https_only requirement is relaxed, so that wiremock can serve over plain HTTP. This is acceptable because the redirect closure, which is the security-load-bearing path, is exercised by the redirect_to_http_is_rejected_by_closure test.

Production callers MUST use HttpClient::new with tier_1_allowlist; the for_tests_allow_http suffix is the load-bearing signal that this constructor lifts the initial-leg HTTPS-only requirement.
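Example (sketch). The property this constructor relies on, that the redirect decision rejects non-HTTPS hops regardless of any connection-level https_only setting, can be shown with a small pure function. This is a hypothetical model: HopVerdict and check_redirect_hop are illustrative names, and exact host matching is assumed. In reqwest, a decision of this kind is typically installed with reqwest::redirect::Policy::custom, whose closure receives each redirect attempt; this page does not say whether HttpClient uses exactly that mechanism.

```rust
/// Hypothetical verdict for a single redirect hop.
#[derive(Debug, PartialEq)]
pub enum HopVerdict {
    Follow,
    RejectInsecureScheme,
    RejectHost,
}

/// Decide whether to follow a redirect to `scheme://host`. The scheme check runs
/// first and is unconditional: it does not consult any connection-level
/// `https_only` flag, which is why a test client can relax that flag without
/// weakening redirect handling.
pub fn check_redirect_hop(scheme: &str, host: &str, allowed_hosts: &[&str]) -> HopVerdict {
    if !scheme.eq_ignore_ascii_case("https") {
        return HopVerdict::RejectInsecureScheme;
    }
    if allowed_hosts.iter().any(|h| h.eq_ignore_ascii_case(host)) {
        HopVerdict::Follow
    } else {
        HopVerdict::RejectHost
    }
}
```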

pub fn new_for_tests_allow_http_multi(entries: &[(&str, &str)]) -> Self

Multi-source variant of HttpClient::new_for_tests_allow_http.

Builds a relaxed-https_only client per (source, allowlist_host) pair. Used by the doiget-cli orchestrator’s integration tests when more than one upstream needs to be wiremocked simultaneously (e.g. Crossref + Unpaywall against two different mock servers). Production callers MUST use HttpClient::new with tier_1_allowlist.

Trait Implementations

impl Clone for HttpClient

fn clone(&self) -> HttpClient

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for HttpClient

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.