Centralized HTTP client wrapper. All Source impls fetch through here.
Security defaults per docs/SECURITY.md:
- rustls TLS only (no openssl, no native-tls; enforced by deny.toml)
- HTTPS-only redirect policy (file://, data://, http:// rejected)
- Per-source redirect host allowlist (docs/REDIRECT_ALLOWLIST.md)
- Body size cap (crate::PDF_MAX_BYTES = 100 MB)
- Per-request timeouts (connect 10s, read 60s, total 300s)
- PDF magic-byte check on the first 5 bytes (%PDF-)
- User-Agent: doiget/<version> (+https://github.com/sotashimozono/doiget)

See docs/SECURITY.md §1.2-1.3 / §1.10 and docs/REDIRECT_ALLOWLIST.md.
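Three of these defaults (the HTTPS-only scheme test applied to each redirect target, the body size cap, and the %PDF- prefix check) can be illustrated with a minimal, std-only sketch. The names redirect_scheme_allowed, has_pdf_magic, collect_pdf_body, and BodyError are hypothetical, and the byte limit is passed in as a parameter rather than read from crate::PDF_MAX_BYTES. In the real client these checks would presumably sit inside the reqwest redirect policy and the response-streaming loop.

```rust
// Std-only sketch of three security defaults. All names here are
// hypothetical; they are not doiget's actual API.

/// Magic bytes every accepted PDF body must start with.
pub const PDF_MAGIC: &[u8] = b"%PDF-";

/// Reasons a downloaded body is rejected (hypothetical subset).
#[derive(Debug, PartialEq)]
pub enum BodyError {
    TooLarge { limit: usize },
    NotPdf,
}

/// HTTPS-only redirect rule: the scheme (text before the first ':') must be
/// "https", compared case-insensitively. Rejects http:, file:, data:, and
/// strings with no scheme at all.
pub fn redirect_scheme_allowed(url: &str) -> bool {
    match url.split_once(':') {
        Some((scheme, _rest)) => scheme.eq_ignore_ascii_case("https"),
        None => false,
    }
}

/// Returns true when the body starts with exactly the 5 bytes "%PDF-".
pub fn has_pdf_magic(body: &[u8]) -> bool {
    body.starts_with(PDF_MAGIC)
}

/// Accumulates a streamed body chunk by chunk, failing as soon as the running
/// total would exceed `limit`, then checks the PDF magic bytes.
pub fn collect_pdf_body<'a, I>(chunks: I, limit: usize) -> Result<Vec<u8>, BodyError>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut body: Vec<u8> = Vec::new();
    for chunk in chunks {
        // Check before copying, so an oversized chunk is never buffered.
        if body.len() + chunk.len() > limit {
            return Err(BodyError::TooLarge { limit });
        }
        body.extend_from_slice(chunk);
    }
    if !has_pdf_magic(&body) {
        return Err(BodyError::NotPdf);
    }
    Ok(body)
}
```

Checking the cap per chunk, rather than after the full download, is what keeps a hostile server from forcing an unbounded allocation.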
Architectural note: per-source reqwest::Client
reqwest::redirect::Policy::custom receives only an Attempt value, which
exposes the next URL and the chain of previously visited URLs but not the
original request’s headers. That makes the “tag the request with
X-Doiget-Source and inspect it from inside the redirect closure” approach
infeasible on reqwest 0.13.x. Instead, HttpClient holds one
reqwest::Client per source. Each client’s redirect closure captures
that source’s SourceAllowlist, so a redirect can never be checked against
another source’s allowlist: cross-source confusion is impossible by
construction.
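The per-source design can be sketched without reqwest by modeling the redirect policy as a boxed closure that, like the closure handed to reqwest::redirect::Policy::custom, sees only the next URL. Everything here is a simplified stand-in and an assumption: this SourceAllowlist holds exact host names, https_host is a deliberately naive parser, and redirect_allowed is a hypothetical helper, not a method of the real HttpClient. The point it illustrates is that each closure takes ownership of its own allowlist via move, so there is no shared state a redirect check could consult for the wrong source.

```rust
use std::collections::HashMap;

/// Stand-in for the reqwest redirect closure: it receives only the next URL
/// (redirect::Attempt exposes URLs, not request headers) and answers yes/no.
type RedirectCheck = Box<dyn Fn(&str) -> bool + Send + Sync>;

/// Simplified allowlist: exact, case-insensitive host names (assumption).
struct SourceAllowlist {
    hosts: Vec<String>,
}

impl SourceAllowlist {
    fn permits(&self, host: &str) -> bool {
        self.hosts.iter().any(|h| h.eq_ignore_ascii_case(host))
    }
}

/// Naive host extraction for URLs starting with "https://"; None otherwise.
/// Strips path, query, fragment, and port. Userinfo is not parsed, so URLs
/// containing it simply fail to match any allowlisted host.
fn https_host(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("https://")?;
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    authority.split(':').next()
}

/// One client per source; in doiget this would wrap a reqwest::Client.
struct SourceClient {
    redirect_ok: RedirectCheck,
}

/// Builds a source's client. `move` hands this source's allowlist to the
/// closure, so the closure can never consult another source's list.
fn client_for(allowlist: SourceAllowlist) -> SourceClient {
    let check = move |next_url: &str| {
        https_host(next_url).map_or(false, |host| allowlist.permits(host))
    };
    SourceClient { redirect_ok: Box::new(check) }
}

/// Simplified stand-in for HttpClient: source key -> dedicated client.
struct HttpClient {
    per_source: HashMap<String, SourceClient>,
}

impl HttpClient {
    fn new(allowlists: Vec<(&str, SourceAllowlist)>) -> Self {
        let per_source = allowlists
            .into_iter()
            .map(|(key, list)| (key.to_string(), client_for(list)))
            .collect();
        HttpClient { per_source }
    }

    /// Hypothetical helper: would `source`'s client follow a redirect to
    /// `next_url`? Unknown source keys are rejected.
    fn redirect_allowed(&self, source: &str, next_url: &str) -> bool {
        self.per_source
            .get(source)
            .map_or(false, |client| (client.redirect_ok)(next_url))
    }
}
```

In the real client the closure would call attempt.follow() or attempt.stop() instead of returning a bool; the bool keeps the sketch dependency-free.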
Structs
- HttpClient - Workspace-wide HTTP client with the security defaults applied.
- SourceAllowlist - Per-source allowlist entry. Matches the schema in docs/REDIRECT_ALLOWLIST.md §2.
Enums
- HttpError - Errors that can arise during HTTP fetches.
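The variants of HttpError are not listed on this page. One plausible shape, assuming variants that mirror the security defaults above (the variant names, fields, and messages below are hypothetical), is a plain enum with Display and std::error::Error impls; a derive crate such as thiserror would generate equivalent impls from #[error(...)] attributes.

```rust
use std::fmt;

/// Hypothetical variants; the real HttpError may differ.
#[derive(Debug)]
pub enum HttpError {
    /// A redirect pointed at a non-https scheme or a host outside the
    /// source's allowlist.
    RedirectRejected { source_key: String, url: String },
    /// The response body exceeded the byte cap.
    BodyTooLarge { limit: u64 },
    /// The body did not start with %PDF-.
    NotPdf,
    /// A connect, read, or total deadline elapsed.
    Timeout,
}

impl fmt::Display for HttpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HttpError::RedirectRejected { source_key, url } => {
                write!(f, "redirect to {url} rejected for source {source_key}")
            }
            HttpError::BodyTooLarge { limit } => {
                write!(f, "response body exceeds {limit} bytes")
            }
            HttpError::NotPdf => write!(f, "response is not a PDF (missing %PDF- prefix)"),
            HttpError::Timeout => write!(f, "request timed out"),
        }
    }
}

// Debug + Display are all std::error::Error needs here.
impl std::error::Error for HttpError {}
```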
Functions
- oa_publisher_allowlist - Hard-coded Phase 1 allowlist for the synthetic "oa-publisher" source: the publisher / preprint / repository hosts to which Unpaywall’s best_oa_location.url (or url_for_pdf) typically resolves.
- tier_1_allowlist - Hard-coded Phase 1 allowlist for Tier 1 sources. Sourced from docs/REDIRECT_ALLOWLIST.md §3.
- tier_2_allowlist - Hard-coded Phase 4 allowlist for Tier 2 metadata sources (OpenAlex, Semantic Scholar, DOAJ). Sourced from docs/SOURCES.md §1 (the Tier 2 table) and docs/REDIRECT_ALLOWLIST.md §3 (same redirect-allowlist policy as Tier 1, distinct source keys).
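These constructors are described only as hard-coded lists. The sketch below shows one way they might be combined, assuming a SourceAllowlist shape of a source key plus a static host slice, with placeholder example.org hosts standing in for the real entries in docs/REDIRECT_ALLOWLIST.md §3 and docs/SOURCES.md §1. The return types, the "tier1-example" / "tier2-example" keys, and merge_allowlists are all hypothetical; only the "oa-publisher" key comes from this page. Rejecting duplicate keys during the merge is one way to enforce the "distinct source keys" requirement stated for Tier 2.

```rust
use std::collections::HashMap;

/// Assumed shape: a source key plus the hosts its redirects may reach.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceAllowlist {
    pub source: &'static str,
    pub hosts: &'static [&'static str],
}

// Placeholder entries only; the real host lists live in the docs.
pub fn tier_1_allowlist() -> Vec<SourceAllowlist> {
    vec![SourceAllowlist { source: "tier1-example", hosts: &["t1.example.org"] }]
}

pub fn tier_2_allowlist() -> Vec<SourceAllowlist> {
    vec![SourceAllowlist {
        source: "tier2-example",
        hosts: &["t2.example.org", "cdn.t2.example.org"],
    }]
}

pub fn oa_publisher_allowlist() -> SourceAllowlist {
    SourceAllowlist { source: "oa-publisher", hosts: &["repo.example.org"] }
}

/// Builds the per-source map, failing on a duplicate key so two entries can
/// never silently merge into (and widen) one source's allowlist.
pub fn merge_allowlists(
    lists: impl IntoIterator<Item = SourceAllowlist>,
) -> Result<HashMap<&'static str, SourceAllowlist>, String> {
    let mut by_source: HashMap<&'static str, SourceAllowlist> = HashMap::new();
    for entry in lists {
        if by_source.contains_key(entry.source) {
            return Err(format!("duplicate source key: {}", entry.source));
        }
        by_source.insert(entry.source, entry);
    }
    Ok(by_source)
}
```

Each entry in the merged map would then seed one per-source client, matching the one-reqwest::Client-per-source design described above.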