doiget fetch <ref> subcommand.
Phase 1 scope:
- arXiv refs: full end-to-end. PDF bytes are fetched via `doiget_core::sources::arxiv::ArxivSource`, the `[doiget]` extension table is populated with the resolved license, source, size, and `fetched_at`, and the result is written to the on-disk store with both the metadata TOML and the PDF.
- DOI refs: Crossref metadata + Unpaywall license enrichment + an OA PDF fetch when Unpaywall's `best_oa_location.url_for_pdf` (or `best_oa_location.url`) resolves to a host on the synthetic `"oa-publisher"` allowlist (`docs/REDIRECT_ALLOWLIST.md` §3). The OA URL host check is informed-best-effort; if the host is not on the allowlist or the body fails the magic-byte check, the orchestrator logs a `Fetch` `err` row under `source = "oa-publisher"` and falls back to metadata-only success, since the metadata is still useful.
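The OA fallback described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `BestOaLocation`, `OaOutcome`, `pick_oa_url`, `host_of`, `looks_like_pdf`, and `classify_oa_fetch` are hypothetical names, the magic bytes are assumed to be the `%PDF-` prefix, and the host parser is deliberately simplified (no IPv6 literals).

```rust
/// Hypothetical stand-in for the two Unpaywall `best_oa_location` fields used here.
pub struct BestOaLocation {
    pub url_for_pdf: Option<String>,
    pub url: Option<String>,
}

/// Outcome of the OA PDF attempt; both variants still count as a successful fetch.
#[derive(Debug, PartialEq)]
pub enum OaOutcome {
    Pdf(Vec<u8>),
    /// Carries the reason that would go into the `Fetch` `err` row.
    MetadataOnly(String),
}

/// Prefer `url_for_pdf`, falling back to `url`.
pub fn pick_oa_url(loc: &BestOaLocation) -> Option<&str> {
    loc.url_for_pdf.as_deref().or(loc.url.as_deref())
}

/// Extract the lowercase host from an http(s) URL (simplified, std-only).
pub fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop any userinfo and port.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Assumed magic-byte check: a PDF body starts with `%PDF-`.
pub fn looks_like_pdf(body: &[u8]) -> bool {
    body.starts_with(b"%PDF-")
}

/// Apply the host check, then the magic-byte check; otherwise fall back.
pub fn classify_oa_fetch(url: &str, allowlist: &[&str], body: &[u8]) -> OaOutcome {
    match host_of(url) {
        Some(host) if allowlist.iter().any(|allowed| *allowed == host) => {
            if looks_like_pdf(body) {
                OaOutcome::Pdf(body.to_vec())
            } else {
                OaOutcome::MetadataOnly("body failed magic-byte check".to_string())
            }
        }
        Some(host) => OaOutcome::MetadataOnly(format!("host {host} not on allowlist")),
        None => OaOutcome::MetadataOnly("unparseable OA URL".to_string()),
    }
}
```

Keeping the fallback as a value (`MetadataOnly`) rather than an error mirrors the stated behavior: a rejected OA URL degrades the result but does not fail the fetch.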
§Provenance contract
Per docs/PROVENANCE_LOG.md §3, every invocation emits at least one
SessionStart, one or more Fetch rows (one per source consulted), one
StoreWrite row on success, and one SessionEnd. Each Fetch row is
appended by the underlying Source impl; the orchestrator owns the
session-bookend rows and the StoreWrite row.
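As an illustration of the contract, the sketch below checks a recorded row sequence against it. `RowKind` and `check_session` are hypothetical names, not part of the crate, and the ordering rule (start row first, end row last) is an assumption drawn from the "bookend" wording rather than from `docs/PROVENANCE_LOG.md` itself.

```rust
/// Hypothetical tag for the four row kinds named in the contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowKind {
    SessionStart,
    Fetch,
    StoreWrite,
    SessionEnd,
}

/// Check one invocation's rows against the contract as summarized above.
/// `succeeded` says whether the invocation ended in success.
pub fn check_session(rows: &[RowKind], succeeded: bool) -> Result<(), String> {
    use RowKind::*;

    // Assumed ordering: the session rows bookend everything else.
    if rows.first() != Some(&SessionStart) {
        return Err("first row must be SessionStart".to_string());
    }
    if rows.last() != Some(&SessionEnd) {
        return Err("last row must be SessionEnd".to_string());
    }

    let count = |kind: RowKind| rows.iter().filter(|r| **r == kind).count();

    if count(SessionEnd) != 1 {
        return Err("expected exactly one SessionEnd".to_string());
    }
    // One Fetch row per source consulted, so at least one overall.
    if count(Fetch) == 0 {
        return Err("expected at least one Fetch row".to_string());
    }
    // StoreWrite is only required on success.
    if succeeded && count(StoreWrite) != 1 {
        return Err("expected one StoreWrite row on success".to_string());
    }
    Ok(())
}
```

A check of this shape could sit in an integration test that reads back the provenance log after a fetch.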
§Configuration surface
Hard-coded paths with env-var overrides; full config.toml plumbing
arrives in a follow-up. See docs/CONFIG.md for the eventual surface.
| Env var | Default | Purpose |
|---|---|---|
| `DOIGET_STORE_ROOT` | `$HOME/papers` (or `%USERPROFILE%\papers` on Windows) | Filesystem store root |
| `DOIGET_LOG_PATH` | `<config>/doiget/access.jsonl` | Provenance log file |
| `DOIGET_CONTACT_EMAIL` | `doiget@localhost` | Polite-pool contact email (User-Agent and Crossref) |
| `DOIGET_UNPAYWALL_EMAIL` | (= contact email) | Unpaywall query-string email |
| `DOIGET_ARXIV_BASE` | `https://arxiv.org` | arXiv source base (test override) |
| `DOIGET_CROSSREF_BASE` | `https://api.crossref.org` | Crossref source base (test override) |
| `DOIGET_UNPAYWALL_BASE` | `https://api.unpaywall.org/v2` | Unpaywall source base (test override) |
| `DOIGET_OA_PUBLISHER_BASE` | (production allowlist) | OA publisher host allowlist override (test override) |
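The override-with-default pattern in the table can be sketched like this. `ResolvedEnv` and `resolve_env` are hypothetical names; the lookup is passed in as a closure so the logic can be exercised without touching the real process environment. The fallback to `.` when no home directory variable is set is an assumption of this sketch, and `DOIGET_LOG_PATH` and `DOIGET_OA_PUBLISHER_BASE` are left out because their defaults are not spelled out above.

```rust
use std::path::PathBuf;

/// Hypothetical bundle of the resolved settings from the table above.
#[derive(Debug)]
pub struct ResolvedEnv {
    pub store_root: PathBuf,
    pub contact_email: String,
    pub unpaywall_email: String,
    pub arxiv_base: String,
    pub crossref_base: String,
    pub unpaywall_base: String,
}

/// Resolve each setting from `get`, falling back to the documented default.
pub fn resolve_env<F>(get: F) -> ResolvedEnv
where
    F: Fn(&str) -> Option<String>,
{
    // The home directory variable differs by platform.
    let home_var = if cfg!(windows) { "USERPROFILE" } else { "HOME" };
    let store_root = match get("DOIGET_STORE_ROOT") {
        Some(root) => PathBuf::from(root),
        // Assumption: use the current directory if no home variable is set.
        None => PathBuf::from(get(home_var).unwrap_or_else(|| ".".to_string())).join("papers"),
    };

    let contact_email =
        get("DOIGET_CONTACT_EMAIL").unwrap_or_else(|| "doiget@localhost".to_string());
    // The Unpaywall email defaults to the contact email.
    let unpaywall_email =
        get("DOIGET_UNPAYWALL_EMAIL").unwrap_or_else(|| contact_email.clone());

    ResolvedEnv {
        store_root,
        contact_email,
        unpaywall_email,
        arxiv_base: get("DOIGET_ARXIV_BASE")
            .unwrap_or_else(|| "https://arxiv.org".to_string()),
        crossref_base: get("DOIGET_CROSSREF_BASE")
            .unwrap_or_else(|| "https://api.crossref.org".to_string()),
        unpaywall_base: get("DOIGET_UNPAYWALL_BASE")
            .unwrap_or_else(|| "https://api.unpaywall.org/v2".to_string()),
    }
}
```

Against the real environment the call would be `resolve_env(|name| std::env::var(name).ok())`; a test can pass a closure over a `HashMap` instead.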
Structs§
- `CliExit`: Carries a `docs/ERRORS.md` §4 process exit code out of a CLI command to `main`, which owns the actual `std::process::exit` (calling it inside `run_with_options` would kill in-process integration tests). The human-readable `error[CODE]: …` line has ALREADY been written to stderr by `render_fetch_error` before this is constructed, so `main` must NOT print it again. Issue #119.
- `FetchPlan`: Structured dry-run preview returned by `--dry-run` and the `dry_run: true` MCP variants. Wire shape matches ADR-0022 §1 and `docs/MCP_TOOLS.md` §10.
- `PdfSourcePlan`: Per-PDF-source row inside `FetchPlan::pdf_sources`.
- `RateLimitBudget`: Per-process rate-limit context surfaced alongside `FetchPlan` so an agent can predict the politeness ceiling without a separate `doiget_capability_profile` round-trip.
Functions§
- `build_dry_run_envelope`: Build the dry-run envelope as a `serde_json::Value`, without writing anywhere. Used by both the CLI (which prints it to stdout) and the MCP tool wrapper (which routes the bytes via JSON-RPC). Wire shape:
- `build_fetch_plan`: Build the dry-run preview (`FetchPlan`) for the given ref and store root, without contacting the network or filesystem.
- `emit_dry_run_plan_to_stdout`: Serialize the dry-run envelope and write it to stdout. Used by the `--dry-run` flag on `doiget fetch` and `doiget batch`. The envelope shape matches ADR-0022 §1 / `docs/MCP_TOOLS.md` §10.
- `run_with_options`: Run the `doiget fetch <ref>` subcommand.