Structs
- ExtractionMeta - Metadata written alongside each DataLab extraction cache entry as meta.json.
- WorkTextResult - Result of extracting text from a work’s PDF.
- ZoteroItemInfo - Brief Zotero library info for a work matched by DOI.
Enums
- PdfSource - Where the PDF was obtained from.
- ProcessingMode
- WorkTextError - Errors from the work_text pipeline.
Functions
- datalab_cache_dir_path - Return the local cache directory path for cache_id, if determinable.
- datalab_cached_item_keys - Return the keys of all locally cached DataLab extractions.
- datalab_cached_json - Return the cached JSON for cache_id if it exists, otherwise None.
- datalab_cached_markdown - Return the cached markdown for cache_id if it exists, otherwise None.
- do_extract - Extract text from PDF bytes, routing through DataLab if datalab is Some.
- download_extraction_from_zotero - Download papers_extract_{item_key}.zip from Zotero (identified by att_key) and restore it to the local cache directory.
- extract_text_bytes - Extract text from PDF bytes using pdf-extract.
- find_work_in_zotero - Check if a work exists in the Zotero library, matched by DOI.
- poll_zotero_for_work - Poll Zotero for a work by DOI. Waits 5s initially, then polls every 2s for up to ~2 min.
- read_extraction_meta - Read the meta.json for cache_id from the local DataLab cache, if present.
- try_zotero - Try to find and download a PDF from Zotero (local storage first, then remote API).
- upload_extraction_to_zotero - Upload the local DataLab cache for item_key to Zotero as papers_extract_{item_key}.zip attached to that same item.
- work_text - Download and extract the full text of a scholarly work.
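
The polling schedule described for poll_zotero_for_work (an initial 5s wait, then a check every 2s for up to roughly 2 minutes) can be illustrated with a small generic loop. This is only a sketch of that timing pattern, not the crate's implementation: PollSchedule, documented, and poll_until are hypothetical names, the lookup is a plain closure standing in for the Zotero DOI query, and counting the initial wait against the overall time budget is an assumption.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Timing knobs for the polling loop.
/// Hypothetical type for illustration; not part of the documented module.
#[derive(Debug, Clone, Copy)]
pub struct PollSchedule {
    pub initial_wait: Duration,
    pub interval: Duration,
    pub timeout: Duration,
}

impl PollSchedule {
    /// Values taken from the description above:
    /// 5s initial wait, 2s between polls, about 2 minutes in total.
    pub fn documented() -> Self {
        Self {
            initial_wait: Duration::from_secs(5),
            interval: Duration::from_secs(2),
            timeout: Duration::from_secs(120),
        }
    }
}

/// Calls `lookup` repeatedly until it returns `Some`, or until another
/// interval would exceed the time budget.
/// Assumption: the budget starts before the initial wait.
pub fn poll_until<T>(
    schedule: PollSchedule,
    mut lookup: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + schedule.timeout;
    sleep(schedule.initial_wait);
    loop {
        if let Some(found) = lookup() {
            return Some(found);
        }
        // Give up if the next sleep would carry us past the deadline.
        if Instant::now() + schedule.interval > deadline {
            return None;
        }
        sleep(schedule.interval);
    }
}
```

A caller would pass PollSchedule::documented() together with a closure that performs one lookup by DOI and returns Some on a match. Keeping the timings in a struct lets the same loop run with millisecond values in tests.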