commonmeta — a Rust port of front-matter/commonmeta.
Convert scholarly metadata between formats. The native model is Data;
format modules read into it and write out of it.
Re-exports
- pub use data::Citation;
- pub use data::Data;
- pub use error::Error;
- pub use error::Result;
- pub use schema_utils::SCHEMA_JSON;
Modules
- `author_utils`
- `cmd`
- `constants`: Controlled vocabularies and cross-format type/role translation tables.
- `crockford`: Generate, encode and decode random base32 identifiers. This encoder/decoder:
- `crossref`
- `data`: Core Commonmeta data model.
- `date_utils`: Date and datetime utilities.
- `doi_utils`: Utilities for working with DOIs.
- `error`
- `geonames`: GeoNames populated-places reference data.
- `io_utils`
- `progress`
- `pubmed`
- `ror_countries`
- `schema_utils`: JSON Schema and XSD validation utilities.
- `spdx`: SPDX license vocabulary lookup.
- `utils`
- `vocabularies`: Embedded controlled vocabulary data files.
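The `crockford` module above deals with base32 identifiers. As a rough illustration of the encoding itself, here is a minimal sketch of Crockford base32 for a plain `u64`. The function names `encode` and `decode` are hypothetical and are not taken from the crate; the sketch omits random generation and any checksum handling the module may implement.

```rust
/// Crockford base32 alphabet: digits plus uppercase letters, without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a number as a Crockford base32 string (no checksum, no padding).
/// Hypothetical helper for illustration only.
fn encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    // Collect the least significant 5-bit digit first, then reverse.
    while n > 0 {
        digits.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decode a Crockford base32 string. Input is case-insensitive, hyphens are
/// ignored, I and L are read as 1, and O is read as 0. Returns `None` for
/// empty input, unknown symbols, or values that overflow `u64`.
fn decode(s: &str) -> Option<u64> {
    let mut n: u64 = 0;
    let mut seen_digit = false;
    for c in s.chars() {
        if c == '-' {
            continue;
        }
        let symbol = match c.to_ascii_uppercase() {
            'I' | 'L' => '1',
            'O' => '0',
            other => other,
        };
        let value = ALPHABET.iter().position(|&b| b as char == symbol)? as u64;
        n = n.checked_mul(32)?.checked_add(value)?;
        seen_digit = true;
    }
    if seen_digit {
        Some(n)
    } else {
        None
    }
}
```

With this sketch, `encode(1234)` yields `"16J"`, and `decode("16j")` and `decode("1-6J")` both recover `1234`, since decoding tolerates lowercase input and hyphen separators.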
Structs
- `AffiliationMatch`: A single match result from the ROR affiliation API.
- `FillReport`
- `PersonAffiliation`
- `PushResult`: The outcome of pushing a single record to InvenioRDM.
- `Ror`
- `RorRelease`: Metadata about a ROR data release published on Zenodo.
- `ValidationError`: A single record that failed commonmeta v1.0 schema validation.
- `ValidationReport`: Summary returned by `validate_sqlite`.
Enums
- `JunctionTable`: Populate `works_references` for all existing rows in `works` that have no entry yet. Reads metadata blobs in streaming batches, extracts resolved DOI reference IDs, and inserts them with `INSERT OR IGNORE` (safe to re-run).
Constants
Functions
- `backfill_junction_tables`: Backfill one or more junction tables (`works_orcid`, `works_ror`, `works_references`) for every row in `works`. `providers` restricts to specific provider values (e.g. `["Crossref"]`); empty = all providers. Reads blobs in 50 k-row streaming batches; uses `INSERT OR IGNORE` so it is safe to re-run or interrupt and resume. Returns `(works_scanned, rows_inserted)`.
- `backfill_works_references`: Convenience wrapper: backfill only `works_references`.
- `convert`: Read from one format and write to another in a single call.
- `convert_citation`: Like `convert`, but passes CSL `style` and `locale` through to the citation writer.
- `count_sqlite_works`: Return the total number of rows in the `works` table of a commonmeta SQLite database; useful for reporting the cumulative count after an upsert.
- `crossref_fetch_page_with_cursor`: Fetch one page of Crossref works using cursor-based pagination.
- `download_ror_all`: Convenience: fetch the latest release metadata then immediately download and parse the dump. Returns `(RorRelease, Vec<Ror>, from_cache)`.
- `download_ror_release`: Download and parse the zip archive described by `release`. The zip is cached locally for 30 days so repeat installs of the same version skip the network round-trip. Returns `(records, from_cache)`.
- `enrich_citations`: Populate `data.citations` from the `works_references` junction table, merging with any citations already present (e.g. from DataCite/OpenAlex). No-op when `db_path` does not exist or the lookup fails.
- `enrich_ror_locations`: Enrich missing `geonames_details` fields for each location in a `Ror` record using the locally installed GeoNames SQLite database. Only fills empty fields.
- `fetch_all_crossref_by_orcid`: Fetch all works by ORCID from Crossref using cursor-based pagination.
- `fetch_all_datacite_by_orcid`: Fetch all works by ORCID from DataCite, iterating pages until exhausted.
- `fetch_crossref_by_orcid`: Fetch works by ORCID from Crossref, sorted by date descending. `page` is 1-based; Crossref offset is computed as `(page-1) * limit`.
- `fetch_crossref_by_ror`: Fetch works by ROR from Crossref, sorted by date descending. `page` is 1-based; Crossref offset is computed as `(page-1) * limit`.
- `fetch_datacite_by_orcid`: Fetch works by ORCID from DataCite, sorted by date descending. `page` is 1-based and maps directly to DataCite’s `page[number]` parameter.
- `fetch_datacite_by_ror`: Fetch works by ROR from DataCite, sorted by date descending. `page` is 1-based and maps directly to DataCite’s `page[number]` parameter.
- `fetch_geonames_sqlite`: Look up a GeoNames place by its integer `id` from the local SQLite database.
installed_ geonames_ date - Return the GeoNames install date stored in the local database’s
settingstable, orNonewhen the database does not exist or no date has been recorded yet. - fetch_
installed_ orcid_ public_ data_ version - Read the installed ORCID Public Data File version from the
settingstable. Read the installed ORCID Public Data File version from thesettingstable. ReturnsNonewhen no version has been recorded yet. - fetch_
installed_ ror_ version - Return the ROR version string stored in the local database’s
settingstable, orNonewhen the database does not exist or no version has been recorded yet. - fetch_
installed_ vraix_ date - Return the
vraix_date(pidbox install date,YYYY-MM-DD) stored in the local works database’ssettingstable, orNonewhen the database does not exist or no date has been recorded yet. - fetch_
latest_ orcid_ release - Fetch the latest ORCID Public Data File release metadata from figshare. Fetch the latest ORCID Public Data File release from figshare.
- fetch_
latest_ ror_ release - Fetch metadata for the latest ROR data release from Zenodo (InvenioRDM) without downloading the full archive. Returns the version tag, release date, Zenodo record ID, zip filename, and direct download URL.
- `fetch_orcid`: Fetch a person from the ORCID public API and return their record as `Data`. Accepts a bare ORCID iD (`0000-0003-1419-2405`) or a full ORCID URL.
- `fetch_orcid_affiliations`: Fetch employment and education records from the ORCID public API, returning them as a combined list sorted by start date. Supersedes `fetch_orcid_employments` when both affiliation types are needed.
- `fetch_orcid_affiliations_sqlite`: Read affiliations stored in the `affiliations` column of the `people` SQLite table. Returns an empty vec when the record is absent or the column is empty.
- `fetch_orcid_employments`: Fetch employment records from the ORCID public API for the given ORCID URL. Returns affiliations sorted by start date. When `db_path` is provided, non-ROR organization identifiers (GRID, ISNI, FundRef, Wikidata) are resolved to ROR IDs via the local `organizations` SQLite table.
- `fetch_orcid_person_json`: Fetch a person from the ORCID public API and return the raw ORCID 3.0 person JSON conforming to `orcid_schema_v3.0.json`.
- `fetch_orcid_person_json_sqlite`: Look up a person from a local `people` SQLite table and return the raw ORCID 3.0 person JSON conforming to `orcid_schema_v3.0.json`.
- `fetch_orcid_sqlite`: Look up a person from a local `people` SQLite table and return their record as `Data`. Accepts a bare ORCID iD or a full ORCID URL. Handles both XML blobs (bulk import) and JSON blobs (single-record API import).
- `fetch_orcid_with_json`: Fetch a person from the ORCID public API and return both the parsed `Data` and the raw ORCID 3.0 person JSON in a single HTTP request.
- `fetch_orcid_work_dois`: Fetch the DOIs of all works listed on an ORCID profile, returned as normalised `https://doi.org/…` URLs in response order.
- `fetch_reference_works`: Fetch the referenced works of `data` that have a DOI.
- `fetch_ror`: Fetch a ROR organization by its ROR URL or other organization identifier from the ROR API. Returns the record converted to the commonmeta `Data` model.
- `fetch_ror_raw`: Fetch the raw `Ror` struct from the ROR v2 API, bypassing the lossy `Data` conversion.
- `fetch_ror_raw_sqlite`: Return the raw `Ror` struct for a given ROR URL from the local SQLite database, bypassing the lossy `Data` conversion.
- `fetch_ror_sqlite`: Look up a ROR organization by its full URL (e.g. `https://ror.org/012xzy7a9`) from a local SQLite database written by `write_ror_sqlite`. Returns the record converted to the commonmeta `Data` model, or an error when not found.
- `fetch_vraix_dump`: Fetch commonmeta records from a VRAIX daily dump for `from` (“crossref” or “datacite”) and `date` (`YYYY-MM-DD`).
- `fill_sqlite`: Fill missing or convertible affiliation/organization identifiers in the works database.
- `flush_dragoman_cache`: Delete all rows from the VRAIX-schema transport table in the dragoman cache at `path` and VACUUM to reclaim disk space. Call this after a successful `stream_pidbox_to_sqlite` import to prevent re-importing the same records on the next run. Returns the number of rows deleted.
- `get_all_sqlite_settings`: Return all rows from the `settings` table, sorted by key.
- `get_sqlite_setting`: Read a value from the `settings` table. Returns `None` when the key is absent.
- `import_orcid_person`: Fetch a single person record from the ORCID public API, upsert the person into `people_db`, then fetch their works from Crossref and DataCite and upsert them into `works_db` (may be the same path as `people_db`). Accepts a bare ORCID iD or a full ORCID URL. Returns the number of works written.
- `import_orcid_public_data`: Download and import the ORCID Public Data File summaries into the `people` table at `output_path`. Skips the download if the current version is already installed; resumes partial downloads automatically.
- `import_prefixes`: Bulk-resolve all distinct DOI prefixes in the works database against the DOI RA API and populate the `prefixes` table.
- `install_geonames_sqlite`: Download the GeoNames cities500 dump, admin1 codes, and country info; parse them; and write the records to the `geonames`, `geonames_admin1`, and `geonames_countries` tables in the SQLite database at `path`. Caches all three files for 30 days; other tables in the database are untouched. Returns `(record_count, from_cache)`.
- `match_ror_affiliation`: Match a free-text affiliation string against ROR organizations using the ROR v2 affiliation endpoint.
- `match_ror_affiliation_sqlite`: Match a free-text affiliation string against a local ROR SQLite database written by `write_ror_sqlite`. Uses Turso’s Tantivy-backed FTS index for full-text search across all organization name variants. Returns results in relevance order with `chosen` set on the top result.
- `prepare_commonmeta`: Prepare a `Data` record for commonmeta v1.0 JSON serialization: normalises IDs, strips schema-private reference fields, clears invalid ROR/ORCID ids, etc.
- `push_inveniordm`: Create-or-update, then publish, a list of records in InvenioRDM.
- `put_inveniordm`: Create-or-update, then publish, a single record in InvenioRDM.
- `read`: Read a single record from `from` format, without writing it back out.
- `read_parquet`: Read a list of commonmeta records back from the Parquet schema written by `write_parquet`. Lossless: each record is restored from its `json` column, the complete original serialization.
- `read_ror_sqlite`: Read a page of ROR organizations from the local SQLite database as `Data` records. `limit` caps records returned; `offset` is the zero-based row offset. `country_code` filters by ISO 3166-1 alpha-2 code; `query` applies FTS.
- `read_ror_sqlite_raw`
- `read_sqlite_by_arxiv`
- `read_sqlite_by_citation`: Fetch all works that cite `doi` (i.e. have it in their reference list), ordered by `date_published` descending.
- `read_sqlite_by_dois`: Fetch all works whose DOI matches any entry in `dois` in a single SQL query. DOIs are normalised before lookup; records not found are silently omitted.
- `read_sqlite_by_id`: Look up a single record by its `id` (DOI URL) in a commonmeta SQLite database. Returns `None` when the record is not present.
- `read_sqlite_by_openalex`
- `read_sqlite_by_orcid`: Fetch all works with a contributor whose ORCID matches `orcid_url`, ordered by `date_published` descending.
- `read_sqlite_by_pmcid`
- `read_sqlite_by_pmid`
- `read_sqlite_by_ror`: Fetch all works with a contributor affiliated with `ror_url`, ordered by `date_published` descending.
- `read_sqlite_commonmeta`: Read records from a commonmeta SQLite database written by `write_sqlite`.
- `read_vraix_sqlite`: Read commonmeta records from a VRAIX daily dump SQLite file already on disk at `sqlite_path`, e.g. an already-downloaded `crossref-2026-06-14.sqlite3`.
- `rebuild_organizations_fts`: Drop and rebuild the `organizations_fts` FTS5 virtual table.
- `rebuild_people_fts`: Drop and rebuild the `people_fts` FTS5 virtual table.
- `rebuild_works_fts`: Drop and rebuild the `works_fts` FTS5 virtual table from the content in `works`.
- `run_cli`: Run any commonmeta CLI subcommand from a list of arguments.
- `run_migrations`: Apply any pending schema migrations to an existing database, printing per-step progress and timing to stderr. Returns `(steps_applied, version)`.
- `sample_ror_sqlite`: Return a random sample of ROR organizations from the local SQLite database.
- `sample_ror_sqlite_raw`
- `set_sqlite_setting`: Write a key/value pair into the `settings` table of a commonmeta SQLite database.
- `stream_cache_orcid_to_people_sqlite`: Read ORCID person rows written by dragoman into `cache.sqlite3` and upsert them into the `people` table at `people_path`.
- `stream_pidbox_to_sqlite`: Stream the pidbox dump (a mixed-source VRAIX SQLite file containing crossref, datacite, and ROR rows) directly to a commonmeta SQLite database. Each row is routed to the appropriate parser by its `source_id`; ROR rows are skipped. When `update` is false the output file is recreated; when true rows are upserted by `id`. Returns the number of records written.
- `stream_pmc_ids_to_sqlite`: Stream a gzip-compressed PMC-ids CSV file into the commonmeta SQLite database at `output_path`, upserting rows that have a DOI. Pass `limit = 0` to process all rows. Returns the number of records written.
- `stream_vraix_to_sqlite`: Stream a VRAIX daily dump at `input_path` directly to a commonmeta SQLite database at `output_path` in batches of 10 000 rows, converting with the `from`-specific parser and writing each batch in a single transaction. `limit` caps total records written; pass `0` for all rows. When `update` is false the output file is deleted and recreated (default). When `update` is true the existing file is kept and rows are upserted by their `id` primary key: new rows are inserted, existing rows are replaced. Returns the number of records written. No `Vec<Data>` is held for the whole file, so peak memory is proportional to one batch, not the whole dump.
- `stream_zst_pidbox_to_sqlite`: Like `stream_pidbox_to_sqlite` but reads directly from the zstd-compressed pidbox file without decompressing it to disk first. Requires the database to be well-organised (VACUUM’d or sequential bulk inserts) so that pages appear in DFS pre-order.
- `upsert_sqlite`: Like `write_sqlite` but opens an existing database instead of recreating it. Rows whose `id` already exists are replaced; new rows are inserted.
- `validate_sqlite`: Validate records in a commonmeta SQLite database against the v1.0 JSON schema.
- `write`: Write an already-loaded record to `to` format.
- `write_archive`: Render `list` to `to` format, split into entries of at most `batch_size` records each, suitable for packing into an archive via `io_utils::write_zip_archive` / `io_utils::write_tar_gz_archive`. `base_name` (e.g. `"out.json"`) names the single entry directly when there’s only one batch, or gets a numbered suffix (`"out-00000.json"`, `"out-00001.json"`, …) when there are several.
- `write_archive_citation`: Like `write_archive`, but passes CSL `style`/`locale` through to the citation writer when `to == "citation"`.
- `write_list`: Render a list of records to `to` format as a single buffer: a JSON array for object-shaped formats (`commonmeta`, `csl`, `datacite`, `inveniordm`, `schemaorg`, `ror`), or newline-joined output for line/document-shaped formats (e.g. `bibtex`, `ris`, `crossref_xml`).
- `write_list_citation`: Like `write_list`, but passes CSL `style`/`locale` through to the citation writer when `to == "citation"` (ignored for every other format, same as `convert_citation`/`write_citation`).
- `write_orcid_commonmeta`: Convert ORCID 3.0 person JSON + resolved affiliations + works to a commonmeta array validated against the commonmeta v1.0 schema. `works` may be empty.
- `write_orcid_inveniordm_yaml`: Serialize a person to InvenioRDM names YAML (list form). `person_json` is the ORCID 3.0 `/person` response; `affiliations` comes from `fetch_orcid_employments`.
- `write_orcid_json`: Serialize an ORCID 3.0 person JSON value (from `fetch_orcid_person_json` or `fetch_orcid_person_json_sqlite`) to bytes.
- `write_parquet`: Write a list of commonmeta records as a single Parquet file. Alongside a flattened tabular projection of each record’s fields (for filtering in tools like DuckDB without parsing JSON), every row also carries a `json` column with the record’s complete serialization, so `read_parquet` round-trips losslessly.
- `write_ror_commonmeta`: Serialize a ROR organization `Data` as a v1.0-compliant commonmeta JSON array.
- `write_ror_json`: Write a ROR-derived record as raw ROR-shaped JSON (as opposed to `write("ror", data)`, which produces InvenioRDM vocabulary YAML).
- `write_ror_sqlite`: Write a list of ROR records to a SQLite3 database at `path` with an `organizations` table. Existing file is deleted first. JSON array columns (`types`, `locations`, `names`, `external_ids`) are queryable via SQLite’s `json_each()`. The `metadata` column stores the full ROR JSON as a zstd-compressed BLOB for lossless round-trips.
- `write_ror_v2_json`: Serialize a `Ror` record as ROR v2-compatible JSON, converting empty-string `lang` and `preferred` fields to JSON `null` to match the canonical API output.
- `write_sqlite`: Write `list` as a SQLite3 database with a `works` table whose columns mirror the commonmeta v1.0 schema. Simple string fields are stored as TEXT; complex fields are stored as compact JSON TEXT. Any existing file at `path` is deleted first.
- `write_vraix_table_parquet`: Write a VRAIX dump’s transport table (e.g. `pid_records`) to a single Parquet file’s bytes, using its raw columns (`pid`, `source_id`, `raw_metadata`, …) as-is, not converted to commonmeta `Data` the way `read_vraix_sqlite` is. For analytics over the dump itself (e.g. via DataFusion/Polars/DuckDB), not for ingesting it as commonmeta records. `batch_size` controls how many rows land in each internal Parquet row group (see `formats::commonmeta::write_parquet_all`’s analogous `ROW_GROUP_SIZE` for why this matters for large dumps).
- `write_with_style`: Like `write`, but forwards `style` and `locale` to the citation writer. For non-`"citation"` formats both parameters are ignored.
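Two conventions described in the function list can be sketched in a few lines: the 1-based `page` to offset mapping `(page-1) * limit` noted for the Crossref fetchers, and the numbered entry names described for `write_archive`. The helper names `page_to_offset` and `entry_names` are hypothetical and do not come from the crate. Treating page 0 like page 1, returning no names for zero batches, and splitting `base_name` at its last dot are assumptions of this sketch, not documented behaviour.

```rust
/// Convert a 1-based page number to a zero-based offset: (page - 1) * limit.
/// Assumption: page 0 is treated like page 1 instead of underflowing.
fn page_to_offset(page: u32, limit: u32) -> u64 {
    u64::from(page.saturating_sub(1)) * u64::from(limit)
}

/// Derive archive entry names from `base_name` for a given number of batches.
/// One batch keeps the name as-is; several batches get a zero-padded,
/// five-digit suffix inserted before the extension.
fn entry_names(base_name: &str, batches: usize) -> Vec<String> {
    match batches {
        0 => Vec::new(),
        1 => vec![base_name.to_string()],
        n => {
            // Split "out.json" into the stem "out" and the extension ".json".
            let (stem, ext) = match base_name.rfind('.') {
                Some(i) => base_name.split_at(i),
                None => (base_name, ""),
            };
            (0..n).map(|i| format!("{stem}-{i:05}{ext}")).collect()
        }
    }
}
```

For example, `page_to_offset(3, 20)` gives `40`, and `entry_names("out.json", 2)` gives `out-00000.json` and `out-00001.json`, matching the naming pattern quoted in the `write_archive` description.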