commonmeta — a Rust port of front-matter/commonmeta.
Convert scholarly metadata between formats. The native model is Data;
format modules read into it and write out of it.
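The hub-and-spoke idea described above (every format is read into one native model and written out of it) can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `Record` is a tiny stand-in for the crate's `Data`, and the two toy formats and all function names are invented for illustration, not taken from the crate's API.

```rust
/// Stand-in for the crate's `Data` model (assumption: the real model has many more fields).
#[derive(Debug, Default, PartialEq)]
struct Record {
    id: String,
    title: String,
}

/// Hypothetical reader: parse a toy "id | title" line into the hub model.
fn read_pipe(input: &str) -> Option<Record> {
    let (id, title) = input.split_once('|')?;
    Some(Record {
        id: id.trim().to_string(),
        title: title.trim().to_string(),
    })
}

/// Hypothetical writer: render the hub model as a tab-separated line.
fn write_tsv(record: &Record) -> String {
    format!("{}\t{}", record.id, record.title)
}

/// Conversion is a read into the hub model followed by a write out of it.
fn convert_pipe_to_tsv(input: &str) -> Option<String> {
    read_pipe(input).map(|record| write_tsv(&record))
}
```

With this shape, supporting a new format means adding one reader and one writer against the hub model, rather than one converter per pair of formats.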
Re-exports
- `pub use data::Data;`
- `pub use error::Error;`
- `pub use error::Result;`
- `pub use schema_utils::SCHEMA_JSON;`
Modules
- `author_utils`
- `constants` - Controlled vocabularies and cross-format type/role translation tables.
- `crockford` - Generate, encode and decode random base32 identifiers. This encoder/decoder:
- `crossref`
- `data` - Core Commonmeta data model.
- `date_utils` - Date and datetime utilities.
- `doi_utils` - Utilities for working with DOIs.
- `error`
- `io_utils`
- `progress`
- `schema_utils` - JSON Schema and XSD validation utilities.
- `spdx` - SPDX license vocabulary lookup.
- `utils`
- `vocabularies` - Embedded controlled vocabulary data files.
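As background for the `crockford` module listed above, here is a minimal sketch of Crockford base32 encoding and decoding for unsigned integers, using only the standard library. It follows the published Crockford alphabet (no I, L, O or U; on decode, I and L read as 1 and O reads as 0). The function names and signatures are hypothetical and are not the module's actual API; random generation and checksums are left out.

```rust
/// Crockford base32 alphabet: digits and uppercase letters without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a number as Crockford base32 (hypothetical helper, not the crate's API).
fn encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // Collect the least significant 5-bit symbol first.
        digits.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decode a Crockford base32 string, ignoring case and hyphens.
/// Returns `None` for empty input, unknown symbols, or overflow.
fn decode(s: &str) -> Option<u64> {
    let mut n: u64 = 0;
    let mut seen_symbol = false;
    for c in s.chars() {
        let c = match c.to_ascii_uppercase() {
            '-' => continue,  // hyphens are allowed as visual separators
            'I' | 'L' => '1', // commonly confused with 1
            'O' => '0',       // commonly confused with 0
            other => other,
        };
        let value = ALPHABET.iter().position(|&b| b as char == c)? as u64;
        n = n.checked_mul(32)?.checked_add(value)?;
        seen_symbol = true;
    }
    if seen_symbol {
        Some(n)
    } else {
        None
    }
}
```

The lenient decoding rules are what make the alphabet convenient for identifiers that people type or read aloud.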
Structs
- `AffiliationMatch` - A single match result from the ROR affiliation API.
- `PushResult` - The outcome of pushing a single record to InvenioRDM.
- `RorRelease` - Metadata about a ROR data release published on Zenodo.
- `ValidationError` - A single record that failed commonmeta v1.0 schema validation.
- `ValidationReport` - Summary returned by `validate_sqlite`.
Constants
Functions
- `convert` - Read from one format and write to another in a single call.
- `convert_citation` - Like `convert`, but passes CSL `style` and `locale` through to the citation writer.
- `count_sqlite_works` - Return the total number of rows in the `works` table of a commonmeta SQLite database, useful for reporting the cumulative count after an upsert.
- `crossref_fetch_page_with_cursor` - Fetch one page of Crossref works using cursor-based pagination.
- `download_ror_all` - Convenience: fetch the latest release metadata then immediately download and parse the dump. Returns `(RorRelease, Vec<Ror>, from_cache)`.
- `download_ror_release` - Download and parse the zip archive described by `release`. The zip is cached locally for 30 days so repeat installs of the same version skip the network round-trip. Returns `(records, from_cache)`.
- `fetch_installed_ror_version` - Return the ROR version string stored in the local database’s `settings` table, or `None` when the database does not exist or no version has been recorded yet.
- `fetch_installed_vraix_date` - Return the `vraix_date` (pidbox install date, `YYYY-MM-DD`) stored in the local works database’s `settings` table, or `None` when the database does not exist or no date has been recorded yet.
- `fetch_latest_ror_release` - Fetch metadata for the latest ROR data release from Zenodo (InvenioRDM) without downloading the full archive. Returns the version tag, release date, Zenodo record ID, zip filename, and direct download URL.
- `fetch_ror` - Fetch a ROR organization by its ROR URL or other organization identifier from the ROR API. Returns the record converted to the commonmeta `Data` model.
- `fetch_ror_sqlite` - Look up a ROR organization by its full URL (e.g. `https://ror.org/012xzy7a9`) from a local SQLite database written by `write_ror_sqlite`. Returns the record converted to the commonmeta `Data` model, or an error when not found.
- `fetch_vraix_dump` - Fetch commonmeta records from a VRAIX daily dump for `from` (“crossref” or “datacite”) and `date` (`YYYY-MM-DD`).
- `match_ror_affiliation` - Match a free-text affiliation string against ROR organizations using the ROR v2 affiliation endpoint.
- `match_ror_affiliation_sqlite` - Match a free-text affiliation string against a local ROR SQLite database written by `write_ror_sqlite`. Uses Turso’s Tantivy-backed FTS index for full-text search across all organization name variants. Returns results in relevance order with `chosen` set on the top result.
- `push_inveniordm` - Create-or-update, then publish, a list of records in InvenioRDM.
- `put_inveniordm` - Create-or-update, then publish, a single record in InvenioRDM.
- `read` - Read a single record from `from` format, without writing it back out.
- `read_parquet` - Read a list of commonmeta records back from the Parquet schema written by `write_parquet`. Lossless: each record is restored from its `json` column, the complete original serialization.
- `read_sqlite_by_id` - Look up a single record by its `id` (DOI URL) in a commonmeta SQLite database. Returns `None` when the record is not present.
- `read_sqlite_commonmeta` - Read records from a commonmeta SQLite database written by `write_sqlite`.
- `read_vraix_sqlite` - Read commonmeta records from a VRAIX daily dump SQLite file already on disk at `sqlite_path`, e.g. an already-downloaded `crossref-2026-06-14.sqlite3`.
- `stream_pidbox_to_sqlite` - Stream the pidbox dump (a mixed-source VRAIX SQLite file containing crossref, datacite, and ROR rows) directly to a commonmeta SQLite database. Each row is routed to the appropriate parser by its `source_id`; ROR rows are skipped. When `update` is false the output file is recreated; when true rows are upserted by `id`. Returns the number of records written.
- `stream_vraix_to_sqlite` - Stream a VRAIX daily dump at `input_path` directly to a commonmeta SQLite database at `output_path` in batches of 10 000 rows, converting with the `from`-specific parser and writing each batch in a single transaction. `limit` caps total records written; pass `0` for all rows. When `update` is false the output file is deleted and recreated (default). When `update` is true the existing file is kept and rows are upserted by their `id` primary key: new rows are inserted, existing rows are replaced. Returns the number of records written. No `Vec<Data>` is held for the whole file, so peak memory is proportional to one batch, not the whole dump.
- `stream_zst_pidbox_to_sqlite` - Like `stream_pidbox_to_sqlite` but reads directly from the zstd-compressed pidbox file without decompressing it to disk first. Requires the database to be well-organised (VACUUM’d or sequential bulk inserts) so that pages appear in DFS pre-order.
- `upsert_sqlite` - Like `write_sqlite` but opens an existing database instead of recreating it. Rows whose `id` already exists are replaced; new rows are inserted.
- `validate_sqlite` - Validate records in a commonmeta SQLite database against the v1.0 JSON schema.
- `write` - Write an already-loaded record to `to` format.
- `write_archive` - Render `list` to `to` format, split into entries of at most `batch_size` records each, suitable for packing into an archive via `io_utils::write_zip_archive` / `io_utils::write_tar_gz_archive`. `base_name` (e.g. `"out.json"`) names the single entry directly when there’s only one batch, or gets a numbered suffix (`"out-00000.json"`, `"out-00001.json"`, …) when there are several.
- `write_archive_citation` - Like `write_archive`, but passes CSL `style`/`locale` through to the citation writer when `to == "citation"`.
- `write_list` - Render a list of records to `to` format as a single buffer: a JSON array for object-shaped formats (`commonmeta`, `csl`, `datacite`, `inveniordm`, `schemaorg`, `ror`), or newline-joined output for line/document-shaped formats (e.g. `bibtex`, `ris`, `crossref_xml`).
- `write_list_citation` - Like `write_list`, but passes CSL `style`/`locale` through to the citation writer when `to == "citation"` (ignored for every other format, same as `convert_citation`/`write_citation`).
- `write_parquet` - Write a list of commonmeta records as a single Parquet file. Alongside a flattened tabular projection of each record’s fields (for filtering in tools like DuckDB without parsing JSON), every row also carries a `json` column with the record’s complete serialization, so `read_parquet` round-trips losslessly.
- `write_ror_json` - Write a ROR-derived record as raw ROR-shaped JSON (as opposed to `write("ror", data)`, which produces InvenioRDM vocabulary YAML).
- `write_ror_sqlite` - Write a list of ROR records to a SQLite3 database at `path` with an `organizations` table. Existing file is deleted first. JSON array columns (`types`, `locations`, `names`, `external_ids`) are queryable via SQLite’s `json_each()`. The `metadata` column stores the full ROR JSON as a zstd-compressed BLOB for lossless round-trips.
- `write_sqlite` - Write `list` as a SQLite3 database with a `works` table whose columns mirror the commonmeta v1.0 schema. Simple string fields are stored as TEXT; complex fields are stored as compact JSON TEXT. Any existing file at `path` is deleted first.
- `write_vraix_table_parquet` - Write a VRAIX dump’s transport table (e.g. `pid_records`) to a single Parquet file’s bytes, using its raw columns (`pid`, `source_id`, `raw_metadata`, …) as-is, not converted to commonmeta `Data` the way `read_vraix_sqlite` is. For analytics over the dump itself (e.g. via DataFusion/Polars/DuckDB), not for ingesting it as commonmeta records. `batch_size` controls how many rows land in each internal Parquet row group (see `formats::commonmeta::write_parquet_all`’s analogous `ROW_GROUP_SIZE` for why this matters for large dumps).
- `write_with_style` - Like `write`, but forwards `style` and `locale` to the citation writer. For non-`"citation"` formats both parameters are ignored.
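The entry-naming rule described for `write_archive` (a single batch keeps `base_name`; several batches get a zero-padded numeric suffix before the extension) can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `entry_names` is a hypothetical helper, and its handling of a zero `batch_size`, of an empty list, and of a `base_name` without an extension are guesses.

```rust
/// Hypothetical helper: derive archive entry names from `base_name`,
/// given the number of records and the maximum records per entry.
fn entry_names(base_name: &str, record_count: usize, batch_size: usize) -> Vec<String> {
    // Assumption: a batch size of 0 is treated as 1 to avoid division by zero.
    let batch_size = batch_size.max(1);
    // Ceiling division; assumption: an empty list still yields one (empty) entry.
    let batches = ((record_count + batch_size - 1) / batch_size).max(1);

    if batches == 1 {
        // A single batch is named directly after `base_name`.
        return vec![base_name.to_string()];
    }

    // Split "out.json" into ("out", ".json"); names without a dot get no extension.
    let (stem, ext) = match base_name.rsplit_once('.') {
        Some((stem, ext)) => (stem, format!(".{ext}")),
        None => (base_name, String::new()),
    };

    // Several batches: "out-00000.json", "out-00001.json", ...
    (0..batches)
        .map(|i| format!("{stem}-{i:05}{ext}"))
        .collect()
}
```

For example, 25 records with a batch size of 10 would give three entries under this sketch, while 3 records would give the single entry `out.json`.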