Crate commonmeta

commonmeta — a Rust port of front-matter/commonmeta.

Convert scholarly metadata between formats. The native model is Data; format modules read into it and write out of it.
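
The read-into-the-model, write-out-of-the-model design can be sketched without the crate itself. Everything below is hypothetical and for illustration only: `Record` stands in for the much richer `Data` model, and `read_keyvalue`, `write_tagged` and `convert_sketch` are invented names, not functions of this crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's native model (the real `Data` has many more fields).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,
    title: String,
}

/// Hypothetical reader: parse "key: value" lines into the native model.
fn read_keyvalue(input: &str) -> Record {
    let mut fields = BTreeMap::new();
    for line in input.lines() {
        // Split on the first ':' only, so URLs in the value stay intact.
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    Record {
        id: fields.remove("id").unwrap_or_default(),
        title: fields.remove("title").unwrap_or_default(),
    }
}

/// Hypothetical writer: render the native model as a RIS-like tagged record.
fn write_tagged(record: &Record) -> String {
    format!("ID  - {}\nTI  - {}\nER  - ", record.id, record.title)
}

/// Conversion is "read into the model, then write out of it".
fn convert_sketch(input: &str) -> String {
    write_tagged(&read_keyvalue(input))
}
```

With a shared model in the middle, supporting a new format means adding one reader and one writer rather than a converter for every pair of formats.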

Re-exports§

pub use data::Data;
pub use error::Error;
pub use error::Result;

Modules§

author_utils
constants
Controlled vocabularies and cross-format type/role translation tables.
crockford
Generate, encode and decode random base32 identifiers.
crossref
data
Core Commonmeta data model.
doi_utils
Utilities for working with DOIs.
error
file_utils
progress
schema_utils
JSON Schema and XSD validation utilities.
spdx
SPDX license vocabulary lookup.
utils
vocabularies
Embedded controlled vocabulary data files.
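
For readers unfamiliar with the encoding behind the crockford module, here is a minimal sketch of Crockford base32 for unsigned integers. It is not the module's implementation: the function names `encode` and `decode` are assumptions, random identifier generation is left out, and any checksum or formatting the module may apply is not shown.

```rust
/// Crockford base32 alphabet: digits plus uppercase letters, excluding I, L, O and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a number as Crockford base32, most significant symbol first.
fn encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut symbols = Vec::new();
    while n > 0 {
        symbols.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    symbols.reverse();
    String::from_utf8(symbols).expect("alphabet is ASCII")
}

/// Decode Crockford base32, accepting lowercase, hyphens, and the
/// look-alike substitutions O -> 0 and I/L -> 1.
/// Returns None on an empty input, an invalid symbol, or overflow.
fn decode(s: &str) -> Option<u64> {
    let mut n: u64 = 0;
    let mut seen = false;
    for c in s.chars() {
        if c == '-' {
            continue; // hyphens are separators only
        }
        let c = match c.to_ascii_uppercase() {
            'O' => '0',
            'I' | 'L' => '1',
            other => other,
        };
        let value = ALPHABET.iter().position(|&a| a as char == c)? as u64;
        n = n.checked_mul(32)?.checked_add(value)?;
        seen = true;
    }
    if seen { Some(n) } else { None }
}
```

The alphabet omits letters that are easily confused with digits, which is why decoding can safely map O, I and L back to 0 and 1.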

Structs§

AffiliationMatch
A single match result from the ROR affiliation API.
PushResult
The outcome of pushing a single record to InvenioRDM.

Constants§

VERSION

Functions§

convert
Read from one format and write to another in a single call.
convert_citation
Like convert, but passes CSL style and locale through to the citation writer.
count_sqlite_works
Return the total number of rows in the works table of a commonmeta SQLite database — useful for reporting the cumulative count after an upsert.
fetch_vraix_dump
Fetch commonmeta records from the VRAIX daily dump selected by the from argument (“crossref” or “datacite”) and the date argument (YYYY-MM-DD).
match_ror_affiliation
Match a free-text affiliation string against ROR organizations using the ROR v2 affiliation endpoint.
push_inveniordm
Create-or-update, then publish, a list of records in InvenioRDM.
put_inveniordm
Create-or-update, then publish, a single record in InvenioRDM.
read
Read a single record in the format given by the from argument, without writing it back out.
read_parquet
Read a list of commonmeta records back from the Parquet schema written by write_parquet. Lossless: each record is restored from its json column, the complete original serialization.
read_sqlite_commonmeta
Read records from a commonmeta SQLite database written by write_sqlite.
read_vraix_sqlite
Read commonmeta records from a VRAIX daily dump SQLite file already on disk at sqlite_path, e.g. an already-downloaded crossref-2026-06-14.sqlite3.
stream_pidbox_to_sqlite
Stream the pidbox dump (a mixed-source VRAIX SQLite file containing crossref, datacite, and ROR rows) directly to a commonmeta SQLite database. Each row is routed to the appropriate parser by its source_id; ROR rows are skipped. When update is false, the output file is recreated; when update is true, rows are upserted by id. Returns the number of records written.
stream_vraix_to_sqlite
Stream a VRAIX daily dump at input_path directly to a commonmeta SQLite database at output_path in batches of 10 000 rows, converting with the parser for the from source and writing each batch in a single transaction. The limit argument caps the total records written; pass 0 for all rows. When update is false (the default), the output file is deleted and recreated. When update is true, the existing file is kept and rows are upserted by their id primary key: new rows are inserted, existing rows are replaced. Returns the number of records written. No Vec<Data> is held for the whole file, so peak memory is proportional to one batch, not the whole dump.
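
The batching pattern described above can be sketched generically. This is an illustration under stated assumptions, not the crate's code: `stream_in_batches` is a hypothetical name, rows are plain strings, and the `write_batch` closure stands in for "write one batch in a single transaction".

```rust
/// Process `rows` in fixed-size batches, handing each batch to `write_batch`.
/// `limit == 0` means "no cap". Returns the number of rows written.
fn stream_in_batches<I, F>(rows: I, batch_size: usize, limit: usize, mut write_batch: F) -> usize
where
    I: IntoIterator<Item = String>,
    F: FnMut(&[String]),
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut written = 0;
    let mut batch: Vec<String> = Vec::with_capacity(batch_size);
    for row in rows {
        // Stop once written plus pending rows reach the cap.
        if limit != 0 && written + batch.len() >= limit {
            break;
        }
        batch.push(row);
        if batch.len() == batch_size {
            write_batch(&batch);
            written += batch.len();
            batch.clear(); // reuse the buffer; only one batch is held at a time
        }
    }
    // Flush the final, possibly short, batch.
    if !batch.is_empty() {
        write_batch(&batch);
        written += batch.len();
    }
    written
}
```

Because the buffer is cleared after every flush, memory use depends on `batch_size` and not on the length of the input iterator.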
upsert_sqlite
Like write_sqlite but opens an existing database instead of recreating it. Rows whose id already exists are replaced; new rows are inserted.
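
The difference between recreating and upserting can be shown with an in-memory map keyed by id. This is a sketch of the semantics only; `Works`, `write_recreate` and `upsert` are hypothetical names, and no SQLite is involved.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a table with an `id` primary key and a JSON payload.
type Works = BTreeMap<String, String>;

/// Upsert semantics: keep existing rows, replace on a matching id, insert otherwise.
fn upsert(table: &mut Works, rows: &[(&str, &str)]) {
    for (id, json) in rows {
        table.insert(id.to_string(), json.to_string());
    }
}

/// Recreate semantics: discard whatever was there, then insert.
fn write_recreate(table: &mut Works, rows: &[(&str, &str)]) {
    table.clear();
    upsert(table, rows);
}
```

After an upsert, the table's length is the cumulative row count, which is the figure count_sqlite_works reports for a real database.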
write
Write an already-loaded record in the format given by the to argument.
write_archive
Render list in the format given by the to argument, split into entries of at most batch_size records each, suitable for packing into an archive via file_utils::write_zip_archive or file_utils::write_tar_gz_archive. base_name (e.g. "out.json") names the single entry directly when there is only one batch, or gets a numbered suffix ("out-00000.json", "out-00001.json", …) when there are several.
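
The entry-naming rule can be sketched as follows. The helper names `entry_names` and `batch_count` are hypothetical, and two details are assumptions: the suffix is inserted before the last '.', and zero batches yield no entries. Only the "out.json" examples above come from the documentation.

```rust
/// Name archive entries for `batches` batches derived from `base_name`.
fn entry_names(base_name: &str, batches: usize) -> Vec<String> {
    if batches == 1 {
        // A single batch keeps the base name unchanged.
        return vec![base_name.to_string()];
    }
    // Split at the last '.', so "out.json" becomes ("out", ".json").
    let (stem, ext) = match base_name.rfind('.') {
        Some(dot) => base_name.split_at(dot),
        None => (base_name, ""),
    };
    // Zero-padded, five-digit, zero-based suffix.
    (0..batches).map(|i| format!("{stem}-{i:05}{ext}")).collect()
}

/// Number of batches needed for `records` records at `batch_size` per entry.
fn batch_count(records: usize, batch_size: usize) -> usize {
    let size = batch_size.max(1); // guard against division by zero
    (records + size - 1) / size
}
```

For 25 records with a batch size of 10, `batch_count` gives 3, and `entry_names("out.json", 3)` yields the three numbered names.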
write_archive_citation
Like write_archive, but passes CSL style/locale through to the citation writer when to == "citation".
write_list
Render a list of records in the format given by the to argument as a single buffer: a JSON array for object-shaped formats (commonmeta, csl, datacite, inveniordm, schemaorg, ror), or newline-joined output for line/document-shaped formats (e.g. bibtex, ris, crossref_xml).
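
The two output shapes can be sketched over records that have already been rendered to strings. `join_rendered` is a hypothetical helper, and the exact separators the crate emits (for example, compact versus pretty-printed arrays) are assumptions.

```rust
/// Formats the documentation lists as object-shaped (rendered as one JSON array).
const OBJECT_SHAPED: [&str; 6] = ["commonmeta", "csl", "datacite", "inveniordm", "schemaorg", "ror"];

/// Combine already-rendered records into a single buffer.
fn join_rendered(format: &str, rendered: &[String]) -> String {
    if OBJECT_SHAPED.contains(&format) {
        // Each element is assumed to be a complete JSON value.
        format!("[{}]", rendered.join(","))
    } else {
        // Line- or document-shaped output is concatenated with newlines.
        rendered.join("\n")
    }
}
```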
write_list_citation
Like write_list, but passes CSL style/locale through to the citation writer when to == "citation" (ignored for every other format, same as convert_citation/write_citation).
write_parquet
Write a list of commonmeta records as a single Parquet file. Alongside a flattened tabular projection of each record’s fields (for filtering in tools like DuckDB without parsing JSON), every row also carries a json column with the record’s complete serialization, so read_parquet round-trips losslessly.
write_ror_json
Write a ROR-derived record as raw ROR-shaped JSON (as opposed to write("ror", data), which produces InvenioRDM vocabulary YAML).
write_sqlite
Write list as a SQLite3 database with a works table whose columns mirror the commonmeta v1.0 schema. Simple string fields are stored as TEXT; complex fields are stored as compact JSON TEXT. Any existing file at path is deleted first.
write_vraix_table_parquet
Write a VRAIX dump’s transport table (e.g. pid_records) to the bytes of a single Parquet file, using its raw columns (pid, source_id, raw_metadata, …) as-is rather than converting them to commonmeta Data as read_vraix_sqlite does. Intended for analytics over the dump itself (e.g. via DataFusion/Polars/DuckDB), not for ingesting it as commonmeta records. batch_size controls how many rows land in each internal Parquet row group (see the analogous ROW_GROUP_SIZE in formats::commonmeta::write_parquet_all for why this matters for large dumps).
write_with_style
Like write, but forwards style and locale to the citation writer. For non-"citation" formats both parameters are ignored.