Crate commonmeta

commonmeta — a Rust port of front-matter/commonmeta.

Convert scholarly metadata between formats. The native model is Data; format modules read into it and write out of it.
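
The read-into-a-model, write-out-of-it design can be illustrated with a toy sketch. Everything below is hypothetical and only stands in for the idea: Record is not the crate's Data struct, and read_pipe, write_tagged and convert_toy are made-up names for two invented formats, not functions from this crate.

```rust
// Hypothetical stand-in for a central data model; the real `Data` struct
// is assumed to carry many more fields.
#[derive(Debug, Clone, PartialEq, Default)]
struct Record {
    title: String,
    year: Option<u16>,
}

// Toy reader: parses an invented "title|year" line into the central model.
fn read_pipe(input: &str) -> Result<Record, String> {
    let mut parts = input.splitn(2, '|');
    let title = parts.next().unwrap_or("").trim().to_string();
    if title.is_empty() {
        return Err("missing title".to_string());
    }
    let year = match parts.next().map(str::trim) {
        None | Some("") => None,
        Some(y) => Some(y.parse::<u16>().map_err(|e| format!("bad year: {e}"))?),
    };
    Ok(Record { title, year })
}

// Toy writer: renders the central model as invented tagged lines.
fn write_tagged(record: &Record) -> String {
    let mut out = format!("TI  - {}", record.title);
    if let Some(year) = record.year {
        out.push_str(&format!("\nPY  - {year}"));
    }
    out
}

// Conversion is just "read into the model, then write out of it".
fn convert_toy(input: &str) -> Result<String, String> {
    let record = read_pipe(input)?;
    Ok(write_tagged(&record))
}

fn main() {
    match convert_toy("Example title|2024") {
        Ok(text) => println!("{text}"),
        Err(err) => eprintln!("conversion failed: {err}"),
    }
}
```

With a central model, each additional format needs one reader and one writer rather than a converter for every pair of formats.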

Re-exports§

pub use data::Data;
pub use error::Error;
pub use error::Result;

Modules§

author_utils
constants
Controlled vocabularies and cross-format type/role translation tables.
crockford
Generate, encode and decode random base32 identifiers.
crossref
data
Core Commonmeta data model.
doi_utils
Utilities for working with DOIs.
error
file_utils
progress
schema_utils
JSON Schema and XSD validation utilities.
spdx
SPDX license vocabulary lookup.
utils
vocabularies
Embedded controlled vocabulary data files.
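
As background for the crockford module, here is a minimal sketch of Crockford-style base32 encoding and decoding of an integer. It assumes the module follows Crockford's published alphabet (no I, L, O or U) and his decoding conventions. The encode and decode functions below are illustrative and are not this crate's API; check symbols and random generation are not covered.

```rust
// Crockford's base32 alphabet: digits, then letters without I, L, O and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode an unsigned integer as a base32 string, most significant digit first.
fn encode(mut value: u64) -> String {
    if value == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while value > 0 {
        digits.push(ALPHABET[(value % 32) as usize]);
        value /= 32;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

// Decode a base32 string. Decoding is case-insensitive, hyphens are ignored,
// O is read as 0, and I and L are read as 1. Returns None for an empty
// string, for characters outside the alphabet (such as U), or on overflow.
fn decode(text: &str) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for ch in text.chars() {
        if ch == '-' {
            continue;
        }
        let normalized = match ch.to_ascii_uppercase() {
            'O' => '0',
            'I' | 'L' => '1',
            other => other,
        };
        let digit = ALPHABET.iter().position(|&b| b as char == normalized)? as u64;
        value = value.checked_mul(32)?.checked_add(digit)?;
        seen_digit = true;
    }
    if seen_digit { Some(value) } else { None }
}

fn main() {
    let id = encode(1234);
    println!("{id} -> {:?}", decode(&id));
}
```

Leaving out the easily confused letters is what makes identifiers in this alphabet tolerant of transcription mistakes: a reader who types O for 0 or l for 1 still decodes to the same number.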

Structs§

AffiliationMatch
A single match result from the ROR affiliation API.
PushResult
The outcome of pushing a single record to InvenioRDM.

Constants§

VERSION

Functions§

convert
Read from one format and write to another in a single call.
convert_citation
Like convert, but passes CSL style and locale through to the citation writer.
fetch_vraix_dump
Fetch commonmeta records from the VRAIX daily dump identified by the from source (“crossref” or “datacite”) and the date (YYYY-MM-DD).
match_ror_affiliation
Match a free-text affiliation string against ROR organizations using the ROR v2 affiliation endpoint.
push_inveniordm
Create-or-update, then publish, a list of records in InvenioRDM.
put_inveniordm
Create-or-update, then publish, a single record in InvenioRDM.
read
Read a single record in the format named by from, without writing it back out.
read_parquet
Read a list of commonmeta records back from the Parquet schema written by write_parquet. Lossless: each record is restored from its json column, the complete original serialization.
read_sqlite_commonmeta
Read records from a commonmeta SQLite database written by write_sqlite.
read_vraix_sqlite
Read commonmeta records from a VRAIX daily dump SQLite file already on disk at sqlite_path, e.g. an already-downloaded crossref-2026-06-14.sqlite3.
stream_vraix_to_sqlite
Stream a VRAIX daily dump at input_path directly to a commonmeta SQLite database at output_path in batches of 10 000 rows, converting each row with the parser for the from source and writing each batch in a single transaction. limit caps the total number of records written; pass 0 to write all rows. Returns the number of records written. No Vec<Data> is held for the whole file, so peak memory is proportional to one batch, not the whole dump.
write
Write an already-loaded record in the format named by to.
write_archive
Render list in the format named by to, split into entries of at most batch_size records each, suitable for packing into an archive via file_utils::write_zip_archive or file_utils::write_tar_gz_archive. base_name (e.g. "out.json") names the single entry directly when there is only one batch, or gets a numbered suffix ("out-00000.json", "out-00001.json", …) when there are several.
write_archive_citation
Like write_archive, but passes CSL style/locale through to the citation writer when to == "citation".
write_list
Render a list of records in the format named by to as a single buffer: a JSON array for object-shaped formats (commonmeta, csl, datacite, inveniordm, schemaorg, ror), or newline-joined output for line/document-shaped formats (e.g. bibtex, ris, crossref_xml).
write_list_citation
Like write_list, but passes CSL style/locale through to the citation writer when to == "citation" (ignored for every other format, same as convert_citation/write_citation).
write_parquet
Write a list of commonmeta records as a single Parquet file. Alongside a flattened tabular projection of each record’s fields (for filtering in tools like DuckDB without parsing JSON), every row also carries a json column with the record’s complete serialization, so read_parquet round-trips losslessly.
write_ror_json
Write a ROR-derived record as raw ROR-shaped JSON (as opposed to write("ror", data), which produces InvenioRDM vocabulary YAML).
write_sqlite
Write list as a SQLite3 database with a works table whose columns mirror the commonmeta v1.0 schema. Simple string fields are stored as TEXT; complex fields are stored as compact JSON TEXT.
write_vraix_table_parquet
Write a VRAIX dump’s transport table (e.g. pid_records) to the bytes of a single Parquet file, using its raw columns (pid, source_id, raw_metadata, …) as-is rather than converting them to commonmeta Data the way read_vraix_sqlite does. Intended for analytics over the dump itself (e.g. via DataFusion/Polars/DuckDB), not for ingesting it as commonmeta records. batch_size controls how many rows land in each internal Parquet row group (see the analogous ROW_GROUP_SIZE in formats::commonmeta::write_parquet_all for why this matters for large dumps).
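
The entry-naming rule described for write_archive can be sketched as a standalone function. This only illustrates the documented naming pattern. entry_names is a hypothetical helper, not part of this crate, and its handling of an empty list or a batch_size of 0 is an assumption made here so the sketch stays total.

```rust
// Hypothetical helper: compute archive entry names for `record_count` records
// split into batches of at most `batch_size` records each.
fn entry_names(base_name: &str, record_count: usize, batch_size: usize) -> Vec<String> {
    // Assumption: treat a batch size of 0 as 1 to avoid dividing by zero.
    let batch_size = batch_size.max(1);
    // Ceiling division; assumption: an empty list still yields one entry.
    let batches = ((record_count + batch_size - 1) / batch_size).max(1);

    // A single batch keeps the base name unchanged.
    if batches == 1 {
        return vec![base_name.to_string()];
    }

    // Several batches: insert a zero-padded, five-digit index before the extension.
    let (stem, ext) = match base_name.rfind('.') {
        Some(idx) if idx > 0 => (&base_name[..idx], &base_name[idx..]),
        _ => (base_name, ""),
    };
    (0..batches)
        .map(|i| format!("{stem}-{i:05}{ext}"))
        .collect()
}

fn main() {
    for name in entry_names("out.json", 25, 10) {
        println!("{name}");
    }
}
```

Zero-padding the index keeps the entries in batch order when an archive tool lists them alphabetically.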