commonmeta 0.8.1

Library for conversions to/from the Commonmeta scholarly metadata format

commonmeta-rs

commonmeta-rs is a Rust library that implements Commonmeta, the common Metadata Model for Scholarly Metadata. Use commonmeta to convert scholarly metadata between the formats listed below. Commonmeta-rs is a work in progress; the first release was on June 17, 2026. Implementations in other languages (Go, Python, Ruby) are also available.

Supported Metadata Formats

Commonmeta-rs reads and/or writes these metadata formats:

| Format | Name | Content Type | Read | Write |
| --- | --- | --- | --- | --- |
| Commonmeta | commonmeta | application/vnd.commonmeta+json | yes | yes |
| Crossref XML | crossref_xml | application/vnd.crossref.unixref+xml | yes | yes |
| Crossref | crossref | application/vnd.crossref+json | yes | yes |
| DataCite | datacite | application/vnd.datacite.datacite+json | yes | yes |
| DataCite XML | datacite_xml | application/vnd.datacite.datacite+xml | yes | yes |
| Schema.org (in JSON-LD) | schema_org | application/vnd.schemaorg.ld+json | yes | yes |
| RDF XML | rdf_xml | application/rdf+xml | no | later |
| RDF Turtle | turtle | text/turtle | no | later |
| CSL-JSON | csl | application/vnd.citationstyles.csl+json | yes | yes |
| Formatted text citation | citation | text/x-bibliography | n/a | yes |
| Codemeta | codemeta | application/vnd.codemeta.ld+json | yes | later |
| Citation File Format (CFF) | cff | application/vnd.cff+yaml | yes | later |
| JATS | jats | application/vnd.jats+xml | later | later |
| CSV | csv | text/csv | no | later |
| BibTeX | bibtex | application/x-bibtex | yes | yes |
| RIS | ris | application/x-research-info-systems | yes | yes |
| InvenioRDM | inveniordm | application/vnd.inveniordm.v1+json | yes | yes |
| JSON Feed | jsonfeed | application/feed+json | yes | later |
| OpenAlex | openalex | n/a | yes | no |

commonmeta: the Commonmeta format is the native format for the library and used internally.

later: we plan to implement this format in a later release.

Build & run

cargo build
cargo test

The commonmeta binary has nine subcommands: convert, encode, decode, import, list, push, put, match, and validate.

# Encode/decode a Crockford base32 identifier suffix given a DOI prefix
cargo run -- encode 10.5555
cargo run -- decode 10.5555/nwbyp-29t86

# Convert a single record between formats, fetching it by DOI
cargo run -- convert 10.5555/12345678 --from crossref --to csl

# Convert a local file and write the result to disk
cargo run -- convert record.json --from commonmeta --to csl --file out.json

# Render a formatted citation (CSL style + locale)
cargo run -- convert 10.5555/12345678 --from crossref --to citation --style apa --locale en-US

# Fetch a batch of records from an API and write them as a commonmeta JSON array
cargo run -- list --from crossref --number 100 --type journal-article --file out.json

# Read all records from a local VRAIX SQLite file and convert to another format
cargo run -- list crossref-2026-06-15.sqlite3 --number 0 --to commonmeta --file out.json.gz

# Parquet output (.parquet file extension, --to commonmeta only): records are split into batches of 100,000, written in parallel, and zstd-compressed
cargo run --release -- list crossref-2026-06-15.sqlite3 --number 0 --file out.parquet

# Import a single record by DOI into the local commonmeta database (source auto-detected)
cargo run -- import 10.7554/elife.01567

# Import all Crossref records for a ROR-identified institution (paginates through all results)
cargo run -- import --from crossref --ror 00pd74e08

# Import all records from a Crossref VRAIX daily dump
cargo run -- import --from crossref --date 2026-06-15

# See the Local database section below for the full import command reference
# including annual public data files (Crossref torrent, DataCite TAR).

# Register records with a live InvenioRDM instance (creates/updates and publishes
# real records — registration is currently only supported with --to inveniordm)
cargo run -- push --from crossref --number 10 --to inveniordm --host rogue-scholar.org --token TOKEN

# Same as push, but for a single record (DOI, URL, or file path)
cargo run -- put 10.5555/12345678 --from crossref --to inveniordm --host rogue-scholar.org --token TOKEN

# Match a free-text affiliation string to a ROR organization (uses local DB when available)
cargo run -- match "Leibniz Universität Hannover"
cargo run -- match "Leibniz Universität Hannover" --to inveniordm

# Look up a ROR organization (uses local DB when available)
cargo run -- convert https://ror.org/02nr0ka47
cargo run -- convert https://ror.org/02nr0ka47 --to inveniordm

# Work fully offline — fails fast if a network call would be required
cargo run -- convert record.json --from commonmeta --to csl --no-network
cargo run -- list crossref-2026-06-15.sqlite3 --no-network --file out.json
cargo run -- import crossref-2026-06-15.sqlite3 --no-network
cargo run -- match "Leibniz Universität Hannover" --no-network

Use cargo run -- <subcommand> --help for the full list of options for each subcommand.
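
The encode and decode subcommands above work with Crockford base32 identifier suffixes. As a rough illustration of that encoding only, here is a minimal sketch in Rust. It is not the commonmeta-rs implementation: the function names are hypothetical, and it omits any checksum or hyphen grouping that the real suffixes (such as nwbyp-29t86) may carry.

```rust
// Hypothetical sketch of plain Crockford base32, without a checksum.
// The alphabet leaves out i, l, o and u to avoid look-alike characters.
const ALPHABET: &[u8; 32] = b"0123456789abcdefghjkmnpqrstvwxyz";

/// Encode a number as a lowercase Crockford base32 string.
pub fn crockford_encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // Collect the least significant digit first, then reverse.
        digits.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decode a Crockford base32 string. Hyphens are ignored, case is folded,
/// and the look-alikes o, i and l are read as 0, 1 and 1.
/// Returns None for empty input, invalid characters, or overflow.
pub fn crockford_decode(s: &str) -> Option<u64> {
    let mut n: u64 = 0;
    let mut seen_digit = false;
    for c in s.chars() {
        if c == '-' {
            continue;
        }
        let c = match c.to_ascii_lowercase() {
            'o' => '0',
            'i' | 'l' => '1',
            other => other,
        };
        let value = ALPHABET.iter().position(|&b| b as char == c)? as u64;
        n = n.checked_mul(32)?.checked_add(value)?;
        seen_digit = true;
    }
    if seen_digit {
        Some(n)
    } else {
        None
    }
}
```

Because hyphens are skipped during decoding, a suffix can be grouped for readability without changing its value.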

--no-network flag

convert, list, import, and match all accept a --no-network flag. When set, any operation that would make an outbound HTTP request is rejected immediately with a clear error message. Operations that only touch local files are not affected by this flag. push and put always require network access and do not expose this flag.
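
The fail-fast behavior described above can be pictured as a guard that runs before any request is attempted. The following is a sketch under assumptions, not the library's code: the Source enum, the guard function, and the error wording are all hypothetical.

```rust
// Hypothetical sketch of a --no-network guard; all names are illustrative.

/// Where a requested record would come from.
pub enum Source {
    /// A file on disk, e.g. record.json or a SQLite dump.
    LocalFile,
    /// The local commonmeta database.
    LocalDatabase,
    /// Anything that needs an outbound HTTP request.
    Remote,
}

/// Reject remote sources up front when --no-network is set.
/// Local sources pass through regardless of the flag.
pub fn guard(source: &Source, no_network: bool) -> Result<(), String> {
    match source {
        Source::Remote if no_network => {
            Err("--no-network is set, but this operation requires an HTTP request".to_string())
        }
        _ => Ok(()),
    }
}
```

Checking once, before any work starts, is what makes the failure immediate rather than surfacing as a timeout partway through a batch.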

Local database

The import command populates a local commonmeta SQLite database with scholarly metadata records. All imports upsert — existing records are updated rather than replaced. The database is also used by match and convert for offline lookups.

# Import a single record by DOI (source auto-detected from the DOI prefix)
commonmeta import 10.7554/elife.01567
commonmeta import https://doi.org/10.7554/elife.01567

# Import all Crossref records for an institution (ROR ID, paginates automatically)
commonmeta import --from crossref --ror 00pd74e08

# Import all DataCite records for an author (ORCID, paginates automatically)
commonmeta import --from datacite --orcid 0000-0003-1419-2405

# Import a full daily dump (downloads from metadata.vraix.org)
commonmeta import --from crossref --date 2026-06-15
commonmeta import --from datacite --date 2026-06-15

# Import from a locally downloaded VRAIX dump (source auto-detected from filename)
commonmeta import crossref-2026-06-15.sqlite3

# Import the Crossref annual public data file (~223 GB)
# Option A: Academic Torrents (aria2c required, free)
commonmeta import --from crossref
commonmeta import --from crossref --sample   # first 5 files only (~40 MB)
# Option B: AWS S3 requester-pays bucket (aws CLI + credentials required, ~$18)
# Bucket: s3://api-snapshots-reqpays-crossref   see https://www.crossref.org/documentation/retrieve-metadata/bulk-downloads/
# TAR cached at ~/Library/Caches/commonmeta/crossref/crossref-annual-s3.tar
commonmeta import --from crossref --s3

# Import the DataCite annual public data file (108 M records, 33 GB compressed)
# First run: obtain a time-limited download URL by submitting your email at
#   https://datafiles.datacite.org/datafiles/public-2025
# The TAR archive is cached at ~/Library/Caches/commonmeta/datacite/public-2025.tar
# for subsequent re-imports without a new token.
commonmeta import "https://datafiles.datacite.org/datafiles/public-2025/download?token=<TOKEN>"
commonmeta import "https://datafiles.datacite.org/datafiles/public-2025/download?token=<TOKEN>" --sample
# Re-import or re-parse from cache (no token needed after the first download):
commonmeta import --from datacite
commonmeta import --from datacite --sample

# Import the full VRAIX pidbox dump
commonmeta import --from pidbox

# Import latest ROR organization data
commonmeta import --from ror

The database path is resolved in this order:

  1. COMMONMETA_DB environment variable
  2. Platform default:

| Platform | Default path |
| --- | --- |
| macOS | ~/Library/Application Support/commonmeta/commonmeta.sqlite3 |
| Linux | /var/lib/commonmeta/commonmeta.sqlite3 |

# Use a custom path via environment variable
COMMONMETA_DB=/data/commonmeta.sqlite3 commonmeta import --from crossref --date 2026-06-15
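
The resolution order above can be sketched as a small pure function. This is an illustration rather than the library's code: resolve_db_path and its parameters are hypothetical, treating an empty COMMONMETA_DB as unset is an assumption, and platforms not listed above return None here.

```rust
use std::path::PathBuf;

/// Hypothetical sketch: resolve the database path from an optional
/// COMMONMETA_DB value, falling back to the platform default.
/// `os` uses the values of std::env::consts::OS ("macos", "linux").
pub fn resolve_db_path(env_value: Option<&str>, os: &str, home: &str) -> Option<PathBuf> {
    // 1. An explicit override wins (assumption: an empty value counts as unset).
    if let Some(path) = env_value.filter(|p| !p.is_empty()) {
        return Some(PathBuf::from(path));
    }
    // 2. Otherwise use the documented platform default.
    match os {
        "macos" => Some(
            PathBuf::from(home)
                .join("Library/Application Support/commonmeta/commonmeta.sqlite3"),
        ),
        "linux" => Some(PathBuf::from("/var/lib/commonmeta/commonmeta.sqlite3")),
        // Other platforms are not covered by the documentation above.
        _ => None,
    }
}
```

In a real program the first argument would come from `std::env::var("COMMONMETA_DB").ok().as_deref()` and the second from `std::env::consts::OS`. Passing them in as parameters keeps the function easy to test.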

Validate

The validate command checks records in the local database against the commonmeta v1.0 JSON schema and reports any violations. Each failing record shows the JSON Pointer to the offending field and a short description of the constraint that was violated.

Errors are persisted in a validation_errors table inside the database so that --recheck can quickly re-run only the records that failed last time.

# Validate all records
commonmeta validate

# Validate only DataCite records
commonmeta validate --from datacite

# Validate only DataCite datasets
commonmeta validate --from datacite --type Dataset

# Validate the first 1,000 records
commonmeta validate --number 1000

# Repair invalid records in-place (re-applies schema normalization)
commonmeta validate --fix

# Re-validate only records that failed in the previous run
commonmeta validate --recheck

# Repair only previously-failing records
commonmeta validate --recheck --fix

# Write errors as JSONL to a file instead of stderr
commonmeta validate --report errors.jsonl

# Validate a different database
commonmeta validate /path/to/other.sqlite3

Options

| Option | Description |
| --- | --- |
| --from / -f | Filter by provider (crossref, datacite, openalex). |
| --type | Filter by work type, e.g. Dataset, JournalArticle. |
| --number / -n | Maximum number of records to check (default: all). |
| --fix | Attempt to repair invalid records in-place. Applies prepare() normalization: removes non-ROR organization ids, clears invalid URIs, deduplicates geo-locations, normalizes EISSN → ISSN, etc. Repaired records are removed from validation_errors; records that cannot be repaired remain. |
| --recheck | Only re-validate records listed in the validation_errors table from the previous run. Combine with --fix for an efficient repair loop. |
| --report | Write errors as JSONL (one {"id": "…", "errors": […]} object per record) to the given file instead of printing to stderr. |
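
To make one of the --fix normalizations concrete, here is a sketch of dropping non-ROR organization ids. It is not the prepare() implementation: the function names are hypothetical, and the check only tests the shape of a ROR id (a leading 0, six base32 characters excluding i, l, o and u, then two digits) without verifying the checksum.

```rust
/// Hypothetical sketch: true if `id` is a ROR URL with a well-formed suffix.
/// The two trailing checksum digits are not verified.
pub fn is_ror_id(id: &str) -> bool {
    let suffix = match id.strip_prefix("https://ror.org/") {
        Some(s) => s,
        None => return false,
    };
    let bytes = suffix.as_bytes();
    if bytes.len() != 9 || bytes[0] != b'0' {
        return false;
    }
    // Digits and lowercase letters except i, l, o and u.
    let is_base32 =
        |b: &u8| b.is_ascii_digit() || (b.is_ascii_lowercase() && !b"ilou".contains(b));
    bytes[1..7].iter().all(is_base32) && bytes[7..].iter().all(|b| b.is_ascii_digit())
}

/// Keep only the organization ids that look like ROR ids.
pub fn retain_ror_ids(ids: Vec<String>) -> Vec<String> {
    ids.into_iter().filter(|id| is_ror_id(id)).collect()
}
```

Both ROR ids used elsewhere in this README, 02nr0ka47 and 00pd74e08, pass this check when written as full https://ror.org/ URLs.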

Repair loop

A typical workflow for cleaning up an imported database:

# 1. Full first pass — saves all failures to validation_errors
commonmeta validate --from datacite --fix

# 2. Subsequent passes — only re-checks and re-repairs the remaining failures
commonmeta validate --recheck --fix

The command exits with a non-zero status if any records remain invalid after the run.
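
Because the exit status signals whether invalid records remain, the repair loop can be automated. The sketch below is one possible wrapper, not part of commonmeta-rs: repair_until_clean and cli_pass are hypothetical names, and cli_pass assumes a commonmeta binary on PATH. It uses only the validate, --fix, and --recheck options documented above.

```rust
use std::process::Command;

/// Hypothetical sketch: call `run_pass` with pass numbers 1, 2, 3, ...
/// until it returns true or `max_passes` is reached.
/// Returns the 1-based pass that succeeded, or None if none did.
pub fn repair_until_clean<F>(max_passes: usize, mut run_pass: F) -> Option<usize>
where
    F: FnMut(usize) -> bool,
{
    (1..=max_passes).find(|&pass| run_pass(pass))
}

/// One CLI pass: a full `validate --fix` first, then `--recheck --fix`.
/// Assumes a `commonmeta` binary is available on PATH.
pub fn cli_pass(pass: usize) -> bool {
    let mut cmd = Command::new("commonmeta");
    cmd.arg("validate");
    if pass > 1 {
        // Later passes only revisit records that failed previously.
        cmd.arg("--recheck");
    }
    cmd.arg("--fix");
    // A zero exit status means no invalid records remain.
    cmd.status().map(|s| s.success()).unwrap_or(false)
}
```

Calling `repair_until_clean(5, cli_pass)` would stop as soon as a pass exits with status zero. Capping the number of passes avoids looping forever over records that cannot be repaired.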

Documentation

Documentation (work in progress) for using the library is available at the commonmeta-rs Documentation website.

Meta

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License: MIT