commonmeta-rs
commonmeta-rs is a Rust library that implements Commonmeta, the common metadata model for scholarly metadata. Use it to convert scholarly metadata between the formats listed below. commonmeta-rs is a work in progress; the first release was on June 17, 2026. Implementations in other languages (Go, Python, Ruby) are also available.
Supported Metadata Formats
Commonmeta-rs reads and/or writes these metadata formats:
| Format | Name | Content Type | Read | Write |
|---|---|---|---|---|
| Commonmeta | commonmeta | application/vnd.commonmeta+json | yes | yes |
| Crossref XML | crossref_xml | application/vnd.crossref.unixref+xml | yes | yes |
| Crossref | crossref | application/vnd.crossref+json | yes | yes |
| DataCite | datacite | application/vnd.datacite.datacite+json | yes | yes |
| DataCite XML | datacite_xml | application/vnd.datacite.datacite+xml | yes | yes |
| Schema.org (in JSON-LD) | schema_org | application/vnd.schemaorg.ld+json | yes | yes |
| RDF XML | rdf_xml | application/rdf+xml | no | later |
| RDF Turtle | turtle | text/turtle | no | later |
| CSL-JSON | csl | application/vnd.citationstyles.csl+json | yes | yes |
| Formatted text citation | citation | text/x-bibliography | n/a | yes |
| Codemeta | codemeta | application/vnd.codemeta.ld+json | yes | later |
| Citation File Format (CFF) | cff | application/vnd.cff+yaml | yes | later |
| JATS | jats | application/vnd.jats+xml | later | later |
| CSV | csv | text/csv | no | later |
| BibTeX | bibtex | application/x-bibtex | yes | yes |
| RIS | ris | application/x-research-info-systems | yes | yes |
| InvenioRDM | inveniordm | application/vnd.inveniordm.v1+json | yes | yes |
| JSON Feed | jsonfeed | application/feed+json | yes | later |
| OpenAlex | openalex | n/a | yes | no |
commonmeta: the Commonmeta format is the library's native format and is used internally.

later: we plan to implement this format in a later release.
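To make the read/write columns concrete, here is a minimal sketch of how a few rows of the table could be modeled as a lookup. The `Format` struct, the `Support` enum, and the `lookup` and `can_convert` functions are hypothetical names chosen for illustration; they are not the commonmeta-rs API, and only four rows are included.

```rust
/// Read/write support as listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Support {
    Yes,
    No,
    Later,
    NotApplicable,
}

/// One row of the table (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Format {
    name: &'static str,
    content_type: Option<&'static str>,
    read: Support,
    write: Support,
}

/// A subset of the rows; values are copied from the table.
const FORMATS: &[Format] = &[
    Format { name: "commonmeta", content_type: Some("application/vnd.commonmeta+json"), read: Support::Yes, write: Support::Yes },
    Format { name: "citation", content_type: Some("text/x-bibliography"), read: Support::NotApplicable, write: Support::Yes },
    Format { name: "codemeta", content_type: Some("application/vnd.codemeta.ld+json"), read: Support::Yes, write: Support::Later },
    Format { name: "openalex", content_type: None, read: Support::Yes, write: Support::No },
];

/// Find a format by the value in its Name column.
fn lookup(name: &str) -> Option<&'static Format> {
    FORMATS.iter().find(|f| f.name == name)
}

/// A conversion is possible only if the source is readable and the target is writable.
fn can_convert(from: &str, to: &str) -> bool {
    match (lookup(from), lookup(to)) {
        (Some(src), Some(dst)) => src.read == Support::Yes && dst.write == Support::Yes,
        _ => false,
    }
}
```

Under this model, `can_convert("codemeta", "commonmeta")` is true, while the reverse is false because Codemeta writing is marked "later".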
Build & run
The commonmeta binary has nine subcommands: convert, encode, decode, import, list, push, put, match, and validate.
# Encode/decode a Crockford base32 identifier suffix given a DOI prefix
# Convert a single record between formats, fetching it by DOI
# Convert a local file and write the result to disk
# Render a formatted citation (CSL style + locale)
# Fetch a batch of records from an API and write them as a commonmeta JSON array
# Read all records from a local VRAIX SQLite file and convert to another format
# Parquet output (.parquet file extension, --to commonmeta only): records are split into batches of 100,000, written in parallel, and zstd-compressed
# Import a single record by DOI into the local commonmeta database (source auto-detected)
# Import all Crossref records for a ROR-identified institution (paginates through all results)
# Import all records from a Crossref VRAIX daily dump
# See the Local database section below for the full import command reference
# including annual public data files (Crossref torrent, DataCite TAR).
# Register records with a live InvenioRDM instance (creates/updates and publishes
# real records — registration is currently only supported with --to inveniordm)
# Same as push, but for a single record (DOI, URL, or file path)
# Match a free-text affiliation string to a ROR organization (uses local DB when available)
# Look up a ROR organization (uses local DB when available)
# Work fully offline — fails fast if a network call would be required
Use cargo run -- <subcommand> --help for the full list of options for each subcommand.
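For readers unfamiliar with the encoding behind encode and decode, the following is a minimal sketch of plain Crockford base32 for unsigned integers. It is an illustration of the encoding only, under the assumption that no checksum is appended; how commonmeta-rs generates suffixes, and whether it adds a checksum or uses lowercase, is not shown here. The function names are hypothetical.

```rust
/// Crockford base32 alphabet: digits plus letters, omitting I, L, O and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a number as Crockford base32 (no checksum, no padding).
fn encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while n > 0 {
        out.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    out.reverse();
    String::from_utf8(out).expect("alphabet is ASCII")
}

/// Decode Crockford base32. Case-insensitive; hyphens are ignored;
/// I and L are read as 1, O is read as 0. Returns None on invalid
/// characters, empty input, or overflow.
fn decode(s: &str) -> Option<u64> {
    let mut n: u64 = 0;
    let mut seen_digit = false;
    for c in s.chars() {
        if c == '-' {
            continue;
        }
        let c = match c.to_ascii_uppercase() {
            'I' | 'L' => '1',
            'O' => '0',
            other => other,
        };
        let digit = ALPHABET.iter().position(|&b| b as char == c)? as u64;
        n = n.checked_mul(32)?.checked_add(digit)?;
        seen_digit = true;
    }
    if seen_digit { Some(n) } else { None }
}
```

Omitting I, L, O and U from the alphabet is what makes the identifiers robust against transcription errors, which is why the decoder can safely map the look-alike letters back to digits.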
--no-network flag
convert, list, import, and match all accept a --no-network flag. When set, any
operation that would make an outbound HTTP request is rejected immediately with a clear error
message. Operations on local files always succeed regardless of this flag. push and put
always require network access and do not expose this flag.
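The fail-fast behavior can be pictured as a guard that runs before any request is made. The sketch below is an assumption about the general shape, not the library's code: `Input`, `NetworkDisabled`, `classify`, and `guard` are hypothetical names, and the classification is deliberately simplified to "URL means remote, anything else means local file". It ignores lookups that the local database could answer offline.

```rust
use std::fmt;

/// Where an input would be read from (simplified, hypothetical).
#[derive(Debug, PartialEq)]
enum Input {
    LocalFile(String),
    Remote(String),
}

/// Error returned when a network call would be needed but is disabled.
#[derive(Debug, PartialEq)]
struct NetworkDisabled {
    target: String,
}

impl fmt::Display for NetworkDisabled {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "network access is disabled (--no-network), cannot fetch {}", self.target)
    }
}

/// Assumption: anything that looks like an HTTP(S) URL needs the network.
fn classify(input: &str) -> Input {
    if input.starts_with("http://") || input.starts_with("https://") {
        Input::Remote(input.to_string())
    } else {
        Input::LocalFile(input.to_string())
    }
}

/// Reject remote inputs up front when no_network is set; local files always pass.
fn guard(input: &str, no_network: bool) -> Result<Input, NetworkDisabled> {
    match classify(input) {
        Input::Remote(target) if no_network => Err(NetworkDisabled { target }),
        other => Ok(other),
    }
}
```

Checking once at the entry point, rather than inside each HTTP call, is what lets the error appear immediately instead of after partial work.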
Local database
The import command populates a local commonmeta SQLite database with scholarly metadata records. All imports upsert — existing records are updated rather than replaced. The database is also used by match and convert for offline lookups.
# Import a single record by DOI (source auto-detected from the DOI prefix)
# Import all Crossref records for an institution (ROR ID, paginates automatically)
# Import all DataCite records for an author (ORCID, paginates automatically)
# Import a full daily dump (downloads from metadata.vraix.org)
# Import from a locally downloaded VRAIX dump (source auto-detected from filename)
# Import the Crossref annual public data file (~223 GB) via Academic Torrents (aria2c required)
# Import the DataCite annual public data file (108 M records, 33 GB compressed)
# First run: obtain a time-limited download URL by submitting your email at
# https://datafiles.datacite.org/datafiles/public-2025
# The TAR archive is cached at ~/Library/Caches/commonmeta/datacite/public-2025.tar
# for subsequent re-imports without a new token.
# Re-import or re-parse from cache (no token needed after the first download):
# Import the full VRAIX pidbox dump
# Import latest ROR organization data
The database path is resolved in this order:
1. COMMONMETA_DB environment variable
2. Platform default:
| Platform | Default path |
|---|---|
| macOS | ~/Library/Application Support/commonmeta/commonmeta.sqlite3 |
| Linux | /var/lib/commonmeta/commonmeta.sqlite3 |
# Use a custom path via environment variable
COMMONMETA_DB=/data/commonmeta.sqlite3
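The resolution order described above can be sketched as a small pure function. This is an illustration, not the library's implementation: `Platform` and `resolve_db_path` are hypothetical names, the home directory is passed in explicitly so the function is easy to test, and treating an empty COMMONMETA_DB value as unset is an assumption.

```rust
use std::path::PathBuf;

/// The two platforms listed in the table above.
#[derive(Debug, Clone, Copy)]
enum Platform {
    MacOs,
    Linux,
}

/// Resolve the database path: environment variable first, then platform default.
/// `env_value` is the value of COMMONMETA_DB, if set; `home` is the user's home directory.
fn resolve_db_path(env_value: Option<&str>, platform: Platform, home: &str) -> PathBuf {
    // Assumption: an empty value is treated the same as an unset variable.
    if let Some(path) = env_value.filter(|v| !v.is_empty()) {
        return PathBuf::from(path);
    }
    match platform {
        Platform::MacOs => PathBuf::from(home)
            .join("Library/Application Support/commonmeta/commonmeta.sqlite3"),
        Platform::Linux => PathBuf::from("/var/lib/commonmeta/commonmeta.sqlite3"),
    }
}
```

A caller would typically pass `std::env::var("COMMONMETA_DB").ok().as_deref()` as the first argument.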
Validate
The validate command checks records in the local database against the commonmeta v1.0 JSON schema and reports any violations. Each failing record shows the JSON Pointer to the offending field and a short description of the constraint that was violated.
Errors are persisted in a validation_errors table inside the database so that --recheck can quickly re-run only the records that failed last time.
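A JSON Pointer (RFC 6901) is a string of `/`-separated reference tokens in which `~` is escaped as `~0` and `/` as `~1`. The sketch below shows how such a pointer could be built from path segments and paired with a message. The function names and the field names in the example are illustrative assumptions, not the validator's actual output format.

```rust
/// Build an RFC 6901 JSON Pointer from path segments.
/// `~` must be escaped before `/` so that the `~` in `~1` is not escaped again.
fn json_pointer(segments: &[&str]) -> String {
    let mut out = String::new();
    for segment in segments {
        out.push('/');
        out.push_str(&segment.replace('~', "~0").replace('/', "~1"));
    }
    out
}

/// Pair a pointer with a short constraint description (hypothetical layout).
fn describe(segments: &[&str], message: &str) -> String {
    format!("{}: {}", json_pointer(segments), message)
}
```

For example, `json_pointer(&["contributors", "0", "id"])` yields `/contributors/0/id`, and an empty segment list yields the empty string, which by the RFC refers to the whole document.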
# Validate all records
# Validate only DataCite records
# Validate only DataCite datasets
# Validate the first 1 000 records
# Repair invalid records in-place (re-applies schema normalization)
# Re-validate only records that failed in the previous run
# Repair only previously-failing records
# Write errors as JSONL to a file instead of stderr
# Validate a different database
Options
| Option | Description |
|---|---|
| --from / -f | Filter by provider (crossref, datacite, openalex). |
| --type | Filter by work type, e.g. Dataset, JournalArticle. |
| --number / -n | Maximum number of records to check (default: all). |
| --fix | Attempt to repair invalid records in-place. Applies prepare() normalization: removes non-ROR organization ids, clears invalid URIs, deduplicates geo-locations, normalizes EISSN → ISSN, etc. Repaired records are removed from validation_errors; records that cannot be repaired remain. |
| --recheck | Only re-validate records listed in the validation_errors table from the previous run. Combine with --fix for an efficient repair loop. |
| --report | Write errors as JSONL (one {"id": "…", "errors": […]} object per record) to the given file instead of printing to stderr. |
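The JSONL layout used by --report, one JSON object per line, can be sketched with the standard library alone. This is a minimal illustration under the assumption that each entry in `errors` is a plain string; the real shape of the array elements is not specified here. `escape_json`, `report_line`, and `write_report` are hypothetical helpers.

```rust
use std::io::{self, Write};

/// Escape a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Format one record as a single JSON object (assumes string-valued errors).
fn report_line(id: &str, errors: &[&str]) -> String {
    let items: Vec<String> = errors
        .iter()
        .map(|e| format!("\"{}\"", escape_json(e)))
        .collect();
    format!("{{\"id\": \"{}\", \"errors\": [{}]}}", escape_json(id), items.join(", "))
}

/// Write one line per failing record to any writer (a file, a buffer, ...).
fn write_report<W: Write>(mut writer: W, records: &[(&str, Vec<&str>)]) -> io::Result<()> {
    for (id, errors) in records {
        writeln!(writer, "{}", report_line(id, errors))?;
    }
    Ok(())
}
```

Because every record occupies exactly one line, the report can be processed with line-oriented tools or streamed without loading the whole file.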
Repair loop
A typical workflow for cleaning up an imported database:
# 1. Full first pass — saves all failures to validation_errors
# 2. Subsequent passes — only re-checks and re-repairs the remaining failures
The command exits with a non-zero status if any records remain invalid after the run.
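The logic of the repair loop can be modeled in memory to show why --recheck makes later passes cheap. Everything in this sketch is a stand-in: `Record`, `validate`, `repair`, and `run_pass` are hypothetical, the two validation rules are invented for the example, a `BTreeMap` replaces the SQLite database, and a `BTreeSet` replaces the validation_errors table.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone)]
struct Record {
    title: String,
    url: Option<String>,
}

/// Toy validation rules (assumptions for this sketch only).
fn validate(record: &Record) -> Vec<String> {
    let mut errors = Vec::new();
    if record.title.is_empty() {
        errors.push("/title: must not be empty".to_string());
    }
    if let Some(url) = &record.url {
        if !(url.starts_with("http://") || url.starts_with("https://")) {
            errors.push("/url: not a valid URI".to_string());
        }
    }
    errors
}

/// Toy repair: clear an invalid URL. An empty title cannot be repaired.
fn repair(record: &mut Record) {
    if validate(record).iter().any(|e| e.starts_with("/url")) {
        record.url = None;
    }
}

/// Run one pass and return the number of records still failing.
/// With `recheck`, only ids already in `failed` are visited.
fn run_pass(
    db: &mut BTreeMap<String, Record>,
    failed: &mut BTreeSet<String>,
    recheck: bool,
    fix: bool,
) -> usize {
    let ids: Vec<String> = if recheck {
        failed.iter().cloned().collect()
    } else {
        db.keys().cloned().collect()
    };
    for id in ids {
        let Some(record) = db.get_mut(&id) else { continue };
        if fix {
            repair(record);
        }
        if validate(record).is_empty() {
            failed.remove(&id);
        } else {
            failed.insert(id);
        }
    }
    failed.len()
}

/// Non-zero exit status if any records remain invalid.
fn exit_code(remaining: usize) -> i32 {
    if remaining > 0 { 1 } else { 0 }
}
```

A first call with `recheck = false` visits every record and fills `failed`; later calls with `recheck = true` and `fix = true` touch only that shrinking set.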
Documentation
Documentation (work in progress) for using the library is available at the commonmeta-rs Documentation website.
Meta
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
License: MIT