Module utils

Enums§

PmcResolver
Selects the URL base for normalize_pmid and normalize_pmcid.

Functions§

camel_case_string
Lowercases the first character of a PascalCase string (PascalCase → camelCase).
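As an illustration of the PascalCase → camelCase step, here is a minimal sketch. The name lower_first is hypothetical and this is not the crate's own implementation; it only shows one way to lowercase the first character while leaving the rest untouched.

```rust
/// Hypothetical stand-in for illustration only.
/// Lowercases the first character and keeps the remainder unchanged.
fn lower_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        // to_lowercase() returns an iterator because one char can map to several.
        Some(first) => first.to_lowercase().chain(chars).collect(),
        None => String::new(),
    }
}
```

With this sketch, lower_first("PascalCase") yields "pascalCase" and an empty input yields an empty string.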
camel_case_to_words
Converts a PascalCase / camelCase string to “Title Words” form.
community_slug_as_url
Returns a community slug as a Rogue Scholar API URL.
decode_id
dedupe_slice
Removes duplicate elements from a Vec while preserving order.
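Order-preserving deduplication can be sketched with a HashSet that records what has already been seen. The function name and the trait bounds below are assumptions made for this example, not the signature of dedupe_slice.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical sketch: keeps the first occurrence of each element
/// and drops later duplicates, preserving the original order.
fn dedupe_preserving_order<T: Eq + Hash + Clone>(items: Vec<T>) -> Vec<T> {
    let mut seen = HashSet::new();
    items
        .into_iter()
        // HashSet::insert returns true only when the value was not present yet.
        .filter(|item| seen.insert(item.clone()))
        .collect()
}
```

Unlike Vec::dedup, which only removes consecutive repeats, this approach also removes duplicates that are not adjacent.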
find_from_format
Auto-detects the commonmeta reader format from various hints.
find_from_format_by_ext
Detects format by file extension.
find_from_format_by_filename
Detects format by filename.
find_from_format_by_id
Detects format by PID (DOI, URL patterns).
find_from_format_by_string
Detects format by parsing the JSON string and examining key fields.
get_language
Returns a language code or name in the requested format. Accepts an alpha-2 code, an alpha-3 code, or an English name as input. The format argument selects the output: “iso639-3” for the 3-letter code, “name” for the English name, and any other value for the ISO 639-1 alpha-2 code.
issn_as_url
Returns an ISSN expressed as a portal.issn.org URL.
kebab_case_to_camel_case
Converts kebab-case to camelCase.
kebab_case_to_pascal_case
Converts kebab-case to PascalCase.
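The two kebab-case conversions can be sketched as follows. Both function names are hypothetical, and details such as the handling of empty segments or of uppercase letters inside a segment are assumptions, not documented behavior of this module.

```rust
/// Hypothetical sketch: uppercases the first character of every
/// hyphen-separated segment and joins the segments.
fn kebab_to_pascal(s: &str) -> String {
    s.split('-')
        .filter(|part| !part.is_empty()) // assumption: skip empty segments
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}

/// Hypothetical sketch: PascalCase with the first character lowercased.
fn kebab_to_camel(s: &str) -> String {
    let pascal = kebab_to_pascal(s);
    let mut chars = pascal.chars();
    match chars.next() {
        Some(first) => first.to_lowercase().chain(chars).collect(),
        None => String::new(),
    }
}
```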
normalize_arxiv
Normalizes an arXiv identifier to a canonical URL: https://arxiv.org/abs/XXXX.
normalize_cc_url
Normalizes a Creative Commons license URL to the canonical /legalcode form. Returns (normalized_url, true) on success, ("", false) otherwise.
normalize_id
Normalizes any PID: DOI → canonical URL, UUID, Wikidata, or plain URL.
normalize_orcid
Returns a normalized ORCID URL.
normalize_organization_id
Normalizes an organization identifier (ROR, Crossref Funder ID, GRID, Wikidata, ISNI).
normalize_person_id
Normalizes a person identifier (ORCID, ISNI, Wikidata).
normalize_pmcid
Normalizes a PMC ID to a canonical URL.
normalize_pmid
Normalizes a PubMed ID to a canonical URL.
normalize_ror
Returns a normalized ROR URL.
normalize_string
Unicode-normalizes a string: NFD decomposition, strip combining diacritics, NFC recompose.
normalize_url
Normalizes a URL: upgrades http to https when secure is set, and lowercases the URL when lower is set.
normalize_work_id
Normalizes a work identifier (DOI, UUID, URL, Wikidata).
sanitize
Strips HTML, allowing only safe inline elements.
split_string
Inserts sep every n characters.
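A minimal sketch of inserting a separator every n characters is shown below. The name insert_every, the argument order, and the treatment of n == 0 are assumptions for illustration; split_string itself may differ.

```rust
/// Hypothetical sketch: splits the input into chunks of `n` characters
/// and joins the chunks with `sep`.
fn insert_every(s: &str, n: usize, sep: &str) -> String {
    if n == 0 {
        // Assumption: a chunk size of zero leaves the input unchanged.
        return s.to_string();
    }
    // Work on chars rather than bytes so multi-byte characters stay intact.
    let chars: Vec<char> = s.chars().collect();
    chars
        .chunks(n)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect::<Vec<_>>()
        .join(sep)
}
```

This is the kind of helper that turns a compact 16-character identifier into hyphenated blocks of four.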
string_to_slug
Converts a string to a URL slug: normalize, keep only lowercase letters/digits.
title_case
Uppercases only the first character of a string.
validate_crossref_funder_id
Validates a Crossref Funder ID.
validate_grid
Validates a GRID ID. A GRID ID is a string prefixed with grid, followed by a dot, a number, a dot, and a string.
validate_id
Validates an identifier and returns the identifier and its type. Type can be: DOI, UUID, PMID, PMCID, OpenAlex, ORCID, ROR, GRID, RID, Wikidata, ISNI, ISSN, Crossref Funder ID, URL, or “”.
validate_id_category
Validates an identifier and additionally returns its category. Category: “Work”, “Person”, “Organization”, “Contributor”, “All”, or “”.
validate_isni
Validates an ISNI. An ISNI is a 16-character string in blocks of four, optionally separated by hyphens or spaces, that does NOT fall between 0000-0001-5000-0007 and 0000-0003-5000-0001 or between 0009-0000-0000-0000 and 0009-0010-0000-0000 (the ranges reserved for ORCID).
validate_issn
Validates an ISSN.
validate_openalex
Validates an OpenAlex ID. First letter indicates resource type (A author, F funder, I institution, P publisher, S source, W work), followed by 8-10 digits.
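The shape described above (one resource-type letter followed by 8-10 digits) can be checked with a short sketch. The function name is hypothetical, and it only handles bare uppercase IDs; whether validate_openalex also accepts URLs or lowercase prefixes is not stated here.

```rust
/// Hypothetical sketch: checks for one of the letters A, F, I, P, S, W
/// followed by 8 to 10 ASCII digits.
fn looks_like_openalex(id: &str) -> bool {
    let mut chars = id.chars();
    let Some(prefix) = chars.next() else {
        return false; // empty input
    };
    let digits = chars.as_str();
    matches!(prefix, 'A' | 'F' | 'I' | 'P' | 'S' | 'W')
        && (8..=10).contains(&digits.len())
        && digits.bytes().all(|b| b.is_ascii_digit())
}
```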
validate_orcid
Validates an ORCID. An ORCID is a 16-character string in blocks of four separated by hyphens, between 0000-0001-5000-0007 and 0000-0003-5000-0001 or between 0009-0000-0000-0000 and 0009-0010-0000-0000.
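The range test that separates ORCID from ISNI can be sketched by comparing the first 15 digits numerically. This is an approximation under stated assumptions: the function name is hypothetical, the final check character is accepted as any digit or X without being verified, and the bounds are taken from the ranges quoted above with their last character dropped.

```rust
/// Hypothetical sketch: format check plus a numeric range test on the
/// first 15 digits. The check character is not verified.
fn looks_like_orcid(id: &str) -> bool {
    let blocks: Vec<&str> = id.split('-').collect();
    // Require ASCII and exactly four hyphen-separated blocks of four.
    if !id.is_ascii() || blocks.len() != 4 || blocks.iter().any(|b| b.len() != 4) {
        return false;
    }
    let compact = blocks.concat();
    let (body, check) = compact.split_at(15);
    if !body.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    // The last character is a digit or X (assumption: uppercase only).
    if !(check == "X" || check.bytes().all(|b| b.is_ascii_digit())) {
        return false;
    }
    let n: u64 = match body.parse() {
        Ok(n) => n,
        Err(_) => return false,
    };
    // 0000-0001-5000-000x .. 0000-0003-5000-000x
    (15_000_000..=35_000_000).contains(&n)
        // 0009-0000-0000-000x .. 0009-0010-0000-000x
        || (900_000_000_000..=900_100_000_000).contains(&n)
}
```

A well-formed 16-character identifier that fails this range test is a candidate ISNI rather than an ORCID, which mirrors the exclusion described for validate_isni.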
validate_pmcid
Validates a PubMed Central ID (PMCID). Accepts bare numbers, PMC{n}, NCBI PMC URLs, and Europe PMC PMC URLs. Returns the bare numeric part.
validate_pmid
Validates a PubMed ID (PMID).
validate_rid
Validates a RID. A RID is the unique identifier used by the InvenioRDM platform.
validate_ror
Validates a ROR ID. A ROR ID starts with 0, followed by a 6-character base32-encoded alphanumeric string and a 2-digit checksum.
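The ROR ID layout can be checked with a shape-only sketch. The function name is hypothetical, the checksum is not verified, and the base32 alphabet used here (lowercase letters and digits without i, l, o, u) is an assumption; bare IDs only, no URL handling.

```rust
/// Hypothetical sketch: checks the 9-character layout
/// `0` + 6 base32 characters + 2 digits. Does not verify the checksum.
fn looks_like_ror(id: &str) -> bool {
    let bytes = id.as_bytes();
    if bytes.len() != 9 || bytes[0] != b'0' {
        return false;
    }
    // Assumed alphabet: digits and lowercase letters except i, l, o, u.
    let is_base32 =
        |b: &u8| b.is_ascii_digit() || (b.is_ascii_lowercase() && !b"ilou".contains(b));
    bytes[1..7].iter().all(is_base32) && bytes[7..9].iter().all(|b| b.is_ascii_digit())
}
```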
validate_url
Validates a URL and checks whether it is a DOI.
validate_uuid
Validates a UUID.
validate_wikidata
Validates a Wikidata item ID. A Wikidata item ID is a string prefixed with Q followed by a number.
words_to_camel_case
Converts “words in a string” to camelCase.