biblib
biblib is a Rust library for parsing citation exports into a shared data model and deduplicating the resulting records.
It is built for import pipelines, evidence synthesis tooling, registry ingestion, and any workflow that needs to turn citation files from multiple sources into one normalized Citation shape.
What It Supports
biblib currently ships parsers for:
| Source format | Feature | Parser |
|---|---|---|
| RIS | ris | RisParser |
| PubMed / MEDLINE (.nbib) | pubmed | PubMedParser |
| EndNote XML | xml | EndNoteXmlParser |
| EndNote Tagged / EndNote Web (.enw) | enw | EnwParser |
| BibTeX / BibLaTeX (.bib) | bib | BibParser |
| Generic CSV / delimited data | csv | csv::CsvParser |
| ICTRP registry CSV exports | csv | IctrpCsvParser |
All parser outputs converge on the same Citation struct, including normalized fields such as title, authors, date, doi, accession_number, pmid, pmc_id, urls, and extra_fields.
Installation
[dependencies]
biblib = "0.6"
For a smaller build:
[dependencies]
biblib = { version = "0.6", default-features = false, features = ["ris"] }
Quick Start
Parse RIS
use biblib::{CitationParser, RisParser};
let input = r#"TY - JOUR
TI - Machine Learning in Healthcare
AU - Smith, John
AU - Doe, Jane
PY - 2023
DO - 10.1000/example
ER -"#;
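let citations = RisParser::new().parse(input).unwrap();
assert_eq!(citations[0].title, "Machine Learning in Healthcare");
assert_eq!(citations[0].authors.len(), 2);
assert_eq!(citations[0].doi.as_deref(), Some("10.1000/example"));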
Parse PubMed / MEDLINE
use biblib::{CitationParser, PubMedParser};
let input = r#"PMID- 12345678
TI - Immunotherapy in Oncology
FAU - Smith, John
JT - Journal of Clinical Research
DP - 2024 Jun 15
AB - Example abstract."#;
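let citations = PubMedParser::new().parse(input).unwrap();
assert_eq!(citations[0].pmid.as_deref(), Some("12345678"));
assert_eq!(citations[0].title, "Immunotherapy in Oncology");
assert_eq!(citations[0].journal.as_deref(), Some("Journal of Clinical Research"));
assert_eq!(citations[0].abstract_text.as_deref(), Some("Example abstract."));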
Parse EndNote Tagged (.enw)
use biblib::{CitationParser, EnwParser};
let input = r#"%0 Journal Article
%T Machine Learning in Healthcare
%A Smith, John
%D 2023
%R 10.1000/example
"#;
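let citations = EnwParser::new().parse(input).unwrap();
assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Machine Learning in Healthcare");
assert_eq!(citations[0].authors.len(), 1);
assert_eq!(citations[0].doi.as_deref(), Some("10.1000/example"));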
Parse BibTeX / BibLaTeX (.bib)
use biblib::{BibParser, CitationParser};
let input = r#"@article{smith2024,
title = {Machine Learning in Healthcare},
author = {Smith, John and Doe, Jane},
date = {2024-05-02},
doi = {10.1000/example}
}"#;
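let citations = BibParser::new().parse(input).unwrap();
assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Machine Learning in Healthcare");
assert_eq!(citations[0].authors.len(), 2);
assert_eq!(citations[0].doi.as_deref(), Some("10.1000/example"));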
Auto-detect Supported Formats
detect_and_parse() currently auto-detects RIS, PubMed, EndNote XML, EndNote Tagged (.enw), BibTeX / BibLaTeX (.bib), and ICTRP CSV. Generic CSV should still be parsed explicitly with CsvParser.
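use biblib::detect_and_parse;
let input = "TY - JOUR\nTI - Example\nER -";
// The second tuple element identifies the detected format.
let (citations, _format) = detect_and_parse(input).unwrap();
assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Example");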
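To give a feel for what format detection involves, here is a minimal, stdlib-only sketch that guesses a format from the first non-empty line. It is not biblib's detection logic, which this README does not describe; `SniffedFormat`, `sniff_format`, and `is_ris_type_line` are hypothetical names, and the prefixes checked are only the common opening markers of each format.

```rust
/// Formats this sketch can recognize. Hypothetical enum, not a biblib type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SniffedFormat {
    Ris,
    PubMed,
    EndNoteXml,
    Enw,
    BibTex,
    Unknown,
}

/// Guess the format from the first non-empty line of the input.
pub fn sniff_format(input: &str) -> SniffedFormat {
    let first = input
        .lines()
        .map(str::trim_start)
        .find(|line| !line.is_empty())
        .unwrap_or("");

    if first.starts_with("<?xml") || first.starts_with("<xml") {
        SniffedFormat::EndNoteXml
    } else if first.starts_with('@') {
        // BibTeX / BibLaTeX entries open with "@type{key,".
        SniffedFormat::BibTex
    } else if first.starts_with("%0") {
        // EndNote Tagged records open with the "%0" reference-type tag.
        SniffedFormat::Enw
    } else if first.starts_with("PMID-") {
        SniffedFormat::PubMed
    } else if is_ris_type_line(first) {
        SniffedFormat::Ris
    } else {
        SniffedFormat::Unknown
    }
}

/// RIS records open with the "TY" tag followed by whitespace and a hyphen.
fn is_ris_type_line(line: &str) -> bool {
    line.starts_with("TY") && line[2..].trim_start().starts_with('-')
}
```

Calling `sniff_format("TY - JOUR\nTI - Example\nER -")` yields `SniffedFormat::Ris`. A first-line check like this cannot tell a generic CSV from an ICTRP export, which is one reason a real detector needs more than a prefix test.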
Parse ICTRP CSV
use biblib::{CitationParser, IctrpCsvParser};
let input = concat!;
let citations = IctrpCsvParser::new().parse(input).unwrap();
let citation = &citations[0];
assert_eq!;
assert_eq!;
assert_eq!;
Parse Generic CSV with Custom Headers
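use biblib::csv::{CsvConfig, CsvParser};
use biblib::CitationParser;
let mut config = CsvConfig::new();
config
    .set_delimiter(b';')
    .set_header_mapping("title", vec!["Article Name".to_string()])
    .set_header_mapping("authors", vec!["Writers".to_string()])
    .set_header_mapping("year", vec!["Published".to_string()]);
let input = "Article Name;Writers;Published\nExample Paper;Smith, John;2023";
let citations = CsvParser::with_config(config).parse(input).unwrap();
assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Example Paper");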
Deduplicate Parsed Records
use ;
use ;
let citations = vec!;
let config = DeduplicatorConfig ;
let groups = Deduplicator::new()
    .with_config(config)
    .find_duplicates(&citations)
    .unwrap();
let duplicate_group = groups
.iter
.find
.unwrap;
assert_eq!;
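As a rough illustration of identifier-based duplicate grouping, the stdlib-only sketch below groups records whose DOIs match after normalization. It is not biblib's matching algorithm (see DEDUPLICATION_GUIDE.md for that); `Record`, `normalize_doi`, and `group_by_doi` are hypothetical names, and `Record` is a stand-in rather than the library's Citation.

```rust
use std::collections::HashMap;

/// Minimal stand-in record. Hypothetical, not biblib's `Citation`.
#[derive(Debug, Clone)]
pub struct Record {
    pub title: String,
    pub doi: Option<String>,
}

/// Lowercase a DOI and strip common resolver prefixes so variants compare equal.
pub fn normalize_doi(raw: &str) -> String {
    let lower = raw.trim().to_ascii_lowercase();
    let prefixes = [
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    ];
    for prefix in prefixes {
        if let Some(rest) = lower.strip_prefix(prefix) {
            return rest.trim().to_string();
        }
    }
    lower
}

/// Group record indices by normalized DOI, keeping only groups with
/// more than one member. Records without a DOI are never grouped.
pub fn group_by_doi(records: &[Record]) -> Vec<Vec<usize>> {
    let mut by_doi: HashMap<String, Vec<usize>> = HashMap::new();
    for (index, record) in records.iter().enumerate() {
        if let Some(doi) = record.doi.as_deref() {
            by_doi.entry(normalize_doi(doi)).or_default().push(index);
        }
    }
    let mut groups: Vec<Vec<usize>> = by_doi
        .into_values()
        .filter(|group| group.len() > 1)
        .collect();
    // HashMap iteration order is arbitrary, so sort for a stable result.
    groups.sort();
    groups
}
```

Exact DOI matching only catches records that carry the identifier on both sides; records without one need title, author, and year comparison, which this sketch deliberately leaves out.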
Data Model
The core output type is Citation.
Important fields include:
| Field | Type | Purpose |
|---|---|---|
| citation_type | Vec<String> | Source and work-type labels |
| title | String | Main normalized title |
| authors | Vec<Author> | Parsed people with name parts and affiliations |
| journal | Option<String> | Full journal or source title |
| journal_abbr | Option<String> | Journal abbreviation |
| date | Option<Date> | Year with optional month/day |
| volume | Option<String> | Volume string |
| issue | Option<String> | Issue or number string |
| pages | Option<String> | Normalized page range |
| issn | Vec<String> | One or more ISSNs/serial identifiers |
| doi | Option<String> | Normalized DOI |
| accession_number | Option<String> | Registry or source accession identifier |
| pmid | Option<String> | PubMed identifier |
| pmc_id | Option<String> | PubMed Central identifier |
| abstract_text | Option<String> | Abstract text |
| keywords | Vec<String> | Parsed keywords |
| urls | Vec<String> | Collected links |
| language | Option<String> | Language code or label |
| mesh_terms | Vec<String> | PubMed MeSH terms |
| publisher | Option<String> | Publisher or sponsor |
| extra_fields | HashMap<String, Vec<String>> | Source-specific leftovers preserved raw |
This split lets the library normalize aggressively where it has clear semantics, while extra_fields keeps source-specific information available.
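The "typed fields plus raw leftovers" idea can be sketched without the crate. The example below is a simplified stand-in, not biblib's Citation definition: `MiniCitation` and `apply_tag` are hypothetical, the struct mirrors only four rows of the table, and the tag mapping (RIS-style TI, DO, KW) is an assumption made for illustration. The actual mapping rules live in PARSING_GUIDE.md.

```rust
use std::collections::HashMap;

/// Simplified stand-in mirroring a few rows of the field table.
/// Hypothetical, not the library's `Citation`.
#[derive(Debug, Default, Clone)]
pub struct MiniCitation {
    pub title: String,
    pub doi: Option<String>,
    pub keywords: Vec<String>,
    pub extra_fields: HashMap<String, Vec<String>>,
}

/// Store one (tag, value) pair: tags with clear semantics go to typed,
/// normalized fields; everything else is preserved raw under its tag.
pub fn apply_tag(citation: &mut MiniCitation, tag: &str, value: &str) {
    match tag {
        "TI" => citation.title = value.trim().to_string(),
        "DO" => citation.doi = Some(value.trim().to_ascii_lowercase()),
        "KW" => citation.keywords.push(value.trim().to_string()),
        other => citation
            .extra_fields
            .entry(other.to_string())
            .or_default()
            .push(value.to_string()),
    }
}
```

Because extra_fields maps each key to a Vec<String>, repeated source tags accumulate in order instead of overwriting one another.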
Feature Flags
| Feature | Enables |
|---|---|
| ris | RIS parser |
| pubmed | PubMed / MEDLINE parser |
| xml | EndNote XML parser |
| enw | EndNote Tagged (.enw) parser |
| bib | BibTeX / BibLaTeX (.bib) parser |
| csv | Generic CSV parser and ICTRP CSV parser |
| dedupe | Deduplication engine |
| diagnostics | Pretty parse diagnostics via ariadne |
Default features: csv, pubmed, xml, ris, enw, bib, dedupe
Since v0.5, biblib no longer uses the regex crate or exposes regex-backend feature flags. It uses regex-lite internally, and regex backend selection is no longer part of the public API surface.
Errors and Diagnostics
All parsers return ParseError on malformed input. Errors carry:
- The source format
- A 1-based line number when available
- A byte span when available
- A structured ValueError
Example:
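use biblib::{CitationParser, RisParser};
let input = "TY - JOUR\nAU - Smith, John\nER -\n";
match RisParser::new().parse(input) {
    Ok(citations) => println!("parsed {} citations", citations.len()),
    Err(err) => eprintln!("parse failed: {err}"),
}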
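A byte span and a 1-based line number describe the same location in two ways. As a hedged, stdlib-only sketch of how one can be derived from the other, the hypothetical helper `line_col_at` below converts a byte offset into a 1-based line and column. It does not use or reproduce ParseError's actual fields, which are documented on docs.rs.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// The column counts characters, not bytes. Offsets past the end of the
/// source clamp to the end; offsets inside a multi-byte character back
/// up to the start of that character.
pub fn line_col_at(source: &str, byte_offset: usize) -> (usize, usize) {
    let mut end = byte_offset.min(source.len());
    // Back up to a char boundary so slicing cannot panic.
    while !source.is_char_boundary(end) {
        end -= 1;
    }
    let before = &source[..end];
    // Every newline before the offset moves the location down one line.
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    // The current line starts just after the last newline, if any.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```

For the input used above, `line_col_at(input, 10)` returns `(2, 1)`, the start of the AU line.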
For human-friendly diagnostics, enable diagnostics:
[dependencies]
biblib = { version = "0.6", features = ["diagnostics"] }
Then use parse_with_diagnostics():
use ;
let input = "TY - JOUR\nAU - Smith, John\nER -\n";
let rendered = parse_with_diagnostics;
assert!;
Guides
- PARSING_GUIDE.md - format-specific mapping and normalization rules
- DEDUPLICATION_GUIDE.md - duplicate matching behavior and configuration
- docs.rs/biblib - API reference
License
Licensed under either MIT or Apache-2.0, at your option.