A comprehensive library for parsing, managing, and deduplicating academic citations.
biblib parses citation exports from multiple sources into one normalized
Citation model, then optionally deduplicates the result set.
It is designed for ingestion pipelines, review tooling, registry imports, and any workflow that needs to reconcile heterogeneous citation files.
§What You Get
- Dedicated parsers for RIS, PubMed / MEDLINE, EndNote XML, ICTRP XML, EndNote Tagged (.enw), BibTeX / BibLaTeX (.bib), generic CSV, and ICTRP CSV exports
- A shared Citation output type with normalized identifiers such as DOI, PMID, PMCID, and accession_number
- Preservation of source-specific leftovers through extra_fields
- Optional duplicate detection via dedupe::Deduplicator
- Optional human-friendly parse diagnostics with the diagnostics feature
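To make "normalized identifiers" concrete, here is a minimal sketch of what DOI normalization could look like. The helper `normalize_doi` and its rules (trim, strip common resolver prefixes, lowercase) are assumptions for illustration only; they are not part of biblib's public API and may differ from what the library does internally.

```rust
/// Hypothetical helper (not a biblib function): reduce the common ways a DOI
/// is written to one canonical, lowercase form.
fn normalize_doi(raw: &str) -> Option<String> {
    // DOI names are case-insensitive, so lowercasing is a safe canonical form.
    let lower = raw.trim().to_ascii_lowercase();

    // Prefixes that exports commonly place in front of the bare DOI.
    const PREFIXES: [&str; 5] = [
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    ];

    let mut rest = lower.as_str();
    for prefix in PREFIXES {
        if let Some(stripped) = rest.strip_prefix(prefix) {
            rest = stripped.trim_start();
            break;
        }
    }

    // A bare DOI starts with the "10." directory indicator and has a suffix
    // separated by a slash; anything else is rejected in this sketch.
    if rest.starts_with("10.") && rest.contains('/') {
        Some(rest.to_string())
    } else {
        None
    }
}
```

With rules like these, `https://doi.org/10.1000/Example` and `doi: 10.1000/example` would both reduce to `10.1000/example`, which is what makes identifiers from different exports comparable.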
§Quick Start
use biblib::{CitationParser, RisParser};
let input = r#"TY - JOUR
TI - Example Article
AU - Smith, John
DO - 10.1000/example
ER -"#;
let citations = RisParser::new().parse(input).unwrap();
assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Example Article");
assert_eq!(citations[0].doi.as_deref(), Some("10.1000/example"));
§Supported Parsers
use biblib::{
    BibParser, CitationParser, EndNoteXmlParser, EnwParser, IctrpXmlParser, PubMedParser,
    RisParser,
};
use biblib::csv::CsvParser;
let _ris = RisParser::new();
let _pubmed = PubMedParser::new();
let _endnote = EndNoteXmlParser::new();
let _ictrp_xml = IctrpXmlParser::new();
let _enw = EnwParser::new();
let _bib = BibParser::new();
let _csv = CsvParser::new();
§Auto-Detection
detect_and_parse currently auto-detects RIS, PubMed, ICTRP XML,
EndNote XML, EndNote Tagged, BibTeX / BibLaTeX, and ICTRP CSV. ICTRP XML
is the preferred ICTRP ingestion path; ICTRP CSV remains for backward
compatibility. Generic CSV remains explicit because header mapping is
application-specific.
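To illustrate why content-based detection works for tagged and markup formats but not for generic CSV, here is a minimal sketch of first-line sniffing. The `SniffedFormat` enum and `sniff_format` function are hypothetical names introduced for this example; this is not how `detect_and_parse` is implemented, and real detection would need to look further than the first line (for instance, to tell EndNote XML from ICTRP XML).

```rust
/// Hypothetical result type for this sketch; biblib's own enum is CitationFormat.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SniffedFormat {
    Ris,
    PubMed,
    EndNoteTagged,
    BibTex,
    Xml,
    Unknown,
}

/// Guess a format from the first non-blank line of the input.
fn sniff_format(input: &str) -> SniffedFormat {
    let first = match input.lines().map(str::trim).find(|l| !l.is_empty()) {
        Some(line) => line,
        None => return SniffedFormat::Unknown,
    };

    // RIS records open with a TY tag followed by a dash separator.
    if let Some(rest) = first.strip_prefix("TY") {
        if rest.trim_start().starts_with('-') {
            return SniffedFormat::Ris;
        }
    }

    if first.starts_with("PMID-") {
        SniffedFormat::PubMed // MEDLINE-style records open with a PMID tag
    } else if first.starts_with("%0") {
        SniffedFormat::EndNoteTagged // .enw records open with the %0 type tag
    } else if first.starts_with('@') {
        SniffedFormat::BibTex // entries look like @article{...}
    } else if first.starts_with('<') {
        SniffedFormat::Xml // needs a closer look to pick the XML dialect
    } else {
        // A CSV header row has no reliable signature, so it stays unknown.
        SniffedFormat::Unknown
    }
}
```

A plain header row such as `title,author` falls through to `Unknown` in this sketch, which mirrors the reason generic CSV has to be requested explicitly.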
use biblib::detect_and_parse;
let input = "TY - JOUR\nTI - Example\nER -";
let (citations, format) = detect_and_parse(input).unwrap();
assert_eq!(format.as_str(), "RIS");
assert_eq!(citations[0].title, "Example");
§Feature Flags
Disable default features when you only need a subset of parsers:
[dependencies]
biblib = { version = "0.7", default-features = false, features = ["ris", "csv"] }
Available public features:
- ris
- pubmed
- xml
- csv
- enw
- bib
- dedupe
- diagnostics
Since v0.5, biblib no longer uses the regex crate or exposes regex
backend feature flags. It uses regex-lite internally, and regex backend
choice is not part of the public feature surface.
§Deduplication
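As background for the Deduplicator example in this section, the following is a simplified sketch of the general idea of grouping records that share a DOI into one "unique" record plus its duplicates. The `Record` and `Group` types and the `group_by_doi` function are hypothetical stand-ins defined only for this illustration; biblib's actual matching logic is not described here and is likely more involved (the config fields below suggest year grouping and source preferences play a role).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a citation; only the fields this sketch needs.
struct Record {
    title: String,
    doi: Option<String>,
}

/// Hypothetical group: index of the kept record plus indices of its duplicates.
struct Group {
    unique: usize,
    duplicates: Vec<usize>,
}

/// Group records by exact (case-insensitive) DOI, keeping the first one seen.
fn group_by_doi(records: &[Record]) -> Vec<Group> {
    let mut by_doi: HashMap<String, usize> = HashMap::new();
    let mut groups: Vec<Group> = Vec::new();

    for (idx, record) in records.iter().enumerate() {
        let key = record.doi.as_deref().map(|d| d.trim().to_ascii_lowercase());
        match key {
            Some(k) if !k.is_empty() => {
                if let Some(&group_idx) = by_doi.get(&k) {
                    // Same DOI as an earlier record: file it as a duplicate.
                    groups[group_idx].duplicates.push(idx);
                } else {
                    by_doi.insert(k, groups.len());
                    groups.push(Group { unique: idx, duplicates: Vec::new() });
                }
            }
            // Records without a DOI cannot be matched here, so each stands alone.
            _ => groups.push(Group { unique: idx, duplicates: Vec::new() }),
        }
    }
    groups
}
```

Exact-identifier matching like this is cheap and has few false positives, but it misses records that lack a DOI, which is one reason a real deduplicator also compares other fields.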
use biblib::dedupe::{Deduplicator, DeduplicatorConfig};
use biblib::{Citation, Date};
let citations = vec![
    Citation {
        title: "Example Title".to_string(),
        doi: Some("10.1000/example".to_string()),
        date: Some(Date { year: 2023, month: None, day: None }),
        journal: Some("Example Journal".to_string()),
        ..Default::default()
    },
    Citation {
        title: "Example Title".to_string(),
        doi: Some("10.1000/example".to_string()),
        date: Some(Date { year: 2023, month: None, day: None }),
        journal: Some("Example Journal".to_string()),
        ..Default::default()
    },
];
let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    source_preferences: vec!["PubMed".to_string()],
};
let groups = Deduplicator::new()
    .with_config(config)
    .find_duplicates(&citations)
    .unwrap();
let duplicate_group = groups
    .iter()
    .find(|group| group.unique.doi.as_deref() == Some("10.1000/example"))
    .unwrap();
assert_eq!(duplicate_group.duplicates.len(), 1);
§Errors and Diagnostics
Parsers return ParseError with line numbers and, when available, source
spans.
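A line number is most useful when it is shown next to the offending input. The sketch below shows one way an application might do that; `line_context` is a hypothetical helper, not a biblib function, and it assumes the reported line is 1-based and optional, as the `Some(1)` in this section's example suggests.

```rust
/// Hypothetical helper: pair an optional 1-based line number with the
/// corresponding line of the original input for display to a user.
fn line_context(source: &str, line: Option<usize>) -> String {
    match line {
        Some(n) if n >= 1 => match source.lines().nth(n - 1) {
            // The line exists: show its number and text together.
            Some(text) => format!("line {}: {}", n, text),
            // The number points beyond the input, so there is no text to show.
            None => format!("line {} (past end of input)", n),
        },
        // No line information (or a nonsensical zero) was reported.
        _ => String::from("line unknown"),
    }
}
```

An application could call this with the original input and the `line` field of a returned error before logging or displaying the failure.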
use biblib::{CitationParser, RisParser, ValueError};
let input = "TY - JOUR\nAU - Smith, John\nER -\n";
let err = RisParser::new().parse(input).unwrap_err();
assert_eq!(err.line, Some(1));
assert!(matches!(err.error, ValueError::MissingValue { key: "TI", .. }));
Re-exports§
pub use bib::BibParser;
pub use csv::CsvParser;
pub use csv::IctrpCsvParser; (Deprecated)
pub use endnote_xml::EndNoteXmlParser;
pub use enw::EnwParser;
pub use error::CitationError;
pub use error::ParseError;
pub use error::SourceSpan;
pub use error::ValueError;
pub use pubmed::PubMedParser;
pub use ris::RisParser;
Modules§
- bib: BibTeX / BibLaTeX (.bib) parser implementation.
- csv: CSV format parser implementation.
- dedupe: Citations deduplicator implementation.
- endnote_xml: EndNote XML format parser implementation.
- enw: EndNote Tagged (.enw) parser implementation.
- error: Error types for citation parsing operations.
- ictrp_xml
- pubmed: PubMed format parser implementation.
- ris: RIS format parser implementation.
Structs§
- Author: Represents an author of a citation.
- Citation: Represents a single citation with its metadata.
- Date: Represents a publication date with required year and optional month/day components.
- DuplicateGroup: Represents a group of duplicate citations with one unique citation
- IctrpXmlParser: Parser for ICTRP XML exports.
Enums§
- CitationFormat: Citation format types supported by the library.
Traits§
- CitationParser: Trait for implementing citation parsers.
Functions§
- detect_and_parse: Format detection and automatic parsing of citation files