Crate biblib


A comprehensive library for parsing, managing, and deduplicating academic citations.

biblib parses citation exports from multiple sources into one normalized Citation model, then optionally deduplicates the result set.

It is designed for ingestion pipelines, review tooling, registry imports, and any workflow that needs to reconcile heterogeneous citation files.

§What You Get

  • Dedicated parsers for RIS, PubMed / MEDLINE, EndNote XML, ICTRP XML, EndNote Tagged (.enw), BibTeX / BibLaTeX (.bib), generic CSV, and ICTRP CSV exports
  • A shared Citation output type with normalized identifiers such as DOI, PMID, PMCID, and accession_number
  • Preservation of source-specific leftovers through extra_fields
  • Optional duplicate detection via dedupe::Deduplicator
  • Optional human-friendly parse diagnostics with the diagnostics feature
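The idea of a shared output type plus extra_fields can be illustrated with a minimal, standard-library-only sketch. Everything here is hypothetical: `SketchCitation`, `normalize`, and the tag names are stand-ins for illustration and do not reflect biblib's actual `Citation` layout or parsing logic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a normalized citation record.
/// This is NOT biblib's `Citation`; the fields are illustrative only.
#[derive(Debug, Default, PartialEq)]
pub struct SketchCitation {
    pub title: String,
    pub doi: Option<String>,
    pub pmid: Option<String>,
    /// Tags that the sketch does not map to a dedicated field.
    pub extra_fields: HashMap<String, Vec<String>>,
}

/// Route `(tag, value)` pairs into dedicated fields and keep everything else.
pub fn normalize(pairs: &[(&str, &str)]) -> SketchCitation {
    let mut out = SketchCitation::default();
    for &(tag, value) in pairs {
        let value = value.trim();
        match tag {
            "TI" => out.title = value.to_string(),
            "DO" => {
                // Lowercase and strip a resolver prefix so that DOIs from
                // different exports compare equal.
                let lower = value.to_ascii_lowercase();
                out.doi = Some(strip_doi_prefix(&lower).to_string());
            }
            "PMID" => out.pmid = Some(value.to_string()),
            // Unknown tags are preserved rather than dropped.
            other => out
                .extra_fields
                .entry(other.to_string())
                .or_default()
                .push(value.to_string()),
        }
    }
    out
}

/// Remove a leading resolver URL or `doi:` scheme, if present.
fn strip_doi_prefix(raw: &str) -> &str {
    for prefix in ["https://doi.org/", "http://doi.org/", "doi:"] {
        if let Some(rest) = raw.strip_prefix(prefix) {
            return rest.trim();
        }
    }
    raw
}
```

Keeping unmapped tags in a map of repeated values means a round trip loses no source data even when a format carries fields the shared model has no slot for.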

§Quick Start

use biblib::{CitationParser, RisParser};

let input = r#"TY  - JOUR
TI  - Example Article
AU  - Smith, John
DO  - 10.1000/example
ER  -"#;

let citations = RisParser::new().parse(input).unwrap();

assert_eq!(citations.len(), 1);
assert_eq!(citations[0].title, "Example Article");
assert_eq!(citations[0].doi.as_deref(), Some("10.1000/example"));

§Supported Parsers

use biblib::{
    BibParser, CitationParser, EndNoteXmlParser, EnwParser, IctrpXmlParser, PubMedParser,
    RisParser,
};
use biblib::csv::CsvParser;

let _ris = RisParser::new();
let _pubmed = PubMedParser::new();
let _endnote = EndNoteXmlParser::new();
let _ictrp_xml = IctrpXmlParser::new();
let _enw = EnwParser::new();
let _bib = BibParser::new();
let _csv = CsvParser::new();

§Auto-Detection

detect_and_parse currently auto-detects RIS, PubMed, ICTRP XML, EndNote XML, EndNote Tagged, BibTeX / BibLaTeX, and ICTRP CSV. ICTRP XML is the preferred ICTRP ingestion path; ICTRP CSV remains for backward compatibility. Generic CSV remains explicit because header mapping is application-specific.

use biblib::detect_and_parse;

let input = "TY  - JOUR\nTI  - Example\nER  -";
let (citations, format) = detect_and_parse(input).unwrap();

assert_eq!(format.as_str(), "RIS");
assert_eq!(citations[0].title, "Example");
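To make the idea of format sniffing concrete, here is a rough, standard-library-only sketch. It is an assumption about how such detection could work, not a description of what detect_and_parse does internally; `SketchFormat` and `sniff_format` are hypothetical names, and the heuristics only look at the first non-blank line.

```rust
/// Formats this sketch can recognize.
/// Hypothetical; this is NOT biblib's `CitationFormat`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SketchFormat {
    Ris,
    PubMed,
    Xml,
    BibTex,
    EndNoteTagged,
}

/// Guess a format from the first non-blank line of `input`.
/// Returns `None` when no heuristic matches (for example, generic CSV).
pub fn sniff_format(input: &str) -> Option<SketchFormat> {
    // Ignore a leading byte-order mark, then find the first non-blank line.
    let input = input.trim_start_matches('\u{feff}');
    let first = input.lines().map(str::trim).find(|l| !l.is_empty())?;

    if first.starts_with('<') {
        // Covers both an `<?xml ...?>` declaration and a bare root element.
        Some(SketchFormat::Xml)
    } else if first.starts_with('@') {
        // BibTeX entries look like `@article{key, ...}`.
        Some(SketchFormat::BibTex)
    } else if first.starts_with("TY  -") {
        // RIS records open with a `TY` (type) tag.
        Some(SketchFormat::Ris)
    } else if first.starts_with("PMID-") {
        // PubMed / MEDLINE records open with a `PMID-` line.
        Some(SketchFormat::PubMed)
    } else if first.starts_with("%0 ") {
        // EndNote Tagged records open with a `%0` reference-type line.
        Some(SketchFormat::EndNoteTagged)
    } else {
        None
    }
}
```

A plain CSV header falls through to `None` in this sketch, which mirrors why generic CSV stays explicit: nothing in the header alone says how columns map to citation fields.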

§Feature Flags

Disable default features when you only need a subset of parsers:

[dependencies]
biblib = { version = "0.7", default-features = false, features = ["ris", "csv"] }

Available public features:

  • ris
  • pubmed
  • xml
  • csv
  • enw
  • bib
  • dedupe
  • diagnostics

Since v0.5, biblib uses regex-lite internally instead of the regex crate. The regex backend is no longer selectable through feature flags and is not part of the public feature surface.

§Deduplication

use biblib::dedupe::{Deduplicator, DeduplicatorConfig};
use biblib::{Citation, Date};

let citations = vec![
    Citation {
        title: "Example Title".to_string(),
        doi: Some("10.1000/example".to_string()),
        date: Some(Date { year: 2023, month: None, day: None }),
        journal: Some("Example Journal".to_string()),
        ..Default::default()
    },
    Citation {
        title: "Example Title".to_string(),
        doi: Some("10.1000/example".to_string()),
        date: Some(Date { year: 2023, month: None, day: None }),
        journal: Some("Example Journal".to_string()),
        ..Default::default()
    },
];

let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    source_preferences: vec!["PubMed".to_string()],
};

let groups = Deduplicator::new()
    .with_config(config)
    .find_duplicates(&citations)
    .unwrap();

let duplicate_group = groups
    .iter()
    .find(|group| group.unique.doi.as_deref() == Some("10.1000/example"))
    .unwrap();

assert_eq!(duplicate_group.duplicates.len(), 1);
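For intuition about what duplicate grouping involves, the following standard-library-only sketch groups records by a comparison key: the DOI when present, otherwise a normalized title plus year. This is a deliberately naive stand-in, not the matching strategy Deduplicator uses; `Record`, `match_key`, and `group_duplicates` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical minimal record; this is NOT biblib's `Citation`.
#[derive(Debug, Clone)]
pub struct Record {
    pub title: String,
    pub doi: Option<String>,
    pub year: Option<i32>,
}

/// Build a comparison key: prefer the DOI, else normalized title plus year.
fn match_key(r: &Record) -> String {
    if let Some(doi) = &r.doi {
        return format!("doi:{}", doi.trim().to_ascii_lowercase());
    }
    // Drop punctuation and whitespace, then lowercase what remains.
    let title: String = r
        .title
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase)
        .collect();
    let year = r.year.map_or(String::new(), |y| y.to_string());
    format!("title:{}:{}", title, year)
}

/// Group record indices by key, in order of first appearance.
/// The first index in each group plays the role of the "unique" record.
pub fn group_duplicates(records: &[Record]) -> Vec<Vec<usize>> {
    let mut order: Vec<String> = Vec::new();
    let mut groups: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, r) in records.iter().enumerate() {
        let key = match_key(r);
        if !groups.contains_key(&key) {
            order.push(key.clone());
        }
        groups.entry(key).or_default().push(i);
    }
    // Every key in `order` was inserted into `groups`, so `unwrap` cannot fail.
    order
        .into_iter()
        .map(|k| groups.remove(&k).unwrap())
        .collect()
}
```

Exact-key grouping like this misses near-duplicates such as typos or truncated titles, which is one reason a dedicated deduplicator with configurable behavior is useful.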

§Errors and Diagnostics

Parsers return ParseError with line numbers and, when available, source spans.

use biblib::{CitationParser, RisParser, ValueError};

let input = "TY  - JOUR\nAU  - Smith, John\nER  -\n";
let err = RisParser::new().parse(input).unwrap_err();

assert_eq!(err.line, Some(1));
assert!(matches!(err.error, ValueError::MissingValue { key: "TI", .. }));
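A caller can turn a reported line number into a short context snippet for logs or a UI. The helper below is a hypothetical, standard-library-only sketch; it assumes line numbers are 1-based (as the example above suggests) and does not use or reproduce the output of the diagnostics feature.

```rust
/// Format a short context snippet for a 1-based `line` number in `input`.
/// Hypothetical helper; returns `None` if `line` is 0 or past the end.
pub fn render_line_context(input: &str, line: usize, message: &str) -> Option<String> {
    // Convert to a 0-based index, rejecting line 0.
    let index = line.checked_sub(1)?;
    let text = input.lines().nth(index)?;
    Some(format!("error: {message}\n  --> line {line}\n   | {text}"))
}
```

With the RIS input from the example above and `line = 1`, the snippet would quote the `TY  - JOUR` line where the failing record starts.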

Re-exports§

pub use bib::BibParser;
pub use csv::CsvParser;
pub use csv::IctrpCsvParser; // Deprecated
pub use endnote_xml::EndNoteXmlParser;
pub use enw::EnwParser;
pub use error::CitationError;
pub use error::ParseError;
pub use error::SourceSpan;
pub use error::ValueError;
pub use pubmed::PubMedParser;
pub use ris::RisParser;

Modules§

bib
BibTeX / BibLaTeX (.bib) parser implementation.
csv
CSV format parser implementation.
dedupe
Citation deduplicator implementation.
endnote_xml
EndNote XML format parser implementation.
enw
EndNote Tagged (.enw) parser implementation.
error
Error types for citation parsing operations.
ictrp_xml
pubmed
PubMed format parser implementation.
ris
RIS format parser implementation.

Structs§

Author
Represents an author of a citation.
Citation
Represents a single citation with its metadata.
Date
Represents a publication date with required year and optional month/day components.
DuplicateGroup
Represents a group of duplicate citations with one unique citation.
IctrpXmlParser
Parser for ICTRP XML exports.

Enums§

CitationFormat
Citation format types supported by the library.

Traits§

CitationParser
Trait for implementing citation parsers.

Functions§

detect_and_parse
Format detection and automatic parsing of citation files.