Crate biblib

A comprehensive library for parsing, managing, and deduplicating academic citations.

biblib provides robust functionality for working with academic citations in various formats. It focuses on accurate parsing, format conversion, and intelligent deduplication of citations.

§Features

The library has several optional features that can be enabled in your Cargo.toml:

  • csv - Enable CSV format support (enabled by default)
  • pubmed - Enable PubMed/MEDLINE format support (enabled by default)
  • xml - Enable EndNote XML support (enabled by default)
  • ris - Enable RIS format support (enabled by default)
  • dedupe - Enable citation deduplication (enabled by default)

To use only specific features, disable default features and enable just what you need:

[dependencies]
biblib = { version = "0.3.0", default-features = false, features = ["csv", "ris"] }

§Key Characteristics

  • Multiple Format Support: parse citations from
    • RIS (Research Information Systems)
    • PubMed/MEDLINE
    • EndNote XML
    • CSV with configurable mappings
  • Rich Metadata Support:
    • Authors with affiliations
    • Journal details (name, abbreviation, ISSN)
    • DOIs and other identifiers
    • Complete citation metadata

§Basic Usage

use biblib::{CitationParser, RisParser};

// Parse RIS format
let input = r#"TY  - JOUR
TI  - Example Article
AU  - Smith, John
ER  -"#;

let parser = RisParser::new();
let citations = parser.parse(input).unwrap();
println!("Title: {}", citations[0].title);

§Citation Formats

Each format has a dedicated parser with format-specific features:

use biblib::{RisParser, PubMedParser, EndNoteXmlParser, csv::CsvParser};

// RIS format
let ris = RisParser::new();

// PubMed format
let pubmed = PubMedParser::new();

// EndNote XML format
let endnote = EndNoteXmlParser::new();

// CSV format
let csv = CsvParser::new();

§Citation Deduplication

use biblib::{CitationParser, RisParser};

let ris_input = r#"TY  - JOUR
TI  - Example Citation 1
AU  - Smith, John
ER  -

TY  - JOUR
TI  - Example Citation 2
AU  - Smith, John
ER  -"#;

let parser = RisParser::new();
let citations = parser.parse(ris_input).unwrap();

use biblib::dedupe::{Deduplicator, DeduplicatorConfig};

// Configure deduplication
let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    ..Default::default()
};

let deduplicator = Deduplicator::new().with_config(config);
let duplicate_groups = deduplicator.find_duplicates(&citations).unwrap();

for group in duplicate_groups {
    println!("Original: {}", group.unique.title);
    for duplicate in group.duplicates {
        println!("  Duplicate: {}", duplicate.title);
    }
}
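
The loop above reads each group as one `unique` citation plus its `duplicates`. The sketch below shows one way such groups could be built from titles alone, using only the standard library. It is an illustration, not biblib's matching logic: `ToyGroup`, `normalize`, and `group_titles` are hypothetical names, and matching on a normalized title is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a duplicate group; not biblib's `DuplicateGroup`.
#[derive(Debug, PartialEq)]
pub struct ToyGroup {
    pub unique: String,
    pub duplicates: Vec<String>,
}

/// Lowercases a title and drops everything except letters and digits.
pub fn normalize(title: &str) -> String {
    title
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase)
        .collect()
}

/// Groups titles whose normalized forms match (an assumed matching rule).
/// The first title seen for a key becomes `unique`; later ones are duplicates.
pub fn group_titles(titles: &[&str]) -> Vec<ToyGroup> {
    let mut index_by_key: HashMap<String, usize> = HashMap::new();
    let mut groups: Vec<ToyGroup> = Vec::new();
    for title in titles {
        let key = normalize(title);
        match index_by_key.get(&key).copied() {
            Some(i) => groups[i].duplicates.push((*title).to_string()),
            None => {
                index_by_key.insert(key, groups.len());
                groups.push(ToyGroup {
                    unique: (*title).to_string(),
                    duplicates: Vec::new(),
                });
            }
        }
    }
    groups
}
```

Calling `group_titles(&["Example Citation", "example citation.", "Another Paper"])` yields two groups, the first holding one duplicate. A real deduplicator would likely compare more fields than the title.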

§Error Handling

The library uses a custom Result type that wraps CitationError for consistent error handling across all operations:

use biblib::{CitationParser, RisParser, CitationError};

let result = RisParser::new().parse("invalid input");
match result {
    Ok(citations) => println!("Parsed {} citations", citations.len()),
    Err(e) => eprintln!("Parse error: {}", e),
}
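
The "custom Result type" pattern can be sketched without the crate. Everything below is hypothetical: `ToyError`, `ToyResult`, and `collect_tags` are illustrative names, and the variants of biblib's `CitationError` are not shown in this page, so they may look quite different.

```rust
use std::fmt;

/// Hypothetical error type for illustration; biblib's `CitationError` may differ.
#[derive(Debug, PartialEq)]
pub enum ToyError {
    EmptyInput,
    MissingTag { line: usize },
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::EmptyInput => write!(f, "input is empty"),
            ToyError::MissingTag { line } => write!(f, "line {}: missing RIS tag", line),
        }
    }
}

impl std::error::Error for ToyError {}

/// Crate-wide alias so every fallible function reports the same error type.
pub type ToyResult<T> = std::result::Result<T, ToyError>;

/// Returns the two-letter tag of each non-blank line, or an error that names
/// the first line lacking the `XX  -` shape used by RIS.
pub fn collect_tags(input: &str) -> ToyResult<Vec<String>> {
    if input.trim().is_empty() {
        return Err(ToyError::EmptyInput);
    }
    let mut tags = Vec::new();
    for (index, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        match line.split_once("  -") {
            Some((tag, _)) if tag.len() == 2 => tags.push(tag.to_string()),
            _ => return Err(ToyError::MissingTag { line: index + 1 }),
        }
    }
    Ok(tags)
}
```

Because the alias fixes the error type, callers can use `?` across every function in the crate and match on a single enum at the boundary.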

§Performance Considerations

  • Use year-based grouping (group_by_year) for large datasets
  • Enable parallel processing (run_in_parallel) for better performance
  • Consider using CSV format for very large datasets
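
To see why year-based grouping helps, consider a deduplicator that compares every pair of citations. The sketch below counts those pairs with and without grouping. It assumes, for illustration only, that records are compared only within the same year bucket; `Record`, `pair_count`, `group_by_year`, and `grouped_pair_count` are hypothetical names and do not describe biblib's internals.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a citation record; not biblib's `Citation` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub title: String,
    pub year: Option<i32>,
}

/// Number of unordered pairs among `n` items: n * (n - 1) / 2.
pub fn pair_count(n: usize) -> usize {
    n * n.saturating_sub(1) / 2
}

/// Buckets records by year; records without a year share the `None` bucket.
pub fn group_by_year(records: &[Record]) -> HashMap<Option<i32>, Vec<&Record>> {
    let mut groups: HashMap<Option<i32>, Vec<&Record>> = HashMap::new();
    for record in records {
        groups.entry(record.year).or_default().push(record);
    }
    groups
}

/// Pairwise comparisons needed when only records in the same bucket are compared.
pub fn grouped_pair_count(records: &[Record]) -> usize {
    group_by_year(records)
        .values()
        .map(|group| pair_count(group.len()))
        .sum()
}
```

For 10,000 records spread evenly over 10 years, `pair_count` gives 49,995,000 comparisons for the whole set, while ten buckets of 1,000 need 10 × 499,500 = 4,995,000, roughly a tenfold reduction. The trade-off is that duplicates recorded under different years would not be compared under this assumption.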

§Thread Safety

All parser implementations are thread-safe and can be shared between threads. The deduplicator supports parallel processing through the run_in_parallel option.
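
Sharing a parser between threads can look like the following sketch, which uses scoped threads from the standard library. `ToyParser` is a hypothetical stateless stand-in, not a biblib type, and `count_in_parallel` is an illustrative helper; the same borrowing pattern would apply to any parser that is `Sync`.

```rust
use std::thread;

/// Hypothetical stateless parser used for illustration; not a biblib type.
#[derive(Debug, Default)]
pub struct ToyParser;

impl ToyParser {
    /// Counts records by counting lines that start with the RIS end tag `ER  -`.
    pub fn count_records(&self, input: &str) -> usize {
        input
            .lines()
            .filter(|line| line.starts_with("ER  -"))
            .count()
    }
}

/// Processes each input on its own scoped thread while all threads borrow
/// the same parser. Results come back in input order.
pub fn count_in_parallel(parser: &ToyParser, inputs: &[&str]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|input| scope.spawn(move || parser.count_records(input)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads let the workers borrow the parser directly, so no `Arc` or cloning is needed as long as the parser outlives the scope.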

Re-exports§

pub use csv::CsvParser;
pub use endnote_xml::EndNoteXmlParser;
pub use error::CitationError;
pub use error::ParseError;
pub use error::ValueError;
pub use pubmed::PubMedParser;
pub use ris::RisParser;

Modules§

csv
CSV format parser implementation.
dedupe
Citation deduplicator implementation.
endnote_xml
EndNote XML format parser implementation.
error
Error types for citation parsing operations.
pubmed
PubMed format parser implementation.
ris
RIS format parser implementation.

Structs§

Author
Represents an author of a citation.
Citation
Represents a single citation with its metadata.
Date
Represents a publication date with required year and optional month/day components.
DuplicateGroup
Represents a group of duplicate citations with one unique citation.

Enums§

CitationFormat
Citation format types supported by the library.

Traits§

CitationParser
Trait for implementing citation parsers.

Functions§

detect_and_parse
Format detection and automatic parsing of citation files.
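
The signature and detection rules of detect_and_parse are not shown on this page. As a rough illustration of what format detection can involve, the sketch below guesses a format from the first non-blank line. `ToyFormat` and `guess_format` are hypothetical names, and the heuristics (a leading `<` for XML, `TY  -` for RIS, `PMID-` for PubMed/MEDLINE, a comma for CSV) are assumptions chosen for the example.

```rust
/// Hypothetical format tag for illustration; not biblib's `CitationFormat`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyFormat {
    Ris,
    PubMed,
    EndNoteXml,
    Csv,
}

/// Guesses a format from the first non-blank line, or returns `None`
/// when no heuristic matches.
pub fn guess_format(input: &str) -> Option<ToyFormat> {
    let first_line = input
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())?;
    if first_line.starts_with('<') {
        Some(ToyFormat::EndNoteXml)
    } else if first_line.starts_with("TY  -") {
        Some(ToyFormat::Ris)
    } else if first_line.starts_with("PMID-") {
        Some(ToyFormat::PubMed)
    } else if first_line.contains(',') {
        Some(ToyFormat::Csv)
    } else {
        None
    }
}
```

Heuristics like these are cheap but fallible, so callers who know the format in advance may prefer to pick the matching parser directly.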