Crate biblib


A comprehensive library for parsing, managing, and deduplicating academic citations.

biblib parses academic citations from several bibliographic formats, converts between those formats, and detects duplicate citations across sources.

§Features

The library is split into Cargo features, all of which are enabled by default:

  • csv - Enable CSV format support (enabled by default)
  • pubmed - Enable PubMed/MEDLINE format support (enabled by default)
  • xml - Enable EndNote XML support (enabled by default)
  • ris - Enable RIS format support (enabled by default)
  • dedupe - Enable citation deduplication (enabled by default)

To use only specific features, disable default features and enable just what you need:

[dependencies]
biblib = { version = "0.2.0", default-features = false, features = ["csv", "ris"] }

§Key Characteristics

  • Multiple Format Support: Parse citations from:

    • RIS (Research Information Systems)
    • PubMed/MEDLINE
    • EndNote XML
    • CSV with configurable mappings
  • Source Tracking: Each parser can record where its citations came from

    • with_source() method available on all parsers
    • Source information preserved in Citation objects
    • Useful for ranking sources during deduplication (see source_preferences)
  • Rich Metadata Support:

    • Authors with affiliations
    • Journal details (name, abbreviation, ISSN)
    • DOIs and other identifiers
    • Complete citation metadata

§Basic Usage

use biblib::{CitationParser, RisParser};

// Parse RIS format with source tracking
let input = r#"TY  - JOUR
TI  - Example Article
AU  - Smith, John
ER  -"#;

let parser = RisParser::new().with_source("PubMed");
let citations = parser.parse(input).unwrap();
println!("Title: {}", citations[0].title);
println!("Source: {}", citations[0].source.clone().unwrap());

§Citation Formats

Each format has a dedicated parser with format-specific features:

use biblib::{RisParser, PubMedParser, EndNoteXmlParser, csv::CsvParser};

// RIS format
let ris = RisParser::new();

// PubMed format
let pubmed = PubMedParser::new().with_source("PubMed");

// EndNote XML
let endnote = EndNoteXmlParser::new().with_source("Google Scholar");

// CSV format
let csv = CsvParser::new().with_source("Cochrane");

§Citation Deduplication

use biblib::{CitationParser, RisParser};

let ris_input = r#"TY  - JOUR
TI  - Example Citation 1
AU  - Smith, John
ER  -

TY  - JOUR
TI  - Example Citation 2
AU  - Smith, John
ER  -"#;

let parser = RisParser::new();
let citations = parser.parse(ris_input).unwrap();

use biblib::dedupe::{Deduplicator, DeduplicatorConfig};

// Configure deduplication
let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    source_preferences: vec!["PubMed".to_string(), "Cochrane".to_string()],
};

let deduplicator = Deduplicator::new().with_config(config);
let duplicate_groups = deduplicator.find_duplicates(&citations).unwrap();

for group in duplicate_groups {
    println!("Original: {}", group.unique.title);
    for duplicate in group.duplicates {
        println!("  Duplicate: {}", duplicate.title);
    }
}
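
The source_preferences field suggests that, within a group of duplicates, the citation from the most preferred source is the one kept as unique. The sketch below shows one plausible way such a ranking could work. It is a self-contained illustration under that assumption, not biblib's implementation: the `Record` type and the `rank` and `pick_unique` helpers are hypothetical.

```rust
// Hypothetical record: only the fields needed for the illustration.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    title: String,
    source: Option<String>,
}

// Rank of a source in the preference list (lower is better).
// Unknown or missing sources rank after every listed source.
fn rank(source: Option<&str>, preferences: &[&str]) -> usize {
    source
        .and_then(|s| preferences.iter().position(|p| *p == s))
        .unwrap_or(preferences.len())
}

// Picks the record to keep from a group of duplicates.
// Returns None for an empty group; ties go to the earliest record.
fn pick_unique<'a>(group: &'a [Record], preferences: &[&str]) -> Option<&'a Record> {
    group
        .iter()
        .min_by_key(|r| rank(r.source.as_deref(), preferences))
}

fn main() {
    let group = vec![
        Record { title: "Example Citation".into(), source: Some("Cochrane".into()) },
        Record { title: "Example citation".into(), source: Some("PubMed".into()) },
        Record { title: "Example Citation".into(), source: None },
    ];
    let kept = pick_unique(&group, &["PubMed", "Cochrane"]).unwrap();
    println!("Kept: {} ({:?})", kept.title, kept.source);
}
```

In this sketch the comparison of source names is exact and case-sensitive, so a citation labelled "Pubmed" would not match a preference for "PubMed". Using consistent labels in with_source() and in the preference list avoids that ambiguity.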

§Error Handling

The library uses a custom Result type that wraps CitationError for consistent error handling across all operations:

use biblib::{CitationParser, RisParser, CitationError};

let result = RisParser::new().parse("invalid input");
match result {
    Ok(citations) => println!("Parsed {} citations", citations.len()),
    Err(CitationError::InvalidFormat(msg)) => eprintln!("Parse error: {}", msg),
    Err(e) => eprintln!("Other error: {}", e),
}
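
Because operations share one Result alias, errors can also be propagated with `?` instead of being matched at each call site. The sketch below illustrates the pattern with stand-in types so that it compiles on its own. The local `CitationError` enum, the `Result` alias, and the `count_records` and `describe` helpers are illustrative assumptions, not biblib's actual definitions; only the `InvalidFormat` variant name is taken from the example above.

```rust
use std::fmt;

// Stand-in for the crate's error type (hypothetical; only the
// `InvalidFormat` variant name comes from the example above).
#[derive(Debug, PartialEq)]
enum CitationError {
    InvalidFormat(String),
}

impl fmt::Display for CitationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CitationError::InvalidFormat(msg) => write!(f, "invalid format: {msg}"),
        }
    }
}

impl std::error::Error for CitationError {}

// A specialized alias, mirroring the idea of a crate-wide `Result`.
type Result<T> = std::result::Result<T, CitationError>;

// Hypothetical helper: counts RIS records by their `TY  - ` start tags.
fn count_records(input: &str) -> Result<usize> {
    let n = input.lines().filter(|l| l.starts_with("TY  - ")).count();
    if n == 0 {
        return Err(CitationError::InvalidFormat("no TY tag found".into()));
    }
    Ok(n)
}

// `?` forwards the error unchanged to the caller.
fn describe(input: &str) -> Result<String> {
    let n = count_records(input)?;
    Ok(format!("Parsed {n} citations"))
}

fn main() {
    match describe("TY  - JOUR\nTI  - Example Article\nER  -") {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("Parse error: {e}"),
    }
}
```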

§Performance Considerations

  • Use year-based grouping (group_by_year) for large datasets
  • Enable parallel processing (run_in_parallel) to speed up deduplication
  • Consider using CSV format for very large datasets
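
To see why year-based grouping helps, note that a naive duplicate search compares every pair of citations, while grouping first restricts comparisons to citations that share a publication year. The sketch below counts the candidate pairs in both cases. It is a self-contained illustration, not biblib's implementation: the `pairs` and `pairs_grouped_by_year` helpers and the sample years are assumptions.

```rust
use std::collections::HashMap;

// Number of unordered pairs among n items: n * (n - 1) / 2.
fn pairs(n: usize) -> usize {
    n * n.saturating_sub(1) / 2
}

// Candidate comparisons when only citations from the same year are compared.
fn pairs_grouped_by_year(years: &[u16]) -> usize {
    let mut buckets: HashMap<u16, usize> = HashMap::new();
    for &year in years {
        *buckets.entry(year).or_insert(0) += 1;
    }
    buckets.values().map(|&n| pairs(n)).sum()
}

fn main() {
    // Hypothetical dataset: six citations spread over three years.
    let years = [2019, 2019, 2020, 2020, 2020, 2021];
    println!("all pairs:     {}", pairs(years.len()));
    println!("grouped pairs: {}", pairs_grouped_by_year(&years));
}
```

The saving grows with the number of distinct years. The trade-off in this sketch is that two records with mismatched years are never compared with each other.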

§Thread Safety

All parser implementations are thread-safe and can be shared between threads. The deduplicator supports parallel processing through the run_in_parallel option.
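
As a sketch of what sharing a parser between threads can look like, the example below borrows one parser from several scoped threads. To stay self-contained it uses a stand-in `LineCounter` type rather than a real biblib parser; the type, its `parse` method, and the `parse_all` helper are assumptions made for illustration. The same borrowing pattern should apply to any parser type that is `Sync`.

```rust
use std::thread;

// Stand-in for a parser (hypothetical). It holds only immutable
// configuration, so a shared reference can be used from many threads.
struct LineCounter {
    source: String,
}

impl LineCounter {
    // Returns the source label and the number of `TY  - ` record starts.
    fn parse(&self, input: &str) -> (String, usize) {
        let n = input.lines().filter(|l| l.starts_with("TY  - ")).count();
        (self.source.clone(), n)
    }
}

// Parses each input on its own thread, borrowing the same parser.
fn parse_all(parser: &LineCounter, inputs: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        // Spawn every thread first, then join them in input order.
        let handles: Vec<_> = inputs
            .iter()
            .map(|input| s.spawn(move || parser.parse(input).1))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let parser = LineCounter { source: "PubMed".to_string() };
    let inputs = ["TY  - JOUR\nER  -", "TY  - JOUR\nER  -\nTY  - BOOK\nER  -"];
    println!("{:?}", parse_all(&parser, &inputs));
}
```

Scoped threads are used here so the parser can be borrowed directly. With threads that outlive the caller, wrapping the parser in an `Arc` would be the usual alternative.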

Re-exports§

pub use csv::CsvParser;
pub use endnote_xml::EndNoteXmlParser;
pub use pubmed::PubMedParser;
pub use ris::RisParser;

Modules§

csv
CSV format parser implementation with source tracking support.
dedupe
Citation deduplicator implementation.
endnote_xml
EndNote XML format parser implementation with source tracking support.
pubmed
PubMed format parser implementation with source tracking support.
ris
RIS format parser implementation with source tracking support.

Structs§

Author
Represents an author of a citation.
Citation
Represents a single citation with its metadata.
DuplicateGroup
Represents a group of duplicate citations together with the one citation kept as unique.

Enums§

CitationError
Represents errors that can occur during citation parsing.

Traits§

CitationParser
Trait for implementing citation parsers.

Functions§

detect_and_parse
Format detection and automatic parsing of citation files.

Type Aliases§

Result
A specialized Result type for citation operations.