A comprehensive library for parsing, managing, and deduplicating academic citations.
biblib provides robust functionality for working with academic citations in various formats.
It focuses on accurate parsing, format conversion, and intelligent deduplication of citations.
§Features
The library has several optional features that can be enabled in your Cargo.toml:
- csv - Enable CSV format support (enabled by default)
- pubmed - Enable PubMed/MEDLINE format support (enabled by default)
- xml - Enable EndNote XML support (enabled by default)
- ris - Enable RIS format support (enabled by default)
- dedupe - Enable citation deduplication (enabled by default)
To use only specific features, disable default features and enable just what you need:
[dependencies]
biblib = { version = "0.3.0", default-features = false, features = ["csv", "ris"] }
§Key Characteristics
- Multiple Format Support: parse citations from:
  - RIS (Research Information Systems)
  - PubMed/MEDLINE
  - EndNote XML
  - CSV with configurable mappings
- Rich Metadata Support:
  - Authors with affiliations
  - Journal details (name, abbreviation, ISSN)
  - DOIs and other identifiers
  - Complete citation metadata
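To make the tag-based formats above more concrete, here is a minimal, std-only sketch of how RIS tag lines (TY, TI, AU, JO, SN, DO, ER) can map onto metadata fields. It is not biblib's parser: the RisRecord type and the split_tag_line and parse_ris functions are hypothetical, and the sketch ignores continuation lines and most RIS tags.

```rust
/// Hypothetical, simplified record type (not biblib's Citation).
#[derive(Debug, Default, PartialEq)]
struct RisRecord {
    ref_type: Option<String>,
    title: Option<String>,
    authors: Vec<String>,
    journal: Option<String>,
    issn: Option<String>,
    doi: Option<String>,
}

/// Splits a line such as "AU  - Smith, John" into ("AU", "Smith, John").
/// Returns None for lines that do not start with a two-character tag and '-'.
fn split_tag_line(line: &str) -> Option<(&str, &str)> {
    let tag = line.get(0..2)?;
    if !tag.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    // The RIS convention is "XX  - value"; tolerate other spacing before '-'.
    let value = line.get(2..)?.trim_start().strip_prefix('-')?;
    Some((tag, value.trim()))
}

/// Collects one RisRecord per TY ... ER block.
fn parse_ris(input: &str) -> Vec<RisRecord> {
    let mut records = Vec::new();
    let mut current = RisRecord::default();
    for line in input.lines() {
        let Some((tag, value)) = split_tag_line(line) else {
            continue; // blank or continuation lines are skipped in this sketch
        };
        let value = value.to_string();
        match tag {
            // TY starts a new record
            "TY" => current = RisRecord { ref_type: Some(value), ..Default::default() },
            "TI" | "T1" => current.title = Some(value),
            "AU" | "A1" => current.authors.push(value),
            "JO" | "JF" => current.journal = Some(value),
            "SN" => current.issn = Some(value),
            "DO" => current.doi = Some(value),
            // ER ends the record; move it out and reset `current`
            "ER" => records.push(std::mem::take(&mut current)),
            _ => {} // other tags are ignored here
        }
    }
    records
}

fn main() {
    let input = "TY  - JOUR\nTI  - Example Article\nAU  - Smith, John\nAU  - Doe, Jane\nJO  - Example Journal\nDO  - 10.1000/example\nER  -\n";
    println!("{:#?}", parse_ris(input));
}
```

Real RIS files also contain multi-line values, other author tags such as A2, and dates in PY or DA, so a full parser has to handle many more cases than this sketch.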
§Basic Usage
use biblib::{CitationParser, RisParser};
// Parse RIS format; parse() is a CitationParser trait method,
// so the trait must be in scope
let input = r#"TY  - JOUR
TI  - Example Article
AU  - Smith, John
ER  -"#;
let parser = RisParser::new();
let citations = parser.parse(input).unwrap();
println!("Title: {}", citations[0].title);
§Citation Formats
Each format has a dedicated parser with format-specific features:
use biblib::{RisParser, PubMedParser, EndNoteXmlParser, csv::CsvParser};
// RIS format
let ris = RisParser::new();
// PubMed format
let pubmed = PubMedParser::new();
// EndNote XML format
let endnote = EndNoteXmlParser::new();
// CSV format
let csv = CsvParser::new();
§Citation Deduplication
use biblib::{CitationParser, RisParser};
use biblib::dedupe::{Deduplicator, DeduplicatorConfig};
let ris_input = r#"TY  - JOUR
TI  - Example Citation 1
AU  - Smith, John
ER  -
TY  - JOUR
TI  - Example Citation 2
AU  - Smith, John
ER  -"#;
let parser = RisParser::new();
let citations = parser.parse(ris_input).unwrap();
// Configure deduplication: group citations by publication year
// and process the groups in parallel
let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    ..Default::default()
};
let deduplicator = Deduplicator::new().with_config(config);
let duplicate_groups = deduplicator.find_duplicates(&citations).unwrap();
// Each group holds one unique citation plus the citations found to duplicate it
for group in duplicate_groups {
    println!("Original: {}", group.unique.title);
    for duplicate in group.duplicates {
        println!("  Duplicate: {}", duplicate.title);
    }
}
§Error Handling
The library uses a custom Result type that wraps CitationError for consistent
error handling across all operations:
use biblib::{CitationParser, RisParser, CitationError};
let result = RisParser::new().parse("invalid input");
match result {
Ok(citations) => println!("Parsed {} citations", citations.len()),
Err(e) => eprintln!("Parse error: {}", e),
}
§Performance Considerations
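- Enable year-based grouping (group_by_year) for large datasets, so that citations are compared only against others from the same publication year
- Enable parallel processing (run_in_parallel) to spread deduplication work across threads
- Consider using CSV format for very large datasets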
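Why year-based grouping helps can be shown with a small std-only sketch: once entries are bucketed by year, only pairs inside the same bucket are compared. The Entry type, the exact-match-after-normalization rule, and find_duplicate_pairs are hypothetical simplifications; they are not how biblib's Deduplicator decides that two citations match.

```rust
use std::collections::HashMap;

/// Hypothetical minimal citation (not biblib's Citation).
struct Entry {
    title: String,
    year: Option<i32>,
}

/// Lowercases, drops punctuation, and collapses whitespace.
fn normalize_title(title: &str) -> String {
    let cleaned: String = title
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    cleaned
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns index pairs with equal normalized titles, plus the number of
/// pairwise comparisons performed.
fn find_duplicate_pairs(entries: &[Entry], group_by_year: bool) -> (Vec<(usize, usize)>, usize) {
    // Bucket indices by year; without grouping everything shares one bucket.
    let mut buckets: HashMap<Option<i32>, Vec<usize>> = HashMap::new();
    for (i, entry) in entries.iter().enumerate() {
        let key = if group_by_year { entry.year } else { None };
        buckets.entry(key).or_default().push(i);
    }
    let mut pairs = Vec::new();
    let mut comparisons = 0;
    for indices in buckets.values() {
        // Compare every pair within the bucket exactly once.
        for (pos, &a) in indices.iter().enumerate() {
            for &b in &indices[pos + 1..] {
                comparisons += 1;
                if normalize_title(&entries[a].title) == normalize_title(&entries[b].title) {
                    pairs.push((a, b));
                }
            }
        }
    }
    pairs.sort();
    (pairs, comparisons)
}

fn main() {
    let entries = vec![
        Entry { title: "Example Article".to_string(), year: Some(2020) },
        Entry { title: "Example article.".to_string(), year: Some(2020) },
        Entry { title: "Another Study".to_string(), year: Some(2021) },
        Entry { title: "Unrelated Work".to_string(), year: Some(2021) },
    ];
    let (pairs, comparisons) = find_duplicate_pairs(&entries, false);
    // prints: without grouping: [(0, 1)] after 6 comparisons
    println!("without grouping: {:?} after {} comparisons", pairs, comparisons);
    let (pairs, comparisons) = find_duplicate_pairs(&entries, true);
    // prints: with grouping: [(0, 1)] after 2 comparisons
    println!("with grouping: {:?} after {} comparisons", pairs, comparisons);
}
```

The trade-off is recall: when grouping is on, two records of the same work that carry different years (for example an online-first year and a print year) are never compared.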
§Thread Safety
All parser implementations are thread-safe and can be shared between threads.
The deduplicator supports parallel processing through the run_in_parallel option.
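In Rust terms, sharing one parser across threads requires its type to be Sync, so that a single &parser can be used from several threads at once. The sketch below shows that pattern with the standard library's scoped threads; RecordCounter is a hypothetical stand-in for a stateless parser, not a biblib type, and count_in_parallel is illustrative only.

```rust
use std::thread;

/// Hypothetical stateless "parser" (not a biblib type). It holds no data,
/// so the compiler makes it Send + Sync automatically.
struct RecordCounter;

impl RecordCounter {
    /// Counts RIS records by looking for lines that start with "ER".
    fn count_records(&self, input: &str) -> usize {
        input.lines().filter(|line| line.starts_with("ER")).count()
    }
}

/// Compiles only if T can be moved to (Send) and shared between (Sync) threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Spawns one scoped thread per input; all threads borrow the same parser.
fn count_in_parallel(parser: &RecordCounter, inputs: &[&str]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|&input| scope.spawn(move || parser.count_records(input)))
            .collect();
        // Join in spawn order so results line up with inputs.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    assert_send_sync::<RecordCounter>();
    let parser = RecordCounter;
    let inputs = ["TY  - JOUR\nER  -\nTY  - JOUR\nER  -", "TY  - BOOK\nER  -"];
    println!("{:?}", count_in_parallel(&parser, &inputs)); // prints [2, 1]
}
```

Scoped threads are used so the borrowed parser and inputs need no Arc; a parser holding internal mutable state would instead need a Mutex or per-thread instances.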
§Re-exports
- pub use csv::CsvParser;
- pub use endnote_xml::EndNoteXmlParser;
- pub use error::CitationError;
- pub use error::ParseError;
- pub use error::ValueError;
- pub use pubmed::PubMedParser;
- pub use ris::RisParser;
§Modules
- csv - CSV format parser implementation.
- dedupe - Citation deduplicator implementation.
- endnote_xml - EndNote XML format parser implementation.
- error - Error types for citation parsing operations.
- pubmed - PubMed format parser implementation.
- ris - RIS format parser implementation.
§Structs
- Author - Represents an author of a citation.
- Citation - Represents a single citation with its metadata.
- Date - Represents a publication date with a required year and optional month and day components.
- DuplicateGroup - Represents a group of duplicate citations with one unique citation.
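A date with a required year and optional month and day maps naturally onto Option fields. The following std-only sketch uses a hypothetical PubDate type; biblib's own Date may use different field types and validation rules. This sketch rejects months outside 1 to 12, days outside 1 to 31, and a day given without a month, but it does not check month lengths.

```rust
use std::fmt;

/// Hypothetical publication date: the year is required, month and day are optional.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PubDate {
    year: i32,
    month: Option<u8>,
    day: Option<u8>,
}

impl PubDate {
    /// Returns None for a month outside 1..=12, a day outside 1..=31,
    /// or a day supplied without a month.
    fn new(year: i32, month: Option<u8>, day: Option<u8>) -> Option<Self> {
        if let Some(m) = month {
            if !(1..=12).contains(&m) {
                return None;
            }
        }
        if let Some(d) = day {
            if month.is_none() || !(1..=31).contains(&d) {
                return None;
            }
        }
        Some(PubDate { year, month, day })
    }
}

impl fmt::Display for PubDate {
    /// Formats as YYYY, YYYY-MM, or YYYY-MM-DD depending on the available parts.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:04}", self.year)?;
        if let Some(month) = self.month {
            write!(f, "-{:02}", month)?;
        }
        if let Some(day) = self.day {
            write!(f, "-{:02}", day)?;
        }
        Ok(())
    }
}

fn main() {
    let year_only = PubDate::new(2019, None, None).unwrap();
    let full = PubDate::new(2019, Some(3), Some(7)).unwrap();
    println!("{year_only} {full}"); // prints "2019 2019-03-07"
    // Derived ordering compares year, then month, then day; None sorts before Some.
    println!("{}", year_only < full); // prints "true"
    println!("{:?}", PubDate::new(2019, None, Some(7))); // prints "None"
}
```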
§Enums
- CitationFormat - Citation format types supported by the library.
§Traits
- CitationParser - Trait for implementing citation parsers.
§Functions
- detect_and_parse - Format detection and automatic parsing of citation files.
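Format detection can be done by sniffing the start of the input. The sketch below is a hypothetical, std-only heuristic; detect_format, has_tag, and DetectedFormat are illustrative names, and the rules are assumptions rather than the logic of detect_and_parse: leading XML markup suggests EndNote XML, a PMID tag line suggests PubMed/MEDLINE, a leading TY tag suggests RIS, and a comma in the first line suggests CSV.

```rust
/// Hypothetical result of sniffing an input (not biblib's CitationFormat).
#[derive(Debug, PartialEq)]
enum DetectedFormat {
    EndNoteXml,
    PubMed,
    Ris,
    Csv,
    Unknown,
}

/// Returns true if the line starts with `tag` followed (after optional spaces) by '-'.
fn has_tag(line: &str, tag: &str) -> bool {
    line.strip_prefix(tag)
        .map(|rest| rest.trim_start().starts_with('-'))
        .unwrap_or(false)
}

/// Heuristic format detection; the rules and their order are assumptions.
fn detect_format(input: &str) -> DetectedFormat {
    let trimmed = input.trim_start();
    let first_line = trimmed.lines().next().unwrap_or("");
    if trimmed.starts_with('<') {
        DetectedFormat::EndNoteXml
    } else if trimmed.lines().any(|line| has_tag(line, "PMID")) {
        // MEDLINE shares RIS-like "TI  - " lines, so look for PMID before RIS.
        DetectedFormat::PubMed
    } else if has_tag(first_line, "TY") {
        DetectedFormat::Ris
    } else if first_line.contains(',') {
        DetectedFormat::Csv
    } else {
        DetectedFormat::Unknown
    }
}

fn main() {
    let samples = [
        "TY  - JOUR\nER  -",
        "PMID- 12345678\nTI  - Example",
        "<xml><records/></xml>",
        "title,author,year",
        "plain text",
    ];
    // prints Ris, PubMed, EndNoteXml, Csv, Unknown (one per line)
    for sample in samples {
        println!("{:?}", detect_format(sample));
    }
}
```

Checking the cheapest, most distinctive signals first keeps detection to a single pass over the first lines; ambiguous inputs still fall through to Unknown rather than being forced into a format.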