A comprehensive library for parsing, managing, and deduplicating academic citations.
biblib provides robust functionality for working with academic citations in various formats.
It focuses on accurate parsing, format conversion, and intelligent deduplication of citations.
§Features
The library has several optional features that can be enabled in your Cargo.toml:
- csv: Enable CSV format support (enabled by default)
- pubmed: Enable PubMed/MEDLINE format support (enabled by default)
- xml: Enable EndNote XML support (enabled by default)
- ris: Enable RIS format support (enabled by default)
- dedupe: Enable citation deduplication (enabled by default)
To use only specific features, disable default features and enable just what you need:
[dependencies]
biblib = { version = "0.2.0", default-features = false, features = ["csv", "ris"] }
§Key Characteristics
- Multiple Format Support: Parse citations from:
  - RIS (Research Information Systems)
  - PubMed/MEDLINE
  - EndNote XML
  - CSV with configurable mappings
- Source Tracking: Each parser can track the source of citations
  - with_source() method available on all parsers
  - Source information preserved in Citation objects
  - Useful for tracking citation origins
- Rich Metadata Support:
  - Authors with affiliations
  - Journal details (name, abbreviation, ISSN)
  - DOIs and other identifiers
  - Complete citation metadata
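The source-tracking behaviour listed above follows the usual Rust builder pattern. The sketch below is a minimal, self-contained illustration and not biblib's implementation: `SketchParser` and `TaggedCitation` are hypothetical stand-ins, and treating each non-empty line as a title is an assumption made only to keep the example short.

```rust
/// Hypothetical citation record (assumption: biblib's `Citation` carries more fields).
#[derive(Debug, Clone, PartialEq)]
struct TaggedCitation {
    title: String,
    source: Option<String>,
}

/// Hypothetical parser that remembers an optional source label.
#[derive(Debug, Default)]
struct SketchParser {
    source: Option<String>,
}

impl SketchParser {
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: takes the parser by value and returns it with the label set.
    fn with_source(mut self, source: &str) -> Self {
        self.source = Some(source.to_string());
        self
    }

    /// Treats every non-empty line as one title and stamps it with the parser's source.
    fn parse(&self, input: &str) -> Vec<TaggedCitation> {
        input
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(|line| TaggedCitation {
                title: line.to_string(),
                source: self.source.clone(),
            })
            .collect()
    }
}
```

Calling `SketchParser::new().with_source("PubMed").parse(text)` yields records whose `source` is `Some("PubMed")`; without `with_source` the field stays `None`.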
§Basic Usage
use biblib::{CitationParser, RisParser};
// Parse RIS format with source tracking
let input = r#"TY  - JOUR
TI  - Example Article
AU  - Smith, John
ER  -"#;
let parser = RisParser::new().with_source("Pubmed");
let citations = parser.parse(input).unwrap();
println!("Title: {}", citations[0].title);
println!("Source: {}", citations[0].source.clone().unwrap());
§Citation Formats
Each format has a dedicated parser with format-specific features:
use biblib::{RisParser, PubMedParser, EndNoteXmlParser, csv::CsvParser};
// RIS format
let ris = RisParser::new();
// PubMed format
let pubmed = PubMedParser::new().with_source("Pubmed");
// EndNote XML
let endnote = EndNoteXmlParser::new().with_source("Google Scholar");
// CSV format
let csv = CsvParser::new().with_source("Cochrane");
§Citation Deduplication
use biblib::{Citation, CitationParser, RisParser};
let ris_input = r#"TY  - JOUR
TI  - Example Citation 1
AU  - Smith, John
ER  -
TY  - JOUR
TI  - Example Citation 2
AU  - Smith, John
ER  -"#;
let parser = RisParser::new();
let citations = parser.parse(ris_input).unwrap();
// Configure deduplication
use biblib::dedupe::{Deduplicator, DeduplicatorConfig};
let config = DeduplicatorConfig {
    group_by_year: true,
    run_in_parallel: true,
    source_preferences: vec!["PubMed".to_string(), "Cochrane".to_string()],
};
let deduplicator = Deduplicator::new().with_config(config);
let duplicate_groups = deduplicator.find_duplicates(&citations).unwrap();
for group in duplicate_groups {
    println!("Original: {}", group.unique.title);
    for duplicate in group.duplicates {
        println!("  Duplicate: {}", duplicate.title);
    }
}
§Error Handling
The library uses a custom Result type that wraps CitationError for consistent
error handling across all operations:
use biblib::{CitationParser, RisParser, CitationError};
let result = RisParser::new().parse("invalid input");
match result {
    Ok(citations) => println!("Parsed {} citations", citations.len()),
    Err(CitationError::InvalidFormat(msg)) => eprintln!("Parse error: {}", msg),
    Err(e) => eprintln!("Other error: {}", e),
}
§Performance Considerations
- Use year-based grouping (the group_by_year option) for large datasets
- Enable parallel processing (the run_in_parallel option) for better performance
- Consider using CSV format for very large datasets
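To see why year-based grouping can help, note that naive duplicate detection compares every pair of citations, while bucketing by year compares pairs only within a bucket. The sketch below illustrates that idea in plain std Rust under stated assumptions: `Record`, `group_by_year`, and `comparisons_needed` are hypothetical names, and this documentation does not show how biblib groups citations internally.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed citation; not biblib's `Citation` type.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    title: String,
    year: Option<i32>,
}

/// Buckets record indices by publication year so that pairwise comparison
/// only has to happen inside each bucket. Records without a year share one bucket.
fn group_by_year(records: &[Record]) -> BTreeMap<Option<i32>, Vec<usize>> {
    let mut buckets: BTreeMap<Option<i32>, Vec<usize>> = BTreeMap::new();
    for (index, record) in records.iter().enumerate() {
        buckets.entry(record.year).or_default().push(index);
    }
    buckets
}

/// Counts the pairwise comparisons needed when comparing only within buckets.
fn comparisons_needed(buckets: &BTreeMap<Option<i32>, Vec<usize>>) -> usize {
    buckets
        .values()
        .map(|bucket| bucket.len() * bucket.len().saturating_sub(1) / 2)
        .sum()
}
```

For four records of which only two share a year, the buckets require 1 comparison, whereas comparing all pairs would require 6. The trade-off in this sketch is that duplicates recorded with different years are never compared.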
§Thread Safety
All parser implementations are thread-safe and can be shared between threads.
The deduplicator supports parallel processing through the run_in_parallel option.
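Sharing one parser between threads can be sketched with scoped threads from the standard library. This is an illustration only: `TitleParser` is a hypothetical stateless stand-in (not a biblib type) whose `parse` takes `&self`, and it recognises only `TI  - ` lines. Any parser that is `Sync` could be borrowed the same way.

```rust
use std::thread;

/// Hypothetical stateless parser standing in for any `Sync` parser.
struct TitleParser;

impl TitleParser {
    /// Collects the value of every `TI  - ` line in a RIS-like input.
    fn parse(&self, input: &str) -> Vec<String> {
        input
            .lines()
            .filter_map(|line| line.strip_prefix("TI  - "))
            .map(|title| title.trim().to_string())
            .collect()
    }
}

/// Parses each input on its own scoped thread while borrowing one shared parser.
/// Results are returned in the same order as `inputs`.
fn parse_all(parser: &TitleParser, inputs: &[&str]) -> Vec<Vec<String>> {
    thread::scope(|scope| {
        // Spawn one worker per input; each worker only borrows the parser.
        let handles: Vec<_> = inputs
            .iter()
            .map(|input| scope.spawn(move || parser.parse(input)))
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads avoid wrapping the parser in `Arc`, because the scope guarantees every worker finishes before the borrow ends.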
§Re-exports
pub use csv::CsvParser;
pub use endnote_xml::EndNoteXmlParser;
pub use pubmed::PubMedParser;
pub use ris::RisParser;
§Modules
- csv: CSV format parser implementation with source tracking support.
- dedupe: Citation deduplicator implementation.
- endnote_xml: EndNote XML format parser implementation with source tracking support.
- pubmed: PubMed format parser implementation with source tracking support.
- ris: RIS format parser implementation with source tracking support.
§Structs
- Author: Represents an author of a citation.
- Citation: Represents a single citation with its metadata.
- DuplicateGroup: Represents a group of duplicate citations with one unique citation.
§Enums
- CitationError: Represents errors that can occur during citation parsing.
§Traits
- CitationParser: Trait for implementing citation parsers.
§Functions
- detect_and_parse: Format detection and automatic parsing of citation files.
§Type Aliases
- Result: A specialized Result type for citation operations.