biblib
A Rust library for parsing and deduplicating academic citations.
Installation
[dependencies]
biblib = "0.4"
For minimal builds:
[dependencies]
biblib = { version = "0.4", default-features = false, features = ["ris", "regex"] }
Supported Formats
| Format | Feature | Description |
|---|---|---|
| RIS | ris | Research Information Systems format |
| PubMed | pubmed | MEDLINE/PubMed .nbib files |
| EndNote XML | xml | EndNote XML export format |
| CSV | csv | Configurable delimited files |
All format features are enabled by default.
Quick Start
Parsing Citations
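use biblib::{CitationParser, RisParser};

let ris_content = r#"TY  - JOUR
TI  - Machine Learning in Healthcare
AU  - Smith, John
AU  - Doe, Jane
PY  - 2023
ER  -"#;

let parser = RisParser::new();
let citations = parser.parse(ris_content).unwrap();

println!("Title: {}", citations[0].title);
println!("Authors: {:?}", citations[0].authors);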
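To make the input format concrete: each RIS line is a two-character tag, a hyphen separator, and a value, and the ER tag closes a record. The following std-only sketch splits such lines into tag/value pairs. It illustrates the format only and is not how biblib parses RIS; `split_ris_line` is a hypothetical helper, and the sketch ignores multi-line values.

```rust
/// Splits one RIS line such as "AU  - Smith, John" into ("AU", "Smith, John").
/// Hypothetical helper for illustration only; it is not part of biblib.
fn split_ris_line(line: &str) -> Option<(&str, &str)> {
    // Everything before the first hyphen is the tag, everything after is the value.
    let (tag, value) = line.split_once('-')?;
    let tag = tag.trim();
    // RIS tags are two characters: an uppercase letter, then a letter or digit.
    let bytes = tag.as_bytes();
    let is_tag = bytes.len() == 2
        && bytes[0].is_ascii_uppercase()
        && (bytes[1].is_ascii_uppercase() || bytes[1].is_ascii_digit());
    if is_tag {
        Some((tag, value.trim()))
    } else {
        None
    }
}

fn main() {
    let record = "TY  - JOUR\nTI  - Machine Learning in Healthcare\nAU  - Smith, John\nAU  - Doe, Jane\nPY  - 2023\nER  -";

    // Collect the value of every AU (author) line in the record.
    let authors: Vec<&str> = record
        .lines()
        .filter_map(split_ris_line)
        .filter(|(tag, _)| *tag == "AU")
        .map(|(_, value)| value)
        .collect();

    println!("{:?}", authors);
}
```

Because the value is split at the first hyphen only, hyphens inside titles or author names are preserved.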
Auto-Detecting Format
use biblib::detect_and_parse;

let content = "TY  - JOUR\nTI  - Example\nER  -";
let (citations, format) = detect_and_parse(content).unwrap();

println!("Detected format: {}", format); // "RIS"
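For intuition about what format detection involves, here is a minimal std-only sketch that guesses a format from the first non-empty line. The rules used (a leading `TY` tag for RIS, a leading `PMID-` field for PubMed, a leading `<` for XML) are assumptions chosen for illustration; this README does not describe biblib's actual detection logic, and `GuessedFormat` and `guess_format` are hypothetical names.

```rust
/// Hypothetical result type for this sketch; not a biblib type.
#[derive(Debug, PartialEq)]
enum GuessedFormat {
    Ris,
    PubMed,
    EndNoteXml,
    Unknown,
}

/// Guesses the citation format from the first non-empty line of the input.
fn guess_format(content: &str) -> GuessedFormat {
    let first = content
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .unwrap_or("");

    // A RIS record opens with the TY tag followed by a hyphen separator.
    let looks_like_ris = first
        .strip_prefix("TY")
        .map_or(false, |rest| rest.trim_start().starts_with('-'));

    if first.starts_with('<') {
        GuessedFormat::EndNoteXml
    } else if first.starts_with("PMID-") {
        GuessedFormat::PubMed
    } else if looks_like_ris {
        GuessedFormat::Ris
    } else {
        GuessedFormat::Unknown
    }
}

fn main() {
    let content = "TY  - JOUR\nTI  - Example\nER  -";
    println!("{:?}", guess_format(content));
}
```

CSV is deliberately left as `Unknown` here, since delimited text has no fixed opening marker to test for.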
Deduplicating Citations
use ;
let config = DeduplicatorConfig ;
let deduplicator = new.with_config;
let groups = deduplicator.find_duplicates.unwrap;
for group in groups
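As a rough picture of what a duplicate group is, the sketch below groups titles that become identical after normalization (lowercased, with punctuation and whitespace removed). This is a simplification under stated assumptions: it uses exact matching on a normalized key, whereas biblib's matching algorithm and thresholds are covered in the Deduplication Guide. `normalize_title` and `group_duplicates` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Reduces a title to lowercase alphanumeric characters only.
fn normalize_title(title: &str) -> String {
    title
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Returns every set of two or more titles that share a normalized key.
/// Hypothetical helper for illustration; not biblib's algorithm.
fn group_duplicates<'a>(titles: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut by_key: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &title in titles {
        by_key.entry(normalize_title(title)).or_default().push(title);
    }
    // Keys seen only once have no duplicates, so they are dropped.
    by_key.into_values().filter(|group| group.len() > 1).collect()
}

fn main() {
    let titles = [
        "Machine Learning in Healthcare",
        "machine learning in healthcare.",
        "Another Paper",
    ];
    for group in group_duplicates(&titles) {
        println!("{:?}", group);
    }
}
```

Exact matching misses near-duplicates such as titles with typos, which is the case a string-similarity measure is meant to handle.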
CSV with Custom Headers
use ;
use CitationParser;
let mut config = new;
config
.set_delimiter
.set_header_mapping
.set_header_mapping;
let parser = with_config;
let citations = parser.parse.unwrap;
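The idea behind a header mapping can be sketched without the library: columns with custom names are translated to canonical field names before the values are read. The example below is a std-only illustration with a naive split that does not handle quoted fields; the header names "Article Name" and "Writers" and the helper `map_row` are hypothetical and do not reflect biblib's CSV API.

```rust
use std::collections::HashMap;

/// Pairs each column of a delimited row with the canonical field name that
/// its header maps to. Columns without a mapping are skipped.
fn map_row(
    header_line: &str,
    row_line: &str,
    delimiter: char,
    mapping: &HashMap<&str, &str>,
) -> HashMap<String, String> {
    header_line
        .split(delimiter)
        .zip(row_line.split(delimiter))
        .filter_map(|(header, value)| {
            mapping
                .get(header.trim())
                .map(|field| (field.to_string(), value.trim().to_string()))
        })
        .collect()
}

fn main() {
    // Custom header name -> canonical field name.
    let mapping = HashMap::from([("Article Name", "title"), ("Writers", "authors")]);

    let row = map_row(
        "Article Name;Writers;Notes",
        "Example Study;Smith, John;ignored",
        ';',
        &mapping,
    );

    println!("{:?}", row.get("title"));
    println!("{:?}", row.get("authors"));
}
```

With a semicolon delimiter, the comma inside "Smith, John" stays part of the value.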
Citation Fields
Each parsed citation contains:
| Field | Type | Description |
|---|---|---|
| title | String | Work title |
| authors | Vec<Author> | Authors with name, given name, affiliations |
| journal | Option<String> | Full journal name |
| journal_abbr | Option<String> | Journal abbreviation |
| date | Option<Date> | Year, month, day |
| volume | Option<String> | Volume number |
| issue | Option<String> | Issue number |
| pages | Option<String> | Page range |
| doi | Option<String> | Digital Object Identifier |
| pmid | Option<String> | PubMed ID |
| pmc_id | Option<String> | PubMed Central ID |
| issn | Vec<String> | ISSNs |
| abstract_text | Option<String> | Abstract |
| keywords | Vec<String> | Keywords |
| urls | Vec<String> | Related URLs |
| mesh_terms | Vec<String> | MeSH terms (PubMed) |
| extra_fields | HashMap | Additional format-specific fields |
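The field list above corresponds to a data shape along the following lines. This is a standalone sketch that mirrors the table, not the crate's real definitions: the inner fields of `Author` and `Date`, their integer types, and the key and value types of `extra_fields` are assumptions. It also shows the usual pattern for reading the optional fields.

```rust
use std::collections::HashMap;

// Sketch types mirroring the field table; inner layouts are assumed.
#[derive(Debug, Default)]
struct Author {
    name: String,
    given_name: Option<String>,
    affiliations: Vec<String>,
}

#[derive(Debug, Default)]
struct Date {
    year: i32,
    month: Option<u8>,
    day: Option<u8>,
}

#[derive(Debug, Default)]
struct Citation {
    title: String,
    authors: Vec<Author>,
    journal: Option<String>,
    journal_abbr: Option<String>,
    date: Option<Date>,
    volume: Option<String>,
    issue: Option<String>,
    pages: Option<String>,
    doi: Option<String>,
    pmid: Option<String>,
    pmc_id: Option<String>,
    issn: Vec<String>,
    abstract_text: Option<String>,
    keywords: Vec<String>,
    urls: Vec<String>,
    mesh_terms: Vec<String>,
    // Key and value types are an assumption; the table only says HashMap.
    extra_fields: HashMap<String, Vec<String>>,
}

/// Builds a short "Author (year). Title" string, with fallbacks for
/// a missing author list or date.
fn short_reference(c: &Citation) -> String {
    let first_author = c.authors.first().map_or("Unknown", |a| a.name.as_str());
    let year = c
        .date
        .as_ref()
        .map_or_else(|| "n.d.".to_string(), |d| d.year.to_string());
    format!("{} ({}). {}", first_author, year, c.title)
}

fn main() {
    let citation = Citation {
        title: "Machine Learning in Healthcare".to_string(),
        authors: vec![Author { name: "Smith".to_string(), ..Default::default() }],
        date: Some(Date { year: 2023, ..Default::default() }),
        ..Default::default()
    };
    println!("{}", short_reference(&citation));
}
```

Only `title` is a plain `String`; most other scalar fields are optional, so code that consumes citations should expect `None` for anything the source format did not provide.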
Features
| Feature | Dependencies | Description |
|---|---|---|
| ris | - | RIS format parser |
| pubmed | - | PubMed/MEDLINE parser |
| xml | quick-xml | EndNote XML parser |
| csv | csv | CSV parser |
| dedupe | rayon, strsim | Deduplication engine |
| regex | regex | Full regex support |
| lite | regex-lite | Lightweight regex (smaller binary) |
| diagnostics | ariadne | Pretty, coloured error output with source context |
Default: all features enabled except lite and diagnostics.
Note: Exactly one of regex or lite must be enabled. The crate will not compile without one of them, and the two are mutually exclusive, so do not enable both.
Error Handling
All parse errors carry a 1-based line number and, where available, a byte-offset span pointing to the problematic citation record:
use ;
match new.parse
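To show how a 1-based line number and a byte-offset span relate to the input text, here is a small std-only sketch. It does not use biblib's error types; `line_number_at` and `record_text` are hypothetical helpers, and the span is assumed to be a half-open byte range into the source string.

```rust
use std::ops::Range;

/// Returns the 1-based line number that contains `byte_offset`.
fn line_number_at(source: &str, byte_offset: usize) -> usize {
    let end = byte_offset.min(source.len());
    // Count newline bytes before the offset; the first line is line 1.
    source.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count() + 1
}

/// Returns the text covered by a byte span, or None if the span is out of
/// bounds or does not fall on character boundaries.
fn record_text(source: &str, span: Range<usize>) -> Option<&str> {
    source.get(span)
}

fn main() {
    let source = "TY  - JOUR\nTI  - A\nER  -\n";
    // Assumed span covering the second line of the input.
    let span = 11..18;

    println!("line {}", line_number_at(source, span.start));
    println!("{:?}", record_text(source, span));
}
```

Counting bytes rather than characters keeps the offset arithmetic safe for non-ASCII input, and `str::get` avoids a panic if a span is invalid.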
Pretty diagnostics (optional)
Enable the diagnostics feature for human-friendly, coloured output powered by ariadne:
[dependencies]
biblib = { version = "0.4", features = ["diagnostics"] }
use ;
let source = read_to_string?;
match parse_with_diagnostics
You can also call error.to_diagnostic(filename, source) directly on any ParseError.
Documentation
- Parsing Guide — Format-specific tag mappings, date formats, and author handling
- Deduplication Guide — Matching algorithm, similarity thresholds, and configuration
- API Docs — Complete API reference
License
Licensed under either of Apache License 2.0 or MIT at your option.