# biblib
A Rust library for parsing and deduplicating academic citations.
## Installation
```toml
[dependencies]
biblib = "0.3.0"
```
For minimal builds, disable the default features and enable only the formats you need:
```toml
[dependencies]
biblib = { version = "0.3.0", default-features = false, features = ["ris"] }
```
## Supported Formats
| Format | Feature | Description |
|--------|---------|-------------|
| RIS | `ris` | Research Information Systems format |
| PubMed | `pubmed` | MEDLINE/PubMed `.nbib` files |
| EndNote XML | `xml` | EndNote XML export format |
| CSV | `csv` | Configurable delimited files |

All format features are enabled by default.
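
The PubMed `.nbib` (MEDLINE) layout is line-oriented: each field starts with a tag padded to four characters followed by `- `, and indented lines continue the previous field. The standalone sketch below illustrates that layout with the standard library only. `parse_medline` is a hypothetical helper, not part of biblib's API, and for simplicity it splits each tag line at its first hyphen.

```rust
/// Hypothetical helper: parses MEDLINE/PubMed-style text into `(tag, value)` pairs.
/// Tag lines look like `TI  - value`; lines starting with whitespace continue
/// the previous field's value.
fn parse_medline(text: &str) -> Vec<(String, String)> {
    let mut fields: Vec<(String, String)> = Vec::new();
    for line in text.lines() {
        if line.trim().is_empty() {
            continue; // blank lines separate records; ignored in this sketch
        }
        if line.starts_with(char::is_whitespace) {
            // Continuation line: join it to the previous value with one space.
            if let Some((_, value)) = fields.last_mut() {
                value.push(' ');
                value.push_str(line.trim());
            }
        } else if let Some((tag, value)) = line.split_once('-') {
            // New field: the tag is everything before the first hyphen.
            fields.push((tag.trim().to_string(), value.trim().to_string()));
        }
    }
    fields
}

fn main() {
    let record = "PMID- 12345678\n\
                  TI  - Machine learning in healthcare: a review of\n      current methods.\n\
                  AU  - Smith J\n\
                  AU  - Doe J\n";
    let fields = parse_medline(record);
    assert_eq!(
        fields[1],
        ("TI".to_string(), "Machine learning in healthcare: a review of current methods.".to_string())
    );
    // Repeated tags (here `AU`) yield one pair each.
    let authors: Vec<&str> = fields
        .iter()
        .filter(|(tag, _)| tag == "AU")
        .map(|(_, value)| value.as_str())
        .collect();
    assert_eq!(authors, ["Smith J", "Doe J"]);
}
```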
## Quick Start
### Parsing Citations
```rust
use biblib::{CitationParser, RisParser};
let ris_content = r#"TY  - JOUR
TI  - Machine Learning in Healthcare
AU  - Smith, John
AU  - Doe, Jane
PY  - 2023
ER  - "#;
let parser = RisParser::new();
let citations = parser.parse(ris_content).unwrap();
println!("Title: {}", citations[0].title);
println!("Authors: {:?}", citations[0].authors);
```
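Each RIS line pairs a two-character tag with a value, separated by two spaces and a hyphen (`TY  - JOUR`), and repeated tags such as `AU` accumulate into a list. The sketch below shows that tag/value split using only the standard library. `split_ris_line` is a hypothetical helper for illustration, not how `RisParser` is implemented, and it also tolerates a single space before the hyphen.

```rust
/// Hypothetical helper: splits one RIS line such as `AU  - Smith, John`
/// into `("AU", "Smith, John")`. Returns `None` for non-tag lines.
fn split_ris_line(line: &str) -> Option<(&str, &str)> {
    let tag = line.get(..2)?;
    // RIS tags are two characters: uppercase letters or digits (e.g. `TI`, `T1`).
    if !tag.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit()) {
        return None;
    }
    // Skip the spaces after the tag, then require the `-` separator.
    let value = line[2..].trim_start().strip_prefix('-')?;
    Some((tag, value.trim()))
}

fn main() {
    let record = "TY  - JOUR\nTI  - Machine Learning in Healthcare\nAU  - Smith, John\nAU  - Doe, Jane\nPY  - 2023\nER  - ";
    let mut title = None;
    let mut authors = Vec::new();
    for (tag, value) in record.lines().filter_map(split_ris_line) {
        match tag {
            "TI" => title = Some(value.to_string()),
            "AU" => authors.push(value.to_string()), // one entry per AU line
            _ => {}                                   // other tags ignored in this sketch
        }
    }
    assert_eq!(title.as_deref(), Some("Machine Learning in Healthcare"));
    assert_eq!(authors, ["Smith, John", "Doe, Jane"]);
}
```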
### Auto-Detecting Format
```rust
use biblib::detect_and_parse;
let content = "TY  - JOUR\nTI  - Example\nER  - ";
let (citations, format) = detect_and_parse(content).unwrap();
println!("Detected format: {}", format); // "RIS"
```
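Format detection can key off how the content starts. The sketch below is a guess at the kind of heuristic involved, not biblib's actual detection logic: `GuessedFormat` and `guess_format` are hypothetical names, and the rules (a leading `<` means XML, `PMID-` means PubMed, a `TY` tag means RIS, anything else is treated as CSV) are assumptions chosen for illustration.

```rust
/// Hypothetical format label for this sketch; not biblib's own type.
#[derive(Debug, PartialEq)]
enum GuessedFormat {
    Ris,
    PubMed,
    EndNoteXml,
    Csv,
}

/// Guesses the format from the first non-blank line (illustrative heuristic only).
fn guess_format(content: &str) -> GuessedFormat {
    let first = content
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .unwrap_or("");
    if first.starts_with('<') {
        GuessedFormat::EndNoteXml
    } else if first.starts_with("PMID-") {
        GuessedFormat::PubMed
    } else if first.starts_with("TY") && first[2..].trim_start().starts_with('-') {
        GuessedFormat::Ris
    } else {
        GuessedFormat::Csv // fallback: assume delimited text
    }
}

fn main() {
    let content = "TY  - JOUR\nTI  - Example\nER  - ";
    println!("{:?}", guess_format(content)); // prints: Ris
}
```

A first-line check is cheap but easy to fool (for example, a CSV whose first header happens to start with `TY -`), which is why a real detector may need to look further into the content.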
### Deduplicating Citations
```rust
use biblib::dedupe::{Deduplicator, DeduplicatorConfig};
// `citations` is the Vec returned by one of the parsers shown above.
let config = DeduplicatorConfig {
    group_by_year: true,                            // Group by year for performance
    run_in_parallel: true,                          // Use parallel processing
    source_preferences: vec!["PubMed".to_string()], // Prefer PubMed records
};
let deduplicator = Deduplicator::new().with_config(config);
let groups = deduplicator.find_duplicates(&citations).unwrap();
for group in groups {
    // Only report groups that actually contain duplicates.
    if !group.duplicates.is_empty() {
        println!("Kept: {}", group.unique.title);
        println!("Duplicates: {}", group.duplicates.len());
    }
}
```
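The idea behind year-grouped deduplication can be sketched without the library. This is a simplified stand-in, not the algorithm biblib uses (the `dedupe` feature pulls in `strsim` for string similarity; see the Deduplication Guide). `Record` and `find_duplicate_groups` are hypothetical, records are compared only within the same year bucket, and two records match if their DOIs agree case-insensitively or their titles are equal after normalization.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed citation; not biblib's `Citation` type.
struct Record {
    title: &'static str,
    year: Option<i32>,
    doi: Option<&'static str>,
}

/// Lowercases the title and drops everything that is not alphanumeric,
/// so case, punctuation, and spacing differences do not prevent a match.
fn normalize(title: &str) -> String {
    title
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase)
        .collect()
}

/// Returns groups of record indices that look like the same work.
/// Records are compared only within their year bucket, and each group is
/// anchored on its first (lowest-index) record.
fn find_duplicate_groups(records: &[Record]) -> Vec<Vec<usize>> {
    let mut by_year: HashMap<Option<i32>, Vec<usize>> = HashMap::new();
    for (i, r) in records.iter().enumerate() {
        by_year.entry(r.year).or_default().push(i);
    }

    let mut groups = Vec::new();
    for indices in by_year.values() {
        let mut assigned = vec![false; indices.len()];
        for a in 0..indices.len() {
            if assigned[a] {
                continue; // already placed in an earlier group
            }
            let anchor = &records[indices[a]];
            let mut group = vec![indices[a]];
            for b in (a + 1)..indices.len() {
                let other = &records[indices[b]];
                let same_doi = matches!(
                    (anchor.doi, other.doi),
                    (Some(x), Some(y)) if x.eq_ignore_ascii_case(y)
                );
                if !assigned[b] && (same_doi || normalize(anchor.title) == normalize(other.title)) {
                    assigned[b] = true;
                    group.push(indices[b]);
                }
            }
            if group.len() > 1 {
                groups.push(group);
            }
        }
    }
    groups.sort(); // HashMap iteration order is unspecified, so sort for stable output
    groups
}

fn main() {
    let records = [
        Record { title: "Machine Learning in Healthcare", year: Some(2023), doi: None },
        Record { title: "Machine learning in healthcare.", year: Some(2023), doi: None },
        Record { title: "A Randomized Trial", year: Some(2023), doi: Some("10.1000/xyz123") },
        Record { title: "A randomised trial", year: Some(2023), doi: Some("10.1000/XYZ123") },
        Record { title: "Machine Learning in Healthcare", year: Some(2021), doi: None },
    ];
    // Record 4 shares a title with records 0 and 1 but sits in a different year bucket.
    assert_eq!(find_duplicate_groups(&records), vec![vec![0, 1], vec![2, 3]]);
}
```

Exact matching on normalized titles is deliberately strict. A similarity score with a threshold would also catch typos and spelling variants (such as records 2 and 3 above, which only match here through their DOI), at the cost of more expensive comparisons; bucketing by year keeps the number of comparisons down.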
### CSV with Custom Headers
```rust
use biblib::csv::{CsvParser, CsvConfig};
use biblib::CitationParser;
let mut config = CsvConfig::new();
config
    .set_delimiter(b';')
    .set_header_mapping("title", vec!["Article Name".to_string()])
    .set_header_mapping("authors", vec!["Writers".to_string()]);
let parser = CsvParser::with_config(config);
let citations = parser.parse("Article Name;Writers\nMy Paper;Smith J").unwrap();
```
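Header mapping comes down to finding, for each canonical field, the first column whose header matches one of its aliases. The sketch below illustrates that with the standard library only. `resolve_columns` is a hypothetical helper, the case-insensitive comparison is an assumption, and the naive `split(';')` ignores quoting, which a real CSV reader such as the `csv` crate handles.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for each canonical field, find the index of the first
/// header matching one of its aliases (ASCII case-insensitive, trimmed).
fn resolve_columns(headers: &[&str], aliases: &HashMap<&str, Vec<&str>>) -> HashMap<String, usize> {
    let mut columns = HashMap::new();
    for (field, names) in aliases {
        let found = headers
            .iter()
            .position(|h| names.iter().any(|n| h.trim().eq_ignore_ascii_case(n)));
        if let Some(idx) = found {
            columns.insert(field.to_string(), idx);
        }
    }
    columns
}

fn main() {
    let input = "Article Name;Writers\nMy Paper;Smith J";
    let mut lines = input.lines();
    let headers: Vec<&str> = lines.next().unwrap_or("").split(';').collect();

    // Canonical field -> accepted header names.
    let mut aliases = HashMap::new();
    aliases.insert("title", vec!["Article Name", "Title"]);
    aliases.insert("authors", vec!["Writers", "Authors"]);

    let columns = resolve_columns(&headers, &aliases);
    for row in lines {
        let cells: Vec<&str> = row.split(';').collect();
        let title = columns.get("title").and_then(|&i| cells.get(i)).copied();
        let authors = columns.get("authors").and_then(|&i| cells.get(i)).copied();
        assert_eq!(title, Some("My Paper"));
        assert_eq!(authors, Some("Smith J"));
    }
}
```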
## Citation Fields
Each parsed citation contains:

| Field | Type | Description |
|-------|------|-------------|
| `title` | `String` | Work title |
| `authors` | `Vec<Author>` | Authors with name, given name, affiliations |
| `journal` | `Option<String>` | Full journal name |
| `journal_abbr` | `Option<String>` | Journal abbreviation |
| `date` | `Option<Date>` | Year, month, day |
| `volume` | `Option<String>` | Volume number |
| `issue` | `Option<String>` | Issue number |
| `pages` | `Option<String>` | Page range |
| `doi` | `Option<String>` | Digital Object Identifier |
| `pmid` | `Option<String>` | PubMed ID |
| `pmc_id` | `Option<String>` | PubMed Central ID |
| `issn` | `Vec<String>` | ISSNs |
| `abstract_text` | `Option<String>` | Abstract |
| `keywords` | `Vec<String>` | Keywords |
| `urls` | `Vec<String>` | Related URLs |
| `mesh_terms` | `Vec<String>` | MeSH terms (PubMed) |
| `extra_fields` | `HashMap` | Additional format-specific fields |
## Features
| Feature | Dependencies | Description |
|---------|--------------|-------------|
| `ris` | - | RIS format parser |
| `pubmed` | - | PubMed/MEDLINE parser |
| `xml` | `quick-xml` | EndNote XML parser |
| `csv` | `csv` | CSV parser |
| `dedupe` | `rayon`, `strsim` | Deduplication engine |
| `regex` | `regex` | Full regex support |
| `lite` | `regex-lite` | Lightweight regex (smaller binary) |

Default: all features enabled except `lite`.
## Documentation
- **[Parsing Guide](PARSING_GUIDE.md)** — Format-specific tag mappings, date formats, and author handling
- **[Deduplication Guide](DEDUPLICATION_GUIDE.md)** — Matching algorithm, similarity thresholds, and configuration
- **[API Docs](https://docs.rs/biblib)** — Complete API reference
## License
Licensed under either of [Apache License 2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT) at your option.