# mdd_api
Application programming interface to interact with MDD (Mammal Diversity Database) data.
Code documented at [docs.rs/mdd_api](https://docs.rs/mdd_api).
## Overview
This crate provides parsers and lightweight aggregation utilities for turning raw
MDD CSV / TOML release assets into structured Rust data types or JSON suitable
for downstream API or web delivery.
### Core Data Structures
- `MddData` – Single species row from the main MDD CSV. Taxonomic, distribution,
and authority fields are preserved verbatim as text; field‑level docs explain
each column.
- `SynonymData` – Historical / alternative names and associated bibliographic
metadata from the synonym CSV. The `MDD_` prefix is removed and headers are
converted to camelCase.
- `CountryMDDStats` – Aggregated per‑country distribution statistics excluding
domesticated and widespread placeholder entries. Predicted occurrences are
marked with a trailing `?` on species IDs.
- `ReleasedMddData` – Compact bundle of simplified species + attached synonyms
(only synonyms that resolve to an accepted species) plus summary `MetaData`.
- `AllMddData` – Full raw species + all synonyms without filtering.
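The header handling described for `SynonymData` (drop the `MDD_` prefix, then convert to camelCase) can be sketched with the standard library alone. `normalize_header` is a hypothetical helper written for illustration; the crate itself may implement the renaming differently, for example through serde attributes.

```rust
/// Hypothetical helper (not part of the crate's API): strips a leading `MDD_`
/// and converts the remaining snake_case header to camelCase.
fn normalize_header(raw: &str) -> String {
    let trimmed = raw.strip_prefix("MDD_").unwrap_or(raw);
    let mut out = String::with_capacity(trimmed.len());
    let mut upper_next = false;
    for ch in trimmed.chars() {
        if ch == '_' {
            // Drop the underscore; capitalize the next character unless the
            // underscore was at the very start of the header.
            upper_next = !out.is_empty();
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

Under these assumptions, `normalize_header("MDD_syn_id")` yields `synId` and `normalize_header("authority_parentheses")` yields `authorityParentheses`; headers without underscores pass through unchanged.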
### Release Metadata
A `release.toml` file (see `tests/data/release.toml` for an example) is parsed
into `ReleaseToml` / `ReleaseMetadata` and can be used to drive versioned output.
### Typical Workflow
1. Read the MDD species CSV and parse into `Vec<MddData>` using
`MddData::from_csv` (returns typed structs).
2. Read the synonym CSV and parse into `Vec<SynonymData>` via
`SynonymData::from_csv`.
3. (Optional) Aggregate into a `ReleasedMddData` with
`ReleasedMddData::from_parser` providing the desired version + release date.
4. Serialize to JSON (optionally gzip-compressed) using standard tooling.
5. (Optional) Build `CountryMDDStats` for geographic summaries.
### Country Statistics
The parser normalizes country / region names via helper code. Unrecognized
names are kept verbatim and a warning is emitted. Species with a distribution of
`domesticated` or `NA` are collected separately and excluded from per‑country
counts.
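The aggregation rules above can be illustrated with a small standalone sketch. It is not the crate's `CountryMDDStats` implementation: `CountryTally` and `tally_countries` are hypothetical names, name normalization and warnings are omitted, and the sketch assumes that a trailing `?` on a country name in the input marks a predicted occurrence.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the crate's per-country aggregation.
#[derive(Debug, Default)]
struct CountryTally {
    /// Country name -> species IDs; predicted occurrences carry a trailing `?`.
    by_country: BTreeMap<String, Vec<String>>,
    /// Species whose distribution is `domesticated` or `NA`.
    excluded: Vec<u32>,
}

/// Tallies `(species_id, countryDistribution)` pairs, where the distribution
/// is a `|`-separated list such as `Kenya|Tanzania`.
fn tally_countries(rows: &[(u32, &str)]) -> CountryTally {
    let mut tally = CountryTally::default();
    for &(id, distribution) in rows {
        let distribution = distribution.trim();
        // Placeholder distributions are collected separately, not counted.
        if distribution.eq_ignore_ascii_case("domesticated") || distribution == "NA" {
            tally.excluded.push(id);
            continue;
        }
        for entry in distribution.split('|').map(str::trim).filter(|e| !e.is_empty()) {
            // Assumption: `Kenya?` in the input means "predicted in Kenya".
            let (country, species) = match entry.strip_suffix('?') {
                Some(name) => (name.trim_end(), format!("{id}?")),
                None => (entry, id.to_string()),
            };
            tally
                .by_country
                .entry(country.to_string())
                .or_default()
                .push(species);
        }
    }
    tally
}
```

Calling `tally_countries(&[(1, "Kenya|Tanzania"), (2, "Kenya?"), (3, "domesticated")])` would list species `1` and `2?` under `Kenya`, species `1` under `Tanzania`, and put species `3` in `excluded`.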
### Extensibility
The crate intentionally keeps most columns as `String` to avoid lossy
assumptions. Applications needing strict numeric coordinates or enumerated
status codes can layer additional domain models on top.
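As a rough sketch of such a layer, the snippet below converts verbatim column text into stricter types. `IucnStatus` and `parse_coordinate` are hypothetical names, not part of the crate; the status codes are the standard IUCN Red List abbreviations, and unrecognized values are kept verbatim rather than discarded.

```rust
/// Hypothetical stricter view over the verbatim `iucnStatus` column.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IucnStatus {
    LeastConcern,
    NearThreatened,
    Vulnerable,
    Endangered,
    CriticallyEndangered,
    ExtinctInTheWild,
    Extinct,
    DataDeficient,
    NotEvaluated,
    /// Anything unrecognized is preserved as-is.
    Other(String),
}

impl IucnStatus {
    fn parse(raw: &str) -> Self {
        let trimmed = raw.trim();
        match trimmed.to_ascii_uppercase().as_str() {
            "LC" => Self::LeastConcern,
            "NT" => Self::NearThreatened,
            "VU" => Self::Vulnerable,
            "EN" => Self::Endangered,
            "CR" => Self::CriticallyEndangered,
            "EW" => Self::ExtinctInTheWild,
            "EX" => Self::Extinct,
            "DD" => Self::DataDeficient,
            "NE" => Self::NotEvaluated,
            _ => Self::Other(trimmed.to_string()),
        }
    }
}

/// Parses a verbatim coordinate column (e.g. `typeLocalityLatitude`) into a
/// finite number within `[-limit, limit]`; empty or invalid text gives `None`.
fn parse_coordinate(raw: &str, limit: f64) -> Option<f64> {
    let value: f64 = raw.trim().parse().ok()?;
    (value.is_finite() && value.abs() <= limit).then_some(value)
}
```

A caller would use `parse_coordinate(lat, 90.0)` for latitudes and `parse_coordinate(lon, 180.0)` for longitudes, keeping the original `String` alongside the parsed value if the verbatim text still matters.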
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
mdd_api = "0.6"
```
Or using `cargo add`:
```powershell
cargo add mdd_api
```
Minimal example parsing inline CSV strings and building a release bundle:
```rust
use mdd_api::parser::{mdd::MddData, synonyms::SynonymData, ReleasedMddData};
let mdd_csv = "id,sciName,mainCommonName,otherCommonNames,phylosort,subclass,infraclass,magnorder,superorder,order,suborder,infraorder,parvorder,superfamily,family,subfamily,tribe,genus,subgenus,specificEpithet,authoritySpeciesAuthor,authoritySpeciesYear,authorityParentheses,originalNameCombination,authoritySpeciesCitation,authoritySpeciesLink,typeVoucher,typeKind,typeVoucherURIs,typeLocality,typeLocalityLatitude,typeLocalityLongitude,nominalNames,taxonomyNotes,taxonomyNotesCitation,distributionNotes,distributionNotesCitation,subregionDistribution,countryDistribution,continentDistribution,biogeographicRealm,iucnStatus,extinct,domestic,flagged,CMW_sciName,diffSinceCMW,MSW3_matchtype,MSW3_sciName,diffSinceMSW3\n1,Panthera leo,Lion,,1,Theria,Eutheria,,Laurasiatheria,Carnivora,,,,Felidae,,,Panthera,,leo,Linnaeus,1758,0,,citation,,voucher,,uri,Locality,,,names,notes,,distNotes,,Subregion,Kenya|Tanzania,Africa,Afrotropic,LC,0,0,0,Name,0,match,Name,diff";
let syn_csv = "MDD_syn_id,hesp_id,species_id,species,root_name,author,year,authority_parentheses,nomenclature_status,validity,original_combination,original_rank,authority_citation,unchecked_authority_citation,sourced_unverified_citations,citation_group,citation_kind,authority_page,authority_link,authority_page_link,unchecked_authority_page_link,old_type_locality,original_type_locality,unchecked_type_locality,emended_type_locality,type_latitude,type_longitude,type_country,type_subregion,type_subregion2,holotype,type_kind,type_specimen_link,order,family,genus,specific_epithet,subspecific_epithet,variant_of,senior_homonym,variant_name_citations,name_usages,comments\n1,0,1,Panthera leo,Panthera leo,Linnaeus,1758,0,,valid,,species,citation,,,,,,link,,,loc,loc2,,loc3,0,0,Country,Sub,Sub2,Holotype,Kind,SpecLink,Carnivora,Felidae,Panthera,leo,,,,,,";
let species = MddData::new().from_csv(mdd_csv);
let synonyms = SynonymData::new().from_csv(syn_csv);
let release = ReleasedMddData::from_parser(species, synonyms, "2025.1", "2025-09-01");
println!("{}", release.to_json());
```
CLI usage (after installing with `cargo install mdd_api` or running from source):
```powershell
# Parse CSVs and output JSON bundle
mdd json --input mdd.csv --synonym synonyms.csv --output ./out --mdd="v2.0" --date 2025-09-01
```
### Zip Quick Start
If you have an official MDD release archive (for example `MDD.zip`) that
contains the species CSV (named like `MDD_v*.csv`), the synonym CSV
(`Species_Syn_v*.csv`), and optionally a `release.toml`, you can parse it in a
single step. The `zip` subcommand currently serves as a convenience entry point
and example; programmatic parsing typically gives you more control.
Minimal programmatic example mirroring the internal `ZipParser` logic in
`main.rs` (this API surface is not yet stable):
```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

use zip::ZipArchive;

use mdd_api::parser::{mdd::MddData, synonyms::SynonymData, ReleasedMddData};

fn parse_from_zip<P: AsRef<Path>>(zip_path: P) -> anyhow::Result<ReleasedMddData> {
    // Open the archive
    let file = File::open(zip_path)?;
    let mut archive = ZipArchive::new(file)?;

    // Locate the two core CSV entries by their expected prefixes
    let mut mdd_csv = String::new();
    let mut syn_csv = String::new();
    for i in 0..archive.len() {
        let mut f = archive.by_index(i)?;
        let name = f.name().to_string();
        if name.starts_with("MDD_v") && name.ends_with(".csv") {
            f.read_to_string(&mut mdd_csv)?;
        } else if name.starts_with("Species_Syn_v") && name.ends_with(".csv") {
            f.read_to_string(&mut syn_csv)?;
        }
    }

    // Parse into typed rows
    let species = MddData::new().from_csv(&mdd_csv);
    let synonyms = SynonymData::new().from_csv(&syn_csv);

    // Version and release date are hard-coded here for illustration
    Ok(ReleasedMddData::from_parser(species, synonyms, "2025.1", "2025-09-01"))
}
```
CLI (auto-detects matching CSV names inside the archive):
```powershell
# Extract and parse directly from a ZIP archive; writes JSON to ./out
mdd zip --input MDD.zip --output ./out
```
Notes:
- The current `zip` subcommand focuses on demonstration; future versions may
emit multiple artifacts (e.g. filtered JSON, stats) similar to `json`.
- You can still manually unzip then invoke `mdd json -i <species.csv> -s <synonyms.csv>`
if you prefer an explicit pipeline.
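When matching entries yourself, note that an archive may store files under a folder, in which case a plain prefix check on the full entry name would miss them. The sketch below compares only the final path component. `EntryKind` and `classify_entry` are hypothetical names for illustration and do not describe how the `zip` subcommand is implemented.

```rust
/// Hypothetical classification of entries inside an MDD release archive.
#[derive(Debug, PartialEq, Eq)]
enum EntryKind {
    SpeciesCsv,
    SynonymCsv,
    ReleaseToml,
    Other,
}

/// Classifies a ZIP entry name using the patterns `MDD_v*.csv`,
/// `Species_Syn_v*.csv`, and `release.toml`.
fn classify_entry(entry_name: &str) -> EntryKind {
    // ZIP entry names use `/` as the separator; keep only the last component
    // so that `MDD/MDD_v2.0.csv` matches as well as `MDD_v2.0.csv`.
    let file_name = entry_name.rsplit('/').next().unwrap_or(entry_name);

    if file_name == "release.toml" {
        EntryKind::ReleaseToml
    } else if file_name.starts_with("Species_Syn_v") && file_name.ends_with(".csv") {
        EntryKind::SynonymCsv
    } else if file_name.starts_with("MDD_v") && file_name.ends_with(".csv") {
        EntryKind::SpeciesCsv
    } else {
        // Directory entries (names ending in `/`) and unrelated files land here.
        EntryKind::Other
    }
}
```

In a loop over `archive.by_index(i)`, the result of `classify_entry(f.name())` could decide which buffer an entry is read into; the file names used here are illustrative only.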
## Testing
Run all tests:
```powershell
cargo test
```
## License
See [LICENSE](LICENSE).