# mdd_api
Application programming interface to interact with MDD (Mammal Diversity Database) data.
Code documented at [docs.rs/mdd_api](https://docs.rs/mdd_api).
## Overview
This crate provides parsers and lightweight aggregation utilities for turning raw
MDD CSV / TOML release assets into structured Rust data types or JSON suitable
for downstream API or web delivery.
### Core Data Structures
- `MddData` – Single species row from the main MDD CSV (verbatim textual
preservation of taxonomic + distribution + authority fields). Field‑level docs
explain each column.
- `SynonymData` – Historical / alternative names and associated bibliographic
metadata from the synonym CSV. The `MDD_` prefix is removed and headers are
converted to camelCase.
- `CountryMDDStats` – Aggregated per‑country distribution statistics excluding
domesticated and widespread placeholder entries. Predicted occurrences are
marked with a trailing `?` on species IDs.
- `ReleasedMddData` – Compact bundle of simplified species + attached synonyms
(only synonyms that resolve to an accepted species) plus summary `MetaData`.
- `AllMddData` – Full raw species + all synonyms without filtering.
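
The header normalization described for `SynonymData` can be illustrated with a small, standard-library-only sketch. The function name `normalize_header` is hypothetical and this is not the crate's actual implementation; it only shows the idea of dropping the `MDD_` prefix and converting snake_case headers to camelCase.

```rust
/// Hypothetical helper (not part of `mdd_api`): strip a leading `MDD_`
/// prefix and convert a snake_case CSV header to camelCase.
fn normalize_header(header: &str) -> String {
    let trimmed = header.strip_prefix("MDD_").unwrap_or(header);
    let mut out = String::with_capacity(trimmed.len());
    let mut upper_next = false;
    for ch in trimmed.chars() {
        if ch == '_' {
            // Drop the underscore; capitalise the next character unless
            // the underscore was at the very start of the header.
            upper_next = !out.is_empty();
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

Under these assumptions, `normalize_header("MDD_syn_id")` yields `synId` and `normalize_header("species_id")` yields `speciesId`.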
### Release Metadata
A `release.toml` file (see `tests/data/release.toml` for an example) is parsed
into `ReleaseToml` / `ReleaseMetadata` and can be used to drive versioned output.
### Typical Workflow
1. Read the MDD species CSV and parse it into `Vec<MddData>` with
`MddData::new().from_csv(...)`.
2. Read the synonym CSV and parse it into `Vec<SynonymData>` with
`SynonymData::new().from_csv(...)`.
3. (Optional) Aggregate into a `ReleasedMddData` with
`ReleasedMddData::from_parser` providing the desired version + release date.
4. Serialize to JSON (optionally gzip-compressed) using standard tooling.
5. (Optional) Build `CountryMDDStats` for geographic summaries.
### Country Statistics
The parser normalizes country / region names via helper code. Unrecognized
names are kept verbatim and a warning is emitted. Species with a distribution of
`domesticated` or `NA` are collected separately and excluded from per‑country
counts.
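
A minimal sketch of this kind of aggregation, using only the standard library, might look like the following. The `CountrySummary` type and `summarize` function are hypothetical stand-ins, not the crate's `CountryMDDStats` API. The sketch also assumes that countries are separated by `|` (as in the `countryDistribution` column of the example CSV) and that a trailing `?` on a country name marks a predicted occurrence; name normalization and warnings are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical result type; the crate's `CountryMDDStats` may differ.
#[derive(Debug, Default)]
struct CountrySummary {
    /// Country name -> species IDs; predicted occurrences end with `?`.
    by_country: BTreeMap<String, Vec<String>>,
    /// Species whose distribution is `domesticated` or `NA`.
    excluded: Vec<u32>,
}

/// Each row is `(species_id, country_distribution)`.
fn summarize(rows: &[(u32, &str)]) -> CountrySummary {
    let mut summary = CountrySummary::default();
    for &(id, distribution) in rows {
        let distribution = distribution.trim();
        if distribution.eq_ignore_ascii_case("domesticated") || distribution == "NA" {
            // Collected separately and kept out of the per-country counts.
            summary.excluded.push(id);
            continue;
        }
        for entry in distribution.split('|').map(str::trim).filter(|e| !e.is_empty()) {
            // Assumption: `Country?` marks a predicted occurrence, which is
            // recorded by appending `?` to the species ID instead.
            let (country, species) = match entry.strip_suffix('?') {
                Some(name) => (name.trim_end(), format!("{id}?")),
                None => (entry, id.to_string()),
            };
            summary
                .by_country
                .entry(country.to_string())
                .or_default()
                .push(species);
        }
    }
    summary
}
```

A `BTreeMap` is used here so that countries come out in a stable, sorted order when the summary is serialized.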
### Extensibility
The crate intentionally keeps most columns as `String` to avoid lossy
assumptions. Applications needing strict numeric coordinates or enumerated
status codes can layer additional domain models on top.
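
As an illustration of such a layer, the sketch below converts string columns into stricter types. `IucnStatus` and `parse_coordinates` are hypothetical names that do not exist in the crate, and the sketch assumes the `iucnStatus` column holds the standard IUCN Red List codes (the example row uses `LC`). Unrecognized values are kept verbatim so that nothing is lost.

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical enumerated model for the `iucnStatus` column.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IucnStatus {
    LeastConcern,
    NearThreatened,
    Vulnerable,
    Endangered,
    CriticallyEndangered,
    ExtinctInTheWild,
    Extinct,
    DataDeficient,
    /// Any unrecognised value, preserved as-is.
    Other(String),
}

impl FromStr for IucnStatus {
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.trim() {
            "LC" => Self::LeastConcern,
            "NT" => Self::NearThreatened,
            "VU" => Self::Vulnerable,
            "EN" => Self::Endangered,
            "CR" => Self::CriticallyEndangered,
            "EW" => Self::ExtinctInTheWild,
            "EX" => Self::Extinct,
            "DD" => Self::DataDeficient,
            other => Self::Other(other.to_string()),
        })
    }
}

/// Parse latitude/longitude strings (for example `typeLocalityLatitude` and
/// `typeLocalityLongitude`). Returns `None` when either value is blank,
/// non-numeric, or outside the valid range.
fn parse_coordinates(lat: &str, lon: &str) -> Option<(f64, f64)> {
    let lat: f64 = lat.trim().parse().ok()?;
    let lon: f64 = lon.trim().parse().ok()?;
    let in_range = (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon);
    in_range.then_some((lat, lon))
}
```

Keeping the conversion outside the parser means the raw strings remain available whenever a value fails to parse.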
## Quick Start
### CLI Usage
Install the CLI tool with:
```powershell
cargo install mdd_api
```
#### Unpack and Parse from Zip
If you have an official MDD release archive (for example `MDD.zip`) that
contains the species CSV (named like `MDD_v*.csv`), the synonym CSV
(`Species_Syn_v*.csv`), and optionally a `release.toml`, you can parse it in a
single step.
```powershell
# Extract and parse directly from a ZIP archive; outputs JSON to ./out
mdd unpack --input MDD.zip --output ./out
```
#### Filter by Country
Use the `filter country` subcommand to extract species by country code from an MDD zip file. The country code should be provided in ISO 3166-1 alpha-2 format.
```powershell
# Filter species by country (e.g., Indonesia) and output JSON to ./out
mdd filter country -i MDD.zip -c ID -o ./out/indonesia
```
#### MIL Data Preparation
You can prepare Mammal Image Library (MIL) metadata and merge it with MDD records. The input MIL metadata file can be in CSV or Excel format, or it can be a compressed archive (`.zip` or `.tar.gz`) containing both the metadata and the images.
```powershell
# Parse MIL and merge with MDD; outputs JSON
mdd mil --mil-file mil_metadata.xlsx --mdd-file MDD.csv --mil-img-dir ./images --output ./output.json
# Parse directly from a compressed MIL release archive (.tar.gz or .zip)
# (automatically extracts metadata and images from the archive)
mdd mil --mil-file mil-v2026-04-30.tar.gz --mdd-file MDD.csv --output ./output.json
```
#### Combined Unpack & Prepare
Use the `prepare` subcommand to unpack a raw MDD release zip archive (generating the standard MDD JSON files) and then prepare the MIL data using the unpacked MDD species CSV in one step.
```powershell
# Unpack MDD zip, parse MIL (csv/xlsx/archive), and output both to ./out directory
# (generates mil_mdd.json under the output directory)
mdd prepare --mdd-zip MDD.zip --mil-file mil-v2026-04-30.tar.gz --output ./out
```
### Library Usage
Add to your `Cargo.toml`:
```toml
[dependencies]
mdd_api = "0.6"
```
Or using `cargo add`:
```powershell
cargo add mdd_api
```
Minimal example parsing inline CSV strings and building a release bundle:
```rust
use mdd_api::parser::{mdd::MddData, synonyms::SynonymData, ReleasedMddData};
let mdd_csv = "id,sciName,mainCommonName,otherCommonNames,phylosort,subclass,infraclass,magnorder,superorder,order,suborder,infraorder,parvorder,superfamily,family,subfamily,tribe,genus,subgenus,specificEpithet,authoritySpeciesAuthor,authoritySpeciesYear,authorityParentheses,originalNameCombination,authoritySpeciesCitation,authoritySpeciesLink,typeVoucher,typeKind,typeVoucherURIs,typeLocality,typeLocalityLatitude,typeLocalityLongitude,nominalNames,taxonomyNotes,taxonomyNotesCitation,distributionNotes,distributionNotesCitation,subregionDistribution,countryDistribution,continentDistribution,biogeographicRealm,iucnStatus,extinct,domestic,flagged,CMW_sciName,diffSinceCMW,MSW3_matchtype,MSW3_sciName,diffSinceMSW3\n1,Panthera leo,Lion,,1,Theria,Eutheria,,Laurasiatheria,Carnivora,,,,Felidae,,,Panthera,,leo,Linnaeus,1758,0,,citation,,voucher,,uri,Locality,,,names,notes,,distNotes,,Subregion,Kenya|Tanzania,Africa,Afrotropic,LC,0,0,0,Name,0,match,Name,diff";
let syn_csv = "MDD_syn_id,hesp_id,species_id,species,root_name,author,year,authority_parentheses,nomenclature_status,validity,original_combination,original_rank,authority_citation,unchecked_authority_citation,sourced_unverified_citations,citation_group,citation_kind,authority_page,authority_link,authority_page_link,unchecked_authority_page_link,old_type_locality,original_type_locality,unchecked_type_locality,emended_type_locality,type_latitude,type_longitude,type_country,type_subregion,type_subregion2,holotype,type_kind,type_specimen_link,order,family,genus,specific_epithet,subspecific_epithet,variant_of,senior_homonym,variant_name_citations,name_usages,comments\n1,0,1,Panthera leo,Panthera leo,Linnaeus,1758,0,,valid,,species,citation,,,,,,link,,,loc,loc2,,loc3,0,0,Country,Sub,Sub2,Holotype,Kind,SpecLink,Carnivora,Felidae,Panthera,leo,,,,,,";
let species = MddData::new().from_csv(mdd_csv);
let synonyms = SynonymData::new().from_csv(syn_csv);
let release = ReleasedMddData::from_parser(species, synonyms, "2025.1", "2025-09-01");
println!("{}", release.to_json());
```
### Zip Quick Start
```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

use zip::ZipArchive;
use mdd_api::parser::{mdd::MddData, synonyms::SynonymData, ReleasedMddData};

fn parse_from_zip<P: AsRef<Path>>(zip_path: P) -> anyhow::Result<ReleasedMddData> {
    // Open the archive
    let file = File::open(zip_path)?;
    let mut archive = ZipArchive::new(file)?;

    // Locate the two core CSV entries (pattern-matching the expected prefixes)
    let mut mdd_csv = String::new();
    let mut syn_csv = String::new();
    for i in 0..archive.len() {
        let mut f = archive.by_index(i)?;
        let name = f.name().to_string();
        if name.starts_with("MDD_v") && name.ends_with(".csv") {
            f.read_to_string(&mut mdd_csv)?;
        } else if name.starts_with("Species_Syn_v") && name.ends_with(".csv") {
            f.read_to_string(&mut syn_csv)?;
        }
    }

    // Parse into typed rows
    let species = MddData::new().from_csv(&mdd_csv);
    let synonyms = SynonymData::new().from_csv(&syn_csv);
    Ok(ReleasedMddData::from_parser(species, synonyms, "2025.1", "2025-09-01"))
}
```
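
The prefix matching above only works when the CSV files sit at the top level of the archive. If an archive nests them inside a folder (an assumption, not something the release format is known to do), the check can be moved into a small helper that looks only at the final path component. The names `EntryKind` and `classify_entry` are hypothetical and not part of the crate.

```rust
/// Hypothetical classification of an archive entry by file name.
#[derive(Debug, PartialEq, Eq)]
enum EntryKind {
    Species,
    Synonyms,
    ReleaseToml,
}

/// Classify an entry using only its final path component, so that a name
/// such as `some_folder/MDD_v1.csv` is treated like `MDD_v1.csv`.
fn classify_entry(name: &str) -> Option<EntryKind> {
    let base = name
        .rsplit(|c: char| c == '/' || c == '\\')
        .next()
        .unwrap_or(name);
    if base.starts_with("MDD_v") && base.ends_with(".csv") {
        Some(EntryKind::Species)
    } else if base.starts_with("Species_Syn_v") && base.ends_with(".csv") {
        Some(EntryKind::Synonyms)
    } else if base == "release.toml" {
        Some(EntryKind::ReleaseToml)
    } else {
        None
    }
}
```

Inside the loop, `match classify_entry(f.name())` could then replace the two `starts_with` checks, and the helper can be unit-tested without a real archive.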
## Testing
Run all tests:
```powershell
cargo test
```
## License
See [LICENSE](LICENSE).