mdd_api
Application programming interface to interact with MDD (Mammal Diversity Database) data.
Code documented at docs.rs/mdd_api.
Overview
This crate provides parsers and lightweight aggregation utilities for turning raw MDD CSV / TOML release assets into structured Rust data types or JSON suitable for downstream API or web delivery.
Core Data Structures
MddData
– Single species row from the main MDD CSV (verbatim textual preservation of taxonomic + distribution + authority fields). Field‑level docs explain each column.SynonymData
– Historical / alternative names and associated bibliographic metadata from the synonym CSV. TheMDD_
prefix is removed and headers are converted to camelCase.CountryMDDStats
– Aggregated per‑country distribution statistics excluding domesticated and widespread placeholder entries. Predicted occurrences are marked with a trailing?
on species IDs.ReleasedMddData
– Compact bundle of simplified species + attached synonyms (only synonyms that resolve to an accepted species) plus summaryMetaData
.AllMddData
– Full raw species + all synonyms without filtering.
Release Metadata
A release.toml
file (see tests/data/release.toml
for an example) is parsed
into ReleaseToml
/ ReleaseMetadata
and can be used to drive versioned output.
Typical Workflow
- Read the MDD species CSV and parse into
Vec<MddData>
usingMddData::from_csv
(returns typed structs). - Read the synonym CSV and parse into
Vec<SynonymData>
viaSynonymData::from_csv
. - (Optional) Aggregate into a
ReleasedMddData
withReleasedMddData::from_parser
providing the desired version + release date. - Serialize to JSON or gzip using standard tooling.
- (Optional) Build
CountryMDDStats
for geographic summaries.
Country Statistics
The parser normalizes country / region names via helper code. Unrecognized
names are kept verbatim and a warning is emitted. Species with a distribution of
domesticated
or NA
are collected separately and excluded from per‑country
counts.
Extensibility
The crate intentionally keeps most columns as String
to avoid lossy
assumptions. Applications needing strict numeric coordinates or enumerated
status codes can layer additional domain models on top.
Quick Start
Add to your Cargo.toml
:
[]
= "0.6"
Or using cargo add
:
cargo add mdd_api
Minimal example parsing inline CSV strings and building a release bundle:
use mdd_api::parser::{mdd::MddData, synonyms::SynonymData, ReleasedMddData};
let mdd_csv = "id,sciName,mainCommonName,otherCommonNames,phylosort,subclass,infraclass,magnorder,superorder,order,suborder,infraorder,parvorder,superfamily,family,subfamily,tribe,genus,subgenus,specificEpithet,authoritySpeciesAuthor,authoritySpeciesYear,authorityParentheses,originalNameCombination,authoritySpeciesCitation,authoritySpeciesLink,typeVoucher,typeKind,typeVoucherURIs,typeLocality,typeLocalityLatitude,typeLocalityLongitude,nominalNames,taxonomyNotes,taxonomyNotesCitation,distributionNotes,distributionNotesCitation,subregionDistribution,countryDistribution,continentDistribution,biogeographicRealm,iucnStatus,extinct,domestic,flagged,CMW_sciName,diffSinceCMW,MSW3_matchtype,MSW3_sciName,diffSinceMSW3\n1,Panthera leo,Lion,,1,Theria,Eutheria,,Laurasiatheria,Carnivora,,,,Felidae,,,Panthera,,leo,Linnaeus,1758,0,,citation,,voucher,,uri,Locality,,,names,notes,,distNotes,,Subregion,Kenya|Tanzania,Africa,Afrotropic,LC,0,0,0,Name,0,match,Name,diff";
let syn_csv = "MDD_syn_id,hesp_id,species_id,species,root_name,author,year,authority_parentheses,nomenclature_status,validity,original_combination,original_rank,authority_citation,unchecked_authority_citation,sourced_unverified_citations,citation_group,citation_kind,authority_page,authority_link,authority_page_link,unchecked_authority_page_link,old_type_locality,original_type_locality,unchecked_type_locality,emended_type_locality,type_latitude,type_longitude,type_country,type_subregion,type_subregion2,holotype,type_kind,type_specimen_link,order,family,genus,specific_epithet,subspecific_epithet,variant_of,senior_homonym,variant_name_citations,name_usages,comments\n1,0,1,Panthera leo,Panthera leo,Linnaeus,1758,0,,valid,,species,citation,,,,,,link,,,loc,loc2,,loc3,0,0,Country,Sub,Sub2,Holotype,Kind,SpecLink,Carnivora,Felidae,Panthera,leo,,,,,,";
let species = MddData::new().from_csv(mdd_csv);
let synonyms = SynonymData::new().from_csv(syn_csv);
let release = ReleasedMddData::from_parser(species, synonyms, "2025.1", "2025-09-01");
println!("{}", release.to_json());
CLI usage (after installing with cargo install mdd_api
or running from source):
# Parse CSVs and output JSON bundle
mdd json --input mdd.csv --synonym synonyms.csv --output ./out --mdd="v2.0" --date 2025-09-01
Zip Quick Start
If you have an official MDD release archive (for example MDD.zip
) that
contains the species CSV (named like MDD_v*.csv
), the synonym CSV
(Species_Syn_v*.csv
), and optionally a release.toml
, you can parse it in a
single step. The zip
subcommand currently serves as a convenience entry point
and example; programmatic parsing typically gives you more control.
Programmatic (minimal) example using the internal ZipParser
logic found in
main.rs
(API surface may stabilize later):
use File;
use Path;
use ZipArchive;
use ;
CLI (auto-detects matching CSV names inside the archive):
# Extract and parse directly from a ZIP archive; outputs JSON to current directory
mdd zip --input MDD.zip --output ./out
Notes:
- The current
zip
subcommand focuses on demonstration; future versions may emit multiple artifacts (e.g. filtered JSON, stats) similar tojson
. - You can still manually unzip then invoke
mdd json -i <species.csv> -s <synonyms.csv>
if you prefer an explicit pipeline.
Testing
Run all tests:
cargo test
License
See LICENSE.