Colonizer
Rust crate and CLI to work with the Catalogue of Life (CoL) via the ChecklistBank API.
Features
- Fetch the latest official CoL release dataset key automatically.
- Resolve the CoL usage ID for an exact scientific name.
- List usages at a given rank with automatic pagination.
- List roots and children in the taxonomic tree.
- Show a taxon’s classification chain.
- Suggest usages for partial queries.
- Retrieve vernacular (common) names.
- Download full CoL packages from the static download server.
- Inspire mode: pick a random vernacular name for a given language (optionally one-word only), hyphenize it for crate names, and include a short Wikipedia summary plus Wikipedia/Wikidata links when available.
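The hyphenization and one-word filtering mentioned for inspire mode can be pictured with a small sketch. Both function names below (`hyphenize`, `is_one_word`) are hypothetical and are not taken from colonizer's API; the rule used here (lowercase, collapse every run of non-alphanumeric characters into one hyphen) is an assumption about what a crate-name-friendly form looks like, and the crate's own rules may differ.

```rust
/// Turn a vernacular name into a hyphen-separated, lowercase slug.
/// Hypothetical helper for illustration; not colonizer's actual implementation.
fn hyphenize(name: &str) -> String {
    let mut out = String::new();
    let mut pending_hyphen = false;
    for ch in name.trim().chars() {
        if ch.is_alphanumeric() {
            // Emit a single hyphen for any run of separators seen since the last letter.
            if pending_hyphen && !out.is_empty() {
                out.push('-');
            }
            pending_hyphen = false;
            out.extend(ch.to_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    out
}

/// True when the name is a single token, i.e. non-empty and free of whitespace.
/// Hypothetical helper mirroring what `--one-word` is described as filtering on.
fn is_one_word(name: &str) -> bool {
    let trimmed = name.trim();
    !trimmed.is_empty() && !trimmed.chars().any(char::is_whitespace)
}
```

With this sketch, `hyphenize("Grand scalaire")` gives `grand-scalaire` and `is_one_word("sea bream")` is false. Accented letters are kept as they are, so a name destined for crates.io (which accepts only ASCII names) would still need transliteration.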
Install
- From crates.io (binary + library):
    cargo install colonizer
- Build from source:
    cargo build --release
CLI Usage
- colonizer latest: print the latest CoL dataset key.
- colonizer id "Homo sapiens": print the CoL ID for the name.
- colonizer list-rank GENUS --max 100: list genera (ID, label, rank).
- colonizer roots: list tree roots (e.g., domains).
- colonizer children Homo --rank GENUS: list children of a taxon by name (or use --by-id 636X2).
- colonizer classify "Homo sapiens" --rank SPECIES: show the classification chain.
- colonizer suggest "homo sa" --limit 10: quick suggestions.
- colonizer vernacular "Homo sapiens" --rank SPECIES: common names.
- colonizer inspire --lang fra [--one-word]: print a random hyphenized vernacular, taxonID, scientificName, and a CoL link. Adds a short Wikipedia summary and links when available.
- --dataset <key>: operate on a specific dataset instead of the latest CoL release.
- --json: return JSON instead of TSV.
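For scripts that consume the default TSV output, a row from a command such as list-rank can be split into its fields. This is a minimal sketch that assumes each row is tab-separated in the order ID, label, rank, as the command description suggests; the `UsageRow` type and `parse_usage_row` function are hypothetical and not part of the crate.

```rust
/// One row of TSV output, assuming the column order ID, label, rank.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct UsageRow {
    id: String,
    label: String,
    rank: String,
}

/// Parse a single tab-separated line; returns None when a column is missing
/// or the ID is empty.
fn parse_usage_row(line: &str) -> Option<UsageRow> {
    // Drop a trailing newline (and carriage return) before splitting on tabs.
    let mut cols = line.trim_end_matches(&['\r', '\n'][..]).split('\t');
    let id = cols.next()?.trim();
    let label = cols.next()?.trim();
    let rank = cols.next()?.trim();
    if id.is_empty() {
        return None;
    }
    Some(UsageRow {
        id: id.to_string(),
        label: label.to_string(),
        rank: rank.to_string(),
    })
}
```

If the actual column layout differs, or if stricter guarantees are needed, the --json flag is likely the more robust choice for machine consumption.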
Inspire (random vernacular names)
- Text output example:
    colonizer inspire --lang fra
    scalaire 6FXL3 Epitonium clathrus https://www.catalogueoflife.org/data/taxon/6FXL3
    [fr wiki] Epitonium clathrus, le scalaire, est une espèce ...
    https://fr.wikipedia.org/wiki/Epitonium_clathrus
    https://www.wikidata.org/wiki/Q1995213
- JSON output example:
    colonizer --json inspire --lang eng --one-word
    {
      "lang": "eng",
      "vernacularName": "bogue",
      "vernacularHyphenized": "bogue",
      "taxonID": "MHY3",
      "scientificName": "Boops boops",
      "link": "https://www.catalogueoflife.org/data/taxon/MHY3",
      "oneWord": true,
      "wikipediaLang": "en",
      "wikipediaSummary": "Boops boops, commonly called the boce, ...",
      "wikipediaUrl": "https://en.wikipedia.org/wiki/Boops_boops",
      "wikidataUrl": "https://www.wikidata.org/wiki/Q950498"
    }
- Options:
  - --lang <iso-639-2>: vernacular language (e.g., fra, eng, spa). Defaults to fra.
  - --one-word: only return vernaculars that are a single token (no spaces). Useful for short crate names; retries internally if needed.
- Notes:
  - The random pick is API-based; it does not download any TSV locally.
  - Some datasets include vernaculars whose taxonID is not a valid ChecklistBank usage ID. The command validates IDs and retries when needed.
  - Wikipedia summary is best-effort: it tries your requested language (mapped from the 3-letter code) and falls back to English. If no page/summary is found, the wiki lines are omitted.
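The language fallback described in the notes above can be sketched as a lookup from the three-letter code to a Wikipedia subdomain, followed by English. The mapping table below is a small illustrative subset and all three function names are hypothetical; how colonizer performs the mapping internally is not documented here.

```rust
/// Map an ISO 639-2 (three-letter) code to a Wikipedia language subdomain.
/// Only a few illustrative codes are covered; unknown codes yield None.
fn wikipedia_lang(iso3: &str) -> Option<&'static str> {
    match iso3.to_ascii_lowercase().as_str() {
        "fra" | "fre" => Some("fr"),
        "eng" => Some("en"),
        "spa" => Some("es"),
        "deu" | "ger" => Some("de"),
        _ => None,
    }
}

/// Languages to try, in order: the requested one first, then English as fallback.
fn wikipedia_candidates(iso3: &str) -> Vec<&'static str> {
    let mut langs = Vec::new();
    if let Some(lang) = wikipedia_lang(iso3) {
        langs.push(lang);
    }
    // Avoid trying English twice when it was the requested language.
    if !langs.contains(&"en") {
        langs.push("en");
    }
    langs
}

/// Build an article URL for a page title, replacing spaces with underscores.
fn wikipedia_url(lang: &str, title: &str) -> String {
    format!("https://{}.wikipedia.org/wiki/{}", lang, title.replace(' ', "_"))
}
```

A caller would walk the candidate list and stop at the first language for which a summary is found, omitting the wiki lines entirely when none is.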
Downloading the full CoL
- Easiest and fastest: use the static downloads hosted by ChecklistBank.
  - Latest monthly release:
    - DWCA:
        curl -L -o col_latest_dwca.zip https://download.checklistbank.org/col/latest_dwca.zip
    - CoLDP:
        curl -L -o col_latest_coldp.zip https://download.checklistbank.org/col/latest_coldp.zip
    - Text tree:
        curl -L -o col_latest_txtree.zip https://download.checklistbank.org/col/latest_txtree.zip
  - Specific month (YYYY-MM-DD):
        curl -L -o col_2025-08-20_dwca.zip https://download.checklistbank.org/col/monthly/2025-08-20_dwca.zip
- From the CLI:
  - Latest DWCA:
        colonizer download dwca --latest
  - Match your selected dataset’s month (default):
        colonizer download coldp
  - Specific date:
        colonizer download txtree --date 2025-08-20
- When to use the API export:
  - If you need custom subsets (e.g., a clade root), control over including or excluding synonyms, or alternate formats beyond the static defaults, POST an export via the API (/dataset/{key}/export) and then GET /export/{id} to fetch the file. The CLI may add this as an advanced option.
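The static download URLs listed above follow a regular pattern, which can be captured in a small helper. This is a sketch under assumptions: the `Format` enum and `download_url` function are hypothetical, and the monthly path is only shown in this document for the DWCA format, so applying the same pattern to CoLDP and text tree archives is an extrapolation.

```rust
/// Archive formats offered on the static download server.
/// Hypothetical type; not part of colonizer's public API.
#[derive(Clone, Copy)]
enum Format {
    Dwca,
    Coldp,
    Txtree,
}

impl Format {
    /// File-name suffix used in the download URLs.
    fn suffix(self) -> &'static str {
        match self {
            Format::Dwca => "dwca",
            Format::Coldp => "coldp",
            Format::Txtree => "txtree",
        }
    }
}

const BASE: &str = "https://download.checklistbank.org/col";

/// Build the URL for the latest release (date = None) or for a specific
/// release date given as YYYY-MM-DD.
/// Assumption: every format follows the monthly pattern shown for DWCA.
fn download_url(format: Format, date: Option<&str>) -> String {
    match date {
        Some(d) => format!("{}/monthly/{}_{}.zip", BASE, d, format.suffix()),
        None => format!("{}/latest_{}.zip", BASE, format.suffix()),
    }
}
```

The resulting string can be handed to curl or to any HTTP client; the helper does no validation of the date, so a malformed value simply produces a URL that will not resolve.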
Library Usage
use ColClient;
Notes
- Listing by rank across all taxa can be extremely large (e.g., species). Use --max to cap results.
- The crate targets https://api.checklistbank.org, which powers the Catalogue of Life.
- When using name-based targets for commands like children, classify, and vernacular, add --rank to disambiguate homonyms (e.g., GENUS, SPECIES). Use --by-id to pass a usage ID directly.
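To illustrate how automatic pagination and a --max style cap can interact, here is a minimal sketch of an offset/limit loop. It is not colonizer's implementation: `Page`, `collect_paged`, and `mock_fetch` are hypothetical names, the page shape (items plus a total count) is an assumption, and the fetch callback stands in for a real HTTP request.

```rust
/// One page of results, loosely modelled on a paged API response.
/// Hypothetical shape for illustration.
struct Page<T> {
    items: Vec<T>,
    total: usize,
}

/// Collect items page by page until `max` items are gathered or the source
/// is exhausted. `fetch(offset, limit)` stands in for an HTTP request.
fn collect_paged<T, F>(mut fetch: F, page_size: usize, max: usize) -> Vec<T>
where
    F: FnMut(usize, usize) -> Page<T>,
{
    let mut out = Vec::new();
    let mut offset = 0;
    while out.len() < max {
        // Never ask for more than is still needed to reach the cap.
        let limit = page_size.min(max - out.len());
        let page = fetch(offset, limit);
        if page.items.is_empty() {
            break;
        }
        offset += page.items.len();
        let total = page.total;
        out.extend(page.items);
        if offset >= total {
            break;
        }
    }
    // Guard against a source that returns more items than requested.
    out.truncate(max);
    out
}

/// Mock data source with 250 numbered items, used in place of a network call.
fn mock_fetch(offset: usize, limit: usize) -> Page<usize> {
    const TOTAL: usize = 250;
    let end = (offset + limit).min(TOTAL);
    Page {
        items: (offset.min(end)..end).collect(),
        total: TOTAL,
    }
}
```

With the mock source, `collect_paged(mock_fetch, 100, 120)` stops after 120 items in two requests, while a cap larger than the data set returns all 250 items. Shrinking the last request to the remaining count keeps a low cap from fetching a full page it will not use.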