colonizer 0.1.0

Catalogue of Life (ChecklistBank) client + CLI: search usages, browse tree, vernacular names, and an Inspire mode for crate-name ideas (with Wikipedia summaries).
Colonizer
=========

Rust crate and CLI to work with the Catalogue of Life (CoL) via the ChecklistBank API.

Features
- Fetch the latest official CoL release dataset key automatically.
- Resolve CoL usage ID for an exact scientific name.
- List usages at a given rank with automatic pagination.
- List roots and children in the taxonomic tree.
- Show a taxon’s classification chain.
- Suggest usages for partial queries.
- Retrieve vernacular (common) names.
- Download full CoL packages from the static download server.
- Inspire mode: pick a random vernacular name for a given language (optionally one-word only), hyphenize it for crate names, and include a short Wikipedia summary plus Wikipedia/Wikidata links when available.

Install
- From crates.io (CLI binary): `cargo install colonizer`
- As a library dependency in your own project: `cargo add colonizer`
- Build from source: `cargo build --release`

CLI Usage
- `colonizer latest` — print latest CoL dataset key.
- `colonizer id "Homo sapiens"` — print CoL ID for the name.
- `colonizer list-rank GENUS --max 100` — list genera (ID, label, rank).
- `colonizer roots` — list tree roots (e.g., domains).
- `colonizer children Homo --rank GENUS` — list children of a taxon by name (or `--by-id 636X2`).
- `colonizer classify "Homo sapiens" --rank SPECIES` — show classification chain.
- `colonizer suggest "homo sa" --limit 10` — quick suggestions.
- `colonizer vernacular "Homo sapiens" --rank SPECIES` — common names.
- `colonizer inspire --lang fra [--one-word]` — print a random hyphenized vernacular, taxonID, scientificName, and a CoL link. Adds a short Wikipedia summary and links when available.
- `--dataset <key>` — operate on a specific dataset instead of the latest CoL release.
- `--json` — return JSON instead of TSV.
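
The default text output is TSV, and `list-rank` is described above as printing ID, label, and rank. As a rough illustration of consuming that output from another Rust program, here is a minimal sketch that parses one line. It assumes exactly three tab-separated columns in that order and no header row; `UsageRow` and `parse_usage_row` are hypothetical names, not part of colonizer.

```rust
/// One row of `colonizer list-rank` text output.
/// Hypothetical type for illustration; not part of the colonizer crate.
#[derive(Debug, PartialEq)]
struct UsageRow {
    id: String,
    label: String,
    rank: String,
}

/// Parse a single TSV line, assuming three tab-separated columns
/// in the order ID, label, rank. Returns None for any other shape.
fn parse_usage_row(line: &str) -> Option<UsageRow> {
    // Strip a trailing newline (and carriage return) if present.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut cols = line.split('\t');
    let id = cols.next()?;
    let label = cols.next()?;
    let rank = cols.next()?;
    // Reject rows with extra columns or an empty ID.
    if cols.next().is_some() || id.is_empty() {
        return None;
    }
    Some(UsageRow {
        id: id.to_string(),
        label: label.to_string(),
        rank: rank.to_string(),
    })
}
```

For anything beyond quick scripting, `--json` is probably the more robust choice, since it does not depend on column order.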

Inspire (random vernacular names)
- Text output example:
  - `colonizer inspire --lang fra`
    - `scalaire	6FXL3	Epitonium clathrus	https://www.catalogueoflife.org/data/taxon/6FXL3`
    - `[fr wiki] Epitonium clathrus, le scalaire, est une espèce ...`
    - `https://fr.wikipedia.org/wiki/Epitonium_clathrus`
    - `https://www.wikidata.org/wiki/Q1995213`
- JSON output example:
  - `colonizer --json inspire --lang eng --one-word`
    - `{ "lang": "eng", "vernacularName": "bogue", "vernacularHyphenized": "bogue", "taxonID": "MHY3", "scientificName": "Boops boops", "link": "https://www.catalogueoflife.org/data/taxon/MHY3", "oneWord": true, "wikipediaLang": "en", "wikipediaSummary": "Boops boops, commonly called the boce, ...", "wikipediaUrl": "https://en.wikipedia.org/wiki/Boops_boops", "wikidataUrl": "https://www.wikidata.org/wiki/Q950498" }`
- Options:
  - `--lang <iso-639-2>`: vernacular language (e.g., `fra`, `eng`, `spa`). Defaults to `fra`.
  - `--one-word`: only return vernaculars that are a single token (no spaces). Useful for short crate names; retries internally if needed.
- Notes:
  - The random pick is made through the API; no TSV file is downloaded locally.
  - Some datasets include vernaculars whose `taxonID` is not a valid ChecklistBank usage ID. The command validates IDs and retries when needed.
  - Wikipedia summary is best-effort: it tries your requested language (mapped from the 3-letter code) and falls back to English. If no page/summary is found, the wiki lines are omitted.
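
The Inspire behaviour described above (single-token filter, hyphenized name, Wikipedia language fallback) can be sketched roughly as follows. This is not colonizer's actual implementation: the normalization rules are an assumption (lowercase, keep alphanumeric characters, collapse everything else into a single hyphen), the 3-letter to 2-letter table only covers the codes used as examples in this README, and all function names are hypothetical.

```rust
/// Turn a vernacular name into a lowercase, hyphen-separated token.
/// Assumption: runs of non-alphanumeric characters become one hyphen;
/// the real colonizer rules may differ.
fn hyphenize(name: &str) -> String {
    let mut out = String::new();
    let mut pending_hyphen = false;
    for ch in name.chars() {
        if ch.is_alphanumeric() {
            // Emit a single hyphen between words, never at the start.
            if pending_hyphen && !out.is_empty() {
                out.push('-');
            }
            pending_hyphen = false;
            out.extend(ch.to_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    out
}

/// A name counts as "one word" when it contains no whitespace,
/// mirroring the `--one-word` description (single token, no spaces).
fn is_one_word(name: &str) -> bool {
    let trimmed = name.trim();
    !trimmed.is_empty() && !trimmed.chars().any(char::is_whitespace)
}

/// Map a 3-letter language code to a Wikipedia language subdomain.
/// Illustrative table only; unknown codes yield None.
fn wikipedia_lang(code3: &str) -> Option<&'static str> {
    match code3 {
        "fra" => Some("fr"),
        "eng" => Some("en"),
        "spa" => Some("es"),
        _ => None,
    }
}

/// Languages to try, in order: the requested one, then English.
fn wikipedia_lang_order(code3: &str) -> Vec<&'static str> {
    let mut order = Vec::new();
    if let Some(lang) = wikipedia_lang(code3) {
        order.push(lang);
    }
    if !order.contains(&"en") {
        order.push("en");
    }
    order
}
```

With these helpers, `hyphenize("Grand Requin Blanc")` gives `grand-requin-blanc`, and `wikipedia_lang_order("fra")` gives `["fr", "en"]`. Note that this sketch keeps accented letters, while crates.io accepts only ASCII names, so a real crate-name generator would also need a transliteration step.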

Downloading the full CoL
- Easiest and fastest: use the static downloads hosted by ChecklistBank.
  - Latest monthly release:
    - DWCA: `curl -L -o col_latest_dwca.zip https://download.checklistbank.org/col/latest_dwca.zip`
    - CoLDP: `curl -L -o col_latest_coldp.zip https://download.checklistbank.org/col/latest_coldp.zip`
    - Text tree: `curl -L -o col_latest_txtree.zip https://download.checklistbank.org/col/latest_txtree.zip`
  - Specific monthly release, identified by its date (YYYY-MM-DD):
    - `curl -L -o col_2025-08-20_dwca.zip https://download.checklistbank.org/col/monthly/2025-08-20_dwca.zip`

- From the CLI:
  - Latest DWCA: `colonizer download dwca --latest`
  - Match your selected dataset’s month (default): `colonizer download coldp`
  - Specific date: `colonizer download txtree --date 2025-08-20`

- When to use the API export:
  - If you need a custom subset (e.g., a single clade root), control over whether synonyms are included, or a format beyond the static defaults, POST an export request to the API (`/dataset/{key}/export`), then GET `/export/{id}` to fetch the resulting file. The CLI may add this as an advanced option.
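
The static download URLs listed earlier in this section follow a regular pattern, which the sketch below reproduces. The pattern is inferred only from the example URLs in this README; the `Format` enum and `static_download_url` function are hypothetical and are not colonizer's API. The function performs no network access and does not validate the date.

```rust
/// Archive formats offered on the static download server.
/// Hypothetical enum for illustration; not part of the colonizer crate.
#[derive(Clone, Copy)]
enum Format {
    Dwca,
    Coldp,
    TextTree,
}

impl Format {
    /// Suffix used in the file names shown in the curl examples.
    fn suffix(self) -> &'static str {
        match self {
            Format::Dwca => "dwca",
            Format::Coldp => "coldp",
            Format::TextTree => "txtree",
        }
    }
}

/// Build a static download URL.
/// `date` is a release date in YYYY-MM-DD form; None selects the latest release.
/// Assumption: the URL layout is inferred from the documented examples only.
fn static_download_url(format: Format, date: Option<&str>) -> String {
    const BASE: &str = "https://download.checklistbank.org/col";
    match date {
        Some(d) => format!("{}/monthly/{}_{}.zip", BASE, d, format.suffix()),
        None => format!("{}/latest_{}.zip", BASE, format.suffix()),
    }
}
```

For example, `static_download_url(Format::TextTree, Some("2025-08-20"))` yields the same monthly URL shape as the curl example, with the `txtree` suffix.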

Library Usage
```rust
use colonizer::ColClient;

fn main() -> Result<(), Box<dyn std::error::Error>> {
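    // Create a client that targets the latest official CoL release.
    let col = ColClient::from_latest()?;

    // Resolve the usage ID for an exact scientific name.
    let id = col.id_for_name("Homo sapiens", None)?;
    println!("id: {:?}", id);

    // List usages at rank GENUS, passing 100 as the cap.
    let genera = col.list_by_rank("GENUS", Some(100))?;
    println!("{} genera fetched", genera.len());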
    Ok(())
}
```

Notes
- Listing by rank across all taxa can return an extremely large result set (e.g., for species). Use `--max` to cap the number of results.
- The crate targets `https://api.checklistbank.org`, which powers the Catalogue of Life.
- When using name-based targets for commands like `children`, `classify`, and `vernacular`, add `--rank` to disambiguate homonyms (e.g., `GENUS`, `SPECIES`). Use `--by-id` to pass a usage ID directly.
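
To picture how a `--max` cap interacts with automatic pagination, here is a generic offset/limit loop. It is only a sketch: `fetch_page` is a hypothetical callback standing in for an HTTP request, and the page size and offset/limit parameters are assumptions rather than the actual ChecklistBank contract or colonizer internals.

```rust
/// Collect items page by page until `max` items are gathered or a
/// short (or empty) page signals the end of the results.
/// `fetch_page(offset, limit)` is a hypothetical stand-in for an API call.
fn collect_paged<T, E>(
    page_size: usize,
    max: Option<usize>,
    mut fetch_page: impl FnMut(usize, usize) -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let mut items = Vec::new();
    if page_size == 0 {
        return Ok(items);
    }
    loop {
        // Never request more than the cap still allows.
        let remaining = max.map_or(page_size, |m| m.saturating_sub(items.len()));
        let limit = remaining.min(page_size);
        if limit == 0 {
            break;
        }
        // The number of items collected so far doubles as the next offset.
        let page = fetch_page(items.len(), limit)?;
        let received = page.len();
        items.extend(page);
        // A page shorter than requested means there is nothing left.
        if received < limit {
            break;
        }
    }
    Ok(items)
}
```

With a cap in place the loop stops as soon as the cap is reached, so a large rank such as species never has to be fetched in full.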