panlabel 0.7.0 - Docs.rs

# CLI reference

Everything you can do with the `panlabel` command line.

## Global

- Binary name: `panlabel`
- Version: `panlabel -V`
- Help: `panlabel --help` and `panlabel <command> --help`

## Machine-readable output

Panlabel now has one cross-command spelling for structured stdout: `--output-format`.

- Read-only commands use `--output-format` and also accept `--output` as a backward-compatible alias:
  - `validate`
  - `stats`
  - `diff`
  - `list-formats`
- `convert` and `sample` use `--output-format <text|json>` for report formatting because `-o/--output` is already the filesystem output path.
- `convert` and `sample` also accept `--report <text|json>` as a backward-compatible alias.
- In JSON mode, structured payloads go to stdout. Fatal errors still go to stderr.
- JSON is pretty-printed when stdout is an interactive terminal, and compact when stdout is piped or captured.
- `stats` text output is rich/Unicode on a terminal, but switches to a plain text layout (ASCII framing/bars, no box-drawing or emoji) when stdout is piped or captured.
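
As a rough sketch of how a script might rely on the stream split described above, the helper below runs any panlabel command in JSON mode and stores stdout and stderr in separate files. The function name `run_json` and the file arguments are hypothetical; only `--output-format json` comes from this reference, and treating a nonzero exit status as failure is an assumption.

```bash
# Hypothetical helper: run a panlabel command in JSON mode and keep
# stdout (structured payload) apart from stderr (fatal errors).
# Usage: run_json <report-file> <error-file> <panlabel args...>
run_json() {
  report_file=$1
  error_file=$2
  shift 2
  # --output-format json is the cross-command spelling described above.
  # The function returns panlabel's own exit status.
  panlabel "$@" --output-format json >"$report_file" 2>"$error_file"
}

# Example (not executed here; assumes a nonzero status signals failure):
#   run_json report.json errors.log validate ./dataset.ir.json \
#     || echo "validate failed, see errors.log" >&2
```

Because stdout is redirected to a file here, the JSON payload should arrive in the compact form rather than pretty-printed.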

## Commands

### `validate`

Validate a dataset path and print a validation report.

- Positional: `input` (path; file or directory depending on format)
- `--format <format>` (default: `ir-json`)
  - supported values: `ir-json`, `coco`, `coco-json`, `ibm-cloud-annotations`, `cloud-annotations`, `cloud-annotations-json`, `ibm-cloud-annotations-json`, `cvat`, `cvat-xml`, `label-studio`, `label-studio-json`, `ls`, `labelbox`, `labelbox-json`, `labelbox-ndjson`, `scale-ai`, `scale`, `scale-ai-json`, `unity-perception`, `unity`, `unity-perception-json`, `solo`, `tfod`, `tfod-csv`, `tfrecord`, `tfrecords`, `tf-record`, `tfod-tfrecord`, `tfod-tfrerecord`, `vott-csv`, `vott`, `vott-json`, `vott-json-export`, `yolo`, `ultralytics`, `yolov8`, `yolov5`, `scaled-yolov4`, `scaled-yolov4-txt`, `yolo-keras`, `yolo-keras-txt`, `keras-yolo`, `yolov4-pytorch`, `yolov4-pytorch-txt`, `pytorch-yolov4`, `voc`, `pascal-voc`, `voc-xml`, `hf`, `hf-imagefolder`, `huggingface`, `sagemaker`, `sagemaker-manifest`, `sagemaker-ground-truth`, `ground-truth`, `groundtruth`, `aws-sagemaker`, `labelme`, `labelme-json`, `superannotate`, `superannotate-json`, `sa`, `supervisely`, `supervisely-json`, `sly`, `cityscapes`, `cityscapes-json`, `marmot`, `marmot-xml`, `create-ml`, `createml`, `create-ml-json`, `kitti`, `kitti-txt`, `via`, `via-json`, `vgg-via`, `retinanet`, `retinanet-csv`, `keras-retinanet`, `openimages`, `openimages-csv`, `open-images`, `kaggle-wheat`, `kaggle-wheat-csv`, `automl-vision`, `automl-vision-csv`, `google-cloud-automl`, `udacity`, `udacity-csv`, `self-driving-car`, `datumaro`, `datumaro-json`, `datumaro-dataset`, `wider-face`, `widerface`, `wider-face-txt`, `oidv4`, `oidv4-txt`, `openimages-v4-txt`, `oid`, `bdd100k`, `bdd100k-json`, `scalabel`, `scalabel-json`, `v7-darwin`, `darwin`, `darwin-json`, `v7`, `edge-impulse`, `edge-impulse-labels`, `edge-impulse-bounding-boxes`, `bounding-boxes-labels`, `openlabel`, `asam-openlabel`, `openlabel-json`, `asam-openlabel-json`, `via-csv`, `vgg-via-csv`
- `--strict` (treat warnings as errors)
- `--output-format <text|json>` (default: `text`)
- `--output <text|json>` (backward-compatible alias)

Invalid `--format` and output mode values are rejected by clap at parse time.

---

### `convert`

Convert annotations between formats using IR as the internal hub.

- `--from`, `-f`: `auto`, `ir-json`, `coco`, `coco-json`, `ibm-cloud-annotations`, `cloud-annotations`, `cloud-annotations-json`, `ibm-cloud-annotations-json`, `cvat`, `cvat-xml`, `label-studio`, `label-studio-json`, `ls`, `labelbox`, `labelbox-json`, `labelbox-ndjson`, `scale-ai`, `scale`, `scale-ai-json`, `unity-perception`, `unity`, `unity-perception-json`, `solo`, `tfod`, `tfod-csv`, `tfrecord`, `tfrecords`, `tf-record`, `tfod-tfrecord`, `tfod-tfrerecord`, `vott-csv`, `vott`, `vott-json`, `vott-json-export`, `yolo`, `ultralytics`, `yolov8`, `yolov5`, `scaled-yolov4`, `scaled-yolov4-txt`, `yolo-keras`, `yolo-keras-txt`, `keras-yolo`, `yolov4-pytorch`, `yolov4-pytorch-txt`, `pytorch-yolov4`, `voc`, `pascal-voc`, `voc-xml`, `hf`, `hf-imagefolder`, `huggingface`, `sagemaker`, `sagemaker-manifest`, `sagemaker-ground-truth`, `ground-truth`, `groundtruth`, `aws-sagemaker`, `labelme`, `labelme-json`, `superannotate`, `superannotate-json`, `sa`, `supervisely`, `supervisely-json`, `sly`, `cityscapes`, `cityscapes-json`, `marmot`, `marmot-xml`, `create-ml`, `createml`, `create-ml-json`, `kitti`, `kitti-txt`, `via`, `via-json`, `vgg-via`, `retinanet`, `retinanet-csv`, `keras-retinanet`, `openimages`, `openimages-csv`, `open-images`, `kaggle-wheat`, `kaggle-wheat-csv`, `automl-vision`, `automl-vision-csv`, `google-cloud-automl`, `udacity`, `udacity-csv`, `self-driving-car`, `datumaro`, `datumaro-json`, `datumaro-dataset`, `wider-face`, `widerface`, `wider-face-txt`, `oidv4`, `oidv4-txt`, `openimages-v4-txt`, `oid`, `bdd100k`, `bdd100k-json`, `scalabel`, `scalabel-json`, `v7-darwin`, `darwin`, `darwin-json`, `v7`, `edge-impulse`, `edge-impulse-labels`, `edge-impulse-bounding-boxes`, `bounding-boxes-labels`, `openlabel`, `asam-openlabel`, `openlabel-json`, `asam-openlabel-json`, `via-csv`, `vgg-via-csv`
- `--to`, `-t`: `ir-json`, `coco`, `coco-json`, `ibm-cloud-annotations`, `cloud-annotations`, `cloud-annotations-json`, `ibm-cloud-annotations-json`, `cvat`, `cvat-xml`, `label-studio`, `label-studio-json`, `ls`, `labelbox`, `labelbox-json`, `labelbox-ndjson`, `scale-ai`, `scale`, `scale-ai-json`, `unity-perception`, `unity`, `unity-perception-json`, `solo`, `tfod`, `tfod-csv`, `tfrecord`, `tfrecords`, `tf-record`, `tfod-tfrecord`, `tfod-tfrerecord`, `vott-csv`, `vott`, `vott-json`, `vott-json-export`, `yolo`, `ultralytics`, `yolov8`, `yolov5`, `scaled-yolov4`, `scaled-yolov4-txt`, `yolo-keras`, `yolo-keras-txt`, `keras-yolo`, `yolov4-pytorch`, `yolov4-pytorch-txt`, `pytorch-yolov4`, `voc`, `pascal-voc`, `voc-xml`, `hf`, `hf-imagefolder`, `huggingface`, `sagemaker`, `sagemaker-manifest`, `sagemaker-ground-truth`, `ground-truth`, `groundtruth`, `aws-sagemaker`, `labelme`, `labelme-json`, `superannotate`, `superannotate-json`, `sa`, `supervisely`, `supervisely-json`, `sly`, `cityscapes`, `cityscapes-json`, `marmot`, `marmot-xml`, `create-ml`, `createml`, `create-ml-json`, `kitti`, `kitti-txt`, `via`, `via-json`, `vgg-via`, `retinanet`, `retinanet-csv`, `keras-retinanet`, `openimages`, `openimages-csv`, `open-images`, `kaggle-wheat`, `kaggle-wheat-csv`, `automl-vision`, `automl-vision-csv`, `google-cloud-automl`, `udacity`, `udacity-csv`, `self-driving-car`, `datumaro`, `datumaro-json`, `datumaro-dataset`, `wider-face`, `widerface`, `wider-face-txt`, `oidv4`, `oidv4-txt`, `openimages-v4-txt`, `oid`, `bdd100k`, `bdd100k-json`, `scalabel`, `scalabel-json`, `v7-darwin`, `darwin`, `darwin-json`, `v7`, `edge-impulse`, `edge-impulse-labels`, `edge-impulse-bounding-boxes`, `bounding-boxes-labels`, `openlabel`, `asam-openlabel`, `openlabel-json`, `asam-openlabel-json`, `via-csv`, `vgg-via-csv`
- `--input`, `-i`: input path (required for local inputs; optional with `--hf-repo` when `--from hf`)
- `--output`, `-o`: output path
- `--strict`
- `--no-validate`
- `--allow-lossy`
- `--dry-run` (run detection/validation/reporting without writing output files)
- `--output-format <text|json>` (default: `text`)
- `--report <text|json>` (backward-compatible alias for `--output-format`)

Shared options:
- `--split <name>` — select a single split for HF or YOLO imports (see below)

HF-specific options (meaningful only with `--from hf` or `--to hf`):
- `--hf-bbox-format <xywh|xyxy>` (default: `xywh`)
- `--hf-objects-column <name>`
- `--hf-category-map <path>`
- `--hf-repo <namespace/dataset-or-url>` (remote import, `convert` only)
- `--revision <ref>`
- `--config <name>`
- `--token <token>` (also reads `HF_TOKEN`)

With `--output-format json`, the conversion report is printed as JSON to stdout.
On blocked lossy conversions, stdout still contains the full JSON report
while the blocking error goes to stderr (exit code 1).
With `--dry-run`, panlabel still runs format detection, input validation, and lossiness analysis, but skips the final write step.

Notes:
- `--split` can be used with `--from hf` or `--from yolo`. For YOLO, it selects a single split from a split-aware dataset layout (e.g. `--split train`). Without `--split`, all splits are merged. YOLO split paths in `data.yaml` may be image directories or image-list `.txt` files.
- `--hf-repo` can only be used with `--from hf`.
- `--revision`/`--config` require `--hf-repo`.
- Remote HF import (`--hf-repo`) needs a build with feature `hf-remote` (for full HF support from source: `cargo install panlabel --features hf`).
- Remote HF parquet datasets commonly use split shard files (for example `data/train-*.parquet`); these are supported with `hf-parquet`.
- Remote HF zip-style splits (for example `data/train.zip`) are supported when the extracted payload looks like YOLO, VOC, COCO JSON, or HF metadata layout.
- `--output` is still required even with `--dry-run`, so the report can say what would be written.
- `--dry-run` does **not** prove the output path is writable; it skips filesystem writes entirely.
- In `--output-format json` mode, dry runs emit the same conversion-report schema as normal runs (no extra wrapper field).
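
One way to combine `--dry-run` with a real run is a preview-then-write wrapper, sketched below. The function name `convert_checked` and the `<output>.report.json` file name are hypothetical. The sketch also assumes that a dry run exits nonzero when the conversion would be blocked, which this reference states only for normal runs.

```bash
# Hypothetical helper: preview a conversion, then write only if the preview passed.
# Usage: convert_checked <from> <to> <input> <output>
convert_checked() {
  from=$1 to=$2 input=$3 output=$4
  # Step 1: dry run. Detection, validation, and lossiness analysis still run,
  # but no output file is written. The JSON report is kept next to the output path.
  if ! panlabel convert --from "$from" --to "$to" -i "$input" -o "$output" \
      --dry-run --output-format json >"$output.report.json"; then
    echo "preview failed; report kept in $output.report.json" >&2
    return 1
  fi
  # Step 2: the real conversion, with the default text report.
  panlabel convert --from "$from" --to "$to" -i "$input" -o "$output"
}

# Example (not executed here):
#   convert_checked auto coco in.json out.coco.json
```

Since a dry run does not check that the output path is writable, the second step can still fail on filesystem errors.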

---

### `stats`

Show rich dataset statistics.

- Positional: `input`
- `--format <format>` (optional; if omitted, panlabel auto-detects the format)
  - when detection fails for a **parseable JSON file**, stats falls back to `ir-json`
  - malformed JSON surfaces the parse error directly (no silent fallback)
- `--top <N>` (default: `10`) for label and co-occurrence top lists
- `--tolerance <PX>` (default: `0.5`) pixel tolerance for out-of-bounds (OOB) checks
- `--output-format <text|json|html>` (default: `text`)
- `--output <text|json|html>` (backward-compatible alias)

`--output-format html` (or the `--output html` alias) writes a self-contained HTML report to stdout.
Text output uses the rich terminal renderer on a TTY and a plain text renderer when stdout is piped or captured.
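
Since the HTML report goes to stdout, saving it is a matter of redirection. The sketch below writes to a temporary file first so that a failed run does not leave a truncated report behind. The function name `stats_html` and the `.tmp` suffix are hypothetical.

```bash
# Hypothetical helper: save the self-contained HTML stats report to a file.
# Usage: stats_html <input> <output.html>
stats_html() {
  input=$1
  out_html=$2
  tmp_html="$out_html.tmp"
  # Write to a temp file first; move it into place only on success.
  if panlabel stats --output-format html "$input" >"$tmp_html"; then
    mv "$tmp_html" "$out_html"
  else
    rm -f "$tmp_html"
    return 1
  fi
}

# Example (not executed here):
#   stats_html ./dataset.coco.json stats.html
```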

---

### `diff`

Compare two datasets semantically.

Usage:
`panlabel diff [OPTIONS] <INPUT_A> <INPUT_B>`

- `--format-a <FORMAT>` (default: `auto`)
- `--format-b <FORMAT>` (default: `auto`)
- `--match-by <id|iou>` (default: `id`)
- `--iou-threshold <FLOAT>` (default: `0.5`, used by `--match-by iou`; must be in `(0.0, 1.0]`)
- `--detail` for item-level details
- `--output-format <text|json>` (default: `text`)
- `--output <text|json>` (backward-compatible alias)

Constraints:
- Each input dataset must have unique `image.file_name` values for reliable diffing.
- `--iou-threshold` is validated only when `--match-by iou` is used.
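
A script that accepts a threshold from its own callers may want to check the `(0.0, 1.0]` range before invoking `diff`. The sketch below does that with `grep` and `awk`. Both function names are hypothetical, and the accepted number syntax (plain decimals only) is a choice made for this example, not a statement about what panlabel itself accepts.

```bash
# Hypothetical pre-check: succeed only if $1 is a plain decimal in (0.0, 1.0].
valid_iou_threshold() {
  # The regex restricts input to plain decimals such as 0.5, .5, or 1.
  printf '%s\n' "$1" | grep -Eq '^[0-9]*\.?[0-9]+$' || return 1
  # awk performs the numeric range comparison.
  awk -v t="$1" 'BEGIN { exit !((t + 0) > 0 && (t + 0) <= 1) }'
}

# Hypothetical wrapper: IoU-based diff with item-level details.
# Usage: diff_by_iou <threshold> <input_a> <input_b>
diff_by_iou() {
  threshold=$1 a=$2 b=$3
  valid_iou_threshold "$threshold" || {
    echo "IoU threshold must be in (0.0, 1.0], got: $threshold" >&2
    return 2
  }
  panlabel diff --match-by iou --iou-threshold "$threshold" --detail "$a" "$b"
}

# Example (not executed here):
#   diff_by_iou 0.7 a.ir.json b.ir.json
```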

---

### `sample`

Create a subset dataset.

Usage:
`panlabel sample [OPTIONS] -i <INPUT> -o <OUTPUT>`

- `--from <FORMAT>` (default: `auto`)
- `--to <FORMAT>` (optional)
  - if omitted and `--from` is explicit, the output uses the same format as the input
  - if omitted with `--from auto`, the output defaults to `ir-json`
- `-n <COUNT>` or `--fraction <FLOAT>` (exactly one required)
- `--seed <INT>` for deterministic sampling
- `--strategy <random|stratified>` (default: `random`)
- `--categories <comma,separated,list>`
- `--category-mode <images|annotations>` (default: `images`)
- `--allow-lossy`
- `--dry-run` (sample in memory and report what would be written, without writing output files)
- `--output-format <text|json>` (default: `text`)
- `--report <text|json>` (backward-compatible alias for `--output-format`)

Sampling preserves the original IDs and keeps all categories in the output.

In text mode, sample prints a short summary line followed by the conversion report.
In JSON mode, sample prints only the conversion report JSON to stdout.
Blocked lossy sampling mirrors `convert`: stdout gets the full report, stderr gets the concise blocking error.

Notes:
- `--output` is still required even with `--dry-run`.
- `--dry-run` skips filesystem writes entirely, so it does not check whether the output path is writable.
- Use `--seed` if you want repeated dry runs to choose the same sampled subset.
- In `--output-format json` mode, dry runs emit the same conversion-report schema as normal runs.
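
Because exactly one of `-n` and `--fraction` must be given, a wrapper can make that choice explicit in a single argument. The sketch below is one hypothetical way to do it; the function name `sample_subset` and the `n=...` / `fraction=...` argument convention are inventions of this example.

```bash
# Hypothetical wrapper: seeded sampling by count ("n=100") or fraction ("fraction=0.1").
# Usage: sample_subset <n=COUNT|fraction=FLOAT> <seed> <input> <output>
sample_subset() {
  size=$1 seed=$2 input=$3 output=$4
  # Map the size argument to exactly one of the two mutually exclusive flags.
  case $size in
    n=*)        size_flag="-n";         size_value=${size#n=} ;;
    fraction=*) size_flag="--fraction"; size_value=${size#fraction=} ;;
    *) echo "size must be n=<COUNT> or fraction=<FLOAT>" >&2; return 2 ;;
  esac
  # A fixed seed makes repeated runs pick the same subset.
  panlabel sample -i "$input" -o "$output" "$size_flag" "$size_value" --seed "$seed"
}

# Example (not executed here):
#   sample_subset fraction=0.1 42 in.coco.json out.ir.json
```

The wrapper leaves `--from` and `--to` at their defaults, so the input format is auto-detected and the output is written as `ir-json`.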

---

### `list-formats`

Show format capabilities and lossiness class.

- `--output-format <text|json>` (default: `text`)
- `--output <text|json>` (backward-compatible alias)

`list-formats --output-format json` emits a JSON array. Each entry has:

- `name`
- `aliases`
- `read`
- `write`
- `lossiness` (`lossless`, `conditional`, or `lossy`)
- `description`
- `file_based`
- `directory_based`

## Auto-detection rules (`convert --from auto`, `diff --format-* auto`, `sample --from auto`, `stats` without `--format`)

1. If input path is a directory:
   - YOLO marker: `labels/` with `.txt` labels AND sibling `images/` directory (or path itself is `labels/` with sibling `images/`), or `data.yaml` with `train`/`val`/`test` split keys. Split keys may point to image directories or image-list `.txt` files. If `labels/` with `.txt` files exists but `images/` is missing, this is reported as an incomplete layout.
   - OIDv4 marker: recursive directories named exactly `Label/` containing `.txt` label files (distinct from YOLO lowercase `labels/`)
   - Edge Impulse marker: root `bounding_boxes.labels` file
   - YOLO Keras / YOLOv4 PyTorch TXT marker: a matching absolute-coordinate annotation file such as `yolo_keras.txt`, `yolov4_pytorch.txt`, `annotations.txt`, `train_annotations.txt`, or `train.txt`. Shared/generic filenames such as `train.txt` and `train_annotations.txt` can be ambiguous because both formats use the same row grammar.
   - VOC marker: `Annotations/` with top-level `.xml` files (or path itself is `Annotations/`). `JPEGImages/` is optional, matching the reader's behavior.
   - CVAT marker: `annotations.xml` at directory root
   - VoTT JSON marker: `vott-json-export/panlabel-export.json` or root `panlabel-export.json` with VoTT `assets`
   - Scale AI marker: `annotations/` with Scale AI task/response JSON files, or root-level matching Scale AI JSON files
   - Unity Perception marker: SOLO frame/captures `.json` files with a `captures` array containing filename + annotations
   - LabelMe marker: `annotations/` with LabelMe `.json` files (containing `shapes` key), or co-located LabelMe `.json` files
   - SuperAnnotate marker: root `annotations/` directory with SuperAnnotate JSON files (`metadata` object + `instances` array), or matching JSON files at root
   - Cityscapes marker: `gtFine/<split>/<city>/*_gtFine_polygons.json` files, a `gtFine/` root, or matching Cityscapes polygon JSON files
   - Marmot marker: `.xml` files whose root is `<Page CropBox="...">` plus same-stem companion images for dimensions; XML without images is reported as an incomplete layout
   - Supervisely marker: root `ann/` directory with Supervisely JSON files (`size` object + `objects` array), or project root with `meta.json` and one or more dataset `ann/` directories
   - KITTI marker: `label_2/` with top-level `.txt` files AND sibling `image_2/` directory (or path itself is `label_2/` with sibling `image_2/`). If `label_2/` with `.txt` files exists but `image_2/` is missing, this is reported as an incomplete layout.
   - HF marker: `metadata.jsonl` or `metadata.parquet` at root or in an immediate subdirectory, or parquet shard files (e.g. `data/train-*.parquet`)
   - if multiple markers match, detection fails with an ambiguity error listing the evidence for each format
   - if only partial matches exist (e.g. YOLO labels without images), the error explains what's missing
2. If input path is a file:
   - `.manifest` / `.jsonl` / `.ndjson`: first non-empty JSON object row with Labelbox `data_row` + `media_attributes` + `projects` → `labelbox`; otherwise `source-ref` + one object-detection label block (`groundtruth/object-detection` metadata, or `annotations` + `image_size`) → `sagemaker`
   - `.csv`: content-based detection: 8 columns → `tfod`, 6 columns → `retinanet`, 7-column VIA CSV header → `via-csv`, or detected by header match
   - `.tfrecord`: TFRecord framing + TFOD-style `tf.train.Example` payload probe → `tfrecord`
   - `.txt`: WIDER Face aggregate TXT is detected by grammar; conservative OIDv4 single-file detection only applies with OID filename hints; YOLO Keras-style absolute-coordinate rows are detected from specific filenames (`yolo_keras.txt` / `yolov4_pytorch.txt`); shared/generic names such as `train.txt` and `train_annotations.txt` are ambiguous between `yolo-keras` and `yolov4-pytorch` and require explicit `--from`
   - `.xml`:
     - root `<annotations>` -> `cvat`
     - root `<Page>` with a valid `CropBox` -> `marmot`
   - `.json`:
     - Edge Impulse labels schema (`type: "bounding-box-labels"` / `boundingBoxes`) -> `edge-impulse`
     - OpenLABEL schema (`openlabel.frames`) -> `openlabel`
     - Datumaro schema (`items` + `categories.label.labels`) -> `datumaro`
     - BDD100K/Scalabel schema (`frames` or frame array with `labels[].box2d`) -> `bdd100k`
     - V7 Darwin schema (`annotations[].bounding_box`) -> `v7-darwin`
     - empty array-root: ambiguous between Label Studio and CreateML (requires explicit `--from`)
     - non-empty array-root: Labelbox export-row shape (`data_row` + `media_attributes` + `projects`) -> `labelbox`; Scale AI task/response shape (`response.annotations`, root `annotations`, or `params.attachment`) -> `scale-ai`; Unity Perception frame/captures shape (`captures` with capture `filename` + `annotations`) -> `unity-perception`; Label Studio task shape -> `label-studio`; CreateML item shape -> `create-ml`
     - object-root with Labelbox export-row shape (`data_row` + `media_attributes` + `projects`) -> `labelbox`
     - object-root with Scale AI task/response shape (`response.annotations`, root `annotations`, or `params.attachment`) -> `scale-ai`
     - object-root with Unity Perception/SOLO `captures` array -> `unity-perception`
     - object-root with `shapes` array -> `labelme`
     - object-root with VoTT `asset` + `regions`, or aggregate `assets` entries -> `vott-json`
     - object-root with `metadata` object + `instances` array -> `superannotate`
     - object-root with `imgWidth`, `imgHeight`, and `objects` array -> `cityscapes`
     - object-root with `size` object + `objects` array -> `supervisely`
     - object-root with entries containing `filename` + `regions` -> `via`
     - object-root with `annotations[0].bbox` array -> `coco`
     - object-root with bbox object (`min/max` or `xmin/ymin/xmax/ymax`) -> `ir-json`
3. `stats` fallback: when detection fails for a `.json` file, stats tries `ir-json` as a fallback — but only if the JSON is parseable. Malformed JSON is reported directly as a parse error.
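
To make the "complete" versus "incomplete" layout distinction concrete, the sketch below mirrors only the `labels/` + `images/` part of the YOLO directory rule. It is an illustration written for this page, not panlabel's implementation: the function name `yolo_layout` and its three output strings are hypothetical, it looks only at top-level `.txt` files, and it ignores `data.yaml` and every other format.

```bash
# Illustrative only: classify a directory against the labels/ + images/ YOLO rule.
# Prints "yolo", "incomplete", or "no-yolo-marker".
yolo_layout() {
  dir=$1
  # The path itself may be labels/ with a sibling images/ directory.
  if [ "$(basename "$dir")" = "labels" ]; then
    labels=$dir
    images=$(dirname "$dir")/images
  else
    labels=$dir/labels
    images=$dir/images
  fi
  # Require at least one .txt label file directly inside labels/.
  has_txt=no
  for f in "$labels"/*.txt; do
    if [ -f "$f" ]; then
      has_txt=yes
      break
    fi
  done
  if [ "$has_txt" = no ]; then
    echo "no-yolo-marker"
  elif [ -d "$images" ]; then
    echo "yolo"
  else
    # Labels without images: the case reported as an incomplete layout.
    echo "incomplete"
  fi
}

# Example (not executed here):
#   yolo_layout /data/my_yolo
```

A pre-check like this can give a quicker hint about a missing `images/` directory before running `convert --from auto`, though the error from panlabel itself remains the authoritative diagnosis.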

## Examples

```bash
# Validate a YOLO dataset root
panlabel validate /data/my_yolo --format yolo

# Validate with machine-readable output
panlabel validate tests/fixtures/sample_valid.ir.json --output-format json

# Auto-detect YOLO from directory, convert to COCO
panlabel convert --from auto --to coco -i /data/my_yolo -o out.json

# Machine-readable conversion report
panlabel convert --from auto --to coco -i in.json -o out.coco.json --output-format json

# Preview a conversion without writing output files
panlabel convert --from auto --to coco -i in.json -o out.coco.json --dry-run

# Dataset stats as JSON
panlabel stats --output-format json tests/fixtures/sample_valid.coco.json

# Dataset diff with details
panlabel diff --match-by id --detail a.ir.json b.ir.json

# Category-focused sampling with JSON report output
panlabel sample -i in.coco.json -o out.ir.json --from coco --to ir-json --categories person,car --category-mode images -n 100 --seed 42 --output-format json

# Preview a deterministic sample without writing output files
panlabel sample -i in.coco.json -o out.ir.json --from coco --to ir-json -n 100 --seed 42 --dry-run

# Machine-readable format discovery
panlabel list-formats --output-format json

# Convert a local HF ImageFolder directory to COCO
panlabel convert --from hf --to coco -i ./hf_dataset -o out.coco.json

# Convert a remote HF dataset repo to IR JSON (requires build with --features hf)
panlabel convert --from hf --to ir-json --hf-repo rishitdagli/cppe-5 --split train -o out.ir.json

# Zip-style remote dataset (auto-routed after extraction, still invoked as --from hf)
panlabel convert --from hf --to ir-json --hf-repo keremberke/football-object-detection --split train -o out.ir.json

# Convert a split-aware YOLO dataset (merges all splits by default)
panlabel convert --from yolo --to coco -i ./yolo_dataset -o out.coco.json --allow-lossy

# Convert only the train split from a YOLO dataset
panlabel convert --from yolo --to coco -i ./yolo_dataset -o out.coco.json --split train --allow-lossy

# Scaled-YOLOv4 names are aliases for the existing YOLO reader
panlabel convert --from scaled-yolov4-txt --to coco -i ./scaled_yolov4_dataset -o out.coco.json --allow-lossy

# Convert YOLO Keras absolute-coordinate TXT to COCO
panlabel convert --from yolo-keras --to coco -i ./train.txt -o out.coco.json --allow-lossy

# Convert COCO JSON to YOLOv4 PyTorch TXT output
panlabel convert --from coco --to yolov4-pytorch -i annotations.json -o ./yolov4_pytorch_out --allow-lossy

# Convert SageMaker manifest to COCO JSON
panlabel convert -f sagemaker -t coco -i annotations.manifest -o coco_output.json

# Convert a LabelMe directory to COCO JSON
panlabel convert -f labelme -t coco -i ./labelme_dataset -o coco_output.json

# Convert a CreateML JSON file to COCO JSON
panlabel convert -f create-ml -t coco -i annotations.json -o coco_output.json

# Convert SuperAnnotate annotations to IR JSON
panlabel convert -f superannotate -t ir-json -i ./superannotate_export -o out.ir.json

# Convert Supervisely project annotations to COCO JSON
panlabel convert -f supervisely -t coco -i ./supervisely_project -o out.coco.json --allow-lossy

# Convert Cityscapes polygons to bbox IR JSON
panlabel convert -f cityscapes -t ir-json -i ./cityscapes_root -o out.ir.json

# Convert Marmot XML page-layout composites to bbox IR JSON
panlabel convert -f marmot -t ir-json -i ./marmot_xml_dir -o out.ir.json
```