# Panlabel






## The universal annotation converter
If you've ever written a one-off Python script to wrangle COCO annotations into
YOLO format (or vice versa), panlabel is here to save you the trouble. It's a
fast, single-binary CLI that converts between common object detection annotation
formats, with built-in validation, clear lossiness warnings, and no Python
dependencies to manage.

Panlabel's current core scope is **mainstream/static-image 2D axis-aligned object-detection bbox conversion**.
It does **not** provide first-class segmentation, keypoints/pose, oriented boxes, video tracking IDs, or 3D/multisensor labels.
When broad schemas include richer structures, panlabel either skips/reports those structures or treats the conversion as lossy.

Panlabel is also available as a Rust library if you want to integrate format
conversion into your own tools.
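To make the scope concrete, here is a minimal Python sketch of the coordinate arithmetic behind a COCO-to-YOLO box conversion. It is an illustration only, not panlabel's implementation, and the function names are hypothetical. It assumes the usual conventions: COCO boxes are `[x_min, y_min, width, height]` in pixels, and YOLO boxes are `(x_center, y_center, width, height)` normalized by the image size.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Pixel [x_min, y_min, w, h] -> normalized (cx, cy, w, h)."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_to_coco(box, img_w, img_h):
    """Normalized (cx, cy, w, h) -> pixel [x_min, y_min, w, h]."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        w * img_w,
        h * img_h,
    ]


# A 200x100 box at (100, 50) in a 400x200 image.
print(coco_to_yolo([100, 50, 200, 100], 400, 200))  # (0.5, 0.5, 0.5, 0.5)
```

The arithmetic is the easy part. What a one-off script tends to miss is everything that does not survive the trip between formats, which is what panlabel reports as lossiness.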
> **Note**: Panlabel is in active development (v0.5.x). The CLI and library APIs
> may change between versions, so pin to a specific version if you're using it in
> production.
## Installation
### pip / uv (any platform)
```sh
pip install panlabel
# or
uv pip install panlabel
```
This installs a pre-built binary — no Rust toolchain needed.
### Homebrew (macOS / Linux)
```sh
brew install strickvl/tap/panlabel
```
### Shell script (macOS / Linux)
```sh
```
### PowerShell (Windows)
```powershell
```
### Cargo (from source)
```sh
cargo install panlabel
# Enable full HF support (remote Hub import + metadata.parquet)
cargo install panlabel --features hf
```
### Pre-built binaries
Download from the [latest GitHub Release](https://github.com/strickvl/panlabel/releases/latest). Builds are available for macOS (Intel + Apple Silicon), Linux (x86_64 + ARM64), and Windows.
### Docker
```sh
docker pull strickvl/panlabel
# Convert a COCO file in your current directory to YOLO
docker run --rm -v "$PWD":/data strickvl/panlabel convert -f coco -t yolo -i /data/annotations.json -o /data/yolo_out --allow-lossy
```
Multi-arch images (amd64 + arm64) are published for each release.
### As a Rust library
```sh
cargo add panlabel
```
## Quick start
```sh
# Convert between formats (auto-detects the input)
panlabel convert --from auto --to yolo -i annotations.json -o ./yolo_out --allow-lossy
# Check a dataset for problems before training
panlabel validate --format coco annotations.json
# Get a quick overview of what's in a dataset
panlabel stats --format coco annotations.json
# Compare two datasets semantically
panlabel diff --format-a auto --format-b auto old.json new.json
# Sample a smaller subset for quick experiments
panlabel sample -i annotations.json -o sample.ir.json --from auto --to ir-json -n 100 --seed 42
# See every supported format and its capabilities
panlabel list-formats
```
The `convert` shape is always `-f <source> -t <dest> -i <input> -o <output>` — pick any source/destination from the [Supported formats](#supported-formats) table. See [More convert examples](#more-convert-examples) below for lossless vs. lossy conversions, machine-readable JSON reports, dry runs, and remote Hugging Face datasets.
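If you drive panlabel from a script, the fixed `convert` shape makes the command easy to assemble. The sketch below is one possible way to do that in Python; `build_convert_cmd` is a hypothetical helper (not part of panlabel), and it uses only the flags shown above.

```python
import shlex


def build_convert_cmd(src, dst, input_path, output_path, allow_lossy=False):
    """Assemble `panlabel convert -f <source> -t <dest> -i <input> -o <output>`."""
    cmd = [
        "panlabel", "convert",
        "-f", src,
        "-t", dst,
        "-i", str(input_path),
        "-o", str(output_path),
    ]
    if allow_lossy:
        # Lossy conversions require explicit opt-in.
        cmd.append("--allow-lossy")
    return cmd


cmd = build_convert_cmd("coco", "yolo", "annotations.json", "./yolo_out", allow_lossy=True)
print(shlex.join(cmd))

# To actually run it (requires panlabel on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.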
## What can panlabel do?
| Command | What it does |
|---------|--------------|
| `convert` | Convert between annotation formats, with clear warnings about what (if anything) gets lost |
| `validate` | Check your dataset for common problems — duplicate IDs, missing references, invalid bounding boxes |
| `stats` | Show rich dataset statistics in text, JSON, or HTML |
| `diff` | Compare two datasets semantically (summary or detailed output) |
| `sample` | Create subset datasets (random or stratified), with optional category filtering and JSON reports |
| `list-formats` | Show which formats are supported and their read/write/lossiness capabilities, including JSON discovery output |
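To give a feel for the kinds of problems `validate` looks for, here is a rough Python sketch of the three checks named in the table, run against a COCO-style dictionary. This is an assumption-laden illustration: `find_problems` is a hypothetical function, and panlabel's actual checks and messages may differ.

```python
from collections import Counter


def find_problems(dataset):
    """Return human-readable problems found in a COCO-style dict."""
    problems = []
    image_ids = {img["id"] for img in dataset["images"]}
    category_ids = {cat["id"] for cat in dataset["categories"]}

    # Duplicate IDs: each annotation id should appear exactly once.
    counts = Counter(ann["id"] for ann in dataset["annotations"])
    for ann_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate annotation id {ann_id}")

    for ann in dataset["annotations"]:
        # Missing references: annotations must point at known images/categories.
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        # Invalid boxes: COCO-style [x, y, width, height] needs positive size.
        _, _, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive bbox size")
    return problems


dataset = {
    "images": [{"id": 1}],
    "categories": [{"id": 7}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 7, "bbox": [0, 0, 10, 10]},
        {"id": 1, "image_id": 2, "category_id": 7, "bbox": [5, 5, 0, 4]},
    ],
}
for problem in find_problems(dataset):
    print(problem)
```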
## Supported formats
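| Format | Extension / Layout | Description | Lossiness |
|--------|--------------------|-------------|-----------|
| `ir-json` | `.json` | Panlabel's own intermediate representation | Lossless |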
| `coco` | `.json` | COCO object detection format | Conditional |
| `ibm-cloud-annotations` | `_annotations.json` file or directory | IBM Cloud Annotations localization JSON | Lossy |
| `cvat` | `.xml` / `annotations.xml` export | CVAT for images XML annotation export | Lossy |
| `label-studio` | `.json` | Label Studio task export JSON (`rectanglelabels`) | Lossy |
| `labelbox` | `.json` / `.jsonl` / `.ndjson` | Labelbox current export rows (`data_row` / `projects.*.labels`) | Lossy |
| `scale-ai` | `.json` file or directory (`annotations/`) | Scale AI image annotation task/response JSON | Lossy |
| `unity-perception` | `.json` file or SOLO-like directory | Unity Perception / SOLO synthetic-data bbox JSON | Lossy |
| `tfod` | `.csv` | TensorFlow Object Detection CSV (normalized bbox corners) | Lossy |
| `tfrecord` | `.tfrecord` | TensorFlow Object Detection API-style `tf.train.Example` records (single-file, uncompressed, bbox-only in v1) | Lossy |
| `vott-csv` | `.csv` | Microsoft VoTT CSV export (`image,xmin,ymin,xmax,ymax,label`) | Lossy |
| `vott-json` | `.json` file or `vott-json-export/` directory | Microsoft VoTT JSON export (`assets` / per-asset JSON with `regions`) | Lossy |
| `yolo` | `images/ + labels/` directory, or split `data.yaml` pointing to image-list `.txt` files | YOLO `.txt` labels (flat, split-aware, Scaled-YOLOv4 aliases, optional confidence) | Lossy |
| `yolo-keras` | `.txt` file or directory (`yolo_keras.txt`, `annotations.txt`, `train.txt`) | YOLO Keras absolute-coordinate TXT (`image xmin,ymin,xmax,ymax,class_id ...`) | Lossy |
| `yolov4-pytorch` | `.txt` file or directory (`yolov4_pytorch.txt`, `train_annotation.txt`, `train.txt`) | YOLOv4 PyTorch absolute-coordinate TXT (`image xmin,ymin,xmax,ymax,class_id ...`) | Lossy |
| `voc` | `Annotations/ + JPEGImages/` directory | Pascal VOC XML | Lossy |
| `hf` | `metadata.jsonl` / `metadata.parquet` directory | Hugging Face ImageFolder metadata | Lossy |
| `sagemaker` | `.manifest` / `.jsonl` file | AWS SageMaker Ground Truth object-detection manifest | Lossy |
| `labelme` | `.json` file or `annotations/` directory | LabelMe per-image JSON annotations | Lossy |
| `create-ml` | `.json` | Apple CreateML annotation format | Lossy |
| `kitti` | `label_2/ + image_2/` directory | KITTI object detection labels | Lossy |
| `via` | `.json` | VGG Image Annotator (VIA) JSON | Lossy |
| `retinanet` | `.csv` | keras-retinanet CSV format | Lossy |
| `openimages` | `.csv` | Google OpenImages CSV annotation format | Lossy |
| `kaggle-wheat` | `.csv` | Kaggle Global Wheat Detection CSV | Lossy |
| `automl-vision` | `.csv` | Google Cloud AutoML Vision CSV | Lossy |
| `udacity` | `.csv` | Udacity Self-Driving Car Dataset CSV | Lossy |
| `superannotate` | `.json` file or `annotations/` directory | SuperAnnotate JSON export | Lossy |
| `supervisely` | `.json` file or `ann/` / `meta.json` project directory | Supervisely JSON project / dataset | Lossy |
| `cityscapes` | `.json`, `gtFine/`, or dataset root with `gtFine/` | Cityscapes polygon JSON; polygons become bbox envelopes | Lossy |
| `marmot` | `.xml` file or directory with same-stem companion images | Marmot XML document-layout composites; hex doubles become pixel bboxes | Lossy |
| `datumaro` | `.json` | Datumaro JSON annotation format | Lossy |
| `wider-face` | `.txt` | WIDER Face aggregate TXT (single `face` class in panlabel) | Lossy |
| `oidv4` | directory with `Label/` or `.txt` | OIDv4 Toolkit TXT labels (directory probe uses `Label/`, not YOLO `labels/`) | Lossy |
| `bdd100k` | `.json` | BDD100K / Scalabel JSON detection subset | Lossy |
| `v7-darwin` | `.json` | V7 Darwin JSON bbox subset | Lossy |
| `edge-impulse` | `bounding_boxes.labels` file or containing directory | Edge Impulse bounding-box labels JSON | Lossy |
| `openlabel` | `.json` | ASAM OpenLABEL JSON static-image 2D bbox subset | Lossy |
| `via-csv` | `.csv` | VGG Image Annotator CSV (separate format from VIA JSON) | Lossy |

Run `panlabel list-formats` for the full details, or `panlabel list-formats --output json` for machine-readable format discovery.

TFRecord support in v1 is intentionally narrow: panlabel currently supports only single-file, uncompressed TensorFlow Object Detection API-style `tf.train.Example` bbox records (not arbitrary TFRecord payloads).

`list-formats` shows canonical names (for example `label-studio`), while commands also accept aliases (for example `ls`, `label-studio-json`). Across commands, `--output-format` is the consistent way to request JSON reports; `convert` and `sample` also keep `--report` as an alias. JSON is pretty-printed on a terminal and compact when piped or captured, which makes it friendlier for scripts and agents. `stats` also adapts its text renderer: rich/Unicode on a terminal, plain text layout when piped.
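The terminal-versus-pipe behavior is a common CLI pattern. As a rough illustration (not panlabel's code), the Python sketch below picks a JSON layout based on whether the output stream is a TTY. `render_report` and the report fields are hypothetical.

```python
import io
import json
import sys


def render_report(report, stream):
    """Pretty-print for humans on a terminal, compact for pipes and captures."""
    if stream.isatty():
        return json.dumps(report, indent=2)
    # No whitespace between tokens: smaller and easy for scripts to consume.
    return json.dumps(report, separators=(",", ":"))


report = {"command": "stats", "images": 3}  # hypothetical report fields
print(render_report(report, sys.stdout))
```

Either layout parses to the same data, so downstream tools such as `jq` or `json.loads` do not need to care which one they receive.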
### More convert examples
```sh
# COCO to IR JSON (lossless — no data lost)
panlabel convert -f coco -t ir-json -i input.json -o output.json
# IR JSON to TFOD (lossy — requires explicit opt-in)
panlabel convert -f ir-json -t tfod -i input.json -o output.csv --allow-lossy
# Auto-detect input format from file extension/content or directory layout
panlabel convert --from auto -t coco -i input.csv -o output.json
# Request a machine-readable conversion report
panlabel convert --from auto -t coco -i input.csv -o output.json --output-format json
# Preview a conversion without touching the output path
panlabel convert --from auto -t coco -i input.csv -o output.json --dry-run
# Convert a remote Hugging Face dataset repo to COCO JSON
# (requires --features hf when building from source)
panlabel convert -f hf -t coco --hf-repo rishitdagli/cppe-5 --split train -o coco_output.json
# Convert a zip-style HF dataset repo split to IR JSON (auto-detects extracted payload)
panlabel convert -f hf -t ir-json --hf-repo keremberke/football-object-detection --split train -o football.ir.json
```
Dry runs still do the real thinking work — format detection, validation, sampling/conversion analysis, and lossiness checks — but they skip the final filesystem write. That means they are good for “what would happen?” checks, but they do **not** prove that the output path is writable.
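The caveat about writability is easier to see in code. The following Python sketch is a toy model of the dry-run idea, not panlabel's implementation: the hypothetical `convert` function does all of its analysis up front and returns early before the write, so a dry run succeeds even when the output directory does not exist.

```python
import json
import tempfile
from pathlib import Path


def convert(records, output_path, dry_run=False):
    """Analyze and serialize records; skip only the final write on a dry run."""
    report = {"records": len(records), "would_write": str(output_path), "written": False}
    payload = json.dumps(records)  # the analysis/serialization work always runs
    if dry_run:
        return report  # the output path is never touched
    Path(output_path).write_text(payload)
    report["written"] = True
    return report


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "missing_dir" / "out.json"
    # Succeeds, although the parent directory does not exist.
    print(convert([{"label": "cat"}], out, dry_run=True))
    try:
        convert([{"label": "cat"}], out)
    except FileNotFoundError as err:
        print("real run failed:", type(err).__name__)
```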
### Getting help
```sh
panlabel --help # See all commands
panlabel convert --help # Help for a specific command
panlabel -V # Show version
```
## Documentation
Want to go deeper? The full docs are readable right here on GitHub:
- [Documentation home](https://github.com/strickvl/panlabel/blob/HEAD/docs/README.md) — start here
- [CLI reference](https://github.com/strickvl/panlabel/blob/HEAD/docs/cli.md) — every flag and option
- [Format reference](https://github.com/strickvl/panlabel/blob/HEAD/docs/formats.md) — how each format works
- [Tasks and use cases](https://github.com/strickvl/panlabel/blob/HEAD/docs/tasks.md) — what's supported today
- [Conversion and lossiness](https://github.com/strickvl/panlabel/blob/HEAD/docs/conversion.md) — understanding what gets lost
- [Contributing](https://github.com/strickvl/panlabel/blob/HEAD/CONTRIBUTING.md) — we'd love your help
- [Roadmap](https://github.com/strickvl/panlabel/blob/HEAD/ROADMAP.md) — what's coming next
## Contributing
Contributions are welcome! Whether it's a bug report, a new format adapter, or
a documentation fix — we appreciate the help. For major changes, please
[open an issue](https://github.com/strickvl/panlabel/issues) first so we can
discuss the approach.
See the [contributing guide](CONTRIBUTING.md) for details on the codebase
structure and how to make changes.
## License
MIT — see [LICENSE](LICENSE) for details.