# Formats
This page describes how each annotation format works inside panlabel — what gets
read, what gets written, and what you should expect.
Panlabel converts through a canonical intermediate representation (IR). All
bounding boxes are represented as **pixel-space XYXY** in the IR, and each
format adapter handles the mapping to/from its own coordinate system.
Current scope: **object detection** bounding boxes only.
## Format matrix
| Format | Path kind | Read | Write | Fidelity |
| --- | --- | --- | --- | --- |
| `ir-json` | file (`.json`) | yes | yes | lossless |
| `coco` | file (`.json`) | yes | yes | conditional |
| `cvat` | file (`.xml`) or directory (`annotations.xml`) | yes | yes | lossy |
| `label-studio` | file (`.json`) | yes | yes | lossy |
| `tfod` | file (`.csv`) | yes | yes | lossy |
| `yolo` | directory (`images/` + `labels/`) | yes | yes | lossy |
| `voc` | directory (`Annotations/` + `JPEGImages/`) | yes | yes | lossy |
| `hf` | directory (`metadata.jsonl` / `metadata.parquet`) | yes | yes (`metadata.jsonl`) | lossy |
| `labelme` | file (`.json`) or directory (`annotations/`) | yes | yes | lossy |
| `create-ml` | file (`.json`) | yes | yes | lossy |
| `kitti` | directory (`label_2/` + `image_2/`) | yes | yes | lossy |
| `via` | file (`.json`) | yes | yes | lossy |
| `retinanet` | file (`.csv`) | yes | yes | lossy |
## IR JSON (`ir-json`)
- Canonical panlabel representation.
- Preserves dataset info, licenses, image metadata, and annotation attributes.
- Bboxes are stored in XYXY form.
## COCO JSON (`coco` / `coco-json`)
- Path kind: JSON file.
- Bbox format: `[x, y, width, height]` (absolute pixel coordinates).
- Converted to IR XYXY via bbox helpers.
- Writer behavior is deterministic (stable ordering by IDs).
- COCO `score` can map to IR `confidence` when present.
- COCO `segmentation` is accepted on read but ignored/dropped (panlabel currently models detection bboxes only). On write, panlabel emits `segmentation` as an empty array.
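The xywh-to-XYXY mapping is simple enough to sketch. This is an illustrative helper (not panlabel's actual bbox code):

```python
def coco_to_xyxy(bbox):
    """Map a COCO [x, y, width, height] bbox (absolute pixels, top-left
    origin) to IR pixel-space [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]
```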
## Label Studio JSON (`label-studio` / `label-studio-json` / `ls`)
- Path kind: JSON file.
- Supported shape: Label Studio task export array (empty array is accepted as an empty dataset).
- Supported annotation type: `rectanglelabels` only.
- Coordinates are percentages; adapter maps to/from IR pixel XYXY.
- Reader supports legacy `completions` as fallback when `annotations` is absent.
- Label Studio result `score` (when present) maps to IR `confidence` (from either `annotations` or `predictions`).
Reader behavior:
- derives `Image.file_name` from `data.image` basename (normalizes `\` to `/`, strips query/fragment)
- requires derived basenames to be unique across tasks
- preserves full image reference in `Image.attributes["ls_image_ref"]`
- accepts either `annotations` or legacy `completions` per task (both present is an error)
- supports `predictions` alongside annotation sets
- each of `annotations` / `completions` / `predictions` may contain at most one result-set entry
- enforces `type == "rectanglelabels"` and exactly one label per result
- requires `original_width`/`original_height` on each result; if a task has zero results, falls back to `data.width`/`data.height`
- requires consistent `from_name`/`to_name` values within a task; when present, stores them in `Image.attributes["ls_from_name"]` and `Image.attributes["ls_to_name"]`
- stores non-zero rotation as `Annotation.attributes["ls_rotation_deg"]` and uses an axis-aligned envelope bbox in IR
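The percent-to-pixel mapping can be sketched as follows. This is an illustrative helper for an un-rotated `rectanglelabels` result, where `value` is the result's `value` object and the width/height arguments come from `original_width`/`original_height`:

```python
def ls_rect_to_xyxy(value, original_width, original_height):
    """Map a Label Studio rectanglelabels result (percent coordinates,
    top-left origin) to IR pixel-space [x1, y1, x2, y2]."""
    x1 = value["x"] / 100.0 * original_width
    y1 = value["y"] / 100.0 * original_height
    x2 = x1 + value["width"] / 100.0 * original_width
    y2 = y1 + value["height"] / 100.0 * original_height
    return [x1, y1, x2, y2]
```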
Deterministic policy:
- reader image IDs: by derived basename (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then result order
- writer task order: by image file_name (lexicographic)
Writer behavior:
- writes Label Studio task export JSON
- splits results by confidence:
- `confidence == None` -> `annotations`
- `confidence == Some(_)` -> `predictions` + `score`
- this means any IR annotation with confidence is written under `predictions`
- uses `ls_from_name` / `ls_to_name` image attributes if present, else defaults to `label` / `image`
- requires unique image basenames (derived from `data.image`) to avoid ambiguous `Image.file_name` mapping
Limitations:
- currently only `rectanglelabels` bbox annotations are supported
- rotation is flattened to axis-aligned geometry (angle retained as `ls_rotation_deg` only)
- Label Studio-specific metadata outside this mapping is not preserved
## TFOD CSV (`tfod` / `tfod-csv`)
- Path kind: CSV file.
- Columns: `filename,width,height,class,xmin,ymin,xmax,ymax`.
- Coordinates are normalized (0..1).
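Because the coordinates are normalized, the reader must scale them by the row's `width`/`height` columns to reach IR pixel XYXY. A minimal sketch (illustrative, not panlabel's actual code):

```python
def tfod_to_xyxy(xmin, ymin, xmax, ymax, width, height):
    """Scale normalized (0..1) TFOD corners to IR pixel-space XYXY
    using the width/height columns from the same CSV row."""
    return [xmin * width, ymin * height, xmax * width, ymax * height]
```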
Deterministic policy:
- reader image IDs: by filename (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by CSV row order
- writer row order: by annotation ID
Limitations:
- no dataset-level metadata/licenses
- no image-level license/date metadata
- no annotation confidence/attributes
- images without annotations are not represented in TFOD output
## YOLO directory (`yolo` / `ultralytics` / `yolov8` / `yolov5`)
- Path kind: directory.
- Accepted input path:
- dataset root containing `images/` and `labels/`
- or `labels/` directory directly (with sibling `../images/`)
- Supports both flat layouts (Darknet-style, no `data.yaml` required) and split-aware layouts.
- Label row format (one line per bbox):
- `<class_id> <x_center> <y_center> <width> <height> [confidence]`
- normalized values
- 5 tokens: detection bbox (confidence = None)
- 6 tokens: detection bbox + confidence score (mapped to IR `Annotation.confidence`)
- 7+ tokens: rejected (segmentation/pose not supported)
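The token rules above can be sketched as a small parser (illustrative only, with hypothetical names; error handling is simplified relative to panlabel's reader):

```python
def parse_yolo_line(line, img_w, img_h):
    """Parse one YOLO label row into (class_id, [x1, y1, x2, y2], confidence).
    5 tokens: bbox only; 6 tokens: bbox + confidence; anything else is rejected."""
    tokens = line.split()
    if len(tokens) not in (5, 6):
        raise ValueError("segmentation/pose rows are not supported")
    class_id = int(tokens[0])
    xc, yc, w, h = (float(t) for t in tokens[1:5])
    conf = float(tokens[5]) if len(tokens) == 6 else None
    # Normalized center/size -> absolute pixel corners.
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, [x1, y1, x2, y2], conf
```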
Reader behavior:
- class map precedence: `data.yaml` → `classes.txt` → inferred from labels
- flat layouts work without `data.yaml`: class names come from `classes.txt` (if present) or are inferred as `class_0`, `class_1`, etc.
- image resolution is read from image headers in `images/`
- each label file must map to a matching image file (same relative stem) under `images/`
- expected image extensions (lookup order): `jpg`, `png`, `jpeg`, `bmp`, `webp`
- lines with 7+ tokens are rejected (segmentation/pose not supported)
### Split-aware reading
When `data.yaml` contains `train:`, `val:`, or `test:` path keys (common in Roboflow/Ultralytics Hub exports), panlabel detects a split-aware layout and reads all splits.
Supported path patterns in `data.yaml`:
- Pattern A: `images/<split>` (e.g. `train: images/train`, labels inferred at `labels/train`)
- Pattern B: `<split>/images` (e.g. `train: train/images`, labels at `train/labels`)
- Pattern C: bare `<split>` pointing to a directory containing `images/` + `labels/`
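For reference, a minimal Pattern A `data.yaml` might look like this (class names and paths are hypothetical):

```yaml
path: .               # base for resolving split-relative paths
train: images/train   # labels inferred at labels/train
val: images/val       # labels inferred at labels/val
names:
  0: person
  1: car
```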
Behavior:
- **Default (no `--split`):** all found splits are merged into a single IR Dataset. Image `file_name` values are prefixed with the split name (e.g. `train/img001.jpg`, `val/img002.jpg`) to avoid collisions.
- **`--split <name>`:** only the named split is read. Image `file_name` values are still prefixed with the split name for provenance.
- Class map: resolved from `data.yaml` `names:` when present, otherwise inferred across all selected label directories.
- `data.yaml` `path:` key (if present) is used as the base for resolving split-relative paths.
- Split provenance is stored in `Dataset.info.attributes`:
- `yolo_layout_mode`: `"split_aware"` or `"flat"`
- `yolo_splits_found`: comma-separated list of splits found (e.g. `"train,val,test"`)
- `yolo_splits_read`: comma-separated list of splits actually read
- An error is raised if `--split` names a split not present in `data.yaml`, or if `--split` is used on a flat (non-split-aware) layout.
Writer behavior:
- creates output `images/` and `labels/` directories
- writes `data.yaml` with a `names:` mapping (sorted by class index); does not emit train/val paths or `nc`
- creates empty `.txt` files for images without annotations
- does **not** copy image binaries
- writes normalized floats with 6 decimal places
- emits an optional 6th confidence token when `Annotation.confidence` is `Some`
## Pascal VOC XML (`voc` / `pascal-voc` / `voc-xml`)
- Path kind: directory.
- Accepted input path:
- dataset root containing `Annotations/`
- or `Annotations/` directory directly (with optional sibling `../JPEGImages/`)
- Reader uses `<size>/<width>` and `<size>/<height>` from XML (no image-header probing).
- Reader stores object fields `pose`, `truncated`, `difficult`, `occluded` in `Annotation.attributes`.
- Reader stores `<size>/<depth>` as image attribute `depth`.
- Coordinate policy: reads `xmin/ymin/xmax/ymax` exactly as provided (no 0/1-based adjustment).
- Reader scans `Annotations/` flat (non-recursive); nested XML files are skipped with a warning.
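A minimal annotation XML, as the reader expects it (all values hypothetical):

```xml
<annotation>
  <filename>001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
```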
Deterministic policy:
- reader image IDs: by `<filename>` (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by XML file order, then `<object>` order
Writer behavior:
- creates `Annotations/` and `JPEGImages/README.txt`
- writes one XML per image (including images without annotations)
- preserves image subdirectory structure in XML output path (`train/001.jpg` -> `Annotations/train/001.xml`)
- does **not** copy image binaries
- normalizes boolean attribute values when writing:
- `true`/`yes`/`1` -> `1`
- `false`/`no`/`0` -> `0`
- any other value -> omitted
## Hugging Face ImageFolder metadata (`hf` / `hf-imagefolder` / `huggingface`)
- Path kind: directory.
- Accepted local input layout:
- dataset root containing `metadata.jsonl` or `metadata.parquet`
- split subdirectories (for example `train/`, `validation/`) each containing metadata
- parquet shard layouts (for example `data/train-00000-of-00001.parquet`, `data/validation-*.parquet`, or `<config>/<split>/*.parquet`)
- Remote Hub import is supported in `convert` via `--hf-repo` (requires `hf-remote` feature).
- Remote zip-style split archives (for example `data/train.zip`) are also supported when they extract to YOLO, VOC, COCO JSON, or HF metadata layouts.
Reader behavior:
- object-container auto-detection: `objects` first, then `faces` (override with `--hf-objects-column`)
- category field aliases: `categories` or `category`
- category values may be names or integer IDs
- integer category name resolution precedence:
- preflight ClassLabel names (remote)
- then `--hf-category-map`
- then integer fallback (`"0"`, `"1"`, ...)
- bbox interpretation is controlled by `--hf-bbox-format`:
- `xywh` (default) treats bbox as `[x, y, width, height]`
- `xyxy` treats bbox as `[x1, y1, x2, y2]`
- keeps bbox rows as parsed (validation reports degenerate/OOB issues later)
- width/height read from metadata when present, otherwise from image headers
- duplicate `file_name` rows are rejected
- when both `metadata.jsonl` and `metadata.parquet` are present, JSONL is preferred
- when no `metadata.jsonl` exists, panlabel can read supported parquet layouts (`metadata.parquet` or split parquet shards) with `hf-parquet`
- for parquet rows without `file_name`, panlabel derives it from `image.path` (or fallback IDs)
Writer behavior:
- writes `metadata.jsonl` (one row per image)
- writes `file_name`, `width`, `height`, and `objects.{bbox,categories}`
- deterministic output ordering:
- metadata rows by image `file_name` (lexicographic)
- per-image annotation lists by annotation ID
- does **not** copy image binaries
- output bbox format follows `--hf-bbox-format` (`xywh` default)
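A single `metadata.jsonl` row in the default `xywh` bbox format might look like this (values hypothetical):

```json
{"file_name": "img001.jpg", "width": 640, "height": 480, "objects": {"bbox": [[10.0, 20.0, 100.0, 40.0]], "categories": ["cat"]}}
```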
IR provenance notes:
- reader stores HF provenance in `Dataset.info.attributes` (for example `hf_bbox_format`)
- remote imports may also populate `hf_repo_id`, `hf_revision`, `hf_split`, `hf_license`, `hf_description`
## CVAT XML (`cvat` / `cvat-xml`)
- Path kind: XML file (`.xml`) or directory containing `annotations.xml`.
- Supported export: CVAT "for images" XML with `<annotations>` root.
- Supported annotation type: `<box>` only.
- Unsupported image-level annotation elements (for example `<polygon>`, `<points>`) are hard parse errors.
- Coordinates: absolute pixels (`xtl/ytl/xbr/ybr`) mapped 1:1 to IR pixel XYXY.
Reader behavior:
- accepts file input or directory input with root `annotations.xml`
- if `<meta><task><labels>` is present:
- keeps labels with `<type>bbox</type>` (or no `<type>`)
- verifies every `<box label="...">` exists in meta labels
- if meta labels are missing, infers categories from `<box label="...">`
- stores `<image id>` as `Image.attributes["cvat_image_id"]`
- stores box attributes as:
- `occluded="1"` -> `Annotation.attributes["occluded"] = "1"`
- non-zero `z_order` -> `Annotation.attributes["z_order"]`
- non-empty `source` -> `Annotation.attributes["source"]`
- `<attribute name="k">v</attribute>` -> `Annotation.attributes["cvat_attr_k"] = "v"`
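Putting the reader rules together, an `<image>` fragment like this (values hypothetical) would yield `cvat_image_id = "0"`, `z_order = "1"`, and `cvat_attr_color = "red"`:

```xml
<image id="0" name="img001.jpg" width="640" height="480">
  <box label="car" xtl="10.0" ytl="20.0" xbr="110.0" ybr="60.0"
       occluded="0" z_order="1" source="manual">
    <attribute name="color">red</attribute>
  </box>
</image>
```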
Deterministic policy:
- reader image IDs: by `<image name>` (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then `<box>` order
Writer behavior:
- writes a single XML file (or `annotations.xml` inside output directory)
- emits minimal `<meta><task>` with `name='panlabel export'`, `mode='annotation'`, and `size` equal to image count
- writes labels only for categories referenced by annotations (unused categories are dropped)
- writes `<image>` entries for all images, including unannotated images
- image ordering: by `file_name` (lexicographic)
- image IDs are reassigned sequentially (0, 1, 2, ...) by sorted order; original `cvat_image_id` attributes are not preserved in output
- writes `<box>` entries sorted by annotation ID per image
- writes `cvat_attr_*` annotation attributes as `<attribute>` children of `<box>`
- normalizes `occluded` values:
- `true`/`yes`/`1` -> `1`
- `false`/`no`/`0` -> `0`
- otherwise or missing -> `0`
- defaults missing or empty `source` attribute to `manual`
- defaults missing or invalid `z_order` to `0`
## LabelMe JSON (`labelme` / `labelme-json`)
- Path kind: JSON file or directory.
- One JSON file per image containing a `shapes` array with rectangle and polygon annotations.
- Supported shapes: `rectangle` (2 points: top-left, bottom-right), `polygon` (3+ points: converted to axis-aligned bbox envelope). Other shape types are rejected.
- Coordinates: absolute pixels.
- Missing `shape_type` defaults to `rectangle`.
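A minimal LabelMe file as the reader expects it (values hypothetical; `imageData` and `flags` omitted):

```json
{
  "imagePath": "img001.jpg",
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {
      "label": "dog",
      "shape_type": "rectangle",
      "points": [[10.0, 20.0], [110.0, 60.0]]
    }
  ]
}
```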
Reader input modes:
- **Single file**: one `.json` file → one-image dataset
- **Separate directory**: `annotations/` subdirectory containing `.json` files
- **Co-located directory**: `.json` files alongside image files (identified by presence of `shapes` key)
Reader behavior:
- requires `imagePath`, `imageWidth`, and `imageHeight` in each JSON file
- derives `Image.file_name` from `imagePath` basename (single-file mode) or from the relative JSON path stem + image extension (directory mode)
- stores original `imagePath` value in `Image.attributes["labelme_image_path"]`
- polygons are flattened to axis-aligned bounding box envelopes; original shape type stored as `Annotation.attributes["labelme_shape_type"] = "polygon"`
- requires unique derived image names across all JSON files in directory mode
Deterministic policy:
- reader image IDs: by derived file_name (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then shape order
- writer file order: by image file_name (lexicographic)
Writer behavior:
- single-image datasets to a `.json` path: writes one LabelMe JSON file
- multi-image datasets or directory paths: writes canonical `annotations/<stem>.json` + `images/README.txt` layout
- all annotations are written as `rectangle` shapes with 2 corner points (polygons are not restored)
- does **not** copy image binaries
- uses `labelme_image_path` image attribute for `imagePath` if present, otherwise `file_name`
Limitations:
- only `rectangle` and `polygon` shape types are supported (others are rejected)
- polygon geometry is flattened to axis-aligned bbox envelope (shape type retained as attribute only)
- `imageData` (embedded base64 image data) is not preserved
- LabelMe flags and group_id are not preserved
## CreateML JSON (`create-ml` / `createml` / `create-ml-json`)
- Path kind: JSON file.
- Apple's annotation format for Core ML training.
- Flat JSON array where each element represents one image with its annotations.
- Bbox format: center-based absolute pixel coordinates `{x, y, width, height}` where `(x, y)` is the center of the box.
- Image dimensions are **not** stored in the JSON — the reader resolves them from local image files relative to the JSON file's parent directory.
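The center-to-corner mapping can be sketched as follows (illustrative helper, not panlabel's actual code):

```python
def createml_to_xyxy(box):
    """Map a CreateML {x, y, width, height} box, where (x, y) is the
    box center in absolute pixels, to IR pixel-space [x1, y1, x2, y2]."""
    half_w = box["width"] / 2
    half_h = box["height"] / 2
    return [box["x"] - half_w, box["y"] - half_h,
            box["x"] + half_w, box["y"] + half_h]
```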
Reader behavior:
- parses top-level JSON array of `{image, annotations}` objects
- `image` must be a non-empty relative path (absolute paths and `..` traversal are rejected)
- resolves image dimensions from disk by probing `<base_dir>/<image>` then `<base_dir>/images/<image>`
- rejects duplicate `image` entries
- rejects empty annotation labels
Deterministic policy:
- reader image IDs: by image filename (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then annotation order
Writer behavior:
- writes a single JSON array with one object per image
- uses center-based absolute pixel coordinates: `{x, y, width, height}`
- deterministic output: image rows sorted by filename, annotations sorted by annotation ID
- images without annotations are included (empty `annotations` array)
- does **not** write image dimensions (this is by design — CreateML resolves them at training time)
Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (dimensions, license, date)
- no annotation confidence/attributes
- requires image files on disk for reading (to resolve dimensions)
## KITTI (`kitti` / `kitti-txt`)
- Path kind: directory.
- Accepted input path:
- dataset root containing `label_2/` and `image_2/`
- or `label_2/` directory directly (with sibling `../image_2/`)
- Standard format in autonomous driving research.
- Per-image `.txt` files with 15 space-separated fields per line (optional 16th field: score).
- Fields: `type truncated occluded alpha xmin ymin xmax ymax dim_height dim_width dim_length loc_x loc_y loc_z rotation_y [score]`
- Bbox: fields 4–7 (`xmin ymin xmax ymax`) are absolute pixel coordinates.
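For illustration, one plausible label line (all values hypothetical): type, truncated, occluded, alpha, the four bbox fields, 3D dimensions, 3D location, and rotation_y:

```text
Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
```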
Reader behavior:
- scans `label_2/` flat (non-recursive, top-level `.txt` files only)
- resolves images from `image_2/` with extension precedence: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.webp`
- maps `type` → category name, fields 4–7 → `BBoxXYXY<Pixel>`, and the optional 16th `score` field → `Annotation.confidence`
- stores remaining numeric fields as annotation attributes with `kitti_*` prefix: `kitti_truncated`, `kitti_occluded`, `kitti_alpha`, `kitti_dim_height`, `kitti_dim_width`, `kitti_dim_length`, `kitti_loc_x`, `kitti_loc_y`, `kitti_loc_z`, `kitti_rotation_y`
Deterministic policy:
- reader image IDs: by resolved image filename (lexicographic)
- reader category IDs: by class/type name (lexicographic)
- reader annotation IDs: by label file order then line number
Writer behavior:
- creates `label_2/` + `image_2/README.txt`
- one `.txt` per image, empty files for unannotated images
- sorts images by `file_name`, annotations within each image by ID
- sources KITTI-specific fields from `kitti_*` annotation attributes; uses defaults for missing values: truncated=0, occluded=0, alpha=−10, dims=−1, loc=−1000, rotation_y=−10
- rejects `Image.file_name` with path separators (KITTI layout is flat)
- does **not** copy image binaries
Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (license, date)
- no annotation attributes outside the `kitti_*` set
- confidence is representable only via the optional trailing `score` field
## VGG Image Annotator JSON (`via` / `via-json` / `vgg-via`)
- Path kind: JSON file.
- Popular academic annotation tool.
- Single JSON file: a top-level object whose keys are arbitrary strings (typically filename + size).
- Each entry: `{ filename, size, regions, file_attributes }`.
- Supported region type: `rect` only (`shape_attributes.name == "rect"` with `x`, `y`, `width`, `height`).
- Image dimensions are **not** stored in the JSON — resolved from local image files.
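A minimal VIA export with one rectangle region (values hypothetical; note the root key is typically the filename concatenated with the byte size):

```json
{
  "img001.jpg123456": {
    "filename": "img001.jpg",
    "size": 123456,
    "regions": [
      {
        "shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 100, "height": 40},
        "region_attributes": {"label": "dog"}
      }
    ],
    "file_attributes": {}
  }
}
```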
Reader behavior:
- supports `regions` as either an array or an object map (both forms exist in real VIA exports)
- label resolution precedence from `region_attributes`: `label`, then `class`, then sole scalar attribute
- non-rect shapes are skipped with a warning
- image dimension resolution: `<json_dir>/<filename>` then `<json_dir>/images/<filename>`
- rejects duplicate filenames across entries
- stores `via_size_bytes` as image attribute; scalar `file_attributes` as `via_file_attr_<key>` image attributes
- stores scalar `region_attributes` (excluding the label key) as `via_region_attr_<key>` annotation attributes
Deterministic policy:
- reader image IDs: by filename (lexicographic)
- reader category IDs: by resolved label (lexicographic)
- reader annotation IDs: by image order then region order (for object-form regions, keys sorted lexicographically)
Writer behavior:
- writes JSON object keyed by `<filename><size>`
- `regions` always emitted as array, sorted by annotation ID
- uses canonical `label` key in `region_attributes` for category name
- reconstructs `file_attributes` from `via_file_attr_*` image attributes
- unannotated images preserved with `regions: []`
- does **not** copy image binaries
Limitations:
- only rectangle regions are supported
- no dataset-level metadata/licenses
- no annotation confidence
- requires image files on disk for reading (to resolve dimensions)
## RetinaNet Keras CSV (`retinanet` / `retinanet-csv` / `keras-retinanet`)
- Path kind: CSV file.
- Simple format used with keras-retinanet: `path,x1,y1,x2,y2,class_name`.
- Coordinates are absolute pixels (unlike TFOD which uses normalized coordinates).
- No header required (optional header row is tolerated).
- Unannotated images: `path,,,,,` (all-empty row).
- Image dimensions are **not** in the CSV — resolved from local image files.
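A small illustrative CSV (paths and labels hypothetical), including an all-empty row for an unannotated image:

```csv
images/img001.jpg,10,20,110,60,dog
images/img001.jpg,200,30,260,90,cat
images/img002.jpg,,,,,
```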
Reader behavior:
- tolerates optional header row exactly matching `path,x1,y1,x2,y2,class_name`
- supports empty rows (`path,,,,,`) for unannotated images
- rejects partial rows (some bbox fields present, others empty)
- resolves image paths relative to CSV parent directory; absolute paths used as-is
- caches dimension lookups per image path
Deterministic policy:
- reader image IDs: by path (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by CSV row order
Writer behavior:
- headerless CSV (matches keras-retinanet conventions)
- rows grouped by image, images sorted by `file_name`, annotations by ID
- unannotated images emit exactly one `path,,,,,` row
- does **not** copy image binaries
Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (dimensions, license, date)
- no annotation confidence/attributes
- requires image files on disk for reading (to resolve dimensions)
## Future expansion rule
When formats become numerous, split this page into per-format files under `docs/formats/<format>.md` and keep this page as an index.