panlabel 0.6.0

The universal annotation converter
# Formats

This page describes how each annotation format works inside panlabel — what gets
read, what gets written, and what you should expect.

Panlabel converts through a canonical intermediate representation (IR). All
bounding boxes are represented as **pixel-space XYXY** in the IR, and each
format adapter handles the mapping to/from its own coordinate system.

Current scope: **object detection** bounding boxes only.
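
For orientation, here is a minimal Python sketch of what "pixel-space XYXY" means in practice. The class and method names are illustrative only, not panlabel's actual internals:

```python
from dataclasses import dataclass

@dataclass
class BBoxXYXY:
    """Hypothetical stand-in for the IR box: absolute pixel corner coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def is_valid(self, img_w: int, img_h: int) -> bool:
        # A well-formed IR box is non-degenerate and lies inside the image bounds.
        return (0 <= self.x_min < self.x_max <= img_w
                and 0 <= self.y_min < self.y_max <= img_h)
```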

## Format matrix

| Format | Path kind | Read | Write | Lossiness vs IR |
|---|---|---|---|---|
| `ir-json` | file (`.json`) | yes | yes | lossless |
| `coco` | file (`.json`) | yes | yes | conditional |
| `cvat` | file (`.xml`) or directory (`annotations.xml`) | yes | yes | lossy |
| `label-studio` | file (`.json`) | yes | yes | lossy |
| `tfod` | file (`.csv`) | yes | yes | lossy |
| `yolo` | directory (`images/` + `labels/`) | yes | yes | lossy |
| `voc` | directory (`Annotations/` + `JPEGImages/`) | yes | yes | lossy |
| `hf` | directory (`metadata.jsonl` / `metadata.parquet`) | yes | yes (`metadata.jsonl`) | lossy |
| `labelme` | file (`.json`) or directory (`annotations/`) | yes | yes | lossy |
| `create-ml` | file (`.json`) | yes | yes | lossy |
| `kitti` | directory (`label_2/` + `image_2/`) | yes | yes | lossy |
| `via` | file (`.json`) | yes | yes | lossy |
| `retinanet` | file (`.csv`) | yes | yes | lossy |

## IR JSON (`ir-json`)

- Canonical panlabel representation.
- Preserves dataset info, licenses, image metadata, and annotation attributes.
- Bboxes are stored in XYXY form.

## COCO JSON (`coco` / `coco-json`)

- Path kind: JSON file.
- Bbox format: `[x, y, width, height]` (absolute pixel coordinates).
- Converted to IR XYXY via bbox helpers.
- Writer behavior is deterministic (stable ordering by IDs).
- COCO `score` can map to IR `confidence` when present.
- COCO `segmentation` is accepted on read but ignored/dropped (panlabel currently models detection bboxes only). On write, panlabel emits `segmentation` as an empty array.
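
The mapping between COCO's `[x, y, width, height]` and IR XYXY is a simple additive shift. A rough sketch (helper names are illustrative, not panlabel's API):

```python
def coco_to_xyxy(bbox: list[float]) -> tuple[float, float, float, float]:
    """COCO [x, y, width, height] in absolute pixels -> IR (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

def xyxy_to_coco(box: tuple[float, float, float, float]) -> list[float]:
    """IR (x1, y1, x2, y2) -> COCO [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]
```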

## Label Studio JSON (`label-studio` / `label-studio-json` / `ls`)

- Path kind: JSON file.
- Supported top-level shape: a Label Studio task export array (an empty array is accepted as an empty dataset).
- Supported annotation type: `rectanglelabels` only.
- Coordinates are percentages; adapter maps to/from IR pixel XYXY.
- Reader supports legacy `completions` as fallback when `annotations` is absent.
- Label Studio result `score` (when present) maps to IR `confidence` (from either `annotations` or `predictions`).

Reader behavior:
- derives `Image.file_name` from `data.image` basename (normalizes `\` to `/`, strips query/fragment)
- requires derived basenames to be unique across tasks
- preserves full image reference in `Image.attributes["ls_image_ref"]`
- accepts either `annotations` or legacy `completions` per task (both present is an error)
- supports `predictions` alongside annotation sets
- each of `annotations` / `completions` / `predictions` may contain at most one result-set entry
- enforces `type == "rectanglelabels"` and exactly one label per result
- requires `original_width`/`original_height` on each result; if a task has zero results, falls back to `data.width`/`data.height`
- requires consistent `from_name`/`to_name` values within a task; when present, stores them in `Image.attributes["ls_from_name"]` and `Image.attributes["ls_to_name"]`
- stores non-zero rotation as `Annotation.attributes["ls_rotation_deg"]` and uses an axis-aligned envelope bbox in IR

Deterministic policy:
- reader image IDs: by derived basename (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then result order
- writer task order: by image file_name (lexicographic)

Writer behavior:
- writes Label Studio task export JSON
- splits results by confidence:
  - `confidence == None` -> `annotations`
  - `confidence == Some(_)` -> `predictions` + `score`
  - this means any IR annotation with confidence is written under `predictions`
- uses `ls_from_name` / `ls_to_name` image attributes if present, else defaults to `label` / `image`
- requires unique image basenames (derived from `data.image`) to avoid ambiguous `Image.file_name` mapping

Limitations:
- currently only `rectanglelabels` bbox annotations are supported
- rotation is flattened to axis-aligned geometry (angle retained as `ls_rotation_deg` only)
- Label Studio-specific metadata outside this mapping is not preserved
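
A rough sketch of the percentage-to-pixel mapping described above, ignoring rotation (which panlabel flattens to an axis-aligned envelope); the helper name is illustrative, not panlabel's API:

```python
def ls_rect_to_xyxy(result: dict) -> tuple[float, float, float, float]:
    """One `rectanglelabels` result: x/y/width/height are percentages of the
    original image size, with (x, y) at the top-left corner."""
    img_w = result["original_width"]
    img_h = result["original_height"]
    v = result["value"]
    x = v["x"] / 100.0 * img_w
    y = v["y"] / 100.0 * img_h
    w = v["width"] / 100.0 * img_w
    h = v["height"] / 100.0 * img_h
    return (x, y, x + w, y + h)
```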

## TFOD CSV (`tfod` / `tfod-csv`)

- Path kind: CSV file.
- Columns: `filename,width,height,class,xmin,ymin,xmax,ymax`.
- Coordinates are normalized (0..1).

Deterministic policy:
- reader image IDs: by filename (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by CSV row order
- writer row order: by annotation ID

Limitations:
- no dataset-level metadata/licenses
- no image-level license/date metadata
- no annotation confidence/attributes
- images without annotations are not represented in TFOD output
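
A rough sketch of reading the row shape described above, assuming the header row listed in the column list is present; scaling the normalized corners by the width/height columns yields the IR pixel box:

```python
import csv

def read_tfod_csv(path: str):
    """Yield (filename, class, pixel XYXY) per row."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            w, h = int(row["width"]), int(row["height"])
            yield (row["filename"], row["class"],
                   (float(row["xmin"]) * w, float(row["ymin"]) * h,
                    float(row["xmax"]) * w, float(row["ymax"]) * h))
```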

## YOLO directory (`yolo` / `ultralytics` / `yolov8` / `yolov5`)

- Path kind: directory.
- Accepted input path:
  - dataset root containing `images/` and `labels/`
  - or `labels/` directory directly (with sibling `../images/`)
- Supports both flat layouts (Darknet-style, no `data.yaml` required) and split-aware layouts.
- Label row format (one line per bbox):
  - `<class_id> <x_center> <y_center> <width> <height> [confidence]`
  - normalized values
  - 5 tokens: detection bbox (confidence = None)
  - 6 tokens: detection bbox + confidence score (mapped to IR `Annotation.confidence`)
  - 7+ tokens: rejected (segmentation/pose not supported)
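
A rough sketch of parsing one label row into an IR pixel box (the helper is illustrative; image width/height come from the matching image header):

```python
def parse_yolo_line(line: str, img_w: int, img_h: int):
    """One label row -> (class id, pixel XYXY, optional confidence)."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"unsupported token count: {len(parts)}")
    cls = int(parts[0])
    cx, cy, w, h = (float(t) for t in parts[1:5])
    conf = float(parts[5]) if len(parts) == 6 else None
    return cls, ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
                 (cx + w / 2) * img_w, (cy + h / 2) * img_h), conf
```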

Reader behavior:
- class map precedence: `data.yaml` → `classes.txt` → inferred from labels
- flat layouts work without `data.yaml`: class names come from `classes.txt` (if present) or are inferred as `class_0`, `class_1`, etc.
- image resolution is read from image headers in `images/`
- each label file must map to a matching image file (same relative stem) under `images/`
- expected image extensions (lookup order): `jpg`, `png`, `jpeg`, `bmp`, `webp`
- lines with 7+ tokens are rejected (segmentation/pose not supported)

### Split-aware reading

When `data.yaml` contains `train:`, `val:`, or `test:` path keys (common in Roboflow/Ultralytics Hub exports), panlabel detects a split-aware layout and reads all splits.

Supported path patterns in `data.yaml`:
- Pattern A: `images/<split>` (e.g. `train: images/train`, labels inferred at `labels/train`)
- Pattern B: `<split>/images` (e.g. `train: train/images`, labels at `train/labels`)
- Pattern C: bare `<split>` pointing to a directory containing `images/` + `labels/`

Behavior:
- **Default (no `--split`):** all found splits are merged into a single IR Dataset. Image `file_name` values are prefixed with the split name (e.g. `train/img001.jpg`, `val/img002.jpg`) to avoid collisions.
- **`--split <name>`:** only the named split is read. Image `file_name` values are still prefixed with the split name for provenance.
- Class map: resolved from `data.yaml` `names:` when present, otherwise inferred across all selected label directories.
- `data.yaml` `path:` key (if present) is used as the base for resolving split-relative paths.
- Split provenance is stored in `Dataset.info.attributes`:
  - `yolo_layout_mode`: `"split_aware"` or `"flat"`
  - `yolo_splits_found`: comma-separated list of splits found (e.g. `"train,val,test"`)
  - `yolo_splits_read`: comma-separated list of splits actually read
- An error is raised if `--split` names a split not present in `data.yaml`, or if `--split` is used on a flat (non-split-aware) layout.
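
A hypothetical helper resolving the three `data.yaml` path patterns to `(images, labels)` directories; this is a sketch of the layout rules above, not panlabel's actual resolution code:

```python
from pathlib import Path

def resolve_split_dirs(base: Path, entry: str) -> tuple[Path, Path]:
    """`base` is the dataset root (or the data.yaml `path:` value);
    `entry` is a split value such as 'images/train', 'train/images', or 'train'."""
    p = base / entry
    if p.parent.name == "images":                    # Pattern A: images/<split>
        return p, p.parent.parent / "labels" / p.name
    if p.name == "images":                           # Pattern B: <split>/images
        return p, p.parent / "labels"
    return p / "images", p / "labels"                # Pattern C: bare <split>
```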

Writer behavior:
- creates output `images/` and `labels/` directories
- writes `data.yaml` with a `names:` mapping (sorted by class index); does not emit train/val paths or `nc`
- creates empty `.txt` files for images without annotations
- does **not** copy image binaries
- writes normalized floats with 6 decimal places
- emits an optional 6th confidence token when `Annotation.confidence` is `Some`

## Pascal VOC XML (`voc` / `pascal-voc` / `voc-xml`)

- Path kind: directory.
- Accepted input path:
  - dataset root containing `Annotations/`
  - or `Annotations/` directory directly (with optional sibling `../JPEGImages/`)
- Reader uses `<size>/<width>` and `<size>/<height>` from XML (no image-header probing).
- Reader stores object fields `pose`, `truncated`, `difficult`, `occluded` in `Annotation.attributes`.
- Reader stores `<size>/<depth>` as image attribute `depth`.
- Coordinate policy: reads `xmin/ymin/xmax/ymax` exactly as provided (no 0/1-based adjustment).
- Reader scans `Annotations/` flat (non-recursive); nested XML files are skipped with a warning.
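
A rough sketch of the XML fields the reader uses, with coordinates taken exactly as written (helper name is illustrative):

```python
import xml.etree.ElementTree as ET

def read_voc_xml(path: str):
    """Return ((width, height), [(class name, pixel XYXY), ...]) for one file."""
    root = ET.parse(path).getroot()
    size = (int(root.findtext("size/width")), int(root.findtext("size/height")))
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      (float(bb.findtext("xmin")), float(bb.findtext("ymin")),
                       float(bb.findtext("xmax")), float(bb.findtext("ymax")))))
    return size, boxes
```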

Deterministic policy:
- reader image IDs: by `<filename>` (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by XML file order, then `<object>` order

Writer behavior:
- creates `Annotations/` and `JPEGImages/README.txt`
- writes one XML per image (including images without annotations)
- preserves image subdirectory structure in XML output path (`train/001.jpg` -> `Annotations/train/001.xml`)
- does **not** copy image binaries
- normalizes boolean attribute values when writing:
  - `true`/`yes`/`1` -> `1`
  - `false`/`no`/`0` -> `0`
  - any other value -> omitted

## Hugging Face ImageFolder metadata (`hf` / `hf-imagefolder` / `huggingface`)

- Path kind: directory.
- Accepted local input layout:
  - dataset root containing `metadata.jsonl` or `metadata.parquet`
  - split subdirectories (for example `train/`, `validation/`) each containing metadata
  - parquet shard layouts (for example `data/train-00000-of-00001.parquet`, `data/validation-*.parquet`, or `<config>/<split>/*.parquet`)
- Remote Hub import is supported in `convert` via `--hf-repo` (requires `hf-remote` feature).
- Remote zip-style split archives (for example `data/train.zip`) are also supported when they extract to YOLO, VOC, COCO JSON, or HF metadata layouts.

Reader behavior:
- object-container auto-detection: `objects` first, then `faces` (override with `--hf-objects-column`)
- category field aliases: `categories` or `category`
- category values may be names or integer IDs
- integer category name resolution precedence:
  - preflight ClassLabel names (remote)
  - then `--hf-category-map`
  - then integer fallback (`"0"`, `"1"`, ...)
- bbox interpretation is controlled by `--hf-bbox-format`:
  - `xywh` (default) treats bbox as `[x, y, width, height]`
  - `xyxy` treats bbox as `[x1, y1, x2, y2]`
- keeps bbox rows as parsed (validation reports degenerate/OOB issues later)
- width/height read from metadata when present, otherwise from image headers
- duplicate `file_name` rows are rejected
- when both `metadata.jsonl` and `metadata.parquet` are present, JSONL is preferred
- when no `metadata.jsonl` exists, panlabel can read supported parquet layouts (`metadata.parquet` or split parquet shards) when the `hf-parquet` feature is enabled
- for parquet rows without `file_name`, panlabel derives it from `image.path` (or fallback IDs)
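
A rough sketch of consuming `metadata.jsonl` rows as described above (column names follow the HF ImageFolder detection convention; the helper is illustrative):

```python
import json

def read_hf_metadata_jsonl(path: str, bbox_format: str = "xywh"):
    """Yield (file_name, category, pixel XYXY) from metadata.jsonl rows."""
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            objs = row.get("objects") or row.get("faces") or {}
            cats = objs.get("categories", objs.get("category", []))
            for bbox, cat in zip(objs.get("bbox", []), cats):
                if bbox_format == "xywh":            # default interpretation
                    x, y, w, h = bbox
                    bbox = [x, y, x + w, y + h]
                yield row["file_name"], cat, tuple(bbox)
```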

Writer behavior:
- writes `metadata.jsonl` (one row per image)
- writes `file_name`, `width`, `height`, and `objects.{bbox,categories}`
- deterministic output ordering:
  - metadata rows by image `file_name` (lexicographic)
  - per-image annotation lists by annotation ID
- does **not** copy image binaries
- output bbox format follows `--hf-bbox-format` (`xywh` default)

IR provenance notes:
- reader stores HF provenance in `Dataset.info.attributes` (for example `hf_bbox_format`)
- remote imports may also populate `hf_repo_id`, `hf_revision`, `hf_split`, `hf_license`, `hf_description`

## CVAT XML (`cvat` / `cvat-xml`)

- Path kind: XML file (`.xml`) or directory containing `annotations.xml`.
- Supported export: CVAT "for images" XML with `<annotations>` root.
- Supported annotation type: `<box>` only.
- Unsupported image-level annotation elements (for example `<polygon>`, `<points>`) are hard parse errors.
- Coordinates: absolute pixels (`xtl/ytl/xbr/ybr`) mapped 1:1 to IR pixel XYXY.

Reader behavior:
- accepts file input or directory input with root `annotations.xml`
- if `<meta><task><labels>` is present:
  - keeps labels with `<type>bbox</type>` (or no `<type>`)
  - verifies every `<box label="...">` exists in meta labels
- if meta labels are missing, infers categories from `<box label="...">`
- stores `<image id>` as `Image.attributes["cvat_image_id"]`
- stores box attributes as:
  - `occluded="1"` -> `Annotation.attributes["occluded"] = "1"`
  - non-zero `z_order` -> `Annotation.attributes["z_order"]`
  - non-empty `source` -> `Annotation.attributes["source"]`
  - `<attribute name="k">v</attribute>` -> `Annotation.attributes["cvat_attr_k"] = "v"`
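
A rough sketch of walking the `<image>`/`<box>` structure described above; `xtl/ytl/xbr/ybr` are already pixel XYXY, so no coordinate conversion is needed (helper name is illustrative):

```python
import xml.etree.ElementTree as ET

def read_cvat_boxes(path: str):
    """Yield (image name, label, pixel XYXY, attribute dict) per <box>."""
    root = ET.parse(path).getroot()              # <annotations> root
    for image in root.findall("image"):
        for box in image.findall("box"):
            attrs = {a.get("name"): a.text for a in box.findall("attribute")}
            yield (image.get("name"), box.get("label"),
                   (float(box.get("xtl")), float(box.get("ytl")),
                    float(box.get("xbr")), float(box.get("ybr"))),
                   attrs)
```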

Deterministic policy:
- reader image IDs: by `<image name>` (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then `<box>` order

Writer behavior:
- writes a single XML file (or `annotations.xml` inside output directory)
- emits minimal `<meta><task>` with `name='panlabel export'`, `mode='annotation'`, and `size` equal to image count
- writes labels only for categories referenced by annotations (unused categories are dropped)
- writes `<image>` entries for all images, including unannotated images
- image ordering: by `file_name` (lexicographic)
- image IDs are reassigned sequentially (0, 1, 2, ...) by sorted order; original `cvat_image_id` attributes are not preserved in output
- writes `<box>` entries sorted by annotation ID per image
- writes `cvat_attr_*` annotation attributes as `<attribute>` children of `<box>`
- normalizes `occluded` values:
  - `true`/`yes`/`1` -> `1`
  - `false`/`no`/`0` -> `0`
  - otherwise or missing -> `0`
- defaults missing or empty `source` attribute to `manual`
- defaults missing or invalid `z_order` to `0`

## LabelMe JSON (`labelme` / `labelme-json`)

- Path kind: JSON file or directory.
- One JSON file per image containing a `shapes` array with rectangle and polygon annotations.
- Supported shapes: `rectangle` (2 points: top-left, bottom-right), `polygon` (3+ points: converted to axis-aligned bbox envelope). Other shape types are rejected.
- Coordinates: absolute pixels.
- Missing `shape_type` defaults to `rectangle`.

Reader input modes:
- **Single file**: one `.json` file → one-image dataset
- **Separate directory**: `annotations/` subdirectory containing `.json` files
- **Co-located directory**: `.json` files alongside image files (identified by presence of `shapes` key)

Reader behavior:
- requires `imagePath`, `imageWidth`, and `imageHeight` in each JSON file
- derives `Image.file_name` from `imagePath` basename (single-file mode) or from the relative JSON path stem + image extension (directory mode)
- stores original `imagePath` value in `Image.attributes["labelme_image_path"]`
- polygons are flattened to axis-aligned bounding box envelopes; original shape type stored as `Annotation.attributes["labelme_shape_type"] = "polygon"`
- requires unique derived image names across all JSON files in directory mode
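
A rough sketch of the flattening described above: rectangles use their two corner points directly, and polygons collapse to their axis-aligned envelope (helper name is illustrative):

```python
def labelme_shape_to_xyxy(shape: dict) -> tuple[float, float, float, float]:
    """Works for both supported shape types: the min/max of the point
    coordinates is the envelope (identical to the box itself for a rectangle)."""
    xs = [float(x) for x, _ in shape["points"]]
    ys = [float(y) for _, y in shape["points"]]
    return (min(xs), min(ys), max(xs), max(ys))
```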

Deterministic policy:
- reader image IDs: by derived file_name (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then shape order
- writer file order: by image file_name (lexicographic)

Writer behavior:
- single-image datasets to a `.json` path: writes one LabelMe JSON file
- multi-image datasets or directory paths: writes canonical `annotations/<stem>.json` + `images/README.txt` layout
- all annotations are written as `rectangle` shapes with 2 corner points (polygons are not restored)
- does **not** copy image binaries
- uses `labelme_image_path` image attribute for `imagePath` if present, otherwise `file_name`

Limitations:
- only `rectangle` and `polygon` shape types are supported (others are rejected)
- polygon geometry is flattened to axis-aligned bbox envelope (shape type retained as attribute only)
- `imageData` (embedded base64 image data) is not preserved
- LabelMe flags and group_id are not preserved

## CreateML JSON (`create-ml` / `createml` / `create-ml-json`)

- Path kind: JSON file.
- Apple's annotation format for Core ML training.
- Flat JSON array where each element represents one image with its annotations.
- Bbox format: center-based absolute pixel coordinates `{x, y, width, height}` where `(x, y)` is the center of the box.
- Image dimensions are **not** stored in the JSON — the reader resolves them from local image files relative to the JSON file's parent directory.
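
The center-based coordinates map to IR XYXY by a half-size shift; a rough sketch (helper name is illustrative):

```python
def createml_to_xyxy(coords: dict) -> tuple[float, float, float, float]:
    """{x, y, width, height} with (x, y) at the box center -> corner XYXY."""
    cx, cy, w, h = coords["x"], coords["y"], coords["width"], coords["height"]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```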

Reader behavior:
- parses top-level JSON array of `{image, annotations}` objects
- `image` must be a non-empty relative path (absolute paths and `..` traversal are rejected)
- resolves image dimensions from disk by probing `<base_dir>/<image>` then `<base_dir>/images/<image>`
- rejects duplicate `image` entries
- rejects empty annotation labels

Deterministic policy:
- reader image IDs: by image filename (lexicographic)
- reader category IDs: by label name (lexicographic)
- reader annotation IDs: by image order then annotation order

Writer behavior:
- writes a single JSON array with one object per image
- uses center-based absolute pixel coordinates: `{x, y, width, height}`
- deterministic output: image rows sorted by filename, annotations sorted by annotation ID
- images without annotations are included (empty `annotations` array)
- does **not** write image dimensions (this is by design — CreateML resolves them at training time)

Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (dimensions, license, date)
- no annotation confidence/attributes
- requires image files on disk for reading (to resolve dimensions)

## KITTI (`kitti` / `kitti-txt`)

- Path kind: directory.
- Accepted input path:
  - dataset root containing `label_2/` and `image_2/`
  - or `label_2/` directory directly (with sibling `../image_2/`)
- Standard format in autonomous driving research.
- Per-image `.txt` files with 15 space-separated fields per line (optional 16th field: score).
- Fields: `type truncated occluded alpha xmin ymin xmax ymax dim_height dim_width dim_length loc_x loc_y loc_z rotation_y [score]`
- Bbox: fields 4–7 (zero-based indices, i.e. `xmin ymin xmax ymax`) are absolute pixel coordinates.
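
A rough sketch of splitting one label line (field indices as above, zero-based); panlabel additionally keeps the remaining fields as `kitti_*` attributes, which the sketch omits:

```python
def parse_kitti_line(line: str):
    """Return (type, pixel XYXY bbox, optional score) from a 15/16-field row."""
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"unexpected field count: {len(fields)}")
    bbox = tuple(float(v) for v in fields[4:8])    # xmin, ymin, xmax, ymax
    score = float(fields[15]) if len(fields) == 16 else None
    return fields[0], bbox, score
```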

Reader behavior:
- scans `label_2/` flat (non-recursive, top-level `.txt` files only)
- resolves images from `image_2/` with extension precedence: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.webp`
- maps `type` → category name, fields 4–7 → `BBoxXYXY<Pixel>`, optional field 15 → `Annotation.confidence`
- stores remaining numeric fields as annotation attributes with `kitti_*` prefix: `kitti_truncated`, `kitti_occluded`, `kitti_alpha`, `kitti_dim_height`, `kitti_dim_width`, `kitti_dim_length`, `kitti_loc_x`, `kitti_loc_y`, `kitti_loc_z`, `kitti_rotation_y`

Deterministic policy:
- reader image IDs: by resolved image filename (lexicographic)
- reader category IDs: by class/type name (lexicographic)
- reader annotation IDs: by label file order then line number

Writer behavior:
- creates `label_2/` + `image_2/README.txt`
- one `.txt` per image, empty files for unannotated images
- sorts images by `file_name`, annotations within each image by ID
- sources KITTI-specific fields from `kitti_*` annotation attributes; uses defaults for missing values: truncated=0, occluded=0, alpha=−10, dims=−1, loc=−1000, rotation_y=−10
- rejects `Image.file_name` with path separators (KITTI layout is flat)
- does **not** copy image binaries

Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (license, date)
- no annotation attributes outside the `kitti_*` set
- confidence, by contrast, is preserved via the optional `score` field

## VGG Image Annotator JSON (`via` / `via-json` / `vgg-via`)

- Path kind: JSON file.
- Popular academic annotation tool.
- Single JSON file whose root is an object keyed by arbitrary strings (typically the filename concatenated with the file size).
- Each entry: `{ filename, size, regions, file_attributes }`.
- Supported region type: `rect` only (`shape_attributes.name == "rect"` with `x`, `y`, `width`, `height`).
- Image dimensions are **not** stored in the JSON — resolved from local image files.

Reader behavior:
- supports `regions` as either an array or an object map (both forms exist in real VIA exports)
- label resolution precedence from `region_attributes`: `label`, then `class`, then sole scalar attribute
- non-rect shapes are skipped with a warning
- image dimension resolution: `<json_dir>/<filename>` then `<json_dir>/images/<filename>`
- rejects duplicate filenames across entries
- stores `via_size_bytes` as image attribute; scalar `file_attributes` as `via_file_attr_<key>` image attributes
- stores scalar `region_attributes` (excluding the label key) as `via_region_attr_<key>` annotation attributes
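
A rough sketch of extracting rect regions with the label precedence described above (the helper is illustrative and skips the attribute bookkeeping):

```python
def via_rect_regions(entry: dict):
    """Yield (label, pixel XYXY) for each rect region of one VIA entry."""
    regions = entry.get("regions", [])
    if isinstance(regions, dict):                   # object-map form
        regions = [regions[k] for k in sorted(regions)]
    for region in regions:
        shape = region["shape_attributes"]
        if shape.get("name") != "rect":
            continue                                # non-rect shapes are skipped
        attrs = region.get("region_attributes", {})
        label = attrs.get("label") or attrs.get("class")
        if label is None and len(attrs) == 1:
            label = next(iter(attrs.values()))      # sole scalar attribute
        x, y, w, h = shape["x"], shape["y"], shape["width"], shape["height"]
        yield label, (x, y, x + w, y + h)
```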

Deterministic policy:
- reader image IDs: by filename (lexicographic)
- reader category IDs: by resolved label (lexicographic)
- reader annotation IDs: by image order then region order (for object-form regions, keys sorted lexicographically)

Writer behavior:
- writes JSON object keyed by `<filename><size>`
- `regions` always emitted as array, sorted by annotation ID
- uses canonical `label` key in `region_attributes` for category name
- reconstructs `file_attributes` from `via_file_attr_*` image attributes
- unannotated images preserved with `regions: []`
- does **not** copy image binaries

Limitations:
- only rectangle regions are supported
- no dataset-level metadata/licenses
- no annotation confidence
- requires image files on disk for reading (to resolve dimensions)

## RetinaNet Keras CSV (`retinanet` / `retinanet-csv` / `keras-retinanet`)

- Path kind: CSV file.
- Simple format used with keras-retinanet: `path,x1,y1,x2,y2,class_name`.
- Coordinates are absolute pixels (unlike TFOD which uses normalized coordinates).
- No header required (optional header row is tolerated).
- Unannotated images: `path,,,,,` (all-empty row).
- Image dimensions are **not** in the CSV — resolved from local image files.
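
A rough sketch of the row handling described above (header-row tolerance and image-dimension probing omitted; the helper name is illustrative):

```python
import csv

def read_retinanet_csv(path: str):
    """Yield (image path, None) for empty rows or (image path, (class, XYXY))."""
    with open(path, newline="") as f:
        for img, x1, y1, x2, y2, cls in csv.reader(f):
            if not any((x1, y1, x2, y2, cls)):
                yield img, None                     # unannotated image row
            else:
                yield img, (cls, (float(x1), float(y1),
                                  float(x2), float(y2)))
```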

Reader behavior:
- tolerates optional header row exactly matching `path,x1,y1,x2,y2,class_name`
- supports empty rows (`path,,,,,`) for unannotated images
- rejects partial rows (some bbox fields present, others empty)
- resolves image paths relative to CSV parent directory; absolute paths used as-is
- caches dimension lookups per image path

Deterministic policy:
- reader image IDs: by path (lexicographic)
- reader category IDs: by class name (lexicographic)
- reader annotation IDs: by CSV row order

Writer behavior:
- headerless CSV (matches keras-retinanet conventions)
- rows grouped by image, images sorted by `file_name`, annotations by ID
- unannotated images emit exactly one `path,,,,,` row
- does **not** copy image binaries

Limitations:
- no dataset-level metadata/licenses
- no image-level metadata (dimensions, license, date)
- no annotation confidence/attributes
- requires image files on disk for reading (to resolve dimensions)

## Future expansion rule

When formats become numerous, split this page into per-format files under `docs/formats/<format>.md` and keep this page as an index.