hotcoco
Fast enough for every epoch, lean enough for every dataset. A drop-in replacement for pycocotools that doesn't become the bottleneck — in your training loop or at foundation model scale. Up to 23× faster on standard COCO, 39× faster on Objects365, and fits comfortably in memory where alternatives run out.
Available as a Python package, CLI tool, and Rust library. Pure Rust — no Cython, no C compiler, no Microsoft Build Tools. Prebuilt wheels for Linux, macOS, and Windows.
Documentation | Changelog | Roadmap
Performance
Benchmarked on COCO val2017 (5,000 images, 36,781 synthetic detections) on an Apple M1 MacBook Air:
| Eval Type | pycocotools | faster-coco-eval | hotcoco |
|---|---|---|---|
| bbox | 9.46s | 2.45s (3.9x) | 0.41s (23.0x) |
| segm | 9.16s | 4.36s (2.1x) | 0.49s (18.6x) |
| keypoints | 2.62s | 1.78s (1.5x) | 0.21s (12.7x) |
Speedups in parentheses are vs pycocotools. Results verified against pycocotools on COCO val2017 with a 10,000+ case parity test suite — your AP scores won't change.
At scale (Objects365 val — 80k images, 365 categories, 1.2M detections), hotcoco completes in 18s vs 721s for pycocotools (39x) and 251s for faster-coco-eval (14x) — while using half the memory. See the full benchmarks.
Quick Start
Python
```python
from hotcoco import COCO, COCOeval

# API mirrors pycocotools; file names are illustrative
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.load_res("detections.json")
coco_eval = COCOeval(coco_gt, coco_dt, iou_type="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```
Drop-in replacement for pycocotools
If you use Detectron2, Ultralytics YOLO, mmdetection, or any other pycocotools-based pipeline, call init_as_pycocotools() once at startup — no other code changes needed:
```python
import hotcoco
hotcoco.init_as_pycocotools()

# Existing code works unchanged
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
```
LVIS evaluation
hotcoco supports LVIS federated evaluation with all 13 metrics (AP, APr, APc, APf, AR@300, and more). Use LVISeval directly or call init_as_lvis() to drop into any existing lvis-api pipeline:
```python
from hotcoco import LVIS, LVISeval

# File names are illustrative; exact signatures may differ
lvis_gt = LVIS("lvis_v1_val.json")
lvis_dt = lvis_gt.load_res("detections.json")
lvis_eval = LVISeval(lvis_gt, lvis_dt, iou_type="bbox")
lvis_eval.evaluate()
lvis_eval.accumulate()
lvis_eval.summarize()
lvis_eval.get_results()
# {"AP": ..., "APr": ..., "APc": ..., "APf": ..., "AR@300": ...}
```

```python
# Or as a drop-in for Detectron2 / MMDetection lvis-api pipelines
import hotcoco
hotcoco.init_as_lvis()

from lvis import LVIS, LVISEval  # resolves to hotcoco
```
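For context, LVIS's federated metrics group categories by how many training images each appears in: rare (at most 10), common (11 to 100), and frequent (more than 100); APr, APc, and APf average AP within those groups. A standalone sketch of the bucketing, independent of hotcoco, with thresholds as in the LVIS v1 paper:

```python
def lvis_frequency_bucket(image_count: int) -> str:
    """Classify a category into its LVIS frequency group by training-image count."""
    if image_count <= 10:
        return "rare"       # contributes to APr
    if image_count <= 100:
        return "common"     # contributes to APc
    return "frequent"       # contributes to APf

# Hypothetical category counts, purely for illustration
counts = {"unicycle": 4, "snowshoe": 57, "dog": 3200}
buckets = {name: lvis_frequency_bucket(n) for name, n in counts.items()}
# {'unicycle': 'rare', 'snowshoe': 'common', 'dog': 'frequent'}
```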
Format conversion
Convert between COCO JSON and YOLO label format in either direction:
```python
from hotcoco import coco_to_yolo, yolo_to_coco  # function names assumed for illustration

# COCO → YOLO
stats = coco_to_yolo("instances_val2017.json", "labels/")
# {'images': 5000, 'annotations': 36781, 'skipped_crowd': 12, 'missing_bbox': 0}

# YOLO → COCO (with Pillow to read image dims)
coco = yolo_to_coco("labels/", images_dir="images/")
```
Or from the CLI (both conversions are exposed as `hotcoco` subcommands; see `hotcoco --help` for the exact flags).
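The per-box arithmetic behind the conversion is simple: COCO stores `[x_min, y_min, width, height]` in absolute pixels, while YOLO labels use `[x_center, y_center, width, height]` normalised to the image size. A minimal sketch of that step (not hotcoco's internals):

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """[x_min, y_min, w, h] in pixels -> [cx, cy, w, h] normalised to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

coco_bbox_to_yolo([100, 50, 200, 100], img_w=640, img_h=480)
# [0.3125, 0.2083..., 0.3125, 0.2083...]
```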
F-scores
f_scores() computes F-beta scores from the precision/recall curves. For each IoU threshold and category it finds the operating point that maximises F-beta, then averages — analogous to mAP:
```python
coco_eval.f_scores()                  # after evaluate/accumulate; argument names assumed
# {"F1": 0.523, "F150": 0.712, "F175": 0.581}
coco_eval.f_scores(beta=0.5)          # precision-weighted F-score
coco_eval.f_scores(beta=2.0)          # recall-weighted F-score
```
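For reference, the score at a single operating point is F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta < 1 weights precision, beta > 1 weights recall. A standalone sketch of picking the best operating point along one precision/recall curve, the same idea described above (not hotcoco's implementation):

```python
def best_f_beta(precisions, recalls, beta=1.0):
    """Max F-beta over the operating points of a PR curve."""
    b2 = beta * beta
    best = 0.0
    for p, r in zip(precisions, recalls):
        if p + r > 0:
            best = max(best, (1 + b2) * p * r / (b2 * p + r))
    return best

best_f_beta([0.9, 0.8, 0.6], [0.2, 0.5, 0.8])
# ≈ 0.686 (the operating point P=0.6, R=0.8 maximises F1)
```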
Logging metrics
get_results() accepts an optional prefix and per-class flag, returning a flat dict that plugs directly into any experiment tracker:
```python
coco_eval.get_results(prefix="val/bbox/", per_class=True)  # parameter names assumed
# {"val/bbox/AP": 0.578, ..., "val/bbox/AP/person": 0.82, "val/bbox/AP/car": 0.71, ...}
```
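If your tracker expects a different nesting, the flat key scheme is easy to reproduce yourself; a sketch of the same prefix-plus-per-class flattening (key layout assumed from the example output above):

```python
def flatten_metrics(metrics, per_class, prefix=""):
    """Flatten overall metrics and a per-class dict into tracker-friendly keys."""
    flat = {f"{prefix}{name}": value for name, value in metrics.items()}
    for cls, values in per_class.items():
        for name, value in values.items():
            flat[f"{prefix}{name}/{cls}"] = value
    return flat

flatten_metrics({"AP": 0.578}, {"person": {"AP": 0.82}}, prefix="val/bbox/")
# {'val/bbox/AP': 0.578, 'val/bbox/AP/person': 0.82}
```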
Saving results
results() returns a serializable dict; save_results() writes it as JSON:
```python
coco_eval.results()
# {"params": {"iou_type": "bbox", ...}, "metrics": {"AP": 0.378, ...}, "per_class": {...}}
coco_eval.save_results("eval.json")  # path illustrative
```
TIDE error analysis
tide_errors() decomposes every false positive and false negative into six error types — Localization, Classification, Duplicate, Background, Both, and Miss — and reports the ΔAP for each. Use it to understand why your model falls short, not just how much:
```python
errors = coco_eval.tide_errors()    # usage sketch; exact signature may differ
delta_ap = errors["Localization"]   # ΔAP attributable to localization errors
```
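TIDE's decision rule for a single detection can be sketched from the paper's thresholds (foreground IoU 0.5, background IoU 0.1). This simplified version, independent of hotcoco, shows how a detection's overlaps with ground truth map to an error type:

```python
def classify_detection(iou_same_cls, iou_other_cls, gt_already_matched,
                       t_fg=0.5, t_bg=0.1):
    """Map a detection to a TIDE error type (simplified sketch).

    iou_same_cls / iou_other_cls: max IoU with ground truth of the
    same / a different class. Thresholds follow the TIDE paper.
    """
    if iou_same_cls >= t_fg:
        # Good overlap, right class: either a true positive or a duplicate
        return "Duplicate" if gt_already_matched else "TruePositive"
    if iou_other_cls >= t_fg:
        return "Classification"   # well localised, wrong class
    if iou_same_cls >= t_bg:
        return "Localization"     # right class, poorly localised
    if iou_other_cls >= t_bg:
        return "Both"             # wrong class and poorly localised
    return "Background"           # no meaningful overlap with any ground truth
```

Unmatched ground-truth boxes that no detection covers account for the remaining "Miss" category.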
Or from the CLI (TIDE analysis is exposed as a `hotcoco` subcommand; see `hotcoco --help` for the exact flags).
CLI
Evaluation, format conversion, and error analysis are also available from the command line; run `hotcoco --help` for the full list of subcommands and flags.
Rust
```rust
use hotcoco::{COCO, COCOeval, IouType};
use std::path::Path;

// File names are illustrative; type names follow the Python API.
let coco_gt = COCO::new(Path::new("instances_val2017.json"))?;
let coco_dt = coco_gt.load_res(Path::new("detections.json"))?;
let mut eval = COCOeval::new(&coco_gt, &coco_dt, IouType::Bbox);
eval.evaluate();
eval.accumulate();
eval.summarize();
```
License
MIT