🦀 Ultralytics YOLO Rust Inference
High-performance YOLO inference library written in Rust. This library provides a fast, safe, and efficient interface for running YOLO models using ONNX Runtime, with an API designed to match the Ultralytics Python package.
✨ Features
- 🚀 High Performance - Pure Rust implementation with zero-cost abstractions
- 🎯 Ultralytics API Compatible - Results, Boxes, Masks, Keypoints, Probs classes matching Python
- 🔧 Multiple Backends - CPU, CUDA, TensorRT, CoreML, OpenVINO, and more via ONNX Runtime
- 📦 Dual Use - Library for Rust projects + standalone CLI application
- 🏷️ Auto Metadata - Automatically reads class names, task type, and input size from ONNX models
- ⬇️ Auto Download - Automatically downloads YOLO11 and YOLO26 ONNX models (all sizes: n/s/m/l/x) when not found locally
- 🖼️ Multiple Sources - Images, directories, glob patterns, video files, webcams, and streams
- 🪶 Lean Runtime - No PyTorch, TensorFlow, or Python runtime required
🚀 Quick Start
Prerequisites
- Rust 1.88+ (install via rustup)
- A YOLO ONNX model (export from Ultralytics: yolo export model=yolo26n.pt format=onnx)
Installation
# Install CLI globally from crates.io
# Install CLI globally with custom features
# Minimal build (no default features)
# Enable video support
# Enable multiple accelerators
Development install
# Install CLI directly from the git repository
# Or clone, build, and install from source
# Install from local checkout
cargo install places binaries in Cargo's default bin directory:
- macOS/Linux: ~/.cargo/bin
- Windows: %USERPROFILE%\.cargo\bin
Ensure this directory is in your PATH, then run from anywhere:
Export a YOLO Model to ONNX
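# Using Ultralytics CLI
yolo export model=yolo26n.pt format=onnx
# Or with Python
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
model.export(format="onnx")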
Run Inference
# With defaults (auto-downloads yolo26n.onnx and sample images)
# Select task: auto-downloads the nano model for that task
# With explicit model (task is read from model metadata)
# Auto-download any supported size (n/s/m/l/x)
# On a directory of images
# With custom thresholds
# Filter by class IDs
# With visualization and custom image size
# Save individual frames for video input
# Rectangular inference
Example Output
# ultralytics-inference predict
WARNING ⚠️ 'model' argument is missing. Using default '--model=yolo26n.onnx'.
WARNING ⚠️ 'source' argument is missing. Using default images: https://ultralytics.com/images/bus.jpg, https://ultralytics.com/images/zidane.jpg
Ultralytics 0.0.12 🚀 Rust ONNX FP32 CPU
Using ONNX Runtime CPUExecutionProvider
YOLO26n summary: 80 classes, imgsz=(640, 640)
image 1/2 /home/ultralytics/inference/bus.jpg: 640x480 4 persons, 1 bus, 36.4ms
image 2/2 /home/ultralytics/inference/zidane.jpg: 384x640 2 persons, 1 tie, 28.6ms
Speed: 1.5ms preprocess, 32.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
Results saved to runs/detect/predict1
💡 Learn more at https://docs.ultralytics.com/modes/predict
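The shapes reported in the log (640x480 and 384x640, height x width) are consistent with rectangular inference: the longer side is scaled to the inference size and each side is padded up to a multiple of the model stride. The sketch below reproduces those numbers under stated assumptions: rect_shape is a hypothetical helper, not part of this crate's API, the stride of 32 is assumed, and the original image sizes (810x1080 for bus.jpg, 1280x720 for zidane.jpg, width x height) are assumptions about the sample assets.

```rust
/// Hypothetical helper (not the crate's API): compute the padded
/// (height, width) for rectangular inference.
/// Assumes `stride` is non-zero.
fn rect_shape(orig_h: u32, orig_w: u32, imgsz: u32, stride: u32) -> (u32, u32) {
    // Scale so that the longer side of the image equals `imgsz`.
    let scale = f64::from(imgsz) / f64::from(orig_h.max(orig_w));
    let new_h = (f64::from(orig_h) * scale).round() as u32;
    let new_w = (f64::from(orig_w) * scale).round() as u32;
    // Pad each side up to the next multiple of `stride` (minimal padding).
    let pad = |v: u32| ((v + stride - 1) / stride) * stride;
    (pad(new_h), pad(new_w))
}
```

For example, rect_shape(1080, 810, 640, 32) yields (640, 480), and rect_shape(720, 1280, 640, 32) yields (384, 640).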
With --task (auto-downloads the matching nano model):
# ultralytics-inference predict --task segment
📚 Usage
As a CLI Tool
# Show help
# Show version
# Run inference
--help and --version are also supported as standard flag aliases.
CLI Options:
| Option | Short | Description | Default |
|---|---|---|---|
| --model | -m | Path to ONNX model file; auto-downloaded if a known YOLO11/YOLO26 name | yolo26n.onnx |
| --task | | Task type (detect, segment, pose, obb, classify); selects nano model when --model is omitted | detect |
| --source | -s | Input source (image, directory, glob, video, webcam index, or URL) | Task-dependent Ultralytics URL assets |
| --conf | | Confidence threshold | 0.25 |
| --iou | | IoU threshold for NMS | 0.7 |
| --max-det | | Maximum number of detections | 300 |
| --imgsz | | Inference image size | Model metadata |
| --rect | | Enable rectangular inference (minimal padding) | true |
| --batch | | Batch size for inference | 1 |
| --half | | Use FP16 half-precision inference | false |
| --save | | Save annotated results to runs/<task>/predict | true |
| --save-frames | | Save individual frames for video | false |
| --show | | Display results in a window | false |
| --device | | Device (cpu, cuda:0, mps, coreml, directml:0, openvino, tensorrt:0, xnnpack) | cpu |
| --verbose | | Show verbose output | true |
| --classes | | Filter by class IDs, e.g. 0 or "0,1,2" or "[0, 1, 2]" | all classes |
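The --classes option accepts three spellings of the same list. One way to normalize them is sketched below. This is only an illustration: parse_class_ids is a hypothetical helper, not the parser the CLI actually uses, and treating an empty string as an empty list is an assumption.

```rust
/// Hypothetical helper (not the crate's API): parse a class filter such as
/// `0`, `0,1,2`, or `[0, 1, 2]` into a list of class IDs.
fn parse_class_ids(input: &str) -> Result<Vec<usize>, String> {
    // Drop surrounding whitespace and an optional pair of square brackets.
    let trimmed = input.trim();
    let inner = trimmed
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .unwrap_or(trimmed);
    // Split on commas, ignore empty pieces, and parse each ID.
    inner
        .split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| {
            part.parse::<usize>()
                .map_err(|e| format!("invalid class ID '{part}': {e}"))
        })
        .collect()
}
```

All three documented forms produce the same kind of result: "0" gives [0], while "0,1,2" and "[0, 1, 2]" both give [0, 1, 2]. A non-numeric entry returns an error instead of being silently dropped.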
Task and Model Resolution:
| Invocation | Model used | Notes |
|---|---|---|
| predict | yolo26n.onnx | Default detect model, auto-downloaded |
| predict --task segment | yolo26n-seg.onnx | Nano seg model, auto-downloaded |
| predict --task pose | yolo26n-pose.onnx | Nano pose model, auto-downloaded |
| predict --task obb | yolo26n-obb.onnx | Nano OBB model, auto-downloaded |
| predict --task classify | yolo26n-cls.onnx | Nano classify model, auto-downloaded |
| predict --model yolo26l-seg.onnx | yolo26l-seg.onnx | Task read from model metadata |
| predict --task segment --model yolo26l-seg.onnx | yolo26l-seg.onnx | --task matches metadata, proceeds normally |
| predict --task segment --model yolo26n.onnx | error | --task conflicts with model metadata (detect), exits with error |
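The resolution rules in the table can be summarized in a few lines of Rust. The sketch below is a simplified model of that behavior, not the CLI's actual implementation: the Task enum, its suffix method, and resolve are hypothetical names, and the task stored in the model's metadata is passed in directly rather than read from an ONNX file.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Task {
    Detect,
    Segment,
    Pose,
    Obb,
    Classify,
}

impl Task {
    /// File-name suffix of the nano model for each task, as listed in the table.
    fn suffix(self) -> &'static str {
        match self {
            Task::Detect => "",
            Task::Segment => "-seg",
            Task::Pose => "-pose",
            Task::Obb => "-obb",
            Task::Classify => "-cls",
        }
    }
}

/// Hypothetical resolver. `model` carries the model name plus the task found
/// in its metadata; `task` is the optional --task value.
fn resolve(task: Option<Task>, model: Option<(&str, Task)>) -> Result<(String, Task), String> {
    match (task, model) {
        // No --model: use the nano model for the requested (or default) task.
        (t, None) => {
            let t = t.unwrap_or(Task::Detect);
            Ok((format!("yolo26n{}.onnx", t.suffix()), t))
        }
        // --task disagrees with the model metadata: report an error.
        (Some(t), Some((name, meta))) if t != meta => Err(format!(
            "--task {t:?} conflicts with model metadata ({meta:?}) for {name}"
        )),
        // Otherwise the model metadata decides the task.
        (_, Some((name, meta))) => Ok((name.to_string(), meta)),
    }
}
```

Each row of the table maps to one call: resolve(Some(Task::Segment), None) selects yolo26n-seg.onnx, while resolve(Some(Task::Segment), Some(("yolo26n.onnx", Task::Detect))) returns an error.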
Auto-downloadable models:
All YOLO11 and YOLO26 ONNX models in sizes n / s / m / l / x across all five task variants are supported for auto-download:
| Family | Variants |
|---|---|
| YOLO26 | yolo26{n,s,m,l,x}.onnx, yolo26{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
| YOLO11 | yolo11{n,s,m,l,x}.onnx, yolo11{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
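The naming pattern in the table expands to 50 file names (2 families x 5 sizes x 5 task variants). The following sketch enumerates them and checks whether a given name qualifies. The function names are hypothetical, and the crate may decide this differently internally.

```rust
/// Hypothetical helper: expand the naming pattern from the table into
/// concrete file names such as `yolo26n.onnx` or `yolo11x-cls.onnx`.
fn downloadable_models() -> Vec<String> {
    let families = ["yolo26", "yolo11"];
    let sizes = ["n", "s", "m", "l", "x"];
    // Empty suffix = detection; the others are the task variants.
    let suffixes = ["", "-seg", "-pose", "-obb", "-cls"];
    let mut names = Vec::new();
    for family in families {
        for suffix in suffixes {
            for size in sizes {
                names.push(format!("{family}{size}{suffix}.onnx"));
            }
        }
    }
    names
}

/// Hypothetical check: is `name` one of the auto-downloadable models?
fn is_downloadable(name: &str) -> bool {
    downloadable_models().iter().any(|m| m == name)
}
```

With this expansion, is_downloadable("yolo26n-seg.onnx") is true, while a custom file name such as my-model.onnx is not matched and would have to exist locally.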
Source Options:
| Source Type | Example Input | Description |
|---|---|---|
| Image | image.jpg | Single image file |
| Directory | images/ | Directory of images |
| Glob | images/*.jpg | Glob pattern for images |
| Video | video.mp4 | Video file |
| Webcam | 0, 1 | Webcam index (0 = default webcam) |
| URL | https://example.com/image.jpg | Remote image URL |
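To make the distinction between source types concrete, here is a string-only sketch that classifies the example inputs from the table. It is an illustration under assumptions: SourceKind and classify_source are hypothetical names, the list of video extensions is assumed, and a real implementation would likely also inspect the filesystem (for example, to detect a directory given without a trailing slash).

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum SourceKind {
    Image,
    Directory,
    Glob,
    Video,
    Webcam(u32),
    Url,
}

/// Hypothetical classifier based only on the text of the source argument.
fn classify_source(src: &str) -> SourceKind {
    // A bare non-negative integer is treated as a webcam index.
    if let Ok(index) = src.parse::<u32>() {
        return SourceKind::Webcam(index);
    }
    // Check URLs before globs, since query strings may contain '?'.
    if src.starts_with("http://") || src.starts_with("https://") {
        return SourceKind::Url;
    }
    if src.contains('*') || src.contains('?') {
        return SourceKind::Glob;
    }
    if src.ends_with('/') {
        return SourceKind::Directory;
    }
    // Assumed video extensions; everything else is treated as an image.
    let ext = Path::new(src)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("mp4") | Some("avi") | Some("mov") | Some("mkv") => SourceKind::Video,
        _ => SourceKind::Image,
    }
}
```

The order of the checks matters: the URL test comes before the glob test so that a remote image with a query string is not mistaken for a pattern.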
As a Rust Library
Add to your Cargo.toml (choose one):
# Stable release from crates.io
[dependencies]
ultralytics-inference = "0.0.12"
# Development version (latest unreleased code from GitHub)
[dependencies]
ultralytics-inference = { git = "https://github.com/ultralytics/inference.git" }
Basic Usage:
use ;
With Custom Configuration:
use ;
Accessing Detection Data:
if let Some = result.boxes
Selecting a Device:
use ;
🏗️ Project Structure
inference/
├── src/
│   ├── lib.rs              # Library entry point and public exports
│   ├── main.rs             # CLI application
│   ├── model.rs            # YOLOModel - ONNX session and inference
│   ├── results.rs          # Results, Boxes, Masks, Keypoints, Probs, Obb
│   ├── preprocessing.rs    # Image preprocessing (letterbox, normalize, SIMD)
│   ├── postprocessing.rs   # Detection post-processing (NMS, decode, SIMD)
│   ├── metadata.rs         # ONNX model metadata parsing
│   ├── source.rs           # Input source handling (images, video, webcam)
│   ├── task.rs             # Task enum (Detect, Segment, Pose, Classify, Obb)
│   ├── inference.rs        # InferenceConfig
│   ├── batch.rs            # Batch processing pipeline
│   ├── device.rs           # Device enum (CPU, CUDA, MPS, CoreML, etc.)
│   ├── download.rs         # Model and asset downloading
│   ├── annotate.rs         # Image annotation (bounding boxes, masks, keypoints)
│   ├── io.rs               # Result saving (images, videos)
│   ├── logging.rs          # Logging macros
│   ├── error.rs            # Error types
│   ├── utils.rs            # Utility functions (NMS, IoU)
│   ├── cli/                # CLI module
│   │   ├── mod.rs          # CLI module exports
│   │   ├── args.rs         # CLI argument parsing
│   │   └── predict.rs      # Predict command implementation
│   └── visualizer/         # Real-time visualization (minifb)
├── tests/
│   └── integration_test.rs # Integration tests
├── assets/                 # Test images
│   ├── bus.jpg
│   └── zidane.jpg
├── Cargo.toml              # Rust dependencies and features
├── LICENSE                 # AGPL-3.0 License
└── README.md               # This file
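The tree lists NMS and IoU utilities alongside the post-processing module. For readers unfamiliar with these steps, the sketch below shows a textbook greedy, class-agnostic NMS driven by the three CLI thresholds (--conf, --iou, --max-det). It is not the code in utils.rs or postprocessing.rs; the Detection struct and both functions are hypothetical, and the crate's own implementation may be per-class and SIMD-optimized.

```rust
/// Hypothetical detection: corner-format box plus confidence score.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Detection {
    x1: f32,
    y1: f32,
    x2: f32,
    y2: f32,
    score: f32,
}

/// Intersection over union of two boxes.
fn iou(a: &Detection, b: &Detection) -> f32 {
    let iw = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let ih = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = iw * ih;
    let union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy NMS: drop low-confidence boxes, then keep the highest-scoring box
/// and suppress any later box that overlaps a kept one by more than `iou_thres`.
fn nms(mut dets: Vec<Detection>, conf: f32, iou_thres: f32, max_det: usize) -> Vec<Detection> {
    dets.retain(|d| d.score >= conf);
    // Sort by descending score.
    dets.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        if kept.len() >= max_det {
            break;
        }
        if kept.iter().all(|k| iou(k, &d) <= iou_thres) {
            kept.push(d);
        }
    }
    kept
}
```

With the CLI defaults (conf 0.25, iou 0.7, max-det 300), two boxes overlapping at IoU 0.9 collapse to the higher-scoring one, and any box scoring below 0.25 is discarded before overlap is even considered.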
⚡ Hardware Acceleration
Enable hardware acceleration by adding features to your build:
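# NVIDIA GPU (CUDA)
cargo build --release --features cuda
# NVIDIA TensorRT
cargo build --release --features tensorrt
# Apple CoreML (macOS/iOS)
cargo build --release --features coreml
# Intel OpenVINO
cargo build --release --features openvino
# Multiple features
cargo build --release --features "cuda,tensorrt"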
Available Features:
| Feature | Description |
|---|---|
| cuda | NVIDIA CUDA support |
| tensorrt | NVIDIA TensorRT optimization |
| coreml | Apple CoreML (macOS/iOS) |
| openvino | Intel OpenVINO |
| onednn | Intel oneDNN |
| rocm | AMD ROCm |
| directml | DirectML (Windows) |
| nnapi | Android Neural Networks API |
| xnnpack | XNNPACK (cross-platform) |
| nvidia | Convenience: CUDA + TensorRT |
| intel | Convenience: OpenVINO + oneDNN |
| mobile | Convenience: NNAPI + CoreML + QNN |
📦 Dependencies
A key benefit of this library is its lean Rust and ONNX Runtime stack: no PyTorch, TensorFlow, or Python runtime is required.
Core Dependencies (always included)
| Crate | Purpose |
|---|---|
| ort | ONNX Runtime bindings |
| ndarray | N-dimensional arrays |
| image | Image loading/decoding |
| jpeg-decoder | JPEG decoding |
| fast_image_resize | SIMD-optimized resizing |
| half | FP16 support |
| lru | LRU cache for preprocessing LUT |
| wide | SIMD for fast preprocessing |
Optional Dependencies (for --save feature)
| Crate | Purpose |
|---|---|
| imageproc | Drawing boxes and shapes |
| ab_glyph | Text rendering (embedded font) |
Optional Dependencies (for Video & Visualization)
| Crate | Purpose |
|---|---|
| minifb | Window creation and buffer display |
| video-rs | Video decoding/encoding (ffmpeg) |
Video Support (FFmpeg)
Video features require FFmpeg (7 or 8) installed on your system:
# macOS
# Ubuntu/Debian
# Build with video support
To build without annotation support (smaller binary):
🧪 Testing
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test <test_name>
📊 Performance
Benchmarks on Apple M4 MacBook Pro (CPU, ONNX Runtime):
YOLO26n Detection Model (640x640)
| Precision | Model Size | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|
| FP32 | 10.2 MB | ~9ms | ~21ms | <1ms | ~31ms |
| FP16 | 5.2 MB | ~9ms | ~24ms | <1ms | ~34ms |
Key findings:
- FP16 models are ~50% smaller (5.2 MB vs 10.2 MB)
- FP32 is slightly faster on CPU (~21ms vs ~24ms) because CPUs support FP32 arithmetic natively
- On most CPUs, FP16 values must be upcast to FP32 before computation, which adds overhead
- Use FP32 for CPU inference, FP16 for GPU (where it provides speedup)
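To illustrate the upcasting overhead mentioned in the findings, the sketch below converts a single IEEE 754 half-precision value to f32 in software. It is illustrative only: f16_bits_to_f32 is a hypothetical function written with plain bit manipulation, and it does not reflect how ONNX Runtime or the half crate listed under Dependencies performs the conversion.

```rust
/// Hypothetical software upcast: IEEE 754 binary16 bits -> f32.
fn f16_bits_to_f32(bits: u16) -> f32 {
    // 2^-24, the value of the smallest positive half-precision subnormal.
    const TWO_POW_NEG_24: f32 = 1.0 / 16_777_216.0;

    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);

    match exp {
        // Zero and subnormals: value = frac * 2^-24, exact in f32.
        0 => {
            let magnitude = frac as f32 * TWO_POW_NEG_24;
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity and NaN keep an all-ones exponent.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal numbers: rebias the exponent (15 -> 127) and widen the fraction.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}
```

For instance, the bit pattern 0x3C00 converts to 1.0 and 0xC000 to -2.0. Work of this kind has to be done for every element on hardware without native FP16 arithmetic, which is consistent with FP16 inference being slower than FP32 in the CPU benchmark above.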
Threading Optimization
ONNX Runtime threading is set to auto (num_threads: 0), which lets ORT choose the thread count itself. In the benchmark above, this was faster than a fixed thread count:
- Manual threading (4 threads): ~40ms inference
- Auto threading (0 = ORT decides): ~21ms inference
🔮 Roadmap
Completed
- Detection, Segmentation, Pose, Classification, OBB inference
- ONNX model metadata parsing (auto-detect classes, task, imgsz)
- Hardware acceleration support (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK)
- Ultralytics-compatible Results API (Boxes, Masks, Keypoints, Probs, Obb)
- Multiple input sources (images, directories, globs, URLs)
- Video file support and webcam/RTSP streaming
- Image annotation and visualization
- FP16 half-precision inference
- Batch inference support
- Rectangular inference support and optimization
- Class filtering support
- Auto-download all YOLO11 and YOLO26 ONNX models (all sizes n/s/m/l/x, all tasks)
- --task CLI flag: selects and auto-downloads the matching nano model when --model is omitted; errors on task/model metadata conflict
In Progress
- Python bindings (PyO3)
- WebAssembly (WASM) support for browser inference
💡 Contributing
Ultralytics thrives on community collaboration! We deeply value your contributions.
- Report Issues: Found a bug? Open an issue
- Feature Requests: Have an idea? Share it
- Pull Requests: Read our Contributing Guide first
- Feedback: Take our Survey
📄 License
Ultralytics offers two licensing options:
- AGPL-3.0 License: Open-source license for students, researchers, and enthusiasts. See LICENSE.
- Enterprise License: For commercial applications. Contact Ultralytics Licensing.
📮 Contact
- GitHub Issues: Bug reports and feature requests
- Discord: Join our community
- Documentation: docs.ultralytics.com