Ultralytics YOLO Rust Inference
High-performance YOLO inference library written in Rust. This library provides a fast, safe, and efficient interface for running YOLO models using ONNX Runtime, with an API designed to match the Ultralytics Python package.
Features
- High Performance - Pure Rust implementation with zero-cost abstractions
- Ultralytics API Compatible - Results, Boxes, Masks, Keypoints, Probs classes matching Python
- Multiple Backends - CPU, CUDA, TensorRT, CoreML, OpenVINO, and more via ONNX Runtime
- Dual Use - Library for Rust projects + standalone CLI application
- Auto Metadata - Automatically reads class names, task type, and input size from ONNX models
- Auto Download - Automatically downloads YOLO11 and YOLO26 ONNX models (all sizes: n/s/m/l/x) when not found locally
- Multiple Sources - Images, directories, glob patterns, video files, webcams, and streams
- Lean Runtime - No PyTorch, TensorFlow, or Python runtime required
Quick Start
Prerequisites
- Rust 1.88+ (install via rustup)
- A YOLO ONNX model (export from Ultralytics: `yolo export model=yolo26n.pt format=onnx`)
Installation
# Install CLI globally from crates.io
# Install CLI globally with custom features
# Minimal build (no default features)
# Enable video support
# Enable multiple accelerators
Development install
# Install CLI directly from the git repository
# Or clone, build, and install from source
# Install from local checkout
cargo install places binaries in Cargo's default bin directory:
- macOS/Linux: ~/.cargo/bin
- Windows: %USERPROFILE%\.cargo\bin
Ensure this directory is in your PATH, then run from anywhere:
Export a YOLO Model to ONNX
# Using Ultralytics CLI
# Or with Python
Run Inference
# With defaults (auto-downloads yolo26n.onnx and sample images)
# Select task (auto-downloads the nano model for that task)
# With explicit model (task is read from model metadata)
# Auto-download any supported size (n/s/m/l/x)
# On a directory of images
# With custom thresholds
# Filter by class IDs
# With visualization and custom image size
# Save individual frames for video input
# Rectangular inference
Example Output
# ultralytics-inference predict
WARNING ⚠️ 'model' argument is missing. Using default '--model=yolo26n.onnx'.
WARNING ⚠️ 'source' argument is missing. Using default images: https://ultralytics.com/images/bus.jpg, https://ultralytics.com/images/zidane.jpg
Ultralytics 0.0.13 🚀 Rust ONNX FP32 CPU
Using ONNX Runtime CPUExecutionProvider
YOLO26n summary: 80 classes, imgsz=(640, 640)
image 1/2 /home/ultralytics/inference/bus.jpg: 640x480 4 persons, 1 bus, 36.4ms
image 2/2 /home/ultralytics/inference/zidane.jpg: 384x640 2 persons, 1 tie, 28.6ms
Speed: 1.5ms preprocess, 32.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
Results saved to runs/detect/predict1
💡 Learn more at https://docs.ultralytics.com/modes/predict
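The per-image shapes in the log (640x480 and 384x640, height x width) are what rectangular inference with minimal padding would typically produce. The sketch below is one way to compute such a shape; it is not the crate's implementation. The function name `rect_input_shape`, the stride of 32, and the source image sizes used in the usage note (810x1080 for bus.jpg, 1280x720 for zidane.jpg) are assumptions for illustration.

```rust
/// Hypothetical helper (not part of the crate's API): computes the padded
/// network input shape as (height, width) for rectangular inference.
///
/// Assumes `width`, `height`, and `stride` are all non-zero.
pub fn rect_input_shape(width: u32, height: u32, imgsz: u32, stride: u32) -> (u32, u32) {
    assert!(width > 0 && height > 0 && stride > 0);
    let longest = width.max(height) as u64;

    // Scale a side so that the longest side becomes `imgsz`,
    // rounding to the nearest pixel using integer arithmetic.
    let scale = |side: u32| -> u32 {
        ((side as u64 * imgsz as u64 + longest / 2) / longest) as u32
    };

    // Round a side up to the next multiple of `stride` (minimal padding).
    let pad = |side: u32| -> u32 { (side + stride - 1) / stride * stride };

    (pad(scale(height)), pad(scale(width)))
}
```

Under those assumptions, `rect_input_shape(810, 1080, 640, 32)` gives `(640, 480)` and `rect_input_shape(1280, 720, 640, 32)` gives `(384, 640)`, matching the shapes reported above.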
With --task (auto-downloads the matching nano model):
# ultralytics-inference predict --task segment
Usage
As a CLI Tool
# Show help
# Show version
# Run inference
--help and --version are also supported as standard flag aliases.
CLI Options:
| Option | Short | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Path to ONNX model file; auto-downloaded if a known YOLO11/YOLO26 name | `yolo26n.onnx` |
| `--task` | | Task type (detect, segment, pose, obb, classify); selects nano model when `--model` is omitted | `detect` |
| `--source` | `-s` | Input source (image, directory, glob, video, webcam index, or URL) | Task dependent Ultralytics URL assets |
| `--conf` | | Confidence threshold | `0.25` |
| `--iou` | | IoU threshold for NMS | `0.7` |
| `--max-det` | | Maximum number of detections | `300` |
| `--imgsz` | | Inference image size | Model metadata |
| `--rect` | | Enable rectangular inference (minimal padding) | `true` |
| `--batch` | | Batch size for inference | `1` |
| `--half` | | Use FP16 half-precision inference | `false` |
| `--save` | | Save annotated results to `runs/<task>/predict` | `true` |
| `--save-frames` | | Save individual frames for video | `false` |
| `--show` | | Display results in a window | `false` |
| `--device` | | Device (cpu, cuda:0, mps, coreml, directml:0, openvino, tensorrt:0, xnnpack) | `cpu` |
| `--verbose` | | Show verbose output | `true` |
| `--classes` | | Filter by class IDs, e.g. `0` or `"0,1,2"` or `"[0, 1, 2]"` | all classes |
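The `--classes` option accepts three spellings of the same list. As a rough sketch of how such input could be normalized (the function name `parse_classes` and the error type are hypothetical, not the CLI's actual parser):

```rust
/// Hypothetical helper: parses a `--classes` value such as `0`, `0,1,2`,
/// or `[0, 1, 2]` into a list of class IDs.
///
/// An empty string yields an empty list; how the real CLI treats that case
/// is not specified here.
pub fn parse_classes(arg: &str) -> Result<Vec<usize>, String> {
    let trimmed = arg.trim();

    // Accept an optional pair of surrounding brackets: "[0, 1, 2]".
    let inner = trimmed
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .unwrap_or(trimmed);

    inner
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| {
            s.parse::<usize>()
                .map_err(|e| format!("invalid class ID '{s}': {e}"))
        })
        .collect()
}
```

All three forms from the table parse to the same kind of result, for example `parse_classes("[0, 1, 2]")` and `parse_classes("0,1,2")` both return `Ok(vec![0, 1, 2])`.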
Task and Model Resolution:
| Invocation | Model used | Notes |
|---|---|---|
| `predict` | `yolo26n.onnx` | Default detect model, auto-downloaded |
| `predict --task segment` | `yolo26n-seg.onnx` | Nano seg model, auto-downloaded |
| `predict --task pose` | `yolo26n-pose.onnx` | Nano pose model, auto-downloaded |
| `predict --task obb` | `yolo26n-obb.onnx` | Nano OBB model, auto-downloaded |
| `predict --task classify` | `yolo26n-cls.onnx` | Nano classify model, auto-downloaded |
| `predict --model yolo26l-seg.onnx` | `yolo26l-seg.onnx` | Task read from model metadata |
| `predict --task segment --model yolo26l-seg.onnx` | `yolo26l-seg.onnx` | `--task` matches metadata, proceeds normally |
| `predict --task segment --model yolo26n.onnx` | error | `--task` conflicts with model metadata (detect), exits with error |
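The resolution rules in the table can be sketched in a few lines. This is an illustrative model of the behavior described above, not the CLI's source: the `Task` enum here is a stand-in for the one in task.rs, and `default_model` and `resolve_task` are hypothetical names.

```rust
/// Stand-in for the crate's task enum (assumed shape, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    Detect,
    Segment,
    Pose,
    Obb,
    Classify,
}

impl Task {
    /// File-name suffix used by the model names in the table above.
    fn suffix(self) -> &'static str {
        match self {
            Task::Detect => "",
            Task::Segment => "-seg",
            Task::Pose => "-pose",
            Task::Obb => "-obb",
            Task::Classify => "-cls",
        }
    }
}

/// Model name chosen when `--model` is omitted: the nano model for the task.
pub fn default_model(task: Option<Task>) -> String {
    format!("yolo26n{}.onnx", task.unwrap_or(Task::Detect).suffix())
}

/// Reconciles an optional `--task` with the task stored in model metadata.
/// A mismatch is an error; otherwise the metadata task is used.
pub fn resolve_task(requested: Option<Task>, metadata: Task) -> Result<Task, String> {
    match requested {
        Some(task) if task != metadata => Err(format!(
            "--task {task:?} conflicts with model metadata ({metadata:?})"
        )),
        _ => Ok(metadata),
    }
}
```

For instance, `default_model(Some(Task::Segment))` yields `yolo26n-seg.onnx`, while `resolve_task(Some(Task::Segment), Task::Detect)` returns an error, mirroring the last row of the table.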
Auto-downloadable models:
All YOLO11 and YOLO26 ONNX models in sizes n / s / m / l / x across all five task variants are supported for auto-download:
| Family | Variants |
|---|---|
| YOLO26 | yolo26{n,s,m,l,x}.onnx, yolo26{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
| YOLO11 | yolo11{n,s,m,l,x}.onnx, yolo11{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
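The brace notation in the table expands to 2 families x 5 sizes x 5 task variants, or 50 file names. A minimal sketch of that expansion follows; `downloadable_models` and `is_auto_downloadable` are hypothetical names, and the crate's own lookup in download.rs may be organized differently.

```rust
const FAMILIES: [&str; 2] = ["yolo11", "yolo26"];
const SIZES: [&str; 5] = ["n", "s", "m", "l", "x"];
// Empty suffix is the detect variant.
const SUFFIXES: [&str; 5] = ["", "-seg", "-pose", "-obb", "-cls"];

/// Hypothetical helper: expands the table above into concrete file names.
pub fn downloadable_models() -> Vec<String> {
    let mut names = Vec::new();
    for family in FAMILIES {
        for size in SIZES {
            for suffix in SUFFIXES {
                names.push(format!("{family}{size}{suffix}.onnx"));
            }
        }
    }
    names
}

/// Hypothetical helper: true if `name` is one of the expanded file names.
pub fn is_auto_downloadable(name: &str) -> bool {
    downloadable_models().iter().any(|m| m == name)
}
```

With this expansion, `is_auto_downloadable("yolo26l-seg.onnx")` is true, and a name outside the two families, such as `yolov8n.onnx`, is not matched.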
Source Options:
| Source Type | Example Input | Description |
|---|---|---|
| Image | `image.jpg` | Single image file |
| Directory | `images/` | Directory of images |
| Glob | `images/*.jpg` | Glob pattern for images |
| Video | `video.mp4` | Video file |
| Webcam | `0`, `1` | Webcam index (0 = default webcam) |
| URL | `https://example.com/image.jpg` | Remote image URL |
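One way to picture how a single `--source` string maps onto the source types in the table is a small classifier like the one below. It is a sketch under assumptions: the `SourceKind` enum, the `classify_source` function, the check order, and the list of video extensions are all invented for illustration, and other stream schemes are not handled. The logic in source.rs may differ.

```rust
use std::path::Path;

/// Hypothetical classification of a `--source` string.
#[derive(Debug, PartialEq, Eq)]
pub enum SourceKind {
    Webcam(u32),
    Url,
    Glob,
    Directory,
    Video,
    Image,
}

/// Hypothetical helper: guesses the source type from the string alone,
/// falling back to a filesystem check for directories.
pub fn classify_source(source: &str) -> SourceKind {
    // A bare non-negative integer is treated as a webcam index.
    if let Ok(index) = source.parse::<u32>() {
        return SourceKind::Webcam(index);
    }
    // URLs are checked before globs because query strings may contain '?'.
    if source.starts_with("http://") || source.starts_with("https://") {
        return SourceKind::Url;
    }
    if source.contains('*') || source.contains('?') {
        return SourceKind::Glob;
    }
    if source.ends_with('/') || Path::new(source).is_dir() {
        return SourceKind::Directory;
    }
    // Assumed set of video extensions; anything else is treated as an image.
    let ext = Path::new(source)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("mp4" | "avi" | "mov" | "mkv") => SourceKind::Video,
        _ => SourceKind::Image,
    }
}
```

Applied to the example inputs in the table, `classify_source("0")` returns `SourceKind::Webcam(0)` and `classify_source("images/*.jpg")` returns `SourceKind::Glob`.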
As a Rust Library
Add to your Cargo.toml (choose one):
# Stable release from crates.io
[dependencies]
ultralytics-inference = "0.0.13"
# Development version (latest unreleased code from GitHub)
[dependencies]
ultralytics-inference = { git = "https://github.com/ultralytics/inference.git" }
Basic Usage:
use ;
With Custom Configuration:
use ;
Accessing Detection Data:
if let Some = result.boxes
Selecting a Device:
use ;
Project Structure
inference/
├── src/
│   ├── lib.rs              # Library entry point and public exports
│   ├── main.rs             # CLI application
│   ├── model.rs            # YOLOModel - ONNX session and inference
│   ├── results.rs          # Results, Boxes, Masks, Keypoints, Probs, Obb
│   ├── preprocessing.rs    # Image preprocessing (letterbox, normalize, SIMD)
│   ├── postprocessing.rs   # Detection post-processing (NMS, decode, SIMD)
│   ├── metadata.rs         # ONNX model metadata parsing
│   ├── source.rs           # Input source handling (images, video, webcam)
│   ├── task.rs             # Task enum (Detect, Segment, Pose, Classify, Obb)
│   ├── inference.rs        # InferenceConfig
│   ├── batch.rs            # Batch processing pipeline
│   ├── device.rs           # Device enum (CPU, CUDA, MPS, CoreML, etc.)
│   ├── download.rs         # Model and asset downloading
│   ├── annotate.rs         # Image annotation (bounding boxes, masks, keypoints)
│   ├── io.rs               # Result saving (images, videos)
│   ├── logging.rs          # Logging macros
│   ├── error.rs            # Error types
│   ├── utils.rs            # Utility functions (NMS, IoU)
│   ├── cli/                # CLI module
│   │   ├── mod.rs          # CLI module exports
│   │   ├── args.rs         # CLI argument parsing
│   │   └── predict.rs      # Predict command implementation
│   └── visualizer/         # Real-time visualization (minifb)
├── tests/
│   └── integration_test.rs # Integration tests
├── assets/                 # Test images
│   ├── bus.jpg
│   └── zidane.jpg
├── Cargo.toml              # Rust dependencies and features
├── LICENSE                 # AGPL-3.0 License
└── README.md               # This file
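The tree lists utils.rs as the home of the NMS and IoU utilities. For readers unfamiliar with those terms, here is a generic, class-agnostic sketch of IoU and greedy NMS. It is not the code in utils.rs: the `BBox` struct and both function signatures are assumptions for illustration, with the threshold and cap mirroring the `--iou` and `--max-det` options.

```rust
/// Hypothetical axis-aligned box with a confidence score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
}

/// Intersection over union of two boxes, in the range [0, 1].
pub fn iou(a: &BBox, b: &BBox) -> f32 {
    let w = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let h = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = w * h;
    let area = |r: &BBox| (r.x2 - r.x1) * (r.y2 - r.y1);
    let union = area(a) + area(b) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy NMS: visit boxes by descending score and keep a box only if its
/// IoU with every already-kept box is at most `iou_threshold`.
/// Stops once `max_det` boxes have been kept.
pub fn nms(mut boxes: Vec<BBox>, iou_threshold: f32, max_det: usize) -> Vec<BBox> {
    boxes.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<BBox> = Vec::new();
    for candidate in boxes {
        if kept.len() >= max_det {
            break;
        }
        if kept.iter().all(|k| iou(k, &candidate) <= iou_threshold) {
            kept.push(candidate);
        }
    }
    kept
}
```

With the default threshold of 0.7, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives. A per-class variant would run the same loop separately for each class ID.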
Hardware Acceleration
Enable hardware acceleration by adding features to your build:
# NVIDIA GPU (CUDA)
# NVIDIA TensorRT
# Apple CoreML (macOS/iOS)
# Intel OpenVINO
# Multiple features
Available Features:
| Feature | Description |
|---|---|
| `cuda` | NVIDIA CUDA support |
| `tensorrt` | NVIDIA TensorRT optimization |
| `coreml` | Apple CoreML (macOS/iOS) |
| `openvino` | Intel OpenVINO |
| `onednn` | Intel oneDNN |
| `rocm` | AMD ROCm |
| `directml` | DirectML (Windows) |
| `nnapi` | Android Neural Networks API |
| `xnnpack` | XNNPACK (cross-platform) |
| `nvidia` | Convenience: CUDA + TensorRT |
| `intel` | Convenience: OpenVINO + oneDNN |
| `mobile` | Convenience: NNAPI + CoreML + QNN |
Dependencies
One of the key benefits of this library is a Rust/ONNX Runtime stack with no PyTorch, TensorFlow, or Python runtime required.
Core Dependencies (always included)
| Crate | Purpose |
|---|---|
| `ort` | ONNX Runtime bindings |
| `ndarray` | N-dimensional arrays |
| `image` | Image loading/decoding |
| `jpeg-decoder` | JPEG decoding |
| `fast_image_resize` | SIMD-optimized resizing |
| `half` | FP16 support |
| `lru` | LRU cache for preprocessing LUT |
| `wide` | SIMD for fast preprocessing |
Optional Dependencies (for --save feature)
| Crate | Purpose |
|---|---|
| `imageproc` | Drawing boxes and shapes |
| `ab_glyph` | Text rendering (embedded font) |
Optional Dependencies (for Video & Visualization)
| Crate | Purpose |
|---|---|
| `minifb` | Window creation and buffer display |
| `video-rs` | Video decoding/encoding (ffmpeg) |
Video Support (FFmpeg)
Video features require FFmpeg (7 or 8) installed on your system:
# macOS
# Ubuntu/Debian
# Build with video support
To build without annotation support (smaller binary):
Testing
# Run all tests
# Run with output
# Run specific test
Performance
Benchmarks on Apple M4 MacBook Pro (CPU, ONNX Runtime):
YOLO26n Detection Model (640x640)
| Precision | Model Size | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|
| FP32 | 10.2 MB | ~9ms | ~21ms | <1ms | ~31ms |
| FP16 | 5.2 MB | ~9ms | ~24ms | <1ms | ~34ms |
Key findings:
- FP16 models are ~50% smaller (5.2 MB vs 10.2 MB)
- FP32 is slightly faster on CPU (~21ms vs ~24ms) due to CPU's native FP32 support
- FP16 requires upcasting to FP32 for computation on most CPUs, adding overhead
- Use FP32 for CPU inference, FP16 for GPU (where it provides speedup)
Threading Optimization
ONNX Runtime threading is set to auto (num_threads: 0), which lets ORT choose the optimal thread count:
- Manual threading (4 threads): ~40ms inference
- Auto threading (0 = ORT decides): ~21ms inference
Roadmap
Completed
- Detection, Segmentation, Pose, Classification, OBB inference
- ONNX model metadata parsing (auto-detect classes, task, imgsz)
- Hardware acceleration support (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK)
- Ultralytics-compatible Results API (Boxes, Masks, Keypoints, Probs, Obb)
- Multiple input sources (images, directories, globs, URLs)
- Video file support and webcam/RTSP streaming
- Image annotation and visualization
- FP16 half-precision inference
- Batch inference support
- Rectangular inference support and optimization
- Class filtering support
- Auto-download all YOLO11 and YOLO26 ONNX models (all sizes n/s/m/l/x, all tasks)
- `--task` CLI flag: selects and auto-downloads the matching nano model when `--model` is omitted; errors on task/model metadata conflict
In Progress
- Python bindings (PyO3)
- WebAssembly (WASM) support for browser inference
Contributing
Ultralytics thrives on community collaboration, and we deeply value your contributions! Whether it's reporting bugs, suggesting features, or submitting code changes, your involvement is crucial.
- Report Issues: Found a bug? Open an issue
- Feature Requests: Have an idea? Share it
- Pull Requests: Read our Contributing Guide first
- Feedback: Take our Survey
A heartfelt thank you goes out to all our contributors! Your efforts help make Ultralytics tools better for everyone.
License
Ultralytics offers two licensing options to suit different needs:
- AGPL-3.0 License: This OSI-approved open-source license is perfect for students, researchers, and enthusiasts. It encourages open collaboration and knowledge sharing. See the LICENSE file for full details.
- Ultralytics Enterprise License: Designed for commercial use, this license allows for the seamless integration of Ultralytics software and AI models into commercial products and services, bypassing the open-source requirements of AGPL-3.0. If your use case involves commercial deployment, please contact us via Ultralytics Licensing.
Contact
- GitHub Issues: Bug reports and feature requests
- Discord: Join our community
- Documentation: docs.ultralytics.com
