§Ultralytics YOLO Inference Library
High-performance YOLO model inference library written in Rust, providing a safe and efficient interface for running Ultralytics YOLO models on images, videos, and streams.
§Features
- High Performance - Pure Rust with zero-cost abstractions and SIMD-optimized preprocessing
- ONNX Runtime - Leverages ONNX Runtime for cross-platform hardware acceleration
- Supported YOLO Versions - YOLO26, YOLO11, and YOLOv8 (including YOLO26 end-to-end NMS-free exports)
- All Tasks - Detection, segmentation, pose estimation, classification, OBB, and semantic segmentation (YOLO26 only)
- Ultralytics API - Results API matches the Python package for easy migration
- Multiple Backends - CPU, CUDA, TensorRT, CoreML, OpenVINO, and more
- Multiple Sources - Images, directories, glob patterns, video, webcam, streams
§Installation
Add to your Cargo.toml:
[dependencies]
ultralytics-inference = "0.0.18"
Or install the CLI tool:
cargo install ultralytics-inference
§Quick Start (Library)
use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model - metadata (classes, task, imgsz) is read automatically
    let mut model = YOLOModel::load("yolo26n.onnx")?;

    // Run inference
    let results = model.predict("image.jpg")?;

    // Process results
    for result in &results {
        if let Some(ref boxes) = result.boxes {
            println!("Found {} detections", boxes.len());
            for i in 0..boxes.len() {
                let cls = boxes.cls()[i] as usize;
                let conf = boxes.conf()[i];
                let name = result.names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
                println!(" {} {:.2}", name, conf);
            }
        }
    }
    Ok(())
}
§CLI Usage
The ultralytics-inference CLI runs YOLO inference from the command line:
# Install the CLI
cargo install ultralytics-inference
# Run with defaults (auto-downloads model and sample images)
ultralytics-inference predict
# Select task: auto-downloads the matching nano model
ultralytics-inference predict --task segment
ultralytics-inference predict --task pose
ultralytics-inference predict --task obb
ultralytics-inference predict --task classify
# Run on a specific image
ultralytics-inference predict --model yolo26n.onnx --source image.jpg
# Run on a directory of images
ultralytics-inference predict --model yolo26n.onnx --source images/
# With custom thresholds
ultralytics-inference predict -m yolo26n.onnx -s image.jpg --conf 0.5 --iou 0.7
# Filter by class IDs
ultralytics-inference predict --source image.jpg --classes "0,1,2"
# With visualization window
ultralytics-inference predict --model yolo26n.onnx --source video.mp4 --show
# Save annotated results
ultralytics-inference predict --model yolo26n.onnx --source image.jpg --save
# Save individual frames for video input
ultralytics-inference predict --source video.mp4 --save-frames
# Show help
ultralytics-inference help
# Show version
ultralytics-inference version
CLI Options:
| Option | Short | Description | Default |
|---|---|---|---|
| --model | -m | Path to ONNX model file; auto-downloaded if a known YOLO26/YOLO11/YOLOv8 name | yolo26n.onnx |
| --task | | Task type (detect, segment, pose, obb, classify, semantic*); selects nano model when --model is omitted | detect |
| --source | -s | Input source (image, directory, glob, video, webcam index, or URL) | Task-dependent sample assets |
| --conf | | Confidence threshold | 0.25 |
| --iou | | IoU threshold for NMS | 0.7 |
| --max-det | | Maximum number of detections | 300 |
| --imgsz | | Inference image size | Model metadata |
| --rect | | Enable rectangular inference (minimal padding) | true |
| --batch | | Batch size for inference | 1 |
| --half | | Use FP16 half-precision inference | false |
| --save | | Save annotated results to runs/<task>/predict | true |
| --save-frames | | Save individual frames for video input | false |
| --save-json | | Save semantic segmentation class-map PNGs for external evaluation | false |
| --show | | Display results in a window | false |
| --device | | Device (cpu, cuda:0, coreml, directml:0, openvino, tensorrt:0, xnnpack) | cpu |
| --verbose | | Show verbose output | true |
| --classes | | Filter by class IDs, e.g. 0 or "0,1,2" or "[0, 1, 2]" | all classes |
* semantic (semantic segmentation) is YOLO26-only.
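To make the interaction of --conf, --iou, and --max-det concrete, here is a minimal, self-contained sketch of confidence filtering followed by greedy, class-aware NMS. It is an illustration only and not this crate's postprocessing code: the Det struct and the filter_detections function are hypothetical names, and the exact comparison rules (>= versus >, class-aware versus class-agnostic suppression) are assumptions.

```rust
/// Hypothetical detection record used only in this sketch (not a crate type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Det {
    xyxy: [f32; 4], // x1, y1, x2, y2 in pixels
    conf: f32,
    cls: usize,
}

/// Intersection-over-union of two axis-aligned boxes in xyxy format.
fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let w = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let h = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = w * h;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    let union = area_a + area_b - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Drop low-confidence detections, then greedily keep the highest-scoring
/// boxes, suppressing same-class boxes that overlap a kept box by more than
/// `iou_thres`. At most `max_det` detections are returned.
fn filter_detections(mut dets: Vec<Det>, conf: f32, iou_thres: f32, max_det: usize) -> Vec<Det> {
    dets.retain(|d| d.conf >= conf);
    // Sort by descending confidence.
    dets.sort_by(|a, b| b.conf.total_cmp(&a.conf));

    let mut kept: Vec<Det> = Vec::new();
    for d in dets {
        if kept.len() >= max_det {
            break;
        }
        let suppressed = kept
            .iter()
            .any(|k| k.cls == d.cls && iou(&k.xyxy, &d.xyxy) > iou_thres);
        if !suppressed {
            kept.push(d);
        }
    }
    kept
}
```

With the CLI defaults from the table, the equivalent call in this sketch would be filter_detections(dets, 0.25, 0.7, 300). End-to-end NMS-free YOLO26 exports would presumably skip the suppression step entirely.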
§Task-Specific Examples
The library supports all YOLO tasks. Export models to ONNX with the yolo CLI from the Ultralytics Python package:
# Detection (default)
yolo export model=yolo26n.pt format=onnx
# Segmentation
yolo export model=yolo26n-seg.pt format=onnx
# Pose Estimation
yolo export model=yolo26n-pose.pt format=onnx
# Classification
yolo export model=yolo26n-cls.pt format=onnx
# Oriented Bounding Boxes
yolo export model=yolo26n-obb.pt format=onnx
# Semantic Segmentation (YOLO26 only)
yolo export model=yolo26n-sem.pt format=onnx
The task is auto-detected from ONNX metadata:
use ultralytics_inference::YOLOModel;

// Detection model - returns bounding boxes
let mut model = YOLOModel::load("yolo26n.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref boxes) = results[0].boxes {
    println!("Found {} detections", boxes.len());
    for i in 0..boxes.len() {
        let cls = boxes.cls()[i] as usize;
        let conf = boxes.conf()[i];
        let name = results[0].names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
        println!(" {} {:.2}", name, conf);
    }
}

// Segmentation model - returns instance masks
let mut model = YOLOModel::load("yolo26n-seg.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref masks) = results[0].masks {
    println!("Found {} instance masks", masks.len());
}

// Pose model - returns keypoints
let mut model = YOLOModel::load("yolo26n-pose.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref keypoints) = results[0].keypoints {
    println!("Found {} poses", keypoints.len());
}

// OBB model - returns oriented bounding boxes
let mut model = YOLOModel::load("yolo26n-obb.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref obb) = results[0].obb {
    println!("Found {} oriented boxes", obb.len());
    for i in 0..obb.len() {
        let conf = obb.conf()[i];
        let cls = obb.cls()[i] as usize;
        let name = results[0].names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
        println!(" {} {:.2}", name, conf);
    }
}

// Classification model - returns probabilities
let mut model = YOLOModel::load("yolo26n-cls.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref probs) = results[0].probs {
    println!("Top-1: class {} ({:.1}%)", probs.top1(), probs.top1conf() * 100.0);
}

// Semantic segmentation model (YOLO26 only) - returns a per-pixel class map
let mut model = YOLOModel::load("yolo26n-sem.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref sem) = results[0].semantic_mask {
    println!("Semantic mask shape: {:?}", sem.data.shape());
}
§Custom Configuration
Use the builder pattern to customize inference settings:
use ultralytics_inference::InferenceConfig;
let config = InferenceConfig::new()
    .with_confidence(0.5)  // Confidence threshold
    .with_iou(0.45)        // NMS IoU threshold
    .with_max_det(300)     // Max detections per image
    .with_imgsz(640, 640); // Input image size
§Hardware Acceleration
See the CUDA / TensorRT acceleration guide for setup, requirements, and the zero-copy GPU preprocess fast path.
Enable hardware acceleration with Cargo features:
# NVIDIA CUDA
cargo build --release --features cuda
# NVIDIA TensorRT
cargo build --release --features tensorrt
# NVIDIA GPU preprocess + zero-copy device input
# (requires CUDA toolkit; see docs/CUDA.md)
cargo build --release --features cuda-preprocess
# Apple CoreML
cargo build --release --features coreml
# Intel OpenVINO
cargo build --release --features openvino
§Results API
The Results struct provides access to inference outputs:
- Boxes - Bounding boxes with xyxy(), xywh(), xyxyn(), xywhn(), conf(), cls() methods
- Masks - Segmentation masks with data, orig_shape fields
- Keypoints - Pose keypoints with xy(), xyn(), conf() methods
- Probs - Classification probabilities with top1(), top5(), top1conf(), top5conf() methods
- Obb - Oriented bounding boxes with xyxyxyxy(), xywhr(), conf(), cls() methods
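The box accessor names follow the Ultralytics Python conventions. As a rough illustration of how those formats relate, the sketch below converts between corner format (xyxy) and center format (xywh) and normalizes by image size (the trailing n in xyxyn/xywhn). It assumes xywh means center x, center y, width, height, as in the Python package; the free functions are hypothetical helpers for this example and not part of the crate's API.

```rust
/// Corner format [x1, y1, x2, y2] -> center format [cx, cy, w, h].
fn xyxy_to_xywh(b: [f32; 4]) -> [f32; 4] {
    let w = b[2] - b[0];
    let h = b[3] - b[1];
    [b[0] + w / 2.0, b[1] + h / 2.0, w, h]
}

/// Center format [cx, cy, w, h] -> corner format [x1, y1, x2, y2].
fn xywh_to_xyxy(b: [f32; 4]) -> [f32; 4] {
    let half_w = b[2] / 2.0;
    let half_h = b[3] / 2.0;
    [b[0] - half_w, b[1] - half_h, b[0] + half_w, b[1] + half_h]
}

/// Scale pixel coordinates into the 0..1 range using the original image size.
/// Works for either format because x-like values sit at indices 0 and 2
/// and y-like values at indices 1 and 3.
fn normalize(b: [f32; 4], img_width: f32, img_height: f32) -> [f32; 4] {
    [
        b[0] / img_width,
        b[1] / img_height,
        b[2] / img_width,
        b[3] / img_height,
    ]
}
```

For a box with corners (100, 50) and (300, 150) in a 400x200 image, this gives a center-format box of (200, 100, 200, 100) and normalized corners of (0.25, 0.25, 0.75, 0.75).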
§Module Overview
| Module | Description |
|---|---|
| model | Core YOLOModel for loading models and running inference |
| results | Output types (Results, Boxes, Masks, etc.) |
| inference | InferenceConfig for customizing inference settings |
| source | Input source handling (Source, SourceIterator) |
| task | YOLO task types (Task: Detect, Segment, Pose, Classify, Obb, Semantic) |
| error | Error types (InferenceError, Result) |
| preprocessing | Image preprocessing utilities |
| postprocessing | Post-processing for all tasks (NMS/decode for detection; argmax for semantic segmentation) |
| metadata | ONNX model metadata parsing |
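The postprocessing row mentions argmax for semantic segmentation. A minimal sketch of that idea follows: for each pixel, pick the class with the highest score. The function name, the flattened channel-major [classes, height, width] layout, and the u8 class IDs (which cap the sketch at 256 classes) are assumptions for illustration, not a description of the crate's internals.

```rust
/// Per-pixel argmax over class scores.
///
/// `scores` is assumed to be a flattened [num_classes, height, width] tensor,
/// so the score of class `c` at pixel `p` lives at `c * height * width + p`.
/// Returns one class ID per pixel in row-major order; ties go to the lower ID.
fn argmax_class_map(scores: &[f32], num_classes: usize, height: usize, width: usize) -> Vec<u8> {
    let pixels = height * width;
    assert_eq!(scores.len(), num_classes * pixels, "unexpected tensor size");
    assert!(num_classes <= 256, "class IDs must fit in u8 for this sketch");

    let mut class_map = vec![0u8; pixels];
    for p in 0..pixels {
        let mut best = 0usize;
        for c in 1..num_classes {
            if scores[c * pixels + p] > scores[best * pixels + p] {
                best = c;
            }
        }
        class_map[p] = best as u8;
    }
    class_map
}
```

A class map of this shape is the kind of per-pixel output that an option such as --save-json could write out as a PNG, though the crate's actual storage format is not specified here.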
§Feature Flags
Default features (enabled unless --no-default-features is passed): annotate, visualize.
| Feature | Description |
|---|---|
| annotate | Image annotation for --save (default) |
| visualize | Real-time window display for --show (default) |
| video | Video file decoding/encoding (requires FFmpeg) |
| cuda | NVIDIA CUDA acceleration |
| tensorrt | NVIDIA TensorRT optimization |
| coreml | Apple CoreML (macOS/iOS) |
| openvino | Intel OpenVINO |
| onednn | Intel oneDNN |
| rocm | AMD ROCm |
| migraphx | AMD MIGraphX |
| directml | DirectML (Windows) |
| nnapi | Android Neural Networks API |
| qnn | Qualcomm Neural Networks |
| xnnpack | XNNPACK (cross-platform) |
| acl | ARM Compute Library |
| armnn | ARM NN |
| tvm | Apache TVM |
| rknpu | Rockchip NPU |
| cann | Huawei CANN |
| webgpu | WebGPU |
| azure | Azure |
| nvidia | Convenience: cuda + tensorrt |
| amd | Convenience: rocm + migraphx |
| intel | Convenience: openvino + onednn |
| mobile | Convenience: nnapi + coreml + qnn |
| all | Convenience: annotate + visualize + video |
§License
This project is dual-licensed: AGPL-3.0 for open-source use, or the Ultralytics Enterprise License for commercial applications.
Re-exports§
- pub use device::Device;
- pub use error::InferenceError;
- pub use error::Result;
- pub use inference::InferenceConfig;
- pub use model::YOLOModel;
- pub use results::Boxes;
- pub use results::Keypoints;
- pub use results::Masks;
- pub use results::Obb;
- pub use results::Probs;
- pub use results::Results;
- pub use results::SemanticMask;
- pub use results::Speed;
- pub use source::Source;
- pub use source::SourceIterator;
- pub use source::SourceMeta;
- pub use task::Task;
- pub use metadata::ModelMetadata;
- pub use preprocessing::PreprocessResult;
- pub use preprocessing::preprocess_image;
- pub use preprocessing::preprocess_image_with_precision;
Modules§
- annotate
- Image annotation utilities.
- batch
- Batch processing module for YOLO inference.
- cli
- CLI module for running inference.
- cuda_guide
- CUDA / TensorRT acceleration guide rendered from docs/CUDA.md.
- device
- Hardware device support and abstraction.
- download
- Model downloading utilities.
- error
- Error types for the inference library.
- inference
- Inference configuration and common types.
- io
- I/O utilities for saving results including video encoding.
- logging
- Logging utilities.
- metadata
- ONNX model metadata parsing.
- model
- YOLO model loading and inference.
- postprocessing
- Post-processing for YOLO model outputs.
- preprocessing
- Image preprocessing for YOLO inference.
- results
- Results classes for YOLO inference output.
- source
- Input source handling for YOLO inference.
- task
- Task definitions for YOLO models.
- utils
- Utility functions for the inference library.
- visualizer
- Visualization tools for inference results.
Macros§
- error
- Macro for error messages.
- info
- Macro for standard info messages.
- section
- Macro for section headers.
- success
- Macro for success messages.
- verbose
- Macro for verbose messages.
- warn
- Macro for warning messages.
Constants§
- DISPLAY_NAME
- Application display name.
- NAME
- Library name.
- VERSION
- Library version.