Crate ultralytics_inference

§Ultralytics YOLO Inference Library

High-performance YOLO model inference library written in Rust, providing a safe and efficient interface for running Ultralytics YOLO models on images, videos, and streams.

§Features

High Performance - Pure Rust with zero-cost abstractions and SIMD-optimized preprocessing
ONNX Runtime - Leverages ONNX Runtime for cross-platform hardware acceleration
Supported YOLO Versions - YOLO26, YOLO11, and YOLOv8 (including YOLO26 end-to-end NMS-free exports)
All Tasks - Detection, segmentation, pose estimation, classification, and oriented bounding boxes (OBB), plus semantic segmentation for YOLO26 models only
Ultralytics API - Results API matches the Python package for easy migration
Multiple Backends - CPU, CUDA, TensorRT, CoreML, OpenVINO, and more
Multiple Sources - Images, directories, glob patterns, video, webcam, streams

§Installation

Add to your Cargo.toml:

[dependencies]
ultralytics-inference = "0.0.18"

Or install the CLI tool:

cargo install ultralytics-inference

§Quick Start (Library)

use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model - metadata (classes, task, imgsz) is read automatically
    let mut model = YOLOModel::load("yolo26n.onnx")?;

    // Run inference
    let results = model.predict("image.jpg")?;

    // Process results
    for result in &results {
        if let Some(ref boxes) = result.boxes {
            println!("Found {} detections", boxes.len());
            for i in 0..boxes.len() {
                let cls = boxes.cls()[i] as usize;
                let conf = boxes.conf()[i];
                let name = result.names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
                println!("  {} {:.2}", name, conf);
            }
        }
    }

    Ok(())
}

§CLI Usage

The ultralytics-inference CLI provides a command-line interface for running YOLO inference:

# Install the CLI
cargo install ultralytics-inference

# Run with defaults (auto-downloads model and sample images)
ultralytics-inference predict

# Select task: auto-downloads the matching nano model
ultralytics-inference predict --task segment
ultralytics-inference predict --task pose
ultralytics-inference predict --task obb
ultralytics-inference predict --task classify

# Run on a specific image
ultralytics-inference predict --model yolo26n.onnx --source image.jpg

# Run on a directory of images
ultralytics-inference predict --model yolo26n.onnx --source images/

# With custom thresholds
ultralytics-inference predict -m yolo26n.onnx -s image.jpg --conf 0.5 --iou 0.7

# Filter by class IDs
ultralytics-inference predict --source image.jpg --classes "0,1,2"

# With visualization window
ultralytics-inference predict --model yolo26n.onnx --source video.mp4 --show

# Save annotated results
ultralytics-inference predict --model yolo26n.onnx --source image.jpg --save

# Save individual frames for video input
ultralytics-inference predict --source video.mp4 --save-frames

# Show help
ultralytics-inference help

# Show version
ultralytics-inference version

CLI Options:

Option	Short	Description	Default
`--model`	`-m`	Path to ONNX model file; auto-downloaded if a known YOLO26/YOLO11/YOLOv8 name	`yolo26n.onnx`
`--task`		Task type (`detect`, `segment`, `pose`, `obb`, `classify`, `semantic`*); selects nano model when `--model` is omitted	`detect`
`--source`	`-s`	Input source (image, directory, glob, video, webcam index, or URL)	Task-dependent sample assets
`--conf`		Confidence threshold	`0.25`
`--iou`		IoU threshold for NMS	`0.7`
`--max-det`		Maximum number of detections	`300`
`--imgsz`		Inference image size	Model metadata
`--rect`		Enable rectangular inference (minimal padding)	`true`
`--batch`		Batch size for inference	`1`
`--half`		Use FP16 half-precision inference	`false`
`--save`		Save annotated results to runs/<task>/predict	`true`
`--save-frames`		Save individual frames for video input	`false`
`--save-json`		Save semantic segmentation class-map PNGs for external evaluation	`false`
`--show`		Display results in a window	`false`
`--device`		Device (cpu, cuda:0, coreml, directml:0, openvino, tensorrt:0, xnnpack)	`cpu`
`--verbose`		Show verbose output	`true`
`--classes`		Filter by class IDs, e.g. `0` or `"0,1,2"` or `"[0, 1, 2]"`	all classes

* semantic (semantic segmentation) is YOLO26-only.
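
The `--classes` row above accepts three spellings: a single ID (`0`), a comma-separated list (`"0,1,2"`), and a bracketed list (`"[0, 1, 2]"`). The sketch below shows one way such input could be normalized into a list of class IDs. `parse_classes` is a hypothetical helper written for illustration; it is not taken from the crate and the CLI's actual parser may behave differently (for example on duplicate or negative IDs).

```rust
use std::num::ParseIntError;

/// Parse a class filter such as `0`, `0,1,2`, or `[0, 1, 2]` into class IDs.
/// Hypothetical helper for illustration only; not part of the crate's API.
fn parse_classes(input: &str) -> Result<Vec<usize>, ParseIntError> {
    input
        .trim()
        // Strip the optional surrounding brackets.
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split(',')
        .map(str::trim)
        // Ignore empty pieces such as a trailing comma.
        .filter(|piece| !piece.is_empty())
        .map(|piece| piece.parse::<usize>())
        // Collecting into Result stops at the first piece that fails to parse.
        .collect()
}
```

All three spellings from the table reduce to the same kind of `Vec<usize>`, and a non-numeric entry surfaces as an error instead of being silently dropped.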

§Task-Specific Examples

The library supports all YOLO tasks. Export models to ONNX with the `yolo` command from the Ultralytics Python package:

# Detection (default)
yolo export model=yolo26n.pt format=onnx

# Segmentation
yolo export model=yolo26n-seg.pt format=onnx

# Pose Estimation
yolo export model=yolo26n-pose.pt format=onnx

# Classification
yolo export model=yolo26n-cls.pt format=onnx

# Oriented Bounding Boxes
yolo export model=yolo26n-obb.pt format=onnx

# Semantic Segmentation (YOLO26 only)
yolo export model=yolo26n-sem.pt format=onnx

The task is auto-detected from ONNX metadata:

use ultralytics_inference::YOLOModel;

// Detection model - returns bounding boxes
let mut model = YOLOModel::load("yolo26n.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref boxes) = results[0].boxes {
    println!("Found {} detections", boxes.len());
    for i in 0..boxes.len() {
        let cls = boxes.cls()[i] as usize;
        let conf = boxes.conf()[i];
        let name = results[0].names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
        println!("  {} {:.2}", name, conf);
    }
}

// Segmentation model - returns instance masks
let mut model = YOLOModel::load("yolo26n-seg.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref masks) = results[0].masks {
    println!("Found {} instance masks", masks.len());
}

// Pose model - returns keypoints
let mut model = YOLOModel::load("yolo26n-pose.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref keypoints) = results[0].keypoints {
    println!("Found {} poses", keypoints.len());
}

// OBB model - returns oriented bounding boxes
let mut model = YOLOModel::load("yolo26n-obb.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref obb) = results[0].obb {
    println!("Found {} oriented boxes", obb.len());
    for i in 0..obb.len() {
        let conf = obb.conf()[i];
        let cls = obb.cls()[i] as usize;
        let name = results[0].names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
        println!("  {} {:.2}", name, conf);
    }
}

// Classification model - returns probabilities
let mut model = YOLOModel::load("yolo26n-cls.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref probs) = results[0].probs {
    println!("Top-1: class {} ({:.1}%)", probs.top1(), probs.top1conf() * 100.0);
}

// Semantic segmentation model (YOLO26 only) - returns a per-pixel class map
let mut model = YOLOModel::load("yolo26n-sem.onnx")?;
let results = model.predict("image.jpg")?;
if let Some(ref sem) = results[0].semantic_mask {
    println!("Semantic mask shape: {:?}", sem.data.shape());
}
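
The classification branch above reads `top1()` and `top1conf()`. As a rough illustration of what a top-k accessor computes, the sketch below ranks a plain probability slice by descending score. `top_k` is a hypothetical standalone function operating on `&[f32]`; it does not use the crate's `Probs` type and is not claimed to match its implementation.

```rust
/// Return the indices of the `k` largest probabilities, highest first.
/// Hypothetical helper for illustration only; not part of the crate's API.
fn top_k(probs: &[f32], k: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..probs.len()).collect();
    // `total_cmp` defines a total order on f32, so NaN values cannot panic the sort.
    // The sort is stable, so equal scores keep the lower class index first.
    indices.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    // If `k` exceeds the number of classes, every index is returned.
    indices.truncate(k);
    indices
}
```

With `k = 1` this yields the top-1 class index, and looking that index up in the probability slice gives the matching confidence.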

§Custom Configuration

Use the builder pattern to customize inference settings:

use ultralytics_inference::InferenceConfig;

let config = InferenceConfig::new()
    .with_confidence(0.5)    // Confidence threshold
    .with_iou(0.45)          // NMS IoU threshold
    .with_max_det(300)       // Max detections per image
    .with_imgsz(640, 640);   // Input image size
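
To make the three numeric settings above concrete, here is a minimal sketch of greedy non-maximum suppression over plain arrays. It assumes class-agnostic suppression and axis-aligned `(x1, y1, x2, y2)` boxes; `Det`, `iou`, and `nms` are hypothetical names, not the crate's internals. End-to-end YOLO26 exports are described above as NMS-free, so the IoU threshold only matters for models that still need this step.

```rust
/// Axis-aligned box in (x1, y1, x2, y2) pixel coordinates with a confidence score.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Det {
    xyxy: [f32; 4],
    conf: f32,
}

/// Intersection over union of two boxes; 0.0 when the union is empty.
fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let w = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let h = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = w * h;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    let union = area_a + area_b - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy NMS: drop boxes below `conf_thres`, then walk the rest from highest
/// to lowest confidence, keeping a box only if it overlaps no kept box by more
/// than `iou_thres`. Stops once `max_det` boxes have been kept.
fn nms(mut dets: Vec<Det>, conf_thres: f32, iou_thres: f32, max_det: usize) -> Vec<Det> {
    dets.retain(|d| d.conf >= conf_thres);
    dets.sort_by(|a, b| b.conf.total_cmp(&a.conf));
    let mut kept: Vec<Det> = Vec::new();
    for d in dets {
        if kept.len() >= max_det {
            break;
        }
        if kept.iter().all(|k| iou(&k.xyxy, &d.xyxy) <= iou_thres) {
            kept.push(d);
        }
    }
    kept
}
```

Raising the confidence threshold removes weak candidates before suppression, while lowering the IoU threshold makes suppression more aggressive for overlapping boxes.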

§Hardware Acceleration

See the CUDA / TensorRT acceleration guide (the `cuda_guide` module, rendered from docs/CUDA.md) for setup, requirements, and the zero-copy GPU preprocess fast path.

Enable hardware acceleration with Cargo features:

# NVIDIA CUDA
cargo build --release --features cuda

# NVIDIA TensorRT
cargo build --release --features tensorrt

# NVIDIA GPU preprocess + zero-copy device input
# (requires CUDA toolkit; see docs/CUDA.md)
cargo build --release --features cuda-preprocess

# Apple CoreML
cargo build --release --features coreml

# Intel OpenVINO
cargo build --release --features openvino
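
A backend compiled in with one of these features is chosen at run time through a device string; the CLI's `--device` option lists values such as `cpu`, `cuda:0`, `tensorrt:0`, and `openvino`. The sketch below shows how a `backend[:index]` string of that shape could be split into its parts. `DeviceSpec` and `parse_device` are hypothetical and unrelated to the crate's own `Device` type, whose parsing rules are not shown in this documentation.

```rust
/// Hypothetical parsed form of a device string such as `cpu` or `cuda:0`.
#[derive(Debug, PartialEq)]
struct DeviceSpec {
    backend: String,
    index: Option<u32>,
}

/// Split a `backend[:index]` string into a backend name and an optional index.
/// Illustration only; this does not check that the backend is actually available.
fn parse_device(input: &str) -> Result<DeviceSpec, String> {
    let s = input.trim().to_ascii_lowercase();
    if s.is_empty() {
        return Err("empty device string".to_string());
    }
    match s.split_once(':') {
        // Forms like `cuda:0`: the part after the colon must be a number.
        Some((backend, idx)) => {
            let index = idx
                .parse::<u32>()
                .map_err(|e| format!("invalid device index `{}`: {}", idx, e))?;
            Ok(DeviceSpec { backend: backend.to_string(), index: Some(index) })
        }
        // Forms like `cpu` or `openvino`: no index given.
        None => Ok(DeviceSpec { backend: s.clone(), index: None }),
    }
}
```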

§Results API

The Results struct provides access to inference outputs:

Boxes - Bounding boxes with xyxy(), xywh(), xyxyn(), xywhn(), conf(), cls() methods
Masks - Segmentation masks with data, orig_shape fields
Keypoints - Pose keypoints with xy(), xyn(), conf() methods
Probs - Classification probabilities with top1(), top5(), top1conf(), top5conf() methods
Obb - Oriented bounding boxes with xyxyxyxy(), xywhr(), conf(), cls() methods
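
The `Boxes` accessors above expose the same box in several coordinate formats. The sketch below illustrates the relationship between them on plain arrays, assuming the convention used by the Ultralytics Python package: `xyxy` holds corner coordinates, `xywh` holds center x, center y, width, and height, and the `n` suffix means the values are divided by the original image size. The function names are hypothetical and the crate's actual return types are not shown here.

```rust
/// Convert corner format (x1, y1, x2, y2) to center format (cx, cy, w, h).
/// Hypothetical helper for illustration only.
fn xyxy_to_xywh(b: [f32; 4]) -> [f32; 4] {
    let w = b[2] - b[0];
    let h = b[3] - b[1];
    [b[0] + w / 2.0, b[1] + h / 2.0, w, h]
}

/// Scale pixel values into the 0..=1 range using the original image size.
/// Works for either format because x-like values sit at even indices
/// and y-like values at odd indices.
fn normalize(b: [f32; 4], img_w: f32, img_h: f32) -> [f32; 4] {
    [b[0] / img_w, b[1] / img_h, b[2] / img_w, b[3] / img_h]
}
```

Normalized coordinates are convenient when results must be mapped onto a resized copy of the image, since they do not depend on pixel dimensions.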

§Module Overview

Module	Description
`model`	Core `YOLOModel` for loading models and running inference
`results`	Output types (`Results`, `Boxes`, `Masks`, etc.)
`inference`	`InferenceConfig` for customizing inference settings
`source`	Input source handling (`Source`, `SourceIterator`)
`task`	YOLO task types (`Task`: Detect, Segment, Pose, Classify, Obb, Semantic)
`error`	Error types (`InferenceError`, `Result`)
`preprocessing`	Image preprocessing utilities
`postprocessing`	Post-processing for all tasks (NMS/decode for detection; argmax for semantic segmentation)
`metadata`	ONNX model metadata parsing
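
The `postprocessing` row mentions argmax for semantic segmentation. As a minimal sketch of that idea, the function below collapses per-class scores into a per-pixel class map. It assumes a flat, class-major layout (`scores[class * num_pixels + pixel]`); the layout, the function name, and the `usize` output type are assumptions made for illustration, not details of the crate.

```rust
/// Pick the highest-scoring class for every pixel.
/// `scores` is assumed to be class-major: scores[class * num_pixels + pixel].
/// Hypothetical helper for illustration only.
fn argmax_class_map(scores: &[f32], num_classes: usize, num_pixels: usize) -> Vec<usize> {
    assert_eq!(scores.len(), num_classes * num_pixels, "unexpected score buffer size");
    (0..num_pixels)
        .map(|pixel| {
            let mut best = 0;
            for class in 1..num_classes {
                // Strict comparison: on ties the lower class index wins.
                if scores[class * num_pixels + pixel] > scores[best * num_pixels + pixel] {
                    best = class;
                }
            }
            best
        })
        .collect()
}
```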

§Feature Flags

Default features (enabled unless --no-default-features is passed): annotate, visualize.

Feature	Description
`annotate`	Image annotation for `--save` (default)
`visualize`	Real-time window display for `--show` (default)
`video`	Video file decoding/encoding (requires FFmpeg)
`cuda`	NVIDIA CUDA acceleration
`tensorrt`	NVIDIA TensorRT optimization
`coreml`	Apple CoreML (macOS/iOS)
`openvino`	Intel OpenVINO
`onednn`	Intel oneDNN
`rocm`	AMD ROCm
`migraphx`	AMD MIGraphX
`directml`	DirectML (Windows)
`nnapi`	Android Neural Networks API
`qnn`	Qualcomm Neural Networks
`xnnpack`	XNNPACK (cross-platform)
`acl`	ARM Compute Library
`armnn`	ARM NN
`tvm`	Apache TVM
`rknpu`	Rockchip NPU
`cann`	Huawei CANN
`webgpu`	WebGPU
`azure`	Azure
`nvidia`	Convenience: `cuda` + `tensorrt`
`amd`	Convenience: `rocm` + `migraphx`
`intel`	Convenience: `openvino` + `onednn`
`mobile`	Convenience: `nnapi` + `coreml` + `qnn`
`all`	Convenience: `annotate` + `visualize` + `video`

§License

This project is dual-licensed: AGPL-3.0 for open-source use, or the Ultralytics Enterprise License for commercial applications.

Re-exports§

pub use device::Device;
pub use error::InferenceError;
pub use error::Result;
pub use inference::InferenceConfig;
pub use model::YOLOModel;
pub use results::Boxes;
pub use results::Keypoints;
pub use results::Masks;
pub use results::Obb;
pub use results::Probs;
pub use results::Results;
pub use results::SemanticMask;
pub use results::Speed;
pub use source::Source;
pub use source::SourceIterator;
pub use source::SourceMeta;
pub use task::Task;
pub use metadata::ModelMetadata;
pub use preprocessing::PreprocessResult;
pub use preprocessing::preprocess_image;
pub use preprocessing::preprocess_image_with_precision;

Modules§

annotate: Image annotation utilities (available with the `annotate` feature).
batch: Batch processing module for YOLO inference.
cli: CLI module for running inference.
cuda_guide: CUDA / TensorRT acceleration guide rendered from docs/CUDA.md.
device: Hardware device support and abstraction.
download: Model downloading utilities.
error: Error types for the inference library.
inference: Inference configuration and common types.
io: I/O utilities for saving results including video encoding.
logging: Logging utilities.
metadata: ONNX model metadata parsing.
model: YOLO model loading and inference.
postprocessing: Post-processing for YOLO model outputs.
preprocessing: Image preprocessing for YOLO inference.
results: Results classes for YOLO inference output.
source: Input source handling for YOLO inference.
task: Task definitions for YOLO models.
utils: Utility functions for the inference library.
visualizer: Visualization tools for inference results.

Macros§

error: Macro for error messages.
info: Macro for standard info messages.
section: Macro for section headers.
success: Macro for success messages.
verbose: Macro for verbose messages.
warn: Macro for warning messages.

Constants§

DISPLAY_NAME: Application display name.
NAME: Library name.
VERSION: Library version.