Crate usls

usls is a Rust library integrated with ONNXRuntime that provides a collection of state-of-the-art models for Computer Vision and Vision-Language tasks.


Refer to All Runnable Demos

§Quick Start

The following demo shows how to build a YOLO model with Options, load images, video, and streams with DataLoader, and annotate the model’s inference results with Annotator.

use usls::{models::YOLO, Annotator, DataLoader, Options, Vision, YOLOTask, YOLOVersion};

fn main() -> anyhow::Result<()> {
    // Build model with Options
    let options = Options::new()
        .with_yolo_version(YOLOVersion::V8) // YOLOVersion: V5, V6, V7, V8, V9, V10, RTDETR
        .with_yolo_task(YOLOTask::Detect) // YOLOTask: Classify, Detect, Pose, Segment, Obb
        .with_i00((1, 1, 4).into()) // batch (min, opt, max)
        .with_i02((0, 640, 640).into()) // height (min, opt, max)
        .with_i03((0, 640, 640).into()); // width (min, opt, max)
    let mut model = YOLO::new(options)?;

    // Build DataLoader to load image(s), video, stream
    let dl = DataLoader::new(
        "./assets/bus.jpg", // local image
        // "images/bus.jpg",  // remote image
        // "../set-negs",  // local images (from folder)
        // "../hall.mp4",  // local video
        // "",  // remote video
        // "rtsp://admin:kkasd1234@",  // stream
    )? // the closing `)?` and `.build()?` are assumed; the source is truncated here
    .with_batch(3) // iterate with batch_size = 3
    .build()?;

    // Build annotator
    let annotator = Annotator::new().with_saveout("YOLO-Demo");

    // Run and Annotate images
    for (xs, _) in dl {
        let ys = model.forward(&xs, false)?;
        annotator.annotate(&xs, &ys);
    }

    Ok(())
}


§What’s More

This guide covers the process of using the provided models for inference: how to build a model, load data, annotate results, and retrieve the outputs. The sections below give detailed instructions.

Build the Model

To build a model, you can use the provided models with Options:

use usls::{models::YOLO, Annotator, DataLoader, Options, Vision, YOLOTask, YOLOVersion};

let options = Options::default()
    .with_yolo_version(YOLOVersion::V8)  // YOLOVersion: V5, V6, V7, V8, V9, V10, RTDETR
    .with_yolo_task(YOLOTask::Detect);   // YOLOTask: Classify, Detect, Pose, Segment, Obb
let mut model = YOLO::new(options)?;

Options provides many other settings:

  • Choose Execution Provider:
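    Select CUDA (default), TensorRT, CoreML, or CPU:
let options = Options::default(); // CUDA is used by default
// To switch provider, chain one of these calls before the semicolon:
//     .with_trt(0)
//     .with_coreml(0)
//     .with_cpu()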
  • Dynamic Input Shapes:
    Specify dynamic shapes with MinOptMax:
let options = Options::default()
    .with_i00((1, 2, 4).into()) // batch(min=1, opt=2, max=4)
    .with_i02((416, 640, 800).into()) // height(min=416, opt=640, max=800)
    .with_i03((416, 640, 800).into()); // width(min=416, opt=640, max=800)
  • Set Confidence Thresholds:
    Adjust thresholds for each category:
let options = Options::default()
    .with_confs(&[0.4, 0.15]); // class_0: 0.4, others: 0.15
  • Set Class Names:
    Provide class names if needed:
let options = Options::default()

More options are detailed in the Options documentation.
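
To make the shape and threshold options above more concrete, here is a minimal, std-only sketch of the two ideas they rely on: a min/opt/max triple that bounds a dynamic dimension, and a short threshold list whose last entry is reused for the remaining classes (as the comment "class_0: 0.4, others: 0.15" suggests). The names Range3 and expand_confs are hypothetical stand-ins, not usls types, and the validation rule min <= opt <= max is an assumption made for illustration.

```rust
/// Hypothetical stand-in for a min/opt/max triple; not the usls `MinOptMax` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range3 {
    min: usize,
    opt: usize,
    max: usize,
}

impl Range3 {
    /// Returns `None` unless `min <= opt <= max` (an assumed validity rule).
    fn new(min: usize, opt: usize, max: usize) -> Option<Self> {
        (min <= opt && opt <= max).then_some(Self { min, opt, max })
    }

    /// True when `v` lies inside the inclusive range `[min, max]`.
    fn accepts(&self, v: usize) -> bool {
        (self.min..=self.max).contains(&v)
    }
}

/// Expands a short threshold list to `n` classes by repeating its last entry.
/// Returns `None` for an empty list.
fn expand_confs(confs: &[f32], n: usize) -> Option<Vec<f32>> {
    let last = *confs.last()?;
    Some((0..n).map(|i| confs.get(i).copied().unwrap_or(last)).collect())
}

fn main() {
    let height = Range3::new(416, 640, 800).expect("min <= opt <= max");
    println!("opt = {}", height.opt);
    println!("640 accepted: {}", height.accepts(640));
    println!("1024 accepted: {}", height.accepts(1024));
    println!("{:?}", expand_confs(&[0.4, 0.15], 4));
}
```

In this sketch a height of 1024 is rejected because it exceeds the declared maximum of 800, and four classes receive the thresholds 0.4, 0.15, 0.15, 0.15.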

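Load Images, Video and Stream
  • Load a Single Image
    Use DataLoader::try_read to read one image from a local path or a remote source: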
let x = DataLoader::try_read("./assets/bus.jpg")?; // from local
let x = DataLoader::try_read("images/bus.jpg")?; // from remote

Alternatively, use image::ImageReader directly:

let x = image::ImageReader::open("myimage.png")?.decode()?;
  • Load Multiple Images, Videos, or Streams
    Create a DataLoader instance for batch processing:
let dl = DataLoader::new(
    "./assets/bus.jpg", // local image
    // "images/bus.jpg",  // remote image
    // "../set-negs",  // local images (from folder)
    // "../hall.mp4",  // local video
    // "",  // remote video
    // "rtsp://admin:kkasd1234@",  // stream
)? // the closing `)?` and `.build()?` are assumed; the source is truncated here
.with_batch(3)  // iterate with batch_size = 3
.build()?;

// Iterate through the data
for (xs, _) in dl {}
  • Convert Images to Video
    Use DataLoader::is2v to create a video from a sequence of images:
let fps = 24;
let image_folder = "runs/YOLO-DataLoader";
let saveout = ["runs", "is2v"];
DataLoader::is2v(image_folder, &saveout, fps)?;
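
As a rough illustration of what iterating with batch_size = 3 means, the following std-only sketch groups a list of inputs into consecutive batches. It is not how DataLoader is implemented; the batches helper and the file names are hypothetical, and the sketch assumes the final batch is simply shorter when the input count is not a multiple of the batch size.

```rust
/// Splits `items` into consecutive batches of at most `batch_size` elements.
/// The last batch is shorter when the length is not a multiple of `batch_size`.
fn batches<T: Clone>(items: &[T], batch_size: usize) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be positive");
    items.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}

fn main() {
    // Hypothetical file names standing in for decoded images or video frames.
    let paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg", "g.jpg"];
    for (i, batch) in batches(&paths, 3).iter().enumerate() {
        println!("batch {i}: {batch:?}");
    }
}
```

With seven inputs and a batch size of 3, the sketch yields two full batches and one batch holding the single remaining item.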
Annotate Inference Results
  • Create an Annotator Instance
let annotator = Annotator::default();
  • Set Saveout Name:
let annotator = Annotator::default()
  • Set Bounding Box Line Width:
let annotator = Annotator::default()
  • Disable Mask Plotting
let annotator = Annotator::default()
  • Perform Inference and Annotate the Results
for (xs, _paths) in dl {
    let ys = model.forward(&xs, false)?; // same call as in the Quick Start
    annotator.annotate(&xs, &ys);
}

More options are detailed in the Annotator documentation.

Retrieve Model's Inference Results

Retrieve the inference outputs, which are saved in a Vec<Y>:

  • Get Detection Bounding Boxes
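let ys = model.forward(&xs, false)?; // same call as in the Quick Start
for y in ys {
    // bboxes
    if let Some(bboxes) = y.bboxes() {
        for bbox in bboxes {
            println!(
                "Bbox: {}, {}, {}, {}, {}, {}",
                // the six bbox arguments are elided in the source
            );
        }
    }
}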
Custom Model Implementation

You can also implement your own model using OrtEngine and Options. OrtEngine supports ONNX model loading, metadata parsing, dry_run, inference, and other functions, with execution providers such as CUDA, TensorRT, CoreML, etc.

For more details, refer to the Demo: Depth-Anything.




  • Annotator for struct Y
  • Bounding Box 2D.
  • A structure designed to load and manage image, video, or stream data. It handles local file paths, remote URLs, and live streams, supporting both batch processing and optional progress bar display. The structure also supports video decoding through video_rs for video and stream data.
  • Dynamic Confidences
  • Embedding for image or text.
  • Manages interactions with a GitHub repository’s releases
  • A struct for input composed of the i-th input, the ii-th dimension, and the value.
  • Keypoint 2D.
  • Logits Sampler
  • Mask: Gray Image.
  • Minimum Bounding Rectangle.
  • A value composed of Min-Opt-Max
  • Options for building models
  • ONNXRuntime Backend
  • A struct for tensor attrs composed of the names, the dtypes, and the dimensions.
  • Polygon.
  • Probabilities for classification.
  • This is a wrapper around a tokenizer to ensure that tokens can be returned to the user in a streaming way rather than having to wait for the full decoding.
  • Model input, wrapper over Array<f32, IxDyn>
  • Container for inference results for each image.



