§usls
usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).
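The typical flow is: build a `Config`, construct a model from the `models` module, read inputs with `DataLoader`, and collect per-image results as `Y` values. Below is a minimal sketch of that flow; the constructor and method names (`Config::yolo`, `with_model_file`, `try_read_n`, `forward`) are assumptions for illustration only, so consult the `models` module docs and the repository examples for the exact API.

```rust
use usls::{models::YOLO, Config, DataLoader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical builder calls: configure a YOLO detector from a local
    // ONNX file (requires the `yolo` cargo feature).
    let config = Config::yolo().with_model_file("yolo11n.onnx");
    let mut model = YOLO::new(config)?;

    // Hypothetical helper: read a batch of images from disk.
    let images = DataLoader::try_read_n(&["./assets/bus.jpg"])?;

    // Run inference; each `Y` holds the results (boxes, masks, probs, ...)
    // for one input image.
    let ys = model.forward(&images)?;
    println!("{ys:?}");
    Ok(())
}
```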
§📚 Documentation
§⚡ Cargo Features
❕ Features in italics are enabled by default.
§Runtime & Utilities
- `ort-download-binaries`: Auto-download ONNX Runtime binaries from pyke.
- `ort-load-dynamic`: Link ONNX Runtime yourself. Use this if pyke doesn't provide prebuilt binaries for your platform or you want to link your local ONNX Runtime library. See the Linking Guide for more details.
- `viewer`: Image/video visualization (minifb), similar to OpenCV's `imshow()`. See example.
- `video`: Video I/O support (video-rs). Enable this to read/write video streams. See example (and the sketch after this list).
- `hf-hub`: Hugging Face Hub support for downloading models from Hugging Face repositories.
- `tokenizers`: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (`blip`, `clip`, `florence2`, `grounding-dino`, `fastvlm`, `moondream2`, `owl`, `smolvlm`, `trocr`, `yoloe`).
- `slsl`: SLSL tensor library support. Automatically enabled when using the `yolo` or `clip` features.
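As a rough sketch of what the `video` feature unlocks, the snippet below iterates frames from a video file with `DataLoader`. The builder and iteration details (`new`, `with_batch`, yielding frame batches) are assumptions for illustration; the crate's video example shows the real API.

```rust
use usls::DataLoader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Requires the `video` cargo feature for video decoding.
    // Hypothetical builder/iterator shape, for illustration only.
    let dl = DataLoader::new("input.mp4")?.with_batch(1);
    for frames in dl {
        // Each iteration would yield a batch of decoded frames that can be
        // fed straight into a model's forward pass.
        println!("decoded a batch of {} frame(s)", frames.len());
    }
    Ok(())
}
```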
§Execution Providers
Hardware acceleration for inference.
- `cuda`, `tensorrt`: NVIDIA GPU acceleration
- `coreml`: Apple Silicon acceleration
- `openvino`: Intel CPU/GPU/VPU acceleration
- `onednn`, `directml`, `xnnpack`, `rocm`, `cann`, `rknpu`, `acl`, `nnapi`, `armnn`, `tvm`, `qnn`, `migraphx`, `vitis`, `azure`: Various hardware/platform support
See ONNX Runtime docs and ORT performance guide for details.
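Enabling one of these features compiles in the execution provider; which device a model actually runs on is then selected at configuration time via the `Device` enum listed below. A tiny sketch under assumed names (`Config::default()`, a `with_device` builder, a `Device::Cuda(0)` variant), for illustration only:

```rust
use usls::{Config, Device};

/// Hypothetical helper: target the first CUDA GPU (requires the `cuda` cargo
/// feature). The builder method and enum variant are assumed names; see the
/// `Device` enum and `Config` docs for the actual API.
fn gpu_config() -> Config {
    Config::default().with_device(Device::Cuda(0))
}
```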
§Model Selection
Almost every model is a separate feature. Enable only what you need to reduce compile time and binary size; see the sketch below.
- `yolo`, `sam`, `clip`, `image-classifier`, `dino`, `rtmpose`, `rtdetr`, `db`, …
- All models: `all-models` (enables all model features)
See Supported Models for the complete list with feature names.
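For example, a downstream `Cargo.toml` might declare `usls = { version = "…", features = ["yolo", "cuda"] }` so that only the YOLO family and CUDA support are compiled. Items behind a disabled feature simply aren't built, as the hedged sketch below illustrates (the concrete type names in `models` are assumptions tied to the feature names):

```rust
// With `features = ["yolo"]` enabled, the corresponding model type is
// available from the `models` module; types behind disabled features are
// not compiled at all. (`YOLO`/`SAM` names are assumptions for illustration.)
use usls::models::YOLO;
// use usls::models::SAM; // would fail to compile without the `sam` feature

fn main() {
    println!("{}", std::any::type_name::<YOLO>());
}
```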
Modules§
- core
- Core functionality for vision and vision-language model inference.
- models
- Pre-built models for various vision and vision-language tasks.
- viz
- Visualization utilities for rendering and displaying ML model results.
Structs§
- Config
- Configuration for model inference including engines, processors, and task settings.
- DataLoader
- A structure designed to load and manage image, video, or stream data.
- DynConf
- Dynamic confidence thresholds.
- Engine
- ONNX Runtime inference engine with configuration and session management.
- HardwareConfig
- Unified hardware configuration containing all execution provider configs.
- Hbb
- Horizontal bounding box with position, size, and metadata.
- Hub
- Manages interactions with GitHub repository releases and Hugging Face repositories.
- Image
- Image wrapper with metadata and transformation capabilities.
- InstanceMeta
- Metadata for detection instances including ID, confidence, and name.
- Keypoint
- Represents a keypoint in a 2D space with optional metadata.
- LogitsSampler
- Logits sampler for text generation with temperature and nucleus sampling.
- Mask
- Grayscale mask image.
- ORTConfig
- ONNX Runtime configuration with device and optimization settings.
- Obb
- Oriented bounding box with four vertices and metadata.
- OrtEngine
- ONNX Runtime inference engine with high-performance tensor operations.
- Polygon
- Polygon with metadata.
- Prob
- Probability result with classification metadata.
- Processor
- Image and text processing pipeline with tokenization and transformation capabilities.
- ProcessorConfig
- Configuration for image and text processing pipelines.
- Skeleton
- Skeleton structure containing keypoint connections.
- Text
- Text detection result with content and metadata.
- Version
- Version representation with major, minor, and optional patch numbers.
- X
- Tensor wrapper over `Array<f32, IxDyn>`.
- Xs
- Collection of named tensors with associated images and texts.
- Y
- Container for inference results for each image.
Enums§
- DType
- Data type enumeration for tensor elements.
- Device
- Device types for model execution.
- Dir
- Represents various directories on the system, including Home, Cache, Config, and more.
- ImageTensorLayout
- Image tensor layout formats for organizing image data in memory.
- Location
- Media location type indicating local or remote source.
- MediaType
- Media type classification for different content formats.
- ResizeMode
- Image resize modes for different scaling strategies.
- Scale
- Model scale variants for different model sizes.
- Task
- Task types for various vision and vision-language model inference tasks.
Constants§
- NAMES_COCO_80
- COCO 80-class object categories (common split).
- NAMES_COCO_91
- COCO 91-class extended categories (keeps original index mapping with gaps).
- NAMES_COCO_KEYPOINTS_17
- COCO 17 human keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
- NAMES_COCO_KEYPOINTS_133
- COCO-WholeBody 133 keypoints (body, face, and hands combined).
- NAMES_DOTA_V1_5_16
- DOTA v1.5 (16 classes, includes container crane).
- NAMES_DOTA_V1_15
- DOTA v1.0 (15 classes, excludes container crane).
- NAMES_HAND_KEYPOINTS_21
- 21 hand keypoints from wrist to fingertips (index-aligned, base→tip).
- NAMES_YOLO_DOCLAYOUT_10
- YOLO DocStructBench 10-class document layout categories.
- SKELETON_COCO_19
- Keypoint connections for the COCO person skeleton (19 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_COCO_65
- Keypoint connections for the COCO-133 person skeleton (65 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_COLOR_COCO_19
- Colors for visualizing each connection in the COCO person skeleton, grouped by body part.
- SKELETON_COLOR_COCO_65
- Colors for visualizing each connection in the COCO-133 person skeleton, grouped by body part.
- SKELETON_COLOR_HALPE_27
- Colors for visualizing each connection in the HALPE person skeleton, grouped by body part and side.
- SKELETON_COLOR_HAND_21
- Colors for visualizing each connection in the hand skeleton, grouped by finger.
- SKELETON_HALPE_27
- Keypoint connections for the HALPE person skeleton (27 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_HAND_21
- Keypoint connections for the hand skeleton (20 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by finger.
Statics§
- NAMES_IMAGENET_1K
- ImageNet-1K classification labels (1000 categories). Lazily loaded from an embedded text file to keep compile time low.
- NAMES_OBJECT365
- Object365 dataset labels without the leading `background` class (365 categories).
- NAMES_OBJECT365_366
- Object365 dataset labels including `background` (366 categories). Built by prepending `background` to `NAMES_OBJECT365` for compatibility.