usls is a Rust library integrated with ONNXRuntime that provides a collection of state-of-the-art models for Computer Vision and Vision-Language tasks, including:
- YOLO Models: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10
- SAM Models: SAM, SAM2, MobileSAM, EdgeSAM, SAM-HQ, FastSAM
- Vision Models: RTDETR, RTMO, DB, SVTR, Depth-Anything-v1-v2, DINOv2, MODNet, Sapiens
- Vision-Language Models: CLIP, BLIP, GroundingDINO, YOLO-World, Florence2
§Examples
Refer to All Runnable Demos
§Quick Start
The following demo shows how to build a YOLO with Options, load image(s), video and stream with DataLoader, and annotate the model’s inference results with Annotator.
use usls::{models::YOLO, Annotator, DataLoader, Options, Vision, YOLOTask, YOLOVersion};

fn main() -> anyhow::Result<()> {
    // Build model with Options
    let options = Options::new()
        .with_trt(0)
        .with_model("yolo/v8-m-dyn.onnx")?
        .with_yolo_version(YOLOVersion::V8) // YOLOVersion: V5, V6, V7, V8, V9, V10, RTDETR
        .with_yolo_task(YOLOTask::Detect) // YOLOTask: Classify, Detect, Pose, Segment, Obb
        .with_i00((1, 1, 4).into())
        .with_i02((0, 640, 640).into())
        .with_i03((0, 640, 640).into())
        .with_confs(&[0.2]);
    let mut model = YOLO::new(options)?;

    // Build DataLoader to load image(s), video, stream
    let dl = DataLoader::new(
        "./assets/bus.jpg", // local image
        // "images/bus.jpg", // remote image
        // "../set-negs", // local images (from folder)
        // "../hall.mp4", // local video
        // "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4", // remote video
        // "rtsp://admin:kkasd1234@192.168.2.217:554/h264/ch1/", // stream
    )?
    .with_batch(3) // iterate with batch_size = 3
    .build()?;

    // Build annotator
    let annotator = Annotator::new().with_saveout("YOLO-Demo");

    // Run and Annotate images
    for (xs, _) in dl {
        let ys = model.forward(&xs, false)?;
        annotator.annotate(&xs, &ys);
    }

    Ok(())
}
§What’s More
This guide covers the process of using provided models for inference, including how to build a model, load data, annotate results, and retrieve the outputs. The sections below give detailed instructions.
Build the Model
To build a model, you can use the provided models with Options:
use usls::{models::YOLO, Annotator, DataLoader, Options, Vision, YOLOTask, YOLOVersion};

let options = Options::default()
    .with_yolo_version(YOLOVersion::V8) // YOLOVersion: V5, V6, V7, V8, V9, V10, RTDETR
    .with_yolo_task(YOLOTask::Detect) // YOLOTask: Classify, Detect, Pose, Segment, Obb
    .with_model("xxxx.onnx")?;
let mut model = YOLO::new(options)?;
Options provides many further settings:
- Choose Execution Provider:
Select CUDA (default), TensorRT, or CoreML:
let options = Options::default()
    .with_cuda(0); // alternatives: .with_trt(0), .with_coreml(0), .with_cpu()
- Dynamic Input Shapes:
Specify dynamic shapes with MinOptMax:
let options = Options::default()
    .with_i00((1, 2, 4).into()) // batch(min=1, opt=2, max=4)
    .with_i02((416, 640, 800).into()) // height(min=416, opt=640, max=800)
    .with_i03((416, 640, 800).into()); // width(min=416, opt=640, max=800)
- Set Confidence Thresholds:
Adjust thresholds for each category:
let options = Options::default()
.with_confs(&[0.4, 0.15]); // class_0: 0.4, others: 0.15
- Set Class Names:
Provide class names if needed:
let options = Options::default()
.with_names(&COCO_CLASS_NAMES_80);
More options are detailed in the Options documentation.
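Two of the options above can be illustrated without the library. The following is a minimal sketch, not the usls implementation: `MinOptMaxSketch` and `expand_confs` are hypothetical names, and the rule that a short threshold list is padded with its last value is an assumption inferred from the "class_0: 0.4, others: 0.15" comment.

```rust
// Hypothetical stand-ins for illustration; these are NOT the usls types.

/// A (min, opt, max) triple for one dynamic input dimension.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MinOptMaxSketch {
    min: usize,
    opt: usize,
    max: usize,
}

impl MinOptMaxSketch {
    /// Returns `None` unless min <= opt <= max.
    fn new(min: usize, opt: usize, max: usize) -> Option<Self> {
        if min <= opt && opt <= max {
            Some(Self { min, opt, max })
        } else {
            None
        }
    }

    /// True if `value` lies inside the declared [min, max] range.
    fn contains(&self, value: usize) -> bool {
        (self.min..=self.max).contains(&value)
    }
}

/// Expands a short threshold list to `num_classes` entries by repeating
/// the last value (assumed behaviour; an empty list yields all zeros).
fn expand_confs(confs: &[f32], num_classes: usize) -> Vec<f32> {
    let last = confs.last().copied().unwrap_or(0.0);
    (0..num_classes)
        .map(|i| confs.get(i).copied().unwrap_or(last))
        .collect()
}

fn main() {
    let height = MinOptMaxSketch::new(416, 640, 800).expect("min <= opt <= max");
    println!("preferred height: {}", height.opt);
    println!("accepts 512: {}", height.contains(512));
    println!("accepts 1024: {}", height.contains(1024));

    let confs = expand_confs(&[0.4, 0.15], 4);
    println!("per-class thresholds: {:?}", confs);
}
```

Validating the triple up front is a design choice of this sketch; whether the library rejects an out-of-order triple is not stated on this page.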
Load Images, Video and Stream
- Load a Single Image
Use DataLoader::try_read to load an image from a local file or remote source:
let x = DataLoader::try_read("./assets/bus.jpg")?; // from local
let x = DataLoader::try_read("images/bus.jpg")?; // from remote
Alternatively, use image::ImageReader directly:
let x = image::ImageReader::open("myimage.png")?.decode()?;
- Load Multiple Images, Videos, or Streams
Create a DataLoader instance for batch processing:
let dl = DataLoader::new(
    "./assets/bus.jpg", // local image
    // "images/bus.jpg", // remote image
    // "../set-negs", // local images (from folder)
    // "../hall.mp4", // local video
    // "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4", // remote video
    // "rtsp://admin:kkasd1234@192.168.2.217:554/h264/ch1/", // stream
)?
.with_batch(3) // iterate with batch_size = 3
.build()?;

// Iterate through the data
for (xs, _) in dl {}
- Convert Images to Video
Use DataLoader::is2v to create a video from a sequence of images:
let fps = 24;
let image_folder = "runs/YOLO-DataLoader";
let saveout = ["runs", "is2v"];
DataLoader::is2v(image_folder, &saveout, fps)?;
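To make the batch size passed to with_batch above concrete, here is a minimal sketch using only the standard library. It does not use or reproduce DataLoader; the helper `batches` and the file names are hypothetical, and yielding a smaller final batch is an assumption of this sketch rather than documented library behaviour.

```rust
/// Groups `items` into consecutive batches of at most `batch_size` elements.
fn batches<T: Clone>(items: &[T], batch_size: usize) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be positive");
    items.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}

fn main() {
    // Hypothetical file names standing in for the images in a folder.
    let paths: Vec<String> = (0..7).map(|i| format!("img_{i}.jpg")).collect();

    // With a batch size of 3, seven items are visited as 3 + 3 + 1.
    for (i, batch) in batches(&paths, 3).iter().enumerate() {
        println!("batch {i}: {batch:?}");
    }
}
```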
Annotate Inference Results
- Create an Annotator Instance
let annotator = Annotator::default();
- Set Saveout Name
let annotator = Annotator::default()
    .with_saveout("YOLOs");
- Set Bounding Box Line Width
let annotator = Annotator::default()
    .with_bboxes_thickness(4);
- Disable Mask Plotting
let annotator = Annotator::default()
    .without_masks(true);
- Perform Inference and Annotate the Results
for (xs, _paths) in dl {
    let ys = model.run(&xs)?;
    annotator.annotate(&xs, &ys);
}
More options are detailed in the Annotator documentation.
Retrieve Model's Inference Results
Retrieve the inference outputs, which are stored in a Vec<Y>:
- Get Detection Bounding Boxes
let ys = model.run(&xs)?;
for y in ys {
    // bboxes
    if let Some(bboxes) = y.bboxes() {
        for bbox in bboxes {
            println!(
                "Bbox: {}, {}, {}, {}, {}, {}",
                bbox.xmin(),
                bbox.ymin(),
                bbox.xmax(),
                bbox.ymax(),
                bbox.confidence(),
                bbox.id(),
            );
        }
    }
}
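Once the boxes have been read out as above, a common next step is to keep only some of them. The sketch below shows one way to filter by class id and confidence. `BboxSketch` is a hypothetical stand-in, not the usls type; only the `confidence()` and `id()` accessor names mirror the loop above, and the sample values are made up.

```rust
// Hypothetical stand-in for a detection box; this is NOT the usls type.
#[derive(Debug, Clone, PartialEq)]
struct BboxSketch {
    xmin: f32,
    ymin: f32,
    xmax: f32,
    ymax: f32,
    confidence: f32,
    id: isize,
}

impl BboxSketch {
    fn confidence(&self) -> f32 {
        self.confidence
    }

    fn id(&self) -> isize {
        self.id
    }

    /// Area in pixels; inverted or empty boxes count as zero.
    fn area(&self) -> f32 {
        (self.xmax - self.xmin).max(0.0) * (self.ymax - self.ymin).max(0.0)
    }
}

/// Keeps boxes of class `id` whose confidence is at least `min_conf`.
fn select(bboxes: &[BboxSketch], id: isize, min_conf: f32) -> Vec<&BboxSketch> {
    bboxes
        .iter()
        .filter(|b| b.id() == id && b.confidence() >= min_conf)
        .collect()
}

/// Made-up detections used only to exercise `select`.
fn sample() -> Vec<BboxSketch> {
    vec![
        BboxSketch { xmin: 10.0, ymin: 20.0, xmax: 110.0, ymax: 70.0, confidence: 0.9, id: 0 },
        BboxSketch { xmin: 0.0, ymin: 0.0, xmax: 50.0, ymax: 50.0, confidence: 0.3, id: 0 },
        BboxSketch { xmin: 5.0, ymin: 5.0, xmax: 25.0, ymax: 45.0, confidence: 0.8, id: 5 },
    ]
}

fn main() {
    let boxes = sample();
    for b in select(&boxes, 0, 0.5) {
        println!("id={} conf={:.2} area={:.0}", b.id(), b.confidence(), b.area());
    }
}
```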
Custom Model Implementation
You can also implement your own model using OrtEngine and Options. OrtEngine supports ONNX model loading, metadata parsing, dry_run, inference, and other functions, with execution providers such as CUDA, TensorRT, CoreML, etc.
For more details, refer to the Demo: Depth-Anything.
Re-exports§
pub use models::*;
Modules§
- Some colormaps: TURBO, INFERNO, PLASMA, VIRIDIS, MAGMA, BENTCOOLWARM, BLACKBODY, EXTENDEDKINDLMANN, KINDLMANN, SMOOTHCOOLWARM.
- Some constants related to the COCO dataset: COCO_SKELETONS_16, COCO_KEYPOINTS_17, COCO_CLASS_NAMES_80
- ONNX file generated by prost-build.
- Some processing functions for images and ndarrays.
Structs§
- Annotator for struct Y
- Bounding Box 2D.
- A structure designed to load and manage image, video, or stream data. It handles local file paths, remote URLs, and live streams, supporting both batch processing and optional progress bar display. The structure also supports video decoding through video_rs for video and stream data.
- Dynamic Confidences
- Embedding for image or text.
- Manages interactions with a GitHub repository’s releases
- A struct for input composed of the i-th input, the ii-th dimension, and the value.
- Keypoint 2D.
- Logits Sampler
- Mask: Gray Image.
- Minimum Bounding Rectangle.
- A value composed of Min-Opt-Max
- Options for building models
- ONNXRuntime Backend
- A struct for tensor attrs composed of the names, the dtypes, and the dimensions.
- Polygon.
- Probabilities for classification.
- This is a wrapper around a tokenizer to ensure that tokens can be returned to the user in a streaming way rather than having to wait for the full decoding.
- Model input, wrapper over Array<f32, IxDyn>
- Container for inference results for each image.
Enums§
- Represents various directories on the system, including Home, Cache, Config, and more.
- Key is used by the get key functions to check whether keys on the keyboard have been pressed