usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.
- SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
- Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
- Easy Data Handling: Easily read images, video streams, and folders with iterator support.
- Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
- Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's `imshow()`.
## 🧩 Supported Models
- YOLO Models: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLOv12
- SAM Models: SAM, SAM2, MobileSAM, EdgeSAM, SAM-HQ, FastSAM
- Vision Models: RT-DETR, RTMO, Depth-Anything, DINOv2, MODNet, Sapiens, DepthPro, FastViT, BEiT, MobileOne
- Vision-Language Models: CLIP, jina-clip-v1-v2, BLIP, GroundingDINO, YOLO-World, Florence2, Moondream2
- OCR-Related Models: FAST, DB(PaddleOCR-Det), SVTR(PaddleOCR-Rec), SLANet, TrOCR, DocLayout-YOLO
| Model | Task / Description | Example | CoreML | CUDA FP32 | CUDA FP16 | TensorRT FP32 | TensorRT FP16 |
|---|---|---|---|---|---|---|---|
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ | | |
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ | | |
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ | | |
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ | | |
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ | | |
| DINOv2 | Vision Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv5 | Image Classification<br />Object Detection<br />Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv8<br />YOLO11 | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv12 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETR | Object Detection | demo | ✅ | ✅ | ✅ | | |
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ | | |
| PP-PicoDet | Object Detection | demo | ✅ | ✅ | ✅ | | |
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ | | |
| D-FINE | Object Detection | demo | ✅ | ✅ | ✅ | | |
| DEIM | Object Detection | demo | ✅ | ✅ | ✅ | | |
| RTMO | Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAM | Segment Anything | demo | ✅ | ✅ | ✅ | | |
| SAM2 | Segment Anything | demo | ✅ | ✅ | ✅ | | |
| MobileSAM | Segment Anything | demo | ✅ | ✅ | ✅ | | |
| EdgeSAM | Segment Anything | demo | ✅ | ✅ | ✅ | | |
| SAM-HQ | Segment Anything | demo | ✅ | ✅ | ✅ | | |
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | | |
| CLIP | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| jina-clip-v2 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| mobileclip | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLIP | Image Captioning | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DB (PaddleOCR-Det) | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| FAST | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| LinkNet | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SVTR (PaddleOCR-Rec) | Text Recognition | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLANet | Table Recognition | demo | ✅ | ✅ | ✅ | | |
| TrOCR | Text Recognition | demo | ✅ | ✅ | ✅ | | |
| YOLOPv2 | Panoptic Driving Perception | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v1<br />DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | | |
| MODNet | Image Matting | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ✅ | ✅ | | |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ✅ | ✅ | | |
| Moondream2 | Open-Set Object Detection<br />Open-Set Keypoint Detection<br />Image Captioning<br />Visual Question Answering | demo | ✅ | ✅ | ✅ | | |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ✅ | ✅ | | |
| SmolVLM (256M, 500M) | Visual Question Answering | demo | ✅ | ✅ | ✅ | | |
| RMBG (1.4, 2.0) | Image Segmentation<br />Background Removal | demo | ✅ | ✅ | ✅ | | |
| BEN2 | Image Segmentation<br />Background Removal | demo | ✅ | ✅ | ✅ | | |
## 🛠️ Installation
To get started, you'll need:
1. Protocol Buffers Compiler (`protoc`)

   Required for building the project. See the official installation guide.

   ```shell
   # Linux (apt)
   sudo apt install -y protobuf-compiler

   # macOS (Homebrew)
   brew install protobuf

   # Windows (Winget)
   winget install protobuf

   # Verify installation
   protoc --version   # should be 3.x or higher
   ```
2. Rust Toolchain

   ```shell
   # Install Rust and Cargo
   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   ```
3. Add usls to Your Project

   Add the following to your `Cargo.toml`:

   ```toml
   [dependencies]
   # Recommended: Use the GitHub version
   usls = { git = "https://github.com/jamjamjon/usls" }

   # Alternative: Use the crates.io version
   usls = "latest-version"
   ```

   Note: The GitHub version is recommended, as it contains the latest updates.
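Execution-provider features (described under Cargo Features below) can also be enabled at this step. A minimal `Cargo.toml` sketch, assuming the `cuda` feature and a machine with the CUDA toolkit and cuDNN installed:

```toml
[dependencies]
# Sketch: enable the CUDA execution provider in addition to the defaults.
# Swap "cuda" for "trt" (TensorRT) or "mps" (CoreML on macOS) as needed.
usls = { git = "https://github.com/jamjamjon/usls", features = ["cuda"] }
```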
## ⚡ Cargo Features
- ONNXRuntime-related features (enabled by default) provide model inference and model zoo support:
  - `ort-download-binaries` (default): Automatically downloads prebuilt `ONNXRuntime` binaries for supported platforms. Provides core model loading and inference capabilities using the `CPU` execution provider.
  - `ort-load-dynamic`: Dynamic linking. You'll need to compile `ONNXRuntime` from source or download a precompiled package, then link it manually. See the `ONNXRuntime` linking guide for details.
  - `cuda`: Enables the NVIDIA `CUDA` provider. Requires the `CUDA` toolkit and `cuDNN` to be installed.
  - `trt`: Enables the NVIDIA `TensorRT` provider. Requires `TensorRT` libraries to be installed.
  - `mps`: Enables the Apple `CoreML` provider on macOS.
- If you only need basic features (such as image/video reading, result visualization, etc.), you can disable the default features to minimize dependencies:

  ```toml
  usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
  ```
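As a usage sketch, a non-default provider feature is selected with Cargo's `-F` flag at build or run time. The feature names come from the list above; the `yolo` example target is assumed from the repository's examples:

```shell
# CPU only (default features)
cargo run -r --example yolo

# NVIDIA CUDA provider (assumes the CUDA toolkit and cuDNN are installed)
cargo run -r -F cuda --example yolo -- --device cuda:0

# NVIDIA TensorRT provider (assumes TensorRT libraries are installed)
cargo run -r -F trt --example yolo

# Apple CoreML provider (macOS only)
cargo run -r -F mps --example yolo
```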
## ✨ Example
- Model Inference

  ```shell
  cargo run -r --example yolo                              # CPU
  cargo run -r -F cuda --example yolo -- --device cuda:0   # GPU
  ```
- Reading Images

  ```rust
  // (Paths and batch size below are illustrative.)

  // Read a single image
  let image = DataLoader::try_read_one("./assets/bus.jpg")?;

  // Read multiple images
  let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;

  // Read all images in a folder
  let images = DataLoader::try_read_folder("./assets")?;

  // Read images matching a pattern (glob)
  let images = DataLoader::try_read_pattern("./assets/*.jpg")?;

  // Load images and iterate
  let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
  for images in dl.iter() {
      // Process each batch of images
  }
  ```
- Reading Video

  ```rust
  // (Source path and builder arguments are illustrative.)
  let dl = DataLoader::new("./assets/video.mp4")?
      .with_batch(1)
      .with_nf_skip(2)          // skip frames between reads
      .with_progress_bar(true)
      .build()?;
  for images in dl.iter() {
      // Process each batch of frames
  }
  ```
- Annotate

  ```rust
  let annotator = Annotator::default();
  let image = DataLoader::try_read_one("./assets/bus.jpg")?;

  // hbb (coordinates, id, name, and confidence are illustrative)
  let hbb = Hbb::default()
      .with_xyxy(0.0, 0.0, 100.0, 200.0)
      .with_id(0)
      .with_name("person")
      .with_confidence(0.9);
  let _ = annotator.annotate(&image, &hbb)?;

  // keypoints
  let keypoints: Vec<Keypoint> = vec![/* ... */];
  let _ = annotator.annotate(&image, &keypoints)?;
  ```
- Visualizing Inference Results and Exporting Video

  ```rust
  // (Source path and window scale are illustrative.)
  let dl = DataLoader::new("./assets/video.mp4")?.build()?;
  let mut viewer = Viewer::default().with_window_scale(0.5);
  for images in &dl {
      // Display each batch in the viewer window and/or write it to the output video
  }
  ```
All examples are located in the examples directory.
## ❓ FAQ
See issues or open a new discussion.
## 🤝 Contributing
Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.
## 📜 License
This project is licensed under the terms described in the LICENSE file.