usls 0.1.0-beta.2

A Rust library integrated with ONNXRuntime, providing a collection of ML models.
Documentation

usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.

  • SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
  • Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
  • Easy Data Handling: Easily read images, video streams, and folders with iterator support.
  • Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
  • Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's imshow().
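To make the horizontal bounding box (Hbb) convention concrete, here is a small standalone sketch. Note this is plain Rust for illustration only, not the usls API: the `Box2D` struct is hypothetical, but the xyxy layout (top-left and bottom-right corners) and the IoU computation match the common convention used for such containers.

```rust
// Hypothetical axis-aligned box in xyxy form (x1, y1, x2, y2),
// a stand-in for illustration, not usls's Hbb type.
#[derive(Clone, Copy)]
struct Box2D {
    x1: f32,
    y1: f32,
    x2: f32,
    y2: f32,
}

/// Intersection-over-union of two xyxy boxes.
fn iou(a: Box2D, b: Box2D) -> f32 {
    // Overlap extents, clamped to zero when the boxes are disjoint.
    let ix = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let iy = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = ix * iy;
    let area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    let area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    inter / (area_a + area_b - inter)
}

fn main() {
    let a = Box2D { x1: 0.0, y1: 0.0, x2: 10.0, y2: 10.0 };
    let b = Box2D { x1: 5.0, y1: 5.0, x2: 15.0, y2: 15.0 };
    // Intersection 5x5 = 25, union 100 + 100 - 25 = 175, so IoU = 1/7.
    println!("IoU = {:.4}", iou(a, b));
}
```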

🧩 Supported Models

| Model | Task / Description | Example | CoreML | CUDA FP32 | CUDA FP16 | TensorRT FP32 | TensorRT FP16 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DINOv2 | Vision Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv5 | Image Classification, Object Detection, Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv8, YOLO11 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv12 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| PP-PicoDet | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| D-FINE | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DEIM | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RTMO | Keypoint Detection | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| SAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM2 | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| MobileSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| EdgeSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM-HQ | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ |  |  |
| CLIP | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| BLIP | Image Captioning | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DB (PaddleOCR-Det) | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| FAST | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| LinkNet | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SVTR (PaddleOCR-Rec) | Text Recognition | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLANet | Table Recognition | demo | ✅ | ✅ | ✅ |  |  |
| TrOCR | Text Recognition | demo | ✅ | ✅ | ✅ |  |  |
| YOLOPv2 | Panoptic Driving Perception | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v1, DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ |  |  |
| MODNet | Image Matting | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ✅ | ✅ |  |  |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ✅ | ✅ |  |  |
| Moondream2 | Open-Set Object Detection, Open-Set Keypoint Detection, Image Caption, Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| SmolVLM (256M, 500M) | Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
| RMBG (1.4, 2.0) | Image Segmentation, Background Removal | demo | ✅ | ✅ | ✅ |  |  |
| BEN2 | Image Segmentation, Background Removal | demo | ✅ | ✅ | ✅ |  |  |

πŸ› οΈ Installation

To get started, you'll need:

1. Protocol Buffers Compiler (protoc)

Required for building the project. Official installation guide

# Linux (apt)
sudo apt install -y protobuf-compiler

# macOS (Homebrew)
brew install protobuf

# Windows (Winget)
winget install protobuf

# Verify installation
protoc --version  # Should be 3.x or higher

2. Rust Toolchain

# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

3. Add usls to Your Project

Add the following to your Cargo.toml:

[dependencies]
# Recommended: Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls" }

# Alternative: Use crates.io version
usls = "latest-version"

Note: The GitHub version is recommended as it contains the latest updates.

⚡ Cargo Features

  • ONNXRuntime-related features (enabled by default) provide model inference and model zoo support:

    • ort-download-binaries (default): Automatically downloads prebuilt ONNXRuntime binaries for supported platforms. Provides core model loading and inference capabilities using the CPU execution provider.

    • ort-load-dynamic: Dynamic linking. You'll need to compile ONNXRuntime from source or download a precompiled package, then link it manually. See the guide here.

    • cuda: Enables the NVIDIA CUDA provider. Requires CUDA toolkit and cuDNN installed.

    • trt: Enables the NVIDIA TensorRT provider. Requires TensorRT libraries installed.

    • mps: Enables the Apple CoreML provider for macOS.

  • If you only need basic features (such as image/video reading, result visualization, etc.), you can disable the default features to minimize dependencies:

    usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
    
    • video: Enables video stream reading and video writing. (Note: powered by video-rs and minifb; check their repositories for potential issues.)
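As an example, a Cargo.toml entry combining the GitHub dependency with GPU acceleration and video support might look like the following. This is a sketch using the feature names listed above; adjust the feature set to your target hardware:

```toml
[dependencies]
# CUDA execution provider plus video reading/writing.
# `cuda` requires the CUDA toolkit and cuDNN installed;
# `video` pulls in video-rs and minifb.
usls = { git = "https://github.com/jamjamjon/usls", features = ["cuda", "video"] }
```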

✨ Examples

  • Model Inference

    cargo run -r --example yolo   # CPU
    cargo run -r -F cuda --example yolo -- --device cuda:0  # GPU
    
  • Reading Images

    // Read a single image
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    
    // Read multiple images
    let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;
    
    // Read all images in a folder
    let images = DataLoader::try_read_folder("./assets")?;
    
    // Read images matching a pattern (glob)
    let images = DataLoader::try_read_pattern("./assets/*.Jpg")?;
    
    // Load images and iterate
    let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
    for images in dl.iter() {
        // Code here
    }
    
  • Reading Video

    let dl = DataLoader::new("http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4")?
        .with_batch(1)
        .with_nf_skip(2)
        .with_progress_bar(true)
        .build()?;
    for images in dl.iter() {
        // Code here
    }
    
  • Annotate

    let annotator = Annotator::default();
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    // hbb
    let hbb = Hbb::default()
        .with_xyxy(669.5233, 395.4491, 809.0367, 878.81226)
        .with_id(0)
        .with_name("person")
        .with_confidence(0.87094545);
    let _ = annotator.annotate(&image, &hbb)?;
    
    // keypoints
    let keypoints: Vec<Keypoint> = vec![
        Keypoint::default()
            .with_xy(139.35767, 443.43655)
            .with_id(0)
            .with_name("nose")
            .with_confidence(0.9739332),
        Keypoint::default()
            .with_xy(147.38545, 434.34055)
            .with_id(1)
            .with_name("left_eye")
            .with_confidence(0.9098319),
        Keypoint::default()
            .with_xy(128.5701, 434.07516)
            .with_id(2)
            .with_name("right_eye")
            .with_confidence(0.9320564),
    ];
    let _ = annotator.annotate(&image, &keypoints)?;
    
  • Visualizing Inference Results and Exporting Video

    let dl = DataLoader::new(args.source.as_str())?.build()?;
    let mut viewer = Viewer::default().with_window_scale(0.5);
    
    for images in &dl {
        // Stop if the window was created but has since been closed
        if viewer.is_window_exist() && !viewer.is_window_open() {
            break;
        }
    
        // Show image in window
        viewer.imshow(&images[0])?;
    
        // Handle key events and delay
        if let Some(key) = viewer.wait_key(1) {
            if key == usls::Key::Escape {
                break;
            }
        }
    
        // Your custom code here
    
        // Write video frame (requires video feature)
        // if args.save_video {
        //     viewer.write_video_frame(&images[0])?;
        // }
    }
    

All examples are located in the examples directory.

❓ FAQ

See issues or open a new discussion.

🤝 Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.

📜 License

This project is licensed under the terms described in the LICENSE file.