usls 0.1.0-beta.2

A Rust library integrated with ONNXRuntime, providing a collection of ML models.
Documentation

usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.

  • SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
  • Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
  • Easy Data Handling: Easily read images, video streams, and folders with iterator support.
  • Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
  • Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's imshow().
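To make the horizontal bounding box (Hbb) convention concrete, here is a small standalone sketch. Note this is plain Rust for illustration only, not the usls API: the `Box2D` struct is hypothetical, but the xyxy layout (top-left and bottom-right corners) and the IoU computation match the common convention used for such containers.

```rust
// Hypothetical axis-aligned box in xyxy form (x1, y1, x2, y2),
// a stand-in for illustration, not usls's Hbb type.
#[derive(Clone, Copy)]
struct Box2D {
    x1: f32,
    y1: f32,
    x2: f32,
    y2: f32,
}

/// Intersection-over-union of two xyxy boxes.
fn iou(a: Box2D, b: Box2D) -> f32 {
    // Overlap extents, clamped to zero when the boxes are disjoint.
    let ix = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let iy = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = ix * iy;
    let area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    let area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    inter / (area_a + area_b - inter)
}

fn main() {
    let a = Box2D { x1: 0.0, y1: 0.0, x2: 10.0, y2: 10.0 };
    let b = Box2D { x1: 5.0, y1: 5.0, x2: 15.0, y2: 15.0 };
    // Intersection 5x5 = 25, union 100 + 100 - 25 = 175, so IoU = 1/7.
    println!("IoU = {:.4}", iou(a, b));
}
```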

🧩 Supported Models

| Model | Task / Description | Example | CoreML | CUDA FP32 | CUDA FP16 | TensorRT FP32 | TensorRT FP16 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DINOv2 | Vision Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv5 | Image Classification, Object Detection, Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv8, YOLO11 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv12 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| PP-PicoDet | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| D-FINE | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DEIM | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RTMO | Keypoint Detection | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| SAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM2 | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| MobileSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| EdgeSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM-HQ | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ |  |  |
| CLIP | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| BLIP | Image Captioning | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DB (PaddleOCR-Det) | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| FAST | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| LinkNet | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SVTR (PaddleOCR-Rec) | Text Recognition | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLANet | Table Recognition | demo | ✅ | ✅ | ✅ |  |  |
| TrOCR | Text Recognition | demo | ✅ | ✅ | ✅ |  |  |
| YOLOPv2 | Panoptic Driving Perception | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v1, DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ |  |  |
| MODNet | Image Matting | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ✅ | ✅ |  |  |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ✅ | ✅ |  |  |
| Moondream2 | Open-Set Object Detection, Open-Set Keypoint Detection, Image Caption, Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| SmolVLM (256M, 500M) | Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
| RMBG (1.4, 2.0) | Image Segmentation, Background Removal | demo | ✅ | ✅ | ✅ |  |  |
| BEN2 | Image Segmentation, Background Removal | demo | ✅ | ✅ | ✅ |  |  |

πŸ› οΈ Installation

To get started, you'll need:

1. Protocol Buffers Compiler (protoc)

Required for building the project. Official installation guide

# Linux (apt)
sudo apt install -y protobuf-compiler

# macOS (Homebrew)
brew install protobuf

# Windows (Winget)
winget install protobuf

# Verify installation
protoc --version  # Should be 3.x or higher

2. Rust Toolchain

# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

3. Add usls to Your Project

Add the following to your Cargo.toml:

[dependencies]
# Recommended: Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls" }

# Alternative: Use crates.io version
usls = "latest-version"

Note: The GitHub version is recommended as it contains the latest updates.

⚡ Cargo Features

  • ONNXRuntime-related features (enabled by default) provide model inference and model zoo support:

    • ort-download-binaries (default): Automatically downloads prebuilt ONNXRuntime binaries for supported platforms. Provides core model loading and inference capabilities using the CPU execution provider.

    • ort-load-dynamic: Dynamic linking. You'll need to compile ONNXRuntime from source or download a precompiled package, then link it manually. See the guide here.

    • cuda: Enables the NVIDIA CUDA provider. Requires CUDA toolkit and cuDNN installed.

    • trt: Enables the NVIDIA TensorRT provider. Requires TensorRT libraries installed.

    • mps: Enables the Apple CoreML provider for macOS.

  • If you only need basic features (such as image/video reading, result visualization, etc.), you can disable the default features to minimize dependencies:

    usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
    
    • video: Enables video stream reading and video writing. (Note: powered by video-rs and minifb; check their repositories for potential issues.)
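As an example, a Cargo.toml entry combining the GitHub dependency with GPU acceleration and video support might look like the following. This is a sketch using the feature names listed above; adjust the feature set to your target hardware:

```toml
[dependencies]
# CUDA execution provider plus video reading/writing.
# `cuda` requires the CUDA toolkit and cuDNN installed;
# `video` pulls in video-rs and minifb.
usls = { git = "https://github.com/jamjamjon/usls", features = ["cuda", "video"] }
```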

✨ Examples

  • Model Inference

    cargo run -r --example yolo   # CPU
    cargo run -r -F cuda --example yolo -- --device cuda:0  # GPU
    
  • Reading Images

    // Read a single image
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    
    // Read multiple images
    let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;
    
    // Read all images in a folder
    let images = DataLoader::try_read_folder("./assets")?;
    
    // Read images matching a pattern (glob)
    let images = DataLoader::try_read_pattern("./assets/*.Jpg")?;
    
    // Load images and iterate
    let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
    for images in dl.iter() {
        // Code here
    }
    
  • Reading Video

    let dl = DataLoader::new("http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4")?
        .with_batch(1)
        .with_nf_skip(2)
        .with_progress_bar(true)
        .build()?;
    for images in dl.iter() {
        // Code here
    }
    
  • Annotate

    let annotator = Annotator::default();
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    // hbb
    let hbb = Hbb::default()
        .with_xyxy(669.5233, 395.4491, 809.0367, 878.81226)
        .with_id(0)
        .with_name("person")
        .with_confidence(0.87094545);
    let _ = annotator.annotate(&image, &hbb)?;
    
    // keypoints
    let keypoints: Vec<Keypoint> = vec![
        Keypoint::default()
            .with_xy(139.35767, 443.43655)
            .with_id(0)
            .with_name("nose")
            .with_confidence(0.9739332),
        Keypoint::default()
            .with_xy(147.38545, 434.34055)
            .with_id(1)
            .with_name("left_eye")
            .with_confidence(0.9098319),
        Keypoint::default()
            .with_xy(128.5701, 434.07516)
            .with_id(2)
            .with_name("right_eye")
            .with_confidence(0.9320564),
    ];
    let _ = annotator.annotate(&image, &keypoints)?;
    
  • Visualizing Inference Results and Exporting Video

    let dl = DataLoader::new(args.source.as_str())?.build()?;
    let mut viewer = Viewer::default().with_window_scale(0.5);
    
    for images in &dl {
        // Stop if the window was created but has since been closed
        if viewer.is_window_exist() && !viewer.is_window_open() {
            break;
        }
    
        // Show image in window
        viewer.imshow(&images[0])?;
    
        // Handle key events and delay
        if let Some(key) = viewer.wait_key(1) {
            if key == usls::Key::Escape {
                break;
            }
        }
    
        // Your custom code here
    
        // Write video frame (requires video feature)
        // if args.save_video {
        //     viewer.write_video_frame(&images[0])?;
        // }
    }
    

All examples are located in the examples directory.

❓ FAQ

See issues or open a new discussion.

🤝 Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.

📜 License

This project is licensed under the terms described in the LICENSE file.