usls 0.1.11

A Rust library integrated with ONNXRuntime, providing a collection of ML models.
<h2 align="center">usls</h2>
<p align="center">
<a href="https://github.com/jamjamjon/usls/actions/workflows/rust-ci.yml">
        <img src="https://github.com/jamjamjon/usls/actions/workflows/rust-ci.yml/badge.svg" alt="Rust CI">
    </a>
    <a href='https://crates.io/crates/usls'>
        <img src='https://img.shields.io/crates/v/usls?logo=rust&logoColor=white' alt='Crates.io Version'>
    </a>
    <a href='https://github.com/microsoft/onnxruntime/releases'>
        <img src='https://img.shields.io/badge/onnxruntime-%3E%3D%201.22.0-3399FF?logo=onnx&logoColor=white' alt='ONNXRuntime MSRV'>
    </a>
    <a href='https://crates.io/crates/usls'>
        <img src='https://img.shields.io/crates/msrv/usls?color=yellow&logo=rust' alt='Rust MSRV'>
    </a>
</p>

**usls** is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).

## 📚 Documentation
- [API Documentation](https://docs.rs/usls/latest/usls/)
- [Examples](./examples)


## 🚀 Quick Start

Run the **YOLO demo** to explore various YOLO-Series models with different tasks, precision, and execution providers:

- **Tasks**: `detect`, `segment`, `pose`, `classify`, `obb`
- **Versions**: `YOLOv5`, `YOLOv6`, `YOLOv7`, `YOLOv8`, `YOLOv9`, `YOLOv10`, `YOLO11`, `YOLOv12`, `YOLOv13`
- **Scales**: `n`, `s`, `m`, `l`, `x`
- **Precision**: `fp32`, `fp16`, `q8`, `q4`, `q4f16`, `bnb4`
- **Execution Providers**: `CPU`, `CUDA`, `TensorRT`, `CoreML`, `OpenVINO`, and more

```bash
# CPU: Object detection, YOLOv8n, FP16
cargo run -r --example yolo -- --task detect --ver 8 --scale n --dtype fp16

# NVIDIA CUDA: Instance segmentation, YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0

# NVIDIA TensorRT
cargo run -r -F tensorrt --example yolo -- --device tensorrt:0

# Apple Silicon CoreML
cargo run -r -F coreml --example yolo -- --device coreml

# Intel OpenVINO: CPU/GPU/VPU acceleration
cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino:CPU

# Show all available options
cargo run -r --example yolo -- --help
```

See [YOLO Examples](./examples/yolo/README.md) for more details and use cases.


## ⚙️ Installation
Add the following to your `Cargo.toml`:

```toml
[dependencies]
# Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls", features = [ "cuda" ] }

# Alternative: use the crates.io version instead (keep only one `usls` entry)
# usls = { version = "latest-version", features = [ "cuda" ] }
```

## 📦 Cargo Features

> ❕ Features in ***italics*** are enabled by default.

- ### Runtime & Utilities
  - ***`ort-download-binaries`***: Auto-download ONNX Runtime binaries from [pyke](https://ort.pyke.io/perf/execution-providers).
  - **`ort-load-dynamic`**: Link ONNX Runtime yourself. Use this if `pyke` doesn't provide prebuilt binaries for your platform, or if you want to link against a local ONNX Runtime library. See the [Linking Guide](https://ort.pyke.io/setup/linking#static-linking) for more details.
  - **`viewer`**: Image/video visualization ([minifb](https://github.com/emoon/rust_minifb)). Similar to OpenCV `imshow()`. See [example](./examples/imshow.rs).
  - **`video`**: Video I/O support ([video-rs](https://github.com/oddity-ai/video-rs)). Enable this to read/write video streams. See [example](./examples/read_video.rs).
  - **`hf-hub`**: Hugging Face Hub support for downloading models from Hugging Face repositories.
  - **`tokenizers`**: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
  - **`slsl`**: SLSL tensor library support. Automatically enabled when using `yolo` or `clip` features.

- ### Execution Providers
  Hardware acceleration for inference. 

  - **`cuda`**, **`tensorrt`**: NVIDIA GPU acceleration
  - **`coreml`**: Apple Silicon acceleration
  - **`openvino`**: Intel CPU/GPU/VPU acceleration
  - **`onednn`**, **`directml`**, **`xnnpack`**, **`rocm`**, **`cann`**, **`rknpu`**, **`acl`**, **`nnapi`**, **`armnn`**, **`tvm`**, **`qnn`**, **`migraphx`**, **`vitis`**, **`azure`**: Various hardware/platform support

  See [ONNX Runtime docs](https://onnxruntime.ai/docs/execution-providers/) and [ORT performance guide](https://ort.pyke.io/perf/execution-providers) for details.

- ### Model Selection
  Almost every model has its own feature. Enable only what you need to reduce compile time and binary size.

  - *`yolo`*, `sam`, `clip`, `image-classifier`, `dino`, `rtmpose`, `rtdetr`, `db`, ...
  - **All models**: `all-models` (enables all model features)

  See [Supported Models](#-supported-models) for the complete list with feature names.
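
A minimal sketch of what feature selection could look like in `Cargo.toml`, assuming you want CUDA acceleration plus the SAM and CLIP models on top of the default features. The version string is a placeholder, as in the installation snippet, and the combination shown is only an illustration; check the crate's `Cargo.toml` for the feature names available in your version.

```toml
[dependencies]
# Illustrative selection: the default features (e.g. `yolo`, `ort-download-binaries`)
# stay enabled, and `cuda`, `sam`, and `clip` are added on top.
usls = { version = "latest-version", features = [ "cuda", "sam", "clip" ] }
```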


## ⚡ Supported Models


<details>
<summary><b> 👀 View all models (Click to expand)</b></summary>

| Model | Task / Description | Feature | Example |
| ----- | ----------------- | ------- | ------- |
| [BEiT](https://github.com/microsoft/unilm/tree/master/beit) | Image Classification | `image-classifier` | [demo](examples/beit) |
| [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) | Image Classification | `image-classifier` | [demo](examples/convnext) |
| [FastViT](https://github.com/apple/ml-fastvit) | Image Classification | `image-classifier` | [demo](examples/fastvit) |
| [MobileOne](https://github.com/apple/ml-mobileone) | Image Classification | `image-classifier` | [demo](examples/mobileone) |
| [DeiT](https://github.com/facebookresearch/deit) | Image Classification | `image-classifier` | [demo](examples/deit) |
| [DINOv2](https://github.com/facebookresearch/dinov2) | Vision Embedding | `dino` | [demo](examples/dinov2) |
| [DINOv3](https://github.com/facebookresearch/dinov3) | Vision Embedding | `dino` | [demo](examples/dinov3) |
| [YOLOv5](https://github.com/ultralytics/yolov5) | Image Classification<br />Object Detection<br />Instance Segmentation | `yolo` | [demo](examples/yolo) |
| [YOLOv6](https://github.com/meituan/YOLOv6) | Object Detection | `yolo` | [demo](examples/yolo) |
| [YOLOv7](https://github.com/WongKinYiu/yolov7) | Object Detection | `yolo` | [demo](examples/yolo) |
| [YOLOv8<br />YOLO11](https://github.com/ultralytics/ultralytics) | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | `yolo` | [demo](examples/yolo) |
| [YOLOv9](https://github.com/WongKinYiu/yolov9) | Object Detection | `yolo` | [demo](examples/yolo) |
| [YOLOv10](https://github.com/THU-MIG/yolov10) | Object Detection | `yolo` | [demo](examples/yolo) |
| [YOLOv12](https://github.com/sunsmarterjie/yolov12) | Image Classification<br />Object Detection<br />Instance Segmentation | `yolo` | [demo](examples/yolo) |
| [YOLOv13](https://github.com/iMoonLab/yolov13) | Object Detection | `yolo` | [demo](examples/yolo) |
| [RT-DETRv1, v2](https://github.com/lyuwenyu/RT-DETR) | Object Detection | `rtdetr` | [demo](examples/rtdetr) |
| [RT-DETRv4](https://github.com/RT-DETRs/RT-DETRv4) | Object Detection | `rtdetr` | [demo](examples/rtdetr) |
| [RF-DETR](https://github.com/roboflow/rf-detr) | Object Detection | `rfdetr` | [demo](examples/rfdetr) |
| [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.8/configs/picodet) | Object Detection | `picodet` | [demo](examples/picodet-layout) |
| [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) | Object Detection | `picodet` | [demo](examples/picodet-layout) |
| [D-FINE](https://github.com/manhbd-22022602/D-FINE) | Object Detection | `rtdetr` | [demo](examples/d-fine) |
| [DEIM](https://github.com/ShihuaHuang95/DEIM) | Object Detection | `rtdetr` | [demo](examples/deim) |
| [DEIMv2](https://github.com/Intellindust-AI-Lab/DEIMv2) | Object Detection | `rtdetr` | [demo](examples/deimv2) |
| [RTMPose](https://github.com/open-mmlab/mmpose/tree/dev-1.x/projects/rtmpose) | Keypoint Detection | `rtmpose` | [demo](examples/rtmpose) |
| [DWPose](https://github.com/IDEA-Research/DWPose) | Keypoint Detection | `rtmpose` | [demo](examples/dwpose) |
| [RTMW](https://arxiv.org/abs/2407.08634) | Keypoint Detection | `rtmpose` | [demo](examples/rtmw) |
| [RTMO](https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo) | Keypoint Detection | `rtmo` | [demo](examples/rtmo) |
| [SAM](https://github.com/facebookresearch/segment-anything) | Segment Anything | `sam` | [demo](examples/sam) |
| [SAM2](https://github.com/facebookresearch/segment-anything-2) | Segment Anything | `sam2` | [demo](examples/sam2) |
| [SAM3](https://github.com/facebookresearch/segment-anything-3) | Segment Anything | `sam3` | [demo](examples/sam3) |
| [MobileSAM](https://github.com/ChaoningZhang/MobileSAM) | Segment Anything | `sam` | [demo](examples/sam) |
| [EdgeSAM](https://github.com/chongzhou96/EdgeSAM) | Segment Anything | `sam` | [demo](examples/sam) |
| [SAM-HQ](https://github.com/SysCV/sam-hq) | Segment Anything | `sam` | [demo](examples/sam) |
| [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | Instance Segmentation | `yolo` | [demo](examples/yolo) |
| [YOLO-World](https://github.com/AILab-CVC/YOLO-World) | Open-Set Detection With Language | `yolo` | [demo](examples/yolo) |
| [YOLOE](https://github.com/THU-MIG/yoloe) | Open-Set Detection And Segmentation | `yoloe` | [demo-prompt-free](examples/yoloe-prompt-free)<br />[demo-prompt(visual & textual)](examples/yoloe-prompt) |
| [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) | Open-Set Detection With Language | `grounding-dino` | [demo](examples/grounding-dino) |
| [MM-GDINO](https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/README.md) | Open-Set Detection With Language | `grounding-dino` | [demo](examples/grounding-dino) |
| [LLMDet](https://github.com/iSEE-Laboratory/LLMDet) | Open-Set Detection With Language | `grounding-dino` | [demo](examples/grounding-dino) |
| [CLIP](https://github.com/openai/CLIP) | Vision-Language Embedding | `clip` | [demo](examples/clip) |
| [jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1) | Vision-Language Embedding | `clip` | [demo](examples/clip) |
| [jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2) | Vision-Language Embedding | `clip` | [demo](examples/clip) |
| [mobileclip & mobileclip2](https://github.com/apple/ml-mobileclip) | Vision-Language Embedding | `clip` | [demo](examples/clip) |
| [BLIP](https://github.com/salesforce/BLIP) | Image Captioning | `blip` | [demo](examples/blip) |
| [DB(PaddleOCR-Det)](https://arxiv.org/abs/1911.08947) | Text Detection | `db` | [demo](examples/db) |
| [FAST](https://github.com/czczup/FAST) | Text Detection | `db` | [demo](examples/fast) |
| [LinkNet](https://arxiv.org/abs/1707.03718) | Text Detection | `db` | [demo](examples/linknet) |
| [SVTR(PaddleOCR-Rec)](https://arxiv.org/abs/2205.00159) | Text Recognition | `svtr` | [demo](examples/svtr) |
| [SLANet](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/table_recognition/algorithm_table_slanet.html) | Table Recognition | `slanet` | [demo](examples/slanet) |
| [TrOCR](https://huggingface.co/microsoft/trocr-base-printed) | Text Recognition | `trocr` | [demo](examples/trocr) |
| [YOLOPv2](https://arxiv.org/abs/2208.11434) | Panoptic Driving Perception | `yolop` | [demo](examples/yolop) |
| [DepthAnything v1<br />DepthAnything v2](https://github.com/LiheYoung/Depth-Anything) | Monocular Depth Estimation | `depth-anything` | [demo](examples/depth-anything) |
| [DepthPro](https://github.com/apple/ml-depth-pro) | Monocular Depth Estimation | `depth-pro` | [demo](examples/depth-pro) |
| [MODNet](https://github.com/ZHKKKe/MODNet) | Image Matting | `modnet` | [demo](examples/modnet) |
| [Sapiens](https://github.com/facebookresearch/sapiens/tree/main) | Foundation for Human Vision Models | `sapiens` | [demo](examples/sapiens) |
| [Florence2](https://arxiv.org/abs/2311.06242) | A Variety of Vision Tasks | `florence2` | [demo](examples/florence2) |
| [Moondream2](https://github.com/vikhyat/moondream/tree/main) | Open-Set Object Detection<br />Open-Set Keypoints Detection<br />Image Caption<br />Visual Question Answering | `moondream2` | [demo](examples/moondream2) |
| [OWLv2](https://huggingface.co/google/owlv2-base-patch16-ensemble) | Open-Set Object Detection | `owl` | [demo](examples/owlv2) |
| [SmolVLM(256M, 500M)](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | Visual Question Answering | `smolvlm` | [demo](examples/smolvlm) |
| [FastVLM(0.5B)](https://github.com/apple/ml-fastvlm) | Vision Language Models | `fastvlm` | [demo](examples/fastvlm) |
| [RMBG(1.4, 2.0)](https://huggingface.co/briaai/RMBG-2.0) | Image Segmentation<br />Background Removal | `rmbg` | [demo](examples/rmbg) |
| [BEN2](https://huggingface.co/PramaLLC/BEN2) | Image Segmentation<br />Background Removal | `ben2` | [demo](examples/rmbg) |
| [MediaPipe: Selfie-segmentation](https://ai.google.dev/edge/mediapipe/solutions/vision/image_segmenter) | Image Segmentation | `mediapipe-segmenter` | [demo](examples/mediapipe-selfie-segmentation) |
| [Swin2SR](https://github.com/mv-lab/swin2sr) | Image Super-Resolution and Restoration | `swin2sr` | [demo](examples/swin2sr) |
| [APISR](https://github.com/Kiteretsu77/APISR) | Real-World Anime Super-Resolution | `apisr` | [demo](examples/apisr) |
| [RAM & RAM++](https://github.com/xinyu1205/recognize-anything) | Image Tagging | `ram` | [demo](examples/ram) |

</details>


## ❓ FAQ
See [issues](https://github.com/jamjamjon/usls/issues) or open a new discussion.

## 🤝 Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.

## 🙏 Acknowledgments

This project is built on top of [ort (ONNX Runtime for Rust)](https://github.com/pykeio/ort), which provides seamless Rust bindings for [ONNX Runtime](https://github.com/microsoft/onnxruntime). Special thanks to the `ort` maintainers.

Thanks to all the open-source libraries and their maintainers that make this project possible. See [Cargo.toml](Cargo.toml) for a complete list of dependencies.

## 📜 License

This project is licensed under [LICENSE](LICENSE).