§usls
usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).
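The typical flow is: build a `Config`, construct a model from the `models` module, read inputs with `DataLoader`, and collect per-image results as `Y` values. Below is a minimal sketch of that flow; the constructor and method names (`Config::yolo`, `with_model_file`, `try_read_n`, `forward`) are assumptions for illustration only, so consult the `models` module docs and the repository examples for the exact API.

```rust
use usls::{models::YOLO, Config, DataLoader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical builder calls: configure a YOLO detector from a local
    // ONNX file (requires the `yolo` cargo feature).
    let config = Config::yolo().with_model_file("yolo11n.onnx");
    let mut model = YOLO::new(config)?;

    // Hypothetical helper: read a batch of images from disk.
    let images = DataLoader::try_read_n(&["./assets/bus.jpg"])?;

    // Run inference; each `Y` holds the results (boxes, masks, probs, ...)
    // for one input image.
    let ys = model.forward(&images)?;
    println!("{ys:?}");
    Ok(())
}
```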
§📚 Documentation
§⚡ Cargo Features
❕ Features in italics are enabled by default.
§Runtime & Utilities
- `ort-download-binaries`: Auto-download ONNX Runtime binaries from pyke.
- `ort-load-dynamic`: Link ONNX Runtime yourself. Use this if pyke doesn't provide prebuilt binaries for your platform or you want to link your local ONNX Runtime library. See the Linking Guide for more details.
- `viewer`: Image/video visualization (minifb), similar to OpenCV's `imshow()`. See example.
- `video`: Video I/O support (video-rs). Enable this to read/write video streams. See example (and the sketch after this list).
- `hf-hub`: Hugging Face Hub support for downloading models from Hugging Face repositories.
- `tokenizers`: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (`blip`, `clip`, `florence2`, `grounding-dino`, `fastvlm`, `moondream2`, `owl`, `smolvlm`, `trocr`, `yoloe`).
- `slsl`: SLSL tensor library support. Automatically enabled when using the `yolo` or `clip` features.
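As a rough sketch of what the `video` feature unlocks, the snippet below iterates frames from a video file with `DataLoader`. The builder and iteration details (`new`, `with_batch`, yielding frame batches) are assumptions for illustration; the crate's video example shows the real API.

```rust
use usls::DataLoader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Requires the `video` cargo feature for video decoding.
    // Hypothetical builder/iterator shape, for illustration only.
    let dl = DataLoader::new("input.mp4")?.with_batch(1);
    for frames in dl {
        // Each iteration would yield a batch of decoded frames that can be
        // fed straight into a model's forward pass.
        println!("decoded a batch of {} frame(s)", frames.len());
    }
    Ok(())
}
```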
§Execution Providers
Hardware acceleration for inference.
- `cuda`, `tensorrt`: NVIDIA GPU acceleration
- `coreml`: Apple Silicon acceleration
- `openvino`: Intel CPU/GPU/VPU acceleration
- `onednn`, `directml`, `xnnpack`, `rocm`, `cann`, `rknpu`, `acl`, `nnapi`, `armnn`, `tvm`, `qnn`, `migraphx`, `vitis`, `azure`: Various hardware/platform support
See ONNX Runtime docs and ORT performance guide for details.
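Enabling one of these features compiles in the execution provider; which device a model actually runs on is then selected at configuration time via the `Device` enum listed below. A tiny sketch under assumed names (`Config::default()`, a `with_device` builder, a `Device::Cuda(0)` variant), for illustration only:

```rust
use usls::{Config, Device};

/// Hypothetical helper: target the first CUDA GPU (requires the `cuda` cargo
/// feature). The builder method and enum variant are assumed names; see the
/// `Device` enum and `Config` docs for the actual API.
fn gpu_config() -> Config {
    Config::default().with_device(Device::Cuda(0))
}
```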
§Model Selection
Almost every model is a separate feature. Enable only what you need to reduce compile time and binary size; see the sketch below.
- `yolo`, `sam`, `clip`, `image-classifier`, `dino`, `rtmpose`, `rtdetr`, `db`, …
- All models: `all-models` (enables all model features)
See Supported Models for the complete list with feature names.
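For example, a downstream `Cargo.toml` might declare `usls = { version = "…", features = ["yolo", "cuda"] }` so that only the YOLO family and CUDA support are compiled. Items behind a disabled feature simply aren't built, as the hedged sketch below illustrates (the concrete type names in `models` are assumptions tied to the feature names):

```rust
// With `features = ["yolo"]` enabled, the corresponding model type is
// available from the `models` module; types behind disabled features are
// not compiled at all. (`YOLO`/`SAM` names are assumptions for illustration.)
use usls::models::YOLO;
// use usls::models::SAM; // would fail to compile without the `sam` feature

fn main() {
    println!("{}", std::any::type_name::<YOLO>());
}
```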
Modules§
- core
- Core functionality for vision and vision-language model inference.
- models
- Pre-built models for various vision and vision-language tasks.
- viz
- Visualization utilities for rendering and displaying ML model results.
Structs§
- Config
- Configuration for model inference including engines, processors, and task settings.
- DataLoader
- A structure designed to load and manage image, video, or stream data.
- DynConf
- Dynamic confidence thresholds.
- Engine
- ONNX Runtime inference engine with configuration and session management.
- HardwareConfig
- Unified hardware configuration containing all execution provider configs.
- Hbb
- Horizontal bounding box with position, size, and metadata.
- Hub
- Manages interactions with GitHub repository releases and Hugging Face repositories.
- Image
- Image wrapper with metadata and transformation capabilities.
- InstanceMeta
- Metadata for detection instances including ID, confidence, and name.
- Keypoint
- Represents a keypoint in a 2D space with optional metadata.
- LogitsSampler
- Logits sampler for text generation with temperature and nucleus sampling.
- Mask
- Grayscale mask image.
- ORTConfig
- ONNX Runtime configuration with device and optimization settings.
- Obb
- Oriented bounding box with four vertices and metadata.
- OrtEngine
- ONNX Runtime inference engine with high-performance tensor operations.
- Polygon
- Polygon with metadata.
- Prob
- Probability result with classification metadata.
- Processor
- Image and text processing pipeline with tokenization and transformation capabilities.
- ProcessorConfig
- Configuration for image and text processing pipelines.
- Skeleton
- Skeleton structure containing keypoint connections.
- Text
- Text detection result with content and metadata.
- Version
- Version representation with major, minor, and optional patch numbers.
- X
- Tensor wrapper over `Array<f32, IxDyn>`.
- Xs
- Collection of named tensors with associated images and texts.
- Y
- Container for inference results for each image.
Enums§
- DType
- Data type enumeration for tensor elements.
- Device
- Device types for model execution.
- Dir
- Represents various directories on the system, including Home, Cache, Config, and more.
- ImageTensorLayout
- Image tensor layout formats for organizing image data in memory.
- Location
- Media location type indicating local or remote source.
- MediaType
- Media type classification for different content formats.
- ResizeMode
- Image resize modes for different scaling strategies.
- Scale
- Model scale variants for different model sizes.
- Task
- Task types for various vision and vision-language model inference tasks.
Constants§
- NAMES_COCO_80
- COCO 80-class object categories (common split).
- NAMES_COCO_91
- COCO 91-class extended categories (keeps original index mapping with gaps).
- NAMES_COCO_KEYPOINTS_17
- COCO 17 human keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
- NAMES_COCO_KEYPOINTS_133
- COCO-WholeBody 133 keypoints (body, face, and hands combined).
- NAMES_DOTA_V1_5_16
- DOTA v1.5 (16 classes, includes container crane).
- NAMES_DOTA_V1_15
- DOTA v1.0 (15 classes, excludes container crane).
- NAMES_HAND_KEYPOINTS_21
- 21 hand keypoints from wrist to fingertips (index-aligned, base→tip).
- NAMES_YOLO_DOCLAYOUT_10
- YOLO DocStructBench 10-class document layout categories.
- SKELETON_COCO_19
- Keypoint connections for the COCO person skeleton (19 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_COCO_65
- Keypoint connections for the COCO-133 person skeleton (65 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_COLOR_COCO_19
- Colors for visualizing each connection in the COCO person skeleton, grouped by body part.
- SKELETON_COLOR_COCO_65
- Colors for visualizing each connection in the COCO-133 person skeleton, grouped by body part.
- SKELETON_COLOR_HALPE_27
- Colors for visualizing each connection in the HALPE person skeleton, grouped by body part and side.
- SKELETON_COLOR_HAND_21
- Colors for visualizing each connection in the hand skeleton, grouped by finger.
- SKELETON_HALPE_27
- Keypoint connections for the HALPE person skeleton (27 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by body part.
- SKELETON_HAND_21
- Keypoint connections for the hand skeleton (20 connections). Each tuple (a, b) represents a connection between keypoint indices a and b, grouped by finger.
Statics§
- NAMES_IMAGENET_1K
- ImageNet-1K classification labels (1000 categories). Lazily loaded from an embedded text file to keep compile time low.
- NAMES_OBJECT365
- Object365 dataset labels without the leading `background` class (365 categories).
- NAMES_OBJECT365_366
- Object365 dataset labels including `background` (366 categories). Built by prepending `background` to `NAMES_OBJECT365` for compatibility.