§usls
usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).
§📚 Documentation
§⚡ Cargo Features
❕ Features in italics are enabled by default.
§Runtime & Utilities
- *ort-download-binaries*: Auto-download ONNX Runtime binaries from pyke.
- ort-load-dynamic: Link ONNX Runtime yourself. Use this if pyke doesn’t provide prebuilt binaries for your platform or you want to link your local ONNX Runtime library. See the Linking Guide for more details.
- viewer: Image/video visualization (minifb), similar to OpenCV’s imshow(). See example.
- video: Video I/O support (video-rs). Enable this to read and write video streams. See example.
- hf-hub: Hugging Face Hub support for downloading models from Hugging Face repositories.
- tokenizers: Tokenizer support for vision-language models. Automatically enabled by the vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
- slsl: SLSL tensor library support. Automatically enabled by the yolo and clip features.
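As a sketch of how these opt-in features combine in a downstream project, a Cargo.toml dependency entry might look like this (the version number is illustrative, not a recommendation):

```toml
[dependencies]
# Enable video decoding/encoding plus the minifb-based viewer window.
# Assuming default features are kept, ONNX Runtime binaries are fetched
# automatically via the ort-download-binaries feature.
usls = { version = "0.1", features = ["video", "viewer"] }
```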
§Execution Providers
Hardware acceleration for inference.
- cuda, tensorrt: NVIDIA GPU acceleration
- coreml: Apple Silicon acceleration
- openvino: Intel CPU/GPU/VPU acceleration
- onednn, directml, xnnpack, rocm, cann, rknpu, acl, nnapi, armnn, tvm, qnn, migraphx, vitis, azure: various other hardware/platform backends
See ONNX Runtime docs and ORT performance guide for details.
§Model Selection
Almost every model is a separate feature. Enable only what you need to reduce compile time and binary size.
- Individual models: yolo, sam, clip, image-classifier, dino, rtmpose, rtdetr, db, …
- all-models: enables all model features
See Supported Models for the complete list with feature names.
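For example, a detection-only build with NVIDIA acceleration could combine one model feature with one execution-provider feature (a hedged sketch; the version number is illustrative):

```toml
[dependencies]
# Only the YOLO model family is compiled in, keeping compile time and
# binary size down; "cuda" adds the NVIDIA execution provider.
usls = { version = "0.1", features = ["yolo", "cuda"] }
```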
Modules§
- core
- Core functionality for vision and vision-language model inference.
- models
- Pre-built models for various vision and vision-language tasks.
- viz
- Visualization utilities for rendering and displaying ML model results.
Structs§
- Config
- Configuration for model inference including engines, processors, and task settings.
- DataLoader
- A structure designed to load and manage image, video, or stream data.
- DynConf
- Dynamic Confidences
- Engine
- ONNX Runtime inference engine with configuration and session management.
- HardwareConfig
- Unified hardware configuration containing all execution provider configs.
- Hbb
- Horizontal bounding box with position, size, and metadata.
- Hub
- Manages interactions with GitHub repository releases and Hugging Face repositories.
- Image
- Image wrapper with metadata and transformation capabilities.
- InstanceMeta
- Metadata for detection instances including ID, confidence, and name.
- Keypoint
- Represents a keypoint in a 2D space with optional metadata.
- LogitsSampler
- Logits sampler for text generation with temperature and nucleus sampling.
- Mask
- Grayscale image mask.
- ORTConfig
- ONNX Runtime configuration with device and optimization settings.
- Obb
- Oriented bounding box with four vertices and metadata.
- OrtEngine
- ONNX Runtime inference engine with high-performance tensor operations.
- Polygon
- Polygon with metadata.
- Prob
- Probability result with classification metadata.
- Processor
- Image and text processing pipeline with tokenization and transformation capabilities.
- ProcessorConfig
- Configuration for image and text processing pipelines.
- Skeleton
- Skeleton structure containing keypoint connections.
- Text
- Text detection result with content and metadata.
- Version
- Version representation with major, minor, and optional patch numbers.
- X
- Tensor wrapper over Array<f32, IxDyn>.
- Xs
- Collection of named tensors with associated images and texts.
- Y
- Container for inference results for each image.
Enums§
- DType
- Data type enumeration for tensor elements.
- Device
- Device types for model execution.
- Dir
- Represents various directories on the system, including Home, Cache, Config, and more.
- ImageTensorLayout
- Image tensor layout formats for organizing image data in memory.
- Location
- Media location type indicating local or remote source.
- MediaType
- Media type classification for different content formats.
- ResizeMode
- Image resize modes for different scaling strategies.
- Scale
- Model scale variants for different model sizes.
- Task
- Task types for various vision and vision-language model inference tasks.
Constants§
- NAMES_BODY_PARTS_28
- Human body parts segmentation labels with 28 categories.
- NAMES_COCO_80
- COCO dataset object detection labels with 80 categories.
- NAMES_COCO_91
- Extended COCO dataset labels with 91 categories including background and unused slots.
- NAMES_COCO_133
- A comprehensive list of keypoints used in the COCO-133 person pose estimation model, organized into several groups.
- NAMES_COCO_KEYPOINTS_17
- COCO dataset keypoint labels for human pose estimation with 17 points.
- NAMES_DOTA_V1_5_16
- Labels for DOTA (Dataset for Object deTection in Aerial images) v1.5 with 16 categories.
- NAMES_DOTA_V1_15
- Labels for DOTA (Dataset for Object deTection in Aerial images) v1.0 with 15 categories.
- NAMES_HALPE_KEYPOINTS_26
- The 26 keypoint labels used in the HALPE human pose estimation model.
- NAMES_HAND_21
- Hand keypoint labels used in hand pose estimation with 21 points, organized by fingers.
- NAMES_IMAGENET_1K
- ImageNet ILSVRC 1000-class classification labels.
- NAMES_OBJECT365_366
- Object365 dataset class names (366 classes including background).
- NAMES_PICODET_LAYOUT_3
- Simplified PicoDet document layout labels with 3 basic categories.
- NAMES_PICODET_LAYOUT_5
- Core PicoDet document layout labels with 5 essential categories.
- NAMES_PICODET_LAYOUT_17
- Labels for PicoDet document layout analysis with 17 categories.
- NAMES_YOLO_DOCLAYOUT_10
- Labels for document layout analysis using YOLO with 10 categories.
- SKELETON_COCO_19
- Keypoint connections for the COCO person skeleton with 19 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_COCO_65
- Keypoint connections for the COCO-133 person skeleton with 65 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_COLOR_COCO_19
- Colors for visualizing each connection in the COCO person skeleton, grouped by body parts.
- SKELETON_COLOR_COCO_65
- Colors for visualizing each connection in the COCO-133 person skeleton, grouped by body parts.
- SKELETON_COLOR_HALPE_27
- Colors for visualizing each connection in the HALPE person skeleton, grouped by body parts and sides.
- SKELETON_COLOR_HAND_21
- Colors for visualizing each connection in the hand skeleton, grouped by fingers.
- SKELETON_HALPE_27
- Keypoint connections for the HALPE person skeleton with 27 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_HAND_21
- Keypoint connections for the hand skeleton with 20 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
Statics§
- NAMES_YOLOE_4585
- A comprehensive list of 4585 object categories used in the YOLOE object detection model.