§usls
usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).
§📚 Documentation
§⚡ Cargo Features
❕ Features in italics are enabled by default.
§Runtime & Utilities
- *ort-download-binaries*: Auto-download ONNX Runtime binaries from pyke.
- ort-load-dynamic: Link ONNX Runtime yourself. Use this if pyke doesn’t provide prebuilt binaries for your platform or you want to link your local ONNX Runtime library. See the Linking Guide for more details.
- viewer: Image/video visualization (minifb), similar to OpenCV’s imshow(). See example.
- video: Video I/O support (video-rs). Enable this to read and write video streams. See example.
- hf-hub: Hugging Face Hub support for downloading models from Hugging Face repositories.
- tokenizers: Tokenizer support for vision-language models. Automatically enabled by the vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
- slsl: SLSL tensor library support. Automatically enabled by the yolo and clip features.
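As a sketch of how these opt-in features combine in a downstream project, a Cargo.toml dependency entry might look like this (the version number is illustrative, not a recommendation):

```toml
[dependencies]
# Enable video decoding/encoding plus the minifb-based viewer window.
# Assuming default features are kept, ONNX Runtime binaries are fetched
# automatically via the ort-download-binaries feature.
usls = { version = "0.1", features = ["video", "viewer"] }
```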
§Execution Providers
Hardware acceleration for inference.
- cuda, tensorrt: NVIDIA GPU acceleration
- coreml: Apple Silicon acceleration
- openvino: Intel CPU/GPU/VPU acceleration
- onednn, directml, xnnpack, rocm, cann, rknpu, acl, nnapi, armnn, tvm, qnn, migraphx, vitis, azure: various other hardware/platform backends
See ONNX Runtime docs and ORT performance guide for details.
§Model Selection
Almost every model is a separate feature. Enable only what you need to reduce compile time and binary size.
- Individual models: yolo, sam, clip, image-classifier, dino, rtmpose, rtdetr, db, …
- all-models: enables all model features
See Supported Models for the complete list with feature names.
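For example, a detection-only build with NVIDIA acceleration could combine one model feature with one execution-provider feature (a hedged sketch; the version number is illustrative):

```toml
[dependencies]
# Only the YOLO model family is compiled in, keeping compile time and
# binary size down; "cuda" adds the NVIDIA execution provider.
usls = { version = "0.1", features = ["yolo", "cuda"] }
```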
Modules§
- core
- Core functionality for vision and vision-language model inference.
- models
- Pre-built models for various vision and vision-language tasks.
- viz
- Visualization utilities for rendering and displaying ML model results.
Structs§
- Config
- Configuration for model inference including engines, processors, and task settings.
- DataLoader
- A structure designed to load and manage image, video, or stream data.
- DynConf
- Dynamic Confidences
- Engine
- ONNX Runtime inference engine with configuration and session management.
- HardwareConfig
- Unified hardware configuration containing all execution provider configs.
- Hbb
- Horizontal bounding box with position, size, and metadata.
- Hub
- Manages interactions with GitHub repository releases and Hugging Face repositories.
- Image
- Image wrapper with metadata and transformation capabilities.
- InstanceMeta
- Metadata for detection instances including ID, confidence, and name.
- Keypoint
- Represents a keypoint in a 2D space with optional metadata.
- LogitsSampler
- Logits sampler for text generation with temperature and nucleus sampling.
- Mask
- Grayscale image mask.
- ORTConfig
- ONNX Runtime configuration with device and optimization settings.
- Obb
- Oriented bounding box with four vertices and metadata.
- OrtEngine
- ONNX Runtime inference engine with high-performance tensor operations.
- Polygon
- Polygon with metadata.
- Prob
- Probability result with classification metadata.
- Processor
- Image and text processing pipeline with tokenization and transformation capabilities.
- ProcessorConfig
- Configuration for image and text processing pipelines.
- Skeleton
- Skeleton structure containing keypoint connections.
- Text
- Text detection result with content and metadata.
- Version
- Version representation with major, minor, and optional patch numbers.
- X
- Tensor wrapper over Array<f32, IxDyn>.
- Xs
- Collection of named tensors with associated images and texts.
- Y
- Container for inference results for each image.
Enums§
- DType
- Data type enumeration for tensor elements.
- Device
- Device types for model execution.
- Dir
- Represents various directories on the system, including Home, Cache, Config, and more.
- ImageTensorLayout
- Image tensor layout formats for organizing image data in memory.
- Location
- Media location type indicating local or remote source.
- MediaType
- Media type classification for different content formats.
- ResizeMode
- Image resize modes for different scaling strategies.
- Scale
- Model scale variants for different model sizes.
- Task
- Task types for various vision and vision-language model inference tasks.
Constants§
- NAMES_BODY_PARTS_28
- Human body parts segmentation labels with 28 categories.
- NAMES_COCO_80
- COCO dataset object detection labels with 80 categories.
- NAMES_COCO_91
- Extended COCO dataset labels with 91 categories including background and unused slots.
- NAMES_COCO_133
- A comprehensive list of keypoints used in the COCO-133 person pose estimation model, organized into several groups.
- NAMES_COCO_KEYPOINTS_17
- COCO dataset keypoint labels for human pose estimation with 17 points.
- NAMES_DOTA_V1_5_16
- Labels for DOTA (Dataset for Object deTection in Aerial images) v1.5 with 16 categories.
- NAMES_DOTA_V1_15
- Labels for DOTA (Dataset for Object deTection in Aerial images) v1.0 with 15 categories.
- NAMES_HALPE_KEYPOINTS_26
- The 26 keypoint labels used in the HALPE human pose estimation model.
- NAMES_HAND_21
- Hand keypoint labels used in hand pose estimation with 21 points, organized by fingers.
- NAMES_IMAGENET_1K
- ImageNet ILSVRC 1000-class classification labels.
- NAMES_OBJECT365_366
- Object365 dataset class names (366 classes including background).
- NAMES_PICODET_LAYOUT_3
- Simplified PicoDet document layout labels with 3 basic categories.
- NAMES_PICODET_LAYOUT_5
- Core PicoDet document layout labels with 5 essential categories.
- NAMES_PICODET_LAYOUT_17
- Labels for PicoDet document layout analysis with 17 categories.
- NAMES_YOLO_DOCLAYOUT_10
- Labels for document layout analysis using YOLO with 10 categories.
- SKELETON_COCO_19
- Keypoint connections for the COCO person skeleton with 19 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_COCO_65
- Keypoint connections for the COCO-133 person skeleton with 65 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_COLOR_COCO_19
- Colors for visualizing each connection in the COCO person skeleton, grouped by body parts.
- SKELETON_COLOR_COCO_65
- Colors for visualizing each connection in the COCO-133 person skeleton, grouped by body parts.
- SKELETON_COLOR_HALPE_27
- Colors for visualizing each connection in the HALPE person skeleton, grouped by body parts and sides.
- SKELETON_COLOR_HAND_21
- Colors for visualizing each connection in the hand skeleton, grouped by fingers.
- SKELETON_HALPE_27
- Keypoint connections for the HALPE person skeleton with 27 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
- SKELETON_HAND_21
- Keypoint connections for the hand skeleton with 20 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
Statics§
- NAMES_YOLOE_4585
- A comprehensive list of 4585 object categories used in the YOLOE object detection model.