Crate usls


§usls

usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).


§⚡ Cargo Features

❕ Features in italics are enabled by default.

  • §Runtime & Utilities

    • ort-download-binaries: Auto-download ONNX Runtime binaries from pyke.
    • ort-load-dynamic: Link ONNX Runtime dynamically yourself. Use this if pyke doesn’t provide prebuilt binaries for your platform or you want to link against a local ONNX Runtime library. See the Linking Guide for more details.
    • viewer: Image/video visualization (minifb). Similar to OpenCV imshow(). See example.
    • video: Video I/O support (video-rs). Enable this to read and write video streams. See example.
    • hf-hub: Hugging Face Hub support for downloading models from Hugging Face repositories.
    • tokenizers: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
    • slsl: SLSL tensor library support. Automatically enabled when using yolo or clip features.
  • §Execution Providers

    Hardware acceleration for inference.

    • cuda, tensorrt: NVIDIA GPU acceleration
    • coreml: Apple Silicon acceleration
    • openvino: Intel CPU/GPU/VPU acceleration
    • onednn, directml, xnnpack, rocm, cann, rknpu, acl, nnapi, armnn, tvm, qnn, migraphx, vitis, azure: Various hardware/platform support

    See ONNX Runtime docs and ORT performance guide for details.

  • §Model Selection

    Almost every model is its own feature. Enable only what you need to reduce compile time and binary size.

    • yolo, sam, clip, image-classifier, dino, rtmpose, rtdetr, db, …
    • All models: all-models (enables all model features)

    See Supported Models for the complete list with feature names.
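
As an illustration, a dependency declaration combining a model feature with an execution provider and the viewer might look like the following (the version is a placeholder; check crates.io for the current release):

```toml
[dependencies]
# Feature names taken from the lists above; replace "*" with a pinned version.
usls = { version = "*", features = ["yolo", "cuda", "viewer"] }
```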

Modules§

core
Core functionality for vision and vision-language model inference.
models
Pre-built models for various vision and vision-language tasks.
viz
Visualization utilities for rendering and displaying ML model results.

Structs§

Config
Configuration for model inference including engines, processors, and task settings.
DataLoader
A structure designed to load and manage image, video, or stream data.
DynConf
Dynamic confidence thresholds.
Engine
ONNX Runtime inference engine with configuration and session management.
HardwareConfig
Unified hardware configuration containing all execution provider configs.
Hbb
Horizontal bounding box with position, size, and metadata.
Hub
Manages interactions with GitHub repository releases and Hugging Face repositories.
Image
Image wrapper with metadata and transformation capabilities.
InstanceMeta
Metadata for detection instances including ID, confidence, and name.
Keypoint
Represents a keypoint in a 2D space with optional metadata.
LogitsSampler
Logits sampler for text generation with temperature and nucleus sampling.
Mask
Segmentation mask represented as a grayscale image.
ORTConfig
ONNX Runtime configuration with device and optimization settings.
Obb
Oriented bounding box with four vertices and metadata.
OrtEngine
ONNX Runtime inference engine with high-performance tensor operations.
Polygon
Polygon with metadata.
Prob
Probability result with classification metadata.
Processor
Image and text processing pipeline with tokenization and transformation capabilities.
ProcessorConfig
Configuration for image and text processing pipelines.
Skeleton
Skeleton structure containing keypoint connections.
Text
Text detection result with content and metadata.
Version
Version representation with major, minor, and optional patch numbers.
X
Tensor wrapper over ndarray’s Array<f32, IxDyn>.
Xs
Collection of named tensors with associated images and texts.
Y
Container for inference results for each image.
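
Geometry types such as Hbb carry position, size, and detection metadata. As a self-contained illustration of how horizontal boxes are typically compared during post-processing (a sketch with hypothetical names, not usls’s actual API), an intersection-over-union computation looks like:

```rust
// Illustrative sketch only: a minimal horizontal bounding box and an IoU
// computation. Struct and function names are hypothetical, not usls's API.
#[derive(Debug, Clone, Copy)]
struct Bbox {
    x: f32, // top-left x
    y: f32, // top-left y
    w: f32, // width
    h: f32, // height
}

/// Intersection-over-union of two axis-aligned boxes; 0.0 when disjoint.
fn iou(a: &Bbox, b: &Bbox) -> f32 {
    let ix = (a.x + a.w).min(b.x + b.w) - a.x.max(b.x);
    let iy = (a.y + a.h).min(b.y + b.h) - a.y.max(b.y);
    if ix <= 0.0 || iy <= 0.0 {
        return 0.0;
    }
    let inter = ix * iy;
    let union = a.w * a.h + b.w * b.h - inter;
    inter / union
}

fn main() {
    let a = Bbox { x: 0.0, y: 0.0, w: 10.0, h: 10.0 };
    let b = Bbox { x: 5.0, y: 5.0, w: 10.0, h: 10.0 };
    // Overlap is 5x5 = 25; union is 100 + 100 - 25 = 175.
    println!("IoU = {:.4}", iou(&a, &b)); // IoU = 0.1429
}
```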

Enums§

DType
Data type enumeration for tensor elements.
Device
Device types for model execution.
Dir
Represents various directories on the system, including Home, Cache, Config, and more.
ImageTensorLayout
Image tensor layout formats for organizing image data in memory.
Location
Media location type indicating local or remote source.
MediaType
Media type classification for different content formats.
ResizeMode
Image resize modes for different scaling strategies.
Scale
Model scale variants for different model sizes.
Task
Task types for various vision and vision-language model inference tasks.
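
ResizeMode covers strategies such as aspect-ratio-preserving letterboxing, where an image is scaled to fit the target and the remainder is padded. A rough self-contained sketch of that arithmetic (hypothetical names, not the crate’s implementation):

```rust
// Illustrative sketch of letterbox resizing: scale to fit the destination
// while preserving aspect ratio, then center with padding. Not usls's API.
fn letterbox_dims(src: (u32, u32), dst: (u32, u32)) -> (u32, u32, u32, u32) {
    let (sw, sh) = (src.0 as f32, src.1 as f32);
    let (dw, dh) = (dst.0 as f32, dst.1 as f32);
    // Uniform scale chosen so the scaled image fits inside dst.
    let scale = (dw / sw).min(dh / sh);
    let nw = (sw * scale).round() as u32;
    let nh = (sh * scale).round() as u32;
    // (new width, new height, x padding, y padding)
    (nw, nh, (dst.0 - nw) / 2, (dst.1 - nh) / 2)
}

fn main() {
    // A 1280x720 frame letterboxed into a 640x640 model input.
    let (nw, nh, px, py) = letterbox_dims((1280, 720), (640, 640));
    println!("{}x{} with padding ({}, {})", nw, nh, px, py); // 640x360 with padding (0, 140)
}
```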

Constants§

NAMES_BODY_PARTS_28
Human body parts segmentation labels with 28 categories.
NAMES_COCO_80
COCO dataset object detection labels with 80 categories.
NAMES_COCO_91
Extended COCO dataset labels with 91 categories including background and unused slots.
NAMES_COCO_133
A comprehensive list of keypoints used in the COCO-133 person pose estimation model.
NAMES_COCO_KEYPOINTS_17
COCO dataset keypoint labels for human pose estimation with 17 points.
NAMES_DOTA_V1_5_16
Labels for DOTA (Dataset for Object deTection in Aerial images) v1.5 with 16 categories.
NAMES_DOTA_V1_15
Labels for DOTA (Dataset for Object deTection in Aerial images) v1.0 with 15 categories.
NAMES_HALPE_KEYPOINTS_26
A constant array containing the 26 keypoint labels used in the HALPE human pose estimation model.
NAMES_HAND_21
Hand keypoint labels used in hand pose estimation with 21 points, organized by finger.
NAMES_IMAGENET_1K
ImageNet ILSVRC 1000-class classification labels.
NAMES_OBJECT365_366
Objects365 dataset class names (366 classes including background).
NAMES_PICODET_LAYOUT_3
Simplified PicoDet document layout labels with 3 basic categories.
NAMES_PICODET_LAYOUT_5
Core PicoDet document layout labels with 5 essential categories.
NAMES_PICODET_LAYOUT_17
Labels for PicoDet document layout analysis with 17 categories.
NAMES_YOLO_DOCLAYOUT_10
Labels for document layout analysis using YOLO with 10 categories.
SKELETON_COCO_19
Defines the keypoint connections for the COCO person skeleton with 19 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
SKELETON_COCO_65
Defines the keypoint connections for the COCO-133 person skeleton with 65 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
SKELETON_COLOR_COCO_19
Defines colors for visualizing each connection in the COCO person skeleton, grouped by body part.
SKELETON_COLOR_COCO_65
Defines colors for visualizing each connection in the COCO-133 person skeleton, grouped by body part.
SKELETON_COLOR_HALPE_27
Defines colors for visualizing each connection in the HALPE person skeleton, grouped by body part and side.
SKELETON_COLOR_HAND_21
Defines colors for visualizing each connection in the hand skeleton, grouped by finger.
SKELETON_HALPE_27
Defines the keypoint connections for the HALPE person skeleton with 27 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
SKELETON_HAND_21
Defines the keypoint connections for the hand skeleton with 20 connections. Each tuple (a, b) represents a connection between keypoint indices a and b.
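
The SKELETON_* constants above are arrays of (a, b) keypoint-index pairs. A minimal self-contained sketch of consuming such pairs when drawing a pose (hypothetical names, not usls’s API) might be:

```rust
// Illustrative sketch: turning (a, b) keypoint-index pairs, as in the
// SKELETON_* constants, into drawable line segments. Not usls's API.
type Point = (f32, f32);

/// Collect the line segments connecting detected keypoints, skipping any
/// connection whose endpoints are out of range or were not detected.
fn skeleton_segments(
    keypoints: &[Option<Point>],
    connections: &[(usize, usize)],
) -> Vec<(Point, Point)> {
    connections
        .iter()
        .filter_map(|&(a, b)| match (keypoints.get(a), keypoints.get(b)) {
            (Some(&Some(pa)), Some(&Some(pb))) => Some((pa, pb)),
            _ => None,
        })
        .collect()
}

fn main() {
    // Three keypoints; the middle one was not detected.
    let kpts = [Some((0.0, 0.0)), None, Some((2.0, 2.0))];
    // Connections 0-1 and 0-2; only 0-2 has both endpoints present.
    let edges = [(0, 1), (0, 2)];
    let segs = skeleton_segments(&kpts, &edges);
    println!("{} drawable segment(s)", segs.len()); // 1 drawable segment(s)
}
```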

Statics§

NAMES_YOLOE_4585
A comprehensive list of 4585 object categories used in the YOLOE object detection model.