Crate usls


§usls

usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).

§📚 Documentation

§⚡ Cargo Features

❕ Features in italics are enabled by default.

  • §Runtime & Utilities

    • ort-download-binaries: Auto-download ONNX Runtime binaries from pyke.
    • ort-load-dynamic: Link ONNX Runtime yourself. Use this if pyke doesn’t provide prebuilt binaries for your platform or you want to link your local ONNX Runtime library. See the Linking Guide for more details.
    • viewer: Image/video visualization (minifb). Similar to OpenCV imshow(). See example.
    • video: Video I/O support (video-rs). Enable this to read/write video streams. See example.
    • hf-hub: Hugging Face Hub support for downloading models from Hugging Face repositories.
    • tokenizers: Tokenizer support for vision-language models. Automatically enabled when using vision-language model features (blip, clip, florence2, grounding-dino, fastvlm, moondream2, owl, smolvlm, trocr, yoloe).
    • slsl: SLSL tensor library support. Automatically enabled when using yolo or clip features.
  • §Execution Providers

    Hardware acceleration for inference.

    • cuda, tensorrt: NVIDIA GPU acceleration
    • coreml: Apple Silicon acceleration
    • openvino: Intel CPU/GPU/VPU acceleration
    • onednn, directml, xnnpack, rocm, cann, rknpu, acl, nnapi, armnn, tvm, qnn, migraphx, vitis, azure: Various hardware/platform support

    See ONNX Runtime docs and ORT performance guide for details.

  • §Model Selection

    Almost every model is a separate feature. Enable only what you need to reduce compile time and binary size.

    • yolo, sam, clip, image-classifier, dino, rtmpose, rtdetr, db, …
    • All models: all-models (enables all model features)

    See Supported Models for the complete list with feature names.
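As a sketch of how these flags combine, a downstream Cargo.toml might enable only the model families and accelerators it needs. The feature names below are taken from the lists above; the version requirement is illustrative, and whether default-features must be disabled for your use case should be checked against the crate's own manifest:

```toml
[dependencies]
# Enable only the YOLO model family, NVIDIA CUDA acceleration, the image
# viewer, and automatic download of prebuilt ONNX Runtime binaries from pyke.
usls = { version = "*", features = [
    "yolo",
    "cuda",
    "viewer",
    "ort-download-binaries",
] }
```

Swapping "cuda" for "coreml" or "openvino" selects a different execution provider without touching application code.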

Modules§

core
Core functionality for vision and vision-language model inference.
models
Pre-built models for various vision and vision-language tasks.
viz
Visualization utilities for rendering and displaying ML model results.

Structs§

Config
Configuration for model inference including engines, processors, and task settings.
DataLoader
A structure designed to load and manage image, video, or stream data.
DynConf
Dynamic confidence thresholds.
Engine
ONNX Runtime inference engine with configuration and session management.
HardwareConfig
Unified hardware configuration containing all execution provider configs.
Hbb
Horizontal bounding box with position, size, and metadata.
Hub
Manages interactions with GitHub repository releases and Hugging Face repositories.
Image
Image wrapper with metadata and transformation capabilities.
InstanceMeta
Metadata for detection instances including ID, confidence, and name.
Keypoint
Represents a keypoint in a 2D space with optional metadata.
LogitsSampler
Logits sampler for text generation with temperature and nucleus sampling.
Mask
Mask represented as a grayscale image.
ORTConfig
ONNX Runtime configuration with device and optimization settings.
Obb
Oriented bounding box with four vertices and metadata.
OrtEngine
ONNX Runtime inference engine with high-performance tensor operations.
Polygon
Polygon with metadata.
Prob
Probability result with classification metadata.
Processor
Image and text processing pipeline with tokenization and transformation capabilities.
ProcessorConfig
Configuration for image and text processing pipelines.
Skeleton
Skeleton structure containing keypoint connections.
Text
Text detection result with content and metadata.
Version
Version representation with major, minor, and optional patch numbers.
X
Tensor wrapper over Array<f32, IxDyn>.
Xs
Collection of named tensors with associated images and texts.
Y
Container for inference results for each image.
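The structs above compose into a typical inference flow: build a Config, construct a model and a DataLoader, run the forward pass, and consume the per-image Y results. The following is a pseudocode sketch in Rust syntax; the exact constructor and method names (yolo(), with_model_file(), try_read_n(), forward()) are assumptions and should be checked against the model-specific examples in the repository.

```rust
// Hypothetical sketch -- method names are assumptions, not verified API.
use usls::{Config, DataLoader, models::YOLO};

fn main() -> anyhow::Result<()> {
    // Configure engine, processor, and task settings (Config).
    let config = Config::yolo().with_model_file("yolo11n.onnx"); // assumed builder
    // Build the ONNX Runtime-backed model (Engine under the hood).
    let mut model = YOLO::new(config)?; // assumed constructor
    // Load images; DataLoader also manages video and stream sources.
    let images = DataLoader::try_read_n(&["bus.jpg"])?; // assumed helper
    // Each Y bundles results for one image: Hbb, Mask, Keypoint, Prob, ...
    let ys = model.forward(&images)?;
    for y in ys {
        println!("{y:?}");
    }
    Ok(())
}
```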

Enums§

DType
Data type enumeration for tensor elements.
Device
Device types for model execution.
Dir
Represents various directories on the system, including Home, Cache, Config, and more.
ImageTensorLayout
Image tensor layout formats for organizing image data in memory.
Location
Media location type indicating local or remote source.
MediaType
Media type classification for different content formats.
ResizeMode
Image resize modes for different scaling strategies.
Scale
Model scale variants for different model sizes.
Task
Task types for various vision and vision-language model inference tasks.

Constants§

NAMES_COCO_80
COCO 80-class object categories (common split).
NAMES_COCO_91
COCO 91-class extended categories (keeps original index mapping with gaps).
NAMES_COCO_KEYPOINTS_17
COCO 17 human keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
NAMES_COCO_KEYPOINTS_133
COCO-WholeBody 133 keypoints (body, face, and hands combined).
NAMES_DOTA_V1_5_16
DOTA v1.5 (16 classes, includes container crane).
NAMES_DOTA_V1_15
DOTA v1.0 (15 classes, excludes container crane).
NAMES_HAND_KEYPOINTS_21
21 hand keypoints from wrist to fingertips (index-aligned, base→tip).
NAMES_YOLO_DOCLAYOUT_10
YOLO DocStructBench 10-class document layout categories.
SKELETON_COCO_19
Defines the keypoint connections for the COCO person skeleton with 19 connections. Each tuple (a, b) represents a connection between keypoint indices a and b. The connections define the following body parts:
SKELETON_COCO_65
Defines the keypoint connections for the COCO-133 person skeleton with 65 connections. Each tuple (a, b) represents a connection between keypoint indices a and b. The connections define the following parts:
SKELETON_COLOR_COCO_19
Defines colors for visualizing each connection in the COCO person skeleton. Colors are grouped by body parts:
SKELETON_COLOR_COCO_65
Defines colors for visualizing each connection in the COCO-133 person skeleton. Colors are grouped by body parts:
SKELETON_COLOR_HALPE_27
Defines colors for visualizing each connection in the HALPE person skeleton. Colors are grouped by body parts and sides:
SKELETON_COLOR_HAND_21
Defines colors for visualizing each connection in the hand skeleton. Colors are grouped by fingers:
SKELETON_HALPE_27
Defines the keypoint connections for the HALPE person skeleton with 27 connections. Each tuple (a, b) represents a connection between keypoint indices a and b. The connections define the following body parts:
SKELETON_HAND_21
Defines the keypoint connections for the hand skeleton with 20 connections. Each tuple (a, b) represents a connection between keypoint indices a and b. The connections define the following parts:

Statics§

NAMES_IMAGENET_1K
ImageNet-1K classification labels (1000 categories). Lazily loaded from an embedded text file to keep compile time low.
NAMES_OBJECT365
Object365 dataset labels without the leading background class (365 categories).
NAMES_OBJECT365_366
Object365 dataset labels including background (366 categories). Built by prepending background to NAMES_OBJECT365 for compatibility.