libinfer 0.0.5

Rust interface to TensorRT for high-performance GPU inference

This library provides a simple Rust interface to a TensorRT engine, using the cxx crate to bridge Rust and C++.

Overview

libinfer allows for seamless integration of TensorRT models into Rust applications with minimal overhead. The library handles the complex C++ interaction with TensorRT while exposing a simple, idiomatic Rust API.

Installation

To use this library, you'll need:

  • CUDA and TensorRT installed on your system
  • Environment variables set properly:
    • TENSORRT_LIBRARIES: Path to TensorRT libraries
    • CUDA_LIBRARIES: Path to CUDA libraries
    • CUDA_INCLUDE_DIRS: Path to CUDA include directories
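
For example, the build environment might be set up like this (the paths below are illustrative only; point them at your actual CUDA and TensorRT installation):

```shell
# Illustrative paths; adjust to your system's CUDA/TensorRT install locations.
export TENSORRT_LIBRARIES=/usr/lib/x86_64-linux-gnu
export CUDA_LIBRARIES=/usr/local/cuda/lib64
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include
```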

Add to your Cargo.toml:

[dependencies]
libinfer = "0.0.5"

Usage

The goal of the API is to keep as much processing in Rust land as possible. Here is a sample usage:

let options = Options {
    path: "yolov8n.engine".into(),
    device_index: 0,
};
let mut engine = Engine::new(&options).unwrap();

// Get input dimensions of the engine as [Channels, Height, Width]
let dims = engine.get_input_dims();

// Construct a dummy input (uint8 or float32 depending on model)
let input_size = dims.iter().fold(1, |acc, &e| acc * e as usize);
let input = InputTensor {
    name: "input".to_string(),
    data: vec![0u8; input_size],
};

// Run inference
let output = engine.pin_mut().infer(&input).unwrap();

// Postprocess the output according to your model's output format
// ...
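
The input length in the example above is just the product of the engine's input dimensions. As a standalone sketch (the dims value is hypothetical, standing in for what get_input_dims() would return for a 640x640 RGB model such as yolov8n):

```rust
fn main() {
    // Hypothetical dims, as get_input_dims() might report for a 640x640 RGB model.
    let dims: [u32; 3] = [3, 640, 640];

    // Same fold as in the usage example: multiply all dimensions together
    // to get the total element count.
    let input_size = dims.iter().fold(1usize, |acc, &e| acc * e as usize);
    assert_eq!(input_size, 3 * 640 * 640);

    // For a UINT8 model, the element count is also the buffer length in bytes.
    let input = vec![0u8; input_size];
    println!("{}", input.len()); // prints 1228800
}
```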

This library is intended to be used with pre-built TensorRT engines created by the Python API or the trtexec CLI tool for the target device.
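
For instance, an engine can be built from an ONNX model with trtexec (model names here are placeholders; engines are specific to the GPU and TensorRT version they were built on, so build on the target device):

```shell
# Convert an ONNX model to a serialized TensorRT engine on the target device.
trtexec --onnx=model.onnx --saveEngine=model.engine
```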

Features

  • Support for both fixed and dynamic batch sizes
  • Automatic handling of different input data types (UINT8, FP32)
  • Direct access to model dimensions and parameters
  • Error handling via Rust's Result type
  • Logging integration with RUST_LOG environment variable

Examples

Check the examples/ directory for working examples:

  • basic.rs: Simple inference example
  • benchmark.rs: Performance benchmarking with various batch sizes
  • dynamic.rs: Working with dynamic batch sizes
  • functional_test.rs: Testing correctness of model outputs

Run an example with:

cargo run --example basic -- --path /path/to/model.engine

Example Requirements

  • You must provide your own TensorRT engine files (.engine)
  • For the functional_test example, you'll need input.bin and features.txt files
  • To create engine files, use NVIDIA's TensorRT tools such as:
    • TensorRT Python API
    • trtexec command-line tool
    • ONNX -> TensorRT conversion tools

See the documentation in each example file for specific requirements.

Synchronization Model

No cudaStreamSynchronize is needed between H2D copies and enqueueV3. This is safe for several reasons:

  1. Stream ordering — all H2D copies and enqueueV3 are submitted to the same CUDA stream, which guarantees in-order execution. Copies complete before kernels begin.
  2. Pageable host memory — input data comes from Rust Vec<u8> on the regular heap (not pinned memory). cudaMemcpyAsync with pageable memory blocks the CPU until the copy is staged, making a subsequent sync redundant.
  3. TensorRT auxiliary streams — TRT may use auxiliary streams internally during enqueueV3, but it inserts event synchronizations so all auxiliary work waits on the main stream at entry and the main stream waits on all auxiliary work at exit.
  4. CPU-side calls: setInputShape, allInputDimensionsSpecified, and setTensorAddress are pure CPU operations with no stream interaction.

A post-inference cudaStreamSynchronize is still required to ensure D2H output copies are complete before reading results. infer() handles this internally.

Current Limitations

  • The underlying engine code is not threadsafe (and the Rust binding does not implement Sync)
  • Engine instances are Send but not Sync
  • Input and output tensors are copied across the CPU-GPU boundary on every inference call; there is no zero-copy or device-pointer path yet
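
Since an engine is Send but not Sync, it can be moved to another thread or guarded by a mutex, but not shared by plain reference. A minimal sketch of the mutex approach, using a stand-in struct (the real Engine requires a GPU and an engine file, and is driven through pin_mut()):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for libinfer::Engine: movable across threads (Send),
// but not safely shareable by reference (not Sync).
struct Engine {
    device_index: u32,
}

fn main() {
    let engine = Engine { device_index: 0 };

    // Arc<Mutex<_>> supplies the synchronization the engine lacks,
    // serializing access so only one thread uses it at a time.
    let shared = Arc::new(Mutex::new(engine));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let engine = shared.lock().unwrap();
                // A real call, e.g. engine.pin_mut().infer(&input), would go here.
                engine.device_index
            })
        })
        .collect();

    for handle in handles {
        assert_eq!(handle.join().unwrap(), 0);
    }
}
```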

Future Work

  • Allow passing device pointers and CUDA streams so callers can manage synchronization themselves
  • Async execution support

Credits

Much of the C++ code is based on the tensorrt-cpp-api repo.