# libinfer

This library provides a simple Rust interface to a TensorRT engine using `cxx`.

## Overview
libinfer allows for seamless integration of TensorRT models into Rust applications with minimal overhead. The library handles the complex C++ interaction with TensorRT while exposing a simple, idiomatic Rust API.
## Installation
To use this library, you'll need:
- CUDA and TensorRT installed on your system
- Environment variables set properly:
  - `TENSORRT_LIBRARIES`: Path to TensorRT libraries
  - `CUDA_LIBRARIES`: Path to CUDA libraries
  - `CUDA_INCLUDE_DIRS`: Path to CUDA include directories
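For example, in a shell these variables might be set as follows (the paths are hypothetical; point them at your actual TensorRT and CUDA installations):

```shell
# Hypothetical install locations; adjust for your system.
export TENSORRT_LIBRARIES=/opt/tensorrt/lib
export CUDA_LIBRARIES=/usr/local/cuda/lib64
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include
```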
Add to your `Cargo.toml`:

```toml
[dependencies]
libinfer = "0.0.3"
```
## Usage
The goal of the API is to keep as much processing in Rust land as possible. Here is a sample usage (struct fields and constructors elided in the rendered README are reconstructed approximately):

```rust
let options = Options { /* engine path and other fields elided */ };
let mut engine = Engine::new(&options).unwrap();

// Get input dimensions of the engine as [Channels, Height, Width]
let dims = engine.get_input_dims();

// Construct a dummy input (uint8 or float32 depending on model)
let input_size = dims.iter().fold(1usize, |acc, &d| acc * d as usize);
let input = InputTensor::from(vec![0u8; input_size]); // exact constructor may differ
```
This library is intended to be used with pre-built TensorRT engines created by the Python API or the trtexec CLI tool for the target device.
## Features
- Support for both fixed and dynamic batch sizes
- Automatic handling of different input data types (UINT8, FP32)
- Direct access to model dimensions and parameters
- Error handling via Rust's `Result` type
- Logging integration with the `RUST_LOG` environment variable
## Examples

Check the `examples/` directory for working examples:

- `basic.rs`: Simple inference example
- `benchmark.rs`: Performance benchmarking with various batch sizes
- `dynamic.rs`: Working with dynamic batch sizes
- `functional_test.rs`: Testing correctness of model outputs
Run an example with:

```shell
cargo run --example basic -- --path /path/to/model.engine
```
### Example Requirements

- You must provide your own TensorRT engine files (`.engine`)
- For the `functional_test` example, you'll need `input.bin` and `features.txt` files
- To create engine files, use NVIDIA's TensorRT tools such as:
- TensorRT Python API
- trtexec command-line tool
- ONNX -> TensorRT conversion tools
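As one illustration, a typical `trtexec` invocation that builds an engine from an ONNX model looks like this (file names are placeholders):

```shell
# Build a serialized TensorRT engine from an ONNX model.
# Run this on the target device: engines are not portable across GPUs.
trtexec --onnx=model.onnx --saveEngine=model.engine
```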
See the documentation in each example file for specific requirements.
## Synchronization Model

No `cudaStreamSynchronize` is needed between H2D copies and `enqueueV3`. This is safe for several reasons:

- **Stream ordering**: all H2D copies and `enqueueV3` are submitted to the same CUDA stream, which guarantees in-order execution. Copies complete before kernels begin.
- **Pageable host memory**: input data comes from a Rust `Vec<u8>` on the regular heap (not pinned memory). `cudaMemcpyAsync` with pageable memory blocks the CPU until the copy is staged, making a subsequent sync redundant.
- **TensorRT auxiliary streams**: TRT may use auxiliary streams internally during `enqueueV3`, but it inserts event synchronizations so all auxiliary work waits on the main stream at entry and the main stream waits on all auxiliary work at exit.
- **CPU-side calls**: `setInputShape`, `allInputDimensionsSpecified`, and `setTensorAddress` are pure CPU operations with no stream interaction.

A post-inference `cudaStreamSynchronize` is still required to ensure D2H output copies are complete before reading results. `infer()` handles this internally.
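Put together, the call ordering inside `infer()` can be sketched as pseudocode (the actual C++ implementation differs in detail):

```
// All work is submitted to a single CUDA stream.
cudaMemcpyAsync(d_input, h_input, n, cudaMemcpyHostToDevice, stream);    // H2D
context->enqueueV3(stream);            // same stream, so it runs after the copy
cudaMemcpyAsync(h_output, d_output, m, cudaMemcpyDeviceToHost, stream);  // D2H
cudaStreamSynchronize(stream);         // only sync needed, before reading h_output
```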
## Current Limitations

- The underlying engine code is not thread-safe (and the Rust binding does not implement `Sync`)
- Engine instances are `Send` but not `Sync`
- Input and output data transfers happen on the CPU-GPU boundary
## Future Work
- Allow passing device pointers and CUDA streams for stream synchronization events
- Async execution support
## Credits
Much of the C++ code is based on the tensorrt-cpp-api repo.