Safe Rust bindings to NVIDIA TensorRT-RTX
⚠️ EXPERIMENTAL - NOT FOR PRODUCTION USE
This crate is in early experimental development. The API is unstable and will change. This is NOT production-ready software. Use at your own risk.
This crate provides safe, ergonomic Rust bindings to the TensorRT-RTX library for high-performance deep learning inference on NVIDIA GPUs.
§Overview
TensorRT-RTX enables efficient inference by:
- Optimizing neural network graphs
- Fusing layers and operations
- Selecting optimal kernels for your hardware
- Supporting dynamic shapes and batching
§Workflow
Using TensorRT-RTX typically follows two phases:
§Build Phase (Ahead-of-Time)
- Create a Logger to capture TensorRT messages
- Create a Builder to construct an optimized engine
- Define your network using NetworkDefinition
- Configure optimization with BuilderConfig
- Build and serialize the engine to disk
§Inference Phase (Runtime)
- Create a Runtime with a logger
- Deserialize the engine using Runtime::deserialize_cuda_engine
- Create an ExecutionContext from the engine
- Bind input/output tensors
- Execute inference with ExecutionContext::enqueue_v3
§Example
use trtx::{Logger, Builder, Runtime};
use trtx::builder::{BuilderConfig, MemoryPoolType, network_flags};
// Dynamically load TensorRT, optionally from an explicit path.
// This applies when using the crate's dlopen_tensorrt_rtx feature (the default);
// the call is optional and a no-op when link_tensorrt_rtx is also enabled.
trtx::dynamically_load_tensorrt(None::<String>).unwrap();
// Create logger
let logger = Logger::stderr()?;
// Build phase
let mut builder = Builder::new(&logger)?;
let mut network = builder.create_network(network_flags::EXPLICIT_BATCH)?;
let mut config = builder.create_config()?;
// Configure memory
config.set_memory_pool_limit(MemoryPoolType::kWORKSPACE, 1 << 30);
// Build and serialize
let engine_data = builder.build_serialized_network(&mut network, &mut config)?;
std::fs::write("model.engine", &engine_data)?;
// Inference phase
let mut runtime = Runtime::new(&logger)?;
let mut engine = runtime.deserialize_cuda_engine(&engine_data)?;
let context = engine.create_execution_context()?;
// List I/O tensors
let num_tensors = engine.nb_io_tensors()?;
for i in 0..num_tensors {
    let name = engine.io_tensor_name(i)?;
    println!("Tensor {}: {}", i, name);
}
§Safety
This crate provides safe abstractions over the underlying C++ API. However,
some operations (like setting tensor addresses and enqueueing inference)
require careful management of CUDA memory and are marked as unsafe.
§Required (building)
- libclang < 22: Required for autocxx. On Windows:
winget install LLVM.LLVM -v 20.1.2
Important: libclang version 22 or greater will cause a compilation error.
If auto-discovery picks up the wrong version of libclang, you can steer discovery with the LIBCLANG_PATH environment variable, e.g.
$env:LIBCLANG_PATH="D:\programs\LLVM\bin" # PowerShell (Windows)
export LIBCLANG_PATH=/usr/lib/llvm-19/lib # Linux
See https://rust-lang.github.io/rust-bindgen/requirements.html (note that autocxx uses an older fork of bindgen).
TensorRT is dynamically loaded by default, so the TensorRT SDK is only required when building
with the Cargo features link_tensorrt_rtx / link_tensorrt_onnxparser, which link the TensorRT libraries.
Use TENSORRT_RTX_DIR to point to the TensorRT SDK root directory (the path that contains the lib folder with the shared libraries).
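A quick way to check that TENSORRT_RTX_DIR points at an SDK root before enabling one of the link features is to look for the lib folder yourself. The following is a minimal sketch, not part of this crate: the helper name sdk_lib_dir is hypothetical, and it only tests that a lib directory exists; it does not inspect the libraries inside it.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of trtx): returns `<root>/lib` if that
/// directory exists, mirroring the layout described for TENSORRT_RTX_DIR.
fn sdk_lib_dir(root: &Path) -> Option<PathBuf> {
    let lib = root.join("lib");
    lib.is_dir().then_some(lib)
}

fn main() {
    match env::var_os("TENSORRT_RTX_DIR") {
        Some(root) => match sdk_lib_dir(Path::new(&root)) {
            Some(lib) => println!("SDK lib folder found: {}", lib.display()),
            None => println!("TENSORRT_RTX_DIR is set, but it has no lib folder"),
        },
        // Only the link_* features need the SDK at build time.
        None => println!("TENSORRT_RTX_DIR is not set"),
    }
}
```

If the check reports a missing lib folder, the variable most likely points one level too deep (at lib itself) or at an unrelated directory.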
§Required (GPU execution)
- NVIDIA TensorRT-RTX: Download and install from NVIDIA Developer
  - The TensorRT libraries should be in a location where they can be dynamically loaded (e.g. by setting PATH on Windows or LD_LIBRARY_PATH on Linux).
  - This crate currently requires TensorRT RTX version 1.3, 1.4 or 1.5 (see Cargo features v_1_3, v_1_4, v_1_5). Use default-features = false plus a version feature to select the version. You will also have to enable either dlopen_tensorrt_rtx or link_tensorrt_rtx.
- NVIDIA GPU: Compatible with TensorRT-RTX requirements
§C++ API reference
Rust types in this crate wrap TensorRT for RTX C++ interfaces. The authoritative class list and
method documentation is the
TensorRT for RTX C++ API (annotated).
Each wrapper’s docs also link the Rust FFI type in trtx_sys::nvinfer1 or trtx_sys::nvonnxparser
alongside the matching C++ class on NVIDIA’s site.
Re-exports§
pub use axes::Axes;
pub use builder::Builder;
pub use builder::BuilderConfig;
pub use cuda::default_stream;
pub use cuda::synchronize;
pub use cuda::DeviceBuffer;
pub use error::Error;
pub use error::Result;
pub use executor::run_onnx_with_tensorrt;
pub use executor::run_onnx_zeroed;
pub use executor::TensorInput;
pub use executor::TensorOutput;
pub use logger::LogHandler;
pub use logger::Logger;
pub use logger::Severity;
pub use logger::StderrLogger;
pub use network::ConvWeights;
pub use network::NetworkDefinition;
pub use network::OwnedConvWeights;
pub use network::OwnedWeights;
pub use network::Tensor;
pub use onnx_parser::OnnxParser;
pub use refitter::Refitter;
pub use runtime::RuntimeCache;
pub use runtime::CudaEngine;
pub use runtime::EngineInspector;
pub use runtime::ExecutionContext;
pub use runtime::Runtime;
pub use runtime::RuntimeConfig;
pub use trtx_sys;
Modules§
- axes
- Axis mask type for operations that reduce or normalize over selected axes.
- builder
- Builder for creating TensorRT engines.
- builder_config
- Builder configuration for TensorRT engine builds.
- cuda
- CUDA memory management utilities
- cuda_engine
- CUDA engine and serialization config.
- engine_inspector
- Engine inspector (layer / engine introspection as text or JSON).
- error
- Error types for TensorRT-RTX operations (Rust-only; no single TensorRT C++ counterpart).
- execution_context
- executor
- Executor module providing a rustnn-compatible interface.
- host_memory
- Host memory buffer returned by the TensorRT builder (serialized engines, etc.).
- interfaces
- Rust implementations of TensorRT callback / allocator interfaces (bridged to C++).
- logger
- Logger interface for TensorRT-RTX.
- network
- Network definition for building TensorRT engines.
- onnx_parser
- ONNX model parser for TensorRT.
- optimization_profile
- Optimization profile for dynamic input shapes (min / opt / max dimensions).
- refitter
- Engine refitter (update weights in a deserialized engine).
- runtime
- Runtime for deserializing and managing TensorRT engines.
- runtime_cache
- Runtime cache for TensorRT JIT compilation (serialize / deserialize).
- runtime_config
- Runtime configuration for execution context creation.
- tensor
Enums§
- ActivationType
- Enumerates the types of activation to perform in an activation layer.
- ComputeCapability
- Describes compute capability that an engine will be built for.
- CudaGraphStrategy
- Strategies available for CUDA graphs optimizations for JIT (Just-In-Time) inference. See IRuntimeConfig::setCudaGraphStrategy(), IRuntimeConfig::getCudaGraphStrategy().
- CumulativeOperation
- Enumerates the cumulative operations that may be performed by a Cumulative layer. The table shows the initial value of each Cumulative operation.
  Operation | kFLOAT, kHALF, kBF16 | kINT32, kINT64
  --------- | -------------------- | --------------
  kSUM      | +0.0                 | 0
- DataType
- The type of weights and tensors. The datatypes other than kBOOL, kINT32, and kINT64 are "activation datatypes," as they often represent values corresponding to inference results.
- DynamicShapesKernelSpecializationStrategy
- Different kernel specialization strategies for dynamic shapes. Compilation behavior of dynamic shape kernels specialized for a new shape can be controlled at runtime. The user can let the implementation compile a specialized kernel in the background or immediately, or choose not to compile specialized kernels at all. See IRuntimeConfig.
- ElementWiseOperation
- Enumerates the binary operations that may be performed by an ElementWise layer. Operations kAND, kOR, and kXOR must have inputs of DataType::kBOOL. All other operations must have inputs of floating-point type, DataType::kINT8, DataType::kINT32, or DataType::kINT64. See IElementWiseLayer.
- ExecutionContextAllocationStrategy
- Different memory allocation behaviors for IExecutionContext. IExecutionContext requires a block of device memory for internal activation tensors during inference. The user can either let the execution context manage the memory in various ways or allocate the memory themselves. See ICudaEngine::createExecutionContext() and IExecutionContext::setDeviceMemory().
- GatherMode
- Control form of IGatherLayer. See IGatherLayer.
- InterpolationMode
- Enumerates various modes of interpolation.
- LayerInformationFormat
- The format in which the IEngineInspector prints the layer information. See IEngineInspector::getLayerInformation(), IEngineInspector::getEngineInformation().
- LayerType
- The type values of layer classes. See ILayer::getType().
- MatrixOperation
- Enumerates the operations that may be performed on a tensor by IMatrixMultiplyLayer before multiplication.
- PaddingMode
- Enumerates the modes of padding to perform in convolution, deconvolution and pooling layers; padding mode takes precedence if setPaddingMode() and setPrePadding() are also used.
  There are two padding styles, EXPLICIT and SAME, with each style having two variants. The EXPLICIT style determines if the final sampling location is used or not. The SAME style determines if the asymmetry in the padding is on the pre or post padding.
  Shorthand:
    I = dimensions of input image.
    B = prePadding, before the image data.
    A = postPadding, after the image data.
    P = delta between input and output
    S = stride
    F = filter
    O = output
    D = dilation
    M = I + B + A ; The image data plus any padding
    DK = 1 + D * (F - 1)
  Formulas for Convolution:
  - EXPLICIT_ROUND_DOWN:
      O = floor((M - DK) / S) + 1
  - EXPLICIT_ROUND_UP:
      O = ceil((M - DK) / S) + 1
  - SAME_UPPER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + DK - I;
      B = floor(P / 2)
      A = P - B
  - SAME_LOWER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + DK - I;
      A = floor(P / 2)
      B = P - A
  Formulas for Deconvolution:
  - EXPLICIT_ROUND_DOWN:
  - EXPLICIT_ROUND_UP:
      O = (I - 1) * S + DK - (B + A)
  - SAME_UPPER:
      O = min(I * S, (I - 1) * S + DK)
      P = max(DK - S, 0)
      B = floor(P / 2)
      A = P - B
  - SAME_LOWER:
      O = min(I * S, (I - 1) * S + DK)
      P = max(DK - S, 0)
      A = floor(P / 2)
      B = P - A
  Formulas for Pooling:
  - EXPLICIT_ROUND_DOWN:
      O = floor((M - F) / S) + 1
  - EXPLICIT_ROUND_UP:
      O = ceil((M - F) / S) + 1
  - SAME_UPPER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + F - I;
      B = floor(P / 2)
      A = P - B
  - SAME_LOWER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + F - I;
      A = floor(P / 2)
      B = P - A
  Pooling Example 1:
    Given I = {6, 6}, B = {3, 3}, A = {2, 2}, S = {2, 2}, F = {3, 3}. What is O?
    (B, A can be calculated for SAME_UPPER and SAME_LOWER mode)
  - EXPLICIT_ROUND_DOWN:
      Computation:
      M = {6, 6} + {3, 3} + {2, 2} ==> {11, 11}
      O ==> floor((M - F) / S) + 1
        ==> floor(({11, 11} - {3, 3}) / {2, 2}) + {1, 1}
        ==> floor({8, 8} / {2, 2}) + {1, 1}
        ==> {5, 5}
  - EXPLICIT_ROUND_UP:
      Computation:
      M = {6, 6} + {3, 3} + {2, 2} ==> {11, 11}
      O ==> ceil((M - F) / S) + 1
        ==> ceil(({11, 11} - {3, 3}) / {2, 2}) + {1, 1}
        ==> ceil({8, 8} / {2, 2}) + {1, 1}
        ==> {5, 5}
    The sample points are {0, 2, 4, 6, 8} in each dimension.
  - SAME_UPPER:
      Computation:
      I = {6, 6}
      S = {2, 2}
      O = ceil(I / S) = {3, 3}
      P = floor((I - 1) / S) * S + F - I
        ==> floor(({6, 6} - {1, 1}) / {2, 2}) * {2, 2} + {3, 3} - {6, 6}
        ==> {4, 4} + {3, 3} - {6, 6}
        ==> {1, 1}
      B = floor({1, 1} / {2, 2})
        ==> {0, 0}
      A = {1, 1} - {0, 0}
        ==> {1, 1}
  - SAME_LOWER:
      Computation:
      I = {6, 6}
      S = {2, 2}
      O = ceil(I / S) = {3, 3}
      P = floor((I - 1) / S) * S + F - I
        ==> {1, 1}
      A = floor({1, 1} / {2, 2})
        ==> {0, 0}
      B = {1, 1} - {0, 0}
        ==> {1, 1}
    The sample points are {0, 2, 4} in each dimension.
    SAME_UPPER has {O0, O1, O2, pad} in output in each dimension.
    SAME_LOWER has {pad, O0, O1, O2} in output in each dimension.
  Pooling Example 2:
    Given I = {6, 6}, B = {3, 3}, A = {3, 3}, S = {2, 2}, F = {3, 3}. What is O?
- PoolingType
- The type of pooling to perform in a pooling layer.
- ProfilingVerbosity
- List of verbosity levels of layer information exposed in NVTX annotations and in IEngineInspector. See IBuilderConfig::setProfilingVerbosity(), IBuilderConfig::getProfilingVerbosity(), IEngineInspector.
- ReduceOperation
- Enumerates the reduce operations that may be performed by a Reduce layer. The table shows the result of reducing across an empty volume of a given type.
  Operation | kFLOAT and kHALF  | kINT32    | kINT8
  --------- | ----------------- | --------- | ---------
  kSUM      | 0                 | 0         | 0
  kPROD     | 1                 | 1         | 1
  kMAX      | negative infinity | INT_MIN   | -128
  kMIN      | positive infinity | INT_MAX   | 127
  kAVG      | NaN               | 0         | -128
  kNONE     | Undefined         | Undefined | Undefined
  The current version of TensorRT usually performs reduction for kINT8 via kFLOAT or kHALF. The kINT8 values show the quantized representations of the floating-point values.
  Note: kNONE is a reduce operation which does not modify the input tensor. This is applicable to Multi-Device mode only, as a reduce operation is not mandatory for certain collective operations. See INetworkDefinition::addDistCollective for more details.
- ResizeCoordinateTransformation
- The resize coordinate transformation function. See IResizeLayer::setCoordinateTransformation().
- ResizeRoundMode
- The rounding mode for nearest neighbor resize. See IResizeLayer::setNearestRounding().
- ResizeSelector
- The coordinate selector when resize to single pixel output. See IResizeLayer::setSelectorForSinglePixel().
- SampleMode
- Controls how ISliceLayer and IGridSample handle out-of-bounds coordinates. See ISliceLayer and IGridSample.
- ScaleMode
- Controls how shift, scale and power are applied in a Scale layer. See IScaleLayer.
- ScatterMode
- Control form of IScatterLayer. See IScatterLayer.
- TensorFormat
- Format of the input/output tensors. This enum is used by both plugins and network I/O tensors. See IPluginV2::supportsFormat(), safe::ICudaEngine::getBindingFormat().
  Many of the formats are vector-major or vector-minor. These formats specify a vector dimension and scalars per vector. For example, suppose that the tensor has dimensions [M,N,C,H,W], the vector dimension is C and there are V scalars per vector.
  * A vector-major format splits the vectorized dimension into two axes in the memory layout. The vectorized dimension is replaced by an axis of length ceil(C/V) and a new dimension of length V is appended. For the example tensor, the memory layout is equivalent to an array with dimensions [M][N][ceil(C/V)][H][W][V]. Tensor coordinate (m,n,c,h,w) maps to array location [m][n][c/V][h][w][c%V].
  * A vector-minor format moves the vectorized dimension to become the last axis in the memory layout. For the example tensor, the memory layout is equivalent to an array with dimensions [M][N][H][W][ceil(C/V)*V]. Tensor coordinate (m,n,c,h,w) maps to array location [m][n][h][w][c].
  In interfaces that refer to "components per element", that's the value of V above.
  For more information about data formats, see the topic "Data Format Description" located in the TensorRT Developer Guide. https://docs.nvidia.com/deeplearning/tensorrt/latest/inference-library/advanced.html#i-o-formats
- TensorIOMode
- Definition of tensor IO Mode.
- TopKOperation
- Enumerates the operations that may be performed by a TopK layer.
- UnaryOperation
- Enumerates the unary operations that may be performed by a Unary layer. Operation kNOT must have inputs of DataType::kBOOL. Operations kSIGN and kABS must have inputs of floating-point type, DataType::kINT8, DataType::kINT32 or DataType::kINT64. Operation kISINF must have inputs of floating-point type. All other operations must have inputs of floating-point type. See IUnaryLayer.
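The pooling formulas listed under PaddingMode above can be checked with a few lines of plain Rust. The sketch below is independent of this crate: the Mode and PoolDim types and the pooling_dim function are hypothetical names introduced only for illustration, and the code handles a single spatial dimension with a positive stride.

```rust
/// Padding styles from the PaddingMode description; the Rust names are local to this sketch.
#[derive(Clone, Copy, Debug)]
enum Mode {
    ExplicitRoundDown,
    ExplicitRoundUp,
    SameUpper,
    SameLower,
}

/// Output length plus the pre/post padding for one spatial dimension.
#[derive(Debug, PartialEq)]
struct PoolDim {
    out: i64,
    pre: i64,
    post: i64,
}

/// floor(a / b) for b > 0, also correct for negative a.
fn floor_div(a: i64, b: i64) -> i64 {
    a.div_euclid(b)
}

/// ceil(a / b) for b > 0, also correct for negative a.
fn ceil_div(a: i64, b: i64) -> i64 {
    -(-a).div_euclid(b)
}

/// Pooling formulas for input `i`, filter `f` and stride `s`.
/// `pre` and `post` are only read by the EXPLICIT modes; the SAME modes derive them.
fn pooling_dim(mode: Mode, i: i64, f: i64, s: i64, pre: i64, post: i64) -> PoolDim {
    match mode {
        Mode::ExplicitRoundDown => PoolDim { out: floor_div(i + pre + post - f, s) + 1, pre, post },
        Mode::ExplicitRoundUp => PoolDim { out: ceil_div(i + pre + post - f, s) + 1, pre, post },
        Mode::SameUpper | Mode::SameLower => {
            let out = ceil_div(i, s);
            let p = floor_div(i - 1, s) * s + f - i; // total padding P
            let small = floor_div(p, 2);
            let large = p - small;
            match mode {
                // SAME_UPPER puts the larger share after the data, SAME_LOWER before it.
                Mode::SameUpper => PoolDim { out, pre: small, post: large },
                _ => PoolDim { out, pre: large, post: small },
            }
        }
    }
}

fn main() {
    // Pooling Example 1, one dimension: I = 6, B = 3, A = 2, S = 2, F = 3.
    for mode in [Mode::ExplicitRoundDown, Mode::ExplicitRoundUp, Mode::SameUpper, Mode::SameLower] {
        println!("{:?}: {:?}", mode, pooling_dim(mode, 6, 3, 2, 3, 2));
    }
}
```

For Pooling Example 1 this yields O = 5 for both EXPLICIT modes and O = 3 for both SAME modes, with (B, A) = (0, 1) for SAME_UPPER and (1, 0) for SAME_LOWER, matching the worked computation. Replacing F with DK = 1 + D * (F - 1) gives the convolution variants.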