Crate trtx

Safe Rust bindings to NVIDIA TensorRT-RTX

⚠️ EXPERIMENTAL - NOT FOR PRODUCTION USE

This crate is in early experimental development. The API is unstable and will change. This is NOT production-ready software. Use at your own risk.

This crate provides safe, ergonomic Rust bindings to the TensorRT-RTX library for high-performance deep learning inference on NVIDIA GPUs.

§Overview

TensorRT-RTX enables efficient inference by:

  • Optimizing neural network graphs
  • Fusing layers and operations
  • Selecting optimal kernels for your hardware
  • Supporting dynamic shapes and batching

§Workflow

Using TensorRT-RTX typically follows two phases:

§Build Phase (Ahead-of-Time)

  1. Create a Logger to capture TensorRT messages
  2. Create a Builder to construct an optimized engine
  3. Define your network using NetworkDefinition
  4. Configure optimization with BuilderConfig
  5. Build and serialize the engine to disk

§Inference Phase (Runtime)

  1. Create a Runtime with a logger
  2. Deserialize the engine using Runtime::deserialize_cuda_engine
  3. Create an ExecutionContext from the engine
  4. Bind input/output tensors
  5. Execute inference with ExecutionContext::enqueue_v3

§Example

use trtx::{Logger, Builder, Runtime};
use trtx::builder::{BuilderConfig, MemoryPoolType, network_flags};

// Dynamically load TensorRT, with an optional library path (None here).
// This applies when using the crate's dlopen_tensorrt_rtx feature (the default).
// The call is optional, and a no-op, when link_tensorrt_rtx is also enabled.
trtx::dynamically_load_tensorrt(None::<String>).unwrap();

// Create logger
let logger = Logger::stderr()?;

// Build phase
let mut builder = Builder::new(&logger)?;
let mut network = builder.create_network(network_flags::EXPLICIT_BATCH)?;
let mut config = builder.create_config()?;

// Configure memory
config.set_memory_pool_limit(MemoryPoolType::kWORKSPACE, 1 << 30);

// Build and serialize
let engine_data = builder.build_serialized_network(&mut network, &mut config)?;
std::fs::write("model.engine", &engine_data)?;

// Inference phase
let mut runtime = Runtime::new(&logger)?;
let mut engine = runtime.deserialize_cuda_engine(&engine_data)?;
let context = engine.create_execution_context()?;

// List I/O tensors
let num_tensors = engine.nb_io_tensors()?;
for i in 0..num_tensors {
    let name = engine.io_tensor_name(i)?;
    println!("Tensor {}: {}", i, name);
}

§Safety

This crate provides safe abstractions over the underlying C++ API. However, some operations (like setting tensor addresses and enqueueing inference) require careful management of CUDA memory and are marked as unsafe.

§Required (building)

  1. libclang < 22: Required for autocxx. On Windows: winget install LLVM.LLVM -v 20.1.2

Important: libclang version 22 or greater will cause a compilation error.

If auto-discovery picks up the wrong version of libclang, you can point the build at the right one with the LIBCLANG_PATH environment variable, e.g.

$env:LIBCLANG_PATH="D:\programs\LLVM\bin" # PowerShell (Windows)
export LIBCLANG_PATH=/usr/lib/llvm-19/lib # Linux

See https://rust-lang.github.io/rust-bindgen/requirements.html (note that autocxx uses an older fork of bindgen)

TensorRT is dynamically loaded by default, so the TensorRT SDK is required at build time only when the Cargo features link_tensorrt_rtx or link_tensorrt_onnxparser are enabled, because these link the TensorRT libraries. In that case, set TENSORRT_RTX_DIR to the TensorRT SDK root directory (the path that contains the lib folder with the shared libraries).

§Required (GPU execution)

  1. NVIDIA TensorRT-RTX: Download and install from NVIDIA Developer

    • The TensorRT libraries should be in a location where they can be dynamically loaded (e.g. by setting PATH on Windows or LD_LIBRARY_PATH on Linux).
    • This crate currently requires TensorRT-RTX version 1.3, 1.4 or 1.5 (see Cargo features v_1_3, v_1_4, v_1_5). To select a version, set default-features = false and enable the matching version feature. You must then also enable either dlopen_tensorrt_rtx or link_tensorrt_rtx.
  2. NVIDIA GPU: Compatible with TensorRT-RTX requirements
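
The version selection described in item 1 could look roughly like the following Cargo.toml fragment. This is a sketch only: the feature names come from the text above, while the version requirement "*" is a placeholder assumption, so substitute the release you actually depend on.

```toml
[dependencies]
# "*" is a placeholder, not a recommendation; pin a real version here.
# v_1_4 selects TensorRT-RTX 1.4; dlopen_tensorrt_rtx loads the libraries at runtime.
trtx = { version = "*", default-features = false, features = ["v_1_4", "dlopen_tensorrt_rtx"] }
```

Swapping dlopen_tensorrt_rtx for link_tensorrt_rtx would instead link the libraries at build time, which per the build requirements above needs TENSORRT_RTX_DIR to point at the SDK.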

§C++ API reference

Rust types in this crate wrap TensorRT for RTX C++ interfaces. The authoritative class list and method documentation is the TensorRT for RTX C++ API (annotated). Each wrapper’s docs also link the Rust FFI type in trtx_sys::nvinfer1 or trtx_sys::nvonnxparser alongside the matching C++ class on NVIDIA’s site.

Re-exports§

pub use axes::Axes;
pub use builder::Builder;
pub use builder::BuilderConfig;
pub use cuda::default_stream;
pub use cuda::synchronize;
pub use cuda::DeviceBuffer;
pub use error::Error;
pub use error::Result;
pub use executor::run_onnx_with_tensorrt;
pub use executor::run_onnx_zeroed;
pub use executor::TensorInput;
pub use executor::TensorOutput;
pub use logger::LogHandler;
pub use logger::Logger;
pub use logger::Severity;
pub use logger::StderrLogger;
pub use network::ConvWeights;
pub use network::NetworkDefinition;
pub use network::OwnedConvWeights;
pub use network::OwnedWeights;
pub use network::Tensor;
pub use onnx_parser::OnnxParser;
pub use refitter::Refitter;
pub use runtime::RuntimeCache;
pub use runtime::CudaEngine;
pub use runtime::EngineInspector;
pub use runtime::ExecutionContext;
pub use runtime::Runtime;
pub use runtime::RuntimeConfig;
pub use trtx_sys;

Modules§

axes
Axis mask type for operations that reduce or normalize over selected axes.
builder
Builder for creating TensorRT engines.
builder_config
Builder configuration for TensorRT engine builds.
cuda
CUDA memory management utilities
cuda_engine
CUDA engine and serialization config.
engine_inspector
Engine inspector (layer / engine introspection as text or JSON).
error
Error types for TensorRT-RTX operations (Rust-only; no single TensorRT C++ counterpart).
execution_context
executor
Executor module providing a rustnn-compatible interface.
host_memory
Host memory buffer returned by the TensorRT builder (serialized engines, etc.).
interfaces
Rust implementations of TensorRT callback / allocator interfaces (bridged to C++).
logger
Logger interface for TensorRT-RTX.
network
Network definition for building TensorRT engines.
onnx_parser
ONNX model parser for TensorRT.
optimization_profile
Optimization profile for dynamic input shapes (min / opt / max dimensions).
refitter
Engine refitter (update weights in a deserialized engine).
runtime
Runtime for deserializing and managing TensorRT engines.
runtime_cache
Runtime cache for TensorRT JIT compilation (serialize / deserialize).
runtime_config
Runtime configuration for execution context creation.
tensor

Enums§

ActivationType
Enumerates the types of activation to perform in an activation layer.
ComputeCapability
Describes compute capability that an engine will be built for.
CudaGraphStrategy
Strategies available for CUDA graphs optimizations for JIT (Just-In-Time) inference. See IRuntimeConfig::setCudaGraphStrategy(), IRuntimeConfig::getCudaGraphStrategy().
CumulativeOperation
Enumerates the cumulative operations that may be performed by a Cumulative layer.

The table shows the initial value of each Cumulative operation.

Operation | kFLOAT, kHALF, kBF16 | kINT32, kINT64
--------- | -------------------- | --------------
kSUM      | +0.0                 | 0
DataType
The type of weights and tensors. The datatypes other than kBOOL, kINT32, and kINT64 are “activation datatypes,” as they often represent values corresponding to inference results.
DynamicShapesKernelSpecializationStrategy
Different kernel specialization strategies for dynamic shapes. Compilation behavior of dynamic shape kernels specialized for a new shape can be controlled at runtime. The user can let the implementation compile a specialized kernel in the background or immediately, or choose not to compile specialized kernels at all. See IRuntimeConfig.
ElementWiseOperation
Enumerates the binary operations that may be performed by an ElementWise layer. Operations kAND, kOR, and kXOR must have inputs of DataType::kBOOL. All other operations must have inputs of floating-point type, DataType::kINT8, DataType::kINT32, or DataType::kINT64. See IElementWiseLayer.
ExecutionContextAllocationStrategy
Different memory allocation behaviors for IExecutionContext. IExecutionContext requires a block of device memory for internal activation tensors during inference. The user can either let the execution context manage the memory in various ways or allocate the memory themselves. See ICudaEngine::createExecutionContext() and IExecutionContext::setDeviceMemory().
GatherMode
Control form of IGatherLayer. See IGatherLayer.
InterpolationMode
Enumerates various modes of interpolation.
LayerInformationFormat
The format in which the IEngineInspector prints the layer information. See IEngineInspector::getLayerInformation(), IEngineInspector::getEngineInformation().
LayerType
The type values of layer classes. See ILayer::getType().
MatrixOperation
Enumerates the operations that may be performed on a tensor by IMatrixMultiplyLayer before multiplication.
PaddingMode
Enumerates the modes of padding to perform in convolution, deconvolution and pooling layers. The padding mode takes precedence if setPaddingMode() and setPrePadding() are also used.

There are two padding styles, EXPLICIT and SAME, with each style having two variants. The EXPLICIT style determines whether the final sampling location is used. The SAME style determines whether the asymmetry in the padding is on the pre- or post-padding.

Shorthand:

    I = dimensions of input image.
    B = prePadding, before the image data.
    A = postPadding, after the image data.
    P = delta between input and output
    S = stride
    F = filter
    O = output
    D = dilation
    M = I + B + A ; The image data plus any padding
    DK = 1 + D * (F - 1)

Formulas for Convolution:

  • EXPLICIT_ROUND_DOWN:
      O = floor((M - DK) / S) + 1
  • EXPLICIT_ROUND_UP:
      O = ceil((M - DK) / S) + 1
  • SAME_UPPER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + DK - I;
      B = floor(P / 2)
      A = P - B
  • SAME_LOWER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + DK - I;
      A = floor(P / 2)
      B = P - A

Formulas for Deconvolution:

  • EXPLICIT_ROUND_DOWN and EXPLICIT_ROUND_UP:
      O = (I - 1) * S + DK - (B + A)
  • SAME_UPPER:
      O = min(I * S, (I - 1) * S + DK)
      P = max(DK - S, 0)
      B = floor(P / 2)
      A = P - B
  • SAME_LOWER:
      O = min(I * S, (I - 1) * S + DK)
      P = max(DK - S, 0)
      A = floor(P / 2)
      B = P - A

Formulas for Pooling:

  • EXPLICIT_ROUND_DOWN:
      O = floor((M - F) / S) + 1
  • EXPLICIT_ROUND_UP:
      O = ceil((M - F) / S) + 1
  • SAME_UPPER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + F - I;
      B = floor(P / 2)
      A = P - B
  • SAME_LOWER:
      O = ceil(I / S)
      P = floor((I - 1) / S) * S + F - I;
      A = floor(P / 2)
      B = P - A

Pooling Example 1:

    Given I = {6, 6}, B = {3, 3}, A = {2, 2}, S = {2, 2}, F = {3, 3}. What is O?
    (B, A can be calculated for SAME_UPPER and SAME_LOWER mode)

  • EXPLICIT_ROUND_DOWN:
      Computation:
      M = {6, 6} + {3, 3} + {2, 2} ==> {11, 11}
      O ==> floor((M - F) / S) + 1
        ==> floor(({11, 11} - {3, 3}) / {2, 2}) + {1, 1}
        ==> floor({8, 8} / {2, 2}) + {1, 1}
        ==> {5, 5}
  • EXPLICIT_ROUND_UP:
      Computation:
      M = {6, 6} + {3, 3} + {2, 2} ==> {11, 11}
      O ==> ceil((M - F) / S) + 1
        ==> ceil(({11, 11} - {3, 3}) / {2, 2}) + {1, 1}
        ==> ceil({8, 8} / {2, 2}) + {1, 1}
        ==> {5, 5}

    The sample points are {0, 2, 4, 6, 8} in each dimension.

  • SAME_UPPER:
      Computation:
      I = {6, 6}
      S = {2, 2}
      O = ceil(I / S) = {3, 3}
      P = floor((I - 1) / S) * S + F - I
        ==> floor(({6, 6} - {1, 1}) / {2, 2}) * {2, 2} + {3, 3} - {6, 6}
        ==> {4, 4} + {3, 3} - {6, 6}
        ==> {1, 1}
      B = floor({1, 1} / {2, 2})
        ==> {0, 0}
      A = {1, 1} - {0, 0}
        ==> {1, 1}
  • SAME_LOWER:
      Computation:
      I = {6, 6}
      S = {2, 2}
      O = ceil(I / S) = {3, 3}
      P = floor((I - 1) / S) * S + F - I
        ==> {1, 1}
      A = floor({1, 1} / {2, 2})
        ==> {0, 0}
      B = {1, 1} - {0, 0}
        ==> {1, 1}

    The sample points are {0, 2, 4} in each dimension.
    SAME_UPPER has {O0, O1, O2, pad} in output in each dimension.
    SAME_LOWER has {pad, O0, O1, O2} in output in each dimension.

Pooling Example 2:

    Given I = {6, 6}, B = {3, 3}, A = {3, 3}, S = {2, 2}, F = {3, 3}. What is O?
PoolingType
The type of pooling to perform in a pooling layer.
ProfilingVerbosity
List of verbosity levels of layer information exposed in NVTX annotations and in IEngineInspector. See IBuilderConfig::setProfilingVerbosity(), IBuilderConfig::getProfilingVerbosity(), IEngineInspector.
ReduceOperation
Enumerates the reduce operations that may be performed by a Reduce layer.

The table shows the result of reducing across an empty volume of a given type.

Operation | kFLOAT and kHALF  | kINT32    | kINT8
--------- | ----------------- | --------- | ---------
kSUM      | 0                 | 0         | 0
kPROD     | 1                 | 1         | 1
kMAX      | negative infinity | INT_MIN   | -128
kMIN      | positive infinity | INT_MAX   | 127
kAVG      | NaN               | 0         | -128
kNONE     | Undefined         | Undefined | Undefined

The current version of TensorRT usually performs reduction for kINT8 via kFLOAT or kHALF. The kINT8 values show the quantized representations of the floating-point values.

Note: kNONE is a reduce operation which does not modify the input tensor. This is applicable to Multi-Device mode only, as a reduce operation is not mandatory for certain collective operations. See INetworkDefinition::addDistCollective for more details.
ResizeCoordinateTransformation
The resize coordinate transformation function. See IResizeLayer::setCoordinateTransformation().
ResizeRoundMode
The rounding mode for nearest neighbor resize. See IResizeLayer::setNearestRounding().
ResizeSelector
The coordinate selector when resizing to a single-pixel output. See IResizeLayer::setSelectorForSinglePixel().
SampleMode
Controls how ISliceLayer and IGridSample handle out-of-bounds coordinates. See ISliceLayer and IGridSample.
ScaleMode
Controls how shift, scale and power are applied in a Scale layer. See IScaleLayer.
ScatterMode
Control form of IScatterLayer. See IScatterLayer.
TensorFormat
Format of the input/output tensors. This enum is used by both plugins and network I/O tensors. See IPluginV2::supportsFormat(), safe::ICudaEngine::getBindingFormat().

Many of the formats are vector-major or vector-minor. These formats specify a vector dimension and scalars per vector. For example, suppose that the tensor has dimensions [M,N,C,H,W], the vector dimension is C and there are V scalars per vector.

  • A vector-major format splits the vectorized dimension into two axes in the memory layout. The vectorized dimension is replaced by an axis of length ceil(C/V) and a new dimension of length V is appended. For the example tensor, the memory layout is equivalent to an array with dimensions [M][N][ceil(C/V)][H][W][V]. Tensor coordinate (m,n,c,h,w) maps to array location [m][n][c/V][h][w][c%V].
  • A vector-minor format moves the vectorized dimension to become the last axis in the memory layout. For the example tensor, the memory layout is equivalent to an array with dimensions [M][N][H][W][ceil(C/V)*V]. Tensor coordinate (m,n,c,h,w) maps to array location [m][n][h][w][c].

In interfaces that refer to “components per element”, that’s the value of V above.

For more information about data formats, see the topic “Data Format Description” located in the TensorRT Developer Guide: https://docs.nvidia.com/deeplearning/tensorrt/latest/inference-library/advanced.html#i-o-formats
TensorIOMode
Definition of tensor IO Mode.
TopKOperation
Enumerates the operations that may be performed by a TopK layer.
UnaryOperation
Enumerates the unary operations that may be performed by a Unary layer. Operation kNOT must have inputs of DataType::kBOOL. Operations kSIGN and kABS must have inputs of floating-point type, DataType::kINT8, DataType::kINT32 or DataType::kINT64. Operation kISINF must have inputs of floating-point type. All other operations must have inputs of floating-point type. See IUnaryLayer.
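
The pooling formulas listed under PaddingMode can be checked numerically. The following is a minimal, standalone sketch in plain Rust that evaluates them for one dimension. It does not call into trtx or TensorRT, and the function names (floor_div, ceil_div, pool_explicit, pool_same) are hypothetical helpers introduced only for illustration.

```rust
/// floor(a / b) for a positive divisor b.
fn floor_div(a: i64, b: i64) -> i64 {
    a.div_euclid(b)
}

/// ceil(a / b) for a positive divisor b.
fn ceil_div(a: i64, b: i64) -> i64 {
    -(-a).div_euclid(b)
}

/// Pooling output size O for the EXPLICIT modes, in one dimension.
/// i = input, b = pre-padding, a = post-padding, s = stride, f = filter.
fn pool_explicit(i: i64, b: i64, a: i64, s: i64, f: i64, round_up: bool) -> i64 {
    let m = i + b + a; // M = I + B + A
    if round_up {
        ceil_div(m - f, s) + 1 // EXPLICIT_ROUND_UP
    } else {
        floor_div(m - f, s) + 1 // EXPLICIT_ROUND_DOWN
    }
}

/// Pooling output size and padding for the SAME modes, in one dimension.
/// Returns (O, B, A). `upper` selects SAME_UPPER, otherwise SAME_LOWER.
fn pool_same(i: i64, s: i64, f: i64, upper: bool) -> (i64, i64, i64) {
    let o = ceil_div(i, s);
    let p = floor_div(i - 1, s) * s + f - i;
    let half = floor_div(p, 2);
    if upper {
        (o, half, p - half) // B = floor(P / 2), A = P - B
    } else {
        (o, p - half, half) // A = floor(P / 2), B = P - A
    }
}

fn main() {
    // Pooling Example 1: I = 6, B = 3, A = 2, S = 2, F = 3.
    println!("round down: O = {}", pool_explicit(6, 3, 2, 2, 3, false));
    println!("round up:   O = {}", pool_explicit(6, 3, 2, 2, 3, true));
    println!("SAME_UPPER: (O, B, A) = {:?}", pool_same(6, 2, 3, true));
    println!("SAME_LOWER: (O, B, A) = {:?}", pool_same(6, 2, 3, false));
}
```

Because every dimension is treated independently in the formulas, a per-dimension helper like this can be applied to each entry of I, S and F in turn.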

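The vector-major and vector-minor layouts described under TensorFormat can be made concrete with a small index calculation. The sketch below is plain Rust with no dependency on trtx. It assumes a row-major interpretation of the bracketed array dimensions, and the helper names (linear_offset, vector_major_offset, vector_minor_offset) are hypothetical, chosen for illustration only.

```rust
/// Row-major linear offset of index `idx` in an array with shape `dims`.
fn linear_offset(dims: &[usize], idx: &[usize]) -> usize {
    assert_eq!(dims.len(), idx.len());
    dims.iter().zip(idx).fold(0, |acc, (&d, &i)| {
        assert!(i < d, "index out of bounds");
        acc * d + i
    })
}

/// Offset of tensor coordinate (m, n, c, h, w) in a vector-major layout:
/// dimensions [M][N][ceil(C/V)][H][W][V], location [m][n][c/V][h][w][c%V].
fn vector_major_offset(shape: [usize; 5], coord: [usize; 5], v: usize) -> usize {
    let [dm, dn, dc, dh, dw] = shape;
    let [m, n, c, h, w] = coord;
    let vectors = (dc + v - 1) / v; // ceil(C / V)
    linear_offset(&[dm, dn, vectors, dh, dw, v], &[m, n, c / v, h, w, c % v])
}

/// Offset of tensor coordinate (m, n, c, h, w) in a vector-minor layout:
/// dimensions [M][N][H][W][ceil(C/V)*V], location [m][n][h][w][c].
fn vector_minor_offset(shape: [usize; 5], coord: [usize; 5], v: usize) -> usize {
    let [dm, dn, dc, dh, dw] = shape;
    let [m, n, c, h, w] = coord;
    let padded_c = (dc + v - 1) / v * v; // ceil(C / V) * V
    linear_offset(&[dm, dn, dh, dw, padded_c], &[m, n, h, w, c])
}

fn main() {
    // A [M,N,C,H,W] = [1,1,5,2,2] tensor with V = 4 scalars per vector.
    let shape = [1, 1, 5, 2, 2];
    let coord = [0, 0, 4, 1, 0];
    println!("vector-major offset: {}", vector_major_offset(shape, coord, 4));
    println!("vector-minor offset: {}", vector_minor_offset(shape, coord, 4));
}
```

In both layouts the channel axis is padded up to a multiple of V, so a tensor with C = 5 and V = 4 occupies storage for 8 channels.
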
Functions§

dynamically_load_tensorrt
dynamically_load_tensorrt_onnxparser

Type Aliases§

ResizeMode