pub struct IExecutionContext { /* private fields */ }
IExecutionContext
Context for executing inference using an engine, with functionally unsafe features.
Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously. If the engine supports dynamic shapes, each execution context in concurrent use must use a separate optimization profile.
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
Implementations
impl IExecutionContext
pub fn setDebugSync(self: Pin<&mut IExecutionContext>, sync: bool)
Set the debug sync flag.
If this flag is set to true, the engine will log the successful execution for each kernel during executeV2(). It has no effect when using enqueueV3().
See [getDebugSync()]
pub fn getDebugSync(self: &IExecutionContext) -> bool
Get the debug sync flag.
See [setDebugSync()]
pub unsafe fn setProfiler(self: Pin<&mut IExecutionContext>, profiler: *mut IProfiler)
Set the profiler.
See IProfiler, getProfiler()
pub fn getProfiler(self: &IExecutionContext) -> *mut IProfiler
Get the profiler.
See IProfiler, setProfiler()
pub fn getEngine(self: &IExecutionContext) -> &ICudaEngine
Get the associated engine.
See ICudaEngine
pub unsafe fn setName(self: Pin<&mut IExecutionContext>, name: *const c_char)
Set the name of the execution context.
This method copies the name string.
The string name must be null-terminated, and be at most 4096 bytes including the terminator.
See [getName()]
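The length and termination rules above can be checked before a raw pointer is handed to setName(). The following is a minimal sketch using only the standard library; `checked_name` and `MAX_NAME_BYTES` are hypothetical names introduced for illustration and are not part of the bindings.

```rust
use std::ffi::CString;

/// Maximum name length in bytes, including the NUL terminator (from the text above).
const MAX_NAME_BYTES: usize = 4096;

/// Hypothetical helper: builds a NUL-terminated name, or returns `None` when the
/// input contains an interior NUL byte or would exceed the 4096-byte limit.
fn checked_name(name: &str) -> Option<CString> {
    // CString::new appends the terminator and rejects interior NUL bytes.
    let c = CString::new(name).ok()?;
    if c.as_bytes_with_nul().len() <= MAX_NAME_BYTES {
        Some(c)
    } else {
        None
    }
}
```

The returned `CString` must stay alive while its `as_ptr()` value is in use. Because setName() copies the string, the `CString` can be dropped once the call returns.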
pub fn getName(self: &IExecutionContext) -> *const c_char
Return the name of the execution context.
See [setName()]
pub unsafe fn setDeviceMemory(self: Pin<&mut IExecutionContext>, memory: *mut c_void)
Set the device memory for use by this execution context.
The memory must be aligned with CUDA memory alignment property (using cudaGetDeviceProperties()), and its size must be large enough for performing inference with the given network inputs. getDeviceMemorySizeV2() and getDeviceMemorySizeForProfileV2() report upper bounds of the size. Setting memory to nullptr is acceptable if the reported size is 0. If using enqueueV3() to run the network, the memory is in use from the invocation of enqueueV3() until network execution is complete. If using executeV2(), it is in use until executeV2() returns. Releasing or otherwise using the memory for other purposes, including using it in another execution context running in parallel, during this time will result in undefined behavior.
Deprecated in TensorRT 10.1. Superseded by setDeviceMemoryV2().
Weight streaming related scratch memory will be allocated by TensorRT if the memory is set by this API. Please use setDeviceMemoryV2() instead.
See ICudaEngine::getDeviceMemorySizeV2()
See ICudaEngine::getDeviceMemorySizeForProfileV2()
See ExecutionContextAllocationStrategy
See ICudaEngine::createExecutionContext()
pub unsafe fn setDeviceMemoryV2(self: Pin<&mut IExecutionContext>, memory: *mut c_void, size: i64)
Set the device memory and its corresponding size for use by this execution context.
The memory must be aligned with CUDA memory alignment property (using cudaGetDeviceProperties()), and its size must be large enough for performing inference with the given network inputs. getDeviceMemorySizeV2() and getDeviceMemorySizeForProfileV2() report upper bounds of the size. Setting memory to nullptr is acceptable if the reported size is 0. If using enqueueV3() to run the network, the memory is in use from the invocation of enqueueV3() until network execution is complete. If using executeV2(), it is in use until executeV2() returns. Releasing or otherwise using the memory for other purposes, including using it in another execution context running in parallel, during this time will result in undefined behavior.
See ICudaEngine::getDeviceMemorySizeV2()
See ICudaEngine::getDeviceMemorySizeForProfileV2()
See ExecutionContextAllocationStrategy
See ICudaEngine::createExecutionContext()
pub unsafe fn getTensorStrides(self: &IExecutionContext, tensorName: *const c_char) -> Dims64
Return the strides of the buffer for the given tensor name.
The strides are in units of elements, not components or bytes. For example, for TensorFormat::kHWC8, a stride of one spans 8 scalars.
Note that strides can be different for different execution contexts with dynamic shapes.
If the provided name does not map to an input or output tensor, or there are dynamic dimensions that have not been set yet, returns Dims{-1, {}}.
- tensorName: The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
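Since strides are reported in elements, a caller that needs byte offsets has to scale them. The sketch below assumes the caller already knows how many scalars one element holds (8 in the kHWC8 example above) and the scalar size in bytes; `stride_in_bytes` is a hypothetical helper, not a binding function.

```rust
/// Hypothetical helper: converts a stride expressed in elements into bytes.
/// `scalars_per_element` is 8 for the kHWC8 example; `bytes_per_scalar`
/// depends on the tensor's data type and is assumed to be known by the caller.
/// Returns `None` for negative inputs or on arithmetic overflow.
fn stride_in_bytes(
    stride_elems: i64,
    scalars_per_element: i64,
    bytes_per_scalar: i64,
) -> Option<i64> {
    if stride_elems < 0 || scalars_per_element <= 0 || bytes_per_scalar <= 0 {
        return None;
    }
    stride_elems
        .checked_mul(scalars_per_element)?
        .checked_mul(bytes_per_scalar)
}
```

For a kHWC8 tensor of 2-byte scalars, `stride_in_bytes(1, 8, 2)` yields 16 bytes per unit of stride.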
pub fn getOptimizationProfile(self: &IExecutionContext) -> i32
Get the index of the currently selected optimization profile.
If the profile index has not been set yet (implicitly to 0 if no other execution context has been set to profile 0, or explicitly for all subsequent contexts), an invalid value of -1 will be returned and all calls to enqueueV3()/executeV2() will fail until a valid profile index has been set. This behavior is deprecated in TensorRT 8.6; all profiles will default to optimization profile 0 and -1 will no longer be returned.
pub unsafe fn setInputShape(self: Pin<&mut IExecutionContext>, tensorName: *const c_char, dims: &Dims64) -> bool
Set shape of given input.
- tensorName: The name of an input tensor.
- dims: The shape of an input tensor.
True on success, false if the provided name does not map to an input tensor, or if some other error occurred.
Each dimension must agree with the network dimension unless the latter was -1.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
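The agreement rule for dimensions can be illustrated with plain slices. This is a sketch only: `shape_agrees` is a hypothetical pre-check, and it does not consult the optimization profile's min/max ranges, which may impose further limits.

```rust
/// Hypothetical pre-check for the rule above: the requested shape must have the
/// same rank as the network shape, no negative extents, and each extent must
/// equal the network's extent unless the network declares -1 for it.
fn shape_agrees(network: &[i64], requested: &[i64]) -> bool {
    network.len() == requested.len()
        && network
            .iter()
            .zip(requested)
            .all(|(&n, &r)| r >= 0 && (n == -1 || n == r))
}
```

For a network input declared as `[-1, 3, 224, 224]`, a request of `[8, 3, 224, 224]` passes, while `[8, 1, 224, 224]` does not.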
pub unsafe fn getTensorShape(self: &IExecutionContext, tensorName: *const c_char) -> Dims64
Return the shape of the given input or output.
- tensorName: The name of an input or output tensor.
Return Dims{-1, {}} if the provided name does not map to an input or output tensor. Otherwise return the shape of the input or output tensor.
A dimension in an input tensor will have a -1 wildcard value if all the following are true:
- setInputShape() has not yet been called for this tensor
- The dimension is a runtime dimension that is not implicitly constrained to be a single value.
A dimension in an output tensor will have a -1 wildcard value if the dimension depends on values of execution tensors OR if all the following are true:
- It is a runtime dimension.
- setInputShape() has NOT been called for some input tensor(s) with a runtime shape.
- setTensorAddress() has NOT been called for some input tensor(s) with isShapeInferenceIO() = true.
An output tensor may also have -1 wildcard dimensions if its shape depends on values of tensors supplied to enqueueV3().
If the request is for the shape of an output tensor with runtime dimensions, all input tensors with isShapeInferenceIO() = true should have their value already set, since these values might be needed to compute the output shape.
Examples of an input dimension that is implicitly constrained to a single value:
- The optimization profile specifies equal min and max values.
- The dimension is named and only one value meets the optimization profile requirements for dimensions with that name.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
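The sentinel and wildcard cases described above are easy to mishandle when sizing buffers. The sketch below uses a hypothetical `Shape` struct as a stand-in for `Dims64`; it assumes a dimension count plus a fixed array of up to 8 extents, which may not match the field names or layout exposed by the bindings.

```rust
/// Hypothetical stand-in for `Dims64` (assumed layout: a count plus 8 extents).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Shape {
    nb_dims: i32,
    d: [i64; 8],
}

impl Shape {
    /// True for the `Dims{-1, {}}` value returned for an unknown tensor name.
    fn is_invalid(&self) -> bool {
        self.nb_dims < 0
    }

    /// The extents that are actually in use.
    fn extents(&self) -> &[i64] {
        let n = self.nb_dims.clamp(0, 8) as usize;
        &self.d[..n]
    }

    /// True if any dimension still carries the -1 wildcard.
    fn has_wildcard(&self) -> bool {
        self.extents().iter().any(|&e| e == -1)
    }

    /// Element count, or `None` if the shape is invalid, unresolved, or overflows.
    fn volume(&self) -> Option<i64> {
        if self.is_invalid() || self.has_wildcard() {
            return None;
        }
        self.extents()
            .iter()
            .try_fold(1i64, |acc, &e| acc.checked_mul(e))
    }
}
```

Returning `None` from `volume` forces the caller to handle unresolved shapes explicitly instead of multiplying a -1 into a buffer size.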
pub fn allInputDimensionsSpecified(self: &IExecutionContext) -> bool
Whether all dynamic dimensions of input tensors have been specified.
True if all dynamic dimensions of input tensors have been specified by calling setInputShape().
Trivially true if network has no dynamically shaped input tensors.
Does not work with name-based interfaces, e.g. IExecutionContext::setInputShape(). Use IExecutionContext::inferShapes() instead.
pub unsafe fn setErrorRecorder(self: Pin<&mut IExecutionContext>, recorder: *mut IErrorRecorder)
Set the ErrorRecorder for this interface.
Assigns the ErrorRecorder to this interface. The ErrorRecorder will track all errors during execution. This function will call incRefCount of the registered ErrorRecorder at least once. Setting recorder to nullptr unregisters the recorder with the interface, resulting in a call to decRefCount if a recorder has been registered.
If an error recorder is not set, messages will be sent to the global log stream.
- recorder: The error recorder to register with this interface.
See [getErrorRecorder()]
pub fn getErrorRecorder(self: &IExecutionContext) -> *mut IErrorRecorder
Get the ErrorRecorder assigned to this interface.
Retrieves the assigned error recorder object for the given class. A nullptr will be returned if an error handler has not been set.
A pointer to the IErrorRecorder object that has been registered.
See [setErrorRecorder()]
pub unsafe fn setOptimizationProfileAsync(self: Pin<&mut IExecutionContext>, profileIndex: i32, stream: *mut CUstream_st) -> bool
Select an optimization profile for the current context with async semantics.
- profileIndex: Index of the profile. The value must lie between 0 and getEngine().getNbOptimizationProfiles() - 1.
- stream: A CUDA stream on which the cudaMemcpyAsyncs may be enqueued.
When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.
The selected profile will be used in subsequent calls to executeV2()/enqueueV3(). If the associated CUDA engine has inputs with dynamic shapes, the optimization profile must be set with its corresponding profileIndex before calling execute or enqueue. The newly created execution context will be assigned optimization profile 0.
If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used.
setOptimizationProfileAsync() must be called before calling setInputShape() for all dynamic input tensors or input shape tensors, which in turn must be called before executeV2()/enqueueV3().
This function will trigger layer resource updates on the next call of executeV2()/enqueueV3(), possibly resulting in performance bottlenecks.
Not synchronizing the stream used at enqueue with the stream used to set optimization profile asynchronously using this API will result in undefined behavior.
True if the call succeeded, else false (e.g. input out of range).
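The index bound stated for profileIndex can be checked on the caller's side before making the unsafe call. A minimal sketch, where `profile_index_in_range` is a hypothetical helper and `nb_profiles` is assumed to hold the value reported by the engine:

```rust
/// Hypothetical pre-check: true when `index` lies in 0..=nb_profiles - 1.
/// `nb_profiles` is assumed to come from the engine's profile count.
fn profile_index_in_range(index: i32, nb_profiles: i32) -> bool {
    (0..nb_profiles).contains(&index)
}
```

This only mirrors the range rule; the call can still return false for other reasons.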
pub fn setEnqueueEmitsProfile(self: Pin<&mut IExecutionContext>, enqueueEmitsProfile: bool)
Set whether enqueue emits layer timing to the profiler.
If set to true (default), enqueue is synchronous and does layer timing profiling implicitly if there is a profiler attached. If set to false, enqueue will be asynchronous if there is a profiler attached. An extra method reportToProfiler() needs to be called to obtain the profiling data and report to the profiler attached.
See IExecutionContext::getEnqueueEmitsProfile()
See IExecutionContext::reportToProfiler()
pub fn getEnqueueEmitsProfile(self: &IExecutionContext) -> bool
Get the enqueueEmitsProfile state.
The enqueueEmitsProfile state.
pub fn reportToProfiler(self: &IExecutionContext) -> bool
Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch.
If IExecutionContext::getEnqueueEmitsProfile() returns true, the enqueue function will calculate layer timing implicitly if a profiler is provided. This function returns true and does nothing.
If IExecutionContext::getEnqueueEmitsProfile() returns false, the enqueue function will record the CUDA event timers if a profiler is provided. But it will not perform the layer timing calculation. IExecutionContext::reportToProfiler() needs to be called explicitly to calculate layer timing for the previous inference launch.
In the CUDA graph launch scenario, it will record the same set of CUDA events as in regular enqueue functions if the graph is captured from an IExecutionContext with profiler enabled. This function needs to be called after graph launch to report the layer timing info to the profiler.
Profiling CUDA graphs is only available from CUDA 11.1 onwards. reportToProfiler() uses the stream of the previous enqueue call, so the stream must still be live; otherwise behavior is undefined.
true if the call succeeded, else false (e.g. profiler not provided, in CUDA graph capture mode, etc.)
See IExecutionContext::setEnqueueEmitsProfile()
See IExecutionContext::getEnqueueEmitsProfile()
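The interaction between the enqueueEmitsProfile flag and reportToProfiler() described above reduces to a small decision table. The sketch below models it with a hypothetical `TimingSource` enum and `timing_source` function; neither exists in the bindings.

```rust
/// Hypothetical summary of where layer timing gets computed.
#[derive(Debug, PartialEq)]
enum TimingSource {
    /// No profiler attached: nothing is reported.
    NoProfiler,
    /// Flag is true: enqueue computes layer timing implicitly.
    Enqueue,
    /// Flag is false: enqueue only records events, so reportToProfiler()
    /// has to be called explicitly after the launch.
    ExplicitReport,
}

fn timing_source(profiler_attached: bool, enqueue_emits_profile: bool) -> TimingSource {
    match (profiler_attached, enqueue_emits_profile) {
        (false, _) => TimingSource::NoProfiler,
        (true, true) => TimingSource::Enqueue,
        (true, false) => TimingSource::ExplicitReport,
    }
}
```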
pub unsafe fn setTensorAddress(self: Pin<&mut IExecutionContext>, tensorName: *const c_char, data: *mut c_void) -> bool
Set memory address for given input or output tensor.
- tensorName: The name of an input or output tensor.
- data: The pointer (void*) to the data owned by the user.
True on success, false if error occurred.
An address defaults to nullptr. Pass data=nullptr to reset to the default state.
Return false if the provided name does not map to an input or output tensor.
If an input pointer has type (void const*), use setInputTensorAddress() instead.
Before calling enqueueV3(), each input must have a non-null address and each output must have a non-null address or an IOutputAllocator to set it later.
If the TensorLocation of the tensor is kHOST:
- The pointer must point to a host buffer of sufficient size.
- Data representing shape values is not copied until enqueueV3 is invoked.
If the TensorLocation of the tensor is kDEVICE:
- The pointer must point to a device buffer of sufficient size and alignment, or
- Be nullptr if the tensor is an output tensor that will be allocated by IOutputAllocator.
If getTensorShape(name) reports a -1 for any dimension of an output after all input shapes have been set, use setOutputAllocator() to associate an IOutputAllocator to which the dimensions will be reported when known.
Calling both setTensorAddress and setOutputAllocator() for the same output is allowed, and can be useful for preallocating memory, and then reallocating if it’s not big enough.
The pointer must have at least 256-byte alignment.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [setInputTensorAddress()], setOutputTensorAddress(), getTensorShape(), setOutputAllocator(), IOutputAllocator
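The 256-byte alignment requirement can be verified, and buffer sizes padded to match, with a few lines of pointer arithmetic. This is a sketch with hypothetical helper names; it does not allocate device memory and says nothing about whether a buffer is large enough.

```rust
use std::ffi::c_void;

/// Minimum pointer alignment stated above for tensor addresses.
const TENSOR_ALIGNMENT: usize = 256;

/// Hypothetical check: true if `ptr` is a multiple of 256 bytes.
/// Note that a null pointer also passes; per the text above, null resets
/// the address rather than naming a buffer.
fn is_tensor_aligned(ptr: *const c_void) -> bool {
    (ptr as usize) % TENSOR_ALIGNMENT == 0
}

/// Hypothetical helper: rounds a byte size up to the next multiple of 256,
/// returning `None` if the addition would overflow.
fn round_up_to_alignment(size: usize) -> Option<usize> {
    size.checked_add(TENSOR_ALIGNMENT - 1)
        .map(|s| s / TENSOR_ALIGNMENT * TENSOR_ALIGNMENT)
}
```

Padding sizes this way is one option for carving several tensors out of a single allocation while keeping each sub-buffer aligned.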
pub unsafe fn getTensorAddress(self: &IExecutionContext, tensorName: *const c_char) -> *const c_void
Get memory address bound to given input or output tensor, or nullptr if the provided name does not map to an input or output tensor.
- tensorName: The name of an input or output tensor.
Use method getOutputTensorAddress() if a non-const pointer for an output tensor is required.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [getOutputTensorAddress()]
pub unsafe fn setOutputTensorAddress(self: Pin<&mut IExecutionContext>, tensorName: *const c_char, data: *mut c_void) -> bool
Set the memory address for a given output tensor.
- tensorName: The name of an output tensor.
- data: The pointer to the buffer to which to write the output.
True on success, false if the provided name does not map to an output tensor, does not meet alignment requirements, or some other error occurred.
Output addresses can also be set using method setTensorAddress. This method is provided for applications which prefer to use different methods for setting input and output tensors.
See setTensorAddress() for alignment and data type constraints.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [setTensorAddress()]
pub unsafe fn setInputTensorAddress(self: Pin<&mut IExecutionContext>, tensorName: *const c_char, data: *const c_void) -> bool
Set memory address for given input.
- tensorName: The name of an input tensor.
- data: The pointer (void const*) to the const data owned by the user.
True on success, false if the provided name does not map to an input tensor, does not meet alignment requirements, or some other error occurred.
Input addresses can also be set using method setTensorAddress, which requires a (void*).
See description of method setTensorAddress() for alignment and data type constraints.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [setTensorAddress()]
pub unsafe fn getOutputTensorAddress(self: &IExecutionContext, tensorName: *const c_char) -> *mut c_void
Get memory address for given output.
- tensorName: The name of an output tensor.
Raw output data pointer (void*) for given output tensor, or nullptr if the provided name does not map to an output tensor.
If only a (void const*) pointer is needed, an alternative is to call method getTensorAddress().
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [getTensorAddress()]
pub fn updateDeviceMemorySizeForShapes(self: Pin<&mut IExecutionContext>) -> usize
Recompute the internal activation buffer sizes based on the current input shapes, and return the total amount of memory required.
Users can allocate the device memory based on the size returned and provide the memory to TensorRT with IExecutionContext::setDeviceMemory(). All input shapes and the optimization profile to use must be specified before calling this function; otherwise the partition will be invalidated.
Total amount of memory required on success, 0 if error occurred.
pub unsafe fn setInputConsumedEvent(self: Pin<&mut IExecutionContext>, event: *mut CUevent_st) -> bool
Mark input as consumed.
- event: The CUDA event that is triggered after all input tensors have been consumed.
The set event must be valid during the inference.
True on success, false if error occurred.
Passing event==nullptr removes whatever event was set, if any.
pub fn getInputConsumedEvent(self: &IExecutionContext) -> *mut CUevent_st
The event associated with consuming the input.
The CUDA event. Nullptr will be returned if the event is not set yet.
pub unsafe fn setOutputAllocator(self: Pin<&mut IExecutionContext>, tensorName: *const c_char, outputAllocator: *mut IOutputAllocator) -> bool
Set output allocator to use for output tensor of given name. Pass nullptr to outputAllocator to unset. The allocator is called by enqueueV3().
- tensorName: The name of an output tensor.
- outputAllocator: IOutputAllocator for the tensors.
True on success, false if the provided name does not map to an output, or if some other error occurred.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See [enqueueV3()], IOutputAllocator
pub unsafe fn getOutputAllocator(self: &IExecutionContext, tensorName: *const c_char) -> *mut IOutputAllocator
Get output allocator associated with output tensor of given name, or nullptr if the provided name does not map to an output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See IOutputAllocator
pub unsafe fn getMaxOutputSize(self: &IExecutionContext, tensorName: *const c_char) -> i64
Get upper bound on an output tensor’s size, in bytes, based on the current optimization profile and input dimensions.
If the profile or input dimensions are not yet set, or the provided name does not map to an output, returns -1.
- tensorName: The name of an output tensor.
Upper bound in bytes.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
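Because the -1 sentinel shares a return type with valid sizes, it is worth converting the raw value before using it as an allocation size. A minimal sketch, with `max_output_bytes` as a hypothetical wrapper around the raw `i64`:

```rust
use std::convert::TryFrom;

/// Hypothetical wrapper: maps the raw return value to `Some(bytes)`, or to
/// `None` for the -1 sentinel (or any other value that does not fit a `usize`).
fn max_output_bytes(raw: i64) -> Option<usize> {
    usize::try_from(raw).ok()
}
```

Treating `None` as "not yet known" keeps a negative value from ever reaching an allocator.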
pub unsafe fn setTemporaryStorageAllocator(self: Pin<&mut IExecutionContext>, allocator: *mut IGpuAllocator) -> bool
Specify allocator to use for internal temporary storage.
This allocator is used only by enqueueV3() for temporary storage whose size cannot be predicted ahead of enqueueV3(). It is not used for output tensors, because memory for those is allocated by the allocator set by setOutputAllocator(). All memory allocated is freed by the time enqueueV3() returns.
- allocator: Pointer to the allocator to use. Pass nullptr to revert to using TensorRT’s default allocator.
True on success, false if error occurred.
See [enqueueV3()], setOutputAllocator()
pub fn getTemporaryStorageAllocator(self: &IExecutionContext) -> *mut IGpuAllocator
Get allocator set by setTemporaryStorageAllocator.
Returns a nullptr if a nullptr was passed with setTemporaryStorageAllocator().
pub unsafe fn enqueueV3(self: Pin<&mut IExecutionContext>, stream: *mut CUstream_st) -> bool
Enqueue inference on a stream.
- stream: A CUDA stream on which the inference kernels will be enqueued.
True if the kernels were enqueued successfully, false otherwise.
Modifying or releasing memory that has been registered for the tensors before stream synchronization, or before the event passed to setInputConsumedEvent() has been triggered, results in undefined behavior. Input tensors can be released after that event has been triggered, whereas output tensors require stream synchronization.
Using the default stream may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronization. Please use a non-default stream instead.
If the Engine is streaming weights, enqueueV3 will become synchronous, and the graph will not be capturable.
pub fn setPersistentCacheLimit(self: Pin<&mut IExecutionContext>, size: usize)
Set the maximum size for persistent cache usage.
This function sets the maximum persistent L2 cache that this execution context may use for activation caching. Activation caching is not supported on all architectures; see “How TensorRT uses Memory” in the developer guide for details.
- size: The size of the persistent cache limit in bytes. The default is 0 bytes.
See [getPersistentCacheLimit]
pub fn getPersistentCacheLimit(self: &IExecutionContext) -> usize
Get the maximum size for persistent cache usage.
Returns the size of the persistent cache limit.
See [setPersistentCacheLimit]
pub fn setNvtxVerbosity(self: Pin<&mut IExecutionContext>, verbosity: ProfilingVerbosity) -> bool
Set the verbosity of the NVTX markers in the execution context.
Building with kDETAILED verbosity will generally increase latency in enqueueV3(). Call this method to select NVTX verbosity in this execution context at runtime.
The default is the verbosity with which the engine was built, and the verbosity may not be raised above that level.
This function does not affect how IEngineInspector interacts with the engine.
- verbosity: The verbosity of the NVTX markers.
True if the NVTX verbosity is set successfully. False if the provided verbosity level is higher than the profiling verbosity of the corresponding engine.
See [getNvtxVerbosity()]
See ICudaEngine::getProfilingVerbosity()
pub fn getNvtxVerbosity(self: &IExecutionContext) -> ProfilingVerbosity
Get the NVTX verbosity of the execution context.
The current NVTX verbosity of the execution context.
See [setNvtxVerbosity()]
pub unsafe fn setAuxStreams(self: Pin<&mut IExecutionContext>, auxStreams: *mut *mut CUstream_st, nbStreams: i32)
Set the auxiliary streams that TensorRT should launch kernels on in the next enqueueV3() call.
If set, TensorRT will launch the kernels that are supposed to run on the auxiliary streams using the streams provided by the user with this API. If this API is not called before the enqueueV3() call, then TensorRT will use the auxiliary streams created by TensorRT internally.
TensorRT will always insert event synchronizations between the main stream provided via enqueueV3() call and the auxiliary streams:
- At the beginning of the enqueueV3() call, TensorRT will make sure that all the auxiliary streams wait on the activities on the main stream.
- At the end of the enqueueV3() call, TensorRT will make sure that the main stream waits on the activities on all the auxiliary streams.

- auxStreams: The pointer to an array of cudaStream_t with the array length equal to nbStreams.
- nbStreams: The number of auxiliary streams provided. If nbStreams is greater than engine->getNbAuxStreams(), then only the first engine->getNbAuxStreams() streams will be used. If nbStreams is less than engine->getNbAuxStreams(), such as setting nbStreams to 0, then TensorRT will use the provided streams for the first nbStreams auxiliary streams, and will create additional streams internally for the rest of the auxiliary streams.
The provided auxiliary streams must not be the default stream and must all be different to avoid deadlocks.
See [enqueueV3()], IBuilderConfig::setMaxAuxStreams(), ICudaEngine::getNbAuxStreams()
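The nbStreams rules and the distinctness requirement can be expressed as two small checks. The sketch below uses opaque `*mut c_void` values as stand-ins for stream handles; `split_aux_streams` and `streams_are_usable` are hypothetical helpers, and the second one only detects null (the legacy default stream) and duplicates, not other special stream handles.

```rust
use std::collections::HashSet;
use std::ffi::c_void;

/// Hypothetical helper: given the number of streams the caller provides and the
/// number the engine needs, returns (streams taken from the caller,
/// streams TensorRT would create internally), following the rules above.
fn split_aux_streams(nb_provided: i32, nb_engine: i32) -> (i32, i32) {
    let provided = nb_provided.max(0);
    let engine = nb_engine.max(0);
    let used = provided.min(engine);
    (used, engine - used)
}

/// Hypothetical check: every handle is non-null and no handle appears twice.
fn streams_are_usable(streams: &[*mut c_void]) -> bool {
    let mut seen = HashSet::new();
    streams
        .iter()
        .all(|&s| !s.is_null() && seen.insert(s as usize))
}
```

With an engine that needs 3 auxiliary streams and a caller that provides 1, `split_aux_streams(1, 3)` returns `(1, 2)`.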
pub unsafe fn setDebugListener(self: Pin<&mut IExecutionContext>, listener: *mut IDebugListener) -> bool
Set DebugListener for this execution context.
- listener: DebugListener for this execution context.
True on success, false on failure.
pub fn getDebugListener(self: Pin<&mut IExecutionContext>) -> *mut IDebugListener
Get the DebugListener of this execution context.
DebugListener of this execution context.
pub unsafe fn setTensorDebugState(self: Pin<&mut IExecutionContext>, name: *const c_char, flag: bool) -> bool
Set debug state of tensor given the tensor name.
Turn the debug state of a tensor on or off. A tensor with the parameter tensor name must exist in the network, and the tensor must have been marked as a debug tensor during build time. Otherwise, an error is thrown.
- name: Name of target tensor.
- flag: True if turning on debug state, false if turning off debug state of tensor. The default is off.
True if successful, false otherwise.
pub unsafe fn getDebugState(self: &IExecutionContext, name: *const c_char) -> bool
Get the debug state.
- name: Name of target tensor.
true if there is a debug tensor with the given name and it has debug state turned on.
pub fn getRuntimeConfig(self: &IExecutionContext) -> *mut IRuntimeConfig
Get the runtime config object used during execution context creation.
The runtime config object.
pub fn setAllTensorsDebugState(self: Pin<&mut IExecutionContext>, flag: bool) -> bool
Turn the debug state of all debug tensors on or off.
- flag: True if turning on debug state, false if turning off debug state.
true if successful, false otherwise.
The default is off.
pub fn setUnfusedTensorsDebugState(self: Pin<&mut IExecutionContext>, flag: bool) -> bool
Turn the debug state of unfused tensors on or off.
The default is off.
- flag: True if turning on debug state, false if turning off debug state.
true if successful, false otherwise.
pub fn getUnfusedTensorsDebugState(self: &IExecutionContext) -> bool
Get the debug state of unfused tensors.
true if unfused tensors debug state is on. False if unfused tensors debug state is off.
pub unsafe fn isStreamCapturable(self: &IExecutionContext, stream: *mut CUstream_st) -> bool
Check if a subsequent call to enqueueV3 is graph-capturable on the provided stream.
- stream: The stream to check.
true if a subsequent call to enqueueV3 is graph-capturable on the provided stream. Reasons why graph capture may fail include:
- blocking runtime allocation due to large dynamically sized tensors that cannot be statically allocated,
- dynamically shaped tensors whose size depends on the tensor contents, like the output of an INonZeroLayer,
- conditional control flow depending on the contents of on-device tensors, like an ITripLimitLayer whose input tensor resides on the device,
- engines that have been built for weight streaming.
If this API returns false, enqueueV3 may not be called on a capturable stream (i.e. users may not call cudaStreamBeginCapture before starting inference). Otherwise, inference will fail with an error message.
pub unsafe fn setCommunicator(self: Pin<&mut IExecutionContext>, communicator: *mut c_void) -> bool
Set the NCCL communicator for the execution context.
- communicator: A pointer to the communicator that is used by the execution context. The communicator is expected to be already initialized with ncclCommInitRank and castable to ncclComm_t.
The communicator must be uniform across all multi-device instances or undefined behavior occurs.
The lifetime of the communicator must be longer than the execution contexts it is attached to.
True if the communicator was set successfully, false otherwise.