pub struct ICudaEngine { /* private fields */ }
ICudaEngine
An engine for executing inference on a built network, with functionally unsafe features.
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
Implementations
impl ICudaEngine
pub unsafe fn createExecutionContext1(
    self: Pin<&mut Self>,
    runtimeConfig: *mut IRuntimeConfig,
) -> *mut IExecutionContext
Create an execution context with a TensorRT JIT runtime config.
runtimeConfig: The runtime config for TensorRT JIT.
See IRuntimeConfig
pub unsafe fn getTensorBytesPerComponent1(
    &self,
    tensorName: *const c_char,
    profileIndex: i32,
) -> i32
Return the number of bytes per component of an element for the given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The vector component size is returned if getTensorVectorizedDim(tensorName, profileIndex) != -1.
tensorName: The name of an input or output tensor.
profileIndex: The profile index to query.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See getTensorVectorizedDim(tensorName, profileIndex)
pub unsafe fn getTensorComponentsPerElement1(
    &self,
    tensorName: *const c_char,
    profileIndex: i32,
) -> i32
Return the number of components included in one element for the given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The number of elements in the vectors is returned if getTensorVectorizedDim(tensorName, profileIndex) != -1.
tensorName: The name of an input or output tensor.
profileIndex: The profile index to query.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See getTensorVectorizedDim(tensorName, profileIndex)
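The vectorization queries above are typically combined when sizing a buffer. The following is a minimal sketch, not part of the binding: `padded_component_count` is a hypothetical helper, and it assumes that the extent of the vectorized dimension is padded up to a multiple of the components per element, which is how vectorized formats such as kCHW2 are usually laid out. The shape is passed as a plain slice rather than a `Dims64`.

```rust
/// Hypothetical helper: number of scalar components a buffer must hold.
///
/// `vectorized_dim` and `components_per_element` are the values returned by
/// the vectorization queries (-1 when the tensor is not vectorized).
/// Assumption: the vectorized axis is padded up to a multiple of
/// `components_per_element`.
/// Returns `None` for dynamic (-1) extents or on arithmetic overflow.
fn padded_component_count(
    shape: &[i64],
    vectorized_dim: i32,
    components_per_element: i32,
) -> Option<i64> {
    let mut count: i64 = 1;
    for (axis, &extent) in shape.iter().enumerate() {
        if extent < 0 {
            // A dynamic dimension has no fixed size yet.
            return None;
        }
        let is_vectorized_axis = vectorized_dim >= 0
            && axis == vectorized_dim as usize
            && components_per_element > 1;
        let padded = if is_vectorized_axis {
            let c = i64::from(components_per_element);
            // Round the extent up to the next multiple of c.
            extent.checked_add(c - 1)? / c * c
        } else {
            extent
        };
        count = count.checked_mul(padded)?;
    }
    Some(count)
}
```

Multiplying the result by the size of one scalar in bytes gives a buffer size estimate. For a non-vectorized tensor the bytes-per-component query returns -1, so the scalar size has to come from the tensor's data type instead.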
pub unsafe fn getTensorFormat1(
    &self,
    tensorName: *const c_char,
    profileIndex: i32,
) -> TensorFormat
Return the tensor format for the given profile, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
tensorName: The name of an input or output tensor.
profileIndex: The profile index to query the format for.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
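The name constraint stated for these methods (null-terminated, at most 4096 bytes including the terminator) can be checked on the Rust side before a pointer crosses the FFI boundary. This is a minimal sketch using only the standard library; `tensor_name` and `TensorNameError` are hypothetical names, not part of the binding.

```rust
use std::ffi::CString;

/// Maximum tensor-name length in bytes, including the NUL terminator.
const MAX_TENSOR_NAME_BYTES: usize = 4096;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum TensorNameError {
    /// The name contains a NUL byte before its end.
    InteriorNul,
    /// The name, including its terminator, exceeds the limit (length given).
    TooLong(usize),
}

/// Build a NUL-terminated name that satisfies the documented length limit.
fn tensor_name(name: &str) -> Result<CString, TensorNameError> {
    let c_name = CString::new(name).map_err(|_| TensorNameError::InteriorNul)?;
    let len = c_name.as_bytes_with_nul().len();
    if len > MAX_TENSOR_NAME_BYTES {
        return Err(TensorNameError::TooLong(len));
    }
    Ok(c_name)
}
```

The returned `CString` must stay alive for as long as the pointer obtained from `as_ptr()` is in use, so bind it to a variable rather than calling `as_ptr()` on a temporary.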
pub unsafe fn getTensorFormatDesc1(
    &self,
    tensorName: *const c_char,
    profileIndex: i32,
) -> *const c_char
Return the human-readable description of the tensor format for the given profile, or an empty string if the provided name does not map to an input or output tensor.
The description includes the order, vectorization, data type, and strides. Examples:
- kCHW + FP32: “Row-major linear FP32 format”
- kCHW2 + FP16: “Two-wide channel vectorized row-major FP16 format”
- kHWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
tensorName: The name of an input or output tensor.
profileIndex: The profile index to query the format for.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
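Methods on this page that return `*const c_char` hand back a borrowed C string. A minimal sketch of turning such a pointer into an owned Rust `String` follows; `c_str_to_owned` is a hypothetical helper, and treating an empty string as "no result" is a choice made for this sketch, mirroring the empty-string return described above.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: copy a C string returned by the engine into a `String`.
///
/// Returns `None` for a null pointer or an empty string.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string that stays
/// alive for the duration of the call.
unsafe fn c_str_to_owned(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Copy the bytes out so the result does not borrow engine-owned memory.
    let text = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    if text.is_empty() {
        None
    } else {
        Some(text)
    }
}
```

Copying immediately avoids having to reason about how long the engine keeps the original string valid.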
pub unsafe fn getTensorVectorizedDim1(
    &self,
    tensorName: *const c_char,
    profileIndex: i32,
) -> i32
Return the index of the dimension along which the buffer is vectorized for the given profile, or -1 if the provided name does not map to an input or output tensor.
Specifically, -1 is returned if the number of scalars per vector is 1.
tensorName: The name of an input.
profileIndex: The profile index to query the format for.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
impl ICudaEngine
pub unsafe fn getTensorShape(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> Dims64
Get the shape of an input or output tensor.
tensorName: The name of an input or output tensor.
Returns the shape of the tensor, with -1 in place of each dynamic runtime dimension, or Dims{-1, {}} if the provided name does not map to an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
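The returned shape encodes three cases: an unknown tensor (Dims{-1, {}}), a fully static shape, and a shape with dynamic dimensions marked by -1. The sketch below separates them. `DimsSketch` is a stand-in defined here for illustration, assuming a dimension count plus a fixed array of eight 64-bit extents; the field names of the real `Dims64` binding may differ.

```rust
/// Assumed maximum rank for the stand-in type below.
const MAX_DIMS: usize = 8;

/// Stand-in for `Dims64` (an assumption; not the binding's actual type).
#[derive(Clone, Copy, Debug)]
struct DimsSketch {
    nb_dims: i32,
    d: [i64; MAX_DIMS],
}

#[derive(Debug, PartialEq)]
enum ShapeKind {
    /// The name did not map to an input or output tensor.
    Unknown,
    /// Every extent is known at build time.
    Static(Vec<i64>),
    /// At least one extent is -1 and is only known at runtime.
    Dynamic { extents: Vec<i64>, dynamic_axes: Vec<usize> },
}

fn classify(dims: &DimsSketch) -> ShapeKind {
    if dims.nb_dims < 0 || dims.nb_dims as usize > MAX_DIMS {
        return ShapeKind::Unknown;
    }
    let extents = dims.d[..dims.nb_dims as usize].to_vec();
    // Collect the positions of all -1 placeholders.
    let dynamic_axes: Vec<usize> = extents
        .iter()
        .enumerate()
        .filter_map(|(axis, &extent)| (extent == -1).then_some(axis))
        .collect();
    if dynamic_axes.is_empty() {
        ShapeKind::Static(extents)
    } else {
        ShapeKind::Dynamic { extents, dynamic_axes }
    }
}
```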
pub unsafe fn getTensorDataType(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> DataType
Determine the required data type for a buffer from its tensor name.
tensorName: The name of an input or output tensor.
Returns the type of the data in the buffer, or DataType::kFLOAT if the provided name does not map to an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
pub fn getNbLayers(self: &ICudaEngine) -> i32
Get the number of layers in the network.
The number of layers in the network is not necessarily the number in the original network definition, as layers may be combined or eliminated as the engine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.
Returns the number of layers in the network.
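As an illustration of the per-layer table mentioned above, the following sketch aggregates per-layer timings over several executions. `LayerTimings` is a hypothetical type; how the per-layer times are collected (for example through a profiler callback) is outside this sketch.

```rust
/// Hypothetical per-layer accumulator sized from the engine's layer count.
struct LayerTimings {
    total_ms: Vec<f64>,
    runs: u32,
}

impl LayerTimings {
    /// `nb_layers` is the value returned by `getNbLayers()`.
    fn new(nb_layers: i32) -> Self {
        LayerTimings {
            total_ms: vec![0.0; nb_layers.max(0) as usize],
            runs: 0,
        }
    }

    /// Add one execution's timings; rejects input with the wrong layer count.
    fn record_run(&mut self, per_layer_ms: &[f64]) -> bool {
        if per_layer_ms.len() != self.total_ms.len() {
            return false;
        }
        for (total, &ms) in self.total_ms.iter_mut().zip(per_layer_ms) {
            *total += ms;
        }
        self.runs += 1;
        true
    }

    /// Mean time per layer over all recorded runs (zeros if none recorded).
    fn mean_ms(&self) -> Vec<f64> {
        if self.runs == 0 {
            return vec![0.0; self.total_ms.len()];
        }
        self.total_ms
            .iter()
            .map(|total| total / f64::from(self.runs))
            .collect()
    }
}
```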
pub fn serialize(self: &ICudaEngine) -> *mut IHostMemory
Serialize the network to a stream.
Returns an IHostMemory object that contains the serialized engine.
The network may be deserialized with IRuntime::deserializeCudaEngine().
pub fn createExecutionContext(
    self: Pin<&mut ICudaEngine>,
    strategy: ExecutionContextAllocationStrategy,
) -> *mut IExecutionContext
Create an execution context and specify the strategy for allocating internal activation memory.
The default value for the allocation strategy is ExecutionContextAllocationStrategy::kSTATIC, which means the context will pre-allocate a block of device memory that is sufficient for all profiles. The newly created execution context will be assigned optimization profile 0. If an error recorder has been set for the engine, it will also be passed to the execution context.
See IExecutionContext
See IExecutionContext::setOptimizationProfileAsync()
See ExecutionContextAllocationStrategy
pub unsafe fn getTensorLocation(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> TensorLocation
Get whether an input or output tensor must be on GPU or CPU.
tensorName: The name of an input or output tensor.
Returns TensorLocation::kDEVICE if tensorName must be on GPU, TensorLocation::kHOST if it must be on CPU, or TensorLocation::kDEVICE if the provided name does not map to an input or output tensor.
The location is established at build time. For example, shape tensor inputs are typically required to be on the CPU.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
pub unsafe fn isShapeInferenceIO(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> bool
True if the tensor is required as input for shape calculations or is output from shape calculations.
Returns true for either of the following conditions:
- The tensor is a network input, and its value is required for IExecutionContext::getTensorShape() to return the shape of a network output.
- The tensor is a network output, and IExecutionContext::inferShapes() will compute its values.
For example, if a network uses an input tensor “foo” as an addend to an IElementWiseLayer that computes the “reshape dimensions” for IShuffleLayer, then isShapeInferenceIO("foo") == true. If the network copies said input tensor “foo” to an output “bar”, then isShapeInferenceIO("bar") == true and IExecutionContext::inferShapes() will write to “bar”.
pub unsafe fn getTensorIOMode(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> TensorIOMode
Determine whether a tensor is an input or output tensor.
tensorName: The name of an input or output tensor.
Returns kINPUT if tensorName is an input, kOUTPUT if tensorName is an output, or kNONE if neither.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
pub unsafe fn getAliasedInputTensor(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> *const c_char
Get the input tensor name that an output tensor should alias with.
Some operations (e.g., KVCacheUpdate) require that certain output tensors share memory with input tensors. This method returns the name of the input tensor that a given output tensor should alias with.
tensorName: The name of an output tensor.
Returns the name of the input tensor to alias with, or nullptr if tensorName is not an output tensor or the output does not alias with any input.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
pub fn createRuntimeConfig(self: Pin<&mut ICudaEngine>) -> *mut IRuntimeConfig
Create a runtime config for TensorRT JIT. The caller takes ownership of the returned IRuntimeConfig object.
Returns an IRuntimeConfig object.
See IRuntimeConfig
pub fn getDeviceMemorySizeV2(self: &ICudaEngine) -> i64
Return the maximum device memory required by the context over all profiles.
This API is stateful: the value it returns depends on earlier calls to the following functions:
- setWeightStreamingBudget()
- setWeightStreamingBudgetV2()
See IExecutionContext::setDeviceMemoryV2()
See setWeightStreamingBudget()
See setWeightStreamingBudgetV2()
pub fn getDeviceMemorySizeForProfileV2(
    self: &ICudaEngine,
    profileIndex: i32,
) -> i64
Return the maximum device memory required by the context for a profile.
This API is stateful: the value it returns depends on earlier calls to the following functions:
- setWeightStreamingBudget()
- setWeightStreamingBudgetV2()
See IExecutionContext::setDeviceMemoryV2()
See setWeightStreamingBudget()
See setWeightStreamingBudgetV2()
pub fn isRefittable(self: &ICudaEngine) -> bool
Return true if an engine can be refit.
See nvinfer1::createInferRefitter()
pub unsafe fn getTensorBytesPerComponent(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> i32
Return the number of bytes per component of an element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The vector component size is returned if getTensorVectorizedDim(tensorName) != -1.
tensorName: The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator. This function only returns the result for profile 0 and issues a warning when the engine has multiple profiles. In that case, use the getTensorBytesPerComponent overload that takes a profileIndex (getTensorBytesPerComponent1 in this binding).
See getTensorVectorizedDim()
See getTensorBytesPerComponent(tensorName, profileIndex)
pub unsafe fn getTensorComponentsPerElement(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> i32
Return the number of components included in one element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The number of elements in the vectors is returned if getTensorVectorizedDim(tensorName) != -1.
tensorName: The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator. This function only returns the result for profile 0 and issues a warning when the engine has multiple profiles. In that case, use the getTensorComponentsPerElement overload that takes a profileIndex (getTensorComponentsPerElement1 in this binding).
See getTensorVectorizedDim()
See getTensorComponentsPerElement(tensorName, profileIndex)
pub unsafe fn getTensorFormat(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> TensorFormat
Return the tensor format, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator. This API only returns the tensor format for profile 0 and issues a warning when the engine has multiple profiles. In that case, use the getTensorFormat overload that takes a profileIndex (getTensorFormat1 in this binding).
See getTensorFormat(tensorName, profileIndex)
pub unsafe fn getTensorFormatDesc(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> *const c_char
Return the human-readable description of the tensor format, or an empty string if the provided name does not map to an input or output tensor.
The description includes the order, vectorization, data type, and strides. Examples:
- kCHW + FP32: “Row-major linear FP32 format”
- kCHW2 + FP16: “Two-wide channel vectorized row-major FP16 format”
- kHWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
tensorName: The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator. This function only returns the result for profile 0 and issues a warning when the engine has multiple profiles. In that case, use the getTensorFormatDesc overload that takes a profileIndex (getTensorFormatDesc1 in this binding).
pub unsafe fn getTensorVectorizedDim(
    self: &ICudaEngine,
    tensorName: *const c_char,
) -> i32
Return the index of the dimension along which the buffer is vectorized, or -1 if the provided name does not map to an input or output tensor.
Specifically, -1 is returned if the number of scalars per vector is 1.
tensorName: The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator. This function only returns the result for profile 0 and issues a warning when the engine has multiple profiles. In that case, use the getTensorVectorizedDim overload that takes a profileIndex (getTensorVectorizedDim1 in this binding).
pub fn getName(self: &ICudaEngine) -> *const c_char
Returns the name of the network associated with the engine.
The name is set during network creation and is retrieved after building or deserialization.
See INetworkDefinition::setName(), INetworkDefinition::getName()
Returns a null-terminated C-style string representing the name of the network.
pub fn getNbOptimizationProfiles(self: &ICudaEngine) -> i32
Get the number of optimization profiles defined for this engine.
Returns the number of optimization profiles. It is always at least 1.
pub unsafe fn getProfileShape(
    self: &ICudaEngine,
    tensorName: *const c_char,
    profileIndex: i32,
    select: OptProfileSelector,
) -> Dims64
Get the minimum / optimum / maximum dimensions for an input tensor given its name under an optimization profile.
- tensorName: The name of an input tensor.
- profileIndex: The profile index, which must be between 0 and getNbOptimizationProfiles()-1.
- select: Whether to query the minimum, optimum, or maximum dimensions for this input tensor.
Returns the minimum / optimum / maximum dimensions for an input tensor in this profile. If profileIndex is invalid or the provided name does not map to an input tensor, returns Dims{-1, {}}.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
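A common use of the minimum and maximum profile dimensions is to check a candidate input shape before running inference. The sketch below assumes the extents have already been copied out of the returned `Dims64` values into slices; `shape_fits_profile` is a hypothetical helper.

```rust
/// Hypothetical helper: true if `shape` has the same rank as the profile
/// bounds and every extent lies within [min, max] for its axis.
fn shape_fits_profile(min: &[i64], max: &[i64], shape: &[i64]) -> bool {
    min.len() == shape.len()
        && max.len() == shape.len()
        && shape
            .iter()
            .zip(min.iter().zip(max))
            .all(|(&extent, (&lo, &hi))| lo <= extent && extent <= hi)
}
```

With an engine that has several profiles, the same check can be repeated for each profileIndex from 0 to getNbOptimizationProfiles()-1 to find one that accepts the shape.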
pub fn getEngineCapability(self: &ICudaEngine) -> EngineCapability
Determine what execution capability this engine has.
If the engine has EngineCapability::kSTANDARD, then all engine functionality is valid. If the engine has EngineCapability::kSAFETY, then only the functionality in the safe engine is valid. If the engine has EngineCapability::kDLA_STANDALONE, then only serialize, destroy, and const-accessor functions are valid.
Returns the EngineCapability flag that the engine was built for.
pub unsafe fn setErrorRecorder(
    self: Pin<&mut ICudaEngine>,
    recorder: *mut IErrorRecorder,
)
Set the ErrorRecorder for this interface.
Assigns the ErrorRecorder to this interface. The ErrorRecorder will track all errors during execution. This function will call incRefCount of the registered ErrorRecorder at least once. Setting recorder to nullptr unregisters the recorder with the interface, resulting in a call to decRefCount if a recorder has been registered.
If an error recorder is not set, messages will be sent to the global log stream.
recorder: The error recorder to register with this interface.
See getErrorRecorder()
pub fn getErrorRecorder(self: &ICudaEngine) -> *mut IErrorRecorder
Get the ErrorRecorder assigned to this interface.
Retrieves the assigned error recorder object for the given class. A nullptr will be returned if an error recorder has not been set.
Returns a pointer to the IErrorRecorder object that has been registered.
See setErrorRecorder()
pub fn hasImplicitBatchDimension(self: &ICudaEngine) -> bool
Query whether the engine was built with an implicit batch dimension.
Always returns false, since TensorRT 10.0 does not support an implicit batch dimension.
See createNetworkV2
Deprecated in TensorRT 10.0. Implicit batch is not supported since TensorRT 10.0.
pub fn getTacticSources(self: &ICudaEngine) -> u32
Return the tactic sources required by this engine.
The value returned is equal to zero or more tactic sources set at build time via setTacticSources() in IBuilderConfig. Sources set by the latter but not returned by ICudaEngine::getTacticSources do not reduce overall engine execution time, and can be removed from future builds to reduce build time.
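The comparison described above (sources set at build time but not reported by the engine) is a bitmask difference. The sketch below assumes, as is usual for this kind of mask, that bit i of each value corresponds to the tactic source with enum value i. `removable_tactic_sources` is a hypothetical helper, and the bit indices it returns are not mapped to enum names here.

```rust
/// Hypothetical helper: bit positions that were requested at build time but
/// are absent from the mask reported by the engine.
///
/// Assumption: bit `i` stands for the tactic source whose enum value is `i`.
fn removable_tactic_sources(requested: u32, used: u32) -> Vec<u32> {
    let unused = requested & !used;
    (0..u32::BITS)
        .filter(|bit| ((unused >> *bit) & 1) == 1)
        .collect()
}
```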
pub fn getProfilingVerbosity(self: &ICudaEngine) -> ProfilingVerbosity
Return the ProfilingVerbosity the builder config was set to when the engine was built.
Returns the profiling verbosity the builder config was set to when the engine was built.
pub fn createEngineInspector(self: &ICudaEngine) -> *mut IEngineInspector
Create a new engine inspector which prints the layer information in an engine or an execution context.
See IEngineInspector.
pub fn getNbIOTensors(self: &ICudaEngine) -> i32
Return the number of IO tensors.
It is the number of input and output tensors for the network from which the engine was built. The names of the IO tensors can be discovered by calling getIOTensorName(i) for i in 0 to getNbIOTensors()-1.
See getIOTensorName()
pub fn getIOTensorName(self: &ICudaEngine, index: i32) -> *const c_char
Return the name of an IO tensor.
index: A value between 0 and getNbIOTensors()-1.
See getNbIOTensors()
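The enumeration pattern described for getNbIOTensors() and getIOTensorName() can be sketched as a loop over the index range. To keep the example independent of the engine type, the name lookup is passed in as a closure; `collect_io_tensor_names` is a hypothetical helper, and in real code the closure would wrap a call to getIOTensorName().

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: collect the names for indices 0..nb_io_tensors.
///
/// # Safety
/// `get_name` must return either null or a pointer to a valid NUL-terminated
/// string that stays alive while it is being copied.
unsafe fn collect_io_tensor_names<F>(nb_io_tensors: i32, get_name: F) -> Vec<String>
where
    F: Fn(i32) -> *const c_char,
{
    (0..nb_io_tensors.max(0))
        .filter_map(|index| {
            let ptr = get_name(index);
            if ptr.is_null() {
                // Skip indices for which no name was returned.
                return None;
            }
            Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
        })
        .collect()
}
```

Each collected name can then be passed back (as a C string) to the per-tensor queries such as getTensorIOMode() or getTensorShape().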
pub fn getHardwareCompatibilityLevel(
    self: &ICudaEngine,
) -> HardwareCompatibilityLevel
Return the hardware compatibility level of this engine.
Returns the level of hardware compatibility.
pub fn getNbAuxStreams(self: &ICudaEngine) -> i32
Return the number of auxiliary streams used by this engine.
This number will be less than or equal to the maximum allowed number of auxiliary streams set by the IBuilderConfig::setMaxAuxStreams() API call when the engine was built.
Returns the number of auxiliary streams used by this engine.
See IBuilderConfig::setMaxAuxStreams(), IExecutionContext::setAuxStreams()
pub fn createSerializationConfig(
    self: Pin<&mut ICudaEngine>,
) -> *mut ISerializationConfig
Create a serialization configuration object.
pub fn serializeWithConfig(
    self: &ICudaEngine,
    config: Pin<&mut ISerializationConfig>,
) -> *mut IHostMemory
Serialize the network to a stream with the provided SerializationConfig.
Returns an IHostMemory object that contains the serialized engine.
The network may be deserialized with IRuntime::deserializeCudaEngine(). Serializing a plan file with SerializationFlag::kEXCLUDE_WEIGHTS requires building the engine with kREFIT, kREFIT_IDENTICAL or kREFIT_INDIVIDUAL.
The only applicable scenario for SerializationFlag::kINCLUDE_REFIT is when serializing weight-stripping engines without kEXCLUDE_WEIGHTS. By default, the resulting serialized engine is unrefittable. Setting SerializationFlag::kINCLUDE_REFIT ensures that the serialized engine remains refittable.
pub fn getStreamableWeightsSize(self: &ICudaEngine) -> i64
Get the total size in bytes of all streamable weights.
The set of streamable weights is a subset of all network weights. The total size may exceed free GPU memory.
Returns the total size in bytes of all streamable weights, or 0 if BuilderFlag::kWEIGHT_STREAMING was unset during engine building.
See setWeightStreamingBudget()
pub fn setWeightStreamingBudgetV2(
    self: Pin<&mut ICudaEngine>,
    gpuMemoryBudget: i64,
) -> bool
Limit the maximum amount of GPU memory usable for network weights in bytes.
gpuMemoryBudget: This parameter must be a non-negative value.
- 0: Only small amounts of scratch memory will be required to run the model.
- >= getStreamableWeightsSize() (default): Disables weight streaming. The execution may fail if the network is too large for GPU memory.
By setting a weight limit, users can expect a GPU memory usage reduction on the order of (total bytes for network weights) - gpuMemoryBudget bytes. Maximum memory savings occur when gpuMemoryBudget is set to 0. Each IExecutionContext will require getWeightStreamingScratchMemorySize() bytes of additional device memory if the engine is streaming its weights (budget < getStreamableWeightsSize()).
Streaming larger amounts of memory will likely result in lower performance except in some boundary cases where streaming weights allows the user to run larger batch sizes. The higher throughput offsets the increased latency in these cases. Tuning the value of the memory limit is recommended for best performance.
GPU memory for the weights is allocated in this call and will be deallocated by enabling weight streaming or destroying the ICudaEngine.
BuilderFlag::kWEIGHT_STREAMING must be set during engine building.
The weights streaming budget cannot be modified while there are active IExecutionContexts.
Using the V2 weight streaming APIs with V1 APIs (setWeightStreamingBudget(), getWeightStreamingBudget(), getWeightStreamingMinimumBudget()) leads to undefined behavior.
Returns true if the memory limit is valid and the call was successful, false otherwise.
See BuilderFlag::kWEIGHT_STREAMING
See getWeightStreamingBudgetV2()
See getWeightStreamingScratchMemorySize()
See getWeightStreamingAutomaticBudget()
See getStreamableWeightsSize()
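Since tuning the budget is recommended, one simple way to explore values is to express the budget as a percentage of the streamable weights kept on the GPU. The sketch below only does the arithmetic; `budget_from_percent` and `streams_weights` are hypothetical helpers, and the inputs stand for the values returned by getStreamableWeightsSize() and passed to setWeightStreamingBudgetV2().

```rust
/// Hypothetical helper: budget in bytes that keeps `percent_on_gpu` percent
/// of the streamable weights resident on the GPU.
/// Returns `None` for a negative size or a percentage above 100.
fn budget_from_percent(streamable_weights_size: i64, percent_on_gpu: u8) -> Option<i64> {
    if streamable_weights_size < 0 || percent_on_gpu > 100 {
        return None;
    }
    // Widen to i128 so the multiplication cannot overflow.
    let budget = i128::from(streamable_weights_size) * i128::from(percent_on_gpu) / 100;
    // The result never exceeds the input size, so it fits in i64.
    Some(budget as i64)
}

/// Weights are streamed only while the budget is below the streamable size.
fn streams_weights(budget: i64, streamable_weights_size: i64) -> bool {
    budget < streamable_weights_size
}
```

A budget of 0 percent corresponds to the maximum memory savings, and 100 percent to weight streaming being disabled.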
pub fn getWeightStreamingBudgetV2(self: &ICudaEngine) -> i64
Returns the current weight streaming device memory budget in bytes.
BuilderFlag::kWEIGHT_STREAMING must be set during engine building.
Returns the weight streaming budget in bytes. See setWeightStreamingBudgetV2() for the possible return values. Returns getStreamableWeightsSize() if weight streaming is disabled.
See BuilderFlag::kWEIGHT_STREAMING
See setWeightStreamingBudget()
See getMinimumWeightStreamingBudget()
See getStreamableWeightsSize()
pub fn getWeightStreamingAutomaticBudget(self: &ICudaEngine) -> i64
TensorRT automatically determines a device memory budget for the model to run. The budget is close to the current free memory size, leaving some space for other memory needs in the user’s application. If the budget exceeds the size obtained from getStreamableWeightsSize(), it is capped to that size, effectively disabling weight streaming. Since TensorRT lacks information about the user’s allocations, the remaining memory size might be larger than required, leading to wasted memory, or smaller than required, causing an out-of-memory error. For optimal memory allocation, it is recommended to manually calculate and set the budget.
BuilderFlag::kWEIGHT_STREAMING must be set during engine building.
The return value may change between TensorRT minor versions.
Setting the returned budget with V1 APIs (setWeightStreamingBudget()) will lead to undefined behavior. Use the V2 APIs instead.
Returns the weight streaming budget in bytes. Set it with setWeightStreamingBudgetV2().
See BuilderFlag::kWEIGHT_STREAMING
See setWeightStreamingBudgetV2()
pub fn getWeightStreamingScratchMemorySize(self: &ICudaEngine) -> i64
Returns the size of the scratch memory required by the current weight streaming budget.
Weight streaming requires small amounts of scratch memory on the GPU to stage CPU weights right before execution. This value is typically much smaller than the total streamable weights size. Each IExecutionContext will then allocate this additional memory or the user can provide the additional memory through getDeviceMemorySizeV2() and IExecutionContext::setDeviceMemoryV2().
The return value of this call depends on earlier calls to:
- setWeightStreamingBudget()
- setWeightStreamingBudgetV2()
BuilderFlag::kWEIGHT_STREAMING must be set during engine building.
Returns the weight streaming scratch memory in bytes, or 0 if weight streaming is disabled.
See BuilderFlag::kWEIGHT_STREAMING
See setWeightStreamingBudgetV2()
See getStreamableWeightsSize()
See getDeviceMemorySizeV2()
See getDeviceMemorySizeForProfileV2()
See IExecutionContext::setDeviceMemoryV2()
pub unsafe fn isDebugTensor(self: &ICudaEngine, name: *const c_char) -> bool
Check if a tensor is marked as a debug tensor.
Determine whether the given name corresponds to a debug tensor.
Returns true if the tensor is a debug tensor, false otherwise.
pub unsafe fn getProfileTensorValuesV2(
    self: &ICudaEngine,
    tensorName: *const c_char,
    profileIndex: i32,
    select: OptProfileSelector,
) -> *const i64
Get the minimum / optimum / maximum values (not dimensions) for an input tensor given its name under an optimization profile. These correspond to the values set using IOptimizationProfile::setShapeValuesV2 when the engine was built.
- tensorName: The name of an input tensor.
- profileIndex: The profile index, which must be between 0 and getNbOptimizationProfiles()-1.
- select: Whether to query the minimum, optimum, or maximum values for this input tensor.
Returns the minimum / optimum / maximum values for an input tensor in this profile. If profileIndex is invalid, the provided name does not map to an input tensor, or the tensor is not a shape binding, returns nullptr.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
If input shapes are set with setShapeValues, getProfileTensorValuesV2 will return nullptr.
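Because the function returns a bare `*const i64` that may be null, callers need both a null check and an element count from elsewhere. The sketch below copies the values into a `Vec`; `read_profile_values` is a hypothetical helper, and the assumption that the caller already knows the number of values (for example from the shape tensor's own dimensions) is part of the sketch, not something this page states.

```rust
/// Hypothetical helper: copy `len` values from a pointer returned by the
/// engine, or return `None` if the pointer is null.
///
/// # Safety
/// If non-null, `ptr` must point to at least `len` readable, properly
/// aligned `i64` values.
unsafe fn read_profile_values(ptr: *const i64, len: usize) -> Option<Vec<i64>> {
    if ptr.is_null() {
        return None;
    }
    // Copy immediately so the result does not borrow engine-owned memory.
    Some(unsafe { std::slice::from_raw_parts(ptr, len) }.to_vec())
}
```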
pub fn getEngineStat(self: &ICudaEngine, stat: EngineStat) -> i64
Get engine statistics according to the given enum value.
stat: The kind of statistics to query.
If stat is kTOTAL_WEIGHTS_SIZE, the return value is the total weights size in bytes in the engine. If stat is kSTRIPPED_WEIGHTS_SIZE, the return value is the stripped weight size in bytes for engines built with BuilderFlag::kSTRIP_PLAN.
When the BuilderFlag::kWEIGHT_STREAMING flag is enabled, engine weights may not be fully copied to the device. The reported total weight size reflects the sum of all weights utilized by the engine, which does not necessarily correspond to the actual GPU memory allocated.
Returns the statistic specified by EngineStat.
If kSTRIPPED_WEIGHTS_SIZE is passed to query a normal engine, this function returns -1 to indicate an invalid enum value.
See EngineStat
See BuilderFlag::kWEIGHT_STREAMING
See setWeightStreamingBudget()
See getStreamableWeightsSize()
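The -1 sentinel described above is easy to mishandle if the raw `i64` is used directly as a size. A minimal sketch of wrapping it follows; `engine_stat_bytes` and `stripped_fraction` are hypothetical helpers that operate on values already returned by getEngineStat().

```rust
/// Hypothetical helper: interpret a raw statistic as a byte count.
/// Negative values (such as the -1 sentinel) become `None`.
fn engine_stat_bytes(raw: i64) -> Option<u64> {
    if raw < 0 {
        None
    } else {
        Some(raw as u64)
    }
}

/// Fraction of the total weight size that was stripped, if both statistics
/// are valid and the total is non-zero.
fn stripped_fraction(total_raw: i64, stripped_raw: i64) -> Option<f64> {
    let total = engine_stat_bytes(total_raw)?;
    let stripped = engine_stat_bytes(stripped_raw)?;
    if total == 0 {
        return None;
    }
    Some(stripped as f64 / total as f64)
}
```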