Safe wrapper around nvinfer1::IExecutionContext and the
IRuntime deserialiser. This is the inference-time hot path.
The runtime actor (TrtActor, see actor.rs) drives an
ExecutionContext per inference, calling enqueueV3 on a CUDA
stream provided by atomr-accel-cuda::DeviceActor. The actor
never blocks on the GPU; completion is signalled via the same
host-fn-completion mechanism the BLAS/cuDNN actors use.
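The ownership model described above, where one actor owns the context and callers only send it messages, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `FakeContext`, `Msg`, and `spawn_actor` are hypothetical names, a plain OS thread and `std::sync::mpsc` channel stand in for the real actor runtime, and no TensorRT or CUDA call is made.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the wrapped execution context. The raw-pointer
/// marker makes the type `!Send`, so it must live on the actor's thread.
struct FakeContext {
    enqueued: u32,
    _not_send: PhantomData<*mut ()>,
}

impl FakeContext {
    /// Pretend to submit work; returns the running submission count.
    fn enqueue(&mut self) -> u32 {
        self.enqueued += 1;
        self.enqueued
    }
}

/// Hypothetical message enum: callers never touch the context directly.
pub enum Msg {
    Enqueue { reply: Sender<u32> },
    Stop,
}

/// Spawns the actor thread. The context is created inside the thread, so it
/// never crosses a thread boundary, and messages are handled one at a time.
pub fn spawn_actor() -> (Sender<Msg>, JoinHandle<u32>) {
    let (tx, rx) = channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut ctx = FakeContext { enqueued: 0, _not_send: PhantomData };
        for msg in rx {
            match msg {
                Msg::Enqueue { reply } => {
                    let n = ctx.enqueue();
                    let _ = reply.send(n); // ignore a caller that went away
                }
                Msg::Stop => break,
            }
        }
        ctx.enqueued // total submissions handled by this actor
    });
    (tx, handle)
}
```

Because the mailbox is the only way in, access to the context is serialised without a lock, which is the property the module relies on.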
Structs
- EnqueueRequest - Standalone enqueue request type, embedded into the TrtActor’s message enum but exposed here so the message dispatcher in actor.rs and tests can construct it without crossing module boundaries.
- ExecutionBindings - Per-call inputs/outputs: tensor name → device pointer. Pointers are raw u64s (CUDA device addresses) so the message is Send + Sync without lifetimes from Arc<CudaSlice<T>>.
- ExecutionContext - Wrapper around an owned IExecutionContext*. Held inside the TrtActor and is not Send to outside callers; the actor owns it for life and serialises access.
- TensorShape - Shape of a dynamic tensor input. Exactly mirrors nvinfer1::Dims (max 8 dims).
- TrtRuntime - Owned wrapper for IRuntime, used to deserialise plan blobs.
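To make the two plain-data structs above concrete, here is a minimal sketch of what such types could look like. The field names, the `new`/`dims` helpers, and the `i64` element type are assumptions for illustration (the actual definitions are not shown on this page); only the 8-dimension cap and the name → `u64` device-pointer mapping come from the descriptions above.

```rust
use std::collections::HashMap;

/// Maximum rank, matching the 8-dimension limit of `nvinfer1::Dims`.
pub const MAX_DIMS: usize = 8;

/// Hypothetical fixed-capacity shape: a rank plus an 8-slot array.
/// The `i64` element type is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TensorShape {
    nb_dims: usize,
    d: [i64; MAX_DIMS],
}

impl TensorShape {
    /// Returns `None` if more than `MAX_DIMS` dimensions are supplied.
    pub fn new(dims: &[i64]) -> Option<Self> {
        if dims.len() > MAX_DIMS {
            return None;
        }
        let mut d = [0i64; MAX_DIMS];
        d[..dims.len()].copy_from_slice(dims);
        Some(Self { nb_dims: dims.len(), d })
    }

    /// The dimensions actually in use (unused slots are hidden).
    pub fn dims(&self) -> &[i64] {
        &self.d[..self.nb_dims]
    }
}

/// Hypothetical bindings: tensor name -> raw CUDA device address.
/// Storing plain `u64`s keeps the struct free of lifetimes and makes it
/// `Send + Sync` automatically.
#[derive(Clone, Debug, Default)]
pub struct ExecutionBindings {
    pub inputs: HashMap<String, u64>,
    pub outputs: HashMap<String, u64>,
}

/// Compile-time check helper: only instantiates for `Send + Sync` types.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Calling `assert_send_sync::<ExecutionBindings>()` compiles only because the struct holds owned strings and integers. The trade-off is that a raw address carries no ownership, so whoever builds the bindings must keep the underlying device buffers alive until the GPU work has completed.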
Type Aliases
- EnqueueReply - Reply payload for an enqueue request. Ok = stream submission succeeded (kernel still running on the GPU); the caller awaits real completion via the shared CUDA stream completion sentinel.
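The two-phase meaning of the reply (Ok on submission, real completion later) can be sketched as follows. This is a simulation under stated assumptions: `EnqueueError`, `submit`, and the channel-based sentinel are hypothetical, the alias shape `Result<(), EnqueueError>` is a guess at the real one, and a sleeping thread stands in for the GPU kernel and the stream's completion callback.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical error type; the crate's real error type is not shown here.
#[derive(Debug, PartialEq, Eq)]
pub enum EnqueueError {
    MissingBinding(String),
}

/// Assumed alias shape: `Ok(())` means "submitted", not "finished".
pub type EnqueueReply = Result<(), EnqueueError>;

/// Simulated submission. On success it returns `Ok(())` immediately together
/// with a receiver that fires once the pretend GPU work is done.
pub fn submit(has_all_bindings: bool) -> (EnqueueReply, Option<Receiver<()>>) {
    if !has_all_bindings {
        let err = EnqueueError::MissingBinding("input".to_string());
        return (Err(err), None);
    }
    let (done_tx, done_rx) = channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10)); // stand-in for kernel runtime
        let _ = done_tx.send(()); // stand-in for the completion sentinel
    });
    (Ok(()), Some(done_rx))
}
```

A caller would treat `Ok(())` only as permission to wait, and read the output buffers after the completion signal arrives, for example via `recv_timeout` on the returned receiver in this simulation.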