Safe-ish TensorRT runtime-side bindings.
Scope is inference only: load a serialized engine (e.g. produced by
trtexec or the TensorRT Python API), create an execution context, bind tensor
addresses, and enqueue execution on a CUDA stream. Engine construction
(the builder / network definition API) is out of scope and is not wrapped here.