§inference-runtime-tensorrt
NVIDIA TensorRT runner — wraps `atomr-accel-tensorrt`'s
`TrtRuntime` / `ExecutionContext` / `ExecutionBindings` behind
the `ModelRunner` trait. Doc §2.2, §10.3.
§Feature flags
- `tensorrt` — pull in the upstream Phase 8 crate. Without this feature the runner compiles to a typed-error stub, so a `cargo build --features remote-only` consumer never pulls cudarc / libnvinfer / nvonnxparser.
- `tensorrt-link` — actually link `libnvinfer.so` at build time. Off by default: with the `tensorrt` feature alone, the runner compiles and unit tests pass without TensorRT installed; runtime calls return `atomr_accel_tensorrt::error::TrtError::NotLinked` mapped to `InferenceError::Internal`.
- `tensorrt-onnx` / `tensorrt-int8` / `tensorrt-fp8` / `tensorrt-plugin` — forwarded straight to the upstream crate, so callers can compose ONNX import, INT8 PTQ, FP8 PTQ, and IPluginV3 trampolines with a single dependency on this crate.
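The typed-error stub described above can be sketched as follows. This is an illustrative mock, not the crate's actual code: `TrtError`, `InferenceError`, and `enqueue_v3` are stand-ins for the real items, showing how a runtime call compiled without `tensorrt-link` surfaces `TrtError::NotLinked` as `InferenceError::Internal` instead of failing to link.

```rust
// Hypothetical sketch of the typed-error stub pattern: without the
// `tensorrt-link` feature, runtime entry points return a typed error
// instead of requiring libnvinfer.so at link time.

#[derive(Debug, PartialEq)]
enum TrtError {
    NotLinked, // libnvinfer.so was not linked at build time
}

#[derive(Debug, PartialEq)]
enum InferenceError {
    Internal(String),
}

impl From<TrtError> for InferenceError {
    fn from(e: TrtError) -> Self {
        InferenceError::Internal(format!("tensorrt: {e:?}"))
    }
}

// Stand-in for a TensorRT runtime call compiled without `tensorrt-link`.
fn enqueue_v3() -> Result<(), TrtError> {
    Err(TrtError::NotLinked)
}

fn main() {
    // The caller sees a typed error, never a link failure.
    let err: InferenceError = enqueue_v3().unwrap_err().into();
    assert_eq!(err, InferenceError::Internal("tensorrt: NotLinked".into()));
}
```

This keeps the dependency graph honest: downstream crates can compile and unit-test against the runner's API while deferring the decision to link TensorRT to deployment time.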
§What this runner does
- Reads the engine plan bytes from `config.plan_path` at construction time. A missing or unreadable plan ⇒ `InferenceError::Internal`.
- Lazily builds a `TrtRuntime`, deserialises the plan into a shared `Arc<TrtEngine>`, and constructs the per-request `ExecutionContext` inside `ModelRunner::execute`.
- Allocates a CUDA stream on the configured `device_id` so `enqueueV3` can ride a real timeline. Operators wiring this runner alongside `atomr-accel-cuda::DeviceActor` should swap the lazy stream out via `TensorRtRunner::with_stream` (under the `tensorrt` feature) so the two actors share one execution timeline.
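The lazy build-and-share step in the list above can be sketched with a self-contained mock. All names here (`Engine`, `Runner`, `plan_len`) are illustrative stand-ins, not the crate's API; the point is the pattern: deserialise the plan once, cache it in a `OnceLock`, and hand out `Arc` clones per request.

```rust
use std::sync::{Arc, OnceLock};

// Illustrative stand-in for a deserialised TensorRT engine.
struct Engine {
    plan_len: usize,
}

// Mock runner holding plan bytes and a lazily built, shared engine.
struct Runner {
    plan_bytes: Vec<u8>,
    engine: OnceLock<Arc<Engine>>,
}

impl Runner {
    fn new(plan_bytes: Vec<u8>) -> Self {
        Self { plan_bytes, engine: OnceLock::new() }
    }

    // The first call "deserialises" the plan; later calls reuse the
    // cached Arc, so every request shares one engine instance.
    fn engine(&self) -> Arc<Engine> {
        self.engine
            .get_or_init(|| Arc::new(Engine { plan_len: self.plan_bytes.len() }))
            .clone()
    }
}

fn main() {
    let runner = Runner::new(vec![0u8; 16]);
    let a = runner.engine();
    let b = runner.engine();
    assert!(Arc::ptr_eq(&a, &b)); // same engine shared across requests
    assert_eq!(a.plan_len, 16);
}
```

Caching the engine behind `OnceLock<Arc<_>>` means construction stays cheap and infallible, while the expensive deserialisation happens at most once, on first use.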
§What this runner does not do
Tokenisation. The `ExecuteBatch` shape is a chat-style
`Vec<Message>` + sampling params; TensorRT engines consume raw
tensors. The runner therefore exposes a `TensorRtRunner::enqueue`
method (under the `tensorrt` feature) for callers that have
already produced device pointers via `ExecutionBindings`, and
`ModelRunner::execute` returns a typed `InferenceError::Internal`
pointing the caller at the tokeniser-specific path. A future
revision can layer an LLM-aware adapter on top.
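The split between the two entry points can be sketched with a minimal mock. This is not the crate's real signature set: `Message`, `InferenceError`, and the `execute` / `enqueue` bodies below are hypothetical, illustrating only that the chat-shaped path fails with a typed error while the raw-tensor path is the one that does work.

```rust
// Hypothetical sketch: chat-style `execute` is rejected with a typed
// error, while `enqueue` accepts already-tokenised buffers.

#[derive(Debug)]
struct Message {
    role: String,
    content: String,
}

#[derive(Debug, PartialEq)]
enum InferenceError {
    Internal(String),
}

struct TensorRtRunner;

impl TensorRtRunner {
    // ModelRunner-style entry point: no tokeniser here, so this always
    // returns a typed error pointing at the raw-tensor path.
    fn execute(&self, _batch: &[Message]) -> Result<Vec<u8>, InferenceError> {
        Err(InferenceError::Internal(
            "TensorRT runner has no tokeniser; use enqueue() with raw tensors".into(),
        ))
    }

    // Raw-tensor path for callers that already hold device buffers.
    // A real implementation would enqueue work on a CUDA stream.
    fn enqueue(&self, bindings: &[f32]) -> Result<usize, InferenceError> {
        Ok(bindings.len())
    }
}

fn main() {
    let runner = TensorRtRunner;
    let msg = Message { role: "user".into(), content: "hi".into() };
    assert!(runner.execute(&[msg]).is_err());
    assert_eq!(runner.enqueue(&[0.0; 4]).unwrap(), 4);
}
```

Returning a typed error from `execute` rather than panicking lets callers that only have chat-shaped input fail gracefully and route to a tokeniser-equipped adapter.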
Structs§
- `TensorRtConfig` — Engine-loading configuration.
- `TensorRtRunner` — `ModelRunner` that drives an immutable TensorRT engine.
Enums§
- `TrtPrecision` — Serializable mirror of `atomr_accel_tensorrt::builder::Precision` so configs can be parsed without pulling in the upstream crate.