atomr-infer-runtime-tensorrt 0.8.0

TensorRT runner for atomr-infer. Wraps atomr-accel-tensorrt's TrtRuntime / ExecutionContext / ExecutionBindings behind the ModelRunner trait.

NVIDIA TensorRT runtime — pre-compiled nvinfer plans driven via FFI. Doc §2.2.

Build profiles

cargo build -p atomr-infer-runtime-tensorrt
    Default: stub build. No extern "C" block, no link against libnvinfer.so.

cargo build -p atomr-infer-runtime-tensorrt --features tensorrt
    Real path: opens the FFI surface and requires libnvinfer.so at link time.

The stub-by-default pattern lets CI run cargo check --workspace on machines without TensorRT installed.
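A minimal sketch of that pattern, assuming a tensorrt Cargo feature gates the bindings; the module layout here is illustrative rather than the crate's actual FFI surface, and createInferRuntime_INTERNAL is shown only as an example of a symbol libnvinfer exports:

// Illustrative only: module layout and the exact FFI surface are assumptions.
#[cfg(feature = "tensorrt")]
mod ffi {
    // Real path: declare the extern block and link against libnvinfer.
    #[link(name = "nvinfer")]
    extern "C" {
        // One example entry point TensorRT exports; the real binding
        // surface is considerably larger.
        pub fn createInferRuntime_INTERNAL(
            logger: *mut core::ffi::c_void,
            version: i32,
        ) -> *mut core::ffi::c_void;
    }
}

#[cfg(not(feature = "tensorrt"))]
mod ffi {
    // Stub path: no extern block at all, so nothing links against
    // libnvinfer.so and cargo check --workspace passes on machines
    // without TensorRT installed.
}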

Configuration

use atomr_infer_runtime_tensorrt::TensorRtConfig;

let cfg = TensorRtConfig {
    // Serialised ICudaEngine plan to deserialise at startup.
    plan_path: "/etc/models/whisper-large-v3.plan".into(),
    // Largest batch a single execution will be asked to handle.
    max_batch_size: 8,
};
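The runner's constructor isn't shown here, so as a hedged illustration the helper below only sanity-checks the two documented fields before the config is handed over; it is not part of the crate's API, and it works whether plan_path is a String or a PathBuf:

use std::path::Path;

// Illustrative helper, not a crate API: catch obvious config mistakes
// before the plan is deserialised.
fn validate(cfg: &TensorRtConfig) -> Result<(), String> {
    if !Path::new(&cfg.plan_path).is_file() {
        return Err(format!("plan file not found: {:?}", cfg.plan_path));
    }
    if cfg.max_batch_size == 0 {
        return Err("max_batch_size must be at least 1".into());
    }
    Ok(())
}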

What it's for

TensorRT shines on workloads with stable shapes, where the optimisation work can be done ahead of time: Whisper, vision pipelines, embedding models, OCR. The engine is opaque (a serialised ICudaEngine) and each concurrent request gets its own ExecutionContext, so concurrency is bounded by your max-batch-size and max-concurrent budget.
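One way to enforce that bound on the caller's side is a semaphore sized to the concurrency budget. The sketch below assumes tokio and uses a hypothetical run_one_request stand-in rather than any real atomr-infer API:

use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical wiring, not the crate's API: cap in-flight requests so each
// one can hold its own ExecutionContext without exceeding the budget.
async fn bounded_infer(max_concurrent: usize) {
    let permits = Arc::new(Semaphore::new(max_concurrent));

    for req in 0..32u32 {
        let permits = Arc::clone(&permits);
        tokio::spawn(async move {
            // Waits until one of the `max_concurrent` slots frees up.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            run_one_request(req).await;
            // Permit drops here, releasing the slot.
        });
    }
}

// Placeholder for whatever actually drives the ModelRunner.
async fn run_one_request(_req: u32) {}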

The cudarc driver / context lifecycle is handled by atomr_accel_cuda::device::DeviceActor (when the rollup's cuda feature is on), so the TensorRT runner doesn't manage a CUDA context itself — it lives inside the two-tier supervision tree just like any other local runtime.