atomr-infer-runtime-tensorrt 0.8.0

TensorRT runner for atomr-infer. Wraps atomr-accel-tensorrt's TrtRuntime / ExecutionContext / ExecutionBindings behind the ModelRunner trait.

NVIDIA TensorRT runtime — pre-compiled nvinfer plans driven via FFI. Doc §2.2.

Build profiles

cargo build -p atomr-infer-runtime-tensorrt
    Default: stub build. No extern "C" block, no link against libnvinfer.so.

cargo build -p atomr-infer-runtime-tensorrt --features tensorrt
    Real path: opens the FFI surface and requires libnvinfer.so at link time.

The stub-by-default pattern lets CI run cargo check --workspace on machines without TensorRT installed.
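A minimal sketch of that pattern, assuming a tensorrt Cargo feature gates the bindings; the module layout here is illustrative rather than the crate's actual FFI surface, and createInferRuntime_INTERNAL is shown only as an example of a symbol libnvinfer exports:

// Illustrative only: module layout and the exact FFI surface are assumptions.
#[cfg(feature = "tensorrt")]
mod ffi {
    // Real path: declare the extern block and link against libnvinfer.
    #[link(name = "nvinfer")]
    extern "C" {
        // One example entry point TensorRT exports; the real binding
        // surface is considerably larger.
        pub fn createInferRuntime_INTERNAL(
            logger: *mut core::ffi::c_void,
            version: i32,
        ) -> *mut core::ffi::c_void;
    }
}

#[cfg(not(feature = "tensorrt"))]
mod ffi {
    // Stub path: no extern block at all, so nothing links against
    // libnvinfer.so and cargo check --workspace passes on machines
    // without TensorRT installed.
}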

Configuration

use atomr_infer_runtime_tensorrt::TensorRtConfig;

let cfg = TensorRtConfig {
    // Serialised ICudaEngine plan to deserialise at startup.
    plan_path: "/etc/models/whisper-large-v3.plan".into(),
    // Largest batch a single execution will be asked to handle.
    max_batch_size: 8,
};
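The runner's constructor isn't shown here, so as a hedged illustration the helper below only sanity-checks the two documented fields before the config is handed over; it is not part of the crate's API, and it works whether plan_path is a String or a PathBuf:

use std::path::Path;

// Illustrative helper, not a crate API: catch obvious config mistakes
// before the plan is deserialised.
fn validate(cfg: &TensorRtConfig) -> Result<(), String> {
    if !Path::new(&cfg.plan_path).is_file() {
        return Err(format!("plan file not found: {:?}", cfg.plan_path));
    }
    if cfg.max_batch_size == 0 {
        return Err("max_batch_size must be at least 1".into());
    }
    Ok(())
}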

What it's for

TensorRT shines on workloads with stable shapes, where the optimisation work can be done ahead of time: Whisper, vision pipelines, embedding models, OCR. The engine is opaque (a serialised ICudaEngine) and each concurrent request gets its own ExecutionContext, so concurrency is bounded by your max-batch-size and max-concurrent budget.
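One way to enforce that bound on the caller's side is a semaphore sized to the concurrency budget. The sketch below assumes tokio and uses a hypothetical run_one_request stand-in rather than any real atomr-infer API:

use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical wiring, not the crate's API: cap in-flight requests so each
// one can hold its own ExecutionContext without exceeding the budget.
async fn bounded_infer(max_concurrent: usize) {
    let permits = Arc::new(Semaphore::new(max_concurrent));

    for req in 0..32u32 {
        let permits = Arc::clone(&permits);
        tokio::spawn(async move {
            // Waits until one of the `max_concurrent` slots frees up.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            run_one_request(req).await;
            // Permit drops here, releasing the slot.
        });
    }
}

// Placeholder for whatever actually drives the ModelRunner.
async fn run_one_request(_req: u32) {}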

The cudarc driver / context lifecycle is handled by atomr_accel_cuda::device::DeviceActor (when the rollup's cuda feature is on), so the TensorRT runner doesn't manage a CUDA context itself — it lives inside the two-tier supervision tree just like any other local runtime.