pub struct TensorRtRunner { /* private fields */ }
ModelRunner that drives an immutable TensorRT engine.
Implementations
impl TensorRtRunner
pub fn new(config: TensorRtConfig) -> InferenceResult<Self>
Reads the plan file and prepares the runner. The TensorRT
runtime and engine are not built until the first call to
execute, so a runner can be instantiated on a host without
libnvinfer, for example to test the config layer.
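The deferred construction described above can be sketched with an `Option` that is filled on first use. This is a minimal illustration using only the standard library, not the crate's implementation: `SketchConfig`, `SketchEngine`, `LazyRunner`, the `plan_path` field, and the `builds` counter are all hypothetical names, and "building the engine" is simulated by recording the plan length.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for the crate's config type (assumption).
struct SketchConfig {
    plan_path: PathBuf,
}

/// Hypothetical stand-in for a built engine (assumption).
struct SketchEngine {
    plan_len: usize,
}

struct LazyRunner {
    plan: Vec<u8>,
    engine: Option<SketchEngine>,
    /// Counts simulated engine builds, only to make the laziness observable.
    builds: u32,
}

impl LazyRunner {
    /// Reads the plan bytes eagerly but does not build the engine.
    fn new(config: SketchConfig) -> io::Result<Self> {
        let plan = fs::read(&config.plan_path)?;
        Ok(Self { plan, engine: None, builds: 0 })
    }

    /// Builds the simulated engine on the first call, then reuses it.
    fn execute(&mut self) -> usize {
        if self.engine.is_none() {
            self.builds += 1;
            self.engine = Some(SketchEngine { plan_len: self.plan.len() });
        }
        match &self.engine {
            Some(engine) => engine.plan_len,
            None => 0,
        }
    }

    /// True once the simulated engine has been built.
    fn is_built(&self) -> bool {
        self.engine.is_some()
    }
}
```

In this sketch a missing plan file fails in `new`, while anything that would need the native runtime is postponed to `execute`, which is the property that lets config-level tests run on a machine without the library installed.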
Trait Implementations
impl ModelRunner for TensorRtRunner
fn execute<'life0, 'async_trait>(
    &'life0 mut self,
    _batch: ExecuteBatch,
) -> Pin<Box<dyn Future<Output = InferenceResult<RunHandle>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
Run an inference. For local runtimes, dispatches kernels; for
remote runtimes, sends an HTTP request. Returns immediately;
completion is observed via the returned
RunHandle stream.
fn rebuild_session<'life0, 'async_trait>(
    &'life0 mut self,
    cause: SessionRebuildCause,
) -> Pin<Box<dyn Future<Output = InferenceResult<()>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
Local runtimes rebuild after CUDA context poison; remote
runtimes rebuild after auth failure or config change.
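The pairing of rebuild causes with runtime kinds can be sketched as a small match. The variants of `SessionRebuildCause` are not shown on this page, so the enum below is a hypothetical stand-in derived only from the sentence above, and rejecting a cause that does not apply to the runtime kind is an assumption of this sketch rather than documented behavior. The method is synchronous here for brevity, whereas the real one is async.

```rust
/// Hypothetical causes; the real `SessionRebuildCause` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchRebuildCause {
    CudaContextPoisoned,
    AuthFailure,
    ConfigChanged,
}

/// Hypothetical stand-in for the crate's runtime kind (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchRuntimeKind {
    Local,
    Remote,
}

struct SketchSession {
    kind: SketchRuntimeKind,
    /// Incremented on every rebuild so callers can observe it.
    generation: u64,
}

impl SketchSession {
    fn new(kind: SketchRuntimeKind) -> Self {
        Self { kind, generation: 0 }
    }

    /// Rebuilds the session when the cause applies to this runtime kind.
    /// Returning an error for a mismatched cause is an assumption.
    fn rebuild_session(&mut self, cause: SketchRebuildCause) -> Result<(), String> {
        let applies = matches!(
            (self.kind, cause),
            (SketchRuntimeKind::Local, SketchRebuildCause::CudaContextPoisoned)
                | (SketchRuntimeKind::Remote, SketchRebuildCause::AuthFailure)
                | (SketchRuntimeKind::Remote, SketchRebuildCause::ConfigChanged)
        );
        if !applies {
            return Err(format!("{:?} does not apply to {:?} runtimes", cause, self.kind));
        }
        self.generation += 1;
        Ok(())
    }
}
```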
fn runtime_kind(&self) -> RuntimeKind
fn transport_kind(&self) -> TransportKind
fn load_weights<'life0, 'life1, 'async_trait>(
    &'life0 mut self,
    _ctx: Option<&'life1 Arc<dyn Any + Sync + Send>>,
    _source: WeightSource,
) -> Pin<Box<dyn Future<Output = Result<(), InferenceError>> + Send + 'async_trait>>
where
    'life0: 'async_trait,
    'life1: 'async_trait,
    Self: 'async_trait,
Local runtimes load weights to GPU; remote runtimes default to
a no-op.
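The local/remote split described on this page (no-op weight loading for remote runtimes, and `None` rate limits and zero cost for local ones) maps naturally onto trait methods with default bodies. The sketch below is a simplified, synchronous illustration and not the crate's trait: `SketchRunner`, `SketchRateLimits`, `LocalSketch`, `RemoteSketch`, and the `usd_per_item` pricing field are hypothetical, and placing the defaults in the trait itself is an assumption.

```rust
/// Hypothetical stand-in for the crate's rate-limit metadata (assumption).
#[derive(Debug, Clone, PartialEq)]
struct SketchRateLimits {
    requests_per_minute: u32,
}

/// Simplified, synchronous stand-in for a runner trait (assumption).
trait SketchRunner {
    /// Default is a no-op, which suits remote runtimes.
    fn load_weights(&mut self, _source: &str) -> Result<(), String> {
        Ok(())
    }

    /// Default is `None`, which suits local runtimes.
    fn rate_limits(&self) -> Option<&SketchRateLimits> {
        None
    }

    /// Default is 0.0, treating local compute cost as amortized.
    fn estimate_cost_usd(&self, _batch_len: usize) -> f64 {
        0.0
    }
}

struct LocalSketch {
    weights_loaded: bool,
}

impl SketchRunner for LocalSketch {
    /// A local runtime overrides weight loading (simulated with a flag).
    fn load_weights(&mut self, _source: &str) -> Result<(), String> {
        self.weights_loaded = true;
        Ok(())
    }
}

struct RemoteSketch {
    limits: SketchRateLimits,
    /// Hypothetical flat price per batch item.
    usd_per_item: f64,
}

impl SketchRunner for RemoteSketch {
    /// A remote runtime reports its configured limits.
    fn rate_limits(&self) -> Option<&SketchRateLimits> {
        Some(&self.limits)
    }

    /// A remote runtime prices the batch; the linear formula is illustrative only.
    fn estimate_cost_usd(&self, batch_len: usize) -> f64 {
        self.usd_per_item * batch_len as f64
    }
}
```

With this shape each runtime overrides only the methods where it differs from the default, so a local runner never has to mention rate limits or cost.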
fn gil_pinned(&self) -> bool
fn rate_limits(&self) -> Option<&RateLimits>
Rate-limit metadata. Returns
None for local runtimes; remote
runtimes return their configured limits so the
RateLimiterActor can be initialized at deploy time.
fn estimate_cost_usd(&self, _batch: &ExecuteBatch) -> f64
Best-effort cost estimate for the given batch (USD). Used by
TieredRouter-style actors and budget enforcement. Local
runtimes default to 0 (compute cost is amortized).
Auto Trait Implementations
impl Freeze for TensorRtRunner
impl RefUnwindSafe for TensorRtRunner
impl Send for TensorRtRunner
impl Sync for TensorRtRunner
impl Unpin for TensorRtRunner
impl UnsafeUnpin for TensorRtRunner
impl UnwindSafe for TensorRtRunner
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.