pub struct InferenceEngine { /* private fields */ }
Inference engine for accelerated forward passes.
Implementations
impl InferenceEngine
pub fn new(config: InferenceConfig) -> Result<InferenceEngine, InferenceError>
Create a new inference engine.
pub fn linear(
    &self,
    input: &Tensor,
    weight: &Tensor,
    bias: Option<&Tensor>,
) -> Result<Tensor, InferenceError>
Linear layer forward pass with optional ternary quantization.
Computes: output = input @ weight.T + bias
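The formula above can be illustrated with a minimal plain-Rust sketch on flat slices. This is not the engine's implementation: `linear_reference` and its shape parameters are hypothetical names, and it assumes row-major data with `input` shaped [batch, in_features] and `weight` shaped [out_features, in_features].

```rust
/// Reference computation of `output = input @ weight.T + bias`.
/// Hypothetical helper, assuming row-major layout:
/// `input` is [batch, in_features], `weight` is [out_features, in_features].
fn linear_reference(
    input: &[f32],
    weight: &[f32],
    bias: Option<&[f32]>,
    batch: usize,
    in_features: usize,
    out_features: usize,
) -> Vec<f32> {
    assert_eq!(input.len(), batch * in_features);
    assert_eq!(weight.len(), out_features * in_features);
    if let Some(b) = bias {
        assert_eq!(b.len(), out_features);
    }

    let mut output = vec![0.0f32; batch * out_features];
    for row in 0..batch {
        let x = &input[row * in_features..(row + 1) * in_features];
        for col in 0..out_features {
            // Row `col` of `weight` is column `col` of `weight.T`.
            let w = &weight[col * in_features..(col + 1) * in_features];
            let dot: f32 = x.iter().zip(w).map(|(xi, wi)| xi * wi).sum();
            // A missing bias contributes zero.
            output[row * out_features + col] = dot + bias.map_or(0.0, |b| b[col]);
        }
    }
    output
}
```

Because `weight` stores one output feature per row, the transpose never has to be materialized: each output element is a dot product of an input row with a weight row.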
pub fn ternary_linear(
    &self,
    input: &Tensor,
    weight: &Tensor,
    bias: Option<&Tensor>,
) -> Result<Tensor, InferenceError>
Ternary linear layer (quantized weights).
Quantizes the weights to ternary values for memory-efficient inference.
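The documentation does not say which quantization scheme the engine uses. As one possible illustration, the sketch below assumes absmean scaling: divide each weight by the mean absolute value, round, and clamp to {-1, 0, +1}. The names `ternarize` and `dequantize` are hypothetical and not part of this API.

```rust
/// Quantize weights to {-1, 0, +1} plus a single f32 scale.
/// Assumption: absmean scaling; the engine may use a different scheme.
fn ternarize(weights: &[f32]) -> (Vec<i8>, f32) {
    if weights.is_empty() {
        return (Vec::new(), 1.0);
    }
    // Scale is the mean absolute weight.
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    if scale == 0.0 {
        // All-zero weights: avoid dividing by zero.
        return (vec![0; weights.len()], 1.0);
    }
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (quantized, scale)
}

/// Recover approximate f32 weights from the ternary codes.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Each weight then needs only a ternary code (stored here as `i8` for simplicity) plus one shared scale, which is where the memory saving comes from.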
pub fn batched_forward<F>(
    &self,
    inputs: &Tensor,
    forward_fn: F,
) -> Result<Tensor, InferenceError>
Batched inference with automatic chunking.
Splits large batches into smaller chunks to fit in memory.
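The chunking idea can be sketched on flat slices as follows. The bounds on `F` and the chunk size used by the engine are not shown on this page, so `chunked_forward`, `row_len`, and `max_rows` are hypothetical, and the closure signature is an assumption.

```rust
/// Run `forward_fn` over at most `max_rows` rows at a time and
/// concatenate the per-chunk outputs in order.
/// Hypothetical helper; `inputs` is row-major with `row_len` values per row.
fn chunked_forward<F, E>(
    inputs: &[f32],
    row_len: usize,
    max_rows: usize,
    mut forward_fn: F,
) -> Result<Vec<f32>, E>
where
    F: FnMut(&[f32]) -> Result<Vec<f32>, E>,
{
    assert!(row_len > 0 && max_rows > 0);
    assert_eq!(inputs.len() % row_len, 0);

    let mut outputs = Vec::new();
    // `chunks` yields whole rows because the chunk length is a multiple of `row_len`;
    // the final chunk may hold fewer rows.
    for chunk in inputs.chunks(row_len * max_rows) {
        // The first error aborts the whole batch.
        outputs.extend(forward_fn(chunk)?);
    }
    Ok(outputs)
}
```

Only one chunk's worth of intermediate data is alive at a time, so peak memory is bounded by `max_rows` rather than by the full batch size.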
pub fn softmax(
    &self,
    input: &Tensor,
    dim: usize,
) -> Result<Tensor, InferenceError>
Apply softmax along specified dimension.
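As a rough illustration of what softmax along a dimension computes, the sketch below handles the common case of the last dimension of a row-major matrix. `softmax_rows` is a hypothetical helper, and subtracting the row maximum is a standard stability trick that the engine may or may not use.

```rust
/// Softmax over each row of a row-major matrix with `row_len` columns.
/// Hypothetical helper covering only the last-dimension case.
fn softmax_rows(input: &[f32], row_len: usize) -> Vec<f32> {
    assert!(row_len > 0 && input.len() % row_len == 0);

    let mut output = Vec::with_capacity(input.len());
    for row in input.chunks(row_len) {
        // Subtract the row maximum so `exp` cannot overflow.
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = row.iter().map(|v| (v - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        output.extend(exps.iter().map(|e| e / sum));
    }
    output
}
```

Every output row is non-negative and sums to 1, and large inputs such as 1000.0 do not produce NaN or infinity.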
pub fn layer_norm(
    &self,
    input: &Tensor,
    weight: &Tensor,
    bias: &Tensor,
    eps: f64,
) -> Result<Tensor, InferenceError>
Apply layer normalization.
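For reference, the standard layer normalization formula is y = (x - mean) / sqrt(var + eps) * weight + bias, computed per row over the last dimension. The sketch below assumes that definition, with the biased variance; `layer_norm_rows` is a hypothetical helper, not the engine's code.

```rust
/// Layer normalization over each row of a row-major matrix.
/// Hypothetical helper; the row length is taken from `weight.len()`.
fn layer_norm_rows(input: &[f32], weight: &[f32], bias: &[f32], eps: f64) -> Vec<f32> {
    let n = weight.len();
    assert!(n > 0 && bias.len() == n && input.len() % n == 0);

    let mut output = Vec::with_capacity(input.len());
    for row in input.chunks(n) {
        // Accumulate in f64 to limit rounding error.
        let mean = row.iter().map(|&v| v as f64).sum::<f64>() / n as f64;
        let var = row.iter().map(|&v| (v as f64 - mean).powi(2)).sum::<f64>() / n as f64;
        // `eps` keeps the division finite when a row is constant.
        let inv_std = 1.0 / (var + eps).sqrt();
        for (i, &v) in row.iter().enumerate() {
            let normalized = (v as f64 - mean) * inv_std;
            output.push((normalized * weight[i] as f64 + bias[i] as f64) as f32);
        }
    }
    output
}
```

With `weight` set to ones and `bias` set to zeros, each output row has approximately zero mean and unit variance.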
Trait Implementations
Auto Trait Implementations
impl Freeze for InferenceEngine
impl RefUnwindSafe for InferenceEngine
impl Send for InferenceEngine
impl Sync for InferenceEngine
impl Unpin for InferenceEngine
impl UnwindSafe for InferenceEngine
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.