Skip to main content

Engine

Struct Engine 

Source
pub struct Engine { /* private fields */ }
Expand description

High-level inference engine that wraps model loading, tokenization, and generation.

Engine is Send + Sync safe for the immutable model and config, but the mutable inference context and sampler are created per-call via internal state.

Implementations§

Source§

impl Engine

Source

pub fn load(config: EngineConfig) -> Result<Self, EngineError>

Load a model and create an inference engine.

This opens the model file (GGUF or ONNX), loads the tokenizer and model weights, and selects the appropriate backend (CPU or GPU).

Format is auto-detected by file extension:

  • .gguf – GGUF format (default)
  • .onnx – ONNX format (requires onnx feature, companion config.json + tokenizer.json)
Source

pub fn select_gpu_backend(model: &LlamaModel) -> Arc<dyn Backend>

Select the best available GPU backend.

Priority: CUDA > Metal > DX12 > Vulkan > CPU fallback.

Source

pub fn model_config(&self) -> &ModelConfig

Get the model configuration.

Source

pub fn chat_template(&self) -> &ChatTemplate

Get the detected chat template.

Source

pub fn gguf(&self) -> Option<&GgufFile>

Get the GGUF file metadata (None for ONNX-loaded models).

Source

pub fn tokenizer(&self) -> &Tokenizer

Get the tokenizer.

Source

pub fn engine_config(&self) -> &EngineConfig

Get the engine configuration.

Source

pub fn model(&self) -> &dyn Model

Get the underlying model (for advanced usage like perplexity computation).

Source

pub fn backend(&self) -> &Arc<dyn Backend>

Get the backend.

Source

pub fn add_bos(&self) -> bool

Whether to add a BOS token when encoding prompts.

Source

pub fn create_inference_context(&self) -> InferenceContext

Generate text from a prompt.

The prompt is automatically wrapped with the detected chat template unless it already contains chat formatting tokens.

Create an InferenceContext respecting the configured KV cache type.

Source

pub fn generate( &self, prompt: &str, max_tokens: usize, ) -> Result<String, EngineError>

Returns the generated text (not including the prompt).

Source

pub fn generate_streaming( &self, prompt: &str, max_tokens: usize, ) -> GenerationStream<'_>

Generate text from a prompt, yielding tokens as they are produced.

Each item in the returned iterator is a Result<String, EngineError> containing the decoded text of one or more tokens.

Source

pub fn embed(&self, text: &str) -> Result<Vec<f32>, EngineError>

Extract embeddings from text using the model.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

impl<A, B, T> HttpServerConnExec<A, B> for T
where B: Body,