pub struct Engine { /* private fields */ }
High-level inference engine that wraps model loading, tokenization, and generation.
Engine is Send + Sync: the model and configuration are immutable and shared across threads, while the mutable inference context and sampler are created per call from internal state.
Implementations§
impl Engine
pub fn load(config: EngineConfig) -> Result<Self, EngineError>
Load a model and create an inference engine.
This opens the model file (GGUF or ONNX), loads the tokenizer and model weights, and selects the appropriate backend (CPU or GPU).
Format is auto-detected by file extension:
.gguf – GGUF format (default)
.onnx – ONNX format (requires the onnx feature; companion config.json + tokenizer.json)
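A minimal loading sketch (the model path and the `model_path` field shown here are illustrative assumptions, not a documented part of EngineConfig):

```rust
// Hypothetical sketch: load a GGUF model and inspect its configuration.
// The path and the `model_path` field are assumptions for illustration.
fn main() -> Result<(), EngineError> {
    let config = EngineConfig {
        model_path: "models/example.Q4_K_M.gguf".into(), // assumed field
        ..EngineConfig::default()                        // assumed Default impl
    };
    let engine = Engine::load(config)?;
    println!("model config: {:?}", engine.model_config());
    Ok(())
}
```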
pub fn select_gpu_backend(model: &LlamaModel) -> Arc<dyn Backend>
Select the best available GPU backend.
Priority: CUDA > Metal > DX12 > Vulkan > CPU fallback.
pub fn model_config(&self) -> &ModelConfig
Get the model configuration.
pub fn chat_template(&self) -> &ChatTemplate
Get the detected chat template.
pub fn gguf(&self) -> Option<&GgufFile>
Get the GGUF file metadata (None for ONNX-loaded models).
pub fn engine_config(&self) -> &EngineConfig
Get the engine configuration.
pub fn model(&self) -> &dyn Model
Get the underlying model (for advanced usage like perplexity computation).
pub fn create_inference_context(&self) -> InferenceContext
Create an InferenceContext respecting the configured KV cache type.
pub fn generate(
    &self,
    prompt: &str,
    max_tokens: usize,
) -> Result<String, EngineError>
Generate text from a prompt.
The prompt is automatically wrapped with the detected chat template unless it already contains chat formatting tokens.
Returns the generated text (not including the prompt).
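Assuming an engine already loaded via Engine::load, blocking generation is a single call (the prompt and token budget below are arbitrary):

```rust
// Sketch: one-shot generation; the chat template is applied automatically.
let reply = engine.generate("Explain KV caching in one sentence.", 128)?;
println!("{reply}"); // the returned text does not include the prompt
```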
pub fn generate_streaming(
    &self,
    prompt: &str,
    max_tokens: usize,
) -> GenerationStream<'_>
Generate text from a prompt, yielding tokens as they are produced.
Each item in the returned iterator is a Result<String, EngineError> containing
the decoded text of one or more tokens.
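A streaming sketch, assuming `engine` is an already-loaded Engine; each item is printed and flushed as it arrives:

```rust
use std::io::Write;

// Sketch: print tokens incrementally as the model produces them.
for chunk in engine.generate_streaming("Write a haiku about Rust.", 64) {
    let text = chunk?; // each item is Result<String, EngineError>
    print!("{text}");
    std::io::stdout().flush().ok();
}
println!();
```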
Auto Trait Implementations§
impl Freeze for Engine
impl !RefUnwindSafe for Engine
impl Send for Engine
impl Sync for Engine
impl Unpin for Engine
impl UnsafeUnpin for Engine
impl !UnwindSafe for Engine
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.