Struct InferenceEngine 

pub struct InferenceEngine {
    pub backend: Arc<LlamaBackend>,
    pub model: Arc<LlamaModel>,
    pub config: InferenceConfig,
}

Shared inference engine that holds the backend + model in Arcs.

This is the resource-heavy object: it loads the DLL and the model weights. Multiple agents can share the same InferenceEngine; each agent only creates its own lightweight LlamaContext.

§Example

use llama_cpp_v3_agent_sdk::inference::{InferenceEngine, InferenceConfig};
use llama_cpp_v3::backend::Backend;
use std::sync::Arc;

let engine = Arc::new(InferenceEngine::load(InferenceConfig {
    backend: Backend::Vulkan,
    model_path: "model.gguf".into(),
    n_gpu_layers: 99,
    ..Default::default()
}).expect("Failed to load model"));

// Share with multiple agents:
let agent1 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .build().unwrap();
let agent2 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .system_prompt("You are a different agent.")
    .build().unwrap();

Fields§

§backend: Arc<LlamaBackend>
§model: Arc<LlamaModel>
§config: InferenceConfig

Implementations§

impl InferenceEngine

pub fn load(config: InferenceConfig) -> Result<Self, AgentError>

Load a model from the given configuration.

This performs the expensive operations (DLL loading, model weight loading) exactly once. The returned engine can be wrapped in Arc and shared.
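The load-once-then-share pattern can be sketched without the SDK. In the sketch below, `Engine`, `LoadError`, and `load_shared` are hypothetical stand-ins (not part of this crate) that mimic the shape of `InferenceEngine::load`. It propagates the load error to the caller instead of calling `expect`, and wraps the result in an `Arc` so later clones are cheap handle copies rather than new loads.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; these are NOT the SDK's real types.
#[derive(Debug)]
struct LoadError(String);

struct Engine {
    model_path: String,
}

impl Engine {
    // Stand-in for `InferenceEngine::load`: the expensive step, run once.
    fn load(model_path: &str) -> Result<Self, LoadError> {
        if model_path.is_empty() {
            return Err(LoadError("model path is empty".to_string()));
        }
        Ok(Engine { model_path: model_path.to_string() })
    }
}

// Load once and wrap in `Arc`, returning the error instead of panicking.
fn load_shared(model_path: &str) -> Result<Arc<Engine>, LoadError> {
    Engine::load(model_path).map(Arc::new)
}

fn main() {
    match load_shared("model.gguf") {
        Ok(engine) => {
            // Each clone is a new handle to the same engine, not a reload.
            let for_agent1 = Arc::clone(&engine);
            let for_agent2 = Arc::clone(&engine);
            println!("agent 1 uses {}", for_agent1.model_path);
            println!("agent 2 uses {}", for_agent2.model_path);
            println!("handles alive: {}", Arc::strong_count(&engine));
        }
        Err(err) => eprintln!("load failed: {:?}", err),
    }
}
```

With the real crate, the same shape would presumably apply to `InferenceEngine::load(config)`, with `AgentError` in place of the stand-in error type.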

pub fn create_context(&self, n_ctx_override: Option<u32>) -> Result<LlamaContext, AgentError>

Create a new LlamaContext from this engine.

Each agent should have its own context, because the context holds the KV cache. The n_ctx_override argument lets callers use a different context size than the engine default.
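To make the override semantics concrete, here is a minimal sketch with hypothetical stand-in types. It assumes the engine default lives in a config field named `n_ctx` and that `None` falls back to it; neither detail is confirmed by this page. The stand-in returns the context directly, whereas the real method returns a `Result`.

```rust
// Hypothetical stand-ins; `n_ctx` as a config field is an assumption.
struct Config {
    n_ctx: u32,
}

struct Context {
    n_ctx: u32,
    // Per-context state, standing in for the KV cache.
    kv_cache: Vec<u32>,
}

struct Engine {
    config: Config,
}

impl Engine {
    // Stand-in for `create_context`: `Some(n)` overrides the size,
    // `None` is assumed to fall back to the engine default.
    fn create_context(&self, n_ctx_override: Option<u32>) -> Context {
        let n_ctx = n_ctx_override.unwrap_or(self.config.n_ctx);
        Context { n_ctx, kv_cache: Vec::new() }
    }
}

fn main() {
    let engine = Engine { config: Config { n_ctx: 4096 } };

    // Two agents, two independent contexts from one shared engine.
    let mut a = engine.create_context(None);
    let b = engine.create_context(Some(8192));

    // Writing to one context's cache leaves the other untouched.
    a.kv_cache.push(42);

    println!("a: n_ctx={} cached={}", a.n_ctx, a.kv_cache.len());
    println!("b: n_ctx={} cached={}", b.n_ctx, b.kv_cache.len());
}
```

Keeping the cache inside the context rather than the engine is what lets several agents share the weights without seeing each other's conversation state.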

pub fn model(&self) -> &LlamaModel

Access the raw LlamaModel.

pub fn backend(&self) -> &LlamaBackend

Access the raw LlamaBackend.

pub fn lib(&self) -> Arc<LlamaLib>

Get the Arc<LlamaLib> for creating samplers and batches.

Trait Implementations§

impl Clone for InferenceEngine

fn clone(&self) -> InferenceEngine

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
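Because the struct stores its backend and model behind `Arc`, a field-by-field clone would copy the handles rather than the underlying data. The sketch below assumes `Clone` is derived, which this page does not confirm, and uses hypothetical stand-in types that mirror the three public fields. The type of `n_gpu_layers` is also an assumption.

```rust
use std::sync::Arc;

// Hypothetical stand-ins mirroring the documented field layout.
struct Backend;

struct Model {
    weights: Vec<f32>,
}

#[derive(Clone)]
struct Config {
    n_gpu_layers: u32, // assumed type, for illustration only
}

// Assumption: the real type derives `Clone` like this.
#[derive(Clone)]
struct Engine {
    backend: Arc<Backend>,
    model: Arc<Model>,
    config: Config,
}

fn main() {
    let engine = Engine {
        backend: Arc::new(Backend),
        model: Arc::new(Model { weights: vec![0.0; 1_000] }),
        config: Config { n_gpu_layers: 99 },
    };

    // Cloning bumps two reference counts and copies the small config;
    // the weights vector itself is not duplicated.
    let copy = engine.clone();

    println!("same model: {}", Arc::ptr_eq(&engine.model, &copy.model));
    println!("same backend: {}", Arc::ptr_eq(&engine.backend, &copy.backend));
    println!(
        "weights: {}, gpu layers: {}",
        copy.model.weights.len(),
        copy.config.n_gpu_layers
    );
}
```

Under that assumption, cloning the engine directly and cloning an `Arc<InferenceEngine>` both avoid reloading the model. The `Arc` wrapper additionally avoids copying the config.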

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.