```rust
pub struct InferenceEngine {
    pub backend: Arc<LlamaBackend>,
    pub model: Arc<LlamaModel>,
    pub config: InferenceConfig,
}
```
Shared inference engine that holds the backend and model in `Arc`s.
This is the resource-heavy object: it loads the DLL and the model weights.
Multiple agents can share the same `InferenceEngine`; each agent just
creates its own lightweight `LlamaContext`.
§Example

```rust
use llama_cpp_v3_agent_sdk::inference::{InferenceEngine, InferenceConfig};
use llama_cpp_v3::backend::Backend;
use std::sync::Arc;

let engine = Arc::new(
    InferenceEngine::load(InferenceConfig {
        backend: Backend::Vulkan,
        model_path: "model.gguf".into(),
        n_gpu_layers: 99,
        ..Default::default()
    })
    .expect("Failed to load model"),
);

// Share with multiple agents:
let agent1 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .build()
    .unwrap();
let agent2 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .system_prompt("You are a different agent.")
    .build()
    .unwrap();
```

Fields§

backend: Arc<LlamaBackend>
model: Arc<LlamaModel>
config: InferenceConfig

Implementations§
impl InferenceEngine

pub fn load(config: InferenceConfig) -> Result<Self, AgentError>
Load a model from the given configuration.
This performs the expensive operations (DLL loading, model weight loading)
exactly once. The returned engine can be wrapped in Arc and shared.
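As a sketch of the load-once pattern (assuming the API shown in the example above, and relying on `InferenceEngine` being `Send + Sync` per the auto trait implementations listed below), the expensive load happens a single time and each worker receives a cheap `Arc` clone:

```rust
use llama_cpp_v3_agent_sdk::inference::{InferenceEngine, InferenceConfig};
use std::sync::Arc;

// Pay the DLL + weight-loading cost exactly once.
let engine = Arc::new(
    InferenceEngine::load(InferenceConfig::default()).expect("failed to load model"),
);

// Hand Arc clones to workers; only pointers are copied, not weights.
let workers: Vec<_> = (0..4)
    .map(|_| {
        let engine = Arc::clone(&engine);
        std::thread::spawn(move || engine.create_context(None))
    })
    .collect();
```

This sketch uses `InferenceConfig::default()` for brevity; a real call would set `model_path` and the other fields as in the example above.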
pub fn create_context(&self, n_ctx_override: Option<u32>) -> Result<LlamaContext, AgentError>
Create a new LlamaContext from this engine.
Each agent should have its own context (it holds the KV cache).
The `n_ctx_override` parameter lets callers use a different context size
than the engine default.
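A minimal sketch of the override (assuming `engine` is the `Arc<InferenceEngine>` from the example above; the context sizes are hypothetical):

```rust
// Pass Some(n) to override the engine's configured context size,
// or None to fall back to the engine default.
let small_ctx = engine.create_context(Some(2048)).expect("context");
let default_ctx = engine.create_context(None).expect("context");
```

Since each context holds its own KV cache, create one per agent rather than sharing a context between agents.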
pub fn model(&self) -> &LlamaModel
Access the raw LlamaModel.
pub fn backend(&self) -> &LlamaBackend
Access the raw LlamaBackend.
Trait Implementations§
impl Clone for InferenceEngine

fn clone(&self) -> InferenceEngine
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

Auto Trait Implementations§
impl Freeze for InferenceEngine
impl RefUnwindSafe for InferenceEngine
impl Send for InferenceEngine
impl Sync for InferenceEngine
impl Unpin for InferenceEngine
impl UnsafeUnpin for InferenceEngine
impl UnwindSafe for InferenceEngine
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more