```rust
pub struct InferenceEngine {
    pub backend: Arc<LlamaBackend>,
    pub model: Arc<LlamaModel>,
    pub config: InferenceConfig,
}
```
Shared inference engine that holds the backend and model in `Arc`s.
This is the resource-heavy object: it loads the DLL and the model weights.
Multiple agents can share the same `InferenceEngine`; each agent just
creates its own lightweight `LlamaContext`.
§Example

```rust
use llama_cpp_v3_agent_sdk::inference::{InferenceEngine, InferenceConfig};
use llama_cpp_v3::backend::Backend;
use std::sync::Arc;

let engine = Arc::new(
    InferenceEngine::load(InferenceConfig {
        backend: Backend::Vulkan,
        model_path: "model.gguf".into(),
        n_gpu_layers: 99,
        ..Default::default()
    })
    .expect("Failed to load model"),
);

// Share with multiple agents:
let agent1 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .build()
    .unwrap();
let agent2 = llama_cpp_v3_agent_sdk::AgentBuilder::new()
    .engine(engine.clone())
    .system_prompt("You are a different agent.")
    .build()
    .unwrap();
```

Fields§

backend: Arc<LlamaBackend>
model: Arc<LlamaModel>
config: InferenceConfig

Implementations§
impl InferenceEngine

pub fn load(config: InferenceConfig) -> Result<Self, AgentError>
Load a model from the given configuration.
This performs the expensive operations (DLL loading, model weight loading)
exactly once. The returned engine can be wrapped in Arc and shared.
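As a sketch of the load-once pattern (assuming the API shown in the example above, and relying on `InferenceEngine` being `Send + Sync` per the auto trait implementations listed below), the expensive load happens a single time and each worker receives a cheap `Arc` clone:

```rust
use llama_cpp_v3_agent_sdk::inference::{InferenceEngine, InferenceConfig};
use std::sync::Arc;

// Pay the DLL + weight-loading cost exactly once.
let engine = Arc::new(
    InferenceEngine::load(InferenceConfig::default()).expect("failed to load model"),
);

// Hand Arc clones to workers; only pointers are copied, not weights.
let workers: Vec<_> = (0..4)
    .map(|_| {
        let engine = Arc::clone(&engine);
        std::thread::spawn(move || engine.create_context(None))
    })
    .collect();
```

This sketch uses `InferenceConfig::default()` for brevity; a real call would set `model_path` and the other fields as in the example above.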
pub fn create_context(&self, n_ctx_override: Option<u32>) -> Result<LlamaContext, AgentError>
Create a new LlamaContext from this engine.
Each agent should have its own context (it holds the KV cache).
The `n_ctx_override` parameter lets callers use a different context size
than the engine default.
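A minimal sketch of the override (assuming `engine` is the `Arc<InferenceEngine>` from the example above; the context sizes are hypothetical):

```rust
// Pass Some(n) to override the engine's configured context size,
// or None to fall back to the engine default.
let small_ctx = engine.create_context(Some(2048)).expect("context");
let default_ctx = engine.create_context(None).expect("context");
```

Since each context holds its own KV cache, create one per agent rather than sharing a context between agents.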
pub fn model(&self) -> &LlamaModel
Access the raw LlamaModel.
pub fn backend(&self) -> &LlamaBackend
Access the raw LlamaBackend.
Trait Implementations§
impl Clone for InferenceEngine

fn clone(&self) -> InferenceEngine
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

Auto Trait Implementations§
impl Freeze for InferenceEngine
impl RefUnwindSafe for InferenceEngine
impl Send for InferenceEngine
impl Sync for InferenceEngine
impl Unpin for InferenceEngine
impl UnsafeUnpin for InferenceEngine
impl UnwindSafe for InferenceEngine
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more