pub struct InferenceScheduler { /* private fields */ }
Controls how many agents can perform inference at the same time.
This is a simple counting semaphore: an agent calls acquire() before running
its inference loop and frees the slot by dropping the returned permit when done.
If all max_concurrent slots are already in use, acquire() blocks until one is freed.
§Why?
Each agent has its own LlamaContext (KV cache), which is independent and
thread-safe, but all contexts share the same GPU for compute. Running too
many inferences in parallel can:
- Exhaust GPU VRAM (multiple KV caches)
- Thrash the GPU scheduler (context switches)
- Cause OOM errors on smaller GPUs
A scheduler with max_concurrent = 1 serializes all inference (like the
worker-thread pattern in vnai::ai), while higher values allow controlled
parallelism.
§Example
use llama_cpp_v3_agent_sdk::InferenceScheduler;
use std::sync::Arc;
// Allow at most 2 agents to infer concurrently:
let scheduler = Arc::new(InferenceScheduler::new(2));
// Use with AgentBuilder:
// AgentBuilder::new()
// .engine(engine.clone())
// .scheduler(scheduler.clone())
//     .build()?;
Implementations§
impl InferenceScheduler
pub fn new(max_concurrent: usize) -> Self
Create a new scheduler with the given concurrency limit.
- max_concurrent = 1 → fully serialized (one agent at a time)
- max_concurrent = N → up to N agents run inference in parallel
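For instance, a minimal sketch mirroring the struct-level example above (the Arc wrapping is only needed if the scheduler is shared across agents):
use llama_cpp_v3_agent_sdk::InferenceScheduler;
use std::sync::Arc;
// One agent at a time (worker-thread style):
let serialized = Arc::new(InferenceScheduler::new(1));
// Up to 4 agents in parallel, for GPUs with enough VRAM:
let parallel = Arc::new(InferenceScheduler::new(4));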
pub fn init_pool(
    &self,
    engine: &InferenceEngine,
    n_ctx: Option<u32>,
) -> Result<(), AgentError>
Pre-initialize the context pool with the given engine. This avoids lazy allocation during the first inference runs.
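For example, warming the pool once at startup might look like this (a sketch: engine is an already-constructed InferenceEngine, and the n_ctx value is purely illustrative):
// Pre-allocate contexts up front so the first acquire() does not pay
// the allocation cost; 4096 is an illustrative per-context size.
scheduler.init_pool(&engine, Some(4096))?;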
pub fn acquire(&self) -> InferencePermit<'_>
Acquire a permit and a context from the pool. Blocks if all slots are in use.
Returns an RAII guard that automatically releases the slot on drop.
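Usage sketch: hold the permit for the duration of one inference run and let it release on drop (run_inference here is a placeholder for the agent's actual inference loop):
{
    // Blocks until a slot (and pooled context) is free.
    let _permit = scheduler.acquire();
    run_inference(); // placeholder for the agent's inference loop
} // _permit dropped here, freeing the slot for the next agent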
pub fn try_acquire(&self) -> Option<InferencePermit<'_>>
Try to acquire a permit without blocking.
Returns None if all slots are in use.
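A non-blocking sketch that skips work when all slots are busy instead of queueing behind other agents (run_inference is again a placeholder):
match scheduler.try_acquire() {
    Some(_permit) => {
        // Slot obtained; run inference while the permit is held.
        run_inference(); // placeholder
    }
    None => {
        // All slots busy; defer or drop this request instead of blocking.
    }
}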
pub fn active_count(&self) -> usize
Number of currently active inferences.
pub fn max_concurrent(&self) -> usize
Maximum allowed concurrent inferences.
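The two getters are handy for lightweight monitoring, e.g. logging slot utilization:
println!(
    "inference slots in use: {}/{}",
    scheduler.active_count(),
    scheduler.max_concurrent()
);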