inference_lab::config::scheduler

Struct SchedulerConfig

pub struct SchedulerConfig {
    pub max_num_batched_tokens: u32,
    pub max_num_seqs: u32,
    pub policy: String,
    pub enable_chunked_prefill: bool,
    pub long_prefill_token_threshold: u32,
    pub max_num_partial_prefills: u32,
    pub block_size: u32,
    pub enable_preemption_free: bool,
}

Fields§

§max_num_batched_tokens: u32

Maximum number of tokens processed in a single iteration

§max_num_seqs: u32

Maximum number of sequences that can run concurrently

§policy: String

Scheduling policy: “fcfs” or “priority”

§enable_chunked_prefill: bool

Enable chunked prefilling

§long_prefill_token_threshold: u32

Maximum number of tokens to prefill in a single iteration (vLLM’s long_prefill_token_threshold). Defaults to 4% of max_model_len if not specified.

§max_num_partial_prefills: u32

Maximum number of sequences that can be partially prefilled concurrently (vLLM default: 1). This limits how many new waiting requests can start prefilling per iteration.

§block_size: u32

Block size for KV cache (in tokens)
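To make the unit concrete, the following minimal sketch shows how a block size in tokens translates into a block count for one sequence. The helper name `blocks_needed` is hypothetical and not part of this crate; the sketch only assumes that a sequence occupies whole blocks, so the count is rounded up.

```rust
/// Hypothetical helper (not part of this crate): number of KV-cache blocks
/// needed to hold `num_tokens` tokens when each block stores `block_size`
/// tokens. Partially filled blocks still count as a whole block.
fn blocks_needed(num_tokens: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block_size must be non-zero");
    num_tokens.div_ceil(block_size)
}

fn main() {
    // With 16-token blocks, a 100-token sequence spans 7 blocks:
    // six full blocks (96 tokens) plus one block holding the last 4 tokens.
    println!("{}", blocks_needed(100, 16));
}
```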

§enable_preemption_free: bool

Enable preemption-free scheduling mode. When enabled, the scheduler uses conservative admission control to guarantee zero preemptions.
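This page does not spell out the admission rule. One plausible reading of "conservative admission control" is sketched below: a request is admitted only if the blocks it could ever need (prompt plus maximum output) fit in the blocks not already promised to running requests. The function `can_admit` and all of its parameters are assumptions made for illustration, not the crate's actual logic.

```rust
/// Hypothetical admission check (an assumption, not this crate's code).
/// Admits a request only if its worst-case KV-cache footprint fits in the
/// blocks that are not yet reserved for already-running requests.
fn can_admit(
    prompt_tokens: u32,
    max_output_tokens: u32,
    block_size: u32,
    unreserved_blocks: u32,
) -> bool {
    assert!(block_size > 0, "block_size must be non-zero");
    // Worst case: the request generates every output token it is allowed to.
    let worst_case_tokens = prompt_tokens.saturating_add(max_output_tokens);
    worst_case_tokens.div_ceil(block_size) <= unreserved_blocks
}

fn main() {
    // 1000 + 200 tokens need 75 blocks of 16 tokens, which fits in 80.
    println!("{}", can_admit(1000, 200, 16, 80));
    // 1000 + 400 tokens need 88 blocks, which does not fit in 80.
    println!("{}", can_admit(1000, 400, 16, 80));
}
```

Under this rule a running request can never run out of blocks, so nothing has to be evicted, at the cost of admitting fewer requests than the cache could usually hold.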

Implementations§

impl SchedulerConfig

pub fn set_default_prefill_threshold(&mut self, max_model_len: u32)

Set the default prefill threshold based on the maximum model length (vLLM uses 4%). The threshold is only set if max_num_partial_prefills > 1, matching vLLM behavior.
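The behavior described above can be sketched as follows. This is not the crate's source: `PrefillSettings` is a hypothetical stand-in that carries only the two fields involved, and the sketch assumes that a threshold of 0 means "not specified" and that the 4% value is truncated to an integer.

```rust
/// Hypothetical stand-in for `SchedulerConfig` with only the relevant fields.
struct PrefillSettings {
    long_prefill_token_threshold: u32,
    max_num_partial_prefills: u32,
}

impl PrefillSettings {
    /// Sketch of the documented behavior under the assumptions stated above.
    fn set_default_prefill_threshold(&mut self, max_model_len: u32) {
        // Assumption: 0 marks an unspecified threshold.
        let unspecified = self.long_prefill_token_threshold == 0;
        if self.max_num_partial_prefills > 1 && unspecified {
            // 4% of max_model_len, truncated; u64 avoids overflow in the product.
            self.long_prefill_token_threshold =
                (u64::from(max_model_len) * 4 / 100) as u32;
        }
    }
}

fn main() {
    let mut cfg = PrefillSettings {
        long_prefill_token_threshold: 0,
        max_num_partial_prefills: 2,
    };
    cfg.set_default_prefill_threshold(8192);
    // 4% of 8192 is 327.68, which truncates to 327.
    println!("{}", cfg.long_prefill_token_threshold);
}
```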

Trait Implementations§

impl Clone for SchedulerConfig

fn clone(&self) -> SchedulerConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for SchedulerConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for SchedulerConfig

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

Auto Trait Implementations§

impl Freeze for SchedulerConfig

impl RefUnwindSafe for SchedulerConfig

impl Send for SchedulerConfig

impl Sync for SchedulerConfig

impl Unpin for SchedulerConfig

impl UnsafeUnpin for SchedulerConfig

impl UnwindSafe for SchedulerConfig

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,