pub struct SchedulerConfig {
    pub max_num_batched_tokens: u32,
    pub max_num_seqs: u32,
    pub policy: String,
    pub enable_chunked_prefill: bool,
    pub long_prefill_token_threshold: u32,
    pub max_num_partial_prefills: u32,
    pub block_size: u32,
    pub enable_preemption_free: bool,
}
Fields
max_num_batched_tokens: u32
Maximum number of tokens processed in a single iteration.

max_num_seqs: u32
Maximum number of sequences that can run concurrently.

policy: String
Scheduling policy: "fcfs" or "priority".

enable_chunked_prefill: bool
Enable chunked prefilling.

long_prefill_token_threshold: u32
Maximum tokens to prefill in a single iteration (vLLM's long_prefill_token_threshold). Defaults to 4% of max_model_len if not specified.

max_num_partial_prefills: u32
Maximum number of sequences that can be partially prefilled concurrently (vLLM default: 1). This limits how many NEW waiting requests can start prefilling per iteration.

block_size: u32
Block size for the KV cache, in tokens.

enable_preemption_free: bool
Enable preemption-free scheduling mode. When enabled, the scheduler uses conservative admission control to guarantee zero preemptions.
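As an illustration of how the fields fit together, here is a minimal sketch that builds a configuration and checks it. The struct is a local mirror of the one documented above so the example compiles on its own; `example_config` and `validate` are hypothetical helpers, and the numeric values are illustrative assumptions, not the crate's defaults.

```rust
// Local mirror of the documented struct (assumption: in real code you would
// import SchedulerConfig from the crate instead of redefining it).
#[derive(Clone, Debug)]
pub struct SchedulerConfig {
    pub max_num_batched_tokens: u32,
    pub max_num_seqs: u32,
    pub policy: String,
    pub enable_chunked_prefill: bool,
    pub long_prefill_token_threshold: u32,
    pub max_num_partial_prefills: u32,
    pub block_size: u32,
    pub enable_preemption_free: bool,
}

/// Hypothetical helper: the values below are illustrative, not crate defaults.
pub fn example_config() -> SchedulerConfig {
    SchedulerConfig {
        max_num_batched_tokens: 8192,
        max_num_seqs: 256,
        policy: "fcfs".to_string(),
        enable_chunked_prefill: true,
        // 0 is used here to mean "not specified" (an assumption of this sketch).
        long_prefill_token_threshold: 0,
        max_num_partial_prefills: 1,
        block_size: 16,
        enable_preemption_free: false,
    }
}

/// Hypothetical sanity check; not part of the documented API.
pub fn validate(cfg: &SchedulerConfig) -> Result<(), String> {
    // The field documentation lists exactly two accepted policies.
    if cfg.policy != "fcfs" && cfg.policy != "priority" {
        return Err(format!("unknown policy: {}", cfg.policy));
    }
    // A zero block size or zero capacity cannot schedule anything.
    if cfg.block_size == 0 {
        return Err("block_size must be greater than zero".to_string());
    }
    if cfg.max_num_seqs == 0 || cfg.max_num_batched_tokens == 0 {
        return Err("max_num_seqs and max_num_batched_tokens must be greater than zero".to_string());
    }
    Ok(())
}
```

Because `policy` is a plain `String`, a typo such as "fifo" is only caught at run time; a check like `validate` is one way to surface it early.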
Implementations
impl SchedulerConfig
pub fn set_default_prefill_threshold(&mut self, max_model_len: u32)
Sets the default prefill threshold from the maximum model length (vLLM uses 4% of max_model_len). The threshold is only set if max_num_partial_prefills > 1, matching vLLM behavior.
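The rule described above can be sketched as follows. This is not the crate's implementation: the struct is trimmed to the two fields involved, `apply_default_prefill_threshold` is a hypothetical free function, and two details are assumptions of the sketch: 4% is computed with integer (floor) division, and a threshold of 0 is treated as "not specified" so that an explicit value is never overwritten.

```rust
// Trimmed stand-in for the documented struct, holding only the fields this
// sketch touches (assumption: the real struct has the full field list).
#[derive(Clone, Debug)]
pub struct SchedulerConfig {
    pub long_prefill_token_threshold: u32,
    pub max_num_partial_prefills: u32,
}

/// Hypothetical sketch of the documented behavior: set the threshold to 4% of
/// `max_model_len`, but only when more than one partial prefill is allowed.
pub fn apply_default_prefill_threshold(cfg: &mut SchedulerConfig, max_model_len: u32) {
    // Documented condition: nothing happens unless max_num_partial_prefills > 1.
    if cfg.max_num_partial_prefills <= 1 {
        return;
    }
    // Assumption: 0 means "not specified"; keep any explicitly chosen value.
    if cfg.long_prefill_token_threshold != 0 {
        return;
    }
    // Widen to u64 so `max_model_len * 4` cannot overflow; the quotient is at
    // most 4% of a u32 value, so narrowing back to u32 is lossless.
    let threshold = u64::from(max_model_len) * 4 / 100;
    cfg.long_prefill_token_threshold = threshold as u32;
}
```

Under these assumptions, a config with max_num_partial_prefills = 2 and max_model_len = 32768 ends up with a threshold of 1310 tokens (32768 × 4 / 100, rounded down), while a config with max_num_partial_prefills = 1 is left unchanged.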
Trait Implementations
impl Clone for SchedulerConfig
fn clone(&self) -> SchedulerConfig
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.